A brief introduction to the UniProt protein database

LUO Jingchu

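Supervising body: Ministry of Industry and Information Technology; Sponsor: Harbin Institute of Technology; Editor-in-chief: REN Nanqi; ISSN: 1672-5565; CN: 23-1513/Q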

Cite this article: LUO Jingchu. A brief introduction to UniProt[J]. Chinese Journal of Bioinformatics, 2019, 17(3): 131-144.

(College of Life Sciences, Peking University, Beijing 100871, China)

Abstract:

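UniProt (https://www.uniprot.org/) is an internationally recognized protein database comprising three components: the UniProtKB knowledgebase, the UniParc archive, and the UniRef reference clusters. UniProtKB is the core of UniProt; in addition to protein sequence data, it holds extensive annotation. UniProtKB is divided into two sections, Swiss-Prot and TrEMBL. The more than 500,000 sequences in Swiss-Prot are all manually reviewed and annotated, whereas the more than 140 million sequences in TrEMBL are translated from protein-coding sequences in the EMBL nucleotide sequence database and annotated computationally according to defined rules. UniParc merges copies of the same protein stored in different databases into a single record to avoid redundancy and assigns each sequence a unique identifier. UniRef groups the sequences in UniProtKB and UniParc by degree of similarity into three datasets: UniRef100, UniRef90, and UniRef50. The UniProt website offers an efficient, practical advanced search system and extensive help documentation. A new release of UniProt is published every four weeks together with release statistics, from which users can learn basic information such as the size of the database, its update status, data categories, and species distribution, and can consult statistics on general annotation, sequence feature annotation, and database cross-references. UniProt is currently the most complete and most richly annotated non-redundant protein sequence database in the world, and since its creation at the beginning of this century it has provided a valuable resource for the life sciences.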

Keywords: database; protein sequence; protein function; database annotation; database cross-reference; advanced database search

DOI: 10.12113/j.issn.1672-5565.201903005

Classification number: Q51; TP392

Document code: A

Funding:

A brief introduction to UniProt

LUO Jingchu

(College of Life Sciences, Peking University, Beijing 100871, China)

Abstract:

The Universal Protein Resource (UniProt, https://www.uniprot.org/) is a well-known protein database consisting of the UniProt Knowledgebase (UniProtKB), the UniProt Archive (UniParc), and the UniProt Reference Clusters (UniRef). UniProtKB is the core of the database; apart from protein sequence data, it provides comprehensive annotation. UniProtKB/Swiss-Prot, the manually reviewed and annotated section of UniProtKB, has more than 500,000 entries, while UniProtKB/TrEMBL contains more than 140 million unreviewed sequences, which are translated from the coding sequences in the EMBL nucleotide database and annotated computationally according to defined rules. UniParc merges identical sequences stored in UniProtKB and other available protein sequence databases into a single record to avoid redundancy and gives each record a permanent, unique identifier. UniRef clusters the UniProtKB sequences and selected UniParc sequences into three sets, UniRef100, UniRef90, and UniRef50, according to their sequence identity. The UniProt website provides an easy-to-use, efficient interface for advanced search, along with extensive help documentation. Release statistics are published online with each database update every four weeks; they list useful information such as the number of newly added and updated entries, the sequence types and their taxonomic sources, and counts of general annotations, sequence features, and database cross-references. Since it was established at the beginning of this century, UniProt has served the life sciences community as the most comprehensive, well-annotated, non-redundant, and freely accessible resource for protein sequence and function.

Key words: database; protein sequence; protein function; database annotation; database cross-reference; database query
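
The merging step that the abstract attributes to UniParc (identical sequences from different source databases collapsed into one record that keeps all cross-references) can be illustrated with a minimal sketch. This is not UniParc's actual implementation: the function name merge_identical_sequences, the SHA-256 digest used as a stand-in record key, and the accessions ACC_A, ACC_B, and ACC_C are all hypothetical and chosen only for illustration.

```python
import hashlib


def merge_identical_sequences(records):
    """Collapse (source_db, accession, sequence) records that share a sequence.

    Returns a dict keyed by a digest of the normalized sequence. The digest is
    only a stand-in for a stable, unique record identifier (an assumption of
    this sketch, not the identifier scheme UniParc uses).
    """
    archive = {}
    for source_db, accession, sequence in records:
        # Normalize so that case and whitespace do not hide identical sequences.
        normalized = "".join(sequence.split()).upper()
        key = hashlib.sha256(normalized.encode("ascii")).hexdigest()
        entry = archive.setdefault(key, {"sequence": normalized, "xrefs": []})
        # Every source record is kept as a cross-reference on the merged entry.
        entry["xrefs"].append((source_db, accession))
    return archive


# Hypothetical input: two databases hold the same sequence, one holds a longer one.
records = [
    ("UniProtKB", "ACC_A", "MKTAYIAKQR"),
    ("OtherDB", "ACC_B", "mktayiakqr"),
    ("OtherDB", "ACC_C", "MKTAYIAKQRQ"),
]

archive = merge_identical_sequences(records)
for key, entry in archive.items():
    print(key[:12], entry["sequence"], entry["xrefs"])
```

Keying on the exact sequence means only strictly identical sequences are merged; grouping by similarity thresholds, as described for UniRef90 and UniRef50, would need a separate clustering step that this sketch does not attempt.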
