eggNOG v4.0: Nested orthology inference across 3686 organisms

Authors Organisations
  • Sean Powell(Author)
    European Molecular Biology Laboratory
  • Kristoffer Forslund(Author)
    European Molecular Biology Laboratory
  • Damian Szklarczyk(Author)
    University of Zurich
  • Kalliopi Trachana(Author)
    Institute for Systems Biology
  • Alexander Roth(Author)
    University of Zurich
  • Jaime Huerta-Cepas(Author)
    Centre for Genomic Regulation (CRG)
  • Toni Gabaldón(Author)
  • Thomas Rattei(Author)
  • Chris Creevey(Author)
  • Michael Kuhn(Author)
    TU Dresden
  • Lars J. Jensen(Author)
    University of Zurich
  • Christian von Mering(Author)
    University of Zurich
  • Peer Bork(Author)
    European Molecular Biology Laboratory
    Max-Delbrück-Centre for Molecular Medicine
Type: Article
Original language: English
Pages (from-to): D231-D239
Number of pages: 9
Journal: Nucleic Acids Research
Volume: 42
Issue number: D1
Early online date: 01 Dec 2013
DOI
Publication status: Published - 01 Jan 2014

Abstract

With the increasing availability of various ‘omics data, high-quality orthology assignment is crucial for evolutionary and functional genomics studies. Here we present the fourth version of the eggNOG database (available at http://eggnog.embl.de), which derives nonsupervised orthologous groups (NOGs) from complete genomes and then applies a comprehensive characterization and analysis pipeline to the resulting gene families. Compared with the previous version, we have more than tripled the underlying species set to cover 3686 organisms, keeping pace with genome project completions while prioritizing the inclusion of high-quality genomes to minimize error propagation from incomplete proteome sets. Major technological advances include (i) a robust and scalable procedure for the identification and inclusion of high-quality genomes, (ii) provision of orthologous groups for 107 different taxonomic levels compared with 41 in eggNOGv3, (iii) identification and annotation of particularly closely related orthologous groups, facilitating analysis of related gene families, (iv) improvements to the clustering and functional annotation approach, (v) adoption of a revised tree-building procedure based on the multiple alignments generated during the process and (vi) implementation of quality control procedures throughout the entire pipeline. As in previous versions, eggNOGv4 provides multiple sequence alignments and maximum-likelihood trees, as well as broad functional annotation. Users can access the complete database of orthologous groups via a web interface, as well as through bulk download.

Keywords

  • Databases, Genetic
  • Genome
  • Genome, Microbial
  • Genomics
  • Internet
  • Molecular Sequence Annotation
  • Multigene Family
  • Phylogeny
  • Proteins
  • Sequence Homology, Amino Acid