Inadvertent paralog inclusion drives artefactual topologies and timetree estimates in phylogenomics

Standard

Inadvertent paralog inclusion drives artefactual topologies and timetree estimates in phylogenomics. / Siu Ting, Karen; Torres-Sánchez, María ; San Mauro, Diego; Wilcockson, David; Wilkinson, Mark; Pisani, Davide; O'Connell, Mary J.; Creevey, Christopher.

In: Molecular Biology and Evolution, Vol. 36, No. 6, 01.06.2019, p. 1344-1356.

Research output: Contribution to journal › Article

Harvard

Siu Ting, K, Torres-Sánchez, M, San Mauro, D, Wilcockson, D, Wilkinson, M, Pisani, D, O'Connell, MJ & Creevey, C 2019, 'Inadvertent paralog inclusion drives artefactual topologies and timetree estimates in phylogenomics', Molecular Biology and Evolution, vol. 36, no. 6, pp. 1344-1356. https://doi.org/10.1093/molbev/msz067

APA

Siu Ting, K., Torres-Sánchez, M., San Mauro, D., Wilcockson, D., Wilkinson, M., Pisani, D., ... Creevey, C. (2019). Inadvertent paralog inclusion drives artefactual topologies and timetree estimates in phylogenomics. Molecular Biology and Evolution, 36(6), 1344-1356. https://doi.org/10.1093/molbev/msz067

Vancouver

Siu Ting K, Torres-Sánchez M, San Mauro D, Wilcockson D, Wilkinson M, Pisani D et al. Inadvertent paralog inclusion drives artefactual topologies and timetree estimates in phylogenomics. Molecular Biology and Evolution. 2019 Jun 1;36(6):1344-1356. https://doi.org/10.1093/molbev/msz067

Author

Siu Ting, Karen ; Torres-Sánchez, María ; San Mauro, Diego ; Wilcockson, David ; Wilkinson, Mark ; Pisani, Davide ; O'Connell, Mary J. ; Creevey, Christopher. / Inadvertent paralog inclusion drives artefactual topologies and timetree estimates in phylogenomics. In: Molecular Biology and Evolution. 2019 ; Vol. 36, No. 6. pp. 1344-1356.

Bibtex

@article{585f5a3efdd343459225bd45fa718d04,
title = "Inadvertent paralog inclusion drives artefactual topologies and timetree estimates in phylogenomics",
abstract = "Increasingly, large phylogenomic datasets include transcriptomic data from non-model organisms. This has allowed controversial and unexplored evolutionary relationships in the tree of life to be addressed but also increases the risk of inadvertent inclusion of paralogs in the analysis. While this may be expected to result in decreased phylogenetic support, it is not clear if it could also drive highly supported artefactual relationships. Many groups, including the hyper-diverse Lissamphibia, are especially susceptible to these issues due to ancient gene duplication events, small numbers of sequenced genomes and because transcriptomes are increasingly applied to resolve historically conflicting taxonomic hypotheses. We tested the potential impact of paralog inclusion on the topologies and timetree estimates of the Lissamphibia using published and de novo sequencing data including 18 amphibian species, from which 2,656 single-copy gene families were identified. A novel paralog filtering approach resulted in four differently curated datasets, which were used for phylogenetic reconstructions using Bayesian inference, maximum likelihood and quartet-based supertrees. We found that paralogs drive strongly supported conflicting hypotheses within the Lissamphibia (Batrachia and Procera) and older divergence time estimates even within groups where no variation in topology was observed. All investigated methods, except Bayesian inference with the CAT-GTR model, were found to be sensitive to paralogs, but with filtering, convergence to the same answer (Batrachia) was observed. This is the first large-scale study to address the impact of orthology selection using transcriptomic data and emphasises the importance of quality over quantity, particularly for understanding relationships of poorly sampled taxa.",
keywords = "phylogenomics, orthology, paralogy, lissamphibia, timetree",
author = "{Siu Ting}, Karen and Mar{\'i}a Torres-S{\'a}nchez and {San Mauro}, Diego and David Wilcockson and Mark Wilkinson and Davide Pisani and O'Connell, {Mary J.} and Christopher Creevey",
year = "2019",
month = "6",
day = "1",
doi = "10.1093/molbev/msz067",
language = "English",
volume = "36",
pages = "1344--1356",
journal = "Molecular Biology and Evolution",
issn = "0737-4038",
publisher = "Oxford University Press",
number = "6",

}

RIS (suitable for import to EndNote)

TY - JOUR

T1 - Inadvertent paralog inclusion drives artefactual topologies and timetree estimates in phylogenomics

AU - Siu Ting, Karen

AU - Torres-Sánchez, María

AU - San Mauro, Diego

AU - Wilcockson, David

AU - Wilkinson, Mark

AU - Pisani, Davide

AU - O'Connell, Mary J.

AU - Creevey, Christopher

PY - 2019/6/1

Y1 - 2019/6/1

N2 - Increasingly, large phylogenomic datasets include transcriptomic data from non-model organisms. This has allowed controversial and unexplored evolutionary relationships in the tree of life to be addressed but also increases the risk of inadvertent inclusion of paralogs in the analysis. While this may be expected to result in decreased phylogenetic support, it is not clear if it could also drive highly supported artefactual relationships. Many groups, including the hyper-diverse Lissamphibia, are especially susceptible to these issues due to ancient gene duplication events, small numbers of sequenced genomes and because transcriptomes are increasingly applied to resolve historically conflicting taxonomic hypotheses. We tested the potential impact of paralog inclusion on the topologies and timetree estimates of the Lissamphibia using published and de novo sequencing data including 18 amphibian species, from which 2,656 single-copy gene families were identified. A novel paralog filtering approach resulted in four differently curated datasets, which were used for phylogenetic reconstructions using Bayesian inference, maximum likelihood and quartet-based supertrees. We found that paralogs drive strongly supported conflicting hypotheses within the Lissamphibia (Batrachia and Procera) and older divergence time estimates even within groups where no variation in topology was observed. All investigated methods, except Bayesian inference with the CAT-GTR model, were found to be sensitive to paralogs, but with filtering, convergence to the same answer (Batrachia) was observed. This is the first large-scale study to address the impact of orthology selection using transcriptomic data and emphasises the importance of quality over quantity, particularly for understanding relationships of poorly sampled taxa.

KW - phylogenomics

KW - orthology

KW - paralogy

KW - lissamphibia

KW - timetree

U2 - 10.1093/molbev/msz067

DO - 10.1093/molbev/msz067

M3 - Article

VL - 36

SP - 1344

EP - 1356

JO - Molecular Biology and Evolution

JF - Molecular Biology and Evolution

SN - 0737-4038

IS - 6

ER -
