Computational recovery of enzyme haplotypes from a metagenome

Type

Student thesis: Doctoral Thesis, Doctor of Philosophy

Original language: English
Award date: 2018

Abstract

The population-level diversity of microbial communities (microbiomes) represents a biotechnological resource for biomining, biorefining and synthetic biology, but industrial exploitation of the enzymes responsible for catalyzing reactions of interest requires the recovery of the exact DNA sequences (or “haplotypes”) that encode the genes. However, haplotype reconstruction is an extremely difficult computational problem, further complicated by the infancy of techniques for handling environmental sequencing data (metagenomics). Current haplotyping approaches cannot choose between alternative haplotype reconstructions and fail to provide biological evidence of correct predictions. Additionally, there is no philosophical framework under which we can consider the variation of genes within a microbial community, such as those that encode isoforms of enzymes of interest to us.

To address this, my thesis proposes the “metahaplome” as a definition for the set of haplotypes for a genomic region of interest within a microbial community. This work will offer the first formalisation of the problem of recovering haplotypes from a metagenomic data set, and present Hansel and Gretel: a novel probabilistic framework that reconstructs the most likely haplotypes from complex microbiomes. The framework is robust to sequencing error and uses all available evidence from aligned reads, without altering or discarding observed variation.

The approach is verified with multiple in silico experiments, including two de facto standard data sets that are currently used to benchmark algorithms for the recovery of viral quasispecies and for strain identification. With long-read sequencing, this thesis will demonstrate in vitro verification of the approach, presenting the first biologically validated method for the recovery of haplotypes from a microbial community. Finally, I will introduce the “Rumen Landscape” pilot study to demonstrate the kinds of research questions and novel biological insights that can be obtained through exploration of the metahaplome.