Replacing Sanger with Next Generation Sequencing to improve coverage and quality of reference DNA barcodes for plants

Authors Organisations
  • Michael Wilkinson(Author)
  • Claudio Szabo(Author)
    University of Adelaide
  • Caroline Ford(Author)
  • Yuval Yarom(Author)
    University of Adelaide
  • Adam Croxford(Author)
    University of Adelaide
  • Amanda Camp(Author)
    University of Adelaide
  • Paul Gooding(Author)
    University of Adelaide
Type: Article
Original language: English
Article number: 46040
Journal: Scientific Reports
Publication status: Published - 12 Apr 2017


We estimate that the global BOLD Systems database holds core DNA barcodes (rbcL + matK) for about 15% of land plant species and that comprehensive species coverage is still many decades away. Interim performance of the resource is compromised by variable sequence overlap and modest information content within each barcode. Our model predicts that the proportion of species-unique barcodes declines as the database grows and that ‘false’ species-unique barcodes remain >5% until the database is almost complete. We conclude that the current rbcL + matK barcode is unfit for purpose. Genome skimming and supplementary barcodes could improve diagnostic power but would slow new barcode acquisition. We therefore present two novel Next Generation Sequencing protocols (with freeware) capable of accurate, massively parallel de novo assembly of high-quality DNA barcodes of >1400 bp. We explore how these capabilities could enhance species diagnosis in the coming decades.
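
The effect described in the abstract, where barcodes that look species-unique in a partial database turn out to be shared once more species are added, can be illustrated with a toy simulation. This is a minimal sketch and not the authors' model: the function name `simulate`, the species and barcode counts, and the assumption that barcodes are assigned to species uniformly at random are all hypothetical choices made for illustration, so the numbers it produces should not be compared with the figures reported above.

```python
import random
from collections import Counter


def simulate(total_species=10000, n_barcodes=6000,
             fractions=(0.15, 0.5, 0.9, 1.0), seed=0):
    """Toy model: how 'species-unique' barcodes change as a database fills up.

    Hypothetical assumption: each species carries one barcode drawn uniformly
    at random from n_barcodes possibilities, so some species share a barcode.
    """
    rng = random.Random(seed)
    barcode_of = [rng.randrange(n_barcodes) for _ in range(total_species)]
    true_counts = Counter(barcode_of)  # sharing across ALL species

    # Species enter the database in a random order; each fraction is a prefix.
    order = list(range(total_species))
    rng.shuffle(order)

    results = []
    for f in fractions:
        sampled = order[:int(f * total_species)]
        db_counts = Counter(barcode_of[s] for s in sampled)
        # Unique within the database as it currently stands.
        unique = [s for s in sampled if db_counts[barcode_of[s]] == 1]
        # 'False' unique: looks unique now, but an unsampled species shares it.
        false_unique = [s for s in unique if true_counts[barcode_of[s]] > 1]
        results.append((f,
                        len(unique) / len(sampled),
                        len(false_unique) / len(sampled)))
    return results


if __name__ == "__main__":
    for frac, p_unique, p_false in simulate():
        print(f"coverage={frac:.2f}  unique={p_unique:.3f}  false_unique={p_false:.3f}")
```

In this sketch the apparent proportion of species-unique barcodes falls as coverage rises, and the 'false' unique proportion reaches zero only when every species has been sampled, which mirrors the qualitative trend in the abstract under these simplified assumptions.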


  • computational biology and bioinformatics, plant science