Open Access
Genome Research, volume 27, issue 5, pages 787-792

Hybrid assembly of the large and highly repetitive genome of Aegilops tauschii, a progenitor of bread wheat, with the MaSuRCA mega-reads algorithm

Zimin Aleksey V. 1
Puiu Daniela 1
Luo Ming 2
Zhu Tingting 2
Koren Sergey 3
Marçais Guillaume 4
Yorke James A. 5
Dvorak Jan 2
Salzberg Steven L. 6
1 Center for Computational Biology, McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins School of Medicine, Baltimore, Maryland 21205, USA.
2 Department of Plant Sciences, University of California, Davis, California, 95616, USA.
3 National Human Genome Research Institute, National Institutes of Health, Bethesda, Maryland, 20892, USA.
4 Institute for Physical Sciences and Technology, University of Maryland, College Park, Maryland 20742, USA.
5 Institute for Physical Sciences and Technology, University of Maryland, College Park, Maryland 20742, USA.
6 Departments of Biomedical Engineering, Computer Science, and Biostatistics, Johns Hopkins University, Baltimore, Maryland 21218, USA.
Publication type: Journal Article
Publication date: 2017-01-27
Journal: Genome Research
Quartile SCImago: Q1
Quartile WOS: Q1
Impact factor: 7
ISSN: 1088-9051, 1549-5469
Genetics
Genetics (clinical)
Abstract
Long sequencing reads generated by single-molecule sequencing technology offer the possibility of dramatically improving the contiguity of genome assemblies. The biggest challenge today is that long reads have relatively high error rates, currently around 15%. The high error rates make it difficult to use this data alone, particularly with highly repetitive plant genomes. Errors in the raw data can lead to insertion or deletion errors (indels) in the consensus genome sequence, which in turn create significant problems for downstream analysis; for example, a single indel may shift the reading frame and incorrectly truncate a protein sequence. Here, we describe an algorithm that solves the high error rate problem by combining long, high-error reads with shorter but much more accurate Illumina sequencing reads, whose error rates average <1%. Our hybrid assembly algorithm combines these two types of reads to construct mega-reads, which are both long and accurate, and then assembles the mega-reads using the CABOG assembler, which was designed for long reads. We apply this technique to a large data set of Illumina and PacBio sequences from the species Aegilops tauschii, a large and extremely repetitive plant genome that has resisted previous attempts at assembly. We show that the resulting assembled contigs are far larger than in any previous assembly, with an N50 contig size of 486,807 nucleotides. We compare the contigs to independently produced optical maps to evaluate their large-scale accuracy, and to a set of high-quality bacterial artificial chromosome (BAC)-based assemblies to evaluate base-level accuracy.
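
The abstract reports contiguity as an N50 contig size but does not define the statistic. A minimal sketch follows, assuming the common definition: the length L such that contigs of length L or longer contain at least half of all assembled bases. The function name `n50` and the sample lengths are illustrative and do not come from the paper.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths.

    Assumed definition: the largest length L such that contigs of
    length >= L together cover at least half of the total bases.
    """
    if not lengths:
        raise ValueError("need at least one contig length")
    total = sum(lengths)
    running = 0
    # Walk from the longest contig down, accumulating covered bases.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Illustrative lengths only (not data from the paper): total is 32,
# and the two longest contigs (8 + 8 = 16) already cover half of it.
example = [2, 2, 2, 3, 3, 4, 8, 8]
print(n50(example))
```

Some reports normalize by an estimated genome size instead of the total assembly length (often called NG50); this sketch uses the assembly total.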

Citations by journals

  • G3: Genes, Genomes, Genetics: 11 publications, 3.29%
  • Scientific Reports: 10 publications, 2.99%
  • Frontiers in Genetics: 9 publications, 2.69%
  • GigaScience: 9 publications, 2.69%
  • Genome Biology and Evolution: 7 publications, 2.1%
  • Frontiers in Plant Science: 6 publications, 1.8%
  • BMC Genomics: 6 publications, 1.8%
  • Microbiology Resource Announcements: 5 publications, 1.5%
  • Briefings in Bioinformatics: 5 publications, 1.5%
  • Scientific Data: 4 publications, 1.2%
  • iScience: 4 publications, 1.2%
  • Molecular Ecology Resources: 4 publications, 1.2%
  • Bioinformatics: 4 publications, 1.2%
  • BMC Biology: 3 publications, 0.9%
  • Nature Communications: 3 publications, 0.9%
  • Nature: 3 publications, 0.9%
  • PLoS Computational Biology: 3 publications, 0.9%
  • Genomics: 3 publications, 0.9%
  • Current Biology: 3 publications, 0.9%
  • DNA Research: 3 publications, 0.9%
  • mSystems: 3 publications, 0.9%
  • Genome Research: 3 publications, 0.9%
  • Compendium of Plant Genomes: 3 publications, 0.9%
  • F1000Research: 2 publications, 0.6%
  • Journal of Bioinformatics and Computational Biology: 2 publications, 0.6%
  • Animals: 2 publications, 0.6%
  • International Journal of Molecular Sciences: 2 publications, 0.6%
  • Cells: 2 publications, 0.6%
  • Theoretical And Applied Genetics: 2 publications, 0.6%

Citations by publishers

  • Springer Nature: 57 publications, 17.07%
  • Oxford University Press: 34 publications, 10.18%
  • Elsevier: 28 publications, 8.38%
  • Frontiers Media S.A.: 20 publications, 5.99%
  • Multidisciplinary Digital Publishing Institute (MDPI): 16 publications, 4.79%
  • Wiley: 15 publications, 4.49%
  • American Society for Microbiology: 13 publications, 3.89%
  • Genetics Society of America: 13 publications, 3.89%
  • Public Library of Science (PLoS): 6 publications, 1.8%
  • Cold Spring Harbor Laboratory: 3 publications, 0.9%
  • F1000 Research: 2 publications, 0.6%
  • Microbiology Society: 2 publications, 0.6%
  • World Scientific: 2 publications, 0.6%
  • Pleiades Publishing: 2 publications, 0.6%
  • American Association for the Advancement of Science (AAAS): 2 publications, 0.6%
  • Proceedings of the National Academy of Sciences (PNAS): 2 publications, 0.6%
  • eLife Sciences Publications: 2 publications, 0.6%
  • Rockefeller University Press: 2 publications, 0.6%
  • University of Chicago Press: 1 publication, 0.3%
  • Mary Ann Liebert: 1 publication, 0.3%
  • The Royal Society: 1 publication, 0.3%
  • Korean Society of Food Science and Technology: 1 publication, 0.3%
  • KeAi Communications Co.: 1 publication, 0.3%
  • American Society of Plant Biologists: 1 publication, 0.3%
  • Peer Community In: 1 publication, 0.3%
  • Annual Reviews: 1 publication, 0.3%
  • Spandidos Publications: 1 publication, 0.3%
  • American Phytopathological Society: 1 publication, 0.3%
  • Taylor & Francis: 1 publication, 0.3%
  • Publications without a DOI are not taken into account.
  • Statistics are recalculated only for publications connected to researchers, organizations, and labs registered on the platform.
  • Statistics are recalculated weekly.
Cite this
GOST
Zimin A. V. et al. Hybrid assembly of the large and highly repetitive genome of Aegilops tauschii, a progenitor of bread wheat, with the MaSuRCA mega-reads algorithm // Genome Research. 2017. Vol. 27. No. 5. pp. 787-792.
GOST, all authors
Zimin A. V., Puiu D., Luo M., Zhu T., Koren S., Marçais G., Yorke J. A., Dvorak J., Salzberg S. L. Hybrid assembly of the large and highly repetitive genome of Aegilops tauschii, a progenitor of bread wheat, with the MaSuRCA mega-reads algorithm // Genome Research. 2017. Vol. 27. No. 5. pp. 787-792.
RIS
TY  - JOUR
DO  - 10.1101/gr.213405.116
UR  - https://doi.org/10.1101/gr.213405.116
TI  - Hybrid assembly of the large and highly repetitive genome of Aegilops tauschii, a progenitor of bread wheat, with the MaSuRCA mega-reads algorithm
T2  - Genome Research
AU  - Zimin, Aleksey V.
AU  - Puiu, Daniela
AU  - Luo, Ming
AU  - Zhu, Tingting
AU  - Koren, Sergey
AU  - Marçais, Guillaume
AU  - Yorke, James A.
AU  - Dvorak, Jan
AU  - Salzberg, Steven L.
PY  - 2017
DA  - 2017/01/27
PB  - Cold Spring Harbor Laboratory
SP  - 787
EP  - 792
IS  - 5
VL  - 27
PMID - 28130360
SN  - 1088-9051
SN  - 1549-5469
ER  - 
BibTeX
@article{2017_Zimin,
author = {Aleksey V. Zimin and Daniela Puiu and Ming Luo and Tingting Zhu and Sergey Koren and Guillaume Marçais and James A. Yorke and Jan Dvorak and Steven L. Salzberg},
title = {Hybrid assembly of the large and highly repetitive genome of Aegilops tauschii, a progenitor of bread wheat, with the MaSuRCA mega-reads algorithm},
journal = {Genome Research},
year = {2017},
volume = {27},
publisher = {Cold Spring Harbor Laboratory},
month = {jan},
url = {https://doi.org/10.1101/gr.213405.116},
number = {5},
pages = {787--792},
doi = {10.1101/gr.213405.116}
}
MLA
Zimin, Aleksey V., et al. “Hybrid assembly of the large and highly repetitive genome of Aegilops tauschii, a progenitor of bread wheat, with the MaSuRCA mega-reads algorithm.” Genome Research, vol. 27, no. 5, Jan. 2017, pp. 787-792. https://doi.org/10.1101/gr.213405.116.