Assembling a Colossus

The loblolly pine genome is big. Bloated with retrotransposons and other repetitive sequences, it is seven times larger than the human genome and easily big enough to overwhelm standard genome assembly methods.

This forced the loblolly pine genome sequencing team, led by David Neale at the University of California, Davis, to look for ways to reduce the enormous complexity of their task.

The draft genome sequence, described in the latest issue of GENETICS and the journal Genome Biology, was pieced together from over 16 billion sequence reads. Spanning around 23 billion base pairs, it only just beats out the Norway spruce as the largest genome ever sequenced, but it is substantially more complete. For example, the N50 scaffold size of the current loblolly assembly is 66.9 Kbp, compared to 0.72 Kbp in the Norway spruce.
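For readers unfamiliar with the statistic, N50 is the scaffold length at which scaffolds of that size or longer account for at least half of the total assembled sequence, so a larger N50 means a less fragmented assembly. The short Python sketch below shows one common way to compute it; the function name `n50` and the toy lengths are purely illustrative and are not taken from the loblolly pine papers.

```python
def n50(lengths):
    """Return the N50 of a list of scaffold lengths (illustrative helper)."""
    if not lengths:
        raise ValueError("need at least one scaffold length")
    ordered = sorted(lengths, reverse=True)   # longest scaffolds first
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first scaffold that pushes the running sum to half the
        # assembly size defines the N50.
        if running >= half_total:
            return length


# Toy assembly, not real data: nine scaffolds totalling 54 bases.
toy_scaffolds = [2, 3, 4, 5, 6, 7, 8, 9, 10]
toy_n50 = n50(toy_scaffolds)
```

In the toy example the three longest scaffolds (10, 9 and 8) already cover half of the 54 bases, so the N50 is 8.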

So how did they do it?

One strategy was to generate most of the sequence from part of a single pine nut. This tiny source material was the megagametophyte, which is the haploid tissue that provides nutrients to the developing diploid embryo. Despite the limited amount of DNA that can be extracted from this source, the reduced complexity of a haploid genome makes it easier to assemble. To link up all the sequence fragments from the haploid genome, the team also created DNA libraries from diploid needles of the parent genotype.

But this still left the assembly team, led by Steven Salzberg at Johns Hopkins University and James Yorke at the University of Maryland, with more data than their computational methods could handle.

The solution was a method of pre-processing the data into “super reads”, or larger chunks of contiguous haploid sequence that condensed many individual reads. In essence, they were dealing with the unambiguous parts of the problem first, and getting rid of a huge amount of overlapping and redundant data in the process.
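To give a feel for the idea, here is a deliberately simplified Python sketch. It assumes a k-mer based approach: each read is extended base by base for as long as exactly one continuation exists among the k-mers seen in the data, and reads that grow into the same sequence collapse into a single super read. This is a toy under stated assumptions, not the assembly team's software; the names `kmer_set`, `extend` and `super_reads` are hypothetical, and the sketch ignores reverse complements, sequencing errors and paired-end information.

```python
def kmer_set(reads, k):
    """Collect every k-mer that occurs in any read."""
    return {read[i:i + k] for read in reads for i in range(len(read) - k + 1)}


def extend(read, kmers, k):
    """Grow a read in both directions while the next base is unambiguous."""
    seen = {read[i:i + k] for i in range(len(read) - k + 1)}  # guards against loops
    seq = read
    # Extend to the right.
    while True:
        tail = seq[-(k - 1):]
        nxt = [tail + b for b in "ACGT" if tail + b in kmers]
        if len(nxt) != 1 or nxt[0] in seen:
            break  # dead end, fork, or a k-mer we already walked through
        seen.add(nxt[0])
        seq += nxt[0][-1]
    # Extend to the left.
    while True:
        head = seq[:k - 1]
        prev = [b + head for b in "ACGT" if b + head in kmers]
        if len(prev) != 1 or prev[0] in seen:
            break
        seen.add(prev[0])
        seq = prev[0][0] + seq
    return seq


def super_reads(reads, k):
    """Collapse reads into the set of distinct extended sequences (k >= 2)."""
    kmers = kmer_set(reads, k)
    return {extend(read, kmers, k) for read in reads if len(read) >= k}


# Toy example: three overlapping reads drawn from one made-up 13-base sequence.
toy_genome = "ATGGCGTACCTGA"
toy_reads = [toy_genome[0:7], toy_genome[3:10], toy_genome[6:13]]
collapsed = super_reads(toy_reads, k=4)
```

Because the toy sequence contains no repeats at this k, all three reads extend to the same 13-base sequence and `collapsed` holds a single super read. In a real genome, extension stops wherever a repeat creates a fork, which is why the approach handles the unambiguous regions first.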

The result was a 100-fold reduction in the amount of megagametophyte sequence that needed to be held in the memory of the assembly computer. That kind of reduction is not just handy for giant genomes; Salzberg says it also speeds up projects of more modest scale.

Luckily, says Salzberg, the loblolly genome project wasn’t held back by the masses of repeats that are typical of conifers. Even though around 82% of the loblolly pine genome is repetitive, it turns out that most of the repeats are evolutionarily ancient. That means they have diverged enough to no longer be a big stumbling block for assembly.

All this is good news for sequencing other conifer species, especially since the team is already tackling an even larger behemoth: the 35-gigabase genome of the sugar pine.

Check out the loblolly genome articles and other highlights of this month’s GENETICS.

Zimin A., Stevens K.A., Crepeau M.W., Holtz-Morris A., Koriabine M., Marcais G., Puiu D., Roberts M., Wegrzyn J.L., de Jong P.J., et al. (2014). Sequencing and Assembly of the 22-Gb Loblolly Pine Genome, Genetics, 196 (3) 875-890. DOI: 10.1534/genetics.113.159715

Wegrzyn J.L., Liechty J.D., Stevens K.A., Wu L.S., Loopstra C.A., Vasquez-Gross H.A., Dougherty W.M., Lin B.Y., Zieve J.J., Martinez-Garcia P.J., et al. (2014). Unique Features of the Loblolly Pine (Pinus taeda L.) Megagenome Revealed Through Sequence Annotation, Genetics, 196 (3) 891-909. DOI: 10.1534/genetics.113.159996

Neale D.B., Wegrzyn J.L., Stevens K.A., Zimin A.V., Puiu D., Crepeau M.W., Cardeno C., Koriabine M., Holtz-Morris A.E., Liechty J.D., et al. (2014). Decoding the massive genome of loblolly pine using haploid DNA and novel assembly strategies, Genome Biology, 15 (3) R59. DOI: 10.1186/gb-2014-15-3-r59

Cristy Gelling is a science writer, lapsed yeast geneticist, and former Communications Director at the GSA.
