Fecal alchemy: Turning poop into genomics gold

When it comes to genotyping technology, poop genetics is stuck in the 1990s. While most geneticists are now awash in genome-scale data from thousands of individuals, those who depend on fecal and other noninvasively collected samples still rely on old-school, boutique panels of a dozen or so genetic markers.

But feces — along with fur, feathers, and urine — is critically important stuff for understanding the population genetics, ecology, evolution, behavior, and conservation of wild animals. Many are too elusive or endangered to allow collection of blood samples, and even for common species it is a logistical nightmare to immobilize and draw blood from large numbers of animals in the field. In the latest issue of GENETICS, Snyder-Mackler et al. describe tools that promise to advance studies of such samples into the genomic era.

Patrick Chiyo collecting noninvasive samples from elephants in Amboseli National Park. Photo courtesy Jenny Tung.

Noninvasively collected samples have the obvious advantage of easy access. “We have freezers and freezers full of baboon poop,” says study co-leader Jenny Tung (Duke University). Tung’s group works on behavior and genetics in a wild baboon population in Kenya. But though abundant, poop also presents serious challenges for standard genetic analysis. The DNA present in noninvasive samples is typically a fragmented mixture of host and contaminant sequence. For example, only around 1% of the DNA in a fecal sample comes from the animal that produced the poop. Most of the rest is microbial.

These limitations were first overcome in the 1980s and 1990s, and the ability to analyze DNA from noninvasive samples revolutionized the field. Using such samples not only allowed geneticists to understand the genetic diversity and viability of endangered animals but also allowed them to empirically test important theories about animal behavior and evolution.

“There are many examples. Noninvasive sampling of chimps, baboons, rhesus macaques and other primates revealed that animals really do bias their behavior towards relatives, even paternal relatives that are likely more difficult for an individual to identify as kin,” says Tung. “And in baboons, it also showed that males provide some paternal care to their offspring, which wasn’t expected for a polygamous primate.”

But the genotyping methods used in such studies have changed surprisingly little over the last twenty years. For the most part, researchers still use small groups of carefully validated markers, usually based on stretches of short tandem repeat sequences (microsatellites). This means the field has mostly missed out on the benefits of genomics that have become routine for medical researchers and those who work with laboratory organisms.

“Microsatellite approaches still work. But over the last 5 or 10 years it has become impossible to ignore the way genome-scale datasets allow you to answer entirely different questions,” says Tung.

For example, data on how a genome varies across a population can provide crucial evidence of the evolutionary and demographic forces that have shaped it. Genomic data can also trace in detail when populations split apart and when they later mixed again.

Vet, a female yellow baboon, and her children in Amboseli National Park. Photo courtesy of Susan Alberts.

The good news for poop genomics is that short-read next-generation sequencing methods are well suited to the fragmented DNA found in noninvasive samples. These methods have been famously adapted for analyzing a sample type that also suffers from vanishingly small amounts of target sequence: ancient DNA. The bad news is that the expensive, intensive approaches that work well for a precious sample of Neanderthal bone are not practical for a geneticist facing a freezer full of poop.

About six years ago, Tung’s friend and colleague George (PJ) Perry published a major advance that allowed large-scale resequencing from noninvasive samples. It was based on a method known as sequence capture, which enriches for host sequence using synthetic RNA “baits” to capture the target DNA. Tung was excited by the method’s possibilities but realized it was still too expensive for most applications. This was partly because the baits had to be custom-designed and synthesized for the species of interest. The method also had the drawback of capturing only a tiny fraction of the genome while consuming large amounts of sample.

“Even fecal samples are exhaustible,” says Tung. “We have a lot of irreplaceable samples from dead animals, for instance. If we’re going to use them up, we want to cover all our bases and gather data on a truly genome-wide scale.”

So Tung’s group and their collaborators worked to modify and scale up Perry’s protocol. They also constructed the baits in a considerably cheaper way, using in vitro transcription of RNA from baboon DNA templates, sidestepping the need for custom synthesis. The new protocol had more modest input DNA requirements and could enrich the target DNA by 40-fold.
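
To put the roughly 1% host fraction and the 40-fold enrichment in perspective, here is a back-of-envelope sketch in Python of how many reads it would take to reach about 1x coverage of the baboon genome with and without capture. The genome size (~2.9 Gb), the 100-bp read length, and the simplification that enrichment just multiplies the host fraction are assumptions for illustration; the calculation also ignores PCR duplicates, mapping losses, and uneven capture, all of which push real requirements higher.

```python
# Rough read budget for ~1x host coverage, before and after capture.
# Assumptions (not from the study): ~2.9 Gb genome, 100-bp reads, no duplicates
# or mapping losses, and enrichment acting multiplicatively on the host fraction.

GENOME_SIZE_BP = 2.9e9  # assumed approximate baboon genome size
READ_LENGTH_BP = 100    # assumed read length
HOST_FRACTION = 0.01    # ~1% host DNA in feces, as described in the article
ENRICHMENT = 40         # ~40-fold enrichment, as described in the article


def reads_needed(target_depth: float, host_fraction: float) -> float:
    """Total reads to sequence so that host-derived reads reach target_depth."""
    host_reads = target_depth * GENOME_SIZE_BP / READ_LENGTH_BP
    return host_reads / host_fraction


before = reads_needed(1.0, HOST_FRACTION)
# Cap at 100% host DNA, since enrichment cannot exceed a pure host library.
after = reads_needed(1.0, min(1.0, HOST_FRACTION * ENRICHMENT))

print(f"without capture: {before / 1e6:,.0f} million reads")
print(f"with capture:    {after / 1e6:,.1f} million reads")
print(f"reduction:       {before / after:.0f}x")
```

Under these assumptions the sequencing effort falls by the same factor as the enrichment, from billions of reads to tens of millions.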

But getting enough sequence per sample was just the beginning. Xiang Zhou (University of Michigan) led the group’s efforts to develop tools to analyze data from the new method. Zhou says one of the reasons microsatellites became so popular was the availability of standard and easy-to-use software for assigning paternity from the data. “If people are going to transition to a new method, we thought it would be incredibly important that we package our models into software that will make it as easy as possible,” says Zhou.

But to develop something comparable for low-coverage sequence, the team faced two major challenges: the data is simultaneously much richer (more sequence) and much lower quality (more uncertainty). To deal with the large quantity of data they needed much more computationally efficient algorithms. They also had to factor in the lower data quality, which makes it impossible to use the simpler approaches that work when the genotype at each site is known with certainty. Instead, they incorporated the error rate across all the sites in the genome, generating a sophisticated statistical model.

One of several freezers in the Tung lab containing boxes of fecal samples. Photo courtesy Jenny Tung.

Using the new capture method and the paternity assignment software (called WHODAD), the team was able to construct pedigrees from baboon fecal samples that almost perfectly matched those created using traditional analysis of high-quality DNA from blood. In short, despite the low coverage of the genome (typically less than 1x) and the resulting high uncertainty about the genotype at any one site, the signal accumulated across many sites was more than enough to reconstruct family relationships.
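
Because a single read says almost nothing about a genotype, the general trick is to keep every genotype possibility in play and let weak evidence add up across sites. The Python sketch below illustrates that idea with a simple genotype-likelihood test; it is not WHODAD’s actual model. It assumes independent biallelic sites with known population allele frequencies, Hardy-Weinberg proportions, and a known per-read error rate (EPS), and it compares only “parent-offspring” against “unrelated” for one candidate using simulated data. All function names and parameter values are hypothetical.

```python
import math
import random

EPS = 0.01   # assumed per-read sequencing error rate
DEPTH = 0.5  # assumed mean read depth per individual (sub-1x)


def genotype_likelihoods(alt_reads: int, ref_reads: int) -> list[float]:
    """P(reads | genotype) for genotypes carrying 0, 1 or 2 alt alleles."""
    likelihoods = []
    for g in range(3):
        q = (g / 2) * (1 - EPS) + (1 - g / 2) * EPS  # chance a read shows alt
        likelihoods.append(q ** alt_reads * (1 - q) ** ref_reads)
    return likelihoods


def hwe(p: float) -> list[float]:
    """Hardy-Weinberg genotype frequencies at alt-allele frequency p."""
    return [(1 - p) ** 2, 2 * p * (1 - p), p ** 2]


def transmit(g_parent: int, g_child: int, p: float) -> float:
    """P(child genotype | parent genotype): one allele from the parent, one from the population."""
    prob = 0.0
    for a, p_a in ((0, 1 - g_parent / 2), (1, g_parent / 2)):
        b = g_child - a  # allele the child must get from the other parent
        if b in (0, 1):
            prob += p_a * (p if b == 1 else 1 - p)
    return prob


def parent_offspring_llr(child, candidate, freqs) -> float:
    """Sum over sites of log P(reads | parent-offspring) - log P(reads | unrelated).

    child and candidate are per-site lists of (alt_reads, ref_reads).
    """
    total = 0.0
    for (c_alt, c_ref), (f_alt, f_ref), p in zip(child, candidate, freqs):
        lc = genotype_likelihoods(c_alt, c_ref)
        lf = genotype_likelihoods(f_alt, f_ref)
        prior = hwe(p)
        related = sum(prior[gf] * lf[gf] * transmit(gf, gc, p) * lc[gc]
                      for gf in range(3) for gc in range(3))
        unrelated = (sum(prior[g] * lc[g] for g in range(3))
                     * sum(prior[g] * lf[g] for g in range(3)))
        total += math.log(related) - math.log(unrelated)
    return total


def poisson(lam: float, rng: random.Random) -> int:
    """Knuth's method; adequate for the small means used here."""
    limit, k, prod = math.exp(-lam), 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def draw_alleles(p: float, rng: random.Random) -> list[int]:
    """Two alleles (1 = alt) drawn independently at frequency p."""
    return [int(rng.random() < p) for _ in range(2)]


def sequence(alleles: list[int], rng: random.Random) -> tuple[int, int]:
    """Simulate low-coverage reads at one site; returns (alt_reads, ref_reads)."""
    alt = ref = 0
    for _ in range(poisson(DEPTH, rng)):
        allele = rng.choice(alleles)
        if rng.random() < EPS:
            allele = 1 - allele  # sequencing error flips the observed base
        if allele:
            alt += 1
        else:
            ref += 1
    return alt, ref


# Simulate a father, mother, child and an unrelated male at many independent sites.
rng = random.Random(1)
freqs = [rng.uniform(0.05, 0.95) for _ in range(50_000)]
child_reads, father_reads, stranger_reads = [], [], []
for p in freqs:
    father, mother, stranger = (draw_alleles(p, rng) for _ in range(3))
    child = [rng.choice(father), rng.choice(mother)]
    child_reads.append(sequence(child, rng))
    father_reads.append(sequence(father, rng))
    stranger_reads.append(sequence(stranger, rng))

llr_father = parent_offspring_llr(child_reads, father_reads, freqs)
llr_stranger = parent_offspring_llr(child_reads, stranger_reads, freqs)
print(f"true father: {llr_father:.1f}   unrelated male: {llr_stranger:.1f}")
```

Sites where either individual has no reads contribute nothing, and a site with one read apiece nudges the score only slightly, yet summed over tens of thousands of sites the true father should score well above zero and the unrelated male well below it. A real tool would also have to deal with linked sites, estimated allele frequencies, and competing relationships such as full siblings.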

But what about cost? Lead author Noah Snyder-Mackler gave the project the pet name “fecal alchemy” because it aims to transform poop into a data goldmine. But not every researcher can afford gold — most labs must use the cheapest tool that will get the job done. Tung says they included a cost analysis in the paper because they are regularly asked about the price of making the switch.

“Right now it costs about twice as much to produce 1x coverage of the entire baboon genome as it does to type 14 microsatellites. But the amount of information you get is much greater! So if you’re thinking in terms of cost per genotype, our method is way more cost effective. But in terms of absolute amounts it’s more expensive. In the end the cost-benefit decision depends on what questions you’re trying to answer,” says Tung. “Of course we’d like to get it even cheaper and more efficient and more robust. We’re working on it!”
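
Tung’s cost-per-genotype point can be made concrete with a rough calculation. The Python sketch below uses only the 2:1 cost ratio from the quote, in arbitrary units, plus two assumptions: a ~2.9 Gb genome, and the standard Poisson (Lander-Waterman) approximation that a fraction 1 - exp(-c) of bases receives at least one read at mean depth c. It counts sites with any read rather than confidently called genotypes, and most of those sites are not polymorphic, so it overstates the per-genotype advantage; it is meant only to show the scale of the difference.

```python
import math

# Relative costs from the quote: ~1x genome costs about twice a 14-locus panel.
PANEL_COST = 1.0                # arbitrary cost unit (assumption)
GENOME_1X_COST = 2.0 * PANEL_COST
PANEL_LOCI = 14
GENOME_SIZE_BP = 2.9e9          # assumed approximate baboon genome size
MEAN_DEPTH = 1.0

# Poisson approximation: fraction of bases covered by at least one read.
covered_fraction = 1 - math.exp(-MEAN_DEPTH)
covered_sites = covered_fraction * GENOME_SIZE_BP

cost_per_locus_panel = PANEL_COST / PANEL_LOCI
cost_per_site_genome = GENOME_1X_COST / covered_sites

print(f"sites with >= 1 read: {covered_sites:.2e}")
print(f"panel cost per locus is {cost_per_locus_panel / cost_per_site_genome:.1e} "
      "times the genome cost per covered site")
```

The absolute bill doubles, but each data point becomes many orders of magnitude cheaper, which is exactly the trade-off Tung describes.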

FUNDING

This work was partly funded by the National Science Foundation DEB through an EAGER grant, with co-funding from NSF Biological Anthropology.

CITATION

Noah Snyder-Mackler, William H. Majoros, Michael L. Yuan, Amanda O. Shaver, Jacob B. Gordon, Gisela H. Kopp, Stephen A. Schlebusch, Jeffrey D. Wall, Susan C. Alberts, Sayan Mukherjee, Xiang Zhou, Jenny Tung (2016). Efficient Genome-Wide Sequencing and Low-Coverage Pedigree Analysis from Noninvasively Collected Samples. Genetics, 203(2), 699-714.

http://www.genetics.org/content/203/2/699

DOI: 10.1534/genetics.116.187492

Cristy Gelling is a science writer, lapsed yeast geneticist, and former Communications Director at the GSA.
