Improved duplex sequencing identifies spontaneous mutations in bacteria without long-term culturing.

Spontaneous mutations are the driving force of evolution, yet our ability to detect and study them is often limited to mutations that accumulate clonally. Sequencing technology frequently cannot identify very rare variants or discriminate between bona fide mutations and errors introduced during sample preparation. In GENETICS, Zhang et al. created an improved sequencing method to study low-abundance spontaneous mutations in the bacterium Escherichia coli.

To develop their method, the authors began with duplex sequencing, in which fragmented DNA molecules are tagged with an adaptor sequence before sequencing. This method is powerful, but at high read depths, it can mistake reads from different DNA molecules for PCR duplicates of a single molecule. True mutations can be lost as a result, making the method ill-suited for finding rare mutations.

The authors first determined the error rate of the PCR step of duplex sequencing, where most experimental artifacts would be expected to occur. Because duplex sequencing can identify reads that came from the same parental DNA molecules (based on the adaptor sequences), the authors assumed that any such reads that had mismatches must have come from base changes during the PCR. By identifying these discrepancies, they determined the rates of different kinds of errors in the sequencing process.
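The logic of this step can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions, not the authors' pipeline: reads are assumed to arrive as hypothetical (tag, sequence) pairs that are already aligned and of equal length within a family, the majority base at each position is taken as the original, and the function name and the minimum family size of three are choices made for this example.

```python
from collections import Counter, defaultdict


def estimate_pcr_error_rates(reads, min_family_size=3):
    """Estimate per-substitution error rates from tagged read families.

    reads: iterable of (tag, sequence) pairs (hypothetical input format).
    Reads that share a tag are assumed to descend from one parental molecule,
    so any base that disagrees with the family majority is counted as an error.
    """
    families = defaultdict(list)
    for tag, seq in reads:
        families[tag].append(seq)

    errors = Counter()            # (original_base, observed_base) -> count
    bases_by_original = Counter() # original_base -> number of read bases checked

    for seqs in families.values():
        if len(seqs) < min_family_size:
            continue  # too few copies to call a majority base
        for column in zip(*seqs):
            original, _ = Counter(column).most_common(1)[0]
            for base in column:
                bases_by_original[original] += 1
                if base != original:
                    errors[(original, base)] += 1

    # Rate of each change = errors of that type / read bases with that original base.
    return {
        change: count / bases_by_original[change[0]]
        for change, count in errors.items()
    }


example_reads = [
    ("tag1", "ACGT"), ("tag1", "ACGT"), ("tag1", "ACGA"),
    ("tag2", "GGCC"), ("tag2", "GGCC"), ("tag2", "GGCC"),
    ("tag3", "AAAA"),  # single read, skipped
]
rates = estimate_pcr_error_rates(example_reads)
```

In the toy data, one of three reads in the first family shows an A where the other two show a T, so the only nonzero entry is a T-to-A rate. Real data would also need to handle alignment, base quality, and ties, which this sketch ignores.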

The authors then sequenced E. coli genomes using a new method, which they termed improved duplex sequencing (IDS). IDS is similar to duplex sequencing, but it uses adaptor sequences of multiple different lengths. The use of more and different adaptor sequences minimizes the chance that two different DNA molecules that happen to break at the same place will be erroneously called as PCR duplicates. By employing this method and accounting for the PCR error rates they had already determined, the authors were able to confidently identify rare, random mutations in E. coli.
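Why more adaptor variants help can be illustrated with a birthday-problem calculation. The sketch below assumes, purely for illustration, that each molecule receives one of a fixed number of distinguishable adaptors uniformly at random; the function name and the numbers used are hypothetical and are not taken from the paper.

```python
def collision_probability(n_molecules, n_adaptors):
    """Chance that at least two of n_molecules sharing the same break points
    also receive the same adaptor, given n_adaptors equally likely variants."""
    if n_molecules > n_adaptors:
        return 1.0  # more molecules than variants: a shared adaptor is certain
    p_all_distinct = 1.0
    for i in range(n_molecules):
        p_all_distinct *= (n_adaptors - i) / n_adaptors
    return 1.0 - p_all_distinct


# Three molecules with identical break points, few versus many adaptor variants.
few = collision_probability(3, 4)
many = collision_probability(3, 64)
```

Under this simplified model, increasing the number of distinguishable adaptors lowers the chance that independent molecules are merged into one family, which is the effect the paragraph above describes.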

Having identified such mutations, the authors looked for patterns. They found that clusters of mutations occurred in regions of the genome that are known to be replication fork stopping regions. This suggests that the mutations arise from replication errors, as would be expected for spontaneous mutations. Interestingly, mutations in these hotspots were almost entirely in relatively unimportant regions of the genome, for instance, in the non-functional parts of tRNA genes. These vulnerable areas of the genome hint at mechanisms in E. coli that may protect more critical regions from damage.


Spatial Vulnerabilities of the Escherichia coli Genome to Spontaneous Mutations Revealed with Improved Duplex Sequencing

Xiaolong Zhang, Xuehong Zhang, Xia Zhang, Yuwei Liao, Luyao Song, Qingzheng Zhang, Peiying Li, Jichao Tian, Yanyan Shao, Aisha Mohammed Al-Dherasi, Yulong Li, Ruimei Liu, Tao Chen, Xiaodi Deng, Yu Zhang, Dekang Lv, Jie Zhao, Jun Chen, Zhiguang Li

Genetics October 2018 210: 547-558

Marisa Wexler, Science Writing and Communications Intern, Genetics Society of America.