How bioinformatics can help fill the therapeutic drug pipeline

Written by members of the GSA Early Career Scientist Communication and Outreach Subcommittee: Angel F. Cisneros Caballero, Université Laval; Adelita Mendoza, PhD, Washington University; Narjes Alfuraiji, University of Manchester; Anna Bajur, Max Planck Institute of Molecular Cell Biology and Genetics

During the current global pandemic, public attention is increasingly falling on the process of drug discovery and development. How exactly do we find new treatments? And what does it take to bring them to the clinic? One powerful tool in this process that often escapes notice is bioinformatics—the use of computational resources to answer biological questions.

Exponential increases in computational power have revolutionized the way we do science. Over time, this has created entirely new fields of research, since we can now analyze more data efficiently and explore more complex algorithms and models¹. Bioinformatics is one of the fields made possible by this technological achievement, and it has been critical for many recent scientific advances².

Bioinformatics comprises two interdisciplinary sub-fields that interface with computer science, mathematics, and biology. One is the research and development that scientists need to build the tools and models modern biology requires. The other is computational biology, which is dedicated to answering basic biological questions.

Bioinformatics is not just an academic field; it has many clinical applications. For example, we now have the technology to sequence genomes and identify genes involved in diseases, such as cancers. However, we can only read DNA accurately in short segments. Sequencing an organism’s genome therefore becomes like a giant puzzle with thousands of pieces, and only bioinformatic methods allow us to assemble the pieces.
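The puzzle analogy can be made concrete with a toy example. The sketch below is a deliberately simplified greedy assembler, not the algorithm used by any real sequencing pipeline: it assumes error-free reads from a single strand, and the function names and the `min_len` parameter are our own illustrative choices.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Return the length of the longest suffix of `a` that is also a prefix of `b`."""
    max_len = min(len(a), len(b))
    for k in range(max_len, min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> str:
    """Repeatedly merge the two reads that share the largest overlap."""
    reads = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while len(reads) > 1:
        best_k, best_i, best_j = 0, None, None
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            break  # no remaining overlaps: the pieces cannot be joined
        merged = reads[best_i] + reads[best_j][best_k:]
        reads = [r for idx, r in enumerate(reads) if idx not in (best_i, best_j)]
        reads.append(merged)
    return max(reads, key=len)  # longest assembled piece


# Three made-up overlapping reads taken from one short sequence.
toy_reads = ["ACGTTAGC", "ATGGCGTA", "GCGTACGT"]
print(greedy_assemble(toy_reads))  # ATGGCGTACGTTAGC
```

Real genomes contain repeats and sequencing errors, which is why production assemblers rely on more sophisticated graph-based methods than this greedy merge.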

Bioinformatics can also be used to guide drug design experiments and maximize the chances of finding active molecules. This new knowledge can eventually be used to develop therapies and vaccines to save human lives. Here, we will look at some examples of how we can use bioinformatics to discover molecular signposts for particular biological processes. These signs are known as biomarkers, and they are important in all types of clinical research. We will then take a closer look at how bioinformatics can use this information to come up with an application, such as a drug.

Biomarkers of regeneration

Humans do not have the ability to regenerate limbs after amputation, but certain animals have this extraordinary ability, including planarian flatworms and axolotls. To understand these strong regenerative capabilities, scientists study fruit flies, flatworms, axolotls, and zebrafish. These species are powerful model systems to study tissue regeneration after amputation or damage. As in most biological fields, modern-day bioinformatics techniques are playing a key role in understanding how the genome responds to injury.

Regeneration requires a real-time genomic response, which can be studied by looking at which genes are activated or repressed in individual cells with single-cell RNA sequencing. A recent study from Fincher et al. identified flatworm genes that were active after injury by analyzing all messenger RNA (the transcriptome) of individual lineage precursor cells with Drop-seq. This technique isolates single cells in droplets so that they can be separately analyzed and compared. This method is so powerful that researchers were able to detect the transcriptome from cell types with frequencies as low as ~10 cells per animal³.

Bioinformatic analyses allowed the cells to be grouped into clusters with similar gene expression across different tissue types, which in turn allowed the researchers to build an atlas of the genes expressed after injury.
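To give a feel for what "clustering cells by gene expression" means, here is a minimal sketch using k-means on a tiny, made-up expression table. It is an illustration only: the numbers are invented, the choice of k-means and of the starting centers is ours, and real single-cell analyses (including the study described above) involve normalization and more elaborate clustering steps.

```python
import math


def distance(a, b):
    """Euclidean distance between two expression profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans(cells, k, n_iter=20):
    """Group cells into k clusters; returns one cluster label per cell."""
    # Deterministic start: first cell, then the cell farthest from chosen centers.
    centers = [list(cells[0])]
    while len(centers) < k:
        farthest = max(cells, key=lambda c: min(distance(c, m) for m in centers))
        centers.append(list(farthest))

    def assign():
        return [min(range(k), key=lambda i: distance(c, centers[i])) for c in cells]

    for _ in range(n_iter):
        labels = assign()
        for i in range(k):
            members = [c for c, lab in zip(cells, labels) if lab == i]
            if members:  # keep the old center if a cluster is empty
                centers[i] = [sum(col) / len(members) for col in zip(*members)]
    return assign()


# Hypothetical expression levels of three genes (columns) in six cells (rows).
toy_cells = [
    [5, 0, 0], [6, 1, 0],   # cells expressing mostly gene 1
    [0, 5, 1], [1, 6, 0],   # cells expressing mostly gene 2
    [0, 0, 7], [0, 1, 6],   # cells expressing mostly gene 3
]
print(kmeans(toy_cells, k=3))
```

Cells that receive the same label have similar expression profiles, which is the starting point for assigning them to a cell type.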

In another example, Vizcaya-Molina et al. identified novel enhancers that regulate gene activation during different phases of recovery from injury in developing fruit flies. The researchers looked for accessible regions in the DNA (which are associated with higher gene activation) using a technique called ATAC sequencing. They confirmed that the activity of some genes changed in response to injury, and they then asked whether those genes had common functions. With the help of bioinformatic databases, they found that many of those genes belonged to signaling pathways involved in cell growth and differentiation⁴.
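Asking whether a set of genes shares "common functions" is often framed as an over-representation test: are genes from a given pathway more frequent in our list than expected by chance? The sketch below uses the hypergeometric distribution, a standard choice for this kind of question. We do not know which statistic the study above used, and all the counts in the example are invented for illustration.

```python
from math import comb


def enrichment_p_value(n_genome: int, n_pathway: int, n_hits: int, n_overlap: int) -> float:
    """Probability of seeing at least `n_overlap` pathway genes among `n_hits`
    genes drawn at random from a genome of `n_genome` genes, of which
    `n_pathway` belong to the pathway (hypergeometric upper tail)."""
    total = comb(n_genome, n_hits)
    upper = min(n_pathway, n_hits)
    tail = sum(
        comb(n_pathway, k) * comb(n_genome - n_pathway, n_hits - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


# Hypothetical numbers: 1000 genes in total, 50 in a growth pathway,
# 100 injury-responsive genes, 15 of which are in the pathway.
# By chance alone we would expect about 5 (100 * 50 / 1000).
p = enrichment_p_value(1000, 50, 100, 15)
print(f"p-value: {p:.2e}")
```

A small p-value suggests the pathway is over-represented among the responsive genes. In practice, many pathways are tested at once, so a correction for multiple testing is also needed.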

A study by Goldman et al. uncovered the genetic regulatory program that responds to injured cardiomyocytes in zebrafish. Inaccessible regions of DNA are tightly wrapped around proteins called histones. The researchers profiled H3.3, a replacement histone that indicates transcriptional accessibility, to uncover gene regulatory elements involved in heart regeneration. This method allowed them to identify genes that were upregulated in response to injury. Later, during cardiomyocyte regeneration, they found an enrichment of enhancer elements that were “open” for transcription and then identified the specific sequence involved during regeneration⁵.

These examples show how bioinformatics helps to identify the genes that regulate regeneration after injury. Its techniques can be applied to monitor the genomic response in individual cells and to probe accessible regions of the DNA in several organisms that are capable of regeneration. As computational power grows, bioinformatics will allow scientists to ask new questions that are important to the field of regeneration.

Biomarkers of virulence factors

Bioinformatic tools are also important in finding biomarkers of infectious disease virulence, which can be appealing drug targets. For instance, we can look for specific genes that drive the pathogenicity of a given microorganism, such as a yeast. To do this, we can design strains that lack particular genes and evaluate whether this makes them less pathogenic. Large numbers of yeast strains are typically tested using competitive growth methodologies⁶. For example, Han et al. grew mutant strains in direct competition with one another under controlled conditions, which reduced the time and cost of screening each one individually. This enabled screening of a large number of strains to identify a drug target.

Functional genomics has also been used to identify drug targets in the pathogenic fungus Candida albicans with the C. albicans fitness test (CaFT). In this test, each strain is assigned a unique identifier (barcode) that can be tracked computationally to reveal differences in fitness among heterozygous strains. This enabled the researchers to screen for loss of gene function in the presence of antifungal agents and, from the results, to identify the mechanism of action of novel compounds⁷.

Competitive fitness profiling was also used to evaluate the relative fitness of large pools of Aspergillus fumigatus mutants and to identify those involved in virulence, using a non-genetically barcoded library of mutants⁸. As a result, the researchers reduced the total number of animals usually required to perform virulence screening. Tn-Seq is another technique, which has been used to assess the contribution of genes to fitness in Streptococcus pneumoniae. Instead of deleting a gene, Tn-Seq disrupts it by inserting additional DNA within it⁹.

As with the deletion libraries, changes in the frequency of each mutant can be used to compare the fitness of the different mutants. By looking at which mutants grow most poorly, scientists can identify which genes are the most essential and consider them as potential drug targets. This is of particular interest in drug discovery programs, since developing and designing a novel therapy depends on identifying the genes that are responsible for or involved in pathogenicity.
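The idea of comparing mutant frequencies can be sketched in a few lines. The example below computes, for each barcode, the log2 change in its share of the pool between the start and the end of a competition. The strain names and read counts are invented, and the pseudocount is a common but arbitrary choice to avoid dividing by zero; published pipelines use more careful normalization and statistics.

```python
import math


def relative_fitness(before: dict, after: dict, pseudocount: float = 1.0) -> dict:
    """Log2 change in each strain's share of the pool (negative = lost ground)."""
    n = len(before)
    total_before = sum(before.values()) + pseudocount * n
    total_after = sum(after.get(s, 0) for s in before) + pseudocount * n
    scores = {}
    for strain, count in before.items():
        share_before = (count + pseudocount) / total_before
        share_after = (after.get(strain, 0) + pseudocount) / total_after
        scores[strain] = math.log2(share_after / share_before)
    return scores


# Hypothetical barcode read counts for three mutants grown together.
counts_start = {"mutant_A": 1000, "mutant_B": 1000, "mutant_C": 1000}
counts_end = {"mutant_A": 1500, "mutant_B": 1400, "mutant_C": 100}

scores = relative_fitness(counts_start, counts_end)
for strain, score in sorted(scores.items(), key=lambda item: item[1]):
    print(f"{strain}: {score:+.2f}")
```

In this toy pool, the mutant with the most negative score has dropped out of the population fastest, so the gene it lacks would be the first candidate to examine as a potential drug target.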

Drug design

Once we have found a suitable drug target, we can turn to bioinformatics again to help us find a drug for it. A classic approach is to generate millions of molecules experimentally, test them, and record the ones that have an effect. However, this method is very time-consuming and resource-intensive, and the number of effective molecules can be low. Instead, we can use our models of molecular interactions to test molecules computationally and then test experimentally only the ones that are predicted to be effective. This narrows down the set of molecules to test in an experiment while maximizing the chance of success.

Indeed, Doman et al. showed that computational tests increase the efficiency of these experiments. When they screened a large library of molecules experimentally, only 0.02% of their tests were positive. However, when they used a computational analysis and tested only the molecules predicted to be effective, 35% of their tests were positive¹⁰. Thus, virtual screening saves a considerable amount of time and money by reducing the number of assays while increasing the proportion that succeed. In fact, there are several examples of drugs found through computational screening that have been approved by the FDA. These include dorzolamide to treat glaucoma, captopril to treat hypertension, and saquinavir to treat HIV¹¹. Moreover, these approaches are being used in the context of the current COVID-19 pandemic to find potential new treatments.
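The two hit rates reported by Doman et al. can be turned into more intuitive numbers with a little arithmetic. The snippet below uses only the percentages quoted above (0.02% and 35%); the function names are ours, and the "assays per hit" figure is simply the inverse of the hit rate, not a value taken from the paper.

```python
def fold_enrichment(hit_rate_selected: float, hit_rate_unselected: float) -> float:
    """How many times more often a test is positive after computational selection."""
    return hit_rate_selected / hit_rate_unselected


def assays_per_hit(hit_rate: float) -> float:
    """Average number of experiments needed to find one active molecule."""
    return 1 / hit_rate


experimental_screen = 0.0002  # 0.02% of tests positive
virtual_screen = 0.35         # 35% of tests positive

print(f"Enrichment: about {fold_enrichment(virtual_screen, experimental_screen):.0f}-fold")
print(f"Assays per hit, whole library: about {assays_per_hit(experimental_screen):.0f}")
print(f"Assays per hit, after selection: about {assays_per_hit(virtual_screen):.1f}")
```

In other words, the same hit that takes thousands of assays to find in an unselected library takes only a handful once the candidates have been chosen computationally.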

All potential drugs should be subjected to multiple stages of evaluation to assess their safety: first in preclinical tests with model organisms, and then in clinical studies in humans. Despite the promise of computational methods to help identify active molecules, most candidate molecules fail to pass these clinical studies because of unwanted side effects. Thus, one of the newest endeavors in the field is the use of machine learning to predict how likely a given molecule is to be toxic. Machine learning is a set of tools that find trends in known data to predict the results of future observations¹².

Currently, these methods look at databases of molecules to extract their physical properties and the health concerns associated with them. Then, they build models that link those properties to health concerns to derive general rules. These approaches have been very successful, with some models being able to identify toxic compounds with up to 95% accuracy.
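As a rough illustration of "linking properties to health concerns", the sketch below uses a nearest-neighbor classifier, one of the simplest machine learning methods: a new molecule is predicted to be toxic if most of the known molecules with similar properties are toxic. Everything here is hypothetical. The two numeric "descriptors" and the labels are invented for the example and do not correspond to real compounds, and we are not claiming this is the method behind the accuracy figure above.

```python
import math
from collections import Counter


def predict_toxic(training_set, query, k=3):
    """Majority vote among the k known molecules closest to `query`.

    `training_set` is a list of (descriptors, is_toxic) pairs, where
    `descriptors` is a tuple of numeric molecular properties.
    """
    nearest = sorted(training_set, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Invented training data: (descriptor_1, descriptor_2), known toxicity.
known_molecules = [
    ((1.0, 0.2), False),
    ((1.2, 0.1), False),
    ((0.9, 0.3), False),
    ((3.0, 2.5), True),
    ((3.2, 2.8), True),
    ((2.9, 2.4), True),
]

print(predict_toxic(known_molecules, (3.1, 2.6)))  # True
print(predict_toxic(known_molecules, (1.1, 0.2)))  # False
```

Real toxicity models are trained on thousands of compounds with many descriptors each, and they are evaluated on molecules that were held out of training to estimate how well they generalize.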

Gaining access to greater computational power has allowed us to pursue new questions and develop further techniques to address them. This has had a notable impact on diverse fields, from basic science to applications in the clinic. The future of bioinformatics will certainly be exciting, as it will likely produce more and more results that have an impact on our daily lives.

References:

1. Edgar, T. W. & Manz, D. O. Research Methods for Cyber Security. (Syngress, 2017).
2. Gauthier, J., Vincent, A. T., Charette, S. J. & Derome, N. A brief history of bioinformatics. Brief. Bioinform. (2018). doi:10.1093/bib/bby063
3. Fincher, C. T., Wurtzel, O., de Hoog, T., Kravarik, K. M. & Reddien, P. W. Cell type transcriptome atlas for the planarian Schmidtea mediterranea. Science 360, (2018).
4. Vizcaya-Molina, E. et al. Damage-responsive elements in Drosophila regeneration. Genome Res. 28, 1852–1866 (2018).
5. Goldman, J. A. et al. Resolving Heart Regeneration by Replacement Histone Profiling. Dev. Cell 40, 392–404.e5 (2017).
6. Han, T. X., Xu, X.-Y., Zhang, M.-J., Peng, X. & Du, L.-L. Global fitness profiling of fission yeast deletion strains by barcode sequencing. Genome Biol. 11, R60 (2010).
7. Xu, D. et al. Genome-wide fitness test and mechanism-of-action studies of inhibitory compounds in Candida albicans. PLoS Pathog. 3, e92 (2007).
8. Macdonald, D. et al. Inducible Cell Fusion Permits Use of Competitive Fitness Profiling in the Human Pathogenic Fungus Aspergillus fumigatus. Antimicrob. Agents Chemother. 63, (2019).
9. Solaimanpour, S., Sarmiento, F. & Mrázek, J. Tn-seq explorer: a tool for analysis of high-throughput sequencing data of transposon mutant libraries. PLoS One 10, e0126070 (2015).
10. Doman, T. N. et al. Molecular docking and high-throughput screening for novel inhibitors of protein tyrosine phosphatase-1B. J. Med. Chem. 45, 2213–2221 (2002).
11. Sliwoski, G., Kothiwale, S., Meiler, J. & Lowe, E. W. Computational Methods in Drug Discovery. Pharmacol. Rev. 66, 334–395 (2014).
12. Yang, H., Sun, L., Li, W., Liu, G. & Tang, Y. In Silico Prediction of Chemical Toxicity for Drug Design Using Machine Learning Methods and Structural Alerts. Front. Chem. 6, 30 (2018).

