Searching for the True Sequence of RNA


April 13, 2023

Pictured: With an NIH grant, Shenglong Zhang will develop a tool to uncover the true sequence of an RNA.

Shenglong Zhang, Ph.D., associate professor of biological and chemical sciences, received a $676,946 National Institutes of Health R01 grant, with total funding estimated at $2,588,918 over the next four years, to develop a tool to uncover the true sequence of RNA.

Imagine a world where eyeglasses only allow the perception of four colors, despite the existence of a rich and diverse color spectrum. That is the challenge faced by scientists when deciphering RNA sequences. Present-day tools can only identify four nucleotides (A, C, G, and U) within an RNA sequence, yet there are numerous other components that remain undetected and unaccounted for.

Nucleotides (A, C, G, and U) are commonly recognized as the fundamental building blocks of RNA (ribonucleic acid). However, RNA molecules are not composed solely of these four nucleotides; each can carry modifications to its chemical structure or composition. An RNA sequence with all its diverse modifications constitutes the “true” information of the RNA. Yet the true sequence of a full-length RNA (the identity and location of each nucleotide building block, modified or not) remains a mystery because the right tools to decipher all of its building blocks do not yet exist.

RNA plays diverse roles, but its main function is to create proteins by carrying genetic information during cell replication. It is also the primary genetic material for viruses. Modifications to an RNA’s nucleotides help it perform these biological functions. Defects in RNA modifications account for more than 100 diseases, including breast cancer, type 2 diabetes, and obesity. By studying these modifications and defects, scientists may be able to understand and find treatments for these diseases.

What makes RNA modification studies especially complicated is that, while more than 170 modifications have been discovered, not every modification is present at 100 percent at its RNA site; a given site may be modified in only a fraction of the copies of an RNA. Determining the precise percentage of modification at a specific site adds further to the challenge in this field of research.
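
To make the idea of a modification percentage concrete, here is a minimal sketch of the arithmetic, not the researchers' method. It assumes that, for a single RNA site, some measurement yields a count of modified copies and a count of unmodified copies; the function name and the numbers are hypothetical and chosen only for illustration.

```python
def modification_percentage(modified_count: float, unmodified_count: float) -> float:
    """Return the percentage of RNA copies that carry a modification at one site.

    Both arguments are hypothetical measurements (for example, signal counts)
    for the modified and unmodified forms of the same site.
    """
    total = modified_count + unmodified_count
    if total <= 0:
        raise ValueError("need at least one observed copy of the site")
    return 100.0 * modified_count / total


# Illustrative numbers only: 30 modified copies and 70 unmodified copies.
partial = modification_percentage(modified_count=30, unmodified_count=70)
print(f"{partial:.1f}% modified")
```

In this made-up case the site is modified in 30 percent of the copies, the kind of partial modification that a purely qualitative, yes-or-no mapping method would not report.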

Numerous efforts have been made to map a select few known RNA modifications. However, as Zhang explains, these identification techniques depend on either a specific antibody, a unique chemical conversion resulting in a distinguishable base, or the employment of cutting-edge nanopore-based direct RNA sequencing methods. Therefore, these methods can analyze only one or a few modification types at a time.

Mass spectrometry (MS) is currently the only technique that can characterize all RNA modifications. “However, conventional MS methods lose information regarding the location and co-occurrence of modified nucleotides,” explains Zhang. “There is no existing technology that can sequence all modifications simultaneously to unfold true RNA sequences, particularly on a large scale.”

Complicating things further, RNA modifications are not detectable using existing next-generation sequencing-based technologies, because those technologies require converting RNA to complementary DNA, which lacks any modification information. “These mapping methods are specific to each modification type and are primarily qualitative in nature,” says Zhang. “They typically identify sites in RNA where one specific modification can occur, but they do not indicate any quantitative information about the percentage of modifications that occur at that particular position.”


Associate Professor of Computer Science Wenjia Li will work with Associate Professor of Biological and Chemical Sciences Shenglong Zhang to develop a new tool to discover the true makeup of RNA.

Working with co-investigator Associate Professor of Computer Science Wenjia Li, Ph.D., Zhang aims to develop a new tool to discover the true makeup of RNA. “With our method, we hope we can see not just four nucleotides (A, C, G, and U), but also all modifications comprehensively,” says Zhang.

Zhang and Li recently developed a series of novel next-generation mass spectrometry-based sequencing methods (NextGen MassSpec-Seq) that make it possible to sequence RNA without any input of prior sequence information and simultaneously provide precise quantification of RNA modifications.

Together with collaborators at Columbia University and the University of Utah, they will now further develop NextGen MassSpec-Seq to sequence tRNAs (small RNA molecules that play a key role in protein synthesis) in different cellular and disease conditions. The goal is to expand the application beyond tRNAs to other RNA types, enabling direct sequencing and comprehensive quantitative mapping of all modifications.

“Our tool aims to tackle a long-standing challenge of revealing ‘true’ RNA sequences, offering a transformative solution for studying RNA modifications,” says Zhang. “This advancement will foster a deeper understanding of functions of RNA modifications and their correlations to RNA-related diseases and pandemics.”

This grant is supported by the National Institutes of Health under Award Number 1R01HG012853-01. The content is solely the responsibility of the authors and does not necessarily represent the official views of the NIH.