Developing Advanced Tools to Improve Accuracy and Accessibility for Long-Read RNA Sequencing

In the ever-evolving world of computational biology, one of the greatest challenges has always been deriving knowledge and insight from the massive amounts of DNA (double-stranded) and RNA (typically single-stranded) data through which scientists must sift. For example, it took almost 13 years (1990–2003) to sequence and assemble the first near-complete draft of the human genome.

New advances in biotechnology, along with accompanying computational advances, have dramatically shortened the timeline for such analyses. A process known as “long-read sequencing” has proved very effective: think of it as reading voluminous amounts of genomic data in sentences and paragraphs instead of individual words.

Now, with funding from the National Institutes of Health (NIH), researchers from the University of Maryland and the University of North Carolina are taking methods for processing long-read RNA sequencing data to the next level. They’re developing an open-source software pipeline that they say can significantly improve the accuracy and accessibility of analyses based on these sequencing technologies.

A key component of this pipeline is a software package they have developed known as oarfish, which offers advanced techniques to minimize errors and deliver more accurate insights into gene activity. This includes analyzing complex transcriptomes (how genes turn “on” and “off” in different cells and tissues), which is crucial for investigating disease mechanisms, developing new drugs and identifying biomarkers.

Oarfish is intended to provide an accurate, efficient and easy-to-use interface for analysis of long-read RNA sequencing data, says Rob Patro, an associate professor of computer science and a principal investigator of the NIH grant.

“Our goal, with this new support from NIH, is to significantly scale up the prior work we have done on oarfish, allowing those interested in working with long-read sequencing to develop smoother, more versatile, and more accurate workflows,” he says.

Assisting Patro on the project at UMD are Zahra Zare Jousheghani, a sixth-year doctoral student in computational biology, and Noor Pratap Singh, a sixth-year doctoral student in computer science. All three are active in the Center for Bioinformatics and Computational Biology, which is part of the University of Maryland Institute for Advanced Computer Studies, where Patro has a joint appointment and where business staff are managing the $1.4 million UMD portion of the NIH grant.

