Rob Patro Makes Biological Data Easier to Search

He discusses his path into computing, his work in computational genomics and advice for students interested in computational biology.

University of Maryland Associate Professor of Computer Science Rob Patro works in computational genomics, where his research addresses the growing need to process and interpret large amounts of biological sequencing data. He leads the COMBINE Lab, a group that develops computational methods, software tools and data structures for studying gene expression and making biological data easier to organize, search and analyze. His work aims to give scientists better ways to search biological data, similar to how search engines help users find information across the internet.

In this Q&A, Patro discusses how an early interest in video games led him to computer science, how his lab approaches genomics research and why interdisciplinary work is central to building tools that biologists can use.

Was there a defining moment that shaped your career path into computer science?

I think there was. There’s this comic that I sometimes show in the classes I teach. The setup is basically a professor at the front of the room, drawing on a whiteboard and talking about computability theory, while a student sits in class with a thought bubble that says, “Man, I just wanted to learn how to make video games.”

For me, what initially got me into computer science was video games. I became interested in how games were made, how graphics were programmed and what was happening behind the scenes. That was really my entry point into programming.

My uncle is a computer scientist who works for the government, and I went to him and asked about this stuff. He gave me a book about the C compiler, and that’s really how I got started.

Can you tell me about your research focus and what drew you to that field?

What I do now is completely different from that original interest. My path in computer science eventually led me to computational genomics.

My group builds methods for the efficient processing and analysis of sequencing data. That can include sequencing an organism's genome or its expressed genes, which is called the transcriptome.

A major focus of our work is building tools to analyze gene expression data. For example, if you want to measure which genes are expressed in a tissue or under the treatment of a specific drug, we study how to do so accurately and efficiently.

Another large research area for us is developing better data structures for storing and querying biological data. We have huge amounts of sequencing data, but we don’t yet have search mechanisms comparable to those on the internet.

What are you currently working on, and what stands out in those projects?

One major project we’re working on involves computational methods for analyzing single-cell and spatial single-cell gene expression data.

What’s exciting about this technology is that it lets us localize sequencing reads to individual cells. Instead of measuring gene expression across an entire sample, we can ask what the gene expression looks like in each specific cell, and there could be tens or hundreds of thousands of them.

That gives us a much higher-resolution view of biological systems than we’ve traditionally had. From a computer science perspective, the challenge is that these technologies generate massive amounts of data and create new algorithmic and processing problems.
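To make the contrast between per-sample and per-cell measurement concrete, here is a minimal illustrative sketch. It is not code from the COMBINE Lab; it assumes each sequencing read has already been assigned a cell barcode and a gene, and the barcodes and gene names (`GeneX`, `GeneY`) are made up for the example.

```python
from collections import Counter, defaultdict

def bulk_counts(assignments):
    """Count reads per gene across the whole sample (no cell information)."""
    return Counter(gene for _barcode, gene in assignments)

def count_per_cell(assignments):
    """Count reads per gene separately for each cell barcode."""
    counts = defaultdict(Counter)
    for barcode, gene in assignments:
        counts[barcode][gene] += 1
    return counts

# Hypothetical (cell barcode, gene) pairs, one per sequencing read.
reads = [
    ("AAAC", "GeneX"),
    ("AAAC", "GeneX"),
    ("AAAC", "GeneY"),
    ("TTTG", "GeneY"),
]

bulk = bulk_counts(reads)         # one profile for the entire sample
per_cell = count_per_cell(reads)  # one profile per cell
```

The bulk view reports only that `GeneX` and `GeneY` each have two reads, while the per-cell view shows that `GeneX` appears only in cell `AAAC`. Real datasets have tens or hundreds of thousands of barcodes, which is why sparse matrices and specialized tools are used in practice instead of nested dictionaries.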

A major focus of our lab is not only developing algorithms and data structures but also building usable software tools for biologists and researchers. One of the things I find most exciting is seeing researchers in different scientific domains use the tools we build to answer their own questions.

Can you describe your lab and how it is structured?

My lab studies many different kinds of problems, ranging from pure computer science and algorithm design to highly applied and interdisciplinary research.

We have students from computer science programs as well as students from programs such as cell biology, bioinformatics and genomics. The projects cover a broad spectrum of topics, but they all connect through the use of computation to study large-scale biological data.

We focus on developing computational methods and then applying them to real biological data analysis problems. More recently, we’ve also expanded our work on applying these methods directly to answering biological questions.

How does your work connect to the broader computer science community and society?

One area of our work focuses on making large biological datasets searchable in ways that are currently difficult or impossible to achieve.

I often compare it to an internet search. Imagine trying to find something online if you had to know the exact web address in advance and couldn’t search by content. That’s similar to the challenge we currently face with sequencing data.

What we would like to build is a searchable index that would allow researchers to ask broader questions. For example, if a new virus is discovered, researchers could ask where else it has appeared in previously collected sequencing data.
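The idea of searching sequencing data by content can be illustrated with a toy k-mer index. This is a simplified sketch under stated assumptions, not the lab's actual data structure: it stores every length-k substring of every read in an in-memory dictionary, and the sample names, sequences and the `min_fraction` threshold are hypothetical. Indexes built for real sequencing archives rely on far more compact representations.

```python
from collections import defaultdict

def kmers(seq, k):
    """Yield every length-k substring of seq."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]

def build_index(samples, k):
    """Map each k-mer to the set of sample names whose reads contain it."""
    index = defaultdict(set)
    for name, reads in samples.items():
        for read in reads:
            for kmer in kmers(read, k):
                index[kmer].add(name)
    return index

def search(index, query, k, min_fraction=0.8):
    """Return {sample: fraction of the query's distinct k-mers found in it},
    keeping only samples at or above min_fraction."""
    query_kmers = set(kmers(query, k))
    if not query_kmers:
        return {}
    hits = defaultdict(int)
    for kmer in query_kmers:
        for name in index.get(kmer, ()):
            hits[name] += 1
    total = len(query_kmers)
    return {name: n / total for name, n in hits.items() if n / total >= min_fraction}

# Hypothetical previously collected samples and a newly discovered sequence.
samples = {
    "sample_A": ["ACGTACGTGA", "TTGACCA"],
    "sample_B": ["GGGTTTAAAC"],
}
index = build_index(samples, k=4)
result = search(index, "ACGTACG", k=4)
```

Here `result` reports `sample_A` as a match because all four distinct 4-mers of the query occur in its reads, while `sample_B` shares none of them. The query needs no knowledge of which sample to look in, which is the "search by content" property described above.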

The tools we develop for gene expression analysis also support research in areas such as developmental biology, disease research and tissue engineering. As new kinds of biological data are generated, researchers need computational methods to turn raw measurements into information that scientists can analyze and interpret.

What inspired you to join the University of Maryland?

I’m kind of a Terp for life. I originally started college at Worcester Polytechnic Institute, but I transferred back to Maryland, where I completed my undergraduate degree and Ph.D.

After that, I pursued a postdoctoral position at Carnegie Mellon University, and later worked as an assistant professor at Stony Brook University before returning to Maryland.

One of the major reasons I came back was the strength of the computer science department and the collaborative research environment around computational biology and genomics at Maryland. The Center for Bioinformatics and Computational Biology brings together researchers from multiple departments who work on related problems.

When you work in an interdisciplinary area, having a strong research community and collaborators working on adjacent problems is very important.

What advice would you give to students interested in this field?

I think it’s important to understand where the data comes from, how it is generated and what properties it has.

Students should absolutely focus on building strong computer science skills and understanding the computational and theoretical tools available to them, but it’s also worthwhile to learn about the scientific domain to which computer science is connected.

In this field, understanding the biological aspects of the problem can help you develop more practical methods and tools that have a greater impact on researchers who use them.

—Story by Samuel Malede Zewdu, CS Communications

###

Many of the open-source tools developed by the COMBINE Lab can be found via their GitHub organization.
