AudioHelicase Podcast: Silvi Rouskin on solving the structure of the coronavirus genome
In this episode of AudioHelicase Podcast, Whitehead Fellow Silvi Rouskin discusses her research on solving the structure of the novel coronavirus's RNA genome, with the goal of revealing weak points in the virus’s gene regulation that new drugs could potentially target.
CONOR GEARIN, HOST: The world is reeling from a pandemic caused by the novel coronavirus known as SARS-CoV-2. Despite its large impact, the virus depends on a relatively small genome made entirely of the nucleic acid RNA. Single-stranded RNA is built from four building blocks, or nucleotides: A, U, C, and G. RNA strands are flexible and can bend into many different shapes, and each shape can give the molecule a different function. Understanding the structure of the coronavirus’s RNA genome could prove key to designing new antiviral drugs against this coronavirus or future viral threats.
I’m Conor Gearin, digital media specialist at Whitehead Institute. In this episode, we’ll hear from Whitehead Fellow Silvi Rouskin, whose lab is studying the shape of RNA molecules in the coronavirus. By modelling the structures of coronavirus RNA, Rouskin hopes to reveal weak points in the virus’s gene regulation that new drugs could potentially target.
Rouskin earned her PhD at the University of California, San Francisco in the lab of Jonathan Weissman. She is the Andria and Paul Heafy Fellow at Whitehead Institute. As a Whitehead Fellow, she runs her own lab and conducts independent research while still an early-career scientist.
Silvi, welcome to AudioHelicase. To start us off, what research is your lab currently pursuing?
SILVI ROUSKIN: Right now, my lab is almost entirely studying SARS-CoV-2. That’s because we were previously working on HIV, the human immunodeficiency virus 1, and we realized that a lot of the tools and methods we developed are directly applicable to SARS-CoV-2, because it’s also an RNA virus. We’re still continuing our work on HIV and on other things we’re interested in, including neurodegeneration, so it’s very diverse. But right now it’s easiest and most motivating to work on SARS-CoV-2. People in my lab are very excited to work on it, so they’ve been switching projects or pausing what they’re doing just to be able to help and contribute towards studying this virus and getting some answers.
In general we are an RNA biology lab. We love RNA; RNA is our favorite molecule. What I’ve been focusing on for a long time is studying the shapes of RNA molecules: how each individual RNA molecule folds and forms different structures inside cells. RNA’s ability to form structures is unique, and it differs from protein in an important way. For a protein, you can usually predict almost entirely from the sequence how it is going to fold, because there is one very stable solution. RNA, by contrast, clearly has the capacity for the same sequence to form multiple different structures, and there’s still a lot we don’t understand about the driving forces. The shape of an RNA molecule is extremely important, just as the shape of a protein is, because it’s intimately tied to its function.
GEARIN: In general, how does the structure of an RNA molecule affect its function? Has your HIV research provided any examples of how this works in practice?
ROUSKIN: Those are still open questions for the large majority of those shapes, but I’ll focus on a few things that we found. HIV is a very interesting virus. One reason is that it has a really small genome, and that’s true of all RNA viruses: their genomes are very small. For HIV, the entire viral genome is about 10,000 nucleotides. The human genome is about 3 billion nucleotides, and the coronavirus genome is about 30,000. So coronaviruses have among the largest genomes of any RNA virus, and they’re still very, very small. RNA viruses face the challenge of packing a great deal of information into this really small space. So they’ve evolved to be really creative in optimizing and layering information on top of their primary sequence.
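The genome sizes quoted here can be compared with simple arithmetic. The short Python sketch below uses only the rounded figures from the conversation (about 10,000, 30,000, and 3 billion nucleotides), not exact genome lengths. The dictionary and function names are illustrative choices.

```python
# Rounded genome sizes as quoted in the conversation (nucleotides); not exact lengths.
GENOME_SIZES_NT = {
    "HIV-1": 10_000,
    "SARS-CoV-2": 30_000,
    "human": 3_000_000_000,
}


def size_ratio(larger: str, smaller: str) -> float:
    """Return how many times larger one genome is than another."""
    return GENOME_SIZES_NT[larger] / GENOME_SIZES_NT[smaller]


if __name__ == "__main__":
    print(size_ratio("SARS-CoV-2", "HIV-1"))  # 3.0
    print(size_ratio("human", "SARS-CoV-2"))  # 100000.0
```

By these rounded numbers, the coronavirus genome is about three times the size of HIV’s, and the human genome is roughly 100,000 times larger than the coronavirus genome.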
We’ve realized that one kind of layering is how the primary sequence of the genome folds. HIV, for example, has only a handful of genes, just nine, which is really small. But it needs to express those genes from the exact same molecule that serves both as the genetic material and as the template that gets translated into protein.
GEARIN: So what you’re pointing out is that human cells have a DNA genome with many regions that don’t code for proteins but instead regulate how genes are expressed as RNA. RNA viruses like HIV, by contrast, have to rely on a small amount of single-stranded RNA to carry out both of these functions.
ROUSKIN: Right, yes. It’s replacing a very complex machinery in the human genome that involves a lot of DNA and a lot of proteins. In the HIV genome, much of that machinery is replaced by the RNA’s own ability to adopt multiple conformations. The idea is pretty simple. There are motifs in the primary sequence itself that typically serve as binding sites for protein factors. One way of folding the RNA can expose those motifs so that they are available, so the sequence is effectively saying, “Here I am, bind me.” That would allow a certain set of genes to be expressed from this molecule. Or the RNA could fold entirely differently and bury those motifs, so that they can no longer be bound or recognized by cellular factors. It can hide those motifs or expose them, and that all depends on how the RNA folds.
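The expose-or-bury idea can be illustrated with dot-bracket notation, a common plain-text format for RNA secondary structure in which matching parentheses mark paired bases and dots mark unpaired ones. The Python sketch below checks whether a sequence motif sits entirely in unpaired bases for a given structure. The sequence, the motif, and both structures are invented for illustration and are not real HIV elements. Treating “unpaired” as “accessible to a protein” is a simplifying assumption.

```python
def paired_positions(dot_bracket: str) -> set[int]:
    """Return the indices of paired bases in a dot-bracket structure string."""
    stack: list[int] = []
    paired: set[int] = set()
    for i, ch in enumerate(dot_bracket):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {i}")
            paired.update((stack.pop(), i))  # record both partners of the pair
        elif ch != ".":
            raise ValueError(f"unexpected character {ch!r} at position {i}")
    if stack:
        raise ValueError("unmatched '(' in structure")
    return paired


def motif_exposed(sequence: str, structure: str, motif: str) -> bool:
    """True if some occurrence of `motif` lies entirely in unpaired bases."""
    if len(sequence) != len(structure):
        raise ValueError("sequence and structure must have the same length")
    paired = paired_positions(structure)
    start = sequence.find(motif)
    while start != -1:
        if not any(k in paired for k in range(start, start + len(motif))):
            return True
        start = sequence.find(motif, start + 1)  # also check overlapping hits
    return False


if __name__ == "__main__":
    seq = "GGGGAUAACCCCUUAU"  # hypothetical 16-base RNA
    motif = "AUAA"  # hypothetical protein-binding motif at positions 4-7
    exposing = "((((....))))...."  # GGGG pairs with CCCC; motif sits in the loop
    burying = "....((((....))))"  # motif pairs with the downstream UUAU
    print(motif_exposed(seq, exposing, motif))  # True
    print(motif_exposed(seq, burying, motif))  # False
```

The same sequence gives opposite answers under the two structures, which is the essence of the switch described in the interview.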
GEARIN: That’s fascinating. How did you become interested in studying RNA structures?
ROUSKIN: I’ve always been interested in studying RNA and RNA structure because structure is so critical to function. In grad school, I developed an approach to study RNA structure in cells, which had been lacking; everybody was studying RNA in a test tube. But when I started as a Fellow, I realized there was one big limitation: for every sequence, we were assuming that every molecule with that sequence folds exactly the same way. That assumption became very hard to justify once we started working on viruses, because we know their RNA has so many different fates and goals. So I was determined to find an approach that would let us tell from experimental data alone, without theory: “Here’s the data: is this sequence forming multiple structures or not?”
GEARIN: To figure out these RNA structures, you and your lab have developed a technique called DMS-MaPseq. It involves marking RNA molecules with a chemical called dimethyl sulfate, or DMS, and then sequencing the RNA to read out which bases are open, meaning they’re unpaired and available for binding. Can you talk a little about how these tools work, and how you take that information about open bases and model a structure for the molecule?
ROUSKIN: DMS is just a small molecule, and it puts a very small mark on the open bases of the RNA. An RNA sequence is made of only four bases, right: A, C, G, and U. If you are given a sequence and someone has also marked for you which bases are open, then it is much simpler to figure out how the molecule is folded, because the pairing rules are pretty simple as well: A pairs with U, and C pairs with G. That is a very heavy constraint on how the RNA can fold. Purely computational prediction is really good at predicting how short sequences will fold, because the rules are so simple.
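The way open-base data constrains folding can be sketched with a toy algorithm. The Python example below uses the textbook Nussinov recursion, which simply maximizes the number of nested base pairs. Real structure-prediction tools use thermodynamic energy models instead. The sketch forbids any base marked open from pairing. Only A-U and C-G pairs are allowed, following the rules described here, even though real RNA also forms G-U wobble pairs. The minimum hairpin loop of three bases is an assumption. In practice DMS reports mainly on unpaired A and C, so real data constrain only part of the sequence. The sketch illustrates the idea only and is not the lab’s modeling pipeline.

```python
from functools import lru_cache

# Watson-Crick pairs only, as described in the interview (real RNA also forms G-U pairs).
PAIRS = {("A", "U"), ("U", "A"), ("C", "G"), ("G", "C")}
MIN_LOOP = 3  # assumed minimum number of unpaired bases closed by a hairpin


def fold(seq: str, open_positions: frozenset[int] = frozenset()) -> str:
    """Nussinov-style fold: maximize nested pairs, never pairing a base marked open.

    Returns one optimal structure in dot-bracket notation. Recursive, so it is
    meant for short toy sequences only.
    """
    n = len(seq)

    def can_pair(i: int, j: int) -> bool:
        return (
            j - i > MIN_LOOP
            and (seq[i], seq[j]) in PAIRS
            and i not in open_positions
            and j not in open_positions
        )

    @lru_cache(maxsize=None)
    def best(i: int, j: int) -> int:
        """Maximum number of pairs within seq[i..j] (inclusive)."""
        if j - i <= MIN_LOOP:
            return 0
        score = best(i, j - 1)  # option 1: base j stays unpaired
        for k in range(i, j - MIN_LOOP):
            if can_pair(k, j):  # option 2: base j pairs with base k
                score = max(score, best(i, k - 1) + 1 + best(k + 1, j - 1))
        return score

    # Traceback: walk the table again to recover which pairs achieve the optimum.
    structure = ["."] * n
    stack = [(0, n - 1)]
    while stack:
        i, j = stack.pop()
        if j - i <= MIN_LOOP:
            continue
        if best(i, j) == best(i, j - 1):
            stack.append((i, j - 1))
            continue
        for k in range(i, j - MIN_LOOP):
            if can_pair(k, j) and best(i, j) == best(i, k - 1) + 1 + best(k + 1, j - 1):
                structure[k], structure[j] = "(", ")"
                stack.extend([(i, k - 1), (k + 1, j - 1)])
                break
    return "".join(structure)


if __name__ == "__main__":
    seq = "GGGAAAUCCC"  # hypothetical hairpin-forming sequence
    print(fold(seq))  # (((....)))
    print(fold(seq, frozenset({0})))  # .((....)).
    print(fold(seq, frozenset({0, 1, 2})))  # ..........
```

Marking the first G as open shifts the helix inward. Marking all three Gs as open leaves nothing able to pair.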
But once you are talking about how the coronavirus genome folds, at 30,000 nucleotides, the simplicity of those rules works against you, because there are so many ways of pairing the bases that would work in principle and give you a stable solution. If you knew exactly which bases have to be open in a single molecule, though, that constrains the problem so much that you can usually derive a good, unique model of how it is folding.
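The growth in possible pairings can be seen by counting structures. The Python sketch below counts every nested (pseudoknot-free) secondary structure a sequence can form under simplified rules. It allows A-U and C-G pairs only, as described in the interview, even though real RNA also forms G-U wobble pairs. It also assumes a minimum hairpin loop of three bases and forbids pairing at any base marked open. The repeated ACGU test sequences are arbitrary choices for illustration.

```python
from functools import lru_cache

# Watson-Crick pairs only; allowing G-U wobble pairs would make the counts larger.
PAIRS = {("A", "U"), ("U", "A"), ("C", "G"), ("G", "C")}
MIN_LOOP = 3  # assumed minimum hairpin loop size


def count_structures(seq: str, open_positions: frozenset[int] = frozenset()) -> int:
    """Count nested secondary structures, excluding any pair that uses an open base."""

    def can_pair(i: int, j: int) -> bool:
        return (
            j - i > MIN_LOOP
            and (seq[i], seq[j]) in PAIRS
            and i not in open_positions
            and j not in open_positions
        )

    @lru_cache(maxsize=None)
    def count(i: int, j: int) -> int:
        if j - i <= MIN_LOOP:
            return 1  # too short to hold any pair: only the fully unpaired structure
        total = count(i, j - 1)  # structures in which base j is unpaired
        for k in range(i, j - MIN_LOOP):
            if can_pair(k, j):  # structures in which base j pairs with base k
                total += count(i, k - 1) * count(k + 1, j - 1)
        return total

    return count(0, len(seq) - 1)


if __name__ == "__main__":
    for n in (20, 40, 80):
        print(n, count_structures("ACGU" * (n // 4)))
    # Marking the first 40 bases as open leaves only the last 40 free to pair.
    print(count_structures("ACGU" * 20, frozenset(range(40))))
```

The count climbs steeply as the sequence gets longer. Marking bases as open removes every structure that would pair them.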
The key here was having single-molecule resolution, so that you can actually see that different bases are open even though the sequence is exactly the same. On one subset of molecules only these particular bases are open, and on a different subset only those are open, and they never overlap. We use a clustering algorithm that groups molecules together or apart depending on whether they come from the same or a different underlying structure.
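DMS-MaPseq records DMS modifications as mutations in sequencing reads. Separating molecules by which bases are open can therefore be framed as clustering binary vectors: 1 where a read shows a mutation, 0 elsewhere. The Python sketch below fits a Bernoulli mixture model by expectation-maximization, which is one standard way to cluster such data. It is a simplified stand-in and is not the lab’s published algorithm. It ignores sequencing errors and missing coverage, and the number of clusters is fixed in advance. The simulated structures, mutation rates, and read counts are invented, and the rates are exaggerated so that the toy example separates cleanly.

```python
import math
import random


def simulate_reads(profiles, weights, n_reads, rng):
    """Simulate binary reads (1 = mutation seen at that base) from known structures."""
    reads, labels = [], []
    for _ in range(n_reads):
        c = rng.choices(range(len(profiles)), weights=weights)[0]
        reads.append([1 if rng.random() < p else 0 for p in profiles[c]])
        labels.append(c)
    return reads, labels


def _log_lik(read, weight, rates):
    """Log of weight * P(read | per-base mutation rates), treating bases as independent."""
    total = math.log(weight)
    for x, m in zip(read, rates):
        total += math.log(m if x else 1.0 - m)
    return total


def em_bernoulli_mixture(reads, k, n_iter=50, n_restarts=3, seed=0):
    """Cluster binary reads with a k-component Bernoulli mixture fit by EM.

    Runs several random restarts and keeps the one with the highest
    log-likelihood, as computed in its final E-step.
    """
    rng = random.Random(seed)
    n, n_pos = len(reads), len(reads[0])
    best = None
    for _ in range(n_restarts):
        weights = [1.0 / k] * k
        rates = [[rng.uniform(0.25, 0.75) for _ in range(n_pos)] for _ in range(k)]
        for _ in range(n_iter):
            # E-step: posterior probability that each read came from each cluster.
            resp, log_lik = [], 0.0
            for read in reads:
                logs = [_log_lik(read, weights[c], rates[c]) for c in range(k)]
                top = max(logs)
                w = [math.exp(v - top) for v in logs]  # subtract max for stability
                s = sum(w)
                log_lik += top + math.log(s)
                resp.append([wi / s for wi in w])
            # M-step: re-estimate cluster weights and per-base mutation rates.
            for c in range(k):
                nc = max(sum(r[c] for r in resp), 1e-9)
                weights[c] = max(nc / n, 1e-9)
                for j in range(n_pos):
                    rate = sum(r[c] * read[j] for r, read in zip(resp, reads)) / nc
                    rates[c][j] = min(max(rate, 1e-4), 1.0 - 1e-4)  # keep logs finite
        if best is None or log_lik > best[0]:
            best = (log_lik, weights, rates, resp)
    return best[1], best[2], best[3]


if __name__ == "__main__":
    rng = random.Random(42)
    # Two hypothetical structures of one 20-base sequence: bases 0-9 open in the
    # first, bases 10-19 open in the second.
    profile_a = [0.5 if j < 10 else 0.02 for j in range(20)]
    profile_b = [0.02 if j < 10 else 0.5 for j in range(20)]
    reads, labels = simulate_reads([profile_a, profile_b], [0.6, 0.4], 400, rng)
    weights, rates, resp = em_bernoulli_mixture(reads, k=2)
    assigned = [max(range(2), key=lambda c: r[c]) for r in resp]
    agree = sum(a == t for a, t in zip(assigned, labels)) / len(labels)
    print("cluster weights:", [round(w, 2) for w in weights])
    print("accuracy:", round(max(agree, 1 - agree), 3))  # cluster labels may be swapped
```

Cluster labels are arbitrary, so the accuracy check allows the two labels to be swapped. Real data would also need a principled way to decide how many structures are present, which this sketch does not attempt.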
GEARIN: When the COVID-19 pandemic began, you realized that the toolset you’d developed for studying HIV could also help us learn about the genome of the novel coronavirus, which, like HIV, is an RNA virus. What has it been like applying your toolset to study coronavirus?
ROUSKIN: This was probably the most exciting research, because it was so urgent: we wanted to characterize the virus now, as fast as possible under the circumstances. We knew we were really well positioned to do this, that no one else in the entire world could do a better job, because we had just done it for HIV. It was very straightforward to apply the approach to coronavirus, which is another RNA virus. It actually ended up being easier than HIV, because when we infected cells with coronavirus, they produced a lot more virus, so it was very easy to sequence. We were very excited, and we’re obviously still very excited.
We’re collaborating on this with the lab of Anthony Griffiths at Boston University (BU). There are only a handful of labs across the entire United States, I think four or five, that have this level of containment. That worked well for us, because DMS is also toxic, since it modifies RNA. For their safety level, they already had to wear some very serious suits, so they were able to handle DMS as well. I didn’t know Anthony at all, and it’s been amazing to see how quickly and how willingly scientists have been coming together and sharing to study this virus. Every single person I’ve emailed needing advice, knowledge, reagents, anything on the coronavirus, has responded right away, met with me, and helped. It’s been incredible. Sometimes it feels like the Avengers: you really are coming together with the best people in the world, experts working in this field, and everyone is willing to help each other towards the common goal of finding the weaknesses of this virus, understanding it, helping, and avenging.
GEARIN: You and your collaborators have been able to solve the structure of the entire SARS-CoV-2 genome, which is now available to other researchers designing drugs that could target specific RNA structures. What stands out in the results of that work so far?
ROUSKIN: That was the most exciting part. It’s something no one had ever seen before; nobody knew what it looked like. We really wanted to do this as fast as possible and get our work on bioRxiv, which is what we did, so that people can use the maps we’re providing to develop RNA-based therapeutics.
This is preliminary, and we’re working very hard on it right now, but here is what we see. SARS-CoV-2 is an RNA virus, so it needs to be translated to make protein, and it needs to be translated in the right way. We think that a critical part of how it gets the right proportion of the proteins it needs is by having different conformations of its RNA. In this case, though, it’s a different control step. In HIV, what we found was control of gene expression, an upstream step. In SARS-CoV-2, it’s translation, going from RNA to protein: in simple terms, how the RNA folds affects whether or not it gets translated.
GEARIN: You mentioned that you’ve posted a manuscript of this study on the preprint server bioRxiv. In your experience, what has been the value of preprints in the time of COVID-19?
ROUSKIN: I think preprints have been invaluable. Clearly a lot of people have seen our work, because I was immediately plugged into the community of scientists studying SARS-CoV-2. Similarly, our research has greatly benefited from reading other people’s preprints on bioRxiv. The way I see it, what’s published in journals right now is at least a year behind, whereas on bioRxiv you see the research that is happening now. This sharing of information is a huge, huge benefit, especially in the case of SARS-CoV-2 and this global pandemic.
GEARIN: Where is the scientific community right now in its efforts to make antiviral drugs that target coronavirus RNA?
ROUSKIN: I think right now this field is in its infancy, but it has so much potential. In December I’m speaking at a conference called “Drugging the Undruggable”; it was going to be in Boston, and now of course it’s virtual. It’s about drugging RNA directly. There’s been a lot of interest in this idea, and really the first successes came in 2017, just a few years ago, when it was shown that you can use small molecules to specifically target an RNA structure or RNA sequence and get a functional output. The first ones were used to treat spinal muscular atrophy, and that success has been very inspiring.
In principle, just as we can drug proteins, we can drug RNA. If we can understand the shape of the RNA, then we can understand the shape of a drug that will fit tightly into the RNA and bind it. Once the drug binds, it prevents the RNA from carrying out its function. I’m very excited about the potential of drugging RNA, and I envision a world in which we can very quickly figure out the different RNA shapes of any virus and then, even just by pure computational modeling, design the right chemical, the right drug, to fit those shapes and develop drugs very fast.
I’m very excited about going into this field and being a part of it, because it’s extremely important that we are able to respond to a new virus. I’m going to spend a lot of effort in this direction.
GEARIN: You can learn more about Silvi Rouskin’s research on our website at wi.mit.edu. Find past episodes of AudioHelicase and stay tuned for new ones by subscribing on iTunes and SoundCloud. Thanks for listening.
Music: “Versailles” by Pierce Murphy (CC-BY 4.0). Produced by Conor Gearin.
Communications and Public Affairs