The Human Genome Project was an ambitious initiative to sequence every piece of human DNA. The project brought together collaborators from research institutions around the world, including MIT’s Whitehead Institute for Biomedical Research, and was finally completed in 2003. Now, more than two decades later, MIT professor Jonathan Weissman and his colleagues have gone beyond the sequence to report the first comprehensive functional map of genes expressed in human cells. Data from this project, published online June 9 in Cell, links each gene to its job in the cell, and is the culmination of years of collaboration on the Perturb-seq single-cell sequencing method.
The data is available for other scientists to use. “It’s a great resource in the way the human genome is a great resource, in that you can go in and do discovery-based research,” says Weissman, who is also a fellow at the Whitehead Institute and a researcher at the Howard Hughes Medical Institute. “Rather than defining in advance the biology you’re going to look at, you have this map of genotype-phenotype relationships and you can go in and filter the database without having to do any experiments.”
The screen allowed researchers to delve into various biological questions. They used it to explore the cellular effects of genes with unknown functions, to study the response of mitochondria to stress, and to screen for genes that cause chromosome loss or gain, a phenotype that has proven difficult to study in the past. “I think this data set is going to allow all kinds of analyses that we haven’t even imagined yet by people who come from other areas of biology, and all of a sudden they just have this available to them,” says Tom Norman, a former Weissman Lab postdoc and co-lead author of the article.
The project takes advantage of the Perturb-seq approach, which allows the impact of turning genes on or off to be tracked with unprecedented depth. This method was first published in 2016 by a group of researchers including Weissman and fellow MIT professor Aviv Regev, but could only be used on small sets of genes and at great expense.
The massive Perturb-seq map was made possible by the foundational work of Joseph Replogle, an MD-PhD student in Weissman’s lab and co-first author of this paper. Replogle, in collaboration with Norman, who now runs a lab at Memorial Sloan Kettering Cancer Center; Britt Adamson, assistant professor in the Department of Molecular Biology at Princeton University; and a group from 10x Genomics, set out to create a new version of Perturb-seq that could be scaled. The researchers published a proof-of-concept paper in Nature Biotechnology in 2020.
The Perturb-seq method uses CRISPR-Cas9 genome editing to introduce genetic changes into cells, then uses single-cell RNA sequencing to capture information about the expressed RNAs resulting from a given genetic change. Since RNAs control all aspects of cell behavior, this method can help decode the many cellular effects of genetic modifications.
Since their first proof-of-concept paper, Weissman, Regev and others have used this sequencing method on a smaller scale. For example, researchers used Perturb-seq in 2021 to explore how human and viral genes interact during infection with HCMV, a common herpesvirus.
In the new study, Replogle and collaborators, including Reuben Saunders, a graduate student in Weissman’s lab and co-first author of the paper, extended the method to the entire genome. Using human blood cancer cell lines as well as noncancerous cells derived from the retina, Replogle performed Perturb-seq on more than 2.5 million cells and used the data to create a comprehensive map linking genotypes to phenotypes.
Dig into the data
With the screen complete, the researchers decided to put their new dataset to use and examine a few biological questions. “The advantage of Perturb-seq is that it allows you to get a large data set in an unbiased way,” says Tom Norman. “No one knows exactly what the limits are of what you can get from this kind of data set. Now the question is, what do you actually do with it?”
The first and most obvious application was to study genes with unknown functions. Since the screen also read out the phenotypes of many known genes, researchers could use the data to compare unknown genes to known genes and look for similar transcriptional results, which could suggest that the gene products worked together as part of a larger complex.
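The comparison described above can be illustrated with a small sketch. It is not the authors' pipeline: it assumes each perturbation has already been summarized as one average expression profile, uses plain Pearson correlation as the similarity measure, and runs on simulated data. The function name `most_similar_perturbations` and the labels such as `unknown_X` are hypothetical.

```python
import numpy as np

def most_similar_perturbations(profiles, names, query, top_k=3):
    """Rank perturbations by Pearson correlation with the query's profile.

    profiles: array of shape (n_perturbations, n_genes), one mean
              expression-change profile per perturbed gene (assumed input).
    """
    corr = np.corrcoef(profiles)      # rows are treated as variables
    q = names.index(query)
    order = np.argsort(-corr[q])      # highest correlation first
    return [(names[i], float(corr[q, i])) for i in order if i != q][:top_k]

# Simulated toy data: two known genes and one uncharacterized gene share a
# transcriptional signature; a third known gene has an unrelated one.
rng = np.random.default_rng(0)
n_genes = 200
shared_signature = rng.normal(size=n_genes)
other_signature = rng.normal(size=n_genes)

names = ["known_A", "known_B", "unrelated_C", "unknown_X"]
profiles = np.vstack([
    shared_signature + 0.3 * rng.normal(size=n_genes),
    shared_signature + 0.3 * rng.normal(size=n_genes),
    other_signature + 0.3 * rng.normal(size=n_genes),
    shared_signature + 0.3 * rng.normal(size=n_genes),
])

ranked = most_similar_perturbations(profiles, names, "unknown_X")
for name, r in ranked:
    print(f"{name}: r = {r:.2f}")
```

In this toy setup the uncharacterized gene ranks closest to the two known genes that share its signature, which is the kind of evidence that would prompt a closer look at whether the gene products act in the same complex.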
The mutation of a gene called C7orf26 in particular stood out. The researchers noticed that genes whose deletion led to a similar phenotype were part of a protein complex called Integrator that plays a role in the creation of small nuclear RNAs. The Integrator complex is made up of many smaller subunits (previous studies had suggested 14 individual proteins), and the researchers were able to confirm that C7orf26 made up a 15th component of the complex.
They also discovered that the 15 subunits worked together in smaller modules to perform specific functions within the Integrator complex. “Without that thousand-foot-high view of the situation, it wasn’t so clear that these different modules were so functionally distinct,” Saunders says.
Another benefit of Perturb-seq is that because the assay focuses on single cells, researchers could use the data to examine more complex phenotypes that become muddied when studied together with data from other cells. “We often take all the cells where ‘gene X’ is knocked down and average them together to see how they’ve changed,” says Weissman. “But sometimes when you knock down a gene, different cells that lose that same gene behave differently, and that behavior can be missed by the average.”
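A toy calculation shows how an average can hide this kind of cell-to-cell variation. The numbers below are simulated for illustration only: a hypothetical knockdown pushes a readout up in half of the cells and down in the other half, so the population mean barely moves even though individual cells change substantially.

```python
import numpy as np

rng = np.random.default_rng(1)
n_cells = 1000

# Control cells: a readout centered on 0 with small cell-to-cell noise.
control = rng.normal(0.0, 0.2, size=n_cells)

# Hypothetical knockdown: half of the cells shift up by 1, half down by 1.
direction = np.repeat([1.0, -1.0], n_cells // 2)
knockdown = direction + rng.normal(0.0, 0.2, size=n_cells)

# The averaged (bulk-style) comparison sees almost no change...
mean_shift = knockdown.mean() - control.mean()

# ...while the single-cell spread reveals two divergent subpopulations.
spread_ratio = knockdown.std() / control.std()

print(f"shift in mean: {mean_shift:+.3f}")
print(f"spread relative to control: {spread_ratio:.1f}x")
```

The mean shift stays close to zero while the spread grows several-fold, which is why a per-cell readout can catch effects that an averaged measurement misses.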
The researchers found that a subset of genes whose deletion led to different outcomes from cell to cell were responsible for chromosome segregation. Their removal caused the cells to lose a chromosome or pick up an extra one, a condition known as aneuploidy. “You couldn’t predict what the transcriptional response was to losing that gene, because it depended on the side effect of which chromosome you gained or lost,” Weissman says. “We realized that we could then turn this around and create this composite phenotype looking for signatures of gained and lost chromosomes. In this way, we performed the first genome-wide screen for factors necessary for proper DNA segregation.”
“I think the aneuploidy study is the most exciting application of this data so far,” Norman says. “It captures a phenotype that you can only get using a single-cell readout. You can’t go after it any other way.”
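One way to picture a signature of gained and lost chromosomes is to average, for each cell, the expression of all genes on a chromosome relative to control cells: an extra copy should raise that average, and a missing copy should lower it. The sketch below is a simplified illustration on simulated data, not the method used in the study. The function names, the pseudocount of 1, and the 0.3 threshold are all assumptions.

```python
import numpy as np

def chromosome_scores(cell_expr, control_mean, gene_chrom):
    """Mean log2 ratio (cell vs. control average) over genes on each chromosome."""
    log_ratio = np.log2((cell_expr + 1.0) / (control_mean + 1.0))
    return {
        str(c): float(log_ratio[gene_chrom == c].mean())
        for c in np.unique(gene_chrom)
    }

def call_aneuploidy(scores, threshold=0.3):
    """Label chromosomes whose score passes an (assumed) threshold."""
    calls = {}
    for chrom, score in scores.items():
        if score > threshold:
            calls[chrom] = "gain"
        elif score < -threshold:
            calls[chrom] = "loss"
    return calls

# Simulated cell: 100 genes on each of three chromosomes.
rng = np.random.default_rng(2)
gene_chrom = np.repeat(np.array(["chr1", "chr2", "chr3"]), 100)
control_mean = np.full(300, 100.0)

cell = control_mean * rng.normal(1.0, 0.05, size=300)
cell[gene_chrom == "chr2"] *= 1.5   # three copies instead of two
cell[gene_chrom == "chr3"] *= 0.5   # one copy instead of two

calls = call_aneuploidy(chromosome_scores(cell, control_mean, gene_chrom))
print(calls)
```

Because the score pools many genes per chromosome, it does not depend on predicting the response of any individual gene, which matches the composite phenotype Weissman describes.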
The researchers also used their dataset to study how mitochondria respond to stress. Mitochondria, which evolved from free-living bacteria, carry 13 genes in their genomes. In nuclear DNA, approximately 1,000 genes are linked in some way to mitochondrial function. “People have long been interested in how nuclear and mitochondrial DNA are coordinated and regulated under different cellular conditions, especially when a cell is under stress,” says Replogle.
The researchers found that when they disrupted different mitochondria-related genes, the nuclear genome responded the same way to many different genetic changes. However, mitochondrial genome responses were much more variable.
“The question of why mitochondria still have their own DNA remains open,” says Replogle. “A general conclusion from our work is that one of the advantages of having a separate mitochondrial genome could be to have localized or very specific genetic regulation in response to different stressors.”
“If you have one mitochondria that’s broken and another that’s broken in a different way, those mitochondria might react differently,” Weissman says.
In the future, the researchers hope to use Perturb-seq on different cell types in addition to the cancer cell line they started with. They also hope to continue exploring their gene function map and hope others will do the same. “It’s really the culmination of many years of work by the authors and other contributors, and I’m really happy to see that it continues to succeed and grow,” says Norman.