[Article 1]
[Title] Discovering the Functions of Noncoding Sequence Variants
[Dek] Neville Sanjana explores noncoding genomic regions by combining pooled CRISPR screening and single cell sequencing.
[Byline] Interviewed by Niki Spahich, PhD
[Headshot Credit] Judy Quinn, New York Genome Center
After a postdoctoral research position with CRISPR pioneer Feng Zhang, Neville Sanjana set out to target numerous genes at once with the nascent technology, developing pooled CRISPR screens.1,2 Now a core faculty member at the New York Genome Center and an associate professor at New York University, Sanjana developed a new technique that combines CRISPR with genome-wide association study (GWAS) data and single cell technology to explore the effects of hundreds of variants in parallel.3
Using STING-seq, short for systematic targeting and inhibition of noncoding GWAS loci with single cell sequencing, Sanjana and his team identified promising sequence variants linked to various polygenic blood traits. Most of these variants map to noncoding genomic regions, making it unclear which genes they affect. To determine target genes for certain traits, STING-seq uses CRISPR to repress GWAS variants of interest and allows researchers to see the downstream effects of this repression on transcripts and proteins through single cell sequencing.
Why did you combine CRISPR screening with single cell sequencing and GWAS analysis?
There are amazing GWAS databases, such as the UK Biobank that we used in our study, but we can only do correlative studies with this data, comparing people with and without the disease to see what is different. One of the fundamental challenges is to understand which variants cause a trait or disease. That is difficult because it is hard to break up linkage disequilibrium. A person inherits pieces of their genome from their parents, and some of those pieces travel together. We might not know whether one variant in a dataset is causal or if it is co-inherited with the actual causal variant. Additionally, most common variants are in noncoding regions. Because they are not within a gene, we may not know what they do.
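To make the linkage problem concrete, the sketch below computes r², a standard measure of linkage disequilibrium, between two variants from a list of haplotypes. The haplotype counts are invented for illustration and are not drawn from the UK Biobank or from the study. An r² near 1 means the two variants are almost always inherited together, so association data alone cannot say which one is causal.

```python
def ld_r2(haplotypes):
    """r^2 between two biallelic variants.

    `haplotypes` is a list of (a, b) pairs, each allele coded 0 or 1.
    """
    n = len(haplotypes)
    p_a = sum(a for a, _ in haplotypes) / n            # frequency of allele 1 at variant A
    p_b = sum(b for _, b in haplotypes) / n            # frequency of allele 1 at variant B
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b                               # disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both variants must be polymorphic")
    return d * d / denom

# Made-up haplotypes: the two variants always travel together ...
linked = [(1, 1)] * 30 + [(0, 0)] * 70
# ... versus being inherited independently of each other.
unlinked = [(1, 1)] * 25 + [(1, 0)] * 25 + [(0, 1)] * 25 + [(0, 0)] * 25

r2_linked = ld_r2(linked)
r2_unlinked = ld_r2(unlinked)
print(round(r2_linked, 3), round(r2_unlinked, 3))
```

In the first toy population the two variants are statistically indistinguishable, which is the situation a perturbation experiment is meant to resolve.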
When I came to my current institution, I was surrounded by researchers who did single cell experiments. Because CRISPR screens focus on many genes, it became clear that it would be powerful to pair up these technologies. We previously could knock out each gene in the human genome with genome-scale CRISPR libraries, but we were assessing one phenotype at a time. When we couple CRISPR with single cell, multi-omic readouts, we can look at many genetic perturbations and multiple phenotypes. Coupling CRISPR screens with single cell approaches allows for the systematic targeting and inhibition of many noncoding GWAS variants simultaneously.
How did you employ STING-seq?
Using STING-seq, we targeted and inhibited hundreds of candidate blood trait variants in noncoding cis-regulatory elements (CREs). We treated human cells with a CRISPR system that represses the region it binds rather than cutting it, then performed single cell sequencing to look at each perturbation at the transcript and protein levels. With this method, we can address the linkage problem, identify causal variants that are involved with a particular trait or disease, and figure out which genes are targets.
Parking a giant CRISPR repressor on DNA is helpful for discovering causal variants, but it is not realistic in a biological sense. STING-seq inhibits areas of the genome using a repressor, so we only saw decreases in gene expression. Because different variants might cause various gene expression changes, we then developed beeSTING-seq, which uses a base editor to make single nucleotide edits. We targeted genomic regions where a cytosine to thymine (C to T) base editor would impart a sequence change that matched certain variants in our GWAS data. In one example, a T edit made the expression of a target gene go up compared to the original C allele, so we learned how this genetic variant worked through that gene.
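The selection step described in this answer can be sketched as a simple filter: keep only the variants whose allele change a C to T base editor could, in principle, install. The variant names and alleles below are hypothetical, and the function is an illustration rather than the team's pipeline. The one rule it assumes is that a G to A change on the reference strand is a C to T change on the opposite strand. Real guide design also depends on the editing window and PAM placement, which this sketch ignores.

```python
from typing import NamedTuple

class Variant(NamedTuple):
    name: str  # hypothetical identifier, not a real rsID
    ref: str   # allele currently present in the cells
    alt: str   # allele the edit should install

def is_c_to_t_editable(v: Variant) -> bool:
    """True if the change is C>T on either strand (G>A is C>T on the reverse strand)."""
    return (v.ref, v.alt) in {("C", "T"), ("G", "A")}

# Made-up candidate variants for illustration only.
variants = [
    Variant("var1", "C", "T"),
    Variant("var2", "G", "A"),
    Variant("var3", "A", "G"),
    Variant("var4", "T", "C"),
]

editable = [v.name for v in variants if is_c_to_t_editable(v)]
print(editable)
```

Variants that fail the filter, such as an A to G change, would need a different class of editor.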
Did any of your results surprise you?
We found two variants that influenced the same transcription factor gene, but one had an effect that was twice as strong as the other. Because transcription factors interact with many genes, these variants affected many downstream genes in trans at the same level as the target transcription factor. This gives us a good sense of how the genome is wired, and it helps us understand the purpose of noncoding variants, where they can have subtly different effects on a gene in cis that then changes gene expression across the whole genome.
What is next for STING-seq?
As somebody who loves the idea of using functional genomics to understand and treat disease, my hope is that researchers will use STING-seq to study as many diseases as possible and find genes that are useful for developing next generation therapeutics.
This interview has been condensed and edited for clarity.
[References]
1. Sanjana NE, et al. Improved vectors and genome-wide libraries for CRISPR screening. Nat Methods. 2014;11(8):783-784.
2. Shalem O, et al. Genome-scale CRISPR-Cas9 knockout screening in human cells. Science. 2014;343(6166):84-87.
3. Morris JA, et al. Discovery of target genes and pathways at GWAS loci by pooled single-cell CRISPR screens. Science. 2023;380(6646):eadh7699.
[Article 2]
[Title] Monitoring Multiple Myeloma Progression through Sequencing
[Dek] Irene Ghobrial sequences circulating tumor cells in blood samples to genomically profile patients with multiple myeloma.
[Byline] Interviewed by Charlene Lancaster, PhD
[Headshot Credit] Sam Ogden, Dana-Farber
Multiple myeloma is a relatively uncommon cancer of plasma cells, where excessive proliferation of these malignant cells within the bone marrow leads to abnormal antibody production. The disease develops from asymptomatic precursor stages, monoclonal gammopathy of undetermined significance (MGUS) and smoldering multiple myeloma (SMM), with a subset of patients progressing from MGUS through SMM to multiple myeloma.1
Irene Ghobrial’s training with Robert Kyle, whom she considers the grandfather of myeloma, at the Mayo Clinic sparked her interest in multiple myeloma. She now runs her own laboratory at the Dana-Farber Cancer Institute, where she identifies risk factors that contribute to disease progression and devises innovative approaches for early disease detection. In a recently published Cancer Discovery paper, Ghobrial’s team developed a novel testing procedure called minimally invasive multiple myeloma sequencing, or MinimuMM-seq, which allowed them to estimate patients’ disease burden and detect the emergence of new clones or subclones in a less invasive manner.2
What motivated you to develop MinimuMM-seq?
Many people with MGUS or SMM will never develop multiple myeloma. But every single patient diagnosed with myeloma today must have had these precursor conditions years before. As clinicians, we should be detecting those earlier stages and not waiting for people to have symptoms including kidney failure, bone fractures, and anemia before treating them. Currently, when we screen asymptomatic patients for the disease and monitor its progression, we perform bone marrow biopsies, which are painful. Because of that, clinicians do not execute the procedure enough times to determine if a patient is responding to therapy. So, although we have amazing drugs to treat patients with myeloma, we are using them completely blind. Additionally, the malignant cells are not evenly distributed throughout the marrow, so the biopsy sample may not be representative of the patient’s condition. After obtaining the samples, we are also still using a very old technology, fluorescence in situ hybridization (FISH), to assess the genomic changes, such as specific mutations or copy number alterations. But FISH fails in many patients, especially in the earlier stages where they do not have enough cells for analysis.
How does MinimuMM-seq work?
In MinimuMM-seq, we examine circulating tumor cells (CTCs) derived from patient blood samples. Researchers have shown previously that CTC counts are prognostic: the more cells that are circulating, the worse the patient’s prognosis. CTCs also provide clinicians with information about the malignant cells within the whole body and not just at a single site. We isolated and enriched the CTCs from the blood samples using the CellSearch system, extracted their DNA, and performed whole genome sequencing.
What are the advantages of using MinimuMM-seq over bone marrow biopsies and FISH?
Using this method, clinicians do not need to perform a bone marrow biopsy, which makes testing easier on patients. However, bone marrow biopsies are still likely useful at the beginning to validate the results from MinimuMM-seq. This new technique also allows clinicians to obtain multiple sequential samples to monitor disease progression over time, as well as to examine clonal dynamics, including which clones are emerging or disappearing. Additionally, they can acquire information about clones that originate from different areas of the bone marrow rather than from the single area sampled by a bone marrow biopsy. Compared to FISH, MinimuMM-seq displays greater sensitivity and specificity, and it can identify new mutations, translocations, or other genomic alterations that could be critical for understanding multiple myeloma progression and which therapies a patient may benefit from.
What are your next steps?
The current MinimuMM-seq method requires that we use 50 cells or more. We are now working on reducing the number of cells needed to detect minimal residual disease at the very early stages of MGUS. We are also investigating if we can isolate these cells from old frozen samples, as well as comparing the simplicity and sensitivity of this technique to other liquid biopsy methods, such as circulating free DNA. I am hoping in the future MinimuMM-seq will become a clinically validated test, but in the meantime, the method continues to open more doors to a better understanding of multiple myeloma.
This interview has been condensed and edited for clarity.
[References]
1. Kumar SK, et al. Multiple myeloma. Nat Rev Dis Primer. 2017;3(1):1-20.
2. Dutta AK, et al. MinimuMM-seq: Genome sequencing of circulating tumor cells for minimally invasive molecular characterization of multiple myeloma pathology. Cancer Discov. 2023;13(2):348-363.
[Article 3]
[Title] Revving the Motor: Full-Length Protein Sequencing with Nanopore Technology
[Dek] Jeff Nivala develops nanopore-based sequencing techniques to help advance proteomics.
[Byline] Interviewed by Nathan Ni, PhD
[Headshot Credit] Jeff Hagen
Protein sequencing presents different challenges than nucleic acid sequencing, meaning that proteomics has yet to benefit as much as genomics from the next-generation sequencing revolution. However, the ability to sequence proteins at “nucleic acid levels” would be tremendously beneficial. Accordingly, scientists look to high-throughput nucleic acid sequencing for inspiration on how to improve existing protein sequencing techniques or develop new ones.1
Jeff Nivala, a molecular engineer at the University of Washington, sees nanopore technology as the way forward toward single-molecule protein sequencing and beyond. In an interview with The Scientist, Nivala describes his new technique, reported in a bioRxiv preprint, in which he uses the enzyme ClpX to unfold and ratchet long protein strands through nanopores, allowing them to be read with single-amino acid sensitivity.2
What sparked your interest in using nanopore technology for protein sequencing?
The big breakthrough for nucleic acid sequencing using nanopores was the discovery of a motor protein that can ratchet the strand nucleotide-by-nucleotide through the pore. At the start of my graduate studies, I tried to find a similar motor for proteins. Fortunately, around the same time, a study came out in Cell that characterized how the unfoldase ClpX worked at the single-molecule level.3 The detail presented in this study let me imagine how I could transfer this motor protein over and apply it for nanopore protein sequencing, and I was able to put two and two together.4
What is the biggest difference between using nanopore technology for protein sequencing versus nucleic acid sequencing?
Protein sequencing is a lot more challenging. Nucleic acids have uniform negatively charged backbones, which means that electrophoretic force is enough to move nucleic acids through a nanopore. Proteins are heterogeneously charged, so they do not behave as nicely and the signals become noisier. There is also a lot more complexity when working with 20 amino acids compared to four nucleotides, not to mention tertiary structures, folded domains, and so on.
How difficult is it to ratchet a protein through a nanopore?
ClpX can handle synthetic proteins fairly easily because they are not folded. Natural proteins with fully folded domains can be harder to resolve because they have to be unfolded before passing through the nanopore. Proteins can also refold on the trans side of the pore after passing through. In another study using this technology, we actually have to unfold a protein twice, once before it can go through the nanopore via electrophoretic force and once before it can come back up. There is still a lot to be discovered about how well a motor works with a given protein.
How difficult is it to distinguish between different amino acids?
The signal that we observe comes from sensing a sliding window of around 20 amino acids at a time as they pass through the pore. This makes it much more difficult to detect single amino acid differences. However, the longer the sequence, the higher the odds that it will generate distinct signal elements. Putting these individual elements together creates a collective unique signature, which lets us adopt a fingerprint-based approach and use a signature to identify a protein.
These sequence differences can be very subtle, making it hard to do traditional statistics or other analysis methods. This is where machine learning comes in, and we are training machine learning programs to extract the differences in signal between different proteins, learn what features are associated with what amino acids, and map how the surrounding amino acids contribute to the observed signal at a given position.
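The sliding-window effect described in this answer can be illustrated with a toy model. In the sketch below, each reading is simply the average of a per-residue value over 20 neighboring amino acids. The averaging rule, the per-residue values, the two sequences, and the nearest-match identification step are all placeholders chosen for illustration. They are not the signal model or the machine learning approach used in the study.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# Arbitrary placeholder contribution per residue (NOT measured values).
CONTRIB = {aa: i / 19 for i, aa in enumerate(AMINO_ACIDS)}

def window_signal(seq, window=20):
    """One reading per window position: the mean contribution of `window` residues."""
    vals = [CONTRIB[aa] for aa in seq]
    return [sum(vals[i:i + window]) / window for i in range(len(vals) - window + 1)]

def distance(a, b):
    """Euclidean distance between two equally long traces."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def identify(trace, references, window=20):
    """Fingerprint-style lookup: name of the reference whose trace is closest."""
    return min(references, key=lambda name: distance(trace, window_signal(references[name], window)))

# Two made-up 42-residue proteins.
references = {"protA": "ACDEADC" * 6, "protB": "TVWYTWV" * 6}

original = references["protA"]
mutant = original[:21] + "Y" + original[22:]   # single substitution, A -> Y, at index 21

sig_original = window_signal(original)
sig_mutant = window_signal(mutant)
changed = [i for i, (x, y) in enumerate(zip(sig_original, sig_mutant)) if abs(x - y) > 1e-12]
largest_shift = max(abs(x - y) for x, y in zip(sig_original, sig_mutant))

print(len(changed), "of", len(sig_original), "readings shift, each by at most", round(largest_shift, 3))
print(identify(sig_mutant, references))
```

In this toy model a single substitution nudges 20 consecutive readings by a small amount instead of producing one sharp change, yet the trace as a whole still matches the right protein.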
What are your short- and long-term goals for this technique?
We are increasing the size of our data sets by looking at more complex amino acid sequences so that we can train better models. Most of our data right now comes from synthetic proteins, so we are building our dataset by adding more natural molecules. Ultimately, the goal is to have a model that can recognize any given arbitrary protein in the human proteome.
As we go down that path, we anticipate that we will see new challenges appear. For example, will we need an improved motor protein for certain types of protein sequences? Will a different pore size make our method more sensitive because our current nanopore was optimized for DNA sequences? There are going to be so many improvements that make this technique exponentially better in the future.
This interview has been condensed and edited for clarity.
[References]
1. Floyd BM, Marcotte EM. Protein sequencing, one molecule at a time. Annu Rev Biophys. 2022;51:181-200.
2. Motone K, et al. Multi-pass, single-molecule nanopore reading of long protein strands with single-amino acid sensitivity. bioRxiv. 2023.10.19.563182.
3. Maillard RA, et al. ClpX(P) generates mechanical force to unfold and translocate its protein substrates. Cell. 2011;145(3):459-469.
4. Nivala J, et al. Unfoldase-mediated protein translocation through an α-hemolysin nanopore. Nat Biotechnol. 2013;31:247-250.
[Article 4]
[Title] Boosting Bacterial Genomes
[Dek] Gang Fang’s new metagenomics method helps sequence rare bacteria.
[Byline] Interviewed by Aparna Nathan, PhD
[Headshot Credit] Brian Schutza
Human bodies are teeming with trillions of microbial cells that comprise the microbiome, many of them bacteria. Small as they are, these bacteria matter: some maintain health, whereas others promote sickness. These differences often come down to the genes in each bacterial genome, but it can be challenging to find and sequence rare strains.
Gang Fang, a geneticist at the Icahn School of Medicine at Mount Sinai, has proposed a new solution called mEnrich-seq that draws on his decade of bacterial epigenomic research to distinguish different species’ DNA for metagenomic studies.1 In an interview with The Scientist, Fang describes his vision for how mEnrich-seq can help scientists answer hard questions about humans’ bacterial companions.
What are some challenges associated with studying the human microbiome with metagenomics?
We have a lot of technologies to understand the microbiome in many different ways, but there is a common problem. If a bacterial species is abundant in a sample, then we can learn almost anything about it, but if the abundance of a species is really low, it is very hard to study. There may even be two or three coexisting strains of the same species, and the important strain may not be the one that is relatively more abundant. These different strains are often very similar in terms of their genomes, so it is extremely hard to differentiate between them.
What motivated you to develop mEnrich-seq?
If a target is rare, most of the sequencing throughput will be consumed by the more abundant species. The moment we sequence, we have already lost this battle, so we needed a new strategy before sequencing. The natural epigenetic barcodes in bacteria give us a unique way to solve this problem. Even though different species and strains have similar genomes, they often encode different DNA methyltransferases, which determine their DNA methylation patterns.2 Bacteria do this to differentiate between self and foreign DNA. We can use this to differentiate between species’ or strains’ genomes based on the global methylation pattern.
If we want to target a certain genome and we know its methylation pattern, we can rationally choose restriction enzymes that will cut at a certain sequence called the methylation motif. The enzymes will digest the vast majority of the background DNA that does not have this matching methylation. With mEnrich-seq, we can enrich bacteria of interest over 100-fold.
How does this protocol compare to a standard metagenomic sequencing experiment?
We actually considered this in our design. We wanted it to be effective but also very easy to plug into the existing pipeline. Ultimately, mEnrich-seq only involves two steps in addition to the standard library preparation. In the first step, after adapter ligation and before amplification, we digest the DNA using the rationally chosen restriction enzyme. Because the adapters are already ligated when we digest, only the intact DNA will still have adapters on both ends and can be amplified across its full length. The other DNA will be much shorter, which leads to step two: after amplification, we perform size selection. The other steps, such as quality control, remain the same.
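The logic of the two extra steps can be sketched with a toy library. In the simulation below, a fragment survives only if the enzyme never cuts it, meaning it either lacks the motif or carries the protective methylation. Dropping the cut fragments stands in for both the failed amplification and the size selection. The motif, the fragment sequences, and the counts are placeholders chosen for illustration, so the resulting fold change does not reflect the study's results.

```python
from dataclasses import dataclass

MOTIF = "GATC"  # placeholder motif; the real choice depends on the target genome

@dataclass(frozen=True)
class Fragment:
    source: str       # "target" or "background"
    sequence: str     # insert sitting between the two ligated adapters
    methylated: bool  # True if motif sites in this fragment carry the target's methylation

def survives_digest(frag, motif=MOTIF):
    """A fragment keeps adapters on both ends only if the enzyme never cuts it."""
    has_site = motif in frag.sequence
    return (not has_site) or frag.methylated

def target_fraction(frags):
    return sum(f.source == "target" for f in frags) / len(frags)

# Made-up library of 100 adapter-ligated fragments: 1 from the rare target.
library = (
    [Fragment("target", "AAGATCTT", True)] * 1        # methylated motif: protected
    + [Fragment("background", "CCGATCGG", False)] * 95  # unmethylated motif: cut
    + [Fragment("background", "CCCCGGGG", False)] * 4   # no motif: escapes digestion
)

amplifiable = [f for f in library if survives_digest(f)]
before = target_fraction(library)
after = target_fraction(amplifiable)
print(f"target share before: {before:.2f}, after: {after:.2f}")
```

The toy also shows why enrichment is never perfect: background fragments that happen to lack the motif survive alongside the target.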
In what contexts do you think this method could be most useful?
One application is to battle antibiotic resistance, for example in urinary tract infections (UTI). Ideally, clinicians want to sensitively detect the antibiotic-resistance genes carried by a patient’s UTI strain, but a urine sample has a lot of host DNA and other bacteria. Right now, the best practice for UTI is to culture the urine samples, which takes three days to get a result. This is not ideal. We want to build an antibiotic resistance profile quickly—within one day—so that the doctor can decide which antibiotics to give a patient.
Another application is for beneficial bacteria, or probiotics, such as Bifidobacterium. Different strains can have very different health benefits. If we want to discover probiotics associated with human disease or drug responses, we need to do some initial screening to narrow down the species in fecal samples and recover more promising candidates.
A lot of people are interested in these applications, and we think mEnrich-seq provides a new, more sensitive, reliable, and cost-effective way to tackle these problems.
This interview has been condensed and edited for clarity.
[References]
1. Cao L, et al. mEnrich-seq: methylation-guided enrichment sequencing of bacterial taxa of interest from microbiome. Nat Methods. 2024;21(2):236-246.
2. Beaulaurier J, et al. Metagenomic binning and association of plasmids with bacterial host genomes using DNA methylation. Nat Biotechnol. 2018;36(1):61-69.