In the early 1980s, David Gilmour, now an emeritus molecular and cell biology professor at Pennsylvania State University, joined the laboratory of geneticist and biochemist John Lis as a graduate student at Cornell University. Back then, researchers wanted to better understand how proteins interacted with specific genes, and they were interested in developing approaches for probing these interactions using intact cells.
Lis and Gilmour focused on identifying proteins that bound to specific chromatin sites to understand the mechanisms of gene regulation. To do so, they planned to stabilize proteins at their chromatin binding locations and use nucleic acid hybridization to fish out specific DNA sequences along with the bound proteins. By using enzymes to remove the DNA, they would end up with their protein of interest.
To keep the protein-DNA bonds stable, they used ultraviolet (UV) light, which induces crosslinks between proteins and DNA in vitro.1 “Every lab that worked with DNA at the time had one of these transilluminators, which is a box with a UV lamp, that they used to look at their ethidium bromide-stained gels,” Gilmour said. “I just took that device, flipped it upside down, and used it to shine UV light on my samples.”
Although the approach to collecting DNA-bound proteins seemed feasible, Gilmour’s first attempts were unsuccessful. A potential solution to his problem presented itself at a seminar where he learned that researchers captured DNA-protein complexes with immunoprecipitation.
To test whether antibodies could capture crosslinked protein-DNA complexes, Gilmour used E. coli RNA polymerase and a bacterial plasmid to which it bound. After irradiating the bacterial cells with UV light and sonicating the material to fragment the DNA, he added RNA polymerase antiserum and immunoprecipitated the complex. He then radiolabeled the coprecipitated DNA and analyzed it using a dot blot hybridization assay.2
“It worked beautifully. I could easily detect polymerase crosslinking to specific sequences,” he recalled.
This success motivated Lis and Gilmour to test the UV-based crosslinking and immunoprecipitation approach in eukaryotic cells. They found that the approach detected RNA polymerase II on single-copy heat shock genes and allowed the team to determine transcription initiation and termination sites for some of these genes.3 In further experiments focused on the fruit fly heat shock protein 70 (hsp70) genes, the researchers found that RNA polymerase II interacted with the promoter region of the gene even in non-heat-shocked cells, suggesting that the enzyme could be poised for transcriptional activation.4
According to Gilmour, these initial experiments established the feasibility of combining protein-DNA crosslinking in intact cells with immunoprecipitation. “That is helpful to people because it gives them confidence; it gives them something to build upon,” he said.
The Genesis of the Chromatin Immunoprecipitation Assay
A few hundred miles east at the Massachusetts Institute of Technology (MIT), Mark Solomon was excited about the work of Lis and Gilmour. “That was beautiful work. It was very precise, very clean. It was part of the inspiration for what we did,” recalled Solomon, who is now a biochemist at Yale University.
In the 1980s, Solomon was a graduate student in the laboratory of biochemist Alexander Varshavsky, who is now at the California Institute of Technology. Varshavsky studied chromatin structure and wanted to develop new methods to detect the DNA-protein interplay in intact cells.
Motivated to better understand chromatin structural organization, Varshavsky focused on nucleosomes, the basic structural units of chromatin composed of a histone core wrapped with DNA. At the time, researchers were unsure whether histones in the nucleosomes dissociated or remained bound to the DNA as the chromatin uncoiled and RNA polymerase began its job transcribing genes.
To investigate this question, Varshavsky and his team used a chemical reagent, formaldehyde, to induce protein-DNA crosslinks. According to Varshavsky, the idea to use formaldehyde came from experiments he conducted in the 1970s when he and his colleagues used the chemical to fix the simian virus (SV) 40 DNA, a procedure that helped them identify chromatin regions that were more susceptible to restriction endonuclease cleavage.5,6 By looking at the SV40 chromosomes, Varshavsky’s team also showed that formaldehyde crosslinked histones to DNA within the nucleosomes, making it a suitable crosslinker for probing histone-DNA interactions within intact cells.7
Solomon, Varshavsky, and Pamela Larsen, who was a postdoctoral researcher in Varshavsky’s lab and is now a biologist at the University of Texas Health San Antonio, examined the chromatin structure of the hsp70 genes by using formaldehyde to produce protein-DNA crosslinks in fruit fly cells.8 To evaluate whether histones remain bound to chromatin as the hsp70 genes are transcribed, the team fixed heat-shocked and non-heat-shocked cells with formaldehyde and assessed the migration patterns of the hsp70 genes in a gel by tagging the DNA with a radioactively labeled probe.
They found that hsp70 DNA from heat-shocked cells migrated faster in a gel than hsp70 DNA from non-heat-shocked cells. These findings suggested that heat shock induced changes in the chromatin conformation that reduced protein-DNA contacts.
Next, the team tested whether the migration pattern differences they observed were due to the bound RNA polymerase. To do so, they used a detergent that leaves RNA polymerase intact while removing all histones attached to DNA. After treating isolated DNA with the detergent, they observed no gel migration differences between hsp70 genes from heat-shocked and non-heat-shocked cells. “It didn’t look like it was RNA polymerase. Therefore, it was probably histones,” Solomon said.
Inspired by the UV light-based immunochemical approach developed by Lis and Gilmour, the team decided to use antibodies to capture a histone of interest, in this case histone H4, one of the histones that form the nucleosome core. After purifying the chromatin fragments from the formaldehyde-fixed cells, the team immunoprecipitated the DNA-protein complexes using an anti-H4 antibody. Following reversal of the crosslinks, they analyzed the DNA fragments using a dot blot hybridization assay. In this first run of chromatin immunoprecipitation (ChIP), they found that H4 remained attached to the hsp70 chromatin fragments, indicating that this histone was retained in the actively transcribed genes.
The key steps of the formaldehyde antibody technique originally described by Varshavsky’s team remain cornerstones of more recent crosslinked ChIP protocols. The method gained popularity when molecular biologist David Allis at the Rockefeller University and others expanded its use to investigate how histone modifications affect the expression of genes.9,10 A further boost to ChIP applications came when researchers realized that they could look at DNA-protein interactions at more than just one specific chromatin location by applying the technique across the genome.
ChIP-chip and ChIP-seq: Going Genome Wide
One of the first efforts to apply ChIP genome wide was made by researchers in the laboratory of molecular biologist Richard Young at the Whitehead Institute. In the late 1990s, Young’s team used DNA microarrays, which are glass slides coated with short DNA sequences, to detect genes expressed by cells in response to a variety of stimuli. Although powerful, microarray chips did not provide any information about the proteins bound to the DNA and how they might affect gene expression.
To uncover the sites of protein-DNA interactions at genome scale, Bing Ren, a former postdoctoral researcher in Young’s lab and now a gene regulation researcher at the University of California, San Diego, and his colleagues combined ChIP with microarrays, developing a method later dubbed ChIP-chip.
In ChIP-chip, researchers perform the traditional ChIP steps to obtain the DNA fragments, which they then amplify and label with a fluorescent dye. The scientists next hybridize the DNA to a microarray containing sequences from an organism’s genome and assess the fluorescence intensity in the microarray as a readout of the genomic sites to which a protein has bound.
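The fluorescence readout described above can be illustrated with a minimal Python sketch. It assumes a two-channel design in which each array spot reports one intensity for the immunoprecipitated DNA and one for an unenriched control; that control channel, the probe names, the intensity values, the pseudocount, and the threshold are all hypothetical choices made for illustration, not details taken from the experiments described here.

```python
import math


def log2_enrichment(ip_intensity, control_intensity, pseudocount=1.0):
    """Log2 ratio of ChIP signal to control signal for one array spot.

    The pseudocount (an assumption of this sketch) avoids division by zero.
    """
    return math.log2((ip_intensity + pseudocount) / (control_intensity + pseudocount))


def enriched_probes(probes, threshold=1.0):
    """Return the names of probes whose log2 enrichment meets the threshold.

    `probes` maps a probe name to an (ip_intensity, control_intensity) pair.
    """
    return sorted(
        name
        for name, (ip, control) in probes.items()
        if log2_enrichment(ip, control) >= threshold
    )


# Made-up intensities for three hypothetical probes.
example_probes = {
    "probe_A": (3999.0, 499.0),  # 8-fold brighter in the ChIP channel
    "probe_B": (999.0, 999.0),   # no enrichment
    "probe_C": (1999.0, 999.0),  # 2-fold brighter in the ChIP channel
}

print(enriched_probes(example_probes))
```

Spots that are much brighter in the ChIP channel than in the control channel mark candidate binding sites; real analyses add normalization and replicate-based statistics that this sketch leaves out.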
To test the accuracy of the method, Ren and his colleagues monitored the DNA binding sites of two transcriptional activators, Gal4 and Ste12, in yeast cells.11 Gal4 enhances the transcription of many genes involved in galactose metabolism, while Ste12 regulates the expression of genes involved in the differentiation of yeast cells into mating competent cells. They found that the Gal4 protein bound to the promoter region of several genes previously reported to be regulated by this transcriptional activator. By treating yeast with pheromones to trigger Ste12 function, the team also showed that Ste12 regulated the expression of 29 pheromone-induced genes, revealing a set of genes directly regulated by the transcriptional activator.
In further experiments using ChIP-chip, researchers described nucleosome depletion at active promoters across the Saccharomyces cerevisiae genome and identified proteins involved in DNA replication, repair, and methylation.12,13
Although the combination of ChIP and microarrays greatly expanded researchers’ views of DNA-protein interactions in the genome, microarrays came with limitations: The number of DNA sequences that fit on an array restricted coverage, and the resolution of microarrays often allowed scientists to identify only large genomic changes.14 The advent of next-generation sequencing (NGS) and its combination with ChIP offered scientists another way to look at these associations at the whole-genome level, allowing a more precise mapping of protein-DNA binding sites.
Exploring ChIP combined with NGS, or ChIP-seq, interested Steven Jones, a bioinformatician at the British Columbia Cancer Research Center. In the mid-2000s, NGS technologies became more widely available, but the cost of sequencing entire genomes, particularly large ones, was simply too high for many scientists.
Instead of using NGS to map a whole genome, Jones and his team decided to see if they could combine the high-throughput technology with ChIP to uncover the binding sites of specific transcription factors. “It was actually a really amazing application at the time to be able to measure transcription factor binding across the genome,” Jones recalled.
In a collaboration with Michael Snyder, a functional genomicist at Stanford University who was at Yale University at the time, Jones’ team obtained chromatin immunoprecipitated DNA fragments to which the signal transducer and activator of transcription 1 (Stat1), a transcription factor with a well characterized function in mammalian cells, had bound. Using these DNA sequences, the researchers built a library that they sequenced.
The sequencing technology produced short reads that were ideal for characterizing ChIP-derived fragments. By mapping these sequences back to the reference genome, the team identified peaks, regions where many reads piled up, at specific genomic loci.15 Interferon-γ stimulation of cells increased the number of Stat1-bound sites compared to unstimulated cells, with ChIP-seq detecting 24 out of 34 (about 71 percent) chromatin sites known to contain Stat1 interferon-responsive binding sites.
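The idea of finding peaks from mapped reads can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the analysis the researchers used: the read start positions, the fixed read length, and the depth threshold are hypothetical, and real peak callers also model background signal and fragment size.

```python
from collections import Counter


def coverage(read_starts, read_length):
    """Count how many mapped reads overlap each base position."""
    depth = Counter()
    for start in read_starts:
        for position in range(start, start + read_length):
            depth[position] += 1
    return depth


def call_peaks(depth, min_depth):
    """Merge adjacent positions with depth >= min_depth into (start, end) intervals.

    Both interval ends are inclusive.
    """
    peaks = []
    current = None
    for position in sorted(p for p, d in depth.items() if d >= min_depth):
        if current is not None and position == current[1] + 1:
            current[1] = position  # extend the open interval
        else:
            if current is not None:
                peaks.append(tuple(current))
            current = [position, position]  # open a new interval
    if current is not None:
        peaks.append(tuple(current))
    return peaks


# Hypothetical read start positions on one chromosome, with 10-base reads.
starts = [100, 102, 104, 300]
depth = coverage(starts, read_length=10)
print(call_peaks(depth, min_depth=3))
```

Three overlapping reads stack up around positions 104 to 109 and are reported as a peak, while the lone read at position 300 never reaches the threshold.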
“People had done ChIP microarray-based approaches and that had given a very good view of where these binding [sites] were. But to see peaks that could be well defined enough to actually say, ‘yes, it’s actually binding to these bases here,’ was just really powerful,” Jones said. “It was a little bit revolutionary.”
At the same time that Jones and his team published their ChIP-seq findings on Stat1, other researchers also began to explore genome-wide views of protein-binding sites. For instance, researchers mapped the binding sites of the neuron-restrictive silencer factor (NRSF) as well as the distribution of histone methylation across the human genome.16,17
“Things have gotten a lot more powerful. It’s the same basic process underneath, but what you can do is amazing,” Solomon said.
References
1. Park CS, et al. Molecular mechanism of promoter selection in gene transcription. I. Development of a rapid mixing-photocrosslinking technique to study the kinetics of Escherichia coli RNA polymerase binding to T7 DNA. J Biol Chem. 1982;257(12):6944-6949.
2. Gilmour DS, Lis JT. Detecting protein-DNA interactions in vivo: distribution of RNA polymerase on specific bacterial genes. Proc Natl Acad Sci U S A. 1984;81(14):4275-4279.
3. Gilmour DS, Lis JT. In vivo interactions of RNA polymerase II with genes of Drosophila melanogaster. Mol Cell Biol. 1985;5(8):2009-2018.
4. Gilmour DS, Lis JT. RNA polymerase II interacts with the promoter region of the noninduced hsp70 gene in Drosophila melanogaster cells. Mol Cell Biol. 1986;6(11):3984-3989.
5. Varshavsky AJ, et al. SV40 viral minichromosome: preferential exposure of the origin of replication as probed by restriction endonucleases. Nucleic Acids Res. 1978;5(10):3469-3477.
6. Varshavsky AJ, et al. A stretch of "late" SV40 viral DNA about 400 bp long which includes the origin of replication is specifically exposed in SV40 minichromosomes. Cell. 1979;16(2):453-466.
7. Solomon MJ, Varshavsky A. Formaldehyde-mediated DNA-protein crosslinking: a probe for in vivo chromatin structures. Proc Natl Acad Sci U S A. 1985;82(19):6470-6474.
8. Solomon MJ, et al. Mapping protein-DNA interactions in vivo with formaldehyde: evidence that histone H4 is retained on a highly transcribed gene. Cell. 1988;53(6):937-947.
9. Kuo MH, Allis CD. In vivo cross-linking and immunoprecipitation for studying dynamic Protein:DNA associations in a chromatin environment. Methods. 1999;19(3):425-433.
10. Briggs SD, et al. Histone H3 lysine 4 methylation is mediated by Set1 and required for cell growth and rDNA silencing in Saccharomyces cerevisiae. Genes Dev. 2001;15(24):3286-3295.
11. Ren B, et al. Genome-wide location and function of DNA binding proteins. Science. 2000;290(5500):2306-2309.
12. Lee CK, et al. Evidence for nucleosome depletion at active regulatory regions genome-wide. Nat Genet. 2004;36(8):900-905.
13. Woodfine K, et al. Investigating chromosome organization with genomic microarrays. Chromosome Res. 2005;13(3):249-257.
14. Park PJ. ChIP-seq: advantages and challenges of a maturing technology. Nat Rev Genet. 2009;10(10):669-680.
15. Robertson G, et al. Genome-wide profiles of STAT1 DNA association using chromatin immunoprecipitation and massively parallel sequencing. Nat Methods. 2007;4(8):651-657.
16. Johnson DS, et al. Genome-wide mapping of in vivo protein-DNA interactions. Science. 2007;316(5830):1497-1502.
17. Barski A, et al. High-resolution profiling of histone methylations in the human genome. Cell. 2007;129(4):823-837.