Here we describe how to normalize CUT&RUN sequencing data using E. coli Spike-in DNA.
Align sequencing reads to the reference genome (e.g. human, mouse) and filter out multi-mapping reads, reads that fall in ENCODE DAC exclusion list regions, and (as desired) duplicate reads to determine the total number of uniquely aligned reads. Repeat for each reaction.
In a separate alignment, align the same sequencing reads to the E. coli K12 MG1655 reference genome (available from Illumina iGenomes: https://support.illumina.com/sequencing/sequencing_software/igenome.html). Filter out reads that do NOT align uniquely. Note that this alignment is independent of the alignment to the experimental reference genome described above.
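The read-filtering criteria above can be sketched in Python. This is a minimal, hypothetical illustration that works on already-parsed read records rather than on a BAM file; the `Read` record, the `count_unique` helper, and the MAPQ cut-off of 10 used as a proxy for "uniquely aligned" are assumptions (appropriate MAPQ thresholds depend on the aligner). Duplicates are assumed to have been marked beforehand (SAM flag 0x400), and each mate of a pair is counted as one read. The same count applies to the E. coli alignment; simply pass no exclusion intervals.

```python
from dataclasses import dataclass
from typing import Iterable, Sequence, Tuple

# SAM FLAG bits (SAM specification) used for filtering.
UNMAPPED = 0x4
SECONDARY = 0x100
QC_FAIL = 0x200
DUPLICATE = 0x400  # only set if duplicates were marked beforehand
SUPPLEMENTARY = 0x800

@dataclass(frozen=True)
class Read:
    """Hypothetical minimal read record; real values would come from a BAM file."""
    chrom: str
    start: int  # 0-based, half-open coordinates
    end: int
    flag: int   # SAM FLAG bitfield
    mapq: int   # mapping quality

Interval = Tuple[str, int, int]  # (chrom, start, end), 0-based, half-open

def overlaps(read: Read, intervals: Sequence[Interval]) -> bool:
    """Return True if the read overlaps any exclusion-list interval."""
    return any(
        chrom == read.chrom and start < read.end and read.start < end
        for chrom, start, end in intervals
    )

def count_unique(reads: Iterable[Read], min_mapq: int = 10,
                 exclusion: Sequence[Interval] = (),
                 drop_duplicates: bool = True) -> int:
    """Count mapped, primary, QC-passing reads that meet the MAPQ cut-off
    (assumed proxy for unique alignment) and avoid exclusion regions."""
    reject = UNMAPPED | SECONDARY | QC_FAIL | SUPPLEMENTARY
    if drop_duplicates:
        reject |= DUPLICATE
    return sum(
        1
        for r in reads
        if not (r.flag & reject)
        and r.mapq >= min_mapq
        and not overlaps(r, exclusion)
    )

# Toy records for illustration only.
reads = [
    Read("chr1", 100, 150, 0, 42),          # kept
    Read("chr1", 200, 250, 0, 1),           # low MAPQ (likely multi-mapper)
    Read("chr1", 300, 350, DUPLICATE, 42),  # marked duplicate
    Read("chr1", 1000, 1050, 0, 42),        # inside an exclusion region
]
print(count_unique(reads, exclusion=[("chr1", 900, 1100)]))  # 1
```

Setting `drop_duplicates=False` mirrors the "as desired" choice for duplicate removal.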
For pairwise comparisons, count the E. coli Spike-in DNA reads for each CUT&RUN reaction and express them as a percentage of that reaction's total uniquely aligned reads (the spike-in "bandwidth").
Example: CUT&RUN was used to map H3K4me3 in treated and untreated cells.
Treatment spike-in = 100,000 E. coli reads in 5,000,000 total reads = 2%
Untreated spike-in = 30,000 E. coli reads in 3,000,000 total reads = 1%
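A minimal Python sketch of this percentage calculation, reproducing the example numbers. The helper name `spike_in_percent` is hypothetical, and the denominator is assumed to be the uniquely aligned read count from the reference-genome alignment.

```python
def spike_in_percent(ecoli_reads: int, total_unique_reads: int) -> float:
    """E. coli Spike-in reads as a percentage of uniquely aligned reads."""
    if total_unique_reads <= 0:
        raise ValueError("total_unique_reads must be positive")
    return 100.0 * ecoli_reads / total_unique_reads

# Counts from the H3K4me3 example above.
samples = {"treated": (100_000, 5_000_000), "untreated": (30_000, 3_000_000)}
for name, (ecoli, total) in samples.items():
    print(f"{name}: {spike_in_percent(ecoli, total):.1f}%")
# treated: 2.0%
# untreated: 1.0%
```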
Calculate a normalization factor for each reaction (see [1]) such that, after normalization, the E. coli spike-in signal is equal across all reactions.
Example from above, comparing H3K4me3 in treated vs. untreated cells:
Treatment normalization factor = 1 / 2 (spike-in bandwidth, in %) = 0.5
Untreated normalization factor = 1 / 1 (spike-in bandwidth, in %) = 1.0
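The same arithmetic as a Python sketch: each factor is the reciprocal of the spike-in percentage, so multiplying a reaction's spike-in percentage by its factor gives the same value (1) for every reaction. The helper name `normalization_factors` is hypothetical.

```python
import math
from typing import Dict

def normalization_factors(spike_in_pct: Dict[str, float]) -> Dict[str, float]:
    """Reciprocal of each reaction's E. coli Spike-in percentage."""
    factors = {}
    for name, pct in spike_in_pct.items():
        if pct <= 0:
            raise ValueError(f"{name}: spike-in percentage must be positive")
        factors[name] = 1.0 / pct
    return factors

pct = {"treated": 2.0, "untreated": 1.0}  # from the example above
factors = normalization_factors(pct)
print(factors)  # {'treated': 0.5, 'untreated': 1.0}

# After scaling, the spike-in signal is equal across reactions.
assert all(math.isclose(factors[n] * pct[n], 1.0) for n in pct)
```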
Apply this single scalar normalization factor with the --scaleFactor option of the deepTools bamCoverage tool to generate normalized bigWig files for visualization in IGV (https://deeptools.readthedocs.io/en/develop/content/tools/bamCoverage.html).
Continuing with the example from above:
Treatment sample --scaleFactor = 0.5
Untreated sample --scaleFactor = 1.0
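A hedged Python sketch that assembles the bamCoverage command lines for the two example samples. The BAM and bigWig file names are hypothetical placeholders; the options used (`--bam`, `--outFileName`, `--outFileFormat`, `--scaleFactor`) are documented bamCoverage options. The commands are only printed here; running them assumes deepTools is installed and the BAM files are coordinate-sorted and indexed.

```python
import shlex
from typing import List

def bamcoverage_cmd(bam: str, bigwig: str, scale_factor: float) -> List[str]:
    """Build a deepTools bamCoverage call that scales coverage by scale_factor."""
    return [
        "bamCoverage",
        "--bam", bam,
        "--outFileName", bigwig,
        "--outFileFormat", "bigwig",
        "--scaleFactor", f"{scale_factor:g}",
    ]

# Hypothetical file names; scale factors from the example above.
jobs = {"treated": 0.5, "untreated": 1.0}
cmds = [bamcoverage_cmd(f"{name}.bam", f"{name}.bw", sf) for name, sf in jobs.items()]
for cmd in cmds:
    print(shlex.join(cmd))
    # To execute instead of printing: subprocess.run(cmd, check=True)
```

Building the argument list (rather than a single shell string) keeps file names with spaces safe if the commands are later run with `subprocess.run`.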
The normalization factor applied to each dataset is inversely proportional to its E. coli Spike-in bandwidth. In other words, reactions with the highest bandwidth receive the largest reduction in signal after normalization. For further information on sequencing normalization using exogenous spike-in controls, see [1,2].
References
[1] Tay et al. Hdac3 is an epigenetic inhibitor of the cytotoxicity program in CD8 T cells. J Exp Med 217 (2020).
[2] Orlando et al. Quantitative ChIP-Seq normalization reveals global modulation of the epigenome. Cell Rep 9, 1163-1170 (2014).