Normalizing to E. coli Spike-in DNA

Here we describe how to normalize CUT&RUN sequencing data using E. coli Spike-in DNA.

  1. Align sequencing reads to the reference genome (e.g., human or mouse) and filter out multi-mapping reads, reads assigned to ENCODE DAC exclusion list regions, and (as desired) duplicate reads to determine the total number of uniquely aligned reads. Perform this step for each reaction.

  2. In an alignment separate from Step 1, align sequencing reads to the E. coli K-12 MG1655 reference genome (https://support.illumina.com/sequencing/sequencing_software/igenome.html) and filter out reads that do NOT align uniquely.

  3. For pairwise comparisons, quantify the E. coli spike-in reads for each CUT&RUN reaction and express them as a percentage of the total number of uniquely aligned reads (the spike-in "bandwidth").

    Example: CUT&RUN was used to map H3K4me3 in treated and untreated cells.

    1. Treatment spike-in = 100,000 E. coli reads in 5,000,000 total reads = 2%

    2. Untreated spike-in = 30,000 E. coli reads in 3,000,000 total reads = 1%

  4. Calculate a normalization factor for each reaction (see [1]) such that, after normalization, the E. coli spike-in signal is equal across all reactions.

    Example from above, comparing H3K4me3 in treated vs. untreated cells:

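    1. Treatment normalization factor = 1 / 2 (% spike-in bandwidth) = 0.5

    2. Untreated normalization factor = 1 / 1 (% spike-in bandwidth) = 1.0

    After normalization, the spike-in bandwidth is equal for both reactions (2 × 0.5 = 1 × 1.0 = 1).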

  5. Pass this single scalar normalization factor to the --scaleFactor option of the deepTools bamCoverage tool to generate normalized bigWig files for visualization in IGV (https://deeptools.readthedocs.io/en/develop/content/tools/bamCoverage.html).

    Continuing the example from above:

    1. Treatment sample --scaleFactor = 0.5

    2. Untreated sample --scaleFactor = 1.0
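The read filtering in Steps 1 and 2 can be sketched as a filter over SAM alignment records. This is a minimal Python illustration, not part of the original protocol, and it rests on several assumptions: duplicates must already have been flagged by a duplicate-marking tool, a hypothetical MAPQ cutoff (min_mapq=10) stands in for "uniquely aligned" (the appropriate value depends on the aligner), and exclusion-list regions are given as 0-based, half-open (chrom, start, end) intervals such as those read from a BED file. The function name count_unique_reads is hypothetical. For the E. coli alignment in Step 2, the same function would be called without an exclusion list.

```python
from __future__ import annotations

from typing import Iterable

# SAM FLAG bits from the SAM format specification.
FLAG_UNMAPPED = 0x4
FLAG_SECONDARY = 0x100
FLAG_QC_FAIL = 0x200
FLAG_DUPLICATE = 0x400
FLAG_SUPPLEMENTARY = 0x800


def count_unique_reads(
    sam_lines: Iterable[str],
    min_mapq: int = 10,  # hypothetical cutoff; choose per aligner
    exclusion: Iterable[tuple[str, int, int]] = (),
    drop_duplicates: bool = True,
) -> int:
    """Count SAM records that pass the Step 1/2 filters (hypothetical helper).

    Each mate of a paired-end read is a separate record and is counted separately.
    """
    excluded = list(exclusion)
    count = 0
    for line in sam_lines:
        if line.startswith("@"):  # skip header lines
            continue
        fields = line.rstrip("\n").split("\t")
        flag, chrom, pos, mapq = int(fields[1]), fields[2], int(fields[3]), int(fields[4])
        # Keep only mapped, primary, QC-passing alignments.
        if flag & (FLAG_UNMAPPED | FLAG_SECONDARY | FLAG_SUPPLEMENTARY | FLAG_QC_FAIL):
            continue
        # Duplicates are only recognized if a duplicate-marking tool has set the flag.
        if drop_duplicates and flag & FLAG_DUPLICATE:
            continue
        # Low MAPQ is used here as a proxy for multi-mapping reads.
        if mapq < min_mapq:
            continue
        # Drop reads whose leftmost position lies in an excluded interval
        # (SAM POS is 1-based, intervals are 0-based half-open).
        start = pos - 1
        if any(c == chrom and s <= start < e for c, s, e in excluded):
            continue
        count += 1
    return count
```

Checking only the leftmost position against the exclusion list is a simplification; a full pipeline would typically test for any overlap between the alignment and the excluded region.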

The normalization factor is inversely proportional to the E. coli spike-in bandwidth; in other words, reactions with the highest bandwidth receive the largest reduction in signal after normalization. For further information on sequencing normalization using exogenous spike-in controls, see [1,2].
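The arithmetic in Steps 3 to 5, and the inverse relationship between bandwidth and normalization factor, can be sketched in Python as follows. This is a minimal sketch that follows the convention of the worked example (bandwidth expressed in percent, factor = 1 / bandwidth); the function names and the treated/untreated file names are hypothetical, and only the -b, -o and --scaleFactor options of bamCoverage are used. The command is built as an argument list so that it could be passed to subprocess.run.

```python
from __future__ import annotations


def spike_in_bandwidth(ecoli_reads: int, total_reads: int) -> float:
    """Step 3: E. coli spike-in reads as a percentage of total reads."""
    return 100.0 * ecoli_reads / total_reads


def normalization_factors(counts: dict[str, tuple[int, int]]) -> dict[str, float]:
    """Step 4: factor = 1 / bandwidth (in percent), so bandwidth * factor is 1 for every reaction."""
    return {
        name: 1.0 / spike_in_bandwidth(ecoli, total)
        for name, (ecoli, total) in counts.items()
    }


def bam_coverage_command(bam: str, bigwig: str, scale_factor: float) -> list[str]:
    """Step 5: deepTools bamCoverage call with the scalar factor passed to --scaleFactor."""
    return ["bamCoverage", "-b", bam, "-o", bigwig, "--scaleFactor", f"{scale_factor:g}"]


if __name__ == "__main__":
    # Counts from the worked example: (E. coli reads, total reads) per reaction.
    counts = {"treated": (100_000, 5_000_000), "untreated": (30_000, 3_000_000)}
    for name, factor in normalization_factors(counts).items():
        print(" ".join(bam_coverage_command(f"{name}.bam", f"{name}.bw", factor)))
    # bamCoverage -b treated.bam -o treated.bw --scaleFactor 0.5
    # bamCoverage -b untreated.bam -o untreated.bw --scaleFactor 1
```

Because each factor is the reciprocal of the bandwidth, the treated reaction (2%) is scaled by 0.5 while the untreated reaction (1%) is left unchanged, and the product of bandwidth and factor is 1 for both.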


References

  1. Tay et al. Hdac3 is an epigenetic inhibitor of the cytotoxicity program in CD8 T cells. J Exp Med 217 (2020).

  2. Orlando et al. Quantitative ChIP-Seq normalization reveals global modulation of the epigenome. Cell Rep 9, 1163-1170 (2014).