Normalizing to E. coli spike-in DNA

Aim for E. coli Spike-in DNA to comprise ~1% (range 0.5-5%) of total sequencing reads. The protocol recommends 0.5 ng for 500,000 cells, and this amount can generally be scaled down linearly with cell number (e.g., 0.1 ng for 100,000 cells). Because variables such as target abundance and antibody efficiency affect yield, the amount may need to be adjusted to keep spike-in read counts in the optimal range.
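
The linear scaling rule and the 0.5-5% target window can be expressed as a small Python sketch. The function names below are hypothetical helpers, not part of the protocol or any library, and the result is only a starting amount to be tuned empirically as described above.

```python
# Hypothetical helpers for illustration; not part of the protocol or any library.
PROTOCOL_NG = 0.5          # recommended E. coli spike-in amount (ng) ...
PROTOCOL_CELLS = 500_000   # ... for this many cells


def spike_in_ng(cell_count: int) -> float:
    """Starting spike-in amount, scaled linearly from 0.5 ng per 500,000 cells."""
    if cell_count <= 0:
        raise ValueError("cell_count must be positive")
    return PROTOCOL_NG * cell_count / PROTOCOL_CELLS


def in_target_range(ecoli_reads: int, total_reads: int,
                    low_pct: float = 0.5, high_pct: float = 5.0) -> bool:
    """True if E. coli reads make up 0.5-5% (by default) of total reads."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    pct = 100.0 * ecoli_reads / total_reads
    return low_pct <= pct <= high_pct


print(spike_in_ng(100_000))                # 0.1
print(in_target_range(30_000, 3_000_000))  # True (1%)
```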

To normalize sequencing results using E. coli Spike-in DNA:

  1. Align sequencing reads to the reference genome (e.g., human, mouse) and filter out multi-mapping reads, reads in ENCODE DAC exclusion list regions, and (as desired) duplicate reads to determine the total number of uniquely aligned reads. Repeat for each reaction.
  2. Separately from the Step 1 alignment to the experimental reference genome, align sequencing reads to the E. coli K-12 MG1655 reference genome (available from Illumina iGenomes: https://support.illumina.com/sequencing/sequencing_software/igenome.html). Filter out reads that do NOT align uniquely.
  3. For pairwise comparisons, count the E. coli Spike-in DNA reads for each CUT&RUN reaction and express them as a percentage of the total number of uniquely aligned reads.
    1. Example: CUT&RUN was used to map H3K4me3 in treated and untreated cells.
      1. Treatment spike-in = 100,000 E. coli reads in 5,000,000 total reads = 2%
      2. Untreated spike-in = 30,000 E. coli reads in 3,000,000 total reads = 1%
  4. Calculate a normalization factor for each reaction (see [1]) such that, after normalization, the E. coli spike-in signal is equal across all reactions.
    1. Example from above, comparing H3K4me3 in treated vs. untreated cells:
      1. Treatment normalization factor = 1 / 2 (2% spike-in bandwidth) = 0.5
      2. Untreated normalization factor = 1 / 1 (1% spike-in bandwidth) = 1.0
  5. Pass each reaction's normalization factor as a single scalar to the --scaleFactor option of the deepTools bamCoverage tool to generate normalized bigWig files for visualization in IGV (https://deeptools.readthedocs.io/en/develop/content/tools/bamCoverage.html).
    1. Continuing with the Example from above:
      1. Treatment sample --scaleFactor = 0.5
      2. Untreated sample --scaleFactor = 1.0
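
Steps 3-5 reduce to simple arithmetic per reaction. The Python sketch below reproduces the worked example, assuming the uniquely aligned read counts from Steps 1 and 2 have already been obtained. The names `Reaction`, `spike_in_percent`, `scale_factor`, and `bamcoverage_args`, and the BAM/bigWig file names, are hypothetical; only `bamCoverage` and its `--bam`, `--outFileName`, and `--scaleFactor` options come from deepTools.

```python
from dataclasses import dataclass


@dataclass
class Reaction:
    name: str
    ecoli_reads: int   # uniquely aligned E. coli reads (Step 2)
    total_reads: int   # total uniquely aligned reads used as the denominator (Step 3)


def spike_in_percent(r: Reaction) -> float:
    """Spike-in 'bandwidth': E. coli reads as a percentage of total reads."""
    if r.total_reads <= 0:
        raise ValueError(f"{r.name}: total_reads must be positive")
    return 100.0 * r.ecoli_reads / r.total_reads


def scale_factor(r: Reaction) -> float:
    """Normalization factor = 1 / spike-in percentage (Step 4)."""
    pct = spike_in_percent(r)
    if pct == 0:
        raise ValueError(f"{r.name}: no E. coli reads, cannot normalize")
    return 1.0 / pct


def bamcoverage_args(bam: str, bigwig: str, factor: float) -> list[str]:
    """Argument list for deepTools bamCoverage using --scaleFactor (Step 5)."""
    return ["bamCoverage", "--bam", bam, "--outFileName", bigwig,
            "--scaleFactor", f"{factor:g}"]


if __name__ == "__main__":
    reactions = [
        Reaction("treated", ecoli_reads=100_000, total_reads=5_000_000),
        Reaction("untreated", ecoli_reads=30_000, total_reads=3_000_000),
    ]
    for r in reactions:
        f = scale_factor(r)
        print(f"{r.name}: spike-in {spike_in_percent(r):g}%, scaleFactor {f:g}")
        # Hypothetical file names; execute with subprocess.run(args, check=True).
        print(" ".join(bamcoverage_args(f"{r.name}.bam", f"{r.name}.bw", f)))
```

Running it prints a 2% spike-in and scale factor 0.5 for the treated reaction, a 1% spike-in and scale factor 1 for the untreated reaction, and the matching bamCoverage command for each. Building the command as an argument list (rather than a shell string) keeps it safe to pass directly to `subprocess.run`.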

The normalization factor is inversely proportional to the E. coli Spike-in bandwidth, so reactions with the highest bandwidth receive the largest reduction in signal after normalization. For further information on sequencing normalization using exogenous spike-in controls, see [1,2].
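
Written out, with E_i the uniquely aligned E. coli reads, T_i the total uniquely aligned reads, p_i the spike-in bandwidth (in percent), and s_i the normalization factor for reaction i (symbols introduced here for illustration only), the calculation in Steps 3 and 4 is:

```latex
p_i = 100 \cdot \frac{E_i}{T_i}, \qquad
s_i = \frac{1}{p_i}, \qquad
s_i \, p_i = 1 \quad \text{for every reaction } i
```

The last identity is why the normalized spike-in signal is equal across reactions, and because s_i = 1/p_i, a larger bandwidth gives a smaller scale factor. For the example above, p = 2 gives s = 0.5 (treated) and p = 1 gives s = 1.0 (untreated).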


References:

  1. Tay et al. Hdac3 is an epigenetic inhibitor of the cytotoxicity program in CD8 T cells. J Exp Med 217 (2020).
  2. Orlando et al. Quantitative ChIP-Seq normalization reveals global modulation of the epigenome. Cell Rep 9, 1163-1170 (2014).
