Analyzing SNAP-CUTANA Spike-in Controls and expected results

How to analyze SNAP-CUTANA™ Spike-in Controls

Download the R1 & R2 paired-end sequencing files (fastq.gz) for your control reactions. Double-click each fastq.gz file to decompress it into a fastq file, and save the fastq files in a new folder.
On the SNAP-CUTANA Spike-in product page (e.g. K-MetStat Panel), under Documents and Resources, download the Shell Script (.sh) and K-MetStat Panel Analysis (.xlsx) files. Save both files to the folder that contains your fastq files.
Open the .sh file in TextEdit or any text editing program. Do NOT open in Word or a PDF program. Scroll past the barcode sequences to find the analysis script.
The script is a loop that counts the number of reads aligned to each PTM-specific DNA barcode in a reaction. Each PTM in the SNAP-CUTANA Panel is represented by two unique barcodes, A & B. For the script, you need to create one loop per control reaction. To customize:
1. Copy lines between # template loop begin ## and # template loop end ##.
2. Paste the copied loop below the last "done" line, which closes the previous loop. Paste one copy per control reaction.
3. In the first loop, replace sample1_R1.fastq and sample1_R2.fastq with the R1 & R2 fastq file names for one control reaction. Repeat for each loop, using a different control reaction each time. Save the file.
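To make the counting step concrete, here is a minimal Python sketch of what one loop does for one reaction. It is not the downloaded shell script. It assumes a read is counted when its sequence line contains the barcode as an exact substring, and the barcode sequences and function names below are hypothetical placeholders; the real barcodes are listed at the top of the .sh file.

```python
import gzip
from pathlib import Path

# Hypothetical placeholder barcodes. The real A & B barcode sequences for
# each PTM are listed at the top of the downloaded .sh file.
BARCODES = {
    "H3K4me3_A": "ACGTACGTAC",
    "H3K4me3_B": "TTGACCGATA",
}


def count_barcode_reads(fastq_path, barcodes):
    """Count reads whose sequence contains each barcode (exact match).

    Assumption: exact substring matching on the sequence line. The official
    script may match differently.
    """
    counts = {name: 0 for name in barcodes}
    path = Path(fastq_path)
    # Accept either a decompressed .fastq or a .fastq.gz file.
    opener = gzip.open if path.suffix == ".gz" else open
    with opener(path, "rt") as handle:
        for line_number, line in enumerate(handle):
            # A fastq record is 4 lines; the sequence is the 2nd line.
            if line_number % 4 != 1:
                continue
            for name, barcode in barcodes.items():
                if barcode in line:
                    counts[name] += 1
    return counts


def count_reaction(r1_path, r2_path, barcodes=BARCODES):
    """Return separate R1 and R2 barcode counts for one control reaction."""
    return {
        "R1": count_barcode_reads(r1_path, barcodes),
        "R2": count_barcode_reads(r2_path, barcodes),
    }
```

Calling count_reaction once per control reaction mirrors the one-loop-per-reaction layout of the script, and keeping R1 and R2 counts separate matches the two columns they are pasted into later.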
In Terminal, set the directory to your folder: Type cd and press space. Drag the folder from your files into Terminal to copy the location. Press return.
Run your script in Terminal: Type sh and press space. Drag your .sh file from your files into Terminal to copy the file location. Press return. Terminal generates barcode read counts from R1 & R2 reads, one loop/reaction at a time.
Open the Panel Analysis Excel .xlsx file. Fill in reaction names and set the on-target PTM in Column B. The first reaction is set to IgG (negative control); for other reactions, select a target (e.g. H3K4me3) from the drop-down menu.
Copy R1 barcode read counts from the first loop in Terminal. In Excel, paste into the yellow cells for that reaction in Column C. Copy & paste the R2 read counts from the same loop to yellow cells in Column D. Repeat for each loop/reaction.
The Excel file automatically analyzes spike-in data for each reaction by:
1. Calculating total read counts for each DNA barcode (R1 + R2) in Column E.
2. Calculating total barcode read counts for each PTM (A + B) in Column F.
3. Expressing total read counts for each PTM as a percentage of on-target PTM read counts (Columns G & J), providing a readout of on- vs. off-target PTM recovery and antibody specificity.
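The three calculations can be sketched in Python as follows. This is an illustration of the arithmetic described in the list, not a reproduction of the spreadsheet formulas; the function name and the (PTM, barcode) key layout are assumptions, and the sketch does not cover how the workbook handles the IgG reaction.

```python
def summarize_reaction(r1_counts, r2_counts, on_target):
    """Reproduce the arithmetic described for Columns E, F, and G/J.

    r1_counts and r2_counts map (ptm, barcode) tuples, e.g.
    ("H3K4me3", "A"), to read counts. on_target is the PTM name the
    reaction was set to in Column B.
    """
    # Total reads per DNA barcode: R1 + R2.
    per_barcode = {
        key: r1_counts[key] + r2_counts.get(key, 0) for key in r1_counts
    }

    # Total reads per PTM: barcode A + barcode B.
    per_ptm = {}
    for (ptm, _barcode), total in per_barcode.items():
        per_ptm[ptm] = per_ptm.get(ptm, 0) + total

    # Each PTM as a percentage of the on-target PTM.
    target_total = per_ptm[on_target]
    if target_total == 0:
        raise ValueError("on-target PTM has no barcode reads")
    percent = {
        ptm: 100.0 * total / target_total for ptm, total in per_ptm.items()
    }
    return per_barcode, per_ptm, percent


# Made-up counts for illustration only.
r1 = {("H3K4me3", "A"): 400, ("H3K4me3", "B"): 350,
      ("H3K9me3", "A"): 10, ("H3K9me3", "B"): 15}
r2 = {("H3K4me3", "A"): 100, ("H3K4me3", "B"): 150,
      ("H3K9me3", "A"): 15, ("H3K9me3", "B"): 10}
_, totals, percent = summarize_reaction(r1, r2, on_target="H3K4me3")
print(totals)   # {'H3K4me3': 1000, 'H3K9me3': 50}
print(percent)  # {'H3K4me3': 100.0, 'H3K9me3': 5.0}
```

With these made-up numbers the on-target PTM is 100% by definition and the off-target PTM recovers at 5% of it.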
Column J auto-populates the Output Table (Figure). Reactions are separated by row and PTM data are sorted into columns. A color gradient is used to visualize the recovery of each PTM normalized to on-target PTM, from blue (100%) to orange (less than 20%).
For each reaction, calculate the percentage of unique sequencing reads that have been assigned to spike-ins. In Excel, type the total number of unique reads in the yellow cell labeled "Uniq align reads" (Column B). The % total barcode reads is calculated in the cell immediately below and is added to the Output Table.
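As a rough check on this step, the percentage can be computed by hand. The sketch below assumes the value is the summed barcode reads divided by the unique aligned reads entered in Column B; the function name is hypothetical, and the workbook defines exactly which counts go into the numerator.

```python
def percent_barcode_reads(total_barcode_reads, unique_aligned_reads):
    """Percentage of unique aligned reads assigned to spike-in barcodes.

    Assumption: numerator is the sum of all barcode read counts for the
    reaction; denominator is the value typed into "Uniq align reads".
    """
    if unique_aligned_reads <= 0:
        raise ValueError("unique_aligned_reads must be positive")
    return 100.0 * total_barcode_reads / unique_aligned_reads


# Made-up example: 50,000 barcode reads out of 5,000,000 unique reads.
print(percent_barcode_reads(50_000, 5_000_000))  # 1.0
```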

Figure. K-MetStat Spike-ins validate workflows and flag poor samples in CUTANA CUT&RUN experiments. Spike-in data for H3K4me3 positive control reactions are shown for three independently prepared mouse B cell samples (10,000 cells each). Samples 1 & 2 show expected results, while Sample 3 was flagged for recovery of off-target PTMs and low signal-to-noise. Representative data from one IgG reaction are shown as a negative control.

Expected results from SNAP-CUTANA Spike-in control reactions

The IgG negative control shows low background and no preference among PTMs (Figure, Top row).
The H3K4me3 positive control shows strong enrichment for H3K4me3 spike-ins, less than 20% off-target PTM recovery, and high signal-to-noise (Figure, Samples 1 and 2).
Spike-in barcode reads comprise ~1% (0.5-5%) of total sequencing reads. This may vary based on target abundance and sequencing depth. The main goal is to have thousands of spike-in reads, which allows adequate sampling of the K-MetStat Panel for reliable data analysis.
If control reactions generate expected spike-in data (Figure, Samples 1 and 2), you can be confident in the technical aspects of your workflow.
More than 20% off-target PTM recovery in the H3K4me3 control and/or high background in the IgG control indicates experimental problems (Figure, Sample 3). See this article for guidance on using SNAP-CUTANA Spike-in controls to troubleshoot workflows.
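The numeric criteria above can be turned into a quick screen, sketched here in Python. The thresholds (20% off-target recovery, 0.5-5% spike-in reads) come from this section; the function names are hypothetical, and treating exactly 20% as acceptable is an assumption. The sketch does not assess IgG background or signal-to-noise, which still need to be judged from the Output Table.

```python
def flag_off_target(percent_by_ptm, on_target, limit=20.0):
    """Return off-target PTMs recovered above `limit` percent of on-target.

    percent_by_ptm maps PTM name -> percent of on-target reads, as shown
    in the Output Table. An empty result means no PTM was flagged.
    """
    return {
        ptm: pct
        for ptm, pct in percent_by_ptm.items()
        if ptm != on_target and pct > limit
    }


def spike_in_fraction_ok(percent_barcode, low=0.5, high=5.0):
    """True if spike-in reads fall within the expected 0.5-5% range."""
    return low <= percent_barcode <= high


# Made-up values for illustration only.
sample = {"H3K4me3": 100.0, "H3K4me2": 35.0, "H3K9me3": 2.0}
print(flag_off_target(sample, on_target="H3K4me3"))  # {'H3K4me2': 35.0}
print(spike_in_fraction_ok(1.0))                     # True
```

A flagged PTM or an out-of-range spike-in percentage is a prompt to look more closely at the reaction, not a verdict on its own, since the expected range can shift with target abundance and sequencing depth.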