Analysis and expected results for SNAP-CUTANA™ Spike-ins

SNAP-CUTANA data analysis protocol

This protocol can be used to analyze SNAP-CUTANA Spike-in data from both CUT&RUN and CUT&Tag reactions. CUT&RUN data are shown as an example.

  1. Download R1 & R2 paired-end sequencing files (fastq.gz) for control reactions. Double-click each fastq.gz file to decompress it into a fastq file, and save the fastq files in a new folder.

  2. On the SNAP-CUTANA Spike-in product page (e.g. K-MetStat Panel), under Documents and Resources, download the Shell Script (.sh) and K-MetStat Panel Analysis (.xlsx) files. Save to the folder from Step 1.

  3. Open the .sh file in TextEdit or any text editing program. Do NOT open in Word or a PDF program. Scroll past the barcode sequences to find the analysis script.

  4. The script is a loop that counts the number of reads aligned to each PTM-specific DNA barcode in a reaction. Each PTM in the SNAP-CUTANA Panel is represented by two unique barcodes, A & B. For the script, you need to create one loop per control reaction. To customize:

    1. Copy lines between # template loop begin ## and # template loop end ##.

    2. Paste the loop below the final "done" line, which marks the end of the last loop. Paste one copy per control reaction.

    3. In the first loop, replace sample1_R1.fastq and sample1_R2.fastq with the R1 & R2 fastq file names for one control reaction. Repeat for each loop, then save the file.

  5. In Terminal, set the directory to your folder: Type cd and press space. Drag the folder from your files into Terminal to copy the location. Press return.

  6. Run your script in Terminal: Type sh and press space. Drag your .sh file from your files into Terminal to copy the file location. Press return. Terminal generates barcode read counts from R1 & R2 reads, one loop/reaction at a time.

  7. Open the Panel Analysis Excel .xlsx file. Fill in reaction names and set the on-target PTM in Column B. The first reaction is set to IgG (negative control); for other reactions, select a target (e.g. H3K4me3) from the drop-down menu.

  8. Copy R1 barcode read counts from the first loop in Terminal. In Excel, paste into the yellow cells for that reaction in Column C. Copy & paste the R2 read counts from the same loop to yellow cells in Column D. Repeat for each loop/reaction.

  9. The Excel file automatically analyzes spike-in data for each reaction by:

    1. Calculating total read counts for each DNA barcode (R1 + R2) in Column E.

    2. Calculating total barcode read counts for each PTM (A + B) in Column F.

    3. Expressing total read counts for each PTM as a percentage of on-target PTM read counts (Columns G & J), providing a readout of on- vs. off-target PTM recovery and antibody specificity.

  10. Column J auto-populates the Output Table (Figure 2). Reactions are separated by row and PTM data are sorted into columns. A color gradient is used to visualize the recovery of each PTM normalized to on-target PTM, from blue (100%) to orange (less than 20%).

  11. For each reaction, calculate the percent of unique sequencing reads that have been assigned to spike-ins. In Excel, type the total number of unique reads in the yellow cell Uniq align reads (in Column B). The % total barcode reads is calculated in the cell immediately below and is added to the Output Table.
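The counting and spreadsheet arithmetic in Steps 4, 6, 9, and 11 can be sketched in Python. This is a minimal illustration, not the script or spreadsheet distributed on the product page. It assumes that reads are assigned to a barcode by exact substring match, which may differ from what the shell script does. The barcode naming scheme (PTM_A, PTM_B), the function names, and the example read counts are all hypothetical.

```python
from collections import defaultdict


def count_barcode_reads(fastq_lines, barcodes):
    """Count reads whose sequence contains each barcode (exact substring match).

    Assumption: exact matching stands in for whatever the vendor script does.
    `barcodes` maps a name such as "H3K4me3_A" to its DNA sequence.
    """
    counts = {name: 0 for name in barcodes}
    for i, line in enumerate(fastq_lines):
        if i % 4 != 1:  # the sequence is the 2nd line of each 4-line FASTQ record
            continue
        seq = line.strip()
        for name, barcode in barcodes.items():
            if barcode in seq:
                counts[name] += 1
    return counts


def ptm_totals(r1_counts, r2_counts):
    """Steps 9.1 and 9.2: sum R1 + R2 per barcode, then A + B per PTM."""
    totals = defaultdict(int)
    for name in set(r1_counts) | set(r2_counts):
        ptm = name.rsplit("_", 1)[0]  # "H3K4me3_A" -> "H3K4me3"
        totals[ptm] += r1_counts.get(name, 0) + r2_counts.get(name, 0)
    return dict(totals)


def percent_of_on_target(totals, on_target):
    """Step 9.3: express each PTM total as a percentage of the on-target PTM."""
    target_reads = totals.get(on_target, 0)
    if target_reads == 0:
        raise ValueError(f"no barcode reads for on-target PTM {on_target!r}")
    return {ptm: 100.0 * n / target_reads for ptm, n in totals.items()}


def percent_barcode_reads(totals, uniq_align_reads):
    """Step 11: barcode reads as a percentage of unique aligned reads."""
    return 100.0 * sum(totals.values()) / uniq_align_reads


# Made-up counts for one H3K4me3 reaction (illustration only, not real data).
r1 = {"H3K4me3_A": 480, "H3K4me3_B": 520, "H3K27me3_A": 12, "H3K27me3_B": 8}
r2 = {"H3K4me3_A": 470, "H3K4me3_B": 530, "H3K27me3_A": 10, "H3K27me3_B": 10}
totals = ptm_totals(r1, r2)
print(percent_of_on_target(totals, "H3K4me3"))
print(percent_barcode_reads(totals, uniq_align_reads=400_000))
```

In this made-up example, H3K27me3 recovery works out to 2% of the on-target H3K4me3 reads. How the spreadsheet normalizes the IgG row is not described above, so the sketch covers only reactions with a PTM target.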


Figure 1. K-MetStat Spike-ins validate workflows and flag poor samples in CUTANA experiments. Spike-in data for H3K4me3 and H3K27me3 positive control reactions are shown for three independently prepared mouse B cell samples (10,000 cells each) in CUT&RUN. Samples 1 & 2 show expected results, while Sample 3 was flagged for recovery of off-target PTMs and low signal-to-noise. Representative data from one IgG reaction are shown as a negative control.


Expected results from SNAP-CUTANA Spike-in control reactions

  • The IgG negative control shows low background and no preference among PTMs (Figure 1, Top row).

  • Each positive control (e.g., H3K4me3, H3K27me3, or H3K36me3) shows strong enrichment for its target nucleosome spike-ins, less than 20% off-target PTM recovery, and high signal-to-noise (Figure 1, Samples 1 and 2).

  • Spike-in barcode reads comprise ~1% (0.5-5%) of total sequencing reads. This may vary based on target abundance and sequencing depth. The main goal is to have thousands of reads, which will allow adequate sampling of the K-MetStat Panel for reliable data analysis.

  • If control reactions generate expected spike-in data (Figure 1, Samples 1 and 2), you can be confident in the technical aspects of your workflow.

  • More than 20% off-target PTM recovery in the positive control and/or high background in the IgG control indicates experimental problems (Figure 1, Sample 3). See this article for guidance on using SNAP-CUTANA Spike-in controls to troubleshoot workflows.
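The numeric expectations listed above can be turned into a simple automated check. The Python sketch below is an illustration under stated assumptions, not part of the SNAP-CUTANA analysis files. The function names and example values are hypothetical, and the 1,000-read minimum is only one reading of "thousands of reads". The IgG control is not assessed because no numeric cutoff is given for it.

```python
def flag_off_target(percent_of_target, on_target, max_off_target=20.0):
    """Return off-target PTMs recovered at more than the cutoff (% of on-target)."""
    return sorted(
        ptm
        for ptm, pct in percent_of_target.items()
        if ptm != on_target and pct > max_off_target
    )


def spike_in_checks(total_barcode_reads, percent_barcode_reads,
                    min_reads=1000, low=0.5, high=5.0):
    """Check read depth and the 0.5-5% spike-in range; min_reads is an assumption."""
    return {
        "enough_reads": total_barcode_reads >= min_reads,
        "fraction_in_range": low <= percent_barcode_reads <= high,
    }


# Made-up values for illustration only (not taken from Figure 1).
good = {"H3K4me3": 100.0, "H3K9me3": 3.0, "H3K27me3": 2.0}
poor = {"H3K4me3": 100.0, "H3K9me3": 35.0, "H3K27me3": 24.0}

print(flag_off_target(good, "H3K4me3"))  # []
print(flag_off_target(poor, "H3K4me3"))  # ['H3K27me3', 'H3K9me3']
print(spike_in_checks(total_barcode_reads=2040, percent_barcode_reads=0.51))
```

Because the spike-in fraction varies with target abundance and sequencing depth, a value outside 0.5-5% is better treated as a prompt to look more closely than as an automatic failure.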