Once the user’s FASTQ files have been successfully aligned to the reference genome, an Alignment QC Report is generated so the user can assess the quality of their dataset. This report can be found in the Alignment section of an experiment. For clarity, a brief description of each metric appears to the right of the corresponding section of the Alignment QC Report.
The first section of the QC Report is Sequencing Stats and Alignment Metrics, which the user can use to determine the quality of the alignment. Suggested ranges for sequencing stats are as follows:
Total Reads: Aim for 5-10 million reads per CUT&RUN and CUT&Tag sample, though up to approximately 15 million is acceptable on the higher end. Expect a loss of 1-2 million reads post-alignment due to duplicates, blacklisted reads, and multi-aligned reads.
Unique Alignment Rate: This percentage should be high for a quality dataset, typically 70-95% for specific PTMs. An exception is the IgG negative control, which is not expected to align well. Datasets below the recommended percentage warrant further investigation. Contributing factors could include poor assay yield, too few unique templates, high PCR duplicates, or over-sequencing, all of which can decrease the number of uniquely aligned reads.
Duplication Rate: The goal is less than 30%. High duplication rates can indicate low template diversity from the assay, over-amplification during PCR, over-sequencing, or the presence of more than 5% Illumina adapter dimer in the library/flow cell.
E. coli Alignment Rate: The goal is less than 5%. A high E. coli alignment rate can suggest incorrect reconstitution of the stock (e.g., adding too much to stop buffer/samples), low template diversity from the assay, or over-sequencing.
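The suggested ranges above can be turned into a simple automated check. The following is a minimal sketch in Python, not part of the platform itself: the SampleStats class, its field names, and the flag_sample function are hypothetical names chosen for illustration, and the thresholds are copied from the ranges listed above.

```python
from dataclasses import dataclass


@dataclass
class SampleStats:
    """Hypothetical container for one sample's sequencing stats."""
    name: str
    total_reads: int
    unique_alignment_pct: float
    duplication_pct: float
    ecoli_alignment_pct: float
    is_igg_control: bool = False


def flag_sample(stats):
    """Return a list of warnings for metrics outside the suggested ranges."""
    warnings = []
    # Suggested: 5-10 million reads, with up to about 15 million acceptable.
    if not 5_000_000 <= stats.total_reads <= 15_000_000:
        warnings.append("total reads outside 5-15 million")
    # Only the lower bound (70%) is checked here; the IgG negative
    # control is exempt because it is not expected to align well.
    if not stats.is_igg_control and stats.unique_alignment_pct < 70:
        warnings.append("unique alignment rate below 70%")
    # Goal: less than 30% duplication.
    if stats.duplication_pct >= 30:
        warnings.append("duplication rate at or above 30%")
    # Goal: less than 5% E. coli alignment.
    if stats.ecoli_alignment_pct >= 5:
        warnings.append("E. coli alignment rate at or above 5%")
    return warnings


if __name__ == "__main__":
    sample = SampleStats("H3K4me3_rep1", 8_000_000, 85.0, 12.0, 1.0)
    print(sample.name, flag_sample(sample))
```

An empty list means every metric falls inside its suggested range. Any warning is a prompt to investigate the possible causes listed above, not an automatic failure of the dataset.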
The SNAP Spike-in Results, found on the Alignment QC Report, display a heatmap illustrating the percentage of barcode reads recovered for each panel member in reactions containing the barcoded nucleosome spike-in (e.g., IgG, H3K4me3 controls). Ideally, the IgG antibody control should show less than 20% recovery for each of the 16 panel members, indicating no specificity. Conversely, the H3K4me3 antibody control should demonstrate 100% specificity for the H3K4me3 barcoded nucleosome and less than 20% for all other members. Off-target panel members exceeding 20% may suggest a poor-quality antibody, an excessive amount of SNAP Spike-in, or assay-related issues.
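The 20% rule described above can be sketched as a small check over the heatmap values for one reaction. This is an illustrative sketch only: the check_spike_in function, the dictionary layout, and the example percentages are assumptions made for this example, not an export format or API provided by the platform.

```python
# Off-target recovery limit taken from the guidance above (percent).
OFF_TARGET_LIMIT_PCT = 20.0


def check_spike_in(recovery_pct, on_target=None):
    """Return the panel members whose recovery exceeds the off-target limit.

    recovery_pct: dict mapping panel member name -> percent recovery
                  (hypothetical layout for one reaction's heatmap row).
    on_target:    the panel member the antibody should recognize, or
                  None for the IgG negative control, where every
                  member is treated as off-target.
    """
    flagged = []
    for member, pct in recovery_pct.items():
        if member == on_target:
            continue  # the on-target member is expected to be high
        if pct > OFF_TARGET_LIMIT_PCT:
            flagged.append(member)
    return flagged


if __name__ == "__main__":
    # Made-up values for illustration only.
    h3k4me3_reaction = {"H3K4me3": 100.0, "H3K4me2": 8.0, "H3K9me3": 25.0}
    print(check_spike_in(h3k4me3_reaction, on_target="H3K4me3"))
```

For an IgG reaction, call the function without on_target so that all 16 panel members are compared against the limit.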
TSS (Transcription Start Site) and Gene Body Heatmap(s) will also be generated for your reactions. They show how the unique reads associated with your targets align to known sites of transcription, giving an idea of the pattern of enrichment for each reaction. For example, H3K4me3 is a PTM associated with active transcription and should appear as a punctate peak in the Mean Signal (RPKM), centered on the TSS and at the beginning of the gene body.