Geneious Academy

Analyzing CRISPR Editing Results Tutorial

Learn how to analyze CRISPR editing results in Geneious Prime, including how to prepare your sequence files, set up the analysis and interpret the results.

Introduction

In this tutorial, you will learn to use the “Analyze CRISPR Editing Results” tool in Geneious Prime 2020.2 and later.

This tool aligns, clusters and analyzes NGS reads from your CRISPR editing experiments so that you can determine the frequency of variants and their protein effects. This tutorial covers how to process your reads for use with this tool (Step 1), how to set up the analysis (Step 2) and how to interpret the results (Step 3).

The dataset* used in this tutorial is paired-end Illumina MiSeq data produced by amplicon sequencing of the target region surrounding the CRISPR site. The target region comprises 240 bp of the APC gene, a WNT signalling pathway regulator.

Note that the BBDuk plugin is required for this tutorial. To install it, go to Tools → Plugins, select BBDuk Trimmer from the list of available plugins and click Install.

*We thank Luke Dow from Weill Cornell Medical College for providing the dataset for this tutorial.

INSTRUCTIONS
To complete the tutorial yourself with the included sequence data, download the tutorial and install it by dragging and dropping the zip file into Geneious Prime. Do not unzip the tutorial.


EXERCISE 1
Step 1: Preparing your data

EXERCISE 2
Step 2: Running “Analyze CRISPR Editing Results”

EXERCISE 3
Step 3: Interpreting the results


Step 1: Preparing your data

Analyze CRISPR Editing Results is designed to be run on unaligned NGS reads imported from FASTQ files. If your reads are paired, select the appropriate pairing settings when importing your file. Paired reads must then be merged before running the analysis. We also recommend basic quality trimming before merging to improve the accuracy of the merging step.

Note that Analyze CRISPR Editing Results can also be run on Sanger sequences. If you have forward and reverse reads spanning your amplicon, assemble them into a single sequence with the de novo assembly tool before running the analysis.

The Sample Reads dataset provided for this tutorial already contains paired reads. We will first trim this dataset using the BBDuk plugin.

Select Sample Reads and go to Annotate and Predict → Trim using BBDuk. First, click Reset to Defaults under the grey settings cog in the bottom left of the window to clear any settings you have used previously. Select Trim Adapters and leave its settings unchanged. Then select Trim Low Quality and set Minimum Quality to 20; this trims bases with Phred scores below 20 from the ends of the reads. To discard reads that are too short to be useful after trimming, select Discard Short Reads and set the minimum length to 20 bp.

Your BBDuk settings should look as in the screenshot below:

Click OK to run the analysis. This will produce a new file called “Sample Reads (trimmed)”, which should contain 188,324 reads. You can see the full results of the trimming under the Info tab for this file.
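To make the trimming settings concrete, here is a minimal Python sketch of end trimming at a Phred threshold followed by a minimum-length filter. It is a simplified illustration, not BBDuk's implementation, whose quality-trimming algorithm differs in detail. The helper names (`phred_scores`, `trim_low_quality`, `trim_and_filter`) are hypothetical, and quality strings are assumed to use the standard Phred+33 FASTQ encoding.

```python
def phred_scores(quality_string, offset=33):
    """Convert a FASTQ quality string (assumed Phred+33) to integer scores."""
    return [ord(ch) - offset for ch in quality_string]


def trim_low_quality(seq, qual, min_quality=20):
    """Remove bases scoring below min_quality from both ends of a read."""
    scores = phred_scores(qual)
    start, end = 0, len(seq)
    # Walk inwards from the 5' end while bases are below the threshold.
    while start < end and scores[start] < min_quality:
        start += 1
    # Walk inwards from the 3' end in the same way.
    while end > start and scores[end - 1] < min_quality:
        end -= 1
    return seq[start:end], qual[start:end]


def trim_and_filter(reads, min_quality=20, min_length=20):
    """Trim each (seq, qual) pair and discard reads shorter than min_length."""
    kept = []
    for seq, qual in reads:
        trimmed_seq, trimmed_qual = trim_low_quality(seq, qual, min_quality)
        if len(trimmed_seq) >= min_length:
            kept.append((trimmed_seq, trimmed_qual))
    return kept


# Example: two low-quality bases ('#' = Q2) at each end of a 28 bp read.
read = ("NN" + "ACGT" * 6 + "NN", "##" + "I" * 24 + "##")
print(trim_and_filter([read, ("ACGT", "IIII")]))  # the 4 bp read is discarded
```

Only interior bases are kept once both ends pass the threshold. This mirrors the intent of the Minimum Quality 20 and minimum length 20 bp settings above.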

Now merge the paired reads into single sequences by selecting the “Sample Reads (trimmed)” file and going to Sequence → Merge Paired Reads.

You should now have two files: one of merged reads and one containing reads that couldn’t be merged. The merged reads file will be used as input to the next step.
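Geneious performs the merge for you. The sketch below only illustrates the idea behind merging an overlapping pair into one sequence. It assumes read 2 comes from the opposite strand, requires an exact overlap of at least `min_overlap` bases and ignores quality scores and mismatches. It is therefore a simplification, not the algorithm Geneious uses. The function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Reverse-complement a DNA sequence (uppercase A/C/G/T/N only)."""
    return seq.translate(COMPLEMENT)[::-1]


def merge_pair(read1, read2, min_overlap=10):
    """Merge a read pair on an exact overlap; return None if none is found.

    read2 is reverse-complemented so that both reads are on the same strand.
    """
    read2_rc = reverse_complement(read2)
    longest = min(len(read1), len(read2_rc))
    # Try the longest overlap first, down to the minimum allowed (at least 1 bp).
    for overlap in range(longest, max(min_overlap, 1) - 1, -1):
        if read1[-overlap:] == read2_rc[:overlap]:
            return read1 + read2_rc[overlap:]
    return None  # such a pair would end up in the "unmerged" file


amplicon = "GATTACACGTTGCAAGGCTCCATGAACTGT"
r1 = amplicon[:20]                      # forward read
r2 = reverse_complement(amplicon[10:])  # reverse read, 10 bp overlap with r1
print(merge_pair(r1, r2) == amplicon)
```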

Step 2: Running “Analyze CRISPR Editing Results”

The Analyze CRISPR Editing Results tool maps the merged reads to a reference sequence and trims them to the region of interest around the CRISPR editing site. It then collapses identical reads into clusters and reports the number of reads in each cluster as a percentage of the total.
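The clustering step can be pictured with a short Python sketch. It assumes the reads have already been mapped and trimmed to the region of interest, with alignment gaps written as `-`, so that identical strings represent identical variants. Unlike the Geneious tool, which also assigns reads with putative sequencing errors to clusters, this sketch groups exact matches only. `cluster_reads` is a hypothetical name.

```python
from collections import Counter


def cluster_reads(trimmed_reads):
    """Group identical reads and report each cluster's share of the total.

    Returns (sequence, read_count, percent) tuples, largest cluster first.
    """
    counts = Counter(trimmed_reads)
    total = sum(counts.values())
    return [(seq, n, 100.0 * n / total) for seq, n in counts.most_common()]


reads = ["ACGTACGT"] * 3 + ["ACG--CGT"]
for seq, n, pct in cluster_reads(reads):
    print(f"{pct:.2f}% {seq} ({n} reads)")
```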

The reference sequence should be a short sequence spanning the CRISPR editing site, of similar length to the reads. This will normally be the unedited amplicon sequence. The reference sequence can be set in the operation dialog, or selected along with the sample reads prior to opening the operation.

Select the reference file Apc Reference along with the Sample Reads (trimmed) (merged) file you created in Step 1 (hold down control/command to select both files). Then open Analyze CRISPR Editing Results from the Annotate and Predict menu. Apc Reference should automatically be set as the Reference sequence.

Our reference sequence is a portion of the APC gene and is annotated with a partial CDS annotation. This allows variants to be analyzed for their protein effect using the correct frame. If your reference sequence does not have a CDS annotation, you can set which translation frame to use under the Variant Analysis section.
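To illustrate why the translation frame matters, here is a hypothetical Python sketch. It locates a reference position within a codon and labels an indel by whether it preserves the reading frame. It assumes `cds_start` is the 0-based first base of a complete codon. Only the "Frame shift 2 bp deletion" wording appears in this tutorial; the "In-frame" label is an illustrative placeholder, not necessarily what Geneious reports.

```python
def codon_position(ref_pos, cds_start):
    """Locate a 0-based reference position within a CDS.

    Assumes cds_start is the 0-based position of the first base of a
    complete codon (i.e. it fixes the translation frame).
    Returns (codon_index, position_in_codon), both 0-based.
    """
    if ref_pos < cds_start:
        raise ValueError("position lies upstream of the CDS")
    offset = ref_pos - cds_start
    return offset // 3, offset % 3


def indel_effect(length, kind):
    """Label an insertion or deletion by whether it preserves the reading frame."""
    if length % 3 != 0:
        return f"Frame shift {length} bp {kind}"
    return f"In-frame {length} bp {kind}"  # hypothetical label


print(codon_position(10, 4))        # (2, 0): first base of the third codon
print(indel_effect(2, "deletion"))  # Frame shift 2 bp deletion
```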

The Variants of Interest section of the setup dialog lets you specify how far upstream and downstream of the CRISPR cleavage site to look for variants. Leave this at the default setting of 50 bp. Leave the minimum variant frequency at 0.5%; variants below this frequency are not included in the results.
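The two settings can be expressed as a small Python sketch: a window of 50 bp either side of the cut site, and a frequency cut-off applied to clusters. The coordinate convention is an assumption: the cut site is the 0-based index of the first base downstream of the cut, and the window is half-open. The `>=` comparison is also an assumption about how the threshold is applied. The function names are hypothetical.

```python
def variant_window(cut_site, flank=50, ref_length=None):
    """Half-open [start, end) window of `flank` bp either side of the cut site.

    cut_site is assumed to be the 0-based index of the first base
    downstream of the cut; the window is clipped to the reference.
    """
    start = max(0, cut_site - flank)
    end = cut_site + flank
    if ref_length is not None:
        end = min(end, ref_length)
    return start, end


def filter_by_frequency(clusters, min_percent=0.5):
    """Keep (sequence, count, percent) clusters at or above min_percent."""
    return [c for c in clusters if c[2] >= min_percent]


print(variant_window(120, ref_length=240))  # (70, 170)
```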

Your setup window should look as in the screenshot below. Click OK to begin the analysis.

The analysis may take a few minutes to run. In the next step, we will look at how the results are displayed.

Step 3: Interpreting the Results

After the analysis has finished you should see a contig document called “Apc Reference CRISPR Variants for Sample Reads (trimmed) (merged)”. This is similar to the output from Map to Reference, but instead of every read being mapped to the reference, only one representative sequence from each cluster of identical reads is mapped, as shown in the screenshot below.

The name of each mapped sequence takes the following format:
Variant frequency, Variant effect (total number of reads in cluster, number of reads with putative sequencing errors in cluster)
e.g. 17.85% Frame shift 2 bp deletion (15,484 reads, including 483 with sequencing errors)
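If you want to post-process these names outside Geneious, a regular expression can split them into fields. The sketch below is built from the single example above. It assumes the "including ... with sequencing errors" clause may be absent for clusters without errors, so check it against your own results. `parse_variant_name` is a hypothetical helper.

```python
import re

# Pattern for names such as
# "17.85% Frame shift 2 bp deletion (15,484 reads, including 483 with sequencing errors)".
# Assumption: the "including ..." clause may be absent for clusters without errors.
NAME_PATTERN = re.compile(
    r"^(?P<freq>\d+(?:\.\d+)?)%\s+(?P<effect>.+?)\s+"
    r"\((?P<reads>[\d,]+) reads?"
    r"(?:, including (?P<errors>[\d,]+) with sequencing errors)?\)$"
)


def parse_variant_name(name):
    """Split a representative sequence name into its numeric fields."""
    m = NAME_PATTERN.match(name)
    if m is None:
        raise ValueError(f"unrecognised variant name: {name!r}")
    return {
        "frequency_percent": float(m["freq"]),
        "effect": m["effect"],
        "reads": int(m["reads"].replace(",", "")),
        "error_reads": int((m["errors"] or "0").replace(",", "")),
    }


print(parse_variant_name(
    "17.85% Frame shift 2 bp deletion (15,484 reads, including 483 with sequencing errors)"
))
```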

Note that the sequences are trimmed so that only 50 bp upstream and downstream of the putative cut site are shown, as per the “Variants of Interest” setting in the setup. This is the region used to group reads into identical clusters – variant bases outside this region are not considered.

Geneious will automatically determine where the cut site is based on the location of the majority of variants. For a full description of the algorithm, see section 15.2.3 of the Geneious Prime User Manual.

The overall results for the dataset, including the percentage of reads with no variant and the percentage of reads with knockout variants, are shown in the Description of the contig document. For more detailed results, view the Info tab of the document.
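The summary figures are read-weighted percentages over all clusters. The Python sketch below shows the arithmetic. The label used for unedited reads and the rule that counts frame shifts as knockouts are hypothetical placeholders; see the Geneious Prime User Manual for how the tool actually classifies knockout variants.

```python
def summarize(clusters, no_variant_label="No variant", knockout_keyword="Frame shift"):
    """Return (% reads with no variant, % reads with knockout variants).

    clusters is a list of (effect, read_count) pairs. Both the label for
    unedited reads and the knockout rule are hypothetical placeholders.
    """
    total = sum(reads for _, reads in clusters)
    no_variant = sum(reads for effect, reads in clusters if effect == no_variant_label)
    knockout = sum(reads for effect, reads in clusters if knockout_keyword in effect)
    return 100.0 * no_variant / total, 100.0 * knockout / total


clusters = [
    ("No variant", 600),
    ("Frame shift 2 bp deletion", 300),
    ("In-frame 3 bp deletion", 100),
]
print(summarize(clusters))  # (60.0, 30.0)
```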

Results can also be viewed in tabular form under the Annotations tab above the viewer. Each representative sequence has a variant annotation (hidden by default in the sequence viewer) containing the statistics for that variant, such as effect, frequency and variant bases. This information can be viewed in the Annotations table and exported from there in CSV format.
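Once exported, the table can be loaded with Python's standard `csv` module. The sketch below assumes a column named `Frequency` holding values such as `17.85%`. That column name is a hypothetical placeholder, so check the header of your own export. The rows are sorted so that the most frequent variants come first.

```python
import csv
import io


def load_variant_table(handle, frequency_column="Frequency"):
    """Read an exported annotation table and sort rows by variant frequency.

    frequency_column is a hypothetical column name; check the header of
    your own export. Values such as "17.85%" are converted to floats.
    """
    rows = list(csv.DictReader(handle))
    for row in rows:
        row[frequency_column] = float(row[frequency_column].strip().rstrip("%"))
    rows.sort(key=lambda row: row[frequency_column], reverse=True)
    return rows


# In practice: with open("variants.csv", newline="") as handle: ...
example = io.StringIO("Name,Frequency\nVariant A,0.6%\nVariant B,17.85%\n")
for row in load_variant_table(example):
    print(row["Name"], row["Frequency"])
```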