Finding CRISPR Sites Tutorial
Learn how to find and annotate CRISPR sites in a sequence and check against a genome for off-target binding sites.
The CRISPR/Cas9 system is an RNA-guided endonuclease technology for gene editing. This system requires a guide RNA (gRNA) containing a 20-nucleotide sequence that matches a target lying adjacent to a PAM (Protospacer Adjacent Motif) site, and which directs the Cas9 endonuclease to the cleavage site. The Find CRISPR sites tool in Geneious Prime will search for gRNA (“CRISPR”) sites in the selected gene, and search for off-target binding sites in your genome of interest. Each CRISPR site in the selected sequence is scored according to how many off-target sites it could potentially bind to, and how similar those off-target sites are to the original sequence.
In this tutorial you will use the CRISPR tool in Geneious Prime to search for “GN(20)GG” guide RNA (gRNA or “CRISPR”) sites in the LYP1 gene from Saccharomyces cerevisiae (baker’s yeast) and check for non-target binding sites against the rest of the Saccharomyces cerevisiae genome.
You will be working with the LYP1 CDS from Saccharomyces cerevisiae S288c and using the S288c genome as the off-target database. LYP1, or Lysine permease, is found on chromosome 14 in the yeast genome. It is one of three amino acid permeases (Alp1p, Can1p, Lyp1p) responsible for uptake of cationic amino acids. For more information about this gene, click here.
To complete the tutorial yourself with included sequence data, download the tutorial and install it by dragging and dropping the zip file into Geneious Prime. Do not unzip the tutorial.
Exercise 1: Finding CRISPR sites and checking for off-target matches
In this exercise you will run the Find CRISPR Sites function to find “GN(20)GG” CRISPR sites on the LYP1 CDS sequence. For each site identified within LYP1, you will also search for off-target binding sites within the Saccharomyces cerevisiae genome. If a gRNA sequence binds to other regions of the genome, Cas9 will also cut the sequence in these regions, and this may have serious unintended consequences. For each CRISPR site, two scores are calculated: an Activity score, which predicts the activity at the target site, and a Specificity score, which is based on the presence of off-target binding sites elsewhere in the genome.
Select the LYP1 CDS document and then go to Annotate and Predict→Find CRISPR sites.
First, reset the options to the default settings. To do this, click on the cog in the bottom left-hand corner of the window and select Reset to defaults (this will be grayed out if the current settings are already the defaults).
As we want to find CRISPR sites anywhere in this sequence, next to Find CRISPR Targets select Anywhere in sequence.
The Geneious Prime CRISPR tool can be used to find 3′ Cas9 sites, or 5′ Cpf1 sites. In the PAM Site Location selector, make sure 3′ (Cas9) is selected.
In the Motif panel, use the Target and PAM fields to specify the gRNA sequence to search for. If you wish to evaluate all potential CRISPR sites, enter N(20) in the Target box. Alternatively, if you have a particular sequence you want to check for off-target binding, you can enter the exact sequence in the Target box. In this tutorial, we are interested in the “GN(20)GG” guide sequence. The target sequence for this is GN(19) and the PAM sequence is NGG, so in the Target field type “GN(19)” and in the PAM Site field type “NGG”. The preview underneath these fields shows the full guide sequence for this target and PAM combination, i.e. GNNNNNNNNNNNNNNNNNNNNGG.
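To make the Target and PAM notation concrete, the following is a minimal Python sketch of how a motif such as GN(19) plus NGG can be expanded and searched for on both strands of a sequence. It is an illustration only, not how Geneious Prime implements the search; the function names (expand_motif, find_sites) and the 0-based, half-open coordinate convention are assumptions made for this example.

```python
import re

# IUPAC nucleotide codes mapped to regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def expand_motif(motif):
    """Expand shorthand such as 'GN(19)' into 'G' followed by 19 'N's."""
    return re.sub(r"([A-Z])\((\d+)\)",
                  lambda m: m.group(1) * int(m.group(2)),
                  motif.upper())


def reverse_complement(seq):
    """Reverse complement of an unambiguous DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_sites(seq, target="GN(19)", pam="NGG"):
    """Return (start, end, strand, site) for each 3' PAM site on both strands.

    Coordinates are 0-based, half-open, and given on the forward strand
    (a convention chosen for this sketch).
    """
    full = expand_motif(target) + expand_motif(pam)
    # The lookahead allows overlapping sites to be reported.
    pattern = re.compile("(?=(" + "".join(IUPAC[c] for c in full) + "))")
    seq = seq.upper()
    n = len(seq)
    hits = []
    for m in pattern.finditer(seq):
        hits.append((m.start(), m.start() + len(full), "+", m.group(1)))
    for m in pattern.finditer(reverse_complement(seq)):
        start = n - (m.start() + len(full))
        hits.append((start, start + len(full), "-", m.group(1)))
    return sorted(hits)


# The expanded motif corresponds to the preview shown under the Target and PAM fields.
print(expand_motif("GN(19)") + expand_motif("NGG"))
```

Searching the reverse complement as well as the forward sequence matters because a guide can target either strand of the gene.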
By default, Geneious Prime will use the method of Doench et al. (2016) to score the activity of the CRISPR sites. Activity, or on-target, scoring models the sequence features of the gRNA site itself to predict activity. When you run the Doench et al. (2016) model for the first time, Geneious will install the required dependencies (Python and R) before running the model, so you may notice that it takes a little longer.
To score the sites based on their off-target binding, check Score against an off-target database under Specificity Scoring.
The Specificity Score is a measure of how many off-target sites the gRNA could potentially bind to, and how similar the off-target sites are to the original sequence. This is calculated according to the method developed by the Zhang lab at MIT (see here for details). Each off-target site is given a score based on how similar it is to the original CRISPR site and where any mismatches occur (mismatches near the PAM site will affect binding more than mismatches further away from the PAM site). A higher score for an off-target site indicates a higher similarity to the original CRISPR site (and thus a higher likelihood of the CRISPR/Cas complex binding to the off-target site). The overall specificity score for a CRISPR site is 100% minus a weighted sum of off-target scores in the target genome. Thus, a higher specificity score indicates a better CRISPR site with few or weak potential off-target sites.
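The idea that a mismatch close to the PAM lowers an off-target score more than a PAM-distal mismatch can be illustrated with a toy calculation. The sketch below is deliberately simplified: the linear penalty is a hypothetical weighting invented for this example, not the published Zhang lab weights and not the values used by Geneious Prime, and it does not attempt to reproduce the overall Specificity Score.

```python
def mismatch_positions(guide, site):
    """0-based positions (5' to 3', PAM-distal first) where two targets differ."""
    if len(guide) != len(site):
        raise ValueError("indels are not considered; lengths must match")
    return [i for i, (a, b) in enumerate(zip(guide.upper(), site.upper()))
            if a != b]


def toy_off_target_score(guide, site):
    """Illustrative similarity score in percent (100 = identical to the guide).

    HYPOTHETICAL weighting: each mismatch multiplies the score by
    (1 - penalty), where the penalty rises linearly from 0.1 at the 5'
    (PAM-distal) end to 0.9 at the 3' (PAM-proximal) end.
    """
    n = len(guide)
    score = 100.0
    for i in mismatch_positions(guide, site):
        penalty = 0.1 + 0.8 * (i / max(n - 1, 1))
        score *= 1.0 - penalty
    return score


guide = "G" + "A" * 19
distal = "C" + "A" * 19            # single mismatch far from the PAM
proximal = "G" + "A" * 18 + "C"    # single mismatch next to the PAM
print(toy_off_target_score(guide, distal) > toy_off_target_score(guide, proximal))
```

With this toy weighting, a site that differs only at the PAM-distal end still looks very similar to the guide, while one that differs next to the PAM scores much lower and is therefore a weaker off-target candidate.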
The off-target database is typically the whole genome of your organism of interest, but can include other sequences, for example, the targeting vector. You can make an off-target database by creating a new, empty folder in your Geneious database and importing the sequences you wish to use. Because researchers may wish to test against a wide variety of genomes, genome sequences can be very large, and new versions of genome assemblies may be released, Geneious does not contain inbuilt copies of any full genome sequences. Genomes can be downloaded directly from NCBI using the NCBI folders at the bottom of the Sources panel. Commonly studied genomes (for example, the human, zebrafish and rat genomes) can be downloaded from NCBI using the links found in the Genomes folder in the Geneious Sample Documents. Genomes can also be downloaded from other sources and imported into Geneious using common file formats.
In this tutorial the off-target database will be the Saccharomyces cerevisiae S288c genome, and this has been provided for you in the Yeast genome subfolder within this tutorial. Click the folder icon in the Specificity Scoring panel and choose the Yeast genome folder from the folder selector.
Your settings should now look as in the screenshot below, and you can click OK to run the analysis.
Note: if we had chosen to look for sites within a selected subregion of a larger sequence, but had not selected an off-target database for Specificity Scoring, Geneious would automatically test against the unselected region of the sequence for off-target binding.
Once the analysis has run, you’ll see the following message:
This message appears because the LYP1 CDS sequence we are searching for CRISPR sites in is also present in the off-target database, as it is part of the yeast genome. Matches within this region are ignored for off-target scoring as they are likely to be the guide site itself.
Click OK to clear this message, and you should now see a new annotation track containing CRISPR sites with the “GN(20)GG” motif. There should be 41 CRISPR site annotations on this track.
The annotations on the track are colored according to their Activity Score. This score is a number between 0 and 1, with high numbers denoting higher expected activity. The annotation coloring is a gradient from red to green, with lower values in red and higher values green.
To change the coloring to reflect the Specificity Score, click on the down arrow next to the track name and choose Color by / Heatmap, then choose “Zhang (2013) Specificity Score” from the list. As with Activity scoring, lower scores are colored red and higher scores (denoting fewer or weaker offsite targets) are colored green. The Color by / Heatmap window also allows you to see and edit the values of the numerical scale used for the heatmap coloring.
Click the Save button to save the annotation track on the sequence.
To see the actual scores for each site, mouse over the annotation. You will then see a pop-up window with information about that site.
To view the scores in a tabular format, open the Annotations tab. To show only the CRISPR annotations in the annotations table, click Type and select CRISPR. Click Columns and tick #Off-target sites, Doench (2016) Activity Score and Zhang (2013) Specificity Score (these may already be selected). These columns should now be visible in the annotations table.
Clicking the name of a column in the annotations table will sort the rows of the table by the values in that column. A small triangle will appear next to the column name indicating whether the rows have been sorted from smallest to largest value or vice versa. Clicking the column name again will reverse the sort direction. Sort the rows of the annotations table from lowest to highest Specificity Score.
From this table you’ll see that many CRISPR sites have specificity scores of 100%, meaning they have no off-target matches that fit the criteria of no indels and 3 or fewer mismatches with the guide RNA (although they may have additional off-target sites with more than 3 mismatches). CRISPR guide 1 has the lowest specificity score. Select this row in the table and return to the Sequence View. The “CRISPR guide 1” annotation should now be selected in the sequence view. Hold the mouse over the annotation so a popup window appears. This tool tip contains more information about this CRISPR site.
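The off-target criteria mentioned above (no indels and 3 or fewer mismatches) can be sketched as a simple sliding-window comparison. This is a minimal illustration under stated assumptions: the function name scan_off_targets is hypothetical, only the forward strand is scanned, and PAM matching is left out for brevity. It does not reflect how Geneious Prime performs the search internally.

```python
def scan_off_targets(guide, genome, max_mismatches=3):
    """Return (start, mismatches) for each window that could be an off-target site.

    Windows have the same length as the guide (so no indels are allowed) and
    are kept when they differ from the guide at max_mismatches or fewer
    positions. Assumption: forward strand only, PAM not checked.
    """
    guide = guide.upper()
    genome = genome.upper()
    k = len(guide)
    hits = []
    for start in range(len(genome) - k + 1):
        window = genome[start:start + k]
        mismatches = sum(a != b for a, b in zip(guide, window))
        if mismatches <= max_mismatches:
            hits.append((start, mismatches))
    return hits
```

An exact match is reported with 0 mismatches, which corresponds to the situation described for CRISPR guide 1, whose single off-target site is identical to the guide.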
The Specificity score for this site is 83.33%. There is only one off-target binding site for this sequence in the Saccharomyces cerevisiae genome, but it is an exact match for the CRISPR guide so has an off-target score of 100%. This site is in the ALP1 CDS at position 136,903→136,925 of Chromosome 14.
For CRISPR guides with multiple off-target sites, only the top five will be listed in the tool tip. This information can also be viewed, sorted and exported from the Annotations tab.
Exercise 2: Finding paired CRISPR sites
One method for improving the specificity of CRISPR/Cas9 targeting is to use the mutant Cas9-D10A nickase with a pair of guide RNAs complementary to opposite strands of the target site. The two individual nicks on opposite strands simulate a double-stranded break, which then leads to non-homologous end joining. Off-target interactions are minimized because any single off-target nick will be repaired with much higher fidelity by the base excision repair pathway.
In this exercise you will use Find CRISPR Sites to find paired sites on the LYP1 CDS. Select the LYP1 CDS document and go to Annotate and Predict→Find CRISPR sites.
Keep the same settings as for the previous exercise, and also tick the box next to Pair CRISPR sites. You can specify the maximum allowable overlap of the sites returned, and the maximum space allowed between the paired sites. The maximum overlap and maximum space between sites are measured from the 5′, PAM-distal end of the CRISPR sites. We will use the default options of 6 and 16 respectively.
This time we will color the CRISPR track by the paired score, so set Color CRISPR Sites by to Paired Score.
The dialog box should now look like the image below. Click OK to run the search.
After the analysis has run you should see a second CRISPR sites track containing three pairs of CRISPR sites.
When using the Pair CRISPR sites option, sites will only be annotated if they have at least one pair conforming to the maximum overlap and maximum space between sites settings. Sites which do not have a pair within these settings are not annotated.
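The pairing filter described above can be sketched as follows. This is one possible reading of the settings, not the Geneious Prime implementation: it assumes that a pair consists of a reverse-strand site upstream of a forward-strand site (so the PAMs face outward), that sites are given as 0-based, half-open intervals on the forward strand, and that the offset between the 5′, PAM-distal ends is negative when the sites overlap. The Site tuple and function names are hypothetical.

```python
from collections import namedtuple

# 0-based, half-open coordinates on the forward strand (assumed convention).
Site = namedtuple("Site", "name start end strand")


def five_prime_offset(rev_site, fwd_site):
    """Signed distance between the PAM-distal (5') ends of a '-' and a '+' site.

    For a '-' site the 5' end is at `end`; for a '+' site it is at `start`.
    Positive values are a gap in bases, negative values an overlap.
    """
    return fwd_site.start - rev_site.end


def find_pairs(sites, max_overlap=6, max_space=16):
    """Return (reverse_name, forward_name, offset) for every allowed pair."""
    fwd = [s for s in sites if s.strand == "+"]
    rev = [s for s in sites if s.strand == "-"]
    pairs = []
    for r in rev:
        for f in fwd:
            offset = five_prime_offset(r, f)
            if -max_overlap <= offset <= max_space:
                pairs.append((r.name, f.name, offset))
    return pairs


def paired_sites(sites, **limits):
    """Names of sites that have at least one valid partner and so would be annotated."""
    names = set()
    for r, f, _ in find_pairs(sites, **limits):
        names.update((r, f))
    return names
```

The defaults of 6 and 16 mirror the maximum overlap and maximum space used in this exercise. Sites that are missing from the result of paired_sites correspond to those left unannotated when Pair CRISPR sites is ticked.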
Each pair of CRISPR sites has a combined “paired CRISPR score”. If you hold your mouse over one of the CRISPR annotations, you can see the paired CRISPR score for this site. This combined score is the mean of the specificity scores of the two individual CRISPR sites. The pairs of CRISPR sites with the highest combined scores are linked. Each CRISPR site will be linked to its highest-scoring paired site, unless that second site has an even higher-scoring pairing with another site. Each pair of sites is colored according to its paired score, instead of the individual CRISPR scores.
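The paired score and the linking rule can be sketched in a few lines of Python. The mean comes directly from the description above. The greedy linking (taking candidate pairs from highest to lowest score and linking a pair only if neither site is already linked) is an assumption about how the rule could be realized, and the function names are hypothetical.

```python
def paired_score(specificity_a, specificity_b):
    """Mean of the two individual specificity scores (in percent)."""
    return (specificity_a + specificity_b) / 2.0


def link_pairs(candidate_pairs, specificity):
    """Greedily link sites into pairs, best paired score first.

    candidate_pairs: iterable of (site_a, site_b) names that satisfy the
    overlap and spacing settings.
    specificity: dict mapping site name to its specificity score in percent.
    """
    scored = sorted(
        ((paired_score(specificity[a], specificity[b]), a, b)
         for a, b in candidate_pairs),
        reverse=True,
    )
    linked, used = [], set()
    for score, a, b in scored:
        # Skip a pairing if either site already has a better-scoring partner.
        if a not in used and b not in used:
            linked.append((a, b, score))
            used.update((a, b))
    return linked
```

For example, if site r1 could pair with either f1 or f2, it is linked to whichever gives the higher mean, and the other site remains free to link with a different partner.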