Transferring Annotations Tutorial

Learn how to quickly annotate a sequence by transferring annotations from related sequences. This tutorial provides instruction on three methods for transferring annotations between sequences.

DOWNLOAD THE TRANSFERRING ANNOTATIONS TUTORIAL

Tutorial Instructions

Geneious Prime tutorials are installed by either 'Dragging and dropping' the zip file into Geneious Prime or using File → Import → From File... in the Geneious Prime menu. Do not unzip the tutorial.

Transferring Annotations Between Sequences

Note: To complete the tutorial with the referenced data please download and install the tutorial above into Geneious Prime.

Annotations provide descriptive information about the structure and/or function of features within DNA, RNA or protein sequences. Annotations may sometimes be grouped into “tracks” and sometimes span multiple intervals, for example, when a CDS comprises multiple exons.

In Geneious Prime, annotations are displayed graphically in the Sequence Viewer panel, where they can be edited and their display can be controlled via the Annotations & Tracks tab. Annotations are depicted as boxes, or as arrows if the feature has a defined orientation.

Annotations are the equivalent of “Features” found in GenBank files. See our Knowledge Base post for more information on standard annotations/features used in Geneious and GenBank.

Geneious has numerous tools and third-party plugins for adding new annotations to sequences, most of which can be found under the Annotate and Predict menu. This tutorial focuses only on methods for transferring existing annotations between sequences.

Transferring annotations allows you to quickly annotate a sequence using features annotated on related sequences.

There are three ways to transfer annotations between sequences within Geneious, which can be summarized as being either:

  • by alignment – Using “Copy to…” – useful for transferring individual annotations or for shorter alignments where homology is shared across the entire length of the alignment
  • by homology – Using “Annotate From” with a custom feature database – useful for annotating unaligned sequences with a diverse selection of features
  • by homology – Using “Transfer Annotations” – useful for transferring features to a reference or consensus sequence in an alignment or assembly, where homology is shared across some or all regions of the alignment

This tutorial provides exercises demonstrating how to use each of the above methods.

This tutorial requires the MAFFT alignment plugin. If you do not already have it installed, go to the Geneious menu Tools → Plugins… and install the MAFFT Multiple Alignment tool from the list of Available Plugins.

Exercise 1: Transferring annotations using “Copy to…”
Exercise 2: Live annotation using “Annotate From”
Exercise 3: Live annotation using “Transfer Annotations”
Summary

Exercise 1: Transferring annotations by alignment using the “Copy to…” function

The Copy to… method for transferring annotations requires that you have an alignment or assembly of two or more homologous sequences that have differing annotations that you would like to transfer or combine.

This method does not compare or consider sequence similarity as it assumes accurate alignment of homologous features across the breadth of the alignment.

This method can be used to transfer individual annotations, groups of annotations or the entire annotation set from one sequence to any other sequence within the alignment, or to the consensus sequence of the alignment.

If the alignment retains links to the parent sequences used to generate the alignment, then you will be given the option to apply the transferred annotations to the parental sequences when you save the changes to the alignment.
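The coordinate-only behaviour described above can be illustrated with a short Python sketch. This is not Geneious code: the function name `copy_interval` and the toy alignment are invented for illustration, and the logic is only an assumption about what a transfer by alignment position amounts to. Note that the bases themselves are never compared; only gap positions matter.

```python
def copy_interval(src_aln, dst_aln, start, end):
    """Map a 1-based inclusive interval on the ungapped source sequence
    onto ungapped target coordinates, using alignment columns only."""
    if len(src_aln) != len(dst_aln):
        raise ValueError("aligned sequences must have equal length")
    src_pos = 0      # residues seen so far in the source
    dst_pos = 0      # residues seen so far in the target
    new_start = None
    new_end = None
    for s, d in zip(src_aln, dst_aln):
        if s != "-":
            src_pos += 1
        if d != "-":
            dst_pos += 1
        # A column belongs to the annotation if its source residue does.
        in_interval = s != "-" and start <= src_pos <= end
        if in_interval and d != "-":
            if new_start is None:
                new_start = dst_pos
            new_end = dst_pos
    if new_start is None:
        return None  # the annotation falls entirely in a target gap
    return new_start, new_end


# Hypothetical two-sequence alignment ('-' marks a gap).
source = "ATGCC-GTAA"
target = "A-GCCTGTAA"
print(copy_interval(source, target, 3, 7))  # (2, 7)
```

In this toy case, source bases 3-7 land on target bases 2-7 because of the gaps. The result would be identical even if the aligned bases did not match at all, which is why the quality of the alignment determines the quality of the transfer.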

Exercise – Annotating a mitochondrial genome sequence

In this exercise we will transfer annotations from a published annotated sequence of the emu mitochondrial genome to a “new” unannotated kiwi mitochondrial genome. The sequences are provided with this tutorial and are named Mitochondrion_Emu and Mitochondrion_Kiwi_1.

Select the two files from the document list, select the Sequence view panel, select the “General” tab, and make sure the option to display annotations is turned on.

You should see two sequences in the Sequence viewer panel, the emu sequence with annotations, and the kiwi sequence without annotations.

The first thing we need to do is to create an alignment of these sequences, as they differ by about 300 bp in length.

With the two files selected, click on Align/Assemble → Pairwise align, select the MAFFT aligner and click OK to align the two genomes. This will create an alignment file called Nucleotide Alignment. Select this file from the File list to view it. If you zoom in you will see that the sequences share high similarity across most of the alignment.

The next step is to perform the annotation transfer. We will transfer all annotations from the emu sequence to the kiwi sequence. To do this, right click (or Alt/CTRL-click) on the Mitochondrion_Emu sequence title. This will select the sequence and all annotations associated with it, and display a contextual menu. From the menu, select Annotation → Copy all in selected region to → Mitochondrion_Kiwi_1.

Once you have “copied all” you should see all of the annotations now added to the kiwi sequence. Save the alignment. Because the alignment is linked back to the parental sequences, you should be given the option to “Apply the changes to the parental sequences”. Make sure you choose Yes to apply the changes to the Mitochondrion_Kiwi_1 sequence.

Note that if you had wanted to transfer only a single feature, or a single class of feature (for instance only CDS’s), then right clicking on an individual feature will change the contextual menu options to allow you to do this.

Select the Mitochondrion_Kiwi_1 file and zoom out if required. You should see that it now contains all of the transferred annotations. Hovering the mouse over any of the annotations will show you details of the transferred annotations. For CDS annotations this includes an automatic translation of the region spanned by the annotation coordinates.

You may notice that the source annotation has also been transferred (the thick blue line labelled source Dromaius novaehollandiae – you may need to turn on display of Source annotations in the Annotations tab to see this annotation type). Double click on the blue Source annotation to edit the annotation and change the Name: to source Apteryx owenii, the binomial name for the kiwi. Before closing the Edit Annotations window, you should also click on Properties, then double click on the organism: property and change the property value to Apteryx owenii. Also, click on interval and edit the interval so that it covers the entire genome sequence (1-17,020 bp).

If you hover over the ND4 CDS annotation on the kiwi sequence (bases 10,240-11,613) you will see that the automatic translation includes two extra amino acids after a stop codon.

We will now correct this error.  Go to the Display tab and ensure that Translation is turned on, and the Frame: to display is set to By selection or annotation.

If you select the 3′ end of the ND4 CDS and zoom in using the Full Zoom tool, you will see the kiwi CDS actually terminates two codons earlier than the emu homolog. You may also notice that the stop codon is a non-standard AGA. This has been called because the transferred ND4 annotation contains information specifying the genetic code, and is using translation table 2 for vertebrate mitochondrial genomes (see http://www.ncbi.nlm.nih.gov/Taxonomy/Utils/wprintgc.cgi for more information).

To correct the discrepancy between the automatic translation and the predicted translation, select the end of the CDS annotation feature and drag it so that it ends at the AGA stop codon. You should also drag and adjust the corresponding ND4 gene annotation.
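The boundary check above can also be reproduced outside Geneious with a small Python sketch. It translates a sequence using NCBI translation table 2 (vertebrate mitochondrial), in which AGA and AGG are stop codons, and reports where the first in-frame stop ends. The function names and the 15-base toy sequence are hypothetical and are not taken from the kiwi genome.

```python
from itertools import product

# NCBI translation table 2 (vertebrate mitochondrial), codons ordered
# with bases T, C, A, G at each position. Differences from the standard
# code: AGA/AGG = stop, ATA = Met, TGA = Trp.
BASES = "TCAG"
TABLE_2 = "FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIMMTTTTNNKKSS**VVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), TABLE_2)
}


def translate(seq):
    """Translate frame 1 of seq; unknown codons become 'X'."""
    seq = seq.upper()
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X")
        for i in range(0, len(seq) - 2, 3)
    )


def first_stop_end(cds):
    """Return the 1-based position of the last base of the first
    in-frame stop codon, or None if there is no stop."""
    stop_index = translate(cds).find("*")
    return None if stop_index == -1 else (stop_index + 1) * 3


# Toy CDS whose annotation runs two codons past an AGA stop.
toy_cds = "ATGCTAAGACCATGA"
print(translate(toy_cds))       # ML*PW
print(first_stop_end(toy_cds))  # 9
```

Here the translation shows two residues after the stop, mirroring the ND4 case, and the reported position is where the annotation should be made to end.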

This exercise has demonstrated how the Copy To: function allows you to rapidly transfer annotations between homologous sequences. It has also demonstrated that this method of transfer is only as good as the quality of the alignment. You should always double check the boundaries of all annotations that you have transferred to make sure they are correct.

Exercise 2: Transferring annotations by homology using the “Annotate From” tool

The Annotate from Database tool is found on the Live Annotate and Predict tab associated with the Sequence Viewer panel.   This tool uses a Blast-like search to identify and annotate any features on your sequence that share homology with features found in a custom database. This tool can also be accessed from the Geneious Annotate & Predict menu.

The Annotate from Database tool, as the name suggests, requires that you create a database of the annotations that are likely to be found in your sequence. A database comprises a Geneious folder containing annotated sequences of interest. The database may contain sequences with multiple annotations, for instance a related genome, and/or annotated sequences that encode single features.

When first installed, Geneious provides a Sample Documents folder that contains an example annotation database. This folder is normally located within your Geneious database at

/Local/Sample Documents/PlasMapper features

If for any reason you do not have the sample documents in your local database you can download them by going to Geneious menu File → Import → Sample Documents.

The Plasmapper Features database contains plasmid-related annotated sequences and is derived from the database used by PlasMapper (see http://wishart.biology.ualberta.ca/PlasMapper/). This database can be used to annotate plasmids with many common features, including promoters, terminators, origins of replication (ori’s) and selection markers. Take a look at the sequences within the Plasmapper folder to see what a database can look like.

2A. Annotation of a plasmid sequence

In this exercise we will annotate an unannotated plasmid sequence using the Plasmapper folder as a database.

Select the unannotated plasmid sequence PROEXHTA, select the Live Annotation & Predict tab, and turn on Annotate from…. If the Source: folder is not already set to “PlasMapper Features”, click on the Source: name and use the Select Feature Folder window to navigate to

/Local/Sample Documents/Plasmapper Features

then click OK.

Straight away you should see a number of features appear on the plasmid sequence. These features appear because they share 100% similarity with annotated features present in the Plasmapper database.

Use the slider in the Annotate From… tab to decrease the % Similarity required for a match. Drop it to 98% and you will see all of the major features of the plasmid appear. Click the Apply button to add the matched features to the plasmid, then save the document. You now have an annotated plasmid sequence.
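The effect of the % Similarity slider can be pictured with the following Python sketch. It is a deliberately naive, ungapped scan and only an approximation: the tool itself is described as using a Blast-like search, which this code does not reproduce. The function names and sequences are hypothetical.

```python
def percent_identity(a, b):
    """Percent of positions at which two equal-length strings match."""
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


def find_feature(sequence, feature, min_identity):
    """Slide the feature along the sequence and return every window
    (1-based start, end, identity) at or above the threshold."""
    hits = []
    n = len(feature)
    for i in range(len(sequence) - n + 1):
        identity = percent_identity(sequence[i:i + n], feature)
        if identity >= min_identity:
            hits.append((i + 1, i + n, identity))
    return hits


# Hypothetical target and database feature differing at one base.
target = "CCCTTGACTGGG"
feature = "TTGACA"

print(find_feature(target, feature, 100))  # []
for start, end, identity in find_feature(target, feature, 80):
    print(start, end, round(identity, 1))  # 4 9 83.3
```

At 100% the imperfect copy of the feature is missed, and lowering the threshold to 80% recovers it. Lowering the threshold further would eventually admit spurious matches, which is why matched features should be reviewed before clicking Apply.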

2B. Annotation of a mitochondrial genome using a custom annotation database

In Exercise 2B, as in Exercise 1, we will take an unannotated kiwi mitochondrial genome, this time annotating it using a simple database comprising only the emu mitochondrial genome.

Step 1: Creating your own annotation database

To create a database we first need to create a folder to hold our annotated sequence. Right click (or ALT/CTRL-click) on the /Local folder in your Geneious Sources list and choose the option for New Folder. Give your new folder an appropriate name (in this example we’ll use the name Emu database), then click OK.

Now copy the Mitochondrion_Emu file in this Tutorial folder and paste it into the new Emu database folder. That’s it, you now have a very simple annotation database.

Step 2: Annotating your kiwi sequence

Switch back to the tutorial folder and select the unannotated sequence file called Mitochondrion_Kiwi_2 located in the Annotation Tutorial folder. Select the Sequence View panel, and click on the Live Annotate and Predict tab.

To set the Emu database folder as an Annotation database, click on Source: and use the Select Feature Folder window that opens to navigate to the Emu Database folder.

Once you have specified the database, the live annotation tool will go to work comparing your sequence to all annotated features found in sequences within the database folder. For large databases, a progress bar will appear showing that the live annotation search is in progress. Adjust the % Similarity slider downwards until no new features appear in the Sequence Viewer. You should find that below about 45% similarity no new features appear on the kiwi sequence.

If you are happy that the majority of features have been identified, click the Apply button to permanently add the annotations to your sequence.

If you hover over any newly added annotation in the sequence viewer window, a yellow pop-up note will appear showing data relating to the annotation, including the Hit name, feature type, gene product function (if known) and a predicted translation if the feature is a coding sequence (CDS).

Note that the Annotate From… tool has also transferred the Source annotation from the emu file (colored blue). As in Exercise 1, you should edit the Source annotation to specify Apteryx owenii as the source organism, and correct the organism: property and the annotation interval.

Once you have completed checking and editing the transferred annotations, Save the sequence.

In the yellow pop up you will also see the new annotation shows the “Transferred Translation” of the matching emu CDS. To delete the emu translations from all of your CDS annotations, select the Mitochondrion_Kiwi_2 file, click on the Annotations tab, and in the search field type CDS to display annotations of type CDS. Then click in the Annotation table and use command/control-A to select all, then select Edit Annotations. From this window, remove the Transferred Translation property.

Finally, as seen in Exercise 1, because these annotations are transferred based on shared homology with an annotated feature, there may be errors in CDS and gene ranges due to slight differences in gene product sizes. Double check all of the newly annotated features to ensure the boundaries and translations make sense. Adjust the annotation ranges if required.

This exercise has demonstrated how the Annotate from: function allows you to rapidly transfer annotations to a sequence, based on the nucleotide similarity between the annotations and the sequence. In the next exercise we will use protein annotations instead of nucleotide annotations to annotate our sequence.

2C. Annotation using a protein database

The Annotate From… tool also allows you to transfer annotations from protein sequences. In this exercise we will use a list of annotated proteins as our annotation database. Select the list Mitochondrion_Emu_CDS to view the list of annotated proteins. If you zoom in you will see that each protein has a stop at the end. This is required for proper annotation of a complete CDS. As in Exercise 2B Step 1, create a new folder, this time calling it Protein DB, and place the Mitochondrion_Emu_CDS list in the new folder.

Next, select the unannotated sequence Mitochondrion_Kiwi_4 and go to the Live Annotation & Predict tab, check the option to Annotate from…, set the Source: folder to the new Protein DB folder, then hit the Advanced button and make sure the option for Translation Search is checked, then click Done.

The translation search translates the nucleotide sequence in all 6 frames for comparison to the protein sequences in the annotation database. Adjust the Similarity slider to ensure all matches are found, then hit Apply and Save to permanently add the CDS annotations to the sequence.
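As a rough illustration of what translating “in all 6 frames” means, the Python sketch below produces the three forward and three reverse-complement translations of a nucleotide sequence. It assumes NCBI translation table 2 (vertebrate mitochondrial) because the exercise uses a mitochondrial genome; the function names, frame labels and toy sequence are hypothetical, and how Geneious then compares the translations to the protein database is not shown.

```python
from itertools import product

# NCBI translation table 2 (vertebrate mitochondrial), codons ordered
# with bases T, C, A, G at each position.
TABLE_2 = "FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIMMTTTTNNKKSS**VVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product("TCAG", repeat=3), TABLE_2)
}


def translate(seq):
    """Translate frame 1 of seq, ignoring a trailing partial codon."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X")
        for i in range(0, len(seq) - 2, 3)
    )


def reverse_complement(seq):
    """Return the reverse complement of an A/C/G/T sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def six_frame(seq):
    """Return translations keyed by frame label: +1..+3 and -1..-3."""
    seq = seq.upper()
    frames = {}
    for strand, strand_seq in (("+", seq), ("-", reverse_complement(seq))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(strand_seq[offset:])
    return frames


for label, protein in six_frame("ATGCTAAGA").items():
    print(label, protein)
```

Each of the six translations can then be searched against the protein sequences, which is why a CDS on either strand and in any reading frame can be found.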

Exercise 3: Using the “Transfer Annotations” tool

The Transfer Annotations tool will appear in the Live Annotate & Predict tab if you have selected an alignment file. This tool can also be accessed from the Geneious “Annotate & Predict” menu. It works in the same manner as the Annotate From tool. However, instead of using a database of annotated regions, it uses annotated regions found within one or more of the sequences in the alignment and compares and matches them to regions within the reference or consensus sequence. The reference sequence can be any sequence in your alignment, and can be defined by right clicking (Alt/Control-click) on a sequence and choosing Set as Reference Sequence.

In this exercise we will again transfer annotations from the emu mitochondrial genome to the kiwi mitochondrial genome.

The basic steps for using this tool are:

1. Use an alignment tool to align the sequence you wish to annotate with a homologous sequence that contains the annotations you want to transfer.

2. Use “Set as reference sequence” to designate the sequence you wish to transfer annotations to.

3. Use the Transfer Annotations tool, apply and save.
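The idea behind these steps can be sketched in Python as follows. This is only a simplified picture under stated assumptions: each annotation on an aligned sequence is scored by percent identity against the reference over the alignment columns it spans, and it is kept if the score meets a threshold. The actual matching performed by Geneious may differ, and the function names, annotation names and alignment are all hypothetical.

```python
def region_identity(src_aln, ref_aln, start, end):
    """Percent identity between source and reference over the alignment
    columns spanned by source residues start..end (1-based, inclusive)."""
    src_pos = 0
    matches = 0
    columns = 0
    for s, r in zip(src_aln, ref_aln):
        if s != "-":
            src_pos += 1
        if src_pos < start or (s == "-" and r == "-"):
            continue  # before the annotation, or an all-gap column
        columns += 1
        if s == r:
            matches += 1
        if s != "-" and src_pos == end:
            break  # reached the last residue of the annotation
    return 100.0 * matches / columns if columns else 0.0


def transferable(src_aln, ref_aln, annotations, min_identity):
    """Return names of annotations whose region meets the threshold.
    annotations maps a name to a (start, end) interval on the source."""
    return [
        name
        for name, (start, end) in annotations.items()
        if region_identity(src_aln, ref_aln, start, end) >= min_identity
    ]


# Hypothetical alignment: annotated source and unannotated reference.
source = "ATGCC-GTAA"
reference = "A-GCCTGTAA"
annotations = {"geneA": (3, 7), "geneB": (1, 2)}

print(transferable(source, reference, annotations, 80))  # ['geneA']
print(transferable(source, reference, annotations, 40))  # ['geneA', 'geneB']
```

Lowering the threshold admits the more divergent region as well, which corresponds to dragging the % Similarity slider down until all expected features are found.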

Exercise 3 – Using Transfer Annotations

In this exercise, as with Exercise 1, we will transfer annotations from the published annotated sequence of the mitochondrial genome of the emu to a “new” unannotated genome of the kiwi.

The sequences are provided with this tutorial and are named Mitochondrion_Emu and Mitochondrion_Kiwi_3. Select the two files from the document list, select the Sequence view panel, select the General tab, and make sure the option to Display Annotations is turned on.

The first step before annotation transfer is to align the sequences.

With the two files selected, click on Align/Assemble → Pairwise align, select the MAFFT aligner and click OK to align the two genomes. This will create an alignment file called Nucleotide Alignment 2. Select this file to view it, then right click on the Mitochondrion_Kiwi_3 sequence and choose Set as Reference Sequence. Save your document. If you are asked whether you wish to apply changes to the original sequences, click Yes.

Now go to the Transfer Annotations tool that will have appeared in the Live Annotate and Predict panel. Drag the slider to reduce the % Similarity stringency until all features are found. Click Apply to permanently transfer the annotations to the reference sequence, then Save to save the alignment and apply the annotations to the parental sequence.

As with the earlier exercises, you should always double check the boundaries of the transferred annotations to make sure they make sense and, if they are coding sequences, that the translation is correct.

Summary

The three Annotation-transfer methods described in this tutorial are all complementary, and often any one of the three methods could be used to do the same job with the same data-set.  If you worked through all exercises in this tutorial you will have seen that all three methods were used to transfer annotations between related mitochondrial genomes.

The tool you decide to use depends in part on whether you are working with closely or distantly related sequences and whether you have homologous annotated sequences suitable for transfer by alignment.

Be aware that generally the “Copy to…” method should only be used for wholesale transfer of annotations within an alignment if the sequences share homology across their entire length.  Please consult the Geneious Manual for further information on transfer of annotations.

If you want to add the genome to your database so that you can use it in Geneious, drag and drop the files you want to your Local folders. You can do this either before or after downloading the full document.