Transferring Annotations Between Sequences Tutorial
Learn how to quickly annotate a sequence by transferring annotations from related sequences. This tutorial provides instruction on three methods for transferring annotations between sequences.
Annotations provide descriptive information about the structure and/or function of features within DNA, RNA or protein sequences. Annotations may sometimes be grouped into “tracks” and can span multiple intervals, for example, when a CDS comprises multiple exons.
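As a rough illustration of the idea that one annotation can span several intervals, the following Python sketch models a feature as a name, a type and a list of intervals. The `Annotation` class, its field names and the example sequence are hypothetical and invented for this illustration; they are not Geneious Prime's internal data model.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Annotation:
    """A hypothetical feature record; intervals are 1-based and inclusive."""
    name: str
    feature_type: str
    intervals: List[Tuple[int, int]]
    qualifiers: Dict[str, str] = field(default_factory=dict)

    def length(self) -> int:
        # Total number of bases covered by all intervals.
        return sum(end - start + 1 for start, end in self.intervals)

    def extract(self, sequence: str) -> str:
        # Join the bases under each interval in order (forward strand only).
        return "".join(sequence[start - 1:end] for start, end in self.intervals)


# A made-up CDS with two "exons" on a made-up sequence.
sequence = "ATGGCCTTTGAATAAGG"
cds = Annotation("example CDS", "CDS", [(1, 6), (10, 15)])
spliced = cds.extract(sequence)
```

Here `spliced` joins bases 1-6 and 10-15, which is what a viewer has to do before it can translate a multi-interval CDS.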
In Geneious Prime, annotations are displayed graphically in the Sequence Viewer panel where they can be edited. Their display can be controlled via the Annotations & Tracks tab. Annotations are depicted graphically as boxes or as arrows if the feature has a defined orientation.
Annotations are the equivalent of “Features” in GenBank files. See our Knowledge Base post for more information on standard annotations/features used in Geneious and GenBank.
Geneious Prime has numerous tools and external plugins for adding new annotations to sequences, most of which can be found under the Annotate and Predict menu. This tutorial focuses only on methods for transferring existing annotations between sequences.
Transferring annotations allows you to quickly annotate a sequence using features already annotated on related sequences.
There are three ways to transfer annotations between sequences within Geneious Prime, which can be summarized as being either:
- by homology – Using “Annotate from Database” with a custom database – Useful for annotating unaligned sequences with a diverse selection of annotations
- by homology – Using “Transfer Annotations” – Useful for transferring annotations to a reference or consensus sequence in an alignment or assembly, where homology is shared across some or all regions of the alignment
- by alignment – Using “Copy To…” – Useful for transferring individual annotations or for shorter alignments where homology is shared across the entire length of the alignment
This tutorial provides exercises demonstrating how to use each of the above methods. If you are unsure about which method to use then we recommend you use the Annotate from Database tool.
This tutorial requires the MAFFT alignment plugin. To install it, go to the menu Tools → Plugins… and choose MAFFT Multiple Alignment from the list of Available Plugins.
To complete the tutorial yourself with included sequence data, download the tutorial and install it by dragging and dropping the zip file into Geneious Prime. Do not unzip the tutorial.
Exercise 1: Transferring annotations using the “Annotate from Database” tool
The Annotate from Database tool, as the name suggests, requires that you create a database of annotations that are likely to be found in your sequence. A database comprises a folder containing annotated sequences of interest. The database may contain a sequence with multiple annotations, for instance a related genome, and/or annotated sequences that encode single features.
In Geneious Prime 2019.2 onwards, unannotated sequences can also be used in the Source folder, and Geneious will treat these as though they have a single misc_feature annotation across their entire length.
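The rule for unannotated database entries can be sketched in a few lines of Python. The function name and the tuple layout below are assumptions made for this illustration only; the sketch shows the behaviour described above, not how Geneious Prime implements it.

```python
def features_for_database_entry(sequence, annotations):
    """Return the features a database entry contributes.

    `annotations` is a list of (feature_type, start, end) tuples with
    1-based inclusive coordinates (a hypothetical layout). An entry with no
    annotations is treated as one misc_feature covering its whole length.
    """
    if annotations:
        return list(annotations)
    return [("misc_feature", 1, len(sequence))]


annotated = features_for_database_entry("ATGAAATAA", [("CDS", 1, 9)])
unannotated = features_for_database_entry("ACGTACGTAC", [])
```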
Improvements in Geneious Prime 2020 and later
In Geneious Prime 2020 and later the Annotate from Database tool includes a new Adjust CDS boundaries option, which allows transferred CDS annotation boundaries to be automatically adjusted to match, within specified limits, the start and stop codons of a corresponding predicted ORF. This is on by default and will be used in the exercise below. Users of older versions of Geneious will need to check and, if required, manually adjust CDS ends (see Exercise 2).
Annotation of a sequence using an annotation database
In this exercise we will take an unannotated Kiwi mitochondrial genome and annotate it using a simple database comprising an annotated Emu mitochondrial genome sequence.
Step 1: Create an annotation database
To create a database, first create a new folder. Right click (or ALT/CTRL-click) on the Local folder in the Sources panel and choose New Folder. Name the new folder Emu database.
Next, copy the Mitochondrion_Emu file in the Transferring Annotations folder and paste it into the new Emu database folder. That’s it, you now have a very simple annotation database.
Step 2: Annotating your Kiwi sequence
Switch to the Transferring Annotations folder and select the unannotated sequence file called Mitochondrion_Kiwi_1. In the Sequence View panel, click on the Live Annotate and Predict tab, and then check the Annotate from… box.
To set the Emu database folder as an Annotation database, click on Source: and use the Select Feature Folder window that opens to locate and select the Emu database folder.
Once you have specified the database the Annotate From Database tool will begin comparing your sequence to all annotated features found on sequences within the database folder. For large databases, or large target sequences, a progress bar will appear showing that the “live” annotation search is in progress. Adjust the Similarity: slider. You should find that below about 50% similarity no new features appear on the Kiwi sequence.
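To give a feel for what a similarity threshold does, here is a minimal Python sketch that slides a database feature along a target sequence, records the best percentage identity, and accepts the hit only if it meets a cutoff. This is a deliberately naive, ungapped comparison with hypothetical function names and made-up sequences; the search that Geneious Prime actually performs is not described in this tutorial and is likely to be more sophisticated.

```python
def percent_identity(a, b):
    """Percentage of matching positions between two equal-length strings."""
    if len(a) != len(b):
        raise ValueError("sequences must be the same length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


def best_hit(feature, target):
    """Return (best percent identity, 1-based start) over all ungapped windows."""
    best = (0.0, None)
    for i in range(len(target) - len(feature) + 1):
        identity = percent_identity(feature, target[i:i + len(feature)])
        if identity > best[0]:
            best = (identity, i + 1)
    return best


def passes_threshold(feature, target, min_similarity):
    """True if the best window is at least min_similarity percent identical."""
    return best_hit(feature, target)[0] >= min_similarity


hit = best_hit("ACGTACGT", "TTACGTTCGTTT")
```

Lowering `min_similarity` admits weaker hits, which is the effect of dragging the Similarity: slider to the left.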
If you are happy that the majority of features have been identified, click the Apply button to add the annotations to your sequence.
If you hover over any newly added annotation in the sequence viewer window, a yellow pop-up note will appear showing data relating to the annotation, including the Hit name, feature type, gene product function (if known), percentage similarity, and a predicted translation if the feature is a coding sequence (CDS).
Once you have completed checking and editing the transferred annotations, Save the sequence.
You may notice that the blue Source annotation labelled source Dromaius novaehollandiae has also been transferred (you may need to turn on display of Source annotations in the Annotations tab). Double click on the blue Source annotation to edit it and change the Name: to source Apteryx owenii, the binomial name for the kiwi. Before closing the Edit Annotations window, you should also click on Properties, then double click on the organism: property and change the property value to Apteryx owenii. Also, click on interval and edit the interval so that it covers the entire genome sequence (1-17,020 bp).
If you hover over a CDS, the yellow pop-up will also show that the annotation has a “Transferred Translation” qualifier. If you plan to submit your newly annotated sequence to GenBank then you should delete the “Transferred Translation” qualifier from all CDS annotations. To do this, select the Mitochondrion_Kiwi_1 file, click on the Annotations tab, set Type to CDS, select all CDS listed in the Annotations table, then click Edit Annotations. In Properties select Transferred Translation, then hit Remove to remove the Transferred Translation qualifiers.
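The bulk edit described above amounts to deleting one qualifier from every feature of one type. A minimal Python sketch of that operation follows, using plain dictionaries with made-up feature names and placeholder translations. The dictionary layout and the function name are assumptions for illustration and do not correspond to a Geneious Prime API.

```python
def strip_qualifier(annotations, feature_type, qualifier):
    """Delete `qualifier` from every annotation of `feature_type`.

    Returns the number of annotations that were changed.
    """
    removed = 0
    for annotation in annotations:
        if annotation["type"] == feature_type and qualifier in annotation["qualifiers"]:
            del annotation["qualifiers"][qualifier]
            removed += 1
    return removed


# Made-up annotations; the translations are placeholders, not real proteins.
annotations = [
    {"type": "CDS", "name": "ND4", "qualifiers": {"Transferred Translation": "MLK"}},
    {"type": "gene", "name": "ND4", "qualifiers": {}},
    {"type": "CDS", "name": "COX1", "qualifiers": {"Transferred Translation": "MTF"}},
]
removed = strip_qualifier(annotations, "CDS", "Transferred Translation")
```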
Demonstration of automatic CDS end adjustment
In the above exercise all CDS annotations were automatically adjusted (when required) to match adjacent in-frame start and stop codons.
To demonstrate how CDS end adjustment works, we will turn off the option and repeat the transfer annotation step.
Select the Mitochondrion_Kiwi_1 file, go to the menu Edit → Go to Base, set Position: to 11607 and hit Go. This will zoom you into the end of the ND4 CDS.
Then click on the Live Annotate and Predict tab, check the Annotate from… box, hit the Advanced button and uncheck the option to Adjust CDS boundaries.
Adjust the Similarity: slider if required and in the Sequence Viewer you will now see a new uncorrected transferred CDS (marked below). This new uncorrected CDS does not terminate on a stop codon and is longer than the corrected CDS due to the Emu ND4 protein being three AA (9 bp) longer than the likely Kiwi ND4. In most cases you should keep the Adjust CDS boundaries option checked to ensure you transfer valid CDS annotations.
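The idea behind end adjustment can be sketched in Python as follows: walk the transferred CDS codon by codon from its start, and if the first in-frame stop codon lies within a set distance of the transferred end, move the end to that stop. The function name, the `max_shift` limit and the toy sequence are assumptions made for this sketch, and the stop codon set is that of the vertebrate mitochondrial code (translation table 2). Geneious Prime's own adjustment logic is not documented here and may differ.

```python
# Stop codons in the vertebrate mitochondrial code (NCBI translation table 2).
VERTEBRATE_MITO_STOPS = frozenset({"TAA", "TAG", "AGA", "AGG"})


def adjust_cds_end(sequence, start, end, stops=VERTEBRATE_MITO_STOPS, max_shift=30):
    """Return a 1-based inclusive CDS end that lands on an in-frame stop codon.

    The first in-frame stop downstream of `start` is used if it ends within
    `max_shift` bases of the transferred `end`; otherwise `end` is returned
    unchanged so that the annotation can be checked by hand.
    """
    seq = sequence.upper()
    search_limit = min(len(seq), end + max_shift)
    for codon_start in range(start - 1, search_limit - 2, 3):
        if seq[codon_start:codon_start + 3] in stops:
            new_end = codon_start + 3
            return new_end if abs(new_end - end) <= max_shift else end
    return end


# Toy example: ATG AAA CCC AGA GGG TTT AAA, transferred CDS spans bases 1-21.
toy = "ATGAAACCCAGAGGGTTTAAA"
adjusted = adjust_cds_end(toy, 1, 21)
unadjusted = adjust_cds_end(toy, 1, 21, stops=frozenset({"TAA", "TAG", "TGA"}))
```

In the toy example the end moves from base 21 to base 12 because AGA is a stop codon under table 2. With the standard stop codons there is no in-frame stop, so the end is left where it was.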
This exercise has demonstrated how the Annotate from Database function allows you to rapidly transfer annotations based on the nucleotide similarity between DNA sequences. In the next exercise we will use protein annotations instead of nucleotide annotations to annotate our sequence.
Annotation using a protein database
The Annotate From… tool also allows you to transfer annotations from protein sequences. In this exercise we will use a list of annotated proteins as our annotation database. Select the list Mitochondrion_Emu_CDS to view the list of annotated proteins. If you zoom in you will see that each protein has a stop at the end. This is required for proper annotation of a complete CDS. As above, create a new folder, this time calling it Emu Protein database, and place the Mitochondrion_Emu_CDS list in the new folder.
Next, select the unannotated sequence Mitochondrion_Kiwi_2 and go to the Live Annotate and Predict tab. Check the option to Annotate from… and set the Source: folder to the new Emu Protein database folder. Then hit the Advanced button, make sure the option for Translation Search is checked, and click Done.
The Translation search option will translate the nucleotide sequence in all 6 frames for comparison to the protein sequences in the annotation database. Adjust the Similarity slider to ensure all matches are found, then hit Apply and Save to permanently add the CDS annotations to the Mitochondrion_Kiwi_2 sequence.
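For readers unfamiliar with six-frame translation, the following Python sketch translates a nucleotide sequence in the three forward and three reverse-complement frames. It uses the vertebrate mitochondrial code (NCBI translation table 2), since this tutorial works with mitochondrial genomes. The function names and the short test sequence are invented for this illustration, and the sketch says nothing about how Geneious Prime then compares the translations to the database proteins.

```python
from itertools import product

# NCBI translation table 2 (vertebrate mitochondrial), codons ordered TCAG.
BASES = "TCAG"
TABLE_2_AMINO_ACIDS = "FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIMMTTTTNNKKSS**VVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), TABLE_2_AMINO_ACIDS)
}


def reverse_complement(seq):
    """Reverse complement of an unambiguous DNA sequence."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq):
    """Translate complete codons; unknown codons become 'X'."""
    seq = seq.upper()
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3))


def six_frame_translations(seq):
    """Return translations keyed '+1'..'+3' (forward) and '-1'..'-3' (reverse)."""
    frames = {}
    for strand, strand_seq in (("+", seq), ("-", reverse_complement(seq))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(strand_seq[offset:])
    return frames


frames = six_frame_translations("ATGATATGAAGA")
```

Under table 2, ATA codes for methionine, TGA for tryptophan and AGA is a stop, so frame `+1` of the test sequence reads `MMW*`.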
Exercise 2: Using the “Transfer Annotations” tool
The Transfer Annotations tool will appear in the Live Annotate and Predict tab if you have selected an alignment file. This tool can also be accessed from the Annotate & Predict menu.
The Transfer Annotations tool works in a similar manner to the Annotate from Database tool. However, instead of using a database of annotated sequences, this tool uses annotated regions found within one or more sequences in the alignment and compares and matches them to regions within the reference or consensus sequence. The reference sequence can be any sequence in your alignment, and can be defined by right clicking (Alt/Control-click) on a sequence and choosing Set as Reference Sequence.
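One way to picture this comparison is as a per-annotation identity check over the alignment columns that the annotation covers. The Python sketch below computes that identity between an annotated sequence and the reference, and keeps only annotations that meet a threshold. The function names, the treatment of gaps and the toy alignment are assumptions made for this illustration, not a description of the tool's actual scoring.

```python
def column_identity(source_aligned, reference_aligned, col_start, col_end):
    """Percent identity over alignment columns col_start..col_end (1-based).

    Columns where both sequences have a gap are ignored; a gap opposite a
    base counts as a mismatch (an assumption of this sketch).
    """
    pairs = zip(source_aligned[col_start - 1:col_end],
                reference_aligned[col_start - 1:col_end])
    compared = [(a, b) for a, b in pairs if not (a == "-" and b == "-")]
    if not compared:
        return 0.0
    matches = sum(a == b for a, b in compared)
    return 100.0 * matches / len(compared)


def annotations_to_transfer(annotations, source_aligned, reference_aligned, min_similarity):
    """Names of annotations whose aligned region meets the similarity cutoff."""
    return [
        name
        for name, (col_start, col_end) in annotations.items()
        if column_identity(source_aligned, reference_aligned, col_start, col_end) >= min_similarity
    ]


# Toy alignment with annotations given in alignment column coordinates.
source = "ATGCCCGGGTTT"
reference = "ATGCCAGGGAAA"
toy_annotations = {"gene1": (1, 9), "gene2": (10, 12)}
kept = annotations_to_transfer(toy_annotations, source, reference, 75)
```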
Note that the Transfer Annotations tool does not detect and adjust ends of CDS annotations to align with actual start and stop codons. We recommend you use the Annotate From Database tool if you are transferring CDS annotations.
In this exercise we will again transfer annotations from the emu mitochondrial genome to the kiwi mitochondrial genome.
The basic steps for using this tool are:
1. Perform a pairwise or multiple alignment to align the sequence you wish to annotate with a homologous sequence that contains the annotations you want to transfer.
2. Use Set as Reference Sequence to designate the sequence you wish to transfer annotations to.
3. Use the Transfer Annotations tool, then hit Apply and Save.
Using Transfer Annotations
In this exercise, as with Exercise 1, we will transfer annotations from the published annotated sequence of the mitochondrial genome of the emu to a “new” unannotated genome of the kiwi.
Select the Mitochondrion_Emu and Mitochondrion_Kiwi_3 files.
In the Sequence View panel, select the General tab, and make sure the option to Display Annotations is checked.
The first step before annotation transfer is to align the sequences.
With the two files selected, on the Toolbar click on Align/Assemble → Pairwise align, select the MAFFT aligner and click OK to align the two genomes. This will create an alignment file called Nucleotide Alignment. Select this file to view it, then right click on the Mitochondrion_Kiwi_3 sequence and choose Set as Reference Sequence. Save your document. When you are asked if you wish to apply changes to the original sequences, click Yes.
Now select the Live Annotate and Predict tab and check the box to turn on the Transfer Annotations tool. Drag the slider to reduce the %Similarity stringency until all features are found. Click Apply to permanently transfer the annotations to the reference sequence, then Save to save the alignment and apply the annotations to the parental sequence.
If you use the Transfer Annotations tool to transfer CDS annotations then you should always manually check the boundaries of the transferred CDS annotations to make sure the translation frame is correct and that the CDS begins and ends with appropriate start and stop codons.
Manually correcting CDS boundaries
If you hover over the ND4 CDS annotation on the Kiwi sequence (bases 10,240-11,613) you will see that the automatic translation includes two extra amino acids after a stop codon. This indicates that the Kiwi ND4 CDS is three codons (9 bp) shorter than the Emu ND4 CDS.
To manually fix the Kiwi ND4 CDS boundary, go to the Display tab and ensure that Translation is turned on, and the Frame: to display is set to By selection or annotation.
If you select the 3′ end of the ND4 CDS and zoom in using the Full Zoom tool, you will confirm the Kiwi CDS terminates two codons earlier than the emu homolog. You may also notice that the stop codon is a non-standard AGA. It is called as a stop because the transferred ND4 annotation contains information specifying the genetic code, and uses translation table 2 for vertebrate mitochondrial genomes (see http://www.ncbi.nlm.nih.gov/Taxonomy/Utils/wprintgc.cgi for more information).
To correct the discrepancy between the automatic translation and the predicted translation, select the end of the CDS annotation feature and drag it so that it ends correctly at the AGA stop codon. You should also drag and adjust the corresponding ND4 gene annotation.
Exercise 3: Transferring annotations between aligned sequences using the “Copy to…” function
The Copy to… method for transferring annotations requires that you have an alignment or assembly of two or more homologous sequences that have differing annotations that you would like to transfer or combine.
This method does not compare or consider sequence similarity as it assumes accurate alignment of homologous features across the breadth of the alignment.
This method can be used to transfer individual annotations, groups of annotations or the entire annotation set from one sequence to any other sequence within the alignment, or to the consensus sequence of the alignment.
If the alignment retains links to the parent sequences used to generate the alignment, then you will be given the option to apply the transferred annotations to the parental sequences when you save the changes to the alignment.
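Because this method relies purely on alignment position, copying an annotation is in essence a coordinate conversion: from the source sequence's own numbering to alignment columns, then from those columns to the destination sequence's numbering. The following Python sketch shows that conversion on a toy alignment. The function names and sequences are hypothetical, and edge cases that a real tool must handle (such as annotations truncated by gaps at their ends) are handled only in the simplest way.

```python
def ungapped_to_column(aligned, position):
    """Map a 1-based ungapped position to its 1-based alignment column."""
    count = 0
    for column, char in enumerate(aligned, start=1):
        if char != "-":
            count += 1
            if count == position:
                return column
    raise ValueError("position is beyond the end of the sequence")


def columns_to_ungapped(aligned, col_start, col_end):
    """Map alignment columns to a 1-based ungapped interval, or None if all gaps."""
    before = sum(char != "-" for char in aligned[:col_start - 1])
    inside = sum(char != "-" for char in aligned[col_start - 1:col_end])
    if inside == 0:
        return None
    return before + 1, before + inside


def copy_interval(source_aligned, destination_aligned, start, end):
    """Transfer an interval from source to destination via alignment columns."""
    col_start = ungapped_to_column(source_aligned, start)
    col_end = ungapped_to_column(source_aligned, end)
    return columns_to_ungapped(destination_aligned, col_start, col_end)


# Toy alignment: an annotation on bases 3-6 of the source sequence.
source_row = "ATGC--GTTA"
destination_row = "A-GCAAGT-A"
copied = copy_interval(source_row, destination_row, 3, 6)
```

Note that no similarity is computed anywhere in this sketch, which is why the result is only as good as the alignment itself.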
Using “Copy To…” to transfer annotations
As with the previous two exercises, in this exercise we will transfer annotations from the published annotated sequence of the mitochondrial genome of the emu to a “new” unannotated genome of the kiwi.
Select the Mitochondrion_Emu and Mitochondrion_Kiwi_4 files from the document list, select the Sequence View panel, select the General tab, and make sure the option to Display Annotations is turned on.
You should see two sequences in the Sequence viewer panel, the emu sequence with annotations, and the Kiwi sequence without annotations.
First, we need to align the sequences. With the two files selected, on the Toolbar click on Align/Assemble → Pairwise align, select the MAFFT aligner and click OK to align the two genomes. This will create an alignment file called Nucleotide alignment 2. Select this file from the File list to view it. If you zoom in you will see that the sequences share high similarity across most of the alignment.
The next step is to transfer all annotations from the emu sequence to the Kiwi sequence. To do this, right click (or Alt/CTRL-click) on the Mitochondrion_Emu sequence title. This will select the sequence and all annotations associated with it, and display a contextual menu. From the menu, select Annotation → Copy all in selected region to → Mitochondrion_Kiwi_4.
Once you have “copied all” you should see all of the annotations now added to the Kiwi sequence. Save the alignment. Because the alignment is linked back to the parental sequences, you should be given the option to Apply the changes to the original (unaligned) sequence. Make sure you choose Yes to apply the changes to the original Mitochondrion_Kiwi_4 sequence.
Note that if you want to transfer only a single feature, or a single class of feature (for instance only CDSs), then right clicking on an individual feature will change the contextual menu options to allow you to do this.
If you now select the Mitochondrion_Kiwi_4 file, zoom out if required, then you should see that it now contains all of the transferred annotations. Hovering the mouse over any of the annotations will show you details of the transferred annotations. For CDS annotations this includes an automatic translation of the region spanned by the annotation coordinates. As with the Transfer Annotations tool, you should always manually check the boundaries of the transferred CDS annotations and adjust them if required.
Note that the Source annotation (coloured blue) has also been transferred from the emu file. As in Exercise 1, you should edit the Source annotation to specify Apteryx owenii as the source organism, and correct the organism: property and the feature interval.
This exercise has demonstrated how the Copy To… function allows you to rapidly transfer annotations between aligned sequences.
The three annotation-transfer methods described in this tutorial are complementary, and often any one of the three could be used to do the same job with the same data set. If you worked through all exercises in this tutorial you will have seen that all three methods were used to transfer annotations between related mitochondrial genomes.
The method you decide to use depends in part on whether you are working with closely or distantly related sequences and whether you have homologous annotated sequences suitable for transfer by alignment.
Be aware that generally the “Copy to…” method should only be used for wholesale transfer of annotations within an alignment if the sequences share homology across their entire length. Consult Chapter 8 of the Geneious Manual for further information on the transfer of annotations.