Aligning Bacterial Genomes with Mauve
Learn how to align bacterial genomes using the Mauve plugin for Geneious. This tutorial covers alignment of complete genomes and ordering of draft genomes against a reference.
Complete the tutorial yourself with included sequence data. Download the tutorial then install by dragging and dropping the zip file into Geneious Prime. Do not unzip the tutorial.
This tutorial covers the use of the Mauve whole genome aligner in Geneious Prime. You will learn to perform a basic alignment of complete bacterial genomes, order a draft genome against a reference, work with the Mauve viewer, and convert a Mauve alignment into a standard alignment for downstream analysis.
Mauve is developed and maintained by Aaron Darling at the University of Technology, Sydney. The program is designed for constructing multiple genome alignments in the presence of large-scale evolutionary events such as rearrangement and inversion. For further information about Mauve including a full user guide, please see the Mauve website.
This tutorial requires the Mauve plugin version 1.1.1 or above, running in Geneious 11.1, Geneious Prime 2019 or above. To install the Mauve plugin, go to Tools→Plugins, select Mauve from the list of Available Plugins and click Install. You will then need to restart Geneious before you can use the plugin.
Exercise 1: Alignment of complete bacterial genomes with progressiveMauve
In this exercise you will create an alignment of 3 Mycobacterium genomes. Holding down shift, select NC_015758, NC_012943 and NC_009565. These are complete genomes downloaded from NCBI.
Then go to Align/Assemble → Align Whole Genomes. Ensure the Mauve Genome tab is selected and the algorithm is set to progressiveMauve. Keep all other settings at their defaults. For a full description of the parameters, see here. Click OK to run the alignment.
Note: in this example we are using complete genomes where each genome is a single contiguous sequence. It is also possible to align draft genomes, where each genome is represented by a list of contigs. If one or more genomes are represented by a sequence list, and progressiveMauve is chosen as the algorithm, Geneious will concatenate the contigs from each genome into a single sequence for the alignment. This option will not order contigs, and users should be aware that incorrectly ordered contigs will appear as genome rearrangements in the Mauve viewer. If you wish to align only two draft genomes, or align a draft genome against a complete genome, it is best to use the MCM algorithm, which is covered in Exercise 2.
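To make the concatenation step concrete, here is a minimal Python sketch of joining contigs in list order while recording where each contig boundary falls. It is an illustration only, not the code Geneious or Mauve runs; the function name concatenate_contigs and the toy sequences are hypothetical.

```python
def concatenate_contigs(contigs):
    """Join (name, sequence) contigs in list order.

    Returns the concatenated sequence and the 0-based offsets at which
    one contig ends and the next begins. No reordering is attempted.
    """
    boundaries = []
    offset = 0
    # Every contig except the last one contributes an internal boundary.
    for _name, seq in contigs[:-1]:
        offset += len(seq)
        boundaries.append(offset)
    joined = "".join(seq for _name, seq in contigs)
    return joined, boundaries


# Hypothetical draft genome whose contigs are listed out of genomic order.
draft = [("contig_2", "GGGTTT"), ("contig_1", "AAAC"), ("contig_3", "TTGCA")]
sequence, boundaries = concatenate_contigs(draft)
print(sequence)
print(boundaries)
```

Because the list order is preserved as-is, any contig that is out of place in the list ends up out of place in the concatenated sequence, which is why it shows up as an apparent rearrangement in the viewer.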
The alignment may take a while to run. When it has finished a new Mauve alignment document will appear in your Document Table. Select the alignment and load it into the viewer.
Mauve Genome Alignment View
Mauve alignments have a special viewer which enables you to see genome rearrangements and locally aligned blocks at a glance. Each sequence is represented by one horizontal panel of blocks. Each colored block represents a region of sequence that aligns to part of another genome, and is presumably homologous and free from internal rearrangements. These are called LCBs (locally collinear blocks). The colored blocks in each genome are connected by vertical lines. For a full description of the Mauve viewer, please see the Mauve user guide.
Your alignment should look as in the screenshot below:
In this example you can see that there are 3 LCBs, denoted by the different colors. In the middle LCB, the block in sequence NC_012943 is below the line. This indicates that this region is inverted with respect to the other two sequences.
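An inverted block is one that matches the other genomes only when read as the reverse complement. The short Python sketch below shows what that means for a toy sequence; the helper name reverse_complement and the example bases are illustrative and not part of Mauve.

```python
# Translation table mapping each base to its complement (case preserved).
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


forward = "ATGGCC"                      # region as it reads in one genome
inverted = reverse_complement(forward)  # same region in an inverted genome
print(forward, inverted)
```

Applying the function twice returns the original sequence, which is why an inversion can be aligned once its orientation is taken into account.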
The Mauve viewer has its own controls for zooming in and out, and scrolling to the left and right above the viewer. Try zooming in by clicking the ‘zoom in’ button 3 or 4 times. If your input sequences contained CDS, tRNA, rRNA or misc_RNA annotations, these should become visible as square blocks below the colored LCB block.
As you mouse over the colored blocks on any sequence in the alignment, you will see rectangular boxes showing the aligned region in the other sequences. You can click on the alignment to center it on any given region.
To reset the alignment view at any time, click the home button.
The LCB Weight slider changes the resolution at which locally collinear blocks are determined. The LCB weight sets the minimum number of matching nucleotides for a collinear region to be considered as having true homology rather than random similarity. When “automatically calculate the minimum LCB score” is checked in the Mauve setup options, an LCB weight of 3 times the seed size will be used for the alignment, and this will be set as the lowest value in the LCB weight slider. Try sliding the LCB weight up and see what effect it has on the LCBs.
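As a rough illustration of the weight threshold, the Python sketch below computes the automatic minimum weight (3 times the seed size, as stated above) and keeps only candidate blocks at or above a chosen weight. This is a simplified model: Mauve itself does more than a simple threshold when it recalculates blocks, and the seed size, block weights, and function names here are all hypothetical.

```python
def default_min_lcb_weight(seed_size):
    """Automatic minimum LCB weight: 3 times the seed size."""
    return 3 * seed_size


def lcbs_at_or_above(lcbs, min_weight):
    """Keep only blocks whose weight meets the minimum (simplified model)."""
    return [lcb for lcb in lcbs if lcb["weight"] >= min_weight]


seed_size = 15  # hypothetical value chosen for the example
candidates = [
    {"id": 1, "weight": 40},
    {"id": 2, "weight": 45},
    {"id": 3, "weight": 5200},
    {"id": 4, "weight": 130000},
]

floor = default_min_lcb_weight(seed_size)
kept_default = lcbs_at_or_above(candidates, floor)
kept_raised = lcbs_at_or_above(candidates, 10000)  # slider moved up
print(floor, [b["id"] for b in kept_default], [b["id"] for b in kept_raised])
```

Raising the threshold leaves only the largest, best-supported blocks, which is the general effect to look for when moving the slider.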
To view the aligned sequences in a view that looks similar to the regular Geneious alignment viewer, click the Alignment View tab above the sequence viewer. This displays the alignment for one LCB at a time. To choose which LCB to display, use the drop down Alignment chooser in the Display tab next to the viewer (circled in the screenshot below).
Try changing the Alignment View to LCB 2. This is the middle block, where NC_012943 is inverted. Check the arrow directions on the Source annotations in this view and you will see that NC_012943 is in the opposite orientation to the other two sequences in the alignment. If you cannot see the blue Source annotations, make sure the annotation type “Source” is checked in the Annotations and Tracks panel.
This Alignment view works the same way as the regular Geneious alignment viewer, and has the same controls for zooming, sequence display and annotations. However, only annotations can be edited in this viewer, not the sequence itself. If you wish to edit the sequence you must create an editable copy, which will be a standard Geneious alignment document (you will be prompted to do this if you click Allow Editing).
Mauve alignment documents cannot be used for downstream processes such as tree building. You will need to extract the Mauve regions to a standard Geneious alignment document following the instructions in Exercise 3 of this tutorial in order to use it in downstream operations.
Exercise 2: Aligning draft genomes with the Mauve contig mover
The Mauve Contig Mover (MCM) algorithm will align a draft genome to a reference sequence, ordering the contigs in the draft genome according to their position along the reference genome. This will be chosen as the default algorithm if two genomes are selected for alignment, and one or both of them is a list of sequences. The reference genome will be automatically determined if one of the two files selected is a single sequence, as in this example. If both files are sequence lists, you will need to choose the reference genome from the drop down list.
Select the draft genome NZ_MRBH000000000, and the reference genome NC_009565. Then go to Align/Assemble → Align Whole Genomes. In the Mauve options, change the alignment algorithm to MCM algorithm if it is not already selected. Make sure the option to Save ordered contigs is checked. Leave the other settings at their defaults and click OK to start the analysis.
This outputs two documents: a Mauve genome alignment, and a sequence list containing the draft genome contigs sorted according to the order and orientation in which they appear in the alignment.
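The idea behind the ordered contig list can be sketched in a few lines of Python: sort the contigs by where they align on the reference, and reverse-complement any that align to the opposite strand. This assumes the placements are already known; the real MCM algorithm derives them from the alignment itself, and the names order_contigs and placements, along with the toy data, are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def order_contigs(contigs, placements):
    """Order contigs by reference position and fix their orientation.

    contigs:    dict of contig name -> sequence
    placements: dict of contig name -> (reference_start, strand),
                where strand is "+" or "-"
    """
    ordered = []
    for name in sorted(placements, key=lambda n: placements[n][0]):
        seq = contigs[name]
        if placements[name][1] == "-":
            seq = reverse_complement(seq)  # flip to match the reference
        ordered.append((name, seq))
    return ordered


contigs = {"c1": "AAAC", "c2": "GGGTTT", "c3": "TTGCA"}
placements = {"c2": (500, "+"), "c1": (10, "-"), "c3": (900, "+")}
ordered = order_contigs(contigs, placements)
print(ordered)
```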
Open the Mauve alignment document. You will notice that this alignment has more LCBs than the previous whole genome alignment from Exercise 1. The red vertical bars on the NZ_MRBH00000000 sequence denote the boundaries of the individual contigs in this genome.
Use the zoom controls above the viewer to inspect some of the LCBs more closely. Click the Zoom In button a few times, and then use the Shift Left button to move to approximately position 300,000 in NC_009565. You should see a large light blue block that has several red vertical lines on the lower block denoting contig boundaries. Right-click on the block and choose View LCB alignment. In the Alignment View you can see that the NZ_MRBH000000000 sequence is composed of 35 concatenated contigs from the draft genome. Individual contigs are denoted by the “Accession” annotation in green.
Now try repeating the same alignment with the progressiveMauve algorithm. You will see that this alignment has many more LCBs and rearrangements than the MCM alignment. This is because when draft genomes are aligned with progressiveMauve, Geneious concatenates the contigs into a single sequence, in the order they appear in the sequence list, prior to performing the alignment. As this concatenation step is unlikely to order the contigs correctly, the minimum number of LCBs in the alignment will probably correspond to the number of contigs in your list (unless adjacent contigs in the list happen to be in the correct order). However, with the MCM algorithm the contigs are ordered and one LCB may often contain multiple contigs, so the alignment will be much cleaner with fewer LCBs.
For this reason you should always use the MCM algorithm for pairwise comparisons of draft genomes.
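The effect of contig order on the number of blocks can be illustrated with a toy model in Python. Each contig is given its true rank along the reference, and a new block starts whenever the next contig in the list is not the next one along the reference. This is a deliberate simplification (it ignores orientation and rearrangements within contigs), and the function name count_collinear_runs is hypothetical.

```python
def count_collinear_runs(contig_ranks):
    """Count runs of contigs that are consecutive along the reference.

    contig_ranks lists the true reference rank of each contig, in the
    order the contigs appear in the concatenated sequence.
    """
    if not contig_ranks:
        return 0
    runs = 1
    for prev, curr in zip(contig_ranks, contig_ranks[1:]):
        if curr != prev + 1:  # not adjacent on the reference: new block
            runs += 1
    return runs


shuffled = count_collinear_runs([3, 1, 4, 2, 5])  # arbitrary list order
partial = count_collinear_runs([1, 2, 5, 3, 4])   # some neighbours correct
ordered = count_collinear_runs([1, 2, 3, 4, 5])   # order after MCM
print(shuffled, partial, ordered)
```

In this model a fully shuffled list gives one block per contig, while a correctly ordered list collapses into a single block.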
Exercise 3: Converting a Mauve alignment into a standard alignment
The Mauve alignment document is not a standard alignment document and downstream applications such as phylogenetic tree building cannot be run directly from this document. Thus, it is sometimes necessary to convert a Mauve alignment into a standard Geneious alignment. This involves extracting each LCB alignment, then concatenating them to make a single alignment document.
To do this, select the alignment of NC_015758, NC_012943 and NC_009565 you created in Exercise 1. Then go to Tools → Extract Mauve Regions. In this example we only want to extract the aligned regions of the 3 sequences, and not the small regions of unaligned sequence, so we will set the Minimum number of sequences to 3 and the Maximum number of sequences to 3.
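The minimum and maximum settings act as a filter on how many genomes take part in each region. The Python sketch below shows that logic on a hypothetical list of regions; the data layout and the function name extract_regions are assumptions made for illustration and do not reflect how the plugin stores alignments.

```python
def extract_regions(regions, min_seqs, max_seqs):
    """Keep regions whose number of sequences lies within the given range."""
    return [r for r in regions if min_seqs <= len(r["sequences"]) <= max_seqs]


all_three = ["NC_015758", "NC_012943", "NC_009565"]
regions = [
    {"name": "LCB 1", "sequences": all_three},
    {"name": "LCB 2", "sequences": all_three},
    {"name": "LCB 3", "sequences": all_three},
    {"name": "unaligned", "sequences": ["NC_012943"]},  # single-genome region
]

kept = extract_regions(regions, min_seqs=3, max_seqs=3)
print([r["name"] for r in kept])
```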
You should now see 3 new alignment documents, one for each LCB, in your Document Table. To concatenate these documents into one alignment, select all three and go to Tools → Concatenate Sequences or Alignments.
When concatenating alignments, you can either match the sequences within each alignment by name, or use their position in the alignment (“index”) to determine which sequences to concatenate. In the 3 alignments you have extracted from Mauve, you will see that the 3 sequences are in the same order in each alignment, but their names are slightly different (i.e. they have the base numbering appended to the original name). Thus, we need to use “concatenate by index in alignment” in order to concatenate the alignments correctly.
Use the options as shown in the screenshot above and click OK. A new alignment document comprised of the 3 original LCB alignments will be created. On this alignment you can see the boundaries of the original LCB alignments by turning on the Source annotations.
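The difference between matching by name and matching by index can be seen in a small Python sketch. The alignments are modeled as lists of (name, aligned sequence) rows; the function name concatenate_by_index, the " - " separator in the joined names, and the toy data are all assumptions for illustration, not Geneious internals.

```python
def concatenate_by_index(alignments):
    """Concatenate alignments row by row, using row position to pair sequences."""
    n_rows = len(alignments[0])
    if any(len(aln) != n_rows for aln in alignments):
        raise ValueError("all alignments must have the same number of rows")
    result = []
    for i in range(n_rows):
        name = " - ".join(aln[i][0] for aln in alignments)
        seq = "".join(aln[i][1] for aln in alignments)
        result.append((name, seq))
    return result


lcb1 = [("A (bases 1 to 4)", "ACGT"), ("B (bases 1 to 4)", "AC-T")]
lcb2 = [("A (bases 5 to 7)", "GGA"), ("B (bases 5 to 7)", "GCA")]

# Matching by name finds nothing in common, because the coordinates differ.
shared_names = {n for n, _ in lcb1} & {n for n, _ in lcb2}

combined = concatenate_by_index([lcb1, lcb2])
print(shared_names)
print(combined)
```

Because the extracted names carry different base coordinates, no names are shared between the alignments, so pairing rows by position is the only option that works here.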
You may now wish to use Batch Rename to shorten the sequence names in the alignment, as they will be very long. For example, the names will be something like “NC_009565 (bases 1 to 936018) – NC_009565 (bases 3492134 to 936021) – NC_009565 (bases 3492476 to 4424434)”. This can be shortened to NC_009565 by going to Edit → Batch Rename, and removing 96 characters from the end of the name. The exact number of characters to remove depends on the length of the base coordinates in your names, so check the result after renaming.
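If you want to check how many trailing characters to remove for a particular name, a small Python sketch like the one below can help. It assumes the names follow the “accession (bases … to …)” pattern shown above; the function name shorten_name and the marker argument are hypothetical, and this runs outside Geneious rather than through Batch Rename.

```python
def shorten_name(name, marker=" (bases"):
    """Return the text before the first marker and the number of characters removed."""
    prefix, _sep, _rest = name.partition(marker)
    return prefix, len(name) - len(prefix)


long_name = (
    "NC_009565 (bases 1 to 936018) - "
    "NC_009565 (bases 3492134 to 936021) - "
    "NC_009565 (bases 3492476 to 4424434)"
)
short_name, n_removed = shorten_name(long_name)
print(short_name, n_removed)
```

The second value returned is the number of characters to trim from the end of that name. If the marker is not found, the name is returned unchanged with a count of zero.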
Note that if one of your genomes is rearranged compared with others in the alignment, the order the sequences are concatenated in will be incorrect for that genome, and information on the rearrangement will be lost. You should be aware of this when using the alignment for downstream applications like tree building.