What is Sequence Alignment?
Sequence alignment is the process of comparing two or more DNA, RNA, or protein sequences to identify similarities between them. These similarities provide clues about the functional regions of a protein (ex: an enzyme’s active site), important structural features of sequences (ex: hairpins in DNA, disulfide bonds in proteins), and the evolutionary relationships between sequences. Sites with structural or functional significance are likely to be conserved across species, and sequences that align well with one another are often related (unlike sequences that align poorly). Aligning novel sequences with known sequences can help researchers infer the function and structure of the new sequences and understand their evolutionary relationships with existing ones.
Beyond identifying conserved regions, sequence alignment has roles in:
- Building phylogenetic trees to understand the evolution of sequences
- Reference-based mapping to assemble sequences
- Verifying constructs generated from molecular cloning and mutagenesis experiments
Sequence alignments can align either two sequences (pairwise alignment) or several sequences (multiple sequence alignment). In both cases, alignments are displayed with the following features (Figure 1):
- Site positions: Each site (column) of the alignment is numbered according to its position in the alignment.
- Sequences: Aligned sequences appear in rows with the identifier (ex: gene name, strain designation, species, etc.) to the left.
- Consensus sequence: The representative sequence of the alignment, built from the most frequent base or amino acid at each site across all sequences.
- Gaps: Gaps are inserted into a sequence to improve the overall alignment between sequences. They mark positions where that sequence appears to have a deletion (or the other sequences an insertion) and are represented by dashes.
- Identity: The percentage of bases or amino acids at a particular site that match the consensus sequence. In Geneious Prime, green represents sites with 100% identity, green-brown represents sites with at least 30% and under 100% identity, and red represents sites with less than 30% identity.
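To make the consensus and identity definitions concrete, here is a minimal Python sketch that computes a majority-rule consensus and a per-site identity for a small, made-up DNA alignment, then bins each site using the thresholds above. The function names are hypothetical, gaps are treated like any other character, and ties go to the character seen first in the column; Geneious Prime's own implementation may handle gaps and ties differently.

```python
from collections import Counter


def consensus_and_identity(alignment: list[str]) -> tuple[str, list[float]]:
    """Return a majority-rule consensus and the per-site identity.

    Identity at a site is the fraction of sequences whose character matches
    the consensus character at that site. Gaps ("-") count like any other symbol.
    """
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")

    consensus, identity = [], []
    for i in range(length):
        column = [seq[i] for seq in alignment]
        # most_common(1) returns the most frequent character; ties keep first-seen order
        char, count = Counter(column).most_common(1)[0]
        consensus.append(char)
        identity.append(count / len(column))
    return "".join(consensus), identity


def identity_class(fraction: float) -> str:
    """Bin identity with the thresholds described above (100%, 30% to <100%, <30%)."""
    if fraction == 1.0:
        return "green"
    if fraction >= 0.3:
        return "green-brown"
    return "red"


# Hypothetical four-sequence DNA alignment
aln = ["ATGCA", "ATGTC", "ATCAG", "TTGGT"]
cons, ident = consensus_and_identity(aln)
print(cons)                                 # ATGCA
print(ident)                                # [0.75, 1.0, 0.75, 0.25, 0.25]
print([identity_class(f) for f in ident])   # ['green-brown', 'green', 'green-brown', 'red', 'red']
```

The last two sites have four different bases, so the "consensus" there is an arbitrary tie-break, which is exactly what the red identity class flags.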
Aligned sequences can also be displayed using a “sequence logo.” This is a graphical representation of all of the bases (or amino acids) at each location in the alignment (Figure 2). Each position in the sequence logo consists of a stack of bases or amino acids found at that position where the size of the letter reflects its frequency.
For example, in Figure 2 the large “T” at position 1 spans the entire height of the stack, meaning that every sequence in the alignment has a T at position 1. In contrast, position 8 shows an A and a C of different sizes, reflecting their different frequencies in the alignment. Sequence logos help visually identify conserved regions between sequences.
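The following Python sketch computes letter heights for a DNA sequence logo from a made-up alignment (not the data behind Figure 2). It assumes one common logo convention that goes slightly beyond the description above: each letter's height is its frequency in the column multiplied by the column's information content, log2(4) minus the Shannon entropy of the column, so a fully conserved position reaches the maximum of 2 bits. Indices in the code start at 0, so position 1 in the text corresponds to index 0. The function name is hypothetical, and no small-sample correction is applied.

```python
import math
from collections import Counter


def logo_heights(alignment: list[str], alphabet: str = "ACGT") -> list[dict[str, float]]:
    """Per-position letter heights (in bits) for a sequence logo.

    Height of a letter = its frequency in the column * the column's
    information content, where information = log2(len(alphabet)) - entropy.
    Gaps and characters outside `alphabet` are ignored.
    """
    max_bits = math.log2(len(alphabet))
    heights = []
    for i in range(len(alignment[0])):
        counts = Counter(seq[i] for seq in alignment if seq[i] in alphabet)
        total = sum(counts.values())
        if total == 0:  # column contains only gaps
            heights.append({})
            continue
        freqs = {letter: n / total for letter, n in counts.items()}
        entropy = -sum(p * math.log2(p) for p in freqs.values())
        info = max_bits - entropy
        heights.append({letter: p * info for letter, p in freqs.items()})
    return heights


# Hypothetical alignment: index 0 is fully conserved, index 7 mixes A and C
aln = ["TACGTAAA", "TACGTAAA", "TAGGTAAA", "TACGTAAC"]
logo = logo_heights(aln)
print(logo[0])  # {'T': 2.0}
print(logo[7])  # A is drawn taller than C
```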
Types of Sequence Alignment
There are two types of alignment: pairwise alignment and multiple sequence alignment. Each type has its own methods and algorithms, but both attempt to maximize the similarity between sequences, inserting gaps where necessary to improve the overall alignment. Let’s take a look at each in more detail.
Pairwise Alignment
In pairwise alignment (Figure 3), you can choose whether the alignment software favors local alignment, where a specific region of the sequence is aligned, or global alignment, where the entirety of the sequence is aligned.
Examples of pairwise alignment algorithms
Needleman-Wunsch Algorithm
Select this algorithm for global sequence alignments. It works best when the query sequences are of comparable length and you expect them to be similar along their entire length.
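A minimal Python sketch of the Needleman-Wunsch idea: fill a dynamic-programming table in which each cell holds the best score for aligning two prefixes, then trace back from the bottom-right corner to recover one optimal global alignment. The scores (+1 match, -1 mismatch, -1 per gap position) are the simplest scheme mentioned in this section and are chosen here only for illustration; the function name and example sequences are hypothetical.

```python
def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1,
                     gap: int = -1) -> tuple[int, str, str]:
    """Global alignment of a and b with a linear gap penalty."""

    def pair(x: str, y: str) -> int:
        return match if x == y else mismatch

    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap  # a[:i] aligned entirely against gaps
    for j in range(1, m + 1):
        score[0][j] = j * gap  # b[:j] aligned entirely against gaps
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + pair(a[i - 1], b[j - 1]),  # align both characters
                score[i - 1][j] + gap,                           # gap in b
                score[i][j - 1] + gap,                           # gap in a
            )

    # Trace back from the bottom-right corner to rebuild one optimal alignment
    top, bottom = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair(a[i - 1], b[j - 1]):
            top.append(a[i - 1])
            bottom.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            top.append(a[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(top)), "".join(reversed(bottom))


best, top, bottom = needleman_wunsch("GATTACA", "GTTACA")
print(best)    # 5
print(top)     # GATTACA
print(bottom)  # G-TTACA
```

The table has (len(a) + 1) × (len(b) + 1) cells, so time and memory grow with the product of the two sequence lengths.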
Smith-Waterman Algorithm
Select this algorithm for local sequence alignments, where the query sequences are dissimilar overall but may share smaller regions of similarity, such as motifs or other conserved regions.
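For comparison, here is a minimal Smith-Waterman sketch using the same illustrative +1/-1/-1 scores. The two changes from the global version are that no cell may drop below zero (so a poorly matching prefix is simply discarded) and that the traceback starts from the highest-scoring cell and stops at the first zero, which yields the best-matching local region. The function name and example sequences, made up to share a single conserved motif, are hypothetical.

```python
def smith_waterman(a: str, b: str, match: int = 1, mismatch: int = -1,
                   gap: int = -1) -> tuple[int, str, str]:
    """Best local alignment of a and b with a linear gap penalty."""

    def pair(x: str, y: str) -> int:
        return match if x == y else mismatch

    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]  # first row/column stay 0
    best, best_i, best_j = 0, 0, 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                0,                                               # start a new local alignment
                score[i - 1][j - 1] + pair(a[i - 1], b[j - 1]),  # align both characters
                score[i - 1][j] + gap,                           # gap in b
                score[i][j - 1] + gap,                           # gap in a
            )
            if score[i][j] > best:
                best, best_i, best_j = score[i][j], i, j

    # Trace back from the best cell until the score falls to zero
    top, bottom = [], []
    i, j = best_i, best_j
    while i > 0 and j > 0 and score[i][j] > 0:
        if score[i][j] == score[i - 1][j - 1] + pair(a[i - 1], b[j - 1]):
            top.append(a[i - 1])
            bottom.append(b[j - 1])
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            top.append(a[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(top)), "".join(reversed(bottom))


# The flanks differ (C vs G), but both sequences contain the motif GATTACA
print(smith_waterman("CCCCGATTACACCCC", "GGGGATTACAGGG"))  # (7, 'GATTACA', 'GATTACA')
```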
While many pairwise alignment algorithms exist, they all score alignments by rewarding matches and penalizing mismatches. For example, each matched pair receives a positive value, while each mismatch receives a negative value. The simplest scheme uses “+1” for a match and “-1” for a mismatch. More complex schemes account for the fact that different substitutions, insertions, and deletions occur at different rates. Although gaps receive a negative value, a well-placed gap can maximize the alignment of the rest of the sequence.
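The effect of the simple +1/-1 scheme, and of where a gap is placed, can be checked by scoring two candidate alignments of the same pair of made-up sequences. This sketch assumes a flat -1 penalty per gap position (a linear gap penalty); the more complex schemes mentioned above would replace the fixed match and mismatch values with a substitution matrix and use different gap penalties. The function name is hypothetical.

```python
def score_alignment(top: str, bottom: str, match: int = 1, mismatch: int = -1,
                    gap: int = -1) -> int:
    """Score an existing pairwise alignment column by column."""
    if len(top) != len(bottom):
        raise ValueError("aligned sequences must have the same length")
    total = 0
    for x, y in zip(top, bottom):
        if x == "-" or y == "-":
            total += gap       # gap penalty
        elif x == y:
            total += match     # match reward
        else:
            total += mismatch  # mismatch penalty
    return total


# Same two sequences, two different gap placements
print(score_alignment("GATTACA", "G-TTACA"))  # 5: one internal gap lets the other six sites match
print(score_alignment("GATTACA", "GTTACA-"))  # -3: a gap only at the end leaves four mismatches
```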
Pairwise Alignment Resources in Geneious Prime
➤ Pairwise Alignment: Video series covering how to use a dot plot to view regions of similarity and pairwise alignment of DNA sequences.
➤ Pairwise Alignment Tutorial: Written tutorial with example exercises to align DNA and protein sequences using dot plots and alignment algorithms.
➤ Instructions for Pairwise Alignment: Guide to performing pairwise alignments in Geneious Prime.
Multiple Sequence Alignment
Multiple sequence alignment (MSA) compares more than two sequences at a time. It can be used to determine how sequences are related and to infer their ancestral relationships. MSA is more computationally demanding than pairwise alignment and may require cloud computing resources for large datasets.
The most popular MSA methods are progressive methods, which build the multiple alignment from a series of pairwise alignments (ex: the Feng-Doolittle method). Progressive alignment is the backbone of many popular MSA programs, including Clustal Omega, MUSCLE, MAFFT, and T-Coffee. These methods are efficient and can align thousands of sequences at once.
To merge the most similar sequences at each step, global pairwise alignments (ex: using the Needleman-Wunsch algorithm) are first computed for all pairs of sequences, and the resulting similarity scores are used to build a guide tree. The guide tree determines the order of the progressive alignment, which starts by aligning the two most closely related sequences. At each subsequent step, one of three things can happen: (1) two closely related sequences are aligned together, (2) a sequence is aligned with the multiple alignment in progress, or (3) two multiple alignments are aligned with each other. Which of these happens at each step depends on the branching pattern of the guide tree. The process repeats until a single multiple sequence alignment contains all of the sequences.
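The guide-tree step can be sketched in Python with UPGMA (average-linkage clustering), one common way of building a guide tree. The text above does not say which clustering method a given program uses, and the distances below are made-up values standing in for the results of the pairwise alignments. The hypothetical function returns the order in which clusters are joined, which is also the order of the progressive alignment steps.

```python
from itertools import combinations


def upgma_merge_order(names: list[str], dist: list[list[float]]) -> list[tuple[str, str]]:
    """Join the closest clusters repeatedly (UPGMA) and return the join order.

    dist[i][j] is the pairwise distance between names[i] and names[j].
    The distance between two clusters is the average over all member pairs.
    """
    index = {name: k for k, name in enumerate(names)}
    clusters = {name: [name] for name in names}  # cluster label -> member sequences

    def average_distance(c1: str, c2: str) -> float:
        pairs = [(x, y) for x in clusters[c1] for y in clusters[c2]]
        return sum(dist[index[x]][index[y]] for x, y in pairs) / len(pairs)

    order = []
    while len(clusters) > 1:
        c1, c2 = min(combinations(clusters, 2), key=lambda pair: average_distance(*pair))
        clusters[f"({c1},{c2})"] = clusters.pop(c1) + clusters.pop(c2)
        order.append((c1, c2))
    return order


# Hypothetical distances: A and B are closest, then C and D
names = ["A", "B", "C", "D"]
dist = [
    [0.0, 0.1, 0.4, 0.5],
    [0.1, 0.0, 0.4, 0.5],
    [0.4, 0.4, 0.0, 0.2],
    [0.5, 0.5, 0.2, 0.0],
]
for step, (left, right) in enumerate(upgma_merge_order(names, dist), start=1):
    print(step, left, "+", right)
```

With these distances, step 1 aligns A with B, step 2 aligns C with D, and step 3 aligns the two resulting alignments, matching cases (1) and (3) above. With only three sequences (A and B closest), the second step would instead align C with the A+B alignment, which is case (2).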
Other approaches to MSA include:
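- Iterative methods: Begin with a suboptimal alignment and repeatedly refine it, keeping changes that improve the alignment score, until no further improvement is found.
- Dynamic programming: Extends the exact pairwise approach to all sequences at once and finds the highest-scoring alignment, but its cost grows exponentially with the number of sequences, so it is practical only for a few short sequences.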
- Consensus methods: Combine the output from different multiple sequence alignments of the same sequences to determine the optimal alignment. Consensus methods help identify well-aligned regions based on the agreement between multiple alignments.
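The idea behind consensus methods, measuring where different alignments agree, can be sketched by treating each alignment as a set of aligned residue pairs and asking, for each column of one alignment, what fraction of its pairs the other alignment also contains. This is a simplified illustration in the spirit of consensus approaches rather than the algorithm of any particular program, and the function names and example alignments are hypothetical.

```python
from itertools import combinations


def residue_pairs_by_column(alignment: list[str]) -> list[set[tuple[int, int, int, int]]]:
    """For each column, the residue pairs it aligns as (seq_i, res_i, seq_j, res_j).

    res_i is the index of the residue within the ungapped sequence i, so pairs
    from different alignments of the same sequences can be compared directly.
    """
    next_residue = [0] * len(alignment)  # ungapped position counter per sequence
    columns = []
    for col in range(len(alignment[0])):
        present = {}
        for s, seq in enumerate(alignment):
            if seq[col] != "-":
                present[s] = next_residue[s]
                next_residue[s] += 1
        columns.append({(s, present[s], t, present[t])
                        for s, t in combinations(sorted(present), 2)})
    return columns


def column_support(alignment: list[str], other: list[str]) -> list[float]:
    """Fraction of each column's residue pairs that `other` also aligns."""
    other_pairs = set().union(*residue_pairs_by_column(other))
    return [len(pairs & other_pairs) / len(pairs) if pairs else 0.0
            for pairs in residue_pairs_by_column(alignment)]


# Two hypothetical alignments of the same three sequences (ACGT, AGT, ACT)
aln1 = ["ACGT", "A-GT", "AC-T"]
aln2 = ["ACGT", "AG-T", "AC-T"]
print(column_support(aln1, aln2))  # [1.0, 1.0, 0.0, 1.0]: only the third column is unsupported
```

Columns with high support are the well-aligned regions; the unsupported column is where the two alignments place the second sequence's G differently.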
Multiple sequence alignment programs in Geneious Prime
With the many approaches to MSA, there is no single best algorithm. Each algorithm has its strengths and weaknesses, and different algorithms often produce different alignments of the same sequences. When choosing an MSA algorithm, consider the characteristics of your sequences, such as their length, the number of sequences involved, and whether they contain any rearrangements. Geneious Prime offers a variety of MSA algorithms, either built in or as plug-ins (Table 1).
To learn more about which method suits your sequences, read our Knowledge Base article:
➤ Which multiple alignment algorithm should I use?
| Program | Type of algorithm | When to use |
|---|---|---|
| Geneious Aligner | Progressive | Alignments involving fewer than 50 sequences; each sequence less than 1 kb in length |
| MUSCLE Aligner | Iterative | Alignments of up to 1,000 sequences; not suitable for sequences with low-homology N-terminal and C-terminal extensions |
| Clustal Omega | Progressive | Alignments involving over 2,000 sequences; sequences with long, low-homology N-terminal or C-terminal extensions; not suitable for sequences with large internal indels |
| MAFFT | Progressive-iterative | Alignments of up to 30,000 sequences; sequences with long, low-homology N-terminal or C-terminal extensions; sequences with long internal gaps |
| Mauve Whole Genome Aligner | Progressive | Sequences with large-scale rearrangements and inversions |
Multiple Sequence Alignment Resources in Geneious Prime
➤ Introduction to Multiple Alignments: Video on choosing an alignment algorithm and performing an MSA using MUSCLE.
➤ Practice Multiple Alignment: Exercise on building a multiple alignment of HIV and SIV sequences.
➤ Align Genomes with Mauve: Video series demonstrating whole genome alignment with the Mauve plugin.
➤ Aligning Bacterial Genomes with Mauve Tutorial: Written tutorial with example exercises on aligning complete genomes and ordering draft genomes against a reference.
➤ Instructions for Multiple Alignment: Guide to performing MSA in Geneious Prime.