How to Map NGS Reads to a Reference and Call Variants

Learn how to map NGS reads to a known reference sequence, allowing you to identify genetic variants, SNPs, and coverage patterns in your sample.

Introduction

Map to Reference is a bioinformatics workflow that aligns sequencing reads to a known reference sequence, allowing you to identify genetic variants, SNPs, and coverage patterns in your sample. While this method works with first-generation Sanger sequencing, second-generation NGS (Illumina, Ion Torrent), and third-generation long-read technologies (PacBio, Nanopore), this tutorial focuses specifically on NGS data.

You'll learn the complete process—from preprocessing raw sequencing data to calling and filtering SNPs—using real Escherichia coli gene data. By the end, you'll understand how to prepare reads, perform accurate mapping, and extract meaningful variant information for downstream analysis.

Quick Reference: Map to Reference Workflow

  1. Import & Pair Reads
  2. Trim with BBDuk (Q20)
  3. Map to Reference
  4. Review Coverage & Consensus
  5. Call SNPs
  6. Filter by Coverage
  7. Export Results

What You'll Need

Install the BBDuk Plugin

  1. Go to Tools → Plugins
  2. Find BBDuk Trimmer in the available plugins list
  3. Click Install

Sample Documents - Download the example dataset containing:

  • yghJ Illumina reads (reads1 and reads2 fastq files)
  • Reference sequence (yghJ CDS divergent reference)

How Does Reference Mapping Work?

Reference mapping follows a structured workflow with three main stages:

1. Preprocessing - Prepare your reads for accurate mapping

2. Mapping - Align reads to your reference sequence

3. Variant Calling - Identify and filter SNPs

Step 1: Import and Pair Your NGS Reads

Why Pairing Matters

Paired-end sequencing generates forward and reverse reads from the same DNA fragment. Pairing them tells Geneious the expected distance between reads, which improves mapping accuracy and helps identify structural variants.

How to Pair Reads

Option A: Pair During Import (Recommended)

  1. Go to File → Import → From Multiple Files
  2. Select both forward and reverse fastq files
  3. Geneious will offer to pair them automatically.
    1. Select: Paired End (inward pointing) :: pairs of files
  4. Set the expected insert size (default: 500 bp for Illumina)
  5. Click OK

Option B: Pair After Import

  1. Import read files separately
  2. Select both lists in the document table
  3. Go to Sequence → Set Paired Reads
  4. Enter the insert size
  5. Click OK

Result: A single list of interlaced forward/reverse reads, each tagged with F or R.
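
To make the idea of an interlaced list concrete, here is a minimal Python sketch of pairing done outside Geneious. It is an illustration only, not how Geneious pairs reads internally; the helper names `parse_fastq` and `interleave_pairs` and the tiny example reads are hypothetical, and it assumes both files list mates in the same order with simple four-line FASTQ records.

```python
def parse_fastq(text):
    """Yield (name, sequence, quality) tuples from FASTQ text with 4-line records."""
    lines = text.strip().splitlines()
    for i in range(0, len(lines), 4):
        name, seq, _plus, qual = lines[i:i + 4]
        yield name[1:], seq, qual  # drop the leading '@' from the read name


def interleave_pairs(forward, reverse):
    """Return one list alternating forward and reverse mates, tagged F or R."""
    forward, reverse = list(forward), list(reverse)
    if len(forward) != len(reverse):
        raise ValueError("forward and reverse files must hold the same number of reads")
    interleaved = []
    for fwd, rev in zip(forward, reverse):
        interleaved.append(("F",) + fwd)
        interleaved.append(("R",) + rev)
    return interleaved


# Invented example data standing in for the reads1 and reads2 files.
reads1 = "@pair1/1\nACGT\n+\nIIII\n@pair2/1\nGGCC\n+\nIIII\n"
reads2 = "@pair1/2\nTTAA\n+\nIIII\n@pair2/2\nCCGG\n+\nIIII\n"

paired = interleave_pairs(parse_fastq(reads1), parse_fastq(reads2))
for tag, name, seq, qual in paired:
    print(tag, name, seq)
```

The length check matters in practice: if one file has lost reads (for example after separate filtering), mates drift out of register and every later pair is wrong.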

Step 2: Trim Low-Quality Bases with BBDuk

Why Trim?

Low-quality bases at read ends cause:

  • Failed alignments
  • False positive SNP calls
  • Increased computation time

Trimming removes these problem areas while preserving high-quality data.

Understanding Quality Scores

BBDuk uses Phred Q scores. Higher Q values mean better accuracy:

Q Score    % Likelihood of correct call
Q10        90%
Q20        99%
Q30        99.9%

Recommendation: Use Q20 for Illumina data (99% likelihood threshold).
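
The percentages above follow from the standard Phred definition, in which the error probability is 10^(-Q/10). The short Python sketch below reproduces them. The function names are illustrative, and the default offset of 33 assumes the common Phred+33 FASTQ encoding, which may not match older datasets.

```python
def phred_to_error_probability(q):
    """Probability that a base call with Phred score q is wrong."""
    return 10 ** (-q / 10)


def phred_to_accuracy_percent(q):
    """Percent likelihood that a base call with Phred score q is correct."""
    return 100 * (1 - phred_to_error_probability(q))


def fastq_char_to_phred(char, offset=33):
    """Decode one FASTQ quality character (assumes Phred+33 encoding)."""
    return ord(char) - offset


for q in (10, 20, 30):
    print(f"Q{q}: {phred_to_accuracy_percent(q):.1f}% likely correct")
```

Because the scale is logarithmic, each step of 10 in Q reduces the expected error rate tenfold.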

How to Trim with BBDuk

  1. Select your paired read list
  2. Go to Annotate & Predict → Trim using BBDuk
  3. Configure settings:
    • Trim adapters (uses Illumina presets)
    • Trim low quality → Set Both Ends and Minimum Quality: 20
    • Trim adapters based on paired read overhangs → Set Minimum overlap: 20
    • Discard short reads → Set Minimum length: 20
  4. Click OK

Result: A new trimmed read list ready for mapping.
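
The effect of the "Both Ends", "Minimum Quality: 20", and "Minimum length: 20" settings can be pictured with the simplified Python sketch below. It is not BBDuk's actual trimming algorithm, which is more sophisticated; it only strips bases from each end until one meets the quality threshold, then discards reads that end up too short. The function names are hypothetical.

```python
def trim_both_ends(seq, quals, min_quality=20):
    """Strip bases from each end until a base with quality >= min_quality is reached."""
    start = 0
    end = len(seq)
    while start < end and quals[start] < min_quality:
        start += 1
    while end > start and quals[end - 1] < min_quality:
        end -= 1
    return seq[start:end], quals[start:end]


def trim_reads(reads, min_quality=20, min_length=20):
    """Trim every (name, seq, quals) read and drop those shorter than min_length."""
    kept = []
    for name, seq, quals in reads:
        trimmed_seq, trimmed_quals = trim_both_ends(seq, quals, min_quality)
        if len(trimmed_seq) >= min_length:
            kept.append((name, trimmed_seq, trimmed_quals))
    return kept


# A short invented read: low-quality bases at both ends, one in the middle.
seq, quals = trim_both_ends("ACGTAC", [5, 30, 30, 10, 30, 8])
print(seq, quals)
```

Note that this naive version keeps a low-quality base in the middle of the read; only the ends are affected.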

Step 3: Map Reads to Your Reference

How to Run Map to Reference

  1. Select your trimmed reads and reference sequence (hold Shift to select both)
  2. Go to Align/Assemble → Map to Reference
  3. Verify the reference sequence is correctly identified
  4. Configure mapping parameters:

Method Panel:

  • Mapper: Geneious (recommended for most NGS data)
  • Sensitivity: Medium Sensitivity/Fast
  • Fine Tuning: Iterate up to 5 times

Trim Panel:

  • Select Do not trim (already trimmed with BBDuk)

Results Panel:

  • Save assembly report
  • Save contigs
  • ☐ Uncheck "Save in subfolder"
  5. Click OK

Processing time: A few minutes depending on data size.

Step 4: Explore Your Mapping Results

Understand the Contig View

Open your contig document to visualize how reads aligned to the reference.

Key settings to adjust:

  1. Advanced tab (Cog icon) → Check Vertically compress contig (displays reads in rows)
  2. General tab (House icon) → Set Colors to Paired Distance

What the Colors Mean

  • Green reads: Paired reads mapping at expected insert size
  • Yellow or Blue reads: Unusual insert sizes (potential structural variants)
  • Red reads: Paired reads that map in the wrong direction relative to each other

Check Insert Size Distribution

  1. Click the Insert sizes tab above the viewer
  2. Review the distribution histogram
  3. Most pairs should cluster around your expected insert size (e.g., 450-500 bp for 500 bp inserts)
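
As a rough illustration of what the paired-distance view is summarizing, the Python sketch below computes the insert size of a mapped pair and sorts it into one of three categories. The coordinate convention, the `tolerance` window, and the category labels are assumptions made for this example; Geneious applies its own criteria when colouring reads.

```python
def classify_pair(mate_a, mate_b, expected_insert=500, tolerance=0.5):
    """Classify one mapped pair.

    Each mate is (start, end, strand) with 1-based inclusive coordinates on the
    reference and strand '+' or '-'. Returns (label, insert_size).
    """
    left, right = sorted([mate_a, mate_b], key=lambda mate: mate[0])
    # Inward-pointing pairs have the leftmost mate on '+' and the rightmost on '-'.
    if not (left[2] == "+" and right[2] == "-"):
        return "wrong direction", None
    insert_size = right[1] - left[0] + 1
    low = expected_insert * (1 - tolerance)
    high = expected_insert * (1 + tolerance)
    label = "expected" if low <= insert_size <= high else "unusual"
    return label, insert_size


print(classify_pair((100, 250, "+"), (400, 549, "-")))    # spans 450 bp
print(classify_pair((100, 250, "+"), (3000, 3150, "-")))  # far apart
print(classify_pair((100, 250, "-"), (400, 549, "+")))    # outward pointing
```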

Review the Consensus Sequence

Zoom in to see individual bases in the consensus sequence at the top of the contig.

Set the correct consensus threshold:

  1. Go to Display tab → Threshold
  2. Select Highest Quality 60% (recommended for NGS data with quality scores)

Why this matters: The "Highest Quality" setting uses quality scores to call the most accurate consensus. Other thresholds like "100% - Identical" can introduce false ambiguities from sequencing errors.
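
The difference between a quality-aware consensus and a strict identity threshold can be sketched in Python as follows. This is a simplified reading of the idea, not Geneious's implementation: it assumes the qualities supporting each base are summed and a base is called only when its share of the total reaches the threshold. Returning "N" when no base qualifies is a shortcut for this example; a real tool would typically emit an ambiguity code.

```python
from collections import defaultdict


def quality_weighted_call(column, threshold=0.6):
    """Call a consensus base for one alignment column.

    column is a list of (base, phred_quality) tuples, one per read covering
    the position. The base whose summed quality is at least `threshold` of the
    total summed quality is returned; otherwise 'N'.
    """
    totals = defaultdict(int)
    for base, quality in column:
        totals[base] += quality
    total_quality = sum(totals.values())
    if total_quality == 0:
        return "N"
    best_base = max(totals, key=totals.get)
    if totals[best_base] / total_quality >= threshold:
        return best_base
    return "N"


# Three confident A calls and one low-quality G (a likely sequencing error).
column = [("A", 38), ("A", 36), ("A", 40), ("G", 8)]
print(quality_weighted_call(column))
```

Under a "100% - Identical" rule the same column would be ambiguous, because the single low-quality G disagrees with the other reads.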

Assess Coverage Quality

View the coverage graph (blue line below consensus):

  1. Go to Graphs tab
  2. Enable Show graphs and Coverage

Identify low coverage regions:

  1. Go to Annotate & Predict → Find Low/High Coverage
  2. Configure:
    • Find regions with coverage below
    • Set Standard deviations from mean: 2
    • ✓ Check both Merge regions options
    • ☐ Uncheck High Coverage options
  3. Click OK
  4. Click Save to record annotations on the reference

Result: Low coverage regions are annotated—these should be excluded when calling SNPs.
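
The "standard deviations from mean" setting can be illustrated with a small Python sketch that flags positions whose depth falls more than two standard deviations below the mean and merges adjacent flagged positions into regions. It is an assumption-laden approximation: the use of the population standard deviation and the function name `find_low_coverage` are choices made for this example, and Geneious may compute the cutoff differently.

```python
from statistics import mean, pstdev


def find_low_coverage(coverage, num_sd=2):
    """Return 1-based inclusive (start, end) regions with depth < mean - num_sd * SD."""
    cutoff = mean(coverage) - num_sd * pstdev(coverage)
    regions = []
    start = None
    for pos, depth in enumerate(coverage, start=1):
        if depth < cutoff:
            if start is None:
                start = pos  # open a new low-coverage region
        elif start is not None:
            regions.append((start, pos - 1))  # close the current region
            start = None
    if start is not None:
        regions.append((start, len(coverage)))
    return regions


# Invented per-base depths: uniform 50x with a two-base dip to 2x.
coverage = [50] * 30 + [2] * 2 + [50] * 30
print(find_low_coverage(coverage))
```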

Step 5: Call SNPs with the Variant Finder

How to Find Variants

  1. Select your contig document
  2. Go to Annotate & Predict → Find Variations/SNPs
  3. Configure settings:
    • Keep default polymorphism detection parameters (filters out sequencing errors)
    • ✓ Check Analyze effect of polymorphisms on translations
    • Set Default Genetic Code: Bacterial (for this E. coli dataset)
    • Expand "More Options"
    • Set Don't find variations in annotation types: Coverage - Low
  4. Click OK

Result: A "Variants" annotation track appears on your reference sequence.
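
Conceptually, a variant caller compares the bases stacked at each position against the reference and reports those that are frequent enough to be unlikely sequencing errors. The Python sketch below shows that idea for a single position. The threshold values and the function name are hypothetical, chosen only for illustration; they are not the Geneious defaults, and the real variant finder also weighs factors this sketch ignores.

```python
from collections import Counter


def call_variants_at_site(ref_base, read_bases, min_coverage=5, min_frequency=0.25):
    """Return (base, count, frequency) for each non-reference base that passes.

    read_bases holds the base each read shows at this position. Sites with
    fewer than min_coverage reads are skipped entirely.
    """
    coverage = len(read_bases)
    if coverage < min_coverage:
        return []
    counts = Counter(read_bases)
    calls = []
    for base, count in sorted(counts.items()):
        frequency = count / coverage
        if base != ref_base and frequency >= min_frequency:
            calls.append((base, count, round(frequency, 3)))
    return calls


# 20 reads over a reference A: 18 show G, one shows A, one shows T.
print(call_variants_at_site("A", "G" * 18 + "A" + "T"))
```

The single T is dropped because its 5% frequency is more consistent with a sequencing error than with a real variant.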

View Your SNPs

In the contig viewer:

  • SNPs appear as vertical yellow bars
  • Mouse over annotations to see details: base change, frequency, SNP type, protein effects

In table format:

  1. Click Annotations tab above the sequence viewer
  2. Set Type filter to Polymorphism
  3. Review columns: Polymorphism Type, Variant Frequency, Amino Acid Change, etc.
  4. Click Columns to customize which data appears
  5. Click Export table to save as CSV

Step 6: Filter SNPs by Average Read Quality

Why Filter by Average Quality?

Low-quality reads can produce false positives or cause a true variant to be misinterpreted. Removing low-confidence calls that may be due to sequencing errors improves the accuracy of downstream analyses. The parameters used when calling variants (Step 5) filter out many false positives. However, you may wish to perform further filtering afterwards for quality assurance. Follow these steps to filter by any field available on your polymorphism annotations (e.g. average quality).

How to Filter Within the Annotations Tab

  1. Select your contig document
  2. Switch to the annotations tab
  3. Turn off other annotation types to view only the polymorphism track
  4. Next to the filter window, click the down arrow to expand more options
  5. Try the filter Average Quality :: Greater than :: 30 and hit return (Tip: You can start typing within these dropdowns to find fields quickly).

Notice that some of the variant annotations have turned grey (filtered). You can hide the filtered annotations with Ctrl+H or toggle this option within the filter you created.

Sort Annotations into New Tracks

For certain workflows, you may wish to sort annotations by certain parameters. Here we'll demonstrate by separating those high-quality variants into a new track.

With your filter applied:

  1. Select all annotations (within the annotations tab)
  2. Click "edit annotations"
  3. Edit the track name as appropriate (e.g. Filtered SNPs - Average Quality>30)
  4. Click OK

Result: A filtered annotation track with only high-confidence SNPs.
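
If you have exported the annotations table to CSV, the same Average Quality filter can be reproduced in a script. The Python sketch below assumes the export has a column headed "Average Quality"; the other column headers and all of the rows are invented for this example, so check them against your own file.

```python
import csv
import io

# Invented rows standing in for an exported annotations table.
exported = """Name,Polymorphism Type,Variant Frequency,Average Quality
example_snp_1,SNP (transition),98.5%,37.2
example_snp_2,SNP (transversion),31.0%,24.8
example_snp_3,SNP (transition),100%,30.0
"""


def filter_by_average_quality(csv_text, min_quality=30.0):
    """Keep rows whose 'Average Quality' value is strictly greater than min_quality."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row for row in reader if float(row["Average Quality"]) > min_quality]


high_confidence = filter_by_average_quality(exported)
for row in high_confidence:
    print(row["Name"], row["Average Quality"])
```

The comparison is strict, mirroring the "Greater than" filter, so a variant with an average quality of exactly 30 is excluded.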

Which Mapping Algorithm Should You Use?

Geneious Prime offers multiple mapping algorithms. Within the Map to Reference settings window, click "Let us Help" under the Mapper dropdown to start an interactive decision tree to guide you to the suggested algorithm for your data.

Here's a summary of when to use each:

Geneious Mapper (Default)

Best for: Most NGS applications

Advantages:

  • Fast processing
  • High sensitivity
  • Iterative mode extends past reference ends and handles indels
  • Discovers structural variants
  • Supports circular genomes (maps correctly around the origin)
  • Works with soft-trimmed reads

Geneious for RNA-Seq

Best for: Mapping RNA reads to genomic references with introns (splice-aware)

Advantages:

  • Maps reads spanning annotated introns
  • Discovers novel introns
  • Identifies fusion genes

Disadvantage: Slower for novel discovery

When to Use Other Mappers

Install optional mapper plugins (Tools → Plugins) for specialized needs:

  • Minimap2: Ideal for long-read data (ONT, PacBio), including splice-aware alignment of PacBio or Nanopore cDNA or direct RNA reads to a reference
  • BBMap: Fast, highly sensitive
  • Bowtie2: Ultra-fast for well-characterized genomes, low memory usage
  • STAR: RNA-seq data; annotates splice variants

Troubleshooting Common Issues

Low Mapping Rate

  • Check: Did you pair reads correctly?
  • Check: Is your reference sequence correct?
  • Try: Increase sensitivity to Medium-High

Too Many False Positive SNPs

  • Check: Did you trim low-quality bases?
  • Check: Is coverage sufficient (typically recommend ≥20x)?
  • Try: Increase minimum variant frequency threshold

Slow Processing

  • Check: Is sensitivity set too high? (Use Medium for NGS)
  • Check: Did you trim first to reduce data size?
  • Try: Reduce iteration count in Fine Tuning

Best Practices Summary

Always pair reads before mapping (when the dataset allows) - Improves accuracy and enables insert size analysis

Trim before mapping - Use Q20 for Illumina data to remove low-quality bases

Use appropriate sensitivity - Medium or Medium-Low for NGS saves time without sacrificing quality

Enable Fine Tuning - Iterative alignment improves results around indels

Filter by coverage - Exclude low-coverage regions to avoid false positives

Use quality-based consensus - "Highest Quality" threshold for NGS data with quality scores

FAQs

Q: Do I need to trim reads if my data is high quality?

Not necessarily. However, even high-quality data has some low-quality bases at read ends. Trimming improves mapping accuracy and reduces false positive SNPs.

Q: What coverage depth do I need for reliable SNP calling?

Standard practice suggests a minimum of 10x coverage, but 20-30x is recommended for confident variant calls. Higher coverage (50-100x) is needed for detecting rare variants.
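
To judge whether a dataset is likely to reach these depths, average coverage can be estimated as the number of reads multiplied by the read length, divided by the reference length. The Python sketch below applies that arithmetic. The read length and reference length in the example are placeholder values, not properties of the tutorial dataset, and the estimate assumes every read maps and coverage is uniform.

```python
import math


def expected_coverage(num_reads, read_length, reference_length):
    """Average depth if all reads map evenly across the reference."""
    return num_reads * read_length / reference_length


def reads_needed(target_coverage, read_length, reference_length):
    """Smallest number of reads that reaches the target average depth."""
    return math.ceil(target_coverage * reference_length / read_length)


# Placeholder values: 150 bp reads against a 4,500 bp reference.
print(reads_needed(30, 150, 4500))
print(expected_coverage(900, 150, 4500))
```

Real coverage is rarely uniform, so treat the result as a lower bound on how many reads to aim for.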

Q: Can I map reads from different sequencing platforms together?

Not recommended. Different platforms (Illumina, Ion Torrent, PacBio) have different error profiles and should be mapped separately. De novo assemblies, however, can be improved with hybrid datasets (e.g. Nanopore and Illumina). The SPAdes de novo assembler is especially suited to hybrid assemblies.

Q: How do I know if my reference sequence is correct?

Check that most reads map (typically >80% for good data) and that coverage is relatively uniform. Large unmapped regions or uneven coverage suggest a reference mismatch.

Q: Should I use the Geneious mapper or a third-party mapper?

For most NGS applications, the Geneious mapper provides the best balance of speed, accuracy, and features. Use specialized mappers (minimap2, Bowtie2) only if you have specific requirements.

Recommended Resources

Plugin - BBDuk Trimmer

Download the plugin for quality trimming and filtering your sequences.

Which map to reference assembly algorithm is best for my data?

Advantages and disadvantages of different map to reference algorithms.

Manual for Map to Reference

A guide to using the map to reference tools in Geneious.

What function should I use?

The difference between Pairwise/Multiple alignment, de novo Assembly, and Map to Reference.

What to Learn Next

  • Use this practical exercise to perform a de novo assembly of short-read NGS data and assemble circular contigs.
  • Perform a reference assembly with next-generation sequencing (NGS) data and call SNPs on the assembled contig.
  • Calculate and compare normalized expression measures from RNA-Seq data using the Geneious expression analysis tool.
  • Learn the basics with this introductory guide to assembling DNA sequences with de novo and map to reference.