Accessing GenBank Tutorial
Learn how to access information stored in the GenBank database through the Geneious interface, including downloading nucleotide sequences, taxonomic information and publications, and running simple BLAST searches.
Written by Dr Mike Bunce (Murdoch University, Australia) and the Geneious team.
Once DNA has been sequenced, the sequence is usually deposited in a public sequence database. The main databases, which together form the International Nucleotide Sequence Database Collaboration (INSDC), are GenBank, administered by the National Center for Biotechnology Information (NCBI) in the USA (http://www.ncbi.nih.gov/GenBank), the European Nucleotide Archive (ENA, formerly the EMBL Nucleotide Sequence Database), maintained by the European Molecular Biology Laboratory's European Bioinformatics Institute (EMBL-EBI) in Europe (http://www.ebi.ac.uk/), and the DNA Data Bank of Japan (DDBJ). These databases exchange data daily. For protein sequences, the principal curated database is SWISS-PROT (http://us.expasy.org/sprot/), which also exchanges data with GenBank, ENA and DDBJ.
Geneious provides a user-friendly interface into GenBank so that information is retrieved intuitively and visually, and can be easily integrated into your existing database.
In this tutorial, you will carry out a basic BLAST search of GenBank to identify a mystery sequence, and learn how to access information from several NCBI databases, including Taxonomy, Nucleotide, PubMed and Genome.
To complete the tutorial yourself with included sequence data, download the tutorial and install it by dragging and dropping the zip file into Geneious Prime. Do not unzip the tutorial.
Exercise 1: Carrying out a BLAST search of the GenBank Database
BLAST (Basic Local Alignment Search Tool) searches sequence databases for the sequences that are most similar to a given query sequence. This is particularly useful if you have an unknown DNA sequence and want to find out what it may code for, or in taxonomy, if you want to find the most closely related species. BLAST searches can be carried out with either nucleotide sequences (blastn) or protein sequences (blastp).
Select the ‘unknown sequence’ file, then click the BLAST button. Select the Nucleotide Collection (nr/nt) database, choose the blastn program, and then click the Search button on the right. This searches the whole GenBank database (excluding EST, STS, GSS, WGS and TSA sequences). More specific NCBI databases are available from the database chooser.
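Geneious submits the search for you, so no code is needed for this step. For readers curious about what a remote BLAST submission involves, the following is a minimal Python sketch that only builds (and does not send) a request for NCBI's public BLAST URL API. It assumes that API's `CMD`, `PROGRAM`, `DATABASE` and `QUERY` parameters and the database name `nt` for the Nucleotide collection; Geneious may submit searches differently, and the helper name `blast_put_url` is hypothetical.

```python
from urllib.parse import urlencode

# Assumed endpoint of NCBI's BLAST URL API.
BLAST_URL = "https://blast.ncbi.nlm.nih.gov/Blast.cgi"


def blast_put_url(query: str, program: str = "blastn", database: str = "nt") -> str:
    """Build (but do not send) a CMD=Put request submitting `query` to BLAST.

    Hypothetical helper for illustration. NCBI replies to a Put request with a
    request ID (RID); results are fetched later with a separate CMD=Get request.
    """
    params = {
        "CMD": "Put",
        "PROGRAM": program,    # blastn for nucleotide queries, blastp for protein
        "DATABASE": database,  # "nt" is assumed to be the Nucleotide collection (nr/nt)
        "QUERY": query,        # raw sequence, FASTA text or an accession
    }
    return BLAST_URL + "?" + urlencode(params)


# Hypothetical short query; substitute the tutorial's 'unknown sequence'.
print(blast_put_url("ACGTACGTACGT"))
```

A real client would send this request, read back the RID and then poll with `CMD=Get`; NCBI's usage guidelines restrict how often results may be polled.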
As with any shared online service, searches can be slow when many people are using it at once, so be patient. An estimate of the search time appears just below the toolbar. Once the search is complete, the results appear in the Document Table. Sort the results from lowest to highest E Value by clicking on the “E Value” column header (if the table is already sorted by this column, you will see a triangle next to the column name). Then click on the first (top) GenBank hit to display it in the Alignment View below.
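The E value estimates how many hits scoring at least this well would be expected by chance in a database of this size, so smaller values indicate more significant matches. The sketch below uses made-up hits (not results from this search; the `Hit` type and names are hypothetical) to show the ascending sort that clicking the column header performs.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    name: str
    e_value: float  # expected number of chance hits scoring at least this well


# Hypothetical hits for illustration only; not real BLAST output.
hits = [
    Hit("hit_B", 3e-10),
    Hit("hit_A", 0.0),
    Hit("hit_C", 1e-50),
]

# Ascending sort: the smallest E value (most significant match) comes first,
# mirroring a click on the "E Value" column header in the Document Table.
ranked = sorted(hits, key=lambda h: h.e_value)

for h in ranked:
    print(h.name, h.e_value)
```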
The viewer shows the query aligned to the hit. In this case the top hit, AF483338, is an exact match. Click on the Query Centric View tab above the document table to see all the hits aligned to the query.
Now click back to the Hit table, select the top match and click Download Full Sequences. This downloads the complete GenBank sequence for the hit, which is displayed in the Sequence View tab with the region corresponding to the BLAST hit annotated on it. You can still display the BLAST alignment by clicking on the Alignment View tab.
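Download Full Sequences retrieves the complete GenBank record for the hit. Purely as an illustration, the sketch below builds (without sending) a request to NCBI's E-utilities EFetch service that would return the GenBank flat-file record for AF483338. It assumes EFetch's `db=nuccore`, `rettype=gb` and `retmode=text` parameters; the helper name is hypothetical, and this is not necessarily how Geneious fetches records.

```python
from urllib.parse import urlencode

# Assumed base URL of NCBI's E-utilities.
EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"


def efetch_genbank_url(accession: str) -> str:
    """Hypothetical helper: build an EFetch URL for a GenBank flat-file record."""
    params = {
        "db": "nuccore",   # Entrez nucleotide database
        "id": accession,   # accession of the BLAST hit
        "rettype": "gb",   # GenBank flat-file format
        "retmode": "text", # plain text rather than XML
    }
    return EUTILS + "efetch.fcgi?" + urlencode(params)


print(efetch_genbank_url("AF483338"))
```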
For a more detailed tutorial on BLAST, please see our BLAST Searching tutorial.
Exercise 2: Investigating the taxonomy of the organism
In the previous exercise, the top hit was from Raphus cucullatus, the extinct dodo. In this exercise we will explore the taxonomy of this organism.
Click on NCBI/Taxonomy, located near the bottom of the Sources panel (the panel on the left-hand side of Geneious).
Enter Raphus cucullatus in the Search box and click the Search button. This search returns the entire taxonomic lineage of the species (phylum, order, family etc.), as well as its common name.
Click on the Lineage (full) link in the viewer to open the taxonomy information on the NCBI website. Under the LinkOut heading, click on the species name next to Encyclopedia of Life; this takes you to the Encyclopedia of Life page for the species.
Exercise 3: Searching GenBank for further sequences and papers from the same species
Click on NCBI/Nucleotide (located in the Sources panel on the left) and search for additional sequences from Raphus cucullatus.
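An NCBI/Nucleotide search corresponds roughly to an Entrez query. The sketch below builds (without sending) an ESearch request restricted with the `[Organism]` field tag. It assumes NCBI's E-utilities ESearch service and the `nucleotide` database name; `esearch_url` is a hypothetical helper and the `retmax` limit of 20 is an arbitrary choice.

```python
from urllib.parse import urlencode

# Assumed base URL of NCBI's E-utilities.
EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"


def esearch_url(db: str, term: str, retmax: int = 20) -> str:
    """Hypothetical helper: build an ESearch URL for an Entrez query.

    ESearch returns matching record IDs only, not the records themselves.
    """
    params = {"db": db, "term": term, "retmax": retmax}
    return EUTILS + "esearch.fcgi?" + urlencode(params)


# The [Organism] field tag restricts matches to records from that organism.
url = esearch_url("nucleotide", "Raphus cucullatus[Organism]")
print(url)
```

The returned IDs would then be passed to EFetch to retrieve the records, much as Geneious downloads the documents it lists.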
The results contained in the NCBI search folders are only temporary and will be deleted when you run a new search or close Geneious. To add the sequences to your database, you must drag and drop them from the search folders to one of your Local folders in the Sources panel.
Select the cytochrome b sequence and then click on the Text View tab above the sequence viewer (this switches the display to the text of the GenBank record).
In the Text View you will notice that a publication is listed: this is the original paper that described this GenBank sequence.
The authors of this paper deposited the sequence in GenBank. When you publish a DNA sequence, it is a requirement to deposit it in GenBank (or another INSDC database) so that other researchers can access it. Read the first paragraph of the paper; it gives some perspective on why the researchers carried out this work.
Now click on NCBI/PubMed (in the Sources panel on the left) and search for the first author of the dodo paper, Beth Shapiro, to see what other papers she has published.
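A PubMed author search can be written in the same Entrez query syntax using the `[Author]` field tag. The sketch below builds (without sending) such a request; it assumes E-utilities ESearch with `db=pubmed` and PubMed's usual surname-plus-initials author format, and `pubmed_author_url` is a hypothetical helper.

```python
from urllib.parse import urlencode

# Assumed ESearch endpoint of NCBI's E-utilities.
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def pubmed_author_url(surname: str, initials: str) -> str:
    """Hypothetical helper: build an ESearch URL finding PubMed records by author."""
    term = f"{surname} {initials}[Author]"  # e.g. "Shapiro B[Author]"
    return ESEARCH + "?" + urlencode({"db": "pubmed", "term": term})


print(pubmed_author_url("Shapiro", "B"))
```

A surname-plus-initial query matches every author indexed under that name, so the results may include papers by other researchers called Shapiro B; check the author list of each hit.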
PubMed is one of many online databases that index literature published in scientific journals. The results returned in Geneious include a link to the abstract of each publication on the journal website, as well as a Google Scholar link that may show other copies of the paper.
To export articles from an NCBI/PubMed search in a format that can be read by the EndNote citation software, select the papers you want, go to File->Export->Selected Documents and choose “Endnote” as the format. For other citation tools, including BibTeX for use with LaTeX, click the BibTeX tab above the sequence view, then copy and paste the contents of the viewer into a text file.
Exercise 4: Genome searching
So far in this tutorial you have dealt only with single genes. However, many whole genomes have now been sequenced, and most of these are available in GenBank. The Genome database contains genomes from all types of organisms, from viruses and bacteria through to large eukaryotes such as humans. It also contains organelle genomes, such as those of mitochondria.
Click on NCBI/Genome, type “Anolis” in the search box and run the search. This returns the genome of Anolis carolinensis, the green anole lizard. The document icons are faded out, which means that the returned documents are summary documents and do not contain any sequence. To get the sequence and all of its annotations, select the file and click the Download button in the viewer. Large documents may take several minutes to download.
If you want to add the genome to your database so that you can use it in Geneious, drag and drop the files you want to your Local folders. You can do this either before or after downloading the full document.