Obtain Genomic Sequence for a Gene

This tutorial covers part of the How To called “Obtain genomic sequence for or near a gene,
marker, transcript or protein.” I’ll focus on obtaining the sequence for, and surrounding,
a gene. Let’s get the sequence for the human EME1 gene. I’ll start on the NCBI home page,
type eme1, select the Gene database, and click Search. Open the record by clicking the gene name.

You might first take a quick glance at the “Genomic context” section. The highlighted
arrow and the arrowheads point to the right, so this is a plus-strand gene relative to
the chromosome 17 annotation.

A good way to get the genomic sequence is from the next section, where you can link to
either the FASTA or the GenBank format. I’ll go to the GenBank view. This region includes
8,240 base pairs. If you want the entire genomic sequence for the gene, including any UTRs
(untranslated regions), go to Send -> File, choose the format, then click the Create File
button. I’ll just exit out of this menu.

If you want to include, say, two kilobases of upstream sequence, decrease the “from”
coordinate by 2000, click Update View, then download as before. Note how the number of
base pairs increases by 2000. If you want only the upstream two kilobases, also change the
“to” coordinate to the original “from” coordinate value (minus 1 if you want exactly
2000 bases).

The procedure for getting surrounding sequence is the same when your gene is on the minus
strand, as long as the box is checked for “Show reverse complement.”

Another way to get genomic sequence, for example
just a UTR, or an exon or intron, is from the Gene Table display. Go back to the top
of the Gene record and change “Display Settings” to Gene Table. The ranges shown in the tables,
one table for each transcript, are links to the sequence. Note, also, the link to the very helpful “Gene
Table help.”
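
The coordinate arithmetic described above for a plus-strand gene can be sketched in a few lines of Python. This is only an illustration, not part of the tutorial: the function names are hypothetical, and the example coordinates are made up so that the region spans 8,240 bases (they are not the real EME1 coordinates). The last helper builds a request URL for NCBI's E-utilities EFetch service, which accepts `seq_start`, `seq_stop`, and `strand` parameters, as one possible alternative to downloading from the web page. The accession `NC_000017.11` is assumed here to be the chromosome 17 record you are viewing; substitute whatever accession your record shows.

```python
from urllib.parse import urlencode

# Base URL of the NCBI E-utilities EFetch service.
EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def region_length(start, stop):
    """Number of bases in an inclusive, 1-based range."""
    return stop - start + 1


def with_upstream(start, stop, flank=2000):
    """Gene range extended by `flank` upstream bases (plus strand only)."""
    return max(1, start - flank), stop


def upstream_only(start, flank=2000):
    """Only the `flank` bases immediately upstream of `start` (plus strand only)."""
    if start <= 1:
        raise ValueError("no sequence upstream of position 1")
    return max(1, start - flank), start - 1


def efetch_url(accession, start, stop, rettype="fasta"):
    """Build an EFetch URL for a plus-strand subrange (use rettype='gb' for GenBank)."""
    params = {
        "db": "nuccore",
        "id": accession,
        "rettype": rettype,
        "retmode": "text",
        "seq_start": start,
        "seq_stop": stop,
        "strand": 1,  # 1 = plus strand
    }
    return f"{EFETCH}?{urlencode(params)}"


if __name__ == "__main__":
    # Made-up coordinates spanning 8,240 bases, for illustration only.
    gene_from, gene_to = 50_001, 58_240
    print(region_length(gene_from, gene_to))
    print(with_upstream(gene_from, gene_to))
    print(upstream_only(gene_from))
    print(efetch_url("NC_000017.11", *upstream_only(gene_from)))
```

Because the ranges are inclusive, the "minus 1" on the end coordinate is what keeps the upstream-only region at exactly 2000 bases. Minus-strand genes are deliberately left out of this sketch.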

5 Replies to “Obtain Genomic Sequence for a Gene”

  1. Please, do you know why the human genome appears with some nucleotide sequences of 'NNNN'? I learned that N is used when any base can be represented, but how do I get the original genome without these NNNNs? Thank you.

