Bioinformatics. Group of authors

in context with other genome-scale data. User data must be formatted in a commonly used data structure in order to be interpreted correctly by the browser.

       Browser Extensible Data (BED) format is a tab-delimited format that is flexible enough to display many types of data. It can be used to display fairly simple features, such as the locations of transcription factor binding sites, as well as more complex ones, such as transcripts and their exons.
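To make the tab-delimited layout concrete, here is a minimal sketch that reads one BED line into a small record. It assumes only the three required columns (chromosome, start, end) plus an optional name column. The class name, function name, and coordinates are hypothetical and chosen purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class BedFeature:
    chrom: str
    start: int      # BED starts are 0-based
    end: int        # BED ends are exclusive
    name: str = ""  # optional fourth column


def parse_bed_line(line: str) -> BedFeature:
    """Parse the first three or four tab-separated columns of a BED line."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 3:
        raise ValueError("a BED line needs at least chrom, start, and end")
    name = fields[3] if len(fields) > 3 else ""
    return BedFeature(fields[0], int(fields[1]), int(fields[2]), name)


# Hypothetical feature, not a real annotation.
feature = parse_bed_line("chr1\t1000\t1500\texample_site")
print(feature.chrom, feature.name, feature.end - feature.start)
```

Because the start is 0-based and the end is exclusive, the feature length is simply end minus start.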

       Binary Alignment/Map (BAM) format is the compressed binary version of the Sequence Alignment/Map (SAM) format. It is a compact format designed for use with very large files of nucleotide sequence alignments. Because it can be indexed, only the portion of the file that is needed for display is transferred to the browser. Many tools for next-generation sequence analysis use BAM format as output or input.
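BAM itself is binary and is normally read with dedicated tools, but the text-based SAM record it encodes is easy to inspect. The sketch below splits one SAM alignment line into its 11 mandatory fields. The function name and the example read are hypothetical, and the code does not attempt to read BAM files or their indexes.

```python
# The 11 mandatory SAM columns, in order.
SAM_FIELDS = [
    "QNAME", "FLAG", "RNAME", "POS", "MAPQ", "CIGAR",
    "RNEXT", "PNEXT", "TLEN", "SEQ", "QUAL",
]
# Columns that hold integers.
INT_FIELDS = {"FLAG", "POS", "MAPQ", "PNEXT", "TLEN"}


def parse_sam_record(line: str) -> dict:
    """Split one SAM alignment line into named mandatory fields."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) < len(SAM_FIELDS):
        raise ValueError("a SAM alignment line has 11 mandatory fields")
    record = {}
    for name, value in zip(SAM_FIELDS, cols):
        record[name] = int(value) if name in INT_FIELDS else value
    # Anything after column 11 is an optional TAG:TYPE:VALUE field.
    record["TAGS"] = cols[len(SAM_FIELDS):]
    return record


# Hypothetical alignment, for illustration only.
example = "read1\t0\tchr1\t101\t60\t8M\t*\t0\t0\tACGTACGT\tIIIIIIII"
record = parse_sam_record(example)
print(record["RNAME"], record["POS"], record["CIGAR"])
```

Note that SAM positions are 1-based, unlike the 0-based starts used in BED.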

       Variant Call Format (VCF) is a flexible format for large files of variation data including single-nucleotide variants, insertions/deletions, copy number variants, and structural variants. Like BAM format, it is compressed and indexed, and only the portion of the file that is needed for display is transferred to the browser. Many tools for variant analysis use VCF format as output or input.
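As an illustration of how a single variant is laid out, the following sketch parses the eight fixed columns of an uncompressed VCF data line (CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO). The function names and the example variant are hypothetical. Compressed, indexed files would normally be handled by dedicated tools rather than code like this.

```python
def parse_vcf_line(line: str) -> dict:
    """Parse the eight fixed columns of one VCF data line."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) < 8:
        raise ValueError("a VCF data line has at least 8 columns")
    chrom, pos, var_id, ref, alt, qual, filt, info = cols[:8]
    info_dict = {}
    if info != ".":
        for entry in info.split(";"):
            key, sep, value = entry.partition("=")
            # Keys without "=" are flags; record them as True.
            info_dict[key] = value if sep else True
    return {
        "chrom": chrom,
        "pos": int(pos),          # 1-based position
        "id": var_id,
        "ref": ref,
        "alt": alt.split(","),    # several ALT alleles may be listed
        "qual": qual,
        "filter": filt,
        "info": info_dict,
    }


def is_snv(variant: dict) -> bool:
    """True when REF and every ALT allele are single bases."""
    return len(variant["ref"]) == 1 and all(
        len(a) == 1 and a in "ACGTN" for a in variant["alt"]
    )


# Hypothetical variant, not taken from a real call set.
variant = parse_vcf_line("chr1\t12345\t.\tA\tG\t50\tPASS\tDP=20;DB")
print(variant["pos"], variant["alt"], is_snv(variant))
```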

      To perform a search, enter text into the Position/Search Term box. If the query maps to a unique position in the genome, such as a search for a particular chromosome and position, the Go button links directly to the Genome Browser. However, if there is more than one hit for the query, such as a search for the term metalloprotease, the resulting page will contain a list of results that all contain that term. For some species, the terms have been indexed, and typing a gene symbol into the search box will bring up a list of possible matches. In this example, we will search for the human hypoxia inducible factor 1 alpha subunit (HIF1A) gene (Figure 4.1), which produces a single hit on GRCh38.
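The same search can also be expressed as a link. The sketch below builds a Genome Browser URL, assuming the hgTracks page accepts a db parameter (hg38 is the UCSC name for GRCh38) and a position parameter holding either coordinates or a search term. The helper name is hypothetical, and how the browser handles an ambiguous term is up to the site.

```python
from urllib.parse import urlencode

# Assumed entry point for the browser display page.
HGTRACKS = "https://genome.ucsc.edu/cgi-bin/hgTracks"


def genome_browser_url(db: str, position: str) -> str:
    """Build a link that opens the browser on an assembly at a position or term."""
    return HGTRACKS + "?" + urlencode({"db": db, "position": position})


print(genome_browser_url("hg38", "HIF1A"))
# prints https://genome.ucsc.edu/cgi-bin/hgTracks?db=hg38&position=HIF1A
```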

Figure 4.1 The home page of the UCSC Genome Browser, showing a query for the gene HIF1A on the human GRCh38 genome assembly. The organism can be selected by clicking on its name in the phylogenetic tree. For many organisms, more than one genome assembly is available.

      Below the browser window illustrated in Figure 4.2, one would find a list of tracks that are available for display on the assembly. The tracks are separated into nine categories: Mapping and Sequencing, Genes and Gene Predictions, Phenotype and Literature, mRNA and Expressed Sequence Tag (EST), Expression, Regulation, Comparative Genomics, Variation, and Repeats. Clicking on a track name opens the Track Settings page for that track, providing a description of the data displayed in that track. Most tracks can be displayed in one of the following five modes.

      1 Hide: the track is not displayed at all.

      2 Dense: all features are collapsed into a single line; features are not labeled.

      3 Squish: each feature is shown separately, but at 50% of the height of full mode; features are not labeled.

      4 Pack: each feature is shown separately, but not necessarily on separate lines; features are labeled.

      5 Full: each feature is labeled and displayed on a separate line.
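For user-supplied data, the initial display mode can be requested in the header line of a custom track. The sketch below writes such a track line, assuming the numeric visibility codes used for UCSC custom tracks (0 hide, 1 dense, 2 full, 3 pack, 4 squish). Note that this numbering does not follow the order of the list above. The enum, the helper, and the track name are hypothetical.

```python
from enum import IntEnum


class Visibility(IntEnum):
    # Numeric codes as assumed for the custom-track "visibility" setting.
    HIDE = 0
    DENSE = 1
    FULL = 2
    PACK = 3
    SQUISH = 4


def track_line(name: str, description: str, visibility: Visibility) -> str:
    """Compose a custom-track header line with an initial display mode."""
    return (
        f'track name="{name}" description="{description}" '
        f"visibility={int(visibility)}"
    )


print(track_line("example_sites", "Hypothetical example track", Visibility.PACK))
# prints track name="example_sites" description="Hypothetical example track" visibility=3
```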

Figure 4.2 The default view of the UCSC Genome Browser, showing the genomic context of the human HIF1A gene.

      Box 4.2 GENCODE

