Help Documentation

The <mds_ies_db> is a dynamic, interactive, and mobile-friendly database that features an assortment of searches and visualizations. This document serves as a user's manual for searching the database and controlling the interactive displays.

This manual is organized as follows:

  • Search - contains information about the navigation bar quick searches, and the search tab advanced search forms.
  • Display - contains information about the contig display pages that show the association of the MAC and MIC genomes with MDS, IES, and pointer annotations and chord diagram visualizations.
  • Data - contains information about data processing methods and internal naming convention for stored sequences.
  • Technical Notes - contains information about the database architecture and the software libraries used to create the website.

Display

Once the desired sequence is found, open its sequence page to read further details about it. The sequence information page contains the Genome Browser, Chord Diagram window, Properties / MDS / Pointer / IES / Arrangements Tables, Downloads window, and Information Sections.

Genome Browser

On the top of the sequence information page is the Genoverse genome viewer display. This display is divided into four tracks: Sequence, Transcript, MDS/IES/Pointer, and MDS/IES/Pointer Legend.

The Sequence track shows the nucleotide sequence of the reference contig.

The Transcript track shows each of the gene transcripts of the reference contig. Exons of the transcript are represented by a thick red bar. Introns are represented by a thin line connecting exons (or before the first exon and after the last exon). Each feature can be clicked to display more information about it in a popup.

The MDS/IES/Pointer track shows the MDSs, IESs, and pointers that the reference contig shares with the matching contigs on the opposite nucleus. Each row in this track contains the features that correspond to a single contig on the opposite nucleus. For example, if the reference contig is a MAC contig, each row will contain all the MDSs and pointers that the MAC contig shares with a single MIC contig. Each row has a different base color to make the rows easier to tell apart. MDSs have a lighter shade and pointers have a darker shade of this color. On MIC contigs, IESs are always displayed in bright red regardless of the row. The MDS features are labelled numerically by their order on the corresponding MAC contig (from 5' to 3'). A negative index indicates that the mapping from MAC to MIC inverts the orientation of the segment. Each feature can be clicked to display more information about it in a popup.

The MDS/IES/Pointer Legend track shows the contig names on the opposite nucleus that are currently being viewed in the MDS/IES/Pointer track. The colored boxes next to the names are the same color used in the corresponding row of the MDS/IES/Pointer track. Each name or colored box can be clicked to display more information about the contig in a popup.

The dark grey vertical bars at the 5' or 3' ends of the viewer indicate a degenerate end (nucleotides beyond a telomere) and light grey vertical bars indicate a telomere.

The gear shaped icon on the right of each track can be clicked to reveal more controls to enable/disable the legend (MDS/IES/Pointer track only), enable/disable the labels (Transcript and MDS/IES/Pointer tracks only), display a short description of the track, and remove the track from the display. Tracks can also be hidden or shown by clicking on the "Tracks" button in the top left corner and using the opened menu. The viewer is hyperlinked with other pages on the website, so anywhere a contig or locus name occurs you can usually click on it to navigate to the corresponding information page.

The mouse scroll wheel can be used to zoom in and out of nucleotide regions. The entire view can be scrolled by dragging in an empty area of the viewer or using the navigation bar at the top. Shift-click and drag to zoom in on a region. Additional navigation controls can be selected from the panel on the right. For more information on how to use the Genoverse genome viewer, please refer to this tutorial.

Figure 12: Genoverse browser containing information of MAC sequence, MDS annotation, MIC Hits, and MAC genes.

Chord Diagram

A chord diagram (a.k.a. Circos plot) is a way to visually represent sequence alignment information. The selected MAC/MIC sequence and all matching MIC/MAC sequences on the opposite nucleus are placed on a circle. The matching nucleotide segments of MAC and MIC are connected with arcs. Different MAC to MIC matches are connected with arcs of different colors.

This allows you to see what regions of the MIC are mapped to what regions of MAC and vice versa. To focus on only one set of MIC to MAC (or vice versa) arcs, hover your mouse over the circle segment belonging to the desired MIC/MAC arcs to decrease the opacity of all other arcs. To select which hits are displayed use the "Contigs" dropdown. To display the same visual information in the chord diagram in a linear format instead, select the "Line Diagram" tab at the top of the panel which has the same interface features as the "Chord Diagram" tab.

Figure 13: Chord diagram picture.
Figure 14: In this picture, the mouse is over Contig512.1. As a result, only arcs from this contig to ctg7180000067411 are shown.

To see the chord diagrams, click on the "Chord Diagram" button under the genome browser.

Properties / MDS / Pointer / IES / Arrangements Tables

Properties Table

The Properties Table displays information about the mathematical properties of the rearrangement maps between the MAC/MIC contig being viewed and the contigs on the opposite nucleus that have matching MDSs. These properties have been computed using the annotation software SDRAP. Please refer to the SDRAP documentation here and the publication (currently in review) for the precise definitions. Click the "Properties Table" button under the genome viewer to view this table.

MDS Table

The MDS Table displays information about the MDSs (Macronuclear Destined Sequences) between the MAC/MIC contig being viewed and the contigs on the opposite nucleus that match. These MDSs have been obtained using the annotation software SDRAP. Please refer to the SDRAP documentation here and the publication (currently in review) for more details. Click the "MDS Table" button under the genome viewer to view this table.

Pointer Table

The Pointer Table displays information about the pointers between the MAC/MIC contig being viewed and the contigs on the opposite nucleus that have matching MDSs. These pointers have been obtained using the annotation software SDRAP. Please refer to the SDRAP documentation here and the publication (currently in review) for more details. Click the "Pointer Table" button under the genome viewer to view this table.

IES Table

The IES Table displays information about the IESs (Internal Eliminated Sequences) between the MIC contig being viewed and the contigs on the MAC nucleus that have matching MDSs. Note that this table is only available for MIC contigs, since IESs exist only on MIC contigs. These IESs have been obtained after post-processing the output of the annotation software SDRAP. The IESs displayed are sometimes referred to as Strict IESs, and are regions between a pair of consecutive MDSs from the same MAC/MIC pair that do not intersect any other MDSs of any other MIC/MAC pair. Click the "IES Table" button under the genome viewer to view this table.

Arrangement Table

The Arrangements Table displays information about the rearrangement map between the MAC/MIC contig being viewed and the contigs on the opposite nucleus that have matching MDSs. For example, an arrangement map is a sequence such as: M3 M2 M1. The M3 indicates that the MDS in position 3 on the MAC contig appears in position 1 on the MIC contig, the M2 indicates that the MDS in position 2 on the MAC contig appears inverted in position 2 on the MIC contig, and the M1 indicates that the MDS in position 1 on the MAC contig appears in position 3 on the MIC contig. These arrangements have been obtained using the output of the annotation software SDRAP. Please refer to the SDRAP documentation here and the publication (currently in review) for more details. Click the "Arrangement Table" button under the genome viewer display to view this table.
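To make the reading of an arrangement map concrete, the following is a minimal Python sketch, not part of the database itself. It assumes that the map is a whitespace-separated list of tokens of the form M<n>, and that an inverted MDS is written with a leading minus sign, mirroring the negative indices used for inverted MDSs in the genome browser labels; the notation actually shown in the table may differ. The names Placement and parse_arrangement are hypothetical.

```python
import re
from typing import List, NamedTuple


class Placement(NamedTuple):
    mac_position: int  # position of the MDS on the MAC contig
    mic_position: int  # position of the same MDS on the MIC contig
    inverted: bool     # True if the MDS is inverted on the MIC contig


# Assumed token shape: optional "-" (inversion), "M", then a positive integer.
TOKEN = re.compile(r"(-?)M(\d+)")


def parse_arrangement(arrangement: str) -> List[Placement]:
    """Turn an arrangement map string into one Placement per MDS."""
    placements = []
    # The i-th token describes the MDS found in position i on the MIC contig.
    for mic_position, token in enumerate(arrangement.split(), start=1):
        match = TOKEN.fullmatch(token)
        if match is None:
            raise ValueError(f"unrecognized token: {token!r}")
        sign, mac_position = match.groups()
        placements.append(Placement(int(mac_position), mic_position, sign == "-"))
    return placements


for placement in parse_arrangement("M3 -M2 M1"):
    print(placement)
```

Each printed record pairs a MAC position with the MIC position at which that MDS occurs, which is the same information the prose example above spells out token by token.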

Features common to all tables

There are seven buttons common to every table:

  • Select All - selects all the matching contigs on the opposite nucleus to be viewed in the browser.
  • Deselect All - deselects all the matching contigs on the opposite nucleus, so that none are viewed in the browser.
  • Copy - copies the viewed data to the clipboard.
  • Excel - downloads the viewed data in XLSX format.
  • CSV - downloads the viewed data in CSV format.
  • PDF - downloads the viewed data in PDF format.
  • Column Visibility - allows you to select or deselect columns to be displayed in the table.

Note that selecting or deselecting a contig using the checkboxes in the table causes the corresponding contig to be added to or removed from the genome viewer. To sort the table by a column, click on the up/down arrows next to the column name. To search for a matching contig by name, use the search bar at the top left. To view more rows of the table, use the page navigation buttons along the bottom right of the table. If not all columns are visible, use the scroll bar at the bottom to scroll horizontally. To close the window, click on the "Close" button at the bottom right corner of the window or the "x" at the top right corner of the window.

Figure 15: Contig Properties table.

Download Data

The Download Data window allows you to download sequences, annotations, and other information of the currently displayed contig. There are three download categories: Sequences, Annotations, and Other. Click on the "Downloads" button under the Genoverse display to open the download window.

Sequences: At the top of the download window, there is a Sequences field that consists of Nucleotide and Protein checkboxes and a Format field. Check Nucleotide and/or Protein to download the corresponding sequences. The only format for this category is FASTA, and it is selected by default.

Annotations: This category includes Genes, MDSs, and Telomeres checkboxes. The data can be downloaded in GFF3 or BED format.

Other: This category contains RNA Expressions and MIC Arrangements checkboxes. The data can be downloaded in CSV or XLSX format.

Download: Once all desired information is checked, click the "Download" button to download the data as a zip archive.
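Once the zip archive has been downloaded, its contents can be read programmatically. The following Python sketch is only an illustration that uses the standard library: the file names inside a real archive are not documented here, so the member names, the ".fa"/".fasta" extension check, and the helper names read_fasta and fasta_records_in_zip are all assumptions. The demo builds a small in-memory archive with made-up contents instead of a real download.

```python
import io
import zipfile
from typing import Dict


def read_fasta(text: str) -> Dict[str, str]:
    """Parse FASTA text into a {header: sequence} dictionary."""
    records: Dict[str, str] = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:]
            records[header] = ""
        elif header is not None:
            # Sequence lines are concatenated until the next header.
            records[header] += line
    return records


def fasta_records_in_zip(archive: zipfile.ZipFile) -> Dict[str, Dict[str, str]]:
    """Return {member name: parsed records} for every FASTA-like member."""
    result = {}
    for name in archive.namelist():
        # Assumed extensions; adjust to the names found in a real archive.
        if name.lower().endswith((".fa", ".fasta")):
            result[name] = read_fasta(archive.read(name).decode("utf-8"))
    return result


# Demo archive: the member names and sequences below are made up.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as demo:
    demo.writestr("nucleotide.fasta", ">example_contig\nACGT\nTTGA\n")
    demo.writestr("genes.gff3", "##gff-version 3\n")

with zipfile.ZipFile(buffer) as archive:
    parsed = fasta_records_in_zip(archive)
print(parsed)
```

For a real download, replace the in-memory buffer with the path of the saved zip file when opening the archive.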

Figure 17: Download data window.

Information fields

Under the genome viewer and table buttons there are different information fields related to the displayed sequence.

The DNA Information field provides the length of the sequence (in nucleotides), information about telomeres (for MAC contigs), and the nucleotide sequence in text format (click the button).

The MDS Information field shows the number of MIC/MAC hits on the opposite nucleus, the MDS count, the pointer count, and the IES count (if viewing a MIC contig).

The Cross References section has two subsections: External Databases and Variants. The External Databases subsection provides links to other databases with information about the contig being viewed such as OxyDB, GenBank, and the 2015 version of <mds_ies_db>. The Variants subsection contains links to other contigs on <mds_ies_db> that are known to be variants or isoforms of the contig being viewed.

Figure 18: DNA, MDS, and Cross References Information fields.

The Gene Information section contains a table of all genes that are present on the displayed sequence. To view the transcripts of each gene click on the green "+" symbol on the corresponding gene row. To view additional features of each transcript such as exons, introns, CDS, etc. click on the green "+" symbol on the corresponding transcript row. It is possible to filter for a particular gene name or gene description by typing text into the Search field. To view the DNA and/or protein sequence of each feature in the table click on the button in the corresponding row.

Figure 20: Gene table that lists all gene names and gene descriptions.
Figure 21: Expanded information fields of a gene and transcript inside the Gene Information table.

Data

This section describes the sources of data for the <mds_ies_db>.

MDS-IES Annotation

The MDS-IES annotation comes from the MDS/IES annotation software SDRAP, a free and open source program developed by Jasper Braun and collaborators at the USF Math-Bio Lab. The annotation process consists of BLASTing MAC contigs/scaffolds against MIC contigs/scaffolds and using high-scoring pair information to identify MDSs on the MAC and MIC. Using the MDS information of each MIC contig, additional processing is done to identify its IESs. Besides the MDS-IES annotation, SDRAP also produces MAC telomeric sequence information and MIC MDS arrangement pattern information. Both types of information are currently stored in the <mds_ies_db>. For more details on the SDRAP algorithm please refer to the GitHub page here.

Notation (2022)

The current naming scheme for contigs follows the genome assemblies for the respective species found at OxyDB (Oxytricha trifallax) and TGD (Tetrahymena thermophila).

Notation (2016)

Note: This section describes the contig naming schemes for the 2016 version of <mds_ies_db> which is no longer being used in the most recent update.

The <mds_ies_db> assigns its own name to every sequence that is stored in the database. The naming convention is described as follows:

  • 6 uppercase letters that are related to the organism name (e.g. Oxytricha trifallax - OXYTRI)
  • Underscore symbol "_"
  • "MIC" or "MAC" string to indicate whether the sequence belongs to the MIC or MAC nucleus
  • Underscore symbol "_"
  • Unique number that is assigned to the sequence

An example of an assigned name for a MAC contig of Oxytricha trifallax is OXYTRI_MAC_1001, and for a MIC contig of Tetrahymena thermophila is TTHERM_MIC_1464.
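The 2016 naming convention can be checked and split into its parts with a regular expression. The Python sketch below is an illustration only; it assumes that the organism code is exactly six uppercase letters, as in the two examples above, and the names ContigName and parse_contig_name are hypothetical.

```python
import re
from typing import NamedTuple


class ContigName(NamedTuple):
    organism: str  # six-letter organism code, e.g. OXYTRI
    nucleus: str   # "MAC" or "MIC"
    number: int    # unique number assigned to the sequence


# Organism code, underscore, nucleus, underscore, unique number.
NAME = re.compile(r"([A-Z]{6})_(MAC|MIC)_(\d+)")


def parse_contig_name(name: str) -> ContigName:
    """Split a 2016-style contig name into its three components."""
    match = NAME.fullmatch(name)
    if match is None:
        raise ValueError(f"not a 2016-style contig name: {name!r}")
    organism, nucleus, number = match.groups()
    return ContigName(organism, nucleus, int(number))


print(parse_contig_name("OXYTRI_MAC_1001"))
print(parse_contig_name("TTHERM_MIC_1464"))
```

Names from the current (2022) scheme follow the source assemblies instead, so they are expected to be rejected by this pattern.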

Sources

The genomes, proteomes, gene expression, and other annotations found in the <mds_ies_db> were collected from external databases such as OxyDB, GenBank, and TGD.

Technical Notes

This section describes the database architecture and lists software and libraries used during the development of <mds_ies_db>.

Database Architecture

The <mds_ies_db> is a relational database powered by the MySQL database management system.

Currently, there are 14 tables in the database, which are listed below.

  • Alias - contains information about sequence alias names found in different databases.
  • Contig - contains information about each MAC/MIC contig.
  • Count - contains summary information about the genes, MDSs, pointers, and IESs for each MAC and MIC contig.
  • Coverage - contains information about MDS mapping between each MAC/MIC pair.
  • Gene - contains information about genes that are found on MAC/MIC contigs.
  • IES_strict - contains information about the strict IESs that are found on MIC contigs. Strict IESs are segments of a MIC contig that are between two MDSs such that both MDSs correspond to the same MAC contig and the segment between the MDSs does not overlap any MDSs of either that MAC contig or any other.
  • IES_weak - contains information about the weak IESs that are found on MIC contigs. Weak IESs are segments of a MIC contig that are between two MDSs such that both MDSs correspond to the same MAC contig and the segment between the MDSs does not overlap any MDSs of that MAC contig, but it may overlap MDSs of other MAC contigs.
  • Match - contains information about MDSs that were identified during the MDS annotation process.
  • Parameter - contains information about the SDRAP parameters that were used during the MDS annotation process.
  • Pointer - contains information about pointers that were identified during the MDS annotation process.
  • Properties - contains information about arrangement map properties that were identified during the MDS annotation process.
  • Protein - contains information about MAC protein transcripts.
  • Stats - contains summary information about each annotation database.
  • Variant - contains information about the variants/isoforms of each contig.
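The difference between the strict and weak IESs stored in the IES_strict and IES_weak tables can be illustrated with a short Python sketch. This is not the SDRAP post-processing code; it is a simplified reading of the two definitions above. It assumes that each MDS on a single MIC contig is given as (MAC contig name, start, end) with inclusive coordinates, and the function name find_ies and the contig names in the demo are hypothetical.

```python
from collections import defaultdict
from typing import List, Tuple

Feature = Tuple[str, int, int]  # (MAC contig name, start, end), inclusive
Interval = Tuple[int, int]


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two inclusive intervals share at least one position."""
    return a[0] <= b[1] and b[0] <= a[1]


def find_ies(mdss: List[Feature]) -> Tuple[List[Feature], List[Feature]]:
    """Return (weak, strict) IESs for the MDSs of one MIC contig."""
    by_mac = defaultdict(list)
    for mac, start, end in mdss:
        by_mac[mac].append((start, end))

    weak: List[Feature] = []
    strict: List[Feature] = []
    for mac, intervals in by_mac.items():
        intervals.sort()
        covered_end = intervals[0][1]
        for start, end in intervals[1:]:
            if start > covered_end + 1:
                # Gap between MDSs of the same MAC contig that overlaps
                # no MDS of that MAC contig: a weak IES.
                gap = (covered_end + 1, start - 1)
                weak.append((mac, *gap))
                others = ((s, e) for other, s, e in mdss if other != mac)
                # It is also strict if no MDS of any other MAC contig overlaps it.
                if not any(overlaps(gap, other) for other in others):
                    strict.append((mac, *gap))
            covered_end = max(covered_end, end)
    return weak, strict


# Made-up MDS coordinates on one MIC contig.
mdss = [
    ("MAC_A", 1, 100),
    ("MAC_A", 151, 250),
    ("MAC_A", 301, 400),
    ("MAC_B", 260, 290),
]
weak, strict = find_ies(mdss)
print("weak:", weak)
print("strict:", strict)
```

In this sketch every strict IES also satisfies the weak definition, so the strict list is a subset of the weak list. In the demo, the second gap of MAC_A is only weak because an MDS of MAC_B lies inside it.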

Software

The <mds_ies_db> uses a number of open source software programs and libraries:

  • Bootstrap

    Bootstrap is a popular HTML, CSS, and JS framework for developing responsive, mobile-first projects on the web.
  • D3.js - Data-Driven Documents

    A JavaScript library for manipulating documents based on data using HTML, SVG and CSS.
  • DataTables

    DataTables is a plug-in for the jQuery JavaScript library. It is a highly flexible tool, based upon the foundations of progressive enhancement, and adds interactive controls to any HTML table.
  • Genoverse

    Genoverse is a portable, customizable, back-end independent JavaScript and HTML5 based genome browser which allows you to explore data interactively.
  • Scrambled DNA Rearrangement Annotation Protocol

    SDRAP is a web application that annotates the DNA segments in DNA rearrangement precursor and product genomes that describe the rearrangement, and computes properties of the rearrangements that reflect their complexity. The annotated segments sought by the software are analogous to MDSs, IESs, and pointers in ciliate DNA rearrangements.
  • BLAST (Basic Local Alignment Search Tool)

    BLAST refers to a suite of standalone local alignment search programs produced by NCBI.
  • SheetJS

    JavaScript library to read, edit, and export spreadsheets in a web browser.
  • FileSaver.js

    FileSaver.js is a solution for saving files on the client side, and is well suited to web apps that generate files on the client.