
    <i>Textrous!</i>: Extracting Semantic Textual Meaning from Gene Sets

    <div><p>The unbiased and reproducible interpretation of high-content gene sets from large-scale genomic experiments is crucial for understanding biological themes, validating experimental data, and planning future experiments. To derive biomedically relevant information from simple gene lists, genes must be mathematically associated with scientific language, that is, with meaningful words or sentences. Unfortunately, existing software tools for deriving meaningful and easily appreciable scientific textual ‘tokens’ from large gene sets either rely on controlled vocabularies (Medical Subject Headings, Gene Ontology, BioCarta) or employ Boolean text searching and co-occurrence models that cannot detect indirect links in the literature. As an improvement on existing web-based informatic tools, we have developed <i>Textrous!</i>, a web-based framework for extracting biomedical semantic meaning from an input gene set of arbitrary length. <i>Textrous!</i> employs natural language processing techniques, including latent semantic indexing (LSI), sentence splitting, word tokenization, parts-of-speech tagging, and noun-phrase chunking, to mine MEDLINE abstracts, PubMed Central articles, articles from Online Mendelian Inheritance in Man (OMIM), and Mammalian Phenotype annotation obtained from the Jackson Laboratory. <i>Textrous!</i> can generate meaningful output even from very small input datasets, using two text extraction methodologies (collective and individual) for selecting, ranking, clustering, and visualizing English words obtained from the user data. <i>Textrous!</i> therefore facilitates the output of quantitatively significant and easily appreciable semantic words and phrases linked to both individual genes and batch genomic data.</p></div>
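The abstract lists sentence splitting and word tokenization among the preprocessing steps. The Python sketch below illustrates only those two steps, using simple regular expressions and a small hypothetical stopword list (`STOPWORDS`). It is not the Textrous! implementation: it omits parts-of-speech tagging and noun-phrase chunking, which normally require a trained tagger, and its naive splitter will also break on abbreviations such as "e.g.". All function names are illustrative.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list (a real pipeline would use a curated one).
STOPWORDS = {"the", "a", "an", "of", "in", "and", "is", "to", "with", "for", "by"}


def split_sentences(text: str) -> list[str]:
    """Naive splitter: break after '.', '!' or '?' when followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence: str) -> list[str]:
    """Lowercase word tokens; internal hyphens are kept (e.g. 'leptin-deficient')."""
    return re.findall(r"[a-z0-9]+(?:-[a-z0-9]+)*", sentence.lower())


def term_counts(abstract: str) -> Counter:
    """Count non-stopword tokens over every sentence of one abstract."""
    counts: Counter = Counter()
    for sentence in split_sentences(abstract):
        counts.update(t for t in tokenize(sentence) if t not in STOPWORDS)
    return counts


if __name__ == "__main__":
    doc = "The leptin gene regulates appetite. Leptin-deficient mice show hyperphagia!"
    print(split_sentences(doc))
    print(term_counts(doc))
```

Per-abstract counts like these are the usual raw material for the term-document matrix that LSI decomposes.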

    Web-based user interface for <i>Textrous!</i>.

    <p>(A) The main navigation bar is on the top-right. The search bar is below the main navigation bar, and the secondary navigation bar is below the search bar. Features can be accessed by clicking the appropriate menu item, phrases by clicking on the word hyperlinks, and excluded words by clicking the “(x genes found)” description in the search bar. (B) Primary Cosine Similarity output from the <i>Textrous!</i> user interface, shown for the Gene Symbol input Lep, Bdnf, Fto, Lepr. After the symbols are entered into the ‘Search’ box, pressing ‘Submit’ generates the cosine similarity word list, and the ‘Cosine Table’ is displayed first by default. Additional textual output modes can then be accessed using the toolbar. (C) Phrase hyperlinking from Cosine Similarity tables. Each word term generated from the input query list can be clicked to link out (red box) to the phrases in which it occurs. The phrases containing the identified word are also ranked by cosine similarity. (D) In addition to the Cosine Similarity output, the resulting word lists can be assessed through their Z-score table or through the probability scores in their p-value table. In each of these output formats, each word can be linked out to its phrase context scoring box as in <a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0062665#pone-0062665-g005" target="_blank">Figure 5</a>.</p>
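The Cosine, Z-score, and p-value tables can be illustrated with a minimal sketch. Because the caption does not give the statistics, this sketch assumes that the gene set is summarized as the mean of its genes' LSI vectors, that each word's Z-score is its cosine similarity standardized against the cosine scores of all candidate words, and that the p-value is the one-sided upper-tail probability of a standard normal distribution. The actual Textrous! computation may differ; `query_from_genes`, `rank_words`, and the other names are hypothetical.

```python
import math

import numpy as np


def query_from_genes(gene_vectors: np.ndarray) -> np.ndarray:
    """Assumed summary of a gene set: the mean of its genes' LSI vectors."""
    return gene_vectors.mean(axis=0)


def cosine_scores(query: np.ndarray, word_vectors: np.ndarray) -> np.ndarray:
    """Cosine similarity between the query and each row of word_vectors."""
    norms = np.linalg.norm(word_vectors, axis=1) * np.linalg.norm(query)
    # Zero-length vectors get a score of 0 instead of a division-by-zero NaN.
    return (word_vectors @ query) / np.where(norms == 0, 1.0, norms)


def z_and_p(scores: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Standardize scores across all words; one-sided upper-tail normal p-values."""
    sd = float(scores.std()) or 1.0
    z = (scores - scores.mean()) / sd
    p = np.array([0.5 * math.erfc(v / math.sqrt(2.0)) for v in z])
    return z, p


def rank_words(words, query, word_vectors, top_n=5):
    """Return (word, cosine, z, p) tuples for the top_n words by cosine."""
    cos = cosine_scores(query, word_vectors)
    z, p = z_and_p(cos)
    order = np.argsort(-cos)[:top_n]
    return [(words[i], float(cos[i]), float(z[i]), float(p[i])) for i in order]


if __name__ == "__main__":
    words = ["obesity", "appetite", "bone"]
    W = np.array([[1.0, 0.1], [0.9, 0.3], [0.0, 1.0]])  # toy word vectors
    genes = np.array([[1.0, 0.0], [0.8, 0.0]])          # toy gene vectors
    for row in rank_words(words, query_from_genes(genes), W, top_n=2):
        print(row)
```

Ranking the phrases that contain a word, as in panel (C), would be the same cosine computation applied to phrase vectors instead of word vectors.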

    <i>Textrous!</i>-mediated individual processing output of an exemplary large dataset.

    <p>The heatmap (teal blocks indicate strongly associated gene-word interactions, with color intensity reflecting strength; grey blocks indicate no significant interaction) shows the gene (vertical) by word (horizontal) interactions within the large mouse learning dataset, created with <i>Textrous!</i> individual processing.</p>
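A gene-by-word heatmap of this kind can be sketched as a matrix of pairwise cosine similarities between gene vectors and word vectors in a shared LSI space. The cutoff separating "grey" from "teal" cells (`threshold=0.3`) is a hypothetical value chosen for illustration, not a parameter reported for Textrous!, and the function names are likewise illustrative.

```python
import numpy as np


def unit_rows(X: np.ndarray) -> np.ndarray:
    """Scale each row to unit length (zero rows stay zero)."""
    return X / np.maximum(np.linalg.norm(X, axis=1, keepdims=True), 1e-12)


def gene_word_matrix(gene_vectors, word_vectors, threshold=0.3):
    """Genes x words matrix of cosine similarities.

    Values below the hypothetical `threshold` become 0.0 (the 'grey' cells);
    the remaining values would drive teal color intensity.
    """
    sims = unit_rows(gene_vectors) @ unit_rows(word_vectors).T
    return np.where(sims >= threshold, sims, 0.0)


if __name__ == "__main__":
    genes = np.array([[1.0, 0.0], [0.0, 1.0]])              # toy gene vectors
    words = np.array([[1.0, 0.0], [1.0, 1.0], [0.0, 1.0]])  # toy word vectors
    print(np.round(gene_word_matrix(genes, words), 3))
```

Each row of the result corresponds to one gene and each column to one word, matching the vertical/horizontal layout described in the caption.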

    Hierarchical cloud collective processing from large physiological datasets.

    <p>The hierarchical cloud represents the words most strongly associated with a large input dataset derived from behavioral experiments investigating learning task-oriented activity in mice. The highest-scoring words (by Cosine Similarity, Z-score, and probability) extracted by <i>Textrous!</i> for the input dataset are listed next to the hierarchical cloud.</p>
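The caption does not state how the hierarchical cloud is built. One common way to obtain a word hierarchy is agglomerative clustering; the sketch below assumes average linkage on cosine distance (1 - cosine similarity) between word vectors, which is a swapped-in choice rather than the documented Textrous! method. It records the order in which clusters are joined. The function names and toy vectors are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length sequences (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def agglomerate(words, vectors):
    """Average-linkage agglomerative clustering on cosine distance.

    Returns merge records (step, left_cluster, right_cluster, distance), where
    `step` is the order in which the joins were made.
    """
    vec = dict(zip(words, vectors))

    def linkage(a, b):
        # Mean pairwise cosine distance between the members of clusters a and b.
        total = sum(1.0 - cosine(vec[x], vec[y]) for x in a for y in b)
        return total / (len(a) * len(b))

    clusters = [(w,) for w in words]
    merges = []
    step = 0
    while len(clusters) > 1:
        pairs = [(i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))]
        i, j = min(pairs, key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]))
        a, b = clusters[i], clusters[j]
        merges.append((step, a, b, linkage(a, b)))
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [a + b]
        step += 1
    return merges


if __name__ == "__main__":
    words = ["obesity", "appetite", "neuron", "synapse"]
    vecs = [(1.0, 0.0, 0.0), (0.9, 0.1, 0.0), (0.0, 1.0, 0.1), (0.0, 0.9, 0.2)]
    for record in agglomerate(words, vecs):
        print(record)
```

This brute-force search is cubic in the number of words per step, which is acceptable for the few dozen top-scoring words a cloud displays; a library routine would be preferable for larger inputs.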

    Hierarchical cloud collective processing for compare-and-contrast large datasets.

    <p>(A) Hierarchical cloud representing the words most strongly associated with the hPTH (1–34)-induced transcriptomic response in murine calvarial bone. The highest-scoring words (by Cosine Similarity, Z-score, and probability) extracted by <i>Textrous!</i> for the input dataset are listed next to the hierarchical cloud. (B) Hierarchical cloud representing the words most strongly associated with the bPTH (7–34)-induced transcriptomic response in murine calvarial bone. The highest-scoring words (by Cosine Similarity, Z-score, and probability) extracted by <i>Textrous!</i> for the input dataset are listed next to the hierarchical cloud. (C) Venn diagram illustrating the distinct nature of the words extracted by <i>Textrous!</i> collective processing for the hPTH (1–34) and bPTH (7–34) datasets. (D) Venn diagram illustrating the minimal overlap between words obtained by manually dismantling noun phrases from the hPTH (1–34) and bPTH (7–34) datasets.</p>
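The Venn comparisons in panels (C) and (D) reduce to set operations on two word lists. This sketch uses a hypothetical `dismantle` helper that splits noun phrases into lowercase words on whitespace, then computes the three Venn regions. The example phrases are toy inputs, not data from the study.

```python
def dismantle(phrases):
    """Hypothetical helper: split noun phrases into a set of lowercase words."""
    return {word for phrase in phrases for word in phrase.lower().split()}


def venn_regions(words_a, words_b):
    """The three regions of a two-set Venn diagram, each as a sorted list."""
    a, b = set(words_a), set(words_b)
    return {"only_a": sorted(a - b), "shared": sorted(a & b), "only_b": sorted(b - a)}


if __name__ == "__main__":
    set_a = dismantle(["bone formation", "Osteoblast differentiation"])  # toy input
    set_b = dismantle(["bone resorption"])                                # toy input
    regions = venn_regions(set_a, set_b)
    print({name: len(words) for name, words in regions.items()})
```

Lowercasing before comparison keeps capitalization differences from inflating the non-shared regions.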

    Singular Value Decomposition (SVD) of a term-document matrix and the generation of the U* matrix.

    <p>(A–C) U and V<sup>T</sup> contain the LSI vectors for terms and documents, respectively, while Σ contains the singular values of the original term-document matrix. (D) An illustration of the resulting matrix U*, obtained by multiplying U<sub>k</sub> and P. The resulting matrix contains the word vectors and phrase vectors in LSI space, which allows every word or phrase to be compared with every other word or phrase.</p>
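The decomposition in panels (A–C) and the construction of U* can be sketched with NumPy. The caption does not define P, so this sketch assumes P is a (words + phrases) × terms matrix whose upper block is the identity (each word maps to itself) and whose lower block is a phrase-term incidence matrix. Under that assumption, U* = P U_k holds the word vectors and the phrase vectors (sums of their words' vectors) as rows. `build_p` and `all_pairs_cosine` are hypothetical names.

```python
import numpy as np


def truncated_svd(A: np.ndarray, k: int):
    """Return U_k, Sigma_k, V_k^T for the rank-k approximation of A (terms x docs)."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U[:, :k], np.diag(s[:k]), Vt[:k, :]


def build_p(n_terms: int, phrase_term_incidence: np.ndarray) -> np.ndarray:
    """Assumed P: identity rows for single words, then one incidence row per phrase."""
    return np.vstack([np.eye(n_terms), phrase_term_incidence])


def all_pairs_cosine(X: np.ndarray) -> np.ndarray:
    """Cosine similarity between every pair of rows of X."""
    Xn = X / np.maximum(np.linalg.norm(X, axis=1, keepdims=True), 1e-12)
    return Xn @ Xn.T


if __name__ == "__main__":
    # Toy 4-term x 3-document count matrix.
    A = np.array([[2.0, 0.0, 1.0],
                  [1.0, 0.0, 1.0],
                  [0.0, 3.0, 0.0],
                  [0.0, 1.0, 1.0]])
    U_k, S_k, Vt_k = truncated_svd(A, k=2)
    # One toy phrase made of terms 0 and 1.
    P = build_p(4, np.array([[1.0, 1.0, 0.0, 0.0]]))
    U_star = P @ U_k  # rows 0-3: word vectors, row 4: phrase vector
    print(np.round(all_pairs_cosine(U_star), 3))
```

Some LSI variants scale term vectors by the singular values (U_k Σ_k) before comparison; which form Textrous! uses is not stated in this caption.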

    <i>Textrous!</i>-mediated individual processing output of compare-and-contrast large datasets.

    <p>Individual processing heatmaps for hPTH (1–34)- and bPTH (7–34)-mediated transcriptomic activity in murine calvarial bone are shown in panels (A) and (B), respectively. Teal blocks indicate strongly associated gene-word interactions, with color intensity reflecting strength, while grey blocks indicate no significant interaction.</p>

    Multiple comparison of the functional accuracy and specificity of <i>Textrous!</i>-extracted data with other data analysis modules.

    <p>The five most significantly associated words obtained from <i>Textrous!</i> collective analysis of the mouse learning dataset are compared with the five most significantly enriched KEGG pathways, GO biological processes (GO<i>bp</i>), WikiPathways, Ingenuity Pathway Analysis (IPA) Canonical Signaling Pathways (IPA CanPath), Protein Information Resource Keywords (PIR Keywords), and IPA BioFunctions, generated using WebGestalt (KEGG, GObp, WikiPathways), IPA (CanPath, BioFunctions), and NIH-DAVID (PIR Keywords), respectively. Text size and descending order indicate the first to fifth most significantly enriched group for each analytical mode.</p>

    Diverse <i>Textrous!</i> processing formats.

    <p>(A) An illustration of the hierarchical cloud displaying multiple themes produced by <i>collective</i> processing. The hierarchical cloud shows depression and stress at the conjunction between terms related to the central nervous system and terms related to obesity. Each cell is color-coded to represent the time at which joins were made. Font sizes are adjusted in proportion to the calculated cosine similarities. (B) An illustration of the heat map produced by <i>individual</i> processing. The top associated (Cosine Similarity) terms are shown, as well as the relationships amongst genes. Here, the heat map shows that the top words are obesity-related, and that “Bdnf” is dissimilar to the other genes in the query. Grey color indicates a relative lack of association, while the intensity of teal color corresponds directly to the strength of correlation of each pairwise association. (C) Each of the output textual terms can be hyperlinked, via clicking on the word, to their associated top-scoring (Cosine Similarity) phrases. In this panel the output word term ‘<i>hyperphagia</i>’ was linked out to its associated phrase contexts.</p>
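The two display rules in panel (A), font size proportional to cosine similarity and cell color keyed to the time at which joins were made, can be sketched as simple mappings. The point sizes (`max_pt=36`, `min_pt=8`) are hypothetical, and the `min_pt` floor, added for legibility, departs from strict proportionality for very low scores. Scores are assumed to be non-negative cosine similarities; the function names are illustrative.

```python
def font_sizes(scores, max_pt=36.0, min_pt=8.0):
    """Font size proportional to each word's cosine similarity.

    The top-scoring word gets max_pt; min_pt is a hypothetical legibility floor.
    """
    top = max(scores.values())
    if top <= 0:
        return {w: min_pt for w in scores}
    return {w: max(min_pt, max_pt * (s / top)) for w, s in scores.items()}


def join_shades(n_joins):
    """Map join order 0..n_joins-1 onto a 0..1 color shade (earliest join = 0.0)."""
    if n_joins <= 1:
        return [0.0] * n_joins
    return [step / (n_joins - 1) for step in range(n_joins)]


if __name__ == "__main__":
    # Toy scores, not values from the study.
    print(font_sizes({"obesity": 0.8, "hyperphagia": 0.4, "stress": 0.05}))
    print(join_shades(3))
```

A shade in [0, 1] can then be passed to any color map, so the rendering layer stays independent of how the joins were computed.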