455 research outputs found

    Automatic document classification of biological literature

    Background: Document classification is a widespread problem with many applications, from organizing search engine snippets to spam filtering. We previously described Textpresso, a text-mining system for biological literature, which marks up full text according to a shallow ontology that includes terms of biological interest. This project investigates document classification in the context of biological literature, making use of the Textpresso markup of a corpus of Caenorhabditis elegans literature. Results: We present a two-step text categorization algorithm to classify a corpus of C. elegans papers. Our classification method first uses a support vector machine-trained classifier, followed by a novel, phrase-based clustering algorithm. This clustering step autonomously creates cluster labels that are descriptive and understandable by humans. On a standard test set (Reuters 21578), this clustering engine outperformed previously published results (F-value of 0.55 vs. 0.49), while producing cluster descriptions that appear more useful. A web interface allows researchers to quickly navigate through the hierarchy and look for documents that belong to a specific concept. Conclusions: We have demonstrated a simple method to classify biological documents that embodies an improvement over current methods. While the classification results are currently optimized for Caenorhabditis elegans papers by human-created rules, the classification engine can be adapted to different types of documents. We have demonstrated this by presenting a web interface that allows researchers to quickly navigate through the hierarchy and look for documents that belong to a specific concept.

    Using Transcriptomes as Mutant Phenotypes Reveals Functional Regions of a Mediator Subunit in Caenorhabditis elegans

    Although transcriptomes have recently been used as phenotypes with which to perform epistasis analyses, they are not yet used to study intragenic function/structure relationships. We developed a theoretical framework to study allelic series using transcriptomic phenotypes. As a proof-of-concept, we apply our methods to an allelic series of dpy-22, a highly pleiotropic Caenorhabditis elegans gene orthologous to the human gene MED12, which encodes a subunit of the Mediator complex. Our methods identify functional units within dpy-22 that modulate Mediator activity upon various genetic programs, including the Wnt and Ras modules.

    Symmetric mixed states of n qubits: local unitary stabilizers and entanglement classes

    We classify, up to local unitary equivalence, local unitary stabilizer Lie algebras for symmetric mixed states into six classes. These include the stabilizer types of the Werner states, the GHZ state and its generalizations, and Dicke states. For all but the zero algebra, we classify entanglement types (local unitary equivalence classes) of symmetric mixed states that have those stabilizers. We make use of the identification of symmetric density matrices with polynomials in three variables with real coefficients and apply the representation theory of SO(3) on this space of polynomials. Comment: 10 pages, 1 table, title change and minor clarifications for published version.

    Tissue enrichment analysis for C. elegans genomics

    Background: Over the last ten years, there has been explosive development in methods for measuring gene expression. These methods can identify thousands of genes altered between conditions, but understanding these datasets and forming hypotheses based on them remains challenging. One way to analyze these datasets is to associate ontologies (hierarchical, descriptive vocabularies with controlled relations between terms) with genes and to look for enrichment of specific terms. Although Gene Ontology (GO) is available for Caenorhabditis elegans, it does not include anatomical information. Results: We have developed a tool for identifying enrichment of C. elegans tissues among gene sets and generated a website GUI where users can access this tool. Since a common drawback of ontology enrichment analyses is their verbosity, we developed a very simple filtering algorithm to reduce the ontology size by an order of magnitude. We adjusted these filters and validated our tool using a set of 30 gold standards from Expression Cluster data in WormBase. We show our tool can discriminate between embryonic and larval tissues and can even identify tissues down to the single-cell level. We used our tool to identify multiple neuronal tissues that are down-regulated due to pathogen infection in C. elegans. Conclusions: Our Tissue Enrichment Analysis (TEA) can be found within WormBase, and can be downloaded using Python’s standard pip installer. It tests a slimmed-down C. elegans tissue ontology for enrichment of specific terms and provides users with a text and graphic representation of the results.

    Two new functions in the WormBase Enrichment Suite

    Genome-wide experiments routinely generate large amounts of data that can be hard to interpret biologically. A common approach to interpreting these results is to employ enrichment analyses of controlled languages, known as ontologies, that describe various biological parameters such as gene molecular or biological function. In C. elegans, three distinct ontologies, the Gene Ontology (GO), Anatomy Ontology (AO), and the Worm Phenotype Ontology (WPO) are used to annotate gene function, expression and phenotype, respectively (Ashburner et al. 2000; Lee and Sternberg, 2003; Schindelman et al. 2011). Previously, we developed software to test datasets for enrichment of anatomical terms, called the Tissue Enrichment Analysis (TEA) tool (Angeles-Albores and Sternberg, 2016). Using the same hypergeometric statistical method, we extend enrichment testing to include WPO and GO, offering a unified approach to enrichment testing in C. elegans. The WormBase Enrichment Suite can be accessed via a user-friendly interface at http://www.wormbase.org/tools/enrichment/tea/tea.cgi. To validate the tools, we analyzed a previously published extracellular vesicle (EV)-releasing neuron (EVN) signature gene set derived from dissociated ciliated EV neurons (Wang et al. 2015) using WormBase Enrichment Suite based on the WS262 WormBase release. TEA correctly identified the CEM, hook sensillum and IL2 neuron as enriched tissues. The top phenotype associated with the EVN signature was chemosensory behavior. Gene Ontology enrichment analysis showed that cell projection and cell body were the most enriched cellular components in this gene set, followed by the biological processes neuropeptide signaling pathway and vesicle localization further down. 
    The tutorial script used to generate the figure above can be viewed at: https://github.com/dangeles/TissueEnrichmentAnalysis/blob/master/tutorial/Tutorial.ipynb. The addition of Gene Enrichment Analysis (GEA) and Phenotype Enrichment Analysis (PEA) to WormBase marks an important step towards a unified set of analyses that can help researchers to understand genomic datasets. These enrichment analyses will allow the community to fully benefit from the data curation ongoing at WormBase.
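
    The abstract above says the enrichment tools share a hypergeometric statistical method. As a rough illustration of that kind of test, here is a minimal sketch using only the Python standard library. It is not the WormBase or TEA implementation: the function name, its parameters, and the toy numbers are all assumptions made for this example, and no multiple-testing correction is shown.

```python
from math import comb


def hypergeom_enrichment_pvalue(population, annotated, sample, hits):
    """Upper-tail hypergeometric probability P(X >= hits).

    Hypothetical helper for illustration only:
      population -- total number of genes considered
      annotated  -- genes in the population carrying the ontology term
      sample     -- size of the gene set being tested
      hits       -- genes in the gene set carrying the term
    """
    max_hits = min(annotated, sample)
    total = comb(population, sample)
    # Sum the probability of observing k annotated genes for every k >= hits.
    tail = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(hits, max_hits + 1)
    )
    return tail / total


if __name__ == "__main__":
    # Toy numbers, not taken from any dataset in the text.
    p = hypergeom_enrichment_pvalue(population=20000, annotated=200, sample=100, hits=5)
    print(f"enrichment p-value: {p:.3g}")
```

    A small p-value suggests the term appears in the gene set more often than random sampling would explain. In practice one such test is run per ontology term, so the resulting p-values would normally be corrected for multiple comparisons.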

    Semantic representation of neural circuit knowledge in Caenorhabditis elegans.

    In modern biology, new knowledge is generated quickly, making it challenging for researchers to efficiently acquire and synthesise new information from the large volume of primary publications. To address this problem, computational approaches that generate machine-readable representations of scientific findings in the form of knowledge graphs have been developed. These representations can integrate different types of experimental data from multiple papers and biological knowledge bases in a unifying data model, providing a complementary method to manual review for interacting with published knowledge. The Gene Ontology Consortium (GOC) has created a semantic modelling framework that extends individual functional gene annotations to structured descriptions of causal networks representing biological processes (Gene Ontology-Causal Activity Modelling, or GO-CAM). In this study, we explored whether the GO-CAM framework could represent knowledge of the causal relationships between environmental inputs, neural circuits and behavior in the model nematode C. elegans [C. elegans Neural-Circuit Causal Activity Modelling (CeN-CAM)]. We found that, given extensions to several relevant ontologies, a wide variety of author statements from the literature about the neural circuit basis of egg-laying and carbon dioxide (C

    A Modified Mole Cricket Lure and Description of Scapteriscus borellii (Orthoptera: Gryllotalpidae) Range Expansion and Calling Song in California

    Invasive mole cricket species in the genus Scapteriscus have become significant agricultural pests and are continuing to expand their range in North America. Though largely subterranean, adults of some species, such as Scapteriscus borellii Giglio-Tos 1894, are capable of long dispersive flights and phonotaxis to male calling songs to find suitable habitats and mates. Mole crickets in the genus Scapteriscus are known to be attracted to and can be caught by audio lure traps that broadcast synthesized or recorded calling songs. We report improvements in the design and production of electronic controllers for the automation of semipermanent mole cricket trap lures as well as highly portable audio trap collection designs. Using these improved audio lure traps, we collected the first reported individuals of the pest mole cricket S. borellii in California. We describe several characteristic features of the calling song of the California population, including that the pulse rate is a function of soil temperature, similar to Florida populations of S. borellii. Further, we show that other calling song characteristics (carrier frequency, intensity, and pulse rate) are significantly different between the populations.

    The Caenorhabditis elegans Female State: Decoupling the Transcriptomic Effects of Aging and Sperm-Status

    Understanding genome and gene function in a whole organism requires us to fully comprehend the life cycle and the physiology of the organism in question. Caenorhabditis elegans XX animals are hermaphrodites that exhaust their sperm after 3 d of egg-laying. Even though C. elegans can live for many days after cessation of egg-laying, the molecular physiology of this state has not been as intensely studied as other parts of the life cycle, despite documented changes in behavior and metabolism. To study the effects of sperm depletion and aging of C. elegans during the first 6 d of adulthood, we measured the transcriptomes of first-day adult hermaphrodites and sixth-day sperm-depleted adults, and, at the same time points, mutant fog-2(lf) worms that have a feminized germline phenotype. We found that we could separate the effects of biological aging from sperm depletion. For a large subset of genes, young adult fog-2(lf) animals had the same gene expression changes as sperm-depleted sixth-day wild-type hermaphrodites, and these genes did not change expression when fog-2(lf) females reached the sixth day of adulthood. Taken together, these results indicate that changing sperm status causes a change in the internal state of the worm, which we call the female-like state. Our data provide a high-quality picture of the changes that happen in global gene expression throughout the period of early aging in the worm.

    Reconstructing a metazoan genetic pathway with transcriptome-wide epistasis measurements

    RNA-sequencing (RNA-seq) is commonly used to identify genetic modules that respond to perturbations. In single cells, transcriptomes have been used as phenotypes, but this concept has not been applied to whole-organism RNA-seq. Also, quantifying and interpreting epistatic effects using expression profiles remains a challenge. We developed a single coefficient to quantify transcriptome-wide epistasis that reflects the underlying interactions and which can be interpreted intuitively. To demonstrate our approach, we sequenced four single and two double mutants of Caenorhabditis elegans. From these mutants, we reconstructed the known hypoxia pathway. In addition, we uncovered a class of 56 genes with HIF-1–dependent expression that have opposite changes in expression in mutants of two genes that cooperate to negatively regulate HIF-1 abundance; however, the double mutant of these genes exhibits suppression epistasis. This class violates the classical model of HIF-1 regulation but can be explained by postulating a role of hydroxylated HIF-1 in transcriptional control.