317 research outputs found

    A graph-search framework for associating gene identifiers with documents

    BACKGROUND: One step in the model organism database curation process is to find, for each article, the identifier of every gene discussed in the article. We consider a relaxation of this problem suitable for semi-automated systems, in which each article is associated with a ranked list of possible gene identifiers, and experimentally compare methods for solving this geneId ranking problem. In addition to baseline approaches based on combining named entity recognition (NER) systems with a "soft dictionary" of gene synonyms, we evaluate a graph-based method which combines the outputs of multiple NER systems, as well as other sources of information, and a learning method for reranking the output of the graph-based method. RESULTS: We show that named entity recognition (NER) systems with similar F-measure performance can have significantly different performance when used with a soft dictionary for geneId-ranking. The graph-based approach can outperform any of its component NER systems, even without learning, and learning can further improve the performance of the graph-based ranking approach. CONCLUSION: The utility of a named entity recognition (NER) system for geneId-finding may not be accurately predicted by its entity-level F1 performance, the most common performance measure. GeneId-ranking systems are best implemented by combining several NER systems. With appropriate combination methods, usefully accurate geneId-ranking systems can be constructed based on easily-available resources, without resorting to problem-specific, engineered components

    Fifteen new risk loci for coronary artery disease highlight arterial-wall-specific mechanisms

    Coronary artery disease (CAD) is a leading cause of morbidity and mortality worldwide. Although 58 genomic regions have been associated with CAD thus far, most of the heritability is unexplained, indicating that additional susceptibility loci await identification. An efficient discovery strategy may be larger-scale evaluation of promising associations suggested by genome-wide association studies (GWAS). Hence, we genotyped 56,309 participants using a targeted gene array derived from earlier GWAS results and performed meta-analysis of results with 194,427 participants previously genotyped, totaling 88,192 CAD cases and 162,544 controls. We identified 25 new SNP-CAD associations (P < 5 × 10^-8 in fixed-effects meta-analysis) from 15 genomic regions, including SNPs in or near genes involved in cellular adhesion, leukocyte migration and atherosclerosis (PECAM1, rs1867624), coagulation and inflammation (PROCR, rs867186 (p.Ser219Gly)) and vascular smooth muscle cell differentiation (LMOD1, rs2820315). Correlation of these regions with cell-type-specific gene expression and plasma protein levels sheds light on potential disease mechanisms

    Sperm design and variation in the New World blackbirds (Icteridae)

    Post-copulatory sexual selection (PCSS) is thought to be one of the evolutionary forces responsible for the rapid and divergent evolution of sperm design. However, whereas in some taxa particular sperm traits are positively associated with PCSS, in other taxa, these relationships are negative, and the causes of these different patterns across taxa are poorly understood. In a comparative study using New World blackbirds (Icteridae), we tested whether sperm design was influenced by the level of PCSS and found significant positive associations with the level of PCSS for all sperm components but head length. Additionally, whereas the absolute length of sperm components increased, their variation declined with the intensity of PCSS, indicating stabilizing selection around an optimal sperm design. Given the diversity of, and strong selection on, sperm design, it seems likely that sperm phenotype may influence sperm velocity within species. However, in contrast to other recent studies of passerine birds, but consistent with several other studies, we found no significant link between sperm design and velocity, using four different species that vary both in sperm design and PCSS. Potential reasons for this discrepancy between studies are discussed

    BioInfer: a corpus for information extraction in the biomedical domain

    BACKGROUND: Lately, there has been a great interest in the application of information extraction methods to the biomedical domain, in particular, to the extraction of relationships of genes, proteins, and RNA from scientific publications. The development and evaluation of such methods requires annotated domain corpora. RESULTS: We present BioInfer (Bio Information Extraction Resource), a new public resource providing an annotated corpus of biomedical English. We describe an annotation scheme capturing named entities and their relationships along with a dependency analysis of sentence syntax. We further present ontologies defining the types of entities and relationships annotated in the corpus. Currently, the corpus contains 1100 sentences from abstracts of biomedical research articles annotated for relationships, named entities, as well as syntactic dependencies. Supporting software is provided with the corpus. The corpus is unique in the domain in combining these annotation types for a single set of sentences, and in the level of detail of the relationship annotation. CONCLUSION: We introduce a corpus targeted at protein, gene, and RNA relationships which serves as a resource for the development of information extraction systems and their components such as parsers and domain analyzers. The corpus will be maintained and further developed with a current version being available at

    Species-Area Relationships Are Controlled by Species Traits

    The species-area relationship (SAR) is one of the most thoroughly investigated empirical relationships in ecology. Two theories have been proposed to explain SARs: classical island biogeography theory and niche theory. Classical island biogeography theory considers the processes of persistence, extinction, and colonization, whereas niche theory focuses on species requirements, such as habitat and resource use. Recent studies have called for the unification of these two theories to better explain the underlying mechanisms that generate SARs. In this context, species traits that can be related to each theory seem promising. Here we analyzed the SARs of butterfly and moth assemblages on islands differing in size and isolation. We tested whether species traits modify the SAR and the response to isolation. In addition to the expected overall effects on the area, traits related to each of the two theories increased the model fit, from 69% up to 90%. Steeper slopes, which indicate a particularly high sensitivity to area, were shown by species with restricted range (slope = 0.82), narrow dietary niche (slope = 0.59), low abundance (slope = 0.52), and low reproductive potential (slope = 0.51). We concluded that considering species traits when analyzing SARs yields considerable potential for unifying island biogeography theory and niche theory, and that the systematic and predictable effects observed when considering traits can help to guide conservation and management actions

    Investigation of three new mouse mammary tumor cell lines as models for transforming growth factor (TGF)-β and Neu pathway signaling studies: identification of a novel model for TGF-β-induced epithelial-to-mesenchymal transition

    INTRODUCTION: This report describes the isolation and characterization of three new murine mammary epithelial cell lines derived from mammary tumors from MMTV (mouse mammary tumor virus)/activated Neu + TβRII-AS (transforming growth factor [TGF]-β type II receptor antisense RNA) bigenic mice (BRI-JM01 and BRI-JM05 cell lines) and MMTV/activated Neu transgenic mice (BRI-JM04 cell line). METHODS: The BRI-JM01, BRI-JM04, and BRI-JM05 cell lines were analyzed for transgene expression, their general growth characteristics, and their sensitivities to several growth factors from the epidermal growth factor (EGF) and TGF-β families (recombinant human EGF, heregulin-β1, and TGF-β1). The BRI-JM01 cells were observed to undergo a striking morphologic change in response to TGF-β1, and they were therefore further investigated for their ability to undergo a TGF-β-induced epithelial-to-mesenchymal transition (EMT) using motility assays and immunofluorescence microscopy. RESULTS: We found that two of the three cell lines (BRI-JM04 and BRI-JM05) express the Neu transgene, whereas, unexpectedly, both of the cell lines that were established from MMTV/activated Neu + TβRII-AS bigenic tumors (BRI-JM01 and BRI-JM05) do not express the TβRII-AS transgene. The cuboidal BRI-JM01 cells exhibit a short doubling time and are able to form confluent monolayers. The BRI-JM04 and BRI-JM05 cell lines are morphologically much less uniform, grow at a much slower rate, and do not form confluent monolayers. Only the BRI-JM05 cells can form colonies in soft agar. In contrast, all three cell lines form colonies in Matrigel, although the BRI-JM04 and BRI-JM05 cell lines do so more efficiently than the BRI-JM01 cell line. All three cell lines express the cell surface marker E-cadherin, confirming their epithelial character. Proliferation assays showed that the three cell lines respond differently to recombinant human EGF and heregulin-β1, and that all are growth inhibited by TGF-β1, but that only the BRI-JM01 cell line undergoes an EMT and exhibits increased motility upon TGF-β1 treatment. CONCLUSION: We suggest that the BRI-JM04 and BRI-JM05 cell lines can be used to investigate Neu oncogene driven mammary tumorigenesis, whereas the BRI-JM01 cell line will be useful for studying TGF-β1-induced EMT

    Benchmarking natural-language parsers for biological applications using dependency graphs

    BACKGROUND: Interest is growing in the application of syntactic parsers to natural language processing problems in biology, but assessing their performance is difficult because differences in linguistic convention can falsely appear to be errors. We present a method for evaluating their accuracy using an intermediate representation based on dependency graphs, in which the semantic relationships important in most information extraction tasks are closer to the surface. We also demonstrate how this method can be easily tailored to various application-driven criteria. RESULTS: Using the GENIA corpus as a gold standard, we tested four open-source parsers which have been used in bioinformatics projects. We first present overall performance measures, and test the two leading tools, the Charniak-Lease and Bikel parsers, on subtasks tailored to reflect the requirements of a system for extracting gene expression relationships. These two tools clearly outperform the other parsers in the evaluation, and achieve accuracy levels comparable to or exceeding native dependency parsers on similar tasks in previous biological evaluations. CONCLUSION: Evaluating using dependency graphs allows parsers to be tested easily on criteria chosen according to the semantics of particular biological applications, drawing attention to important mistakes and soaking up many insignificant differences that would otherwise be reported as errors. Generating high-accuracy dependency graphs from the output of phrase-structure parsers also provides access to the more detailed syntax trees that are used in several natural-language processing techniques