323 research outputs found

    Preliminary evaluation of the CellFinder literature curation pipeline for gene expression in kidney cells and anatomical parts

    Biomedical literature curation is the process of automatically and/or manually deriving knowledge from scientific publications and recording it into specialized databases for structured delivery to users. It is a slow, error-prone, complex, costly and, yet, highly important task. Previous experiences have proven that text mining can assist in its many phases, especially, in triage of relevant documents and extraction of named entities and biological events. Here, we present the curation pipeline of the CellFinder database, a repository of cell research, which includes data derived from literature curation and microarrays to identify cell types, cell lines, organs and so forth, and especially patterns in gene expression. The curation pipeline is based on freely available tools in all text mining steps, as well as the manual validation of extracted data. Preliminary results are presented for a data set of 2376 full texts from which >4500 gene expression events in cell or anatomical part have been extracted. Validation of half of this data resulted in a precision of ~50% of the extracted data, which indicates that we are on the right track with our pipeline for the proposed task. However, evaluation of the methods shows that there is still room for improvement in the named-entity recognition and that a larger and more robust corpus is needed to achieve a better performance for event extraction. Database URL: http://www.cellfinder.org

    CellFinder: a cell data repository

    CellFinder (http://www.cellfinder.org) is a comprehensive one-stop resource for molecular data characterizing mammalian cells in different tissues and in different development stages. It is built from carefully selected data sets stemming from other curated databases and the biomedical literature. To date, CellFinder describes 3394 cell types and 50 951 cell lines. The database currently contains 3055 microscopic and anatomical images, 205 whole-genome expression profiles of 194 cell/tissue types from RNA-seq and microarrays and 553 905 protein expressions for 535 cells/tissues. Text mining of a corpus of >2000 publications followed by manual curation confirmed expression information on ~900 proteins and genes. CellFinder's data model is capable of seamlessly representing entities from single cells to the organ level, incorporating mappings between homologous entities in different species and describing processes of cell development and differentiation. Its ontological backbone currently consists of 204 741 ontology terms incorporated from 10 different ontologies unified under the novel CELDA ontology. CellFinder's web portal allows searching, browsing and comparing the stored data, interactive construction of developmental trees and navigating the partonomic hierarchy of cells and tissues through a unique body browser designed for life scientists and clinicians.

    Validity constraints for data analysis workflows

    © 2024. Porting a scientific data analysis workflow (DAW) to a cluster infrastructure, a new software stack, or even only a new dataset with some notably different properties is often challenging. Despite the structured definition of the steps (tasks) and their interdependencies during a complex data analysis in the DAW specification, relevant assumptions may remain unspecified and implicit. Such hidden assumptions often lead to crashing tasks without a reasonable error message, poor performance in general, non-terminating executions, or silent wrong results of the DAW, to name only a few possible consequences. Searching for the causes of such errors and drawbacks in a distributed compute cluster managed by a complex infrastructure stack, where DAWs for large datasets typically are executed, can be tedious and time-consuming. We propose validity constraints (VCs) as a new concept for DAW languages to alleviate this situation. A VC is a constraint specifying logical conditions that must be fulfilled at certain times for DAW executions to be valid. When defined together with a DAW, VCs help to improve the portability, adaptability, and reusability of DAWs by making implicit assumptions explicit. Once specified, VCs can be controlled automatically by the DAW infrastructure, and violations can lead to meaningful error messages and graceful behavior (e.g., termination or invocation of repair mechanisms). We provide a broad list of possible VCs, classify them along multiple dimensions, and compare them to similar concepts one can find in related fields. We also provide a proof-of-concept implementation for the workflow system Nextflow.

    Computer-assisted curation of a human regulatory core network from the biological literature

    Motivation: A highly interlinked network of transcription factors (TFs) orchestrates the context-dependent expression of human genes. ChIP-chip experiments that interrogate the binding of particular TFs to genomic regions are used to reconstruct gene regulatory networks at genome-scale, but are plagued by high false-positive rates. Meanwhile, a large body of knowledge on high-quality regulatory interactions remains largely unexplored, as it is available only in natural language descriptions scattered over millions of scientific publications. Such data are hard to extract, and regulatory databases together currently contain only 503 regulatory relations between human TFs. Results: We developed a text-mining-assisted workflow to systematically extract knowledge about regulatory interactions between human TFs from the biological literature. We applied this workflow to the entire Medline, which helped us to identify more than 45 000 sentences potentially describing such relationships. We ranked these sentences by a machine-learning approach. The top-2500 sentences contained ~900 sentences that encompass relations already known in databases. By manually curating the remaining 1625 top-ranking sentences, we obtained more than 300 validated regulatory relationships that were not present in a regulatory database before. Full-text curation allowed us to obtain detailed information on the strength of experimental evidence supporting a relationship. Conclusions: We were able to increase curated information about the human core transcriptional network by >60% compared with the current content of regulatory databases. We observed improved performance when using the network for disease gene prioritization compared with the state-of-the-art. Availability and implementation: Web-service is freely accessible at http://fastforward.sys-bio.net/.

    Adult enteric nervous system in health is maintained by a dynamic balance between neuronal apoptosis and neurogenesis

    According to current dogma, there is little or no ongoing neurogenesis in the fully developed adult enteric nervous system. This lack of neurogenesis leaves unanswered the question of how enteric neuronal populations are maintained in adult guts, given previous reports of ongoing neuronal death. Here, we confirm that despite ongoing neuronal cell loss because of apoptosis in the myenteric ganglia of the adult small intestine, total myenteric neuronal numbers remain constant. This observed neuronal homeostasis is maintained by new neurons formed in vivo from dividing precursor cells that are located within myenteric ganglia and express both Nestin and p75NTR, but not the pan-glial marker Sox10. Mutation of the phosphatase and tensin homolog gene in this pool of adult precursors leads to an increase in enteric neuronal number, resulting in ganglioneuromatosis, modeling the corresponding disorder in humans. Taken together, our results show significant turnover and neurogenesis of adult enteric neurons and provide a paradigm for understanding the enteric nervous system in health and disease.

    Variational Methods for Biomolecular Modeling

    Structure, function and dynamics of many biomolecular systems can be characterized by the energetic variational principle and the corresponding systems of partial differential equations (PDEs). This principle allows us to focus on the identification of essential energetic components, the optimal parametrization of energies, and the efficient computational implementation of energy variation or minimization. Because complex biomolecular systems are structurally non-uniform and their interactions occur through contact interfaces, their free energies are associated with various interfaces as well, such as the solute-solvent interface, molecular binding interface, lipid domain interface, and membrane surfaces. This fact motivates the inclusion of interface geometry, particularly its curvatures, in the parametrization of free energies. Applications of such interface-geometry-based energetic variational principles are illustrated through three concrete topics: the multiscale modeling of biomolecular electrostatics and solvation that includes the curvature energy of the molecular surface, the formation of microdomains on lipid membranes due to the geometric and molecular mechanics at the lipid interface, and the mean-curvature-driven protein localization on membrane surfaces. By further implicitly representing the interface using a phase field function over the entire domain, one can simulate the dynamics of the interface and the corresponding energy variation by evolving the phase field function, achieving significant reduction of the number of degrees of freedom and computational complexity. Strategies for improving the efficiency of computational implementations and for extending applications to coarse-graining or multiscale molecular simulations are outlined. Comment: 36 pages

    Characterizing the gamma-ray long-term variability of PKS 2155-304 with H.E.S.S. and Fermi-LAT

    Studying the temporal variability of BL Lac objects at the highest energies provides unique insights into the extreme physical processes occurring in relativistic jets and in the vicinity of super-massive black holes. To this end, the long-term variability of the BL Lac object PKS 2155-304 is analyzed in the high (HE, 100 MeV < E < 300 GeV) and very high energy (VHE, E > 200 GeV) gamma-ray domain. Over the course of ~9 yr of H.E.S.S. observations the VHE light curve in the quiescent state is consistent with a log-normal behavior. The VHE variability in this state is well described by flicker noise (power-spectral-density index β_VHE = 1.10 +0.10/−0.13) on time scales larger than one day. An analysis of 5.5 yr of HE Fermi-LAT data gives consistent results (β_HE = 1.20 +0.21/−0.23, on time scales larger than 10 days) compatible with the VHE findings. The HE and VHE power spectral densities show a scale invariance across the probed time ranges. A direct linear correlation between the VHE and HE fluxes could neither be excluded nor firmly established. These long-term-variability properties are discussed and compared to the red noise behavior (β ~ 2) seen on shorter time scales during VHE-flaring states. The difference in power spectral noise behavior at VHE energies during quiescent and flaring states provides evidence that these states are influenced by different physical processes, while the compatibility of the HE and VHE long-term results is suggestive of a common physical link as it might be introduced by an underlying jet-disk connection. Comment: 11 pages, 16 figures

    Detection of variable VHE gamma-ray emission from the extra-galactic gamma-ray binary LMC P3

    Context. Recently, the high-energy (HE, 0.1-100 GeV) γ-ray emission from the object LMC P3 in the Large Magellanic Cloud (LMC) has been discovered to be modulated with a 10.3-day period, making it the first extra-galactic γ-ray binary. Aims. This work aims at the detection of very-high-energy (VHE, >100 GeV) γ-ray emission and the search for modulation of the VHE signal with the orbital period of the binary system. Methods. LMC P3 has been observed with the High Energy Stereoscopic System (H.E.S.S.); the acceptance-corrected exposure time is 100 h. The data set has been folded with the known orbital period of the system in order to test for variability of the emission. Energy spectra are obtained for the orbit-averaged data set, and for the orbital phase bin around the VHE maximum. Results. VHE γ-ray emission is detected with a statistical significance of 6.4σ. The data clearly show variability which is phase-locked to the orbital period of the system. Periodicity cannot be deduced from the H.E.S.S. data set alone. The orbit-averaged luminosity in the 1-10 TeV energy range is (1.4 ± 0.2) × 10^35 erg/s. A luminosity of (5 ± 1) × 10^35 erg/s is reached during 20% of the orbit. HE and VHE γ-ray emissions are anti-correlated. LMC P3 is the most luminous γ-ray binary known so far. Comment: 5 pages, 3 figures, 1 table, accepted for publication in A&

    Mining phenotypes for gene function prediction

    Background: Health and disease of organisms are reflected in their phenotypes. Often, a genetic component to a disease is discovered only after clearly defining its phenotype. In the past years, many technologies to systematically generate phenotypes in a high-throughput manner, such as RNA interference or gene knock-out, have been developed and used to decipher functions for genes. However, there have been relatively few efforts to make use of phenotype data beyond the single genotype-phenotype relationships. Results: We present results on a study where we use a large set of phenotype data – in textual form – to predict gene annotation. To this end, we use text clustering to group genes based on their phenotype descriptions. We show that these clusters correlate well with several indicators for biological coherence in gene groups, such as functional annotations from the Gene Ontology (GO) and protein-protein interactions. We exploit these clusters for predicting gene function by carrying over annotations from well-annotated genes to other, less-characterized genes in the same cluster. For a subset of groups selected by applying objective criteria, we can predict GO-term annotations from the biological process sub-ontology with up to 72.6% precision and 16.7% recall, as evaluated by cross-validation. We manually verified some of these clusters and found them to exhibit high biological coherence, e.g. a group containing all available antennal Drosophila odorant receptors despite inconsistent GO-annotations. Conclusion: The intrinsic nature of phenotypes to visibly reflect genetic activity underlines their usefulness in inferring new gene functions. Thus, systematically analyzing these data on a large scale offers many possibilities for inferring functional annotation of genes. We show that text clustering can play an important role in this process.