8,136 research outputs found

    Improving average ranking precision in user searches for biomedical research datasets

    Full text link
    Availability of research datasets is keystone for health and life science study reproducibility and scientific progress. Due to the heterogeneity and complexity of these data, a main challenge to be overcome by research data management systems is to provide users with the best answers for their search queries. In the context of the 2016 bioCADDIE Dataset Retrieval Challenge, we investigate a novel ranking pipeline to improve the search of datasets used in biomedical experiments. Our system comprises a query expansion model based on word embeddings, a similarity measure algorithm that takes into consideration the relevance of the query terms, and a dataset categorisation method that boosts the rank of datasets matching query constraints. The system was evaluated using a corpus with 800k datasets and 21 annotated user queries. Our system provides competitive results when compared to the other challenge participants. In the official run, it achieved the highest infAP among the participants, being +22.3% higher than the median infAP of the participant's best submissions. Overall, it is ranked at top 2 if an aggregated metric using the best official measures per participant is considered. The query expansion method showed positive impact on the system's performance increasing our baseline up to +5.0% and +3.4% for the infAP and infNDCG metrics, respectively. Our similarity measure algorithm seems to be robust, in particular compared to Divergence From Randomness framework, having smaller performance variations under different training conditions. Finally, the result categorization did not have significant impact on the system's performance. We believe that our solution could be used to enhance biomedical dataset management systems. In particular, the use of data driven query expansion methods could be an alternative to the complexity of biomedical terminologies

    Design and Implementation of the UniProt Website

    Get PDF
    The UniProt consortium is the main provider of protein sequence and annotation data for much of the life sciences community. The "www.uniprot.org":http://www.uniprot.org website is the primary access point to this data and to documentation and basic tools for the data. This paper discusses the design and implementation of the new website, which was released in July 2008, and shows how it improves data access for users with different levels of experience, as well as to machines for programmatic access

    Hydroxymethylglutaryl-CoA reductase inhibition with simvastatin in acute lung injury to reduce pulmonary dysfunction (HARP-2) trial : study protocol for a randomized controlled trial

    Get PDF
    Acute lung injury (ALI) is a common devastating clinical syndrome characterized by life-threatening respiratory failure requiring mechanical ventilation and multiple organ failure. There are in vitro, animal studies and pre-clinical data suggesting that statins may be beneficial in ALI. The Hydroxymethylglutaryl-CoA reductase inhibition with simvastatin in Acute lung injury to Reduce Pulmonary dysfunction (HARP-2) trial is a multicenter, prospective, randomized, allocation concealed, double-blind, placebo-controlled clinical trial which aims to test the hypothesis that treatment with simvastatin will improve clinical outcomes in patients with ALI

    BindingDB in 2015: A public database for medicinal chemistry, computational chemistry and systems pharmacology.

    Get PDF
    BindingDB, www.bindingdb.org, is a publicly accessible database of experimental protein-small molecule interaction data. Its collection of over a million data entries derives primarily from scientific articles and, increasingly, US patents. BindingDB provides many ways to browse and search for data of interest, including an advanced search tool, which can cross searches of multiple query types, including text, chemical structure, protein sequence and numerical affinities. The PDB and PubMed provide links to data in BindingDB, and vice versa; and BindingDB provides links to pathway information, the ZINC catalog of available compounds, and other resources. The BindingDB website offers specialized tools that take advantage of its large data collection, including ones to generate hypotheses for the protein targets bound by a bioactive compound, and for the compounds bound by a new protein of known sequence; and virtual compound screening by maximal chemical similarity, binary kernel discrimination, and support vector machine methods. Specialized data sets are also available, such as binding data for hundreds of congeneric series of ligands, drawn from BindingDB and organized for use in validating drug design methods. BindingDB offers several forms of programmatic access, and comes with extensive background material and documentation. Here, we provide the first update of BindingDB since 2007, focusing on new and unique features and highlighting directions of importance to the field as a whole

    Services that Enable Integration and Cross-Linking Across Different Types of Identifiers and Data Types

    Get PDF
    This report summarises progress for disciplinary cross-linking of identifier systems and the results obtained from the perspective of each THOR project partner organisation, in particular disciplinary data repositories. We describe requirements, results, and challenges informed by implementations in the life sciences, earth and environmental sciences, and high-energy physics

    Uniformly curated signaling pathways reveal tissue-specific cross-talks and support drug target discovery

    Full text link
    Motivation: Signaling pathways control a large variety of cellular processes. However, currently, even within the same database signaling pathways are often curated at different levels of detail. This makes comparative and cross-talk analyses difficult. Results: We present SignaLink, a database containing 8 major signaling pathways from Caenorhabditis elegans, Drosophila melanogaster, and humans. Based on 170 review and approx. 800 research articles, we have compiled pathways with semi-automatic searches and uniform, well-documented curation rules. We found that in humans any two of the 8 pathways can cross-talk. We quantified the possible tissue- and cancer-specific activity of cross-talks and found pathway-specific expression profiles. In addition, we identified 327 proteins relevant for drug target discovery. Conclusions: We provide a novel resource for comparative and cross-talk analyses of signaling pathways. The identified multi-pathway and tissue-specific cross-talks contribute to the understanding of the signaling complexity in health and disease and underscore its importance in network-based drug target selection. Availability: http://SignaLink.orgComment: 9 pages, 4 figures, 2 tables and a supplementary info with 5 Figures and 13 Table
    corecore