Improving average ranking precision in user searches for biomedical research datasets
Availability of research datasets is a keystone for health and life science
study reproducibility and scientific progress. Due to the heterogeneity and
complexity of these data, a main challenge for research data management
systems is to provide users with the best answers to their search queries. In
the context of the 2016 bioCADDIE Dataset Retrieval Challenge, we investigate a
novel ranking pipeline to improve the search of datasets used in biomedical
experiments. Our system comprises a query expansion model based on word
embeddings, a similarity measure algorithm that takes into consideration the
relevance of the query terms, and a dataset categorisation method that boosts
the rank of datasets matching query constraints. The system was evaluated using
a corpus of 800k datasets and 21 annotated user queries. Our system provides
competitive results when compared with the other challenge participants. In the
official run, it achieved the highest infAP among the participants, +22.3%
higher than the median infAP of the participants' best submissions. Overall, it
ranks in the top 2 if an aggregated metric using the best official measures per
participant is considered. The query expansion method had a positive impact on
the system's performance, improving our baseline by up to +5.0% and +3.4% for
the infAP and infNDCG metrics, respectively. Our similarity measure algorithm
seems to be robust, in particular compared with the Divergence From Randomness
framework, showing smaller performance variations under different training
conditions. Finally, the result categorisation did not have a significant
impact on the system's performance. We believe that our solution could be used
to enhance biomedical dataset management systems. In particular, data-driven
query expansion methods could be an alternative to the complexity of
biomedical terminologies.
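The abstract does not describe how its embedding-based query expansion is implemented. As a rough, hypothetical illustration of the general idea only (not the authors' model), the sketch below expands each query term with its nearest neighbours by cosine similarity in a word-vector space. The toy vectors in EMBEDDINGS, the function names, and the parameters k and min_sim are all assumptions made up for this example; a real system would use embeddings trained on a biomedical corpus.

```python
import math

# Hypothetical toy word vectors, invented for illustration only.
# A real system would load embeddings trained on biomedical text.
EMBEDDINGS = {
    "cancer": [0.9, 0.1, 0.0],
    "tumor": [0.85, 0.15, 0.05],
    "neoplasm": [0.8, 0.2, 0.1],
    "mouse": [0.0, 0.9, 0.1],
    "rat": [0.05, 0.85, 0.2],
    "gene": [0.1, 0.1, 0.95],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 for a zero vector)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def expand_query(terms, embeddings, k=2, min_sim=0.9):
    """Append up to k nearest-neighbour words per query term.

    Hypothetical parameters: k caps the neighbours added per term, and
    min_sim is the cosine threshold a neighbour must reach to be added.
    Terms without a vector are kept but not expanded.
    """
    expanded = list(terms)
    for term in terms:
        vec = embeddings.get(term)
        if vec is None:
            continue
        # Rank every other vocabulary word not already in the query by similarity.
        neighbours = sorted(
            (
                (cosine(vec, other_vec), other)
                for other, other_vec in embeddings.items()
                if other != term and other not in expanded
            ),
            reverse=True,
        )
        for sim, other in neighbours[:k]:
            if sim >= min_sim:
                expanded.append(other)
    return expanded


if __name__ == "__main__":
    print(expand_query(["cancer"], EMBEDDINGS))  # ['cancer', 'tumor', 'neoplasm']
```

The similarity threshold keeps weakly related words (here "mouse" or "gene" for the query "cancer") out of the expanded query; choosing k and min_sim is a design trade-off between recall and query drift.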
Design and Implementation of the UniProt Website
The UniProt consortium is the main provider of protein sequence and annotation data for much of the life sciences community. The www.uniprot.org website (http://www.uniprot.org) is the primary access point to this data and to documentation and basic tools for the data. This paper discusses the design and implementation of the new website, which was released in July 2008, and shows how it improves data access for users with different levels of experience, as well as for machines via programmatic access.
Hydroxymethylglutaryl-CoA reductase inhibition with simvastatin in acute lung injury to reduce pulmonary dysfunction (HARP-2) trial: study protocol for a randomized controlled trial
Acute lung injury (ALI) is a common, devastating clinical syndrome characterized by life-threatening respiratory failure requiring mechanical ventilation and by multiple organ failure. In vitro, animal and pre-clinical data suggest that statins may be beneficial in ALI. The Hydroxymethylglutaryl-CoA reductase inhibition with simvastatin in Acute lung injury to Reduce Pulmonary dysfunction (HARP-2) trial is a multicenter, prospective, randomized, allocation-concealed, double-blind, placebo-controlled clinical trial which aims to test the hypothesis that treatment with simvastatin will improve clinical outcomes in patients with ALI.
BindingDB in 2015: A public database for medicinal chemistry, computational chemistry and systems pharmacology.
BindingDB, www.bindingdb.org, is a publicly accessible database of experimental protein-small molecule interaction data. Its collection of over a million data entries derives primarily from scientific articles and, increasingly, US patents. BindingDB provides many ways to browse and search for data of interest, including an advanced search tool that can combine searches across multiple query types, including text, chemical structure, protein sequence and numerical affinities. The PDB and PubMed provide links to data in BindingDB, and vice versa; and BindingDB provides links to pathway information, the ZINC catalog of available compounds, and other resources. The BindingDB website offers specialized tools that take advantage of its large data collection, including tools to generate hypotheses for the protein targets bound by a bioactive compound and for the compounds bound by a new protein of known sequence, as well as virtual compound screening by maximal chemical similarity, binary kernel discrimination, and support vector machine methods. Specialized data sets are also available, such as binding data for hundreds of congeneric series of ligands, drawn from BindingDB and organized for use in validating drug design methods. BindingDB offers several forms of programmatic access, and comes with extensive background material and documentation. Here, we provide the first update of BindingDB since 2007, focusing on new and unique features and highlighting directions of importance to the field as a whole.
Services that Enable Integration and Cross-Linking Across Different Types of Identifiers and Data Types
This report summarises progress on disciplinary cross-linking of identifier systems and the results obtained from the perspective of each THOR project partner organisation, in particular disciplinary data repositories. We describe requirements, results, and challenges informed by implementations in the life sciences, earth and environmental sciences, and high-energy physics.
Uniformly curated signaling pathways reveal tissue-specific cross-talks and support drug target discovery
Motivation: Signaling pathways control a large variety of cellular processes.
However, even within the same database, signaling pathways are currently often
curated at different levels of detail. This makes comparative and cross-talk
analyses difficult.
Results: We present SignaLink, a database containing 8 major signaling
pathways from Caenorhabditis elegans, Drosophila melanogaster, and humans.
Based on 170 review and approx. 800 research articles, we have compiled
pathways with semi-automatic searches and uniform, well-documented curation
rules. We found that in humans any two of the 8 pathways can cross-talk. We
quantified the possible tissue- and cancer-specific activity of cross-talks
and found pathway-specific expression profiles. In addition, we identified 327
proteins relevant for drug target discovery.
Conclusions: We provide a novel resource for comparative and cross-talk
analyses of signaling pathways. The identified multi-pathway and
tissue-specific cross-talks contribute to the understanding of signaling
complexity in health and disease and underscore its importance in
network-based drug target selection.
Availability: http://SignaLink.org
Comment: 9 pages, 4 figures, 2 tables and supplementary information with 5
figures and 13 tables.