406 research outputs found
Improving average ranking precision in user searches for biomedical research datasets
Availability of research datasets is a keystone for health and life science
study reproducibility and scientific progress. Due to the heterogeneity and
complexity of these data, a main challenge to be overcome by research data
management systems is to provide users with the best answers for their search
queries. In the context of the 2016 bioCADDIE Dataset Retrieval Challenge, we
investigate a novel ranking pipeline to improve the search of datasets used in
biomedical experiments. Our system comprises a query expansion model based on
word embeddings, a similarity measure algorithm that takes into consideration
the relevance of the query terms, and a dataset categorisation method that
boosts the rank of datasets matching query constraints. The system was
evaluated using a corpus with 800k datasets and 21 annotated user queries. Our
system provides competitive results when compared to the other challenge
participants. In the official run, it achieved the highest infAP among the
participants, 22.3% higher than the median infAP of the participants'
best submissions. Overall, it ranks in the top 2 when an aggregated metric using
the best official measures per participant is considered. The query expansion
method had a positive impact on the system's performance, improving on our
baseline by up to 5.0% and 3.4% for the infAP and infNDCG metrics, respectively.
Our similarity measure algorithm seems to be robust, in particular compared to
the Divergence From Randomness framework, showing smaller performance variations
under different training conditions. Finally, the result categorization did not
have a significant impact on the system's performance. We believe that our
solution could be used to enhance biomedical dataset management systems. In
particular, data-driven query expansion methods could offer an
alternative to relying on complex biomedical terminologies.
Information retrieval and text mining technologies for chemistry
Efficient access to chemical information contained in scientific literature, patents, technical reports, or the web is a pressing need shared by researchers and patent attorneys from different chemical disciplines. Retrieval of important chemical information in most cases starts with finding relevant documents for a particular chemical compound or family. Targeted retrieval of chemical documents is closely connected to the automatic recognition of chemical entities in the text, which commonly involves the extraction of the entire list of chemicals mentioned in a document, including any associated information. In this Review, we provide a comprehensive and in-depth description of fundamental concepts, technical implementations, and current technologies for meeting these information demands. A strong focus is placed on community challenges addressing systems performance, more particularly CHEMDNER and CHEMDNER patents tasks of BioCreative IV and V, respectively. Considering the growing interest in the construction of automatically annotated chemical knowledge bases that integrate chemical information and biological data, cheminformatics approaches for mapping the extracted chemical names into chemical structures and their subsequent annotation together with text mining applications for linking chemistry with biological information are also presented. Finally, future trends and current challenges are highlighted as a roadmap proposal for research in this emerging field.
A.V. and M.K. acknowledge funding from the European
Community's Horizon 2020 Program (project reference:
654021 - OpenMinted). M.K. additionally acknowledges the
Encomienda MINETAD-CNIO as part of the Plan for the
Advancement of Language Technology. O.R. and J.O. thank
the Foundation for Applied Medical Research (FIMA),
University of Navarra (Pamplona, Spain). This work was
partially funded by Consellería
de Cultura, Educación e Ordenación Universitaria (Xunta de Galicia), and FEDER (European Union), and the Portuguese Foundation for Science and Technology (FCT) under the scope of the strategic
funding of UID/BIO/04469/2013 unit and COMPETE 2020
(POCI-01-0145-FEDER-006684). We thank Iñigo García-Yoldi
for useful feedback and discussions during the preparation of
the manuscript.
Beyond the paywall
In this dissertation I examine the pathways of information exploration and discovery of six scientists working in different research disciplines and affiliated with several academic institutions in the United States and the Czech Republic. To do so, I use multi-sited ethnographic methodological strategies (i.e., strategies developed by anthropologists to compare cultures across two or more geographic locations) to examine the information-related behaviors of these scholars within the global networked academic environment (GNAE), a term which refers specifically to the complex bricolage of network infrastructures, online information resources, and tools scholars use to perform their research today (i.e., the worldwide academic e-IS, or academic infrastructure [Edwards et al. 2013]). The central research question (RQ1) is: According to the multi-sited ethnographic analysis of the scientists participating in this study (individuals conducting research in various disciplines at different institutions in several geographical locations), is there evidence indicating a significant allotment of non-institutional/informal information-related exploration and discovery occurring beyond official library-supported mechanisms in the GNAE? Part two of the central research question (RQ2) is: What (if any) patterns are exhibited, and how do these patterns relate to information science (IS) and other social science theories? Both RQ1 and RQ2 are exploratory. I additionally ask (RQ3): What might all this mean in the applied sense? I address this by showing examples of services piloted during the research process in response to my observations in the field.
Multi-sited ethnographic strategies have not yet been employed in IS, as of the date of publication of this thesis, to examine such questions. Results indicate informal information exploration occurring only with two scientists who use open data and tools on a distributed computing infrastructure.
Artificial Intelligence for Drug Discovery: Are We There Yet?
Drug discovery is adapting to novel technologies such as data science,
informatics, and artificial intelligence (AI) to accelerate effective treatment
development while reducing costs and animal experiments. AI is transforming
drug discovery, as indicated by increasing interest from investors, industrial
and academic scientists, and legislators. Successful drug discovery requires
optimizing properties related to pharmacodynamics, pharmacokinetics, and
clinical outcomes. This review discusses the use of AI in the three pillars of
drug discovery: diseases, targets, and therapeutic modalities, with a focus on
small molecule drugs. AI technologies, such as generative chemistry, machine
learning, and multi-property optimization, have enabled several compounds to
enter clinical trials. The scientific community must carefully vet known
information to address the reproducibility crisis. The full potential of AI in
drug discovery can only be realized with sufficient ground truth and
appropriate human intervention at later pipeline stages.
Comment: 30 pages, 4 figures, 184 references