Improving average ranking precision in user searches for biomedical research datasets
Availability of research datasets is a keystone for health and life science
study reproducibility and scientific progress. Due to the heterogeneity and
complexity of these data, a main challenge to be overcome by research data
management systems is to provide users with the best answers for their search
queries. In the context of the 2016 bioCADDIE Dataset Retrieval Challenge, we
investigate a novel ranking pipeline to improve the search of datasets used in
biomedical experiments. Our system comprises a query expansion model based on
word embeddings, a similarity measure algorithm that takes into consideration
the relevance of the query terms, and a dataset categorisation method that
boosts the rank of datasets matching query constraints. The system was
evaluated using a corpus with 800k datasets and 21 annotated user queries. Our
system provides competitive results when compared to the other challenge
participants. In the official run, it achieved the highest infAP among the
participants, 22.3% higher than the median infAP of the participants'
best submissions. Overall, it ranks in the top 2 when an aggregated metric using
the best official measures per participant is considered. The query expansion
method had a positive impact on the system's performance, improving our
baseline by up to 5.0% and 3.4% for the infAP and infNDCG metrics, respectively.
Our similarity measure algorithm seems to be robust, in particular compared to
the Divergence From Randomness framework, showing smaller performance variations
under different training conditions. Finally, the result categorization did not
have a significant impact on the system's performance. We believe that our
solution could be used to enhance biomedical dataset management systems. In
particular, the use of data-driven query expansion methods could be an
alternative to the complexity of biomedical terminologies.
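The idea of embedding-based query expansion described in this abstract can be illustrated with a minimal sketch. The abstract does not give the actual model, vocabulary, or thresholds, so everything below is an assumption made for illustration: the toy 3-dimensional vectors, the function names `cosine` and `expand_query`, and the parameters `k` and `threshold` are hypothetical and are not the authors' implementation.

```python
import math

# Toy 3-dimensional embeddings, invented for illustration only.
# A real system would load vectors trained on a biomedical corpus.
EMBEDDINGS = {
    "cancer": [0.9, 0.1, 0.0],
    "tumor": [0.85, 0.15, 0.05],
    "carcinoma": [0.8, 0.2, 0.1],
    "mouse": [0.0, 0.9, 0.1],
    "murine": [0.05, 0.85, 0.15],
    "protein": [0.1, 0.1, 0.9],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def expand_query(terms, embeddings=EMBEDDINGS, k=2, threshold=0.9):
    """Append up to k nearest neighbours of each query term.

    Only neighbours whose cosine similarity reaches `threshold` are added;
    both `k` and `threshold` are hypothetical tuning parameters.
    """
    expanded = list(terms)
    for term in terms:
        if term not in embeddings:
            continue  # out-of-vocabulary terms are left unexpanded
        scored = [
            (cosine(embeddings[term], vec), other)
            for other, vec in embeddings.items()
            if other not in expanded
        ]
        scored.sort(reverse=True)  # most similar candidates first
        for score, other in scored[:k]:
            if score >= threshold:
                expanded.append(other)
    return expanded


print(expand_query(["cancer", "mouse"]))
```

With these toy vectors, the query terms "cancer" and "mouse" pick up close neighbours such as "tumor" and "murine", while unrelated terms such as "protein" stay below the threshold and are not added.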
On human gut microbial ecosystem: In vitro experiment, in vivo study and mathematical modelling.
The human gut microbiota is considered to be a highly specialized organ providing nourishment, regulating epithelial cell development, and modulating innate immune responses and colonization resistance, and it significantly impacts human health and disease. Despite being extensively studied for several decades, the functionality of microbiota colonization in the human gastrointestinal tract and the mechanisms of the interactions between the host and bacteria are still poorly understood. This research follows a novel and unique approach, which combines the complementary strengths of in vitro experiment, in vivo study and mathematical modelling. The work undertaken has three emphases: 1) probiotic strains and their impact on human health; 2) the development of gut microbiota in infants; 3) quantification of the human gut microbial ecosystem at both the species level and the system level. In the first part of this research, a versatile anaerobic continuous culture platform was implemented following a novel and unique design, which allows easy and continuous sampling and monitoring of microbial growth. A number of carefully planned in vitro experiments were conducted to investigate the growth and competition of probiotic strains under different culture conditions. These in vitro experiments improve the understanding of the growth behaviour of the specific probiotic strains. The second part of this project analyzed 50 faecal samples collected from 9 healthy infants who received probiotic strains and placebo. The analysis is based on 454-pyrosequencing technology, which revealed the complete profiles of the gut microbiota in these infants and confirmed the modulation effect of the specific probiotic strains. The last part of this research focused on the development of mathematical and computational models of the human gut microbial ecosystem.
The outcome from this part of the research includes: a) a new bacterial growth model that overcomes the paradox of competitive exclusion arising in previous models; b) a versatile computational framework to simulate in vitro fermentation experiments; and c) a comprehensive mathematical model of the human gut and gut microbiota that is the first model of its kind.
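The competitive exclusion behaviour mentioned in this abstract can be illustrated with a small simulation. The abstract does not say which earlier models are meant, nor what the new growth model looks like. As an assumption, the sketch below uses classical Monod kinetics in a chemostat with two species sharing one substrate. All parameter values and the names `monod` and `simulate_chemostat` are hypothetical, and the code is not the model developed in this research.

```python
def monod(s, mu_max, k_s):
    """Monod specific growth rate for substrate concentration s."""
    return mu_max * s / (k_s + s)


def simulate_chemostat(t_end=500.0, dt=0.01):
    """Forward-Euler simulation of two species competing for one substrate.

    All numbers are illustrative assumptions, not measured values.
    """
    dilution = 0.1      # dilution rate (1/h)
    s_in = 10.0         # substrate concentration in the feed
    yield_coeff = 1.0   # biomass formed per unit of substrate consumed
    # species name -> (mu_max, half-saturation constant K_s)
    species = {"A": (0.5, 1.0), "B": (0.4, 2.0)}

    s = s_in
    x = {name: 0.1 for name in species}  # initial biomass of each species
    for _ in range(round(t_end / dt)):
        mu = {n: monod(s, *species[n]) for n in species}
        uptake = sum(mu[n] * x[n] / yield_coeff for n in species)
        ds = dilution * (s_in - s) - uptake
        dx = {n: (mu[n] - dilution) * x[n] for n in species}
        s += dt * ds
        for n in species:
            x[n] += dt * dx[n]
    return s, x


substrate, biomass = simulate_chemostat()
print(substrate, biomass)
```

Under these assumptions, species A can sustain itself at a lower substrate level than species B (break-even concentrations of 0.25 versus about 0.67), so A drives the substrate down until B washes out. Stable coexistence of many species on a shared resource, as observed in the gut, is something this kind of model fails to reproduce.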