Explorative search of distributed bio-data to answer complex biomedical questions
Background
The huge amount of biomedical-molecular data increasingly produced is providing scientists with potentially valuable information. Yet, such data quantity makes it difficult to find and extract those data that are most reliable and most related to the biomedical questions to be answered, which are increasingly complex and often involve many different biomedical-molecular aspects. Such questions can be addressed only by comprehensively searching and exploring different types of data, which are frequently ordered and provided by different data sources. Search Computing has been proposed for the management and integration of ranked results from heterogeneous search services. Here, we present its novel application to the explorative search of distributed biomedical-molecular data and the integration of the search results to answer complex biomedical questions.
Results
A set of available bioinformatics search services has been modelled and registered in the Search Computing framework, and a Bioinformatics Search Computing application (Bio-SeCo) using such services has been created and made publicly available at http://www.bioinformatics.deib.polimi.it/bio-seco/seco/. It offers an integrated environment which eases search, exploration and ranking-aware combination of heterogeneous data provided by the available registered services, and supplies global results that can support answering complex multi-topic biomedical questions.
Conclusions
By using Bio-SeCo, scientists can explore the very large and very heterogeneous biomedical-molecular data available. They can easily make different explorative search attempts, inspect the results obtained, select the most appropriate ones, expand or refine them, and move forward and backward in the construction of a global complex biomedical query on multiple distributed sources that could eventually find the most relevant results. Thus, it provides extremely useful automated support for exploratory integrated bio search, which is fundamental for data-driven knowledge discovery in the life sciences.
GenEpi: gene-based epistasis discovery using machine learning.
Background
Genome-wide association studies (GWAS) provide a powerful means to identify associations between genetic variants and phenotypes. However, GWAS techniques for detecting epistasis, the interactions between genetic variants associated with phenotypes, are still limited. We believe that developing an efficient and effective GWAS method to detect epistasis will be key to discovering sophisticated pathogenesis, which is especially important for complex diseases such as Alzheimer's disease (AD).
Results
This study presents GenEpi, a computational package that uncovers epistasis associated with phenotypes using the proposed machine learning approach. GenEpi identifies both within-gene and cross-gene epistasis through a two-stage modeling workflow. In both stages, GenEpi adopts two-element combinatorial encoding when producing features and constructs the prediction models by L1-regularized regression with stability selection. On simulated data, GenEpi outperforms other widely used methods at detecting the ground-truth epistasis. For real data, this study uses AD as an example to show the capability of GenEpi in finding disease-related variants and variant interactions that have both biological meaning and predictive power.
Conclusions
The results on simulated data and AD demonstrate that GenEpi can detect the epistasis associated with phenotypes effectively and efficiently. The released package can be generalized to facilitate studies of many complex diseases in the near future.
Data Integration in the Life Sciences: Scientific Workflows, Provenance, and Ranking
Biological research is a science which derives its findings from the proper analysis of experiments. Today, a large variety of experiments are carried out in hundreds of labs around the world, and their results are reported in a myriad of different databases, websites, publications etc., using different formats, conventions, and schemas. Providing uniform access to these diverse and distributed databases is the aim of data integration solutions, which have been designed and implemented within the bioinformatics community for more than 20 years. However, the perception of the problem of data integration research in the life sciences has changed: while early approaches concentrated on handling schema-dependent queries over heterogeneous and distributed databases, current research emphasizes instances rather than schemas, tries to place the human back into the loop, and intertwines data integration and data analysis. Transparency -- providing users with the illusion that they are using a centralized database and thus completely hiding the original databases -- was one of the main goals of federated databases. It is not a target anymore. Instead, users want to know exactly which data from which source was used in which way in studies (Provenance). The old model of "first integrate, then analyze" is replaced by a new, process-oriented paradigm: "integration is analysis - and analysis is integration". This paradigm change gives rise to some important research trends. First, the process of integration itself, i.e., the integration workflow, is becoming a research topic in its own right. Scientific workflows actually implement the paradigm "integration is analysis". A second trend is the growing importance of sensible ranking, because data sets keep growing and it becomes increasingly difficult for the biologist user to distinguish relevant data from large and noisy data sets. This HDR thesis outlines my contributions to the field of data integration in the life sciences.
More precisely, my work takes place in the first two contexts mentioned above, namely, scientific workflows and biological data ranking. The reported results were obtained from 2005 to late 2014, first as a postdoctoral fellow at the University of Pennsylvania (Dec 2005 to Aug 2007) and then as an Associate Professor at Université Paris-Sud (LRI, UMR CNRS 8623, Bioinformatics team) and Inria (Saclay-Ile-de-France, AMIB team 2009-2014).
A comparison between the contexts learners in Grades 8, 9 and 10 prefer for mathematical literacy and gender
Magister Educationis - MEd
For many years, there have been calls for the mathematics curriculum in South African schools to be made more meaningful and relevant to young people's everyday lives. Despite efforts to address this issue, there is a widespread perception within the mathematics education community that much remains to be seen. Broadly, this study focused on the contexts preferred by Grade 8, 9 and 10 learners as a domain in which to embed mathematics. The particular focus was on whether gender played a role in the preferences expressed by these learners for contexts. South Afric
Étude de la médiane de permutations sous la distance de Kendall-Tau (A study of the median of permutations under the Kendall-Tau distance)
The Kendall-τ distance counts the number of pairwise disagreements between two permutations. The distance between a permutation and a set of permutations is simply the sum of the distances between that permutation and each permutation in the set. Given a set of permutations, we want to find the permutation, called the median, that minimizes this distance to the set.
The problem of finding a median of permutations under the Kendall-τ distance has applications in bioinformatics, political science, telecommunications and optimization.
This seemingly simple problem has been proven difficult to solve. In this master's thesis, we present several approaches to solve the problem, to find a good approximate solution, to separate it into characteristic classes, to better understand its complexity, to reduce the search space and to speed up the computations. Towards the end of the thesis, we also present a generalization of the problem and study it with the same approaches.
Most of the work in this thesis is contained in the three articles that compose it, complemented by two chapters that link them together.
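The definitions in this abstract can be illustrated with a minimal Python sketch. It is not taken from the thesis: the function names are hypothetical, and the exhaustive search over all permutations is only an assumption chosen for clarity, usable on very small inputs. It is not one of the approaches the thesis proposes.

```python
from itertools import combinations, permutations


def kendall_tau_distance(p, q):
    """Count the pairs of elements that p and q place in opposite order."""
    pos_p = {value: index for index, value in enumerate(p)}
    pos_q = {value: index for index, value in enumerate(q)}
    return sum(
        1
        for a, b in combinations(p, 2)
        # A negative product means a and b are ordered differently in p and q.
        if (pos_p[a] - pos_p[b]) * (pos_q[a] - pos_q[b]) < 0
    )


def distance_to_set(p, perms):
    """Sum of the Kendall-tau distances from p to every permutation in perms."""
    return sum(kendall_tau_distance(p, q) for q in perms)


def brute_force_medians(perms):
    """Return (minimum distance, list of all medians) by exhaustive search.

    Illustrative only: this enumerates all n! candidate permutations.
    """
    elements = sorted(perms[0])
    best_distance, medians = None, []
    for candidate in permutations(elements):
        d = distance_to_set(candidate, perms)
        if best_distance is None or d < best_distance:
            best_distance, medians = d, [candidate]
        elif d == best_distance:
            medians.append(candidate)
    return best_distance, medians
```

For example, calling `brute_force_medians([(1, 2, 3), (1, 3, 2), (2, 1, 3)])` evaluates all six permutations of three elements and returns the smallest total distance together with every permutation that attains it. Because a set can have several medians, the sketch returns a list rather than a single permutation.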