Exploiting Conceptual Modeling for Searching Genomic Metadata: A Quantitative and Qualitative Empirical Study
Providing a common data model for the metadata of several heterogeneous genomic data sources is hard, as they do not share any standard or agreed practice for metadata description.
Two years ago we managed to discover a subset of common metadata present in most sources and to organize it as a smart genomic conceptual model (GCM); the model has been instrumental to our efforts in the development of a major software pipeline for data integration.
More recently, we developed a user-friendly search interface, based on a simplified version of GCM. In this paper, we report our evaluation of the effectiveness of this new user interface. Specifically, we present the results of a compendious empirical study to answer the research question:
How well is such a simple interface understood by a standard user? The target of this study is a mixed population composed of biologists, bioinformaticians, and computer scientists.
The results of our empirical study show that users successfully produced search queries starting from their natural-language descriptions, with good accuracy and a small error rate.
The study also shows that most users were generally satisfied; it provides indications on how to improve our search system and how to continue our effort in integration of genomic sources.
We are consequently adapting the user interface, which will soon be opened to public use.
Conceptual models and databases for searching the genome
Genomics is an extremely complex domain in terms of its concepts, their relations, and their representations in data. This tutorial introduces the use of ER models in the context of genomic systems: conceptual models are of great help for simplifying this domain and making it actionable. We review successful models presented in the literature for representing biologically relevant entities and grounding them in databases. We distinguish between conceptual models that aim to explain the domain and conceptual models that aim to support database design and heterogeneous data integration. Genomic experiments and/or sequences are described by several metadata, specifying information on the sampled organism, the technology used, and the organizational process behind the experiment. By contrast, we call data the actual regions of the genome that have been read by sequencing technologies and encoded into a machine-readable representation. First, we show how data and metadata can be modeled; then we exploit the proposed models for designing search systems, visualizers, and analysis environments. Both human genomics and viral genomics are addressed, surveying several use cases and applications of broader public interest. The tutorial is relevant to the EDBT community because it demonstrates the usefulness of conceptual modeling principles within very current domains; in addition, it offers a concrete example of the use of conceptual models, setting the premises for interdisciplinary collaboration with a broader audience (possibly including life science researchers).
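As a rough illustration of the data/metadata distinction drawn in this abstract, the sketch below models metadata as attributes of a sample and data as the genomic regions attached to it, then runs a simple attribute-based search. It is only a minimal sketch: the class names `Region` and `Sample`, the `search` helper, and all attribute values are hypothetical and are not taken from the tutorial or from any published model.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Region:
    """Data: one genomic region read by a sequencing technology (hypothetical)."""
    chrom: str
    start: int
    end: int


@dataclass
class Sample:
    """Metadata describing an experiment, plus the regions (data) it produced."""
    organism: str     # sampled organism
    technology: str   # technology used
    project: str      # organizational process behind the experiment
    regions: List[Region] = field(default_factory=list)


def search(samples, **criteria):
    """Return the samples whose metadata match every given attribute exactly."""
    return [
        s for s in samples
        if all(getattr(s, key) == value for key, value in criteria.items())
    ]


# Illustrative values only; none of these come from a real repository.
samples = [
    Sample("Homo sapiens", "ChIP-seq", "project-A", [Region("chr1", 100, 200)]),
    Sample("Homo sapiens", "RNA-seq", "project-B"),
    Sample("SARS-CoV-2", "Illumina", "project-C"),
]

hits = search(samples, organism="Homo sapiens", technology="ChIP-seq")
```

Keeping the regions separate from the descriptive attributes means a search can filter on metadata alone and only touch the (typically much larger) data once the matching samples are known.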
Combined population dynamics and entropy modelling supports patient stratification in chronic myeloid leukemia
Modelling the parameters of multistep carcinogenesis is key for a better understanding of cancer progression, biomarker identification and the design of individualized therapies. Using chronic myeloid leukemia (CML) as a paradigm for hierarchical disease evolution we show that combined population dynamic modelling and CML patient biopsy genomic analysis enables patient stratification at unprecedented resolution. Linking CD34+ similarity as a disease progression marker to patient-derived gene expression entropy separated established CML progression stages and uncovered additional heterogeneity within disease stages. Importantly, our patient data informed model enables quantitative approximation of individual patients’ disease history within chronic phase (CP) and significantly separates “early” from “late” CP. Our findings provide a novel rationale for personalized and genome-informed disease progression risk assessment that is independent and complementary to conventional measures of CML disease burden and prognosis.
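The abstract does not say how gene expression entropy is computed. Purely as an illustration of the general idea, the sketch below assumes the Shannon entropy of an expression profile normalized to sum to one; the function name `expression_entropy` and the toy profiles are hypothetical, and the authors' actual definition may differ.

```python
import math


def expression_entropy(expression):
    """Shannon entropy (in bits) of a non-negative expression profile.

    Assumption: the profile is normalized to a probability distribution
    over genes; this is not necessarily the definition used in the paper.
    """
    total = sum(expression)
    if total <= 0:
        raise ValueError("expression profile must have a positive sum")
    # Zero-expression genes contribute nothing to the entropy.
    probs = [x / total for x in expression if x > 0]
    return -sum(p * math.log2(p) for p in probs)


# Toy profiles over four genes: evenly spread versus fully concentrated.
uniform = expression_entropy([5.0, 5.0, 5.0, 5.0])
peaked = expression_entropy([20.0, 0.0, 0.0, 0.0])
```

Under this assumed definition, a profile spread evenly over many genes has high entropy, while one dominated by a few genes has low entropy.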