202 research outputs found

    Developing a European grid infrastructure for cancer research: vision, architecture and services

    Life sciences are currently at the centre of an information revolution. The nature and amount of information now available open up areas of research that were once in the realm of science fiction. During this information revolution, data-gathering capabilities have greatly surpassed data-analysis techniques. Data integration across heterogeneous data sources and data aggregation across different aspects of the biomedical spectrum are therefore at the centre of current biomedical and pharmaceutical R&D.

    Automated workflow-based exploitation of pathway databases provides new insights into genetic associations of metabolite profiles

    Background: Genome-wide association studies (GWAS) have identified many common single nucleotide polymorphisms (SNPs) that associate with clinical phenotypes, but these SNPs usually explain just a small part of the heritability and have relatively modest effect sizes. In contrast, SNPs that associate with metabolite levels generally explain a higher percentage of the genetic variation and demonstrate larger effect sizes. Still, the discovery of SNPs associated with metabolite levels is challenging, since testing all metabolites measured in typical metabolomics studies against all SNPs incurs a severe multiple-testing penalty. We have developed an automated workflow approach that uses prior knowledge of biochemical pathways, present in databases like KEGG and BioCyc, to generate a smaller SNP set relevant to each metabolite. This paper explores the opportunities and challenges in the analysis of GWAS of metabolomic phenotypes and provides novel insights into the genetic basis of metabolic variation through the re-analysis of published GWAS datasets.
Results: Re-analysis of the published GWAS dataset from Illig et al. (Nature Genetics, 2010) using a pathway-based workflow (http://www.myexperiment.org/packs/319.html) confirmed previously identified hits and identified a new locus of human metabolic individuality, associating aldehyde dehydrogenase 1 family, member L1 (ALDH1L1) with serine/glycine ratios in blood. Replication in an independent GWAS dataset of phospholipids (Demirkan et al., PLoS Genetics, 2012) identified two novel loci supported by additional literature evidence: GPAM (glycerol-3-phosphate acyltransferase) and CBS (cystathionine beta-synthase). In addition, the workflow approach provided novel insight into the affected pathways and the relevance of some of these gene-metabolite pairs in disease development and progression.
Conclusions: We demonstrate the utility of automated exploitation of background knowledge present in pathway databases for the analysis of GWAS datasets of metabolomic phenotypes. We report novel loci and potential biochemical mechanisms that contribute to our understanding of the genetic basis of metabolic variation and its relationship to disease development and progression.
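The pathway-based pre-filtering idea in this abstract can be sketched in a few lines. This is a minimal illustration, not the published workflow: the pathway database, gene-to-SNP mapping, and SNP counts below are invented placeholders. The point is that restricting tests to SNPs in genes from the metabolite's pathways shrinks the Bonferroni multiple-testing penalty.

```python
# Sketch of pathway-based SNP pre-filtering (illustrative data, not from
# the cited studies): collect genes from every pathway that contains the
# metabolite, then keep only SNPs mapped to those genes.

def pathway_snps(metabolite, pathway_db, gene_to_snps):
    """Return SNPs mapped to genes in any pathway containing the metabolite."""
    genes = set()
    for pathway, members in pathway_db.items():
        if metabolite in members["metabolites"]:
            genes.update(members["genes"])
    return {snp for g in genes for snp in gene_to_snps.get(g, [])}

# Toy knowledge base (hypothetical placeholders)
pathway_db = {
    "glycine_serine_metabolism": {
        "metabolites": {"serine", "glycine"},
        "genes": {"ALDH1L1", "SHMT1"},
    },
}
gene_to_snps = {"ALDH1L1": ["rs1", "rs2"], "SHMT1": ["rs3"]}

candidates = pathway_snps("serine", pathway_db, gene_to_snps)
print(sorted(candidates))  # ['rs1', 'rs2', 'rs3']

# Multiple-testing penalty at alpha = 0.05: a genome-wide scan of ~1e6 SNPs
# versus the pathway-restricted candidate set.
genome_wide_threshold = 0.05 / 1_000_000
restricted_threshold = 0.05 / len(candidates)
```

With only three candidate SNPs the significance threshold relaxes from 5e-8 to roughly 0.017, which is why such pre-filtering makes otherwise undetectable associations testable.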

    Digital Scientific Practice in Systems Medicine

    The development of systems medicine has only been possible through the application of Information and Communication Technology (ICT) to handle the large volume and variety of data on health and disease. High-capacity databases and infrastructures based on ICT were established to support the systematization and integration of data about complex physiological and pathological processes in cells and organisms. Although such infrastructures are essential for research and collaboration, they are often not regarded as an integral part of the knowledge production itself. On the contrary, we argue that ICT is not only a science-supporting technology but is deeply engraved in the scientific practices of knowledge generation. Findings supporting our argument are derived from an empirical case study in which we analysed the complex and dynamic relationship between systems medicine and information technology. The case under study was an international research project in which an integrated European ICT infrastructure was designed and developed in support of the systems-oriented research community in oncology. By tracing the specific ways that systems medicine research produces, stores, and manages data in an ICT environment, this paper discusses the impact of the ICT employed and assesses the consequences it may have for the process of knowledge production in systems medicine.

    Integration of Data Mining into Scientific Data Analysis Processes

    In recent years, the use of advanced semi-interactive data analysis algorithms, such as those from the field of data mining, has gained increasing importance in the life sciences in general, and in bioinformatics, genetics, medicine and biodiversity in particular. Today, there is a trend away from collecting and evaluating data only in the context of a specific problem or study, towards extensively collecting data from different sources in repositories where they are potentially useful for subsequent analysis, e.g. in the Gene Expression Omnibus (GEO) repository of high-throughput gene expression data. At the time the data are collected, they are analysed in a specific context which influences the experimental design. However, the type of analyses that the data will be used for after they have been deposited is not known; content and data format are tailored only to the first experiment, not to future re-use. Thus, complex process chains are needed for the analysis of the data, and such process chains need to be supported by the environments that are used to set up analysis solutions. Building specialized software for each individual problem is not a solution, as this effort can only be justified for large projects running for several years. Hence, data mining functionality has been developed into toolkits, which provide data mining functionality in the form of a collection of different components. Depending on the research questions of the users, solutions consist of distinct compositions of these components. Existing solutions for data mining processes comprise different components that represent different steps in the analysis process, and graphical or script-based toolkits exist for combining such components. However, the data mining tools that can serve as components in analysis processes are based on single-computer environments, local data sources and single users.
By contrast, analysis scenarios in medical informatics and bioinformatics have to deal with multi-computer environments, distributed data sources and multiple users who have to cooperate. Users need support for integrating data mining into analysis processes in the context of such scenarios, and this support is lacking today. Typically, analysts working in single-computer environments face the problem of large data volumes, since their tools address neither scalability nor access to distributed data sources. Distributed environments such as grid environments provide scalability and access to distributed data sources, but the integration of existing components into such environments is complex. In addition, new components often cannot be developed directly in distributed environments. Moreover, in scenarios involving multiple computers, multiple distributed data sources and multiple users, the reuse of components, scripts and analysis processes becomes more important, as more steps and more configuration are necessary and thus much greater effort is needed to develop and set up a solution. In this thesis we introduce an approach for supporting interactive and distributed data mining for multiple users, based on infrastructure principles that allow building on data mining components and processes that are already available instead of designing a completely new infrastructure, so that users can keep working with their familiar tools. In order to achieve the integration of data mining into scientific data analysis processes, this thesis proposes a stepwise approach to supporting the user in the development of analysis solutions that include data mining. We see our major contributions as the following. First, we propose an approach to integrate data mining components developed for a single-processor environment into grid environments, supporting users in reusing standard data mining components with little effort.
The approach is based on a metadata schema definition which is used to grid-enable existing data mining components. Second, we describe an approach for interactively developing data mining scripts in grid environments; it efficiently supports users when it is necessary to enhance available components, to develop new data mining components, and to compose these components. Third, building on that, an approach for facilitating the reuse of existing data mining processes based on process patterns is presented. It supports users in scenarios that cover different steps of the data mining process, including several components or scripts. The data mining process patterns support the description of data mining processes at different levels of abstraction, between the CRISP-DM model as the most general and executable workflows as the most concrete representation.
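The metadata-schema idea behind grid-enabling an existing single-machine component can be illustrated with a small sketch. The field names and descriptor format below are assumptions for illustration, not the thesis's actual schema definition; the Weka class name merely stands in for any existing component.

```python
# A minimal sketch of component metadata that a generic grid wrapper could
# read to stage input data, invoke the component, and collect results,
# without modifying the component itself. Field names are illustrative.
from dataclasses import dataclass, field, asdict
import json

@dataclass
class ComponentMetadata:
    name: str                # component identifier on the grid
    entry_point: str         # command or class invoked on the worker node
    input_formats: list      # data formats the component accepts
    output_formats: list     # data formats it produces
    parameters: dict = field(default_factory=dict)  # default configuration

def to_grid_descriptor(meta: ComponentMetadata) -> str:
    """Serialize the metadata to a descriptor the grid middleware can consume."""
    return json.dumps(asdict(meta), indent=2)

# Example: describing an existing clustering component (hypothetical setup)
kmeans = ComponentMetadata(
    name="kmeans-clustering",
    entry_point="weka.clusterers.SimpleKMeans",
    input_formats=["arff"],
    output_formats=["arff"],
    parameters={"k": 3},
)
descriptor = to_grid_descriptor(kmeans)
```

The design choice is that the component's code stays untouched: only a declarative descriptor is added, so the same component runs unchanged on a workstation or behind a grid wrapper.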

    CLINICAL GENOMIC RESEARCH MANAGEMENT

    Technological advancement in genomics has propelled research into a new era in which the methods of conducting experiments have been completely reworked. Riding the wave of information technology and equipped with statistical tools, genomics provides a revolutionized perspective unthought-of in the past. With the completion of the Human Genome Project, we have a common reference for analysis at the level of the complete genome. High-throughput technologies for gene expression, genotyping and sequencing are propelling present research, and attempts are now being made to incorporate these methods into health care in a structured format. Clinicians value the use of genomics for assessing disease predisposition and realizing personalized medical care. As genome sequencing becomes swifter and its cost decreases, public genomic data have increased manyfold, and data from other high-throughput technologies and annotations further increase the storage requirements. Laboratory information management software (LIMS) is now becoming the limiting factor as automation and integration increase. Genomics therefore faces the challenge of managing this enormous amount of data, catering to varied needs that are not limited to research laboratories but extend also to health care institutions and individual clinicians. Further, there is a growing need for the analysis and visualization of the generated data to be integrated into the same platform, for a continuous research experience and systematic supervision. Data security is of prime concern, especially in health care involving human subjects, and the interest of clinicians adds another management requirement: a delivery system for the concerned subject. Hypertension is a complex disorder with worldwide prevalence, and the HYPERGENES project was centred on the objective of integrating biological data and processes with hypertension as the disease model.
The HYPERGENES project focuses on the definition of a comprehensive genetic epidemiological model of complex traits such as essential hypertension (EH) and intermediate phenotypes of hypertension such as target organ damage (TOD). During the HYPERGENES project, the above-mentioned challenges were identified and evaluated, leading to the present work as an endeavour to provide a generalized, integrated solution for the management of genomic and clinical data in clinical genomic research. This PhD thesis describes AD2BioDB, a biological data management platform, and SeqPipe, dynamic pipeline management software, on the path to meeting the challenges posed in the area of clinical genomics. AD2BioDB provides the platform on which data generated using different technologies can be managed and analysed, with reporting and visualization modules for improved understanding of the results among all research collaborators. AD2BioDB is the management software environment in which the in-silico data can be shared and analysed; the analysis software is connected within AD2BioDB through a plug-in system. SeqPipe provides the opportunity to dynamically create pipeline workflows for the multi-step analysis of data, and its interactive graphical user interface allows coding-free pipeline creation and analysis. This tool is especially useful in dynamic NGS analysis, where multiple tools with different versions are in use. SeqPipe can be used as independent software or as a plug-in analysis tool within an application like AD2BioDB. The key features of AD2BioDB can be summarized as:
    • Clinical genomics data management
    • Project management
    • Data security
    • Dynamic creation of graphical representations
    • Distributed workflow analysis
    • Reporting and alert features
    • Dynamic integration of high-throughput technologies
We developed AD2BioDB as a prototype in our laboratory to support the increasing volume of genomic data and the growing complexity of analysis. The software aims at providing a continuous research experience with a versatile platform that supports data management, analysis and public knowledge integration. Through the integration of SeqPipe into AD2BioDB, the management system becomes robust in providing a distributed analysis environment.
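A dynamically composed, version-aware pipeline of the kind described above can be sketched in a few lines. This is an illustrative toy in the spirit of SeqPipe, not its actual implementation; the step names, versions and transformations are hypothetical, and the point is that steps are assembled at run time while each run records which tool versions were used.

```python
# Toy dynamic pipeline: steps are added at run time, and each run returns
# both the result and a provenance log of tool versions, since NGS
# workflows often mix multiple tools with different versions.

class Pipeline:
    def __init__(self, name):
        self.name = name
        self.steps = []  # (tool, version, callable) triples, in order

    def add_step(self, tool, version, func):
        self.steps.append((tool, version, func))
        return self  # return self so pipelines can be built by chaining

    def run(self, data):
        log = []
        for tool, version, func in self.steps:
            data = func(data)              # apply the step to the data
            log.append(f"{tool} v{version}")  # record provenance
        return data, log

# Hypothetical two-step analysis: trim reads, then count them
pipe = (Pipeline("toy-ngs")
        .add_step("trim", "1.0", lambda reads: [r[:5] for r in reads])
        .add_step("count", "2.1", lambda reads: len(reads)))
result, provenance = pipe.run(["ACGTACGT", "TTGGCCAA"])
print(result, provenance)  # 2 ['trim v1.0', 'count v2.1']
```

Swapping a tool version then only means registering a different step, which is the property that makes such pipelines useful when analyses must be re-run reproducibly.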

    Ovarian cancer proteomics


    How have advances in genetic technology modified movement disorder nosology?

    The role of genetics and its technological development has been fundamental in advancing the field of movement disorders, opening the door to precision medicine. Starting from the revolutionary discovery of the locus of the Huntington’s disease gene, we review the milestones of genetic discoveries in movement disorders and their impact on clinical practice and research efforts. Before the 1980s, early techniques did not allow the identification of genetic alterations in complex diseases. Further advances progressively defined a large number of pathogenic genetic alterations and, moreover, enabled epigenomic, transcriptomic and microbiome analyses. In the 2020s, these new technologies are poised to displace phenotype-based classifications in favour of a nosology based on genetic/biological data. Advances in genetic technologies are engineering a reversal of the phenotype-to-genotype order of nosology development, replacing convergent clinicopathological disease models with the genotypic divergence required for future precision medicine applications.
    Authors: Sturchio, A. (University of Cincinnati, United States); Marsili, L. (University of Cincinnati, United States); Mahajan, A. (University of Cincinnati, United States); Grimberg, M.B. (University of Cincinnati, United States); Kauffman, Marcelo Andres (Universidad Austral, Facultad de Ciencias Biomédicas, Instituto de Investigaciones en Medicina Traslacional, CONICET, Argentina); Espay, A.J. (University of Cincinnati, United States)