
    Enhancing the usability and performance of structured association mapping algorithms using automation, parallelization, and visualization in the GenAMap software system

    Abstract
    Background: Structured association mapping is proving to be a powerful strategy for finding genetic polymorphisms associated with disease. However, these algorithms are often distributed as command-line implementations that require expertise and effort to customize and put into practice. Because these cutting-edge techniques are difficult to use, geneticists often revert to simpler, less powerful methods.
    Results: To make structured association mapping more accessible to geneticists, we have developed an automatic processing system called Auto-SAM. Auto-SAM enables geneticists to run structured association mapping algorithms automatically, using parallelization. Auto-SAM includes algorithms to discover gene networks and find population structure. Auto-SAM can also run popular association mapping algorithms, in addition to five structured association mapping algorithms.
    Conclusions: Auto-SAM is available through GenAMap, a front-end desktop visualization tool. GenAMap and Auto-SAM are implemented in Java; binaries for GenAMap can be downloaded from http://sailing.cs.cmu.edu/genamap.

    Finding genome-transcriptome-phenome association with structured association mapping and visualization in GenAMap.

    Despite the success of genome-wide association studies in detecting novel disease variants, we are still far from a complete understanding of the mechanisms through which variants cause disease. Most previous studies have considered only genome-phenome associations. However, the integration of transcriptome data may help further elucidate the mechanisms through which genetic mutations lead to disease and uncover potential pathways to target for treatment. We present a novel structured association mapping strategy for finding genome-transcriptome-phenome associations when SNP, gene-expression, and phenotype data are available for the same cohort. We do so via a two-step procedure in which genome-transcriptome associations are identified by GFlasso, a sparse regression technique presented previously. Transcriptome-phenome associations are then found by a newly proposed method called gGFlasso, which leverages structure inherent in the genes and phenotypic traits. Because three-way association results are complex, visualization tools can aid in the discovery of causal SNPs and regulatory mechanisms affecting diseases. Using well-grounded visualization techniques, we have designed new visualizations that filter through large three-way association results to detect interesting SNPs and associated genes and traits. The two-step GFlasso-gGFlasso algorithmic approach and new visualizations are integrated into GenAMap, a visual analytics system for structured association mapping. Results on simulated datasets show that our approach has the potential to increase the sensitivity and specificity of association studies, compared to existing procedures that do not exploit the full structural information of the data. We report results from an analysis of a publicly available mouse dataset, showing that identified SNP-gene-trait associations are compatible with known biology.

    Analysing datafied life

    Our lives are increasingly quantified by data. To obtain information from quantitative data, we need to develop various analysis methods, which can be drawn from diverse fields such as computer science, information theory and statistics. This thesis investigates methods for analysing data generated for medical research, with the aim of using various data to quantify patients for personalized treatment. From the perspective of data type, this thesis proposes analysis methods for data from the fields of Bioinformatics and medical imaging. We discuss the need to use data from the molecular level to the pathway level and also to incorporate medical imaging data. Different preprocessing methods should be developed for different data types, while some post-processing steps, such as classification and network analysis, can be handled by a generalized approach across data types. From the perspective of research questions, this thesis studies methods for answering five typical questions, from simple to complex: detecting associations, identifying groups, constructing classifiers, deriving connectivity and building dynamic models. Each research question is studied in a specific field. For example, detecting associations is investigated for fMRI signals. However, the proposed methods can be naturally extended to solve questions in other fields. This thesis demonstrates that applying a method traditionally used in one field to a new field can bring new insights. Five main research contributions, one for each research question, are made in this thesis. First, to detect active brain regions associated with tasks using fMRI signals, a new significance index, the CR-value, is proposed. It originates from the idea of using sparse modelling in gene association studies. Secondly, in quantitative Proteomics analysis, a clustering-based method has been developed to extract more information from large-scale datasets than traditional methods. Clustering methods, which are usually used to find subgroups of samples or features, are used here to match similar identities across samples. Thirdly, a pipeline originally proposed in the field of Bioinformatics has been adapted to multivariate analysis of fMRI signals. Fourthly, the concept of elastic computing in computer science has been used to develop a new method for generating functional connectivity from fMRI data. Finally, sparse signal recovery methods from the domain of signal processing are suggested to solve the underdetermined problem of network model inference.