379 research outputs found

    An Outranking Approach for Gene Prioritization Using Multinetworks

    High-throughput experimental techniques such as genome-wide association studies have been instrumental in the identification of disease-associated genes. These methods often produce large lists of disease candidate genes, which are time-consuming and expensive to validate experimentally. Computational gene prioritization methods are required to identify relevant genes from a larger pool of candidates. Research has shown that the integration of diverse “omic” evidence can reduce the candidate-gene search space. In this paper we present a general framework that integrates “omic” data using a multinetwork approach and topological analysis to prioritize disease-candidate genes. Specifically, we propose a data integration method within a multicriteria decision analysis context, using aggregation mechanisms based on decision rules that identify positive and negative criteria for judging the ranks of candidate genes. The proposed multinetwork disease gene prioritization method is applied to the prioritization of disease genes in ovarian cancer progression. Using this approach we uncovered the known ovarian cancer genes GSTA1, ERBB2, IL1A and MAGEB2, along with the significantly enriched Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways “ErbB signaling” and “pathways in cancer”. Relatively high predictive performance (area under the receiver operating characteristic [ROC] curve of 0.704) was observed, using 10-fold cross-validation, when classifying early- and late-stage RNA-Seq expression profiles from individuals with epithelial ovarian high-grade serous carcinoma.

    An integrative workflow to study large-scale biochemical networks

    I propose an integrative workflow to study large-scale biochemical networks by combining omics data, network structure and dynamical analysis to unravel disease mechanisms. Using the workflow, I identified core regulatory networks from the E2F1 network underlying EMT in bladder and breast cancer and detected disease signatures and drug targets, which were experimentally validated. Further, I developed a hybrid modeling framework that combines ODE- with logical-models to analyze the dynamics of large-scale non-linear systems. This thesis is a contribution to interdisciplinary cancer research

    Topological Data Analysis of High-dimensional Correlation Structures with Applications in Epigenetics

    This thesis comprises a comprehensive study of the correlation of high-dimensional datasets from a topological perspective. Prompted by a lack of efficient algorithms for big data analysis and motivated by the importance of finding a structure of correlations in genomics, we have developed two analytical tools inspired by the topological data analysis approach that describe and predict the behavior of the correlated design. These models allowed us to study epigenetic interactions from a local and a global perspective, taking into account the different levels of complexity. We applied graph-theoretic and algebraic topology principles to quantify structural patterns in local correlation networks and, based on them, we proposed a network model that was able to predict the locally high correlations of DNA methylation data. This model provided an efficient tool to measure the evolution of the correlation with the aging process. Furthermore, we developed a powerful computational algorithm to analyze the correlation structure globally that was able to detect differentiated methylation patterns across sample groups. This methodology is intended to serve as a diagnostic tool, as it provides selected epigenetic biomarkers associated with a specific phenotype of interest. Overall, this work establishes a novel perspective on the analysis and modulation of hidden correlation structures, specifically those of great dimension and complexity, contributing to the understanding of epigenetic processes, and it is designed to be useful for non-biological fields too.

    Statistical and integrative system-level analysis of DNA methylation data

    Epigenetics plays a key role in cellular development and function. Alterations to the epigenome are thought to capture and mediate the effects of genetic and environmental risk factors on complex disease. Currently, DNA methylation is the only epigenetic mark that can be measured reliably and genome-wide in large numbers of samples. This Review discusses some of the key statistical challenges and algorithms associated with drawing inferences from DNA methylation data, including cell-type heterogeneity, feature selection, reverse causation and system-level analyses that require integration with other data types such as gene expression, genotype, transcription factor binding and other epigenetic information

    Flexible model-based joint probabilistic clustering of binary and continuous inputs and its application to genetic regulation and cancer

    Clustering is used widely in ‘omics’ studies and is often tackled with standard methods such as hierarchical clustering or k-means, which are limited to a single data type. In addition, these methods are further limited by having to select a cut-off point at a specific level of the dendrogram (a tree diagram) or to pre-define the number of clusters, respectively. The increasing need for integration of multiple data sets leads to a requirement for clustering methods applicable to mixed data types, where the straightforward application of standard methods is not necessarily the best approach. A particularly common problem involves clustering entities characterized by a mixture of binary data (for example, presence or absence of mutations, binding, motifs and/or epigenetic marks) and continuous data (for example, gene expression, protein abundance and/or metabolite levels). In this work, we present a generic method based on a probabilistic model for clustering this mixture of data types, and illustrate its application to genetic regulation and the clustering of cancer samples. It uses penalized maximum likelihood (ML) estimation of mixture model parameters, with information criteria as the model selection objective function, and meta-heuristic searches for optimum clusters. Compatibility of several information criteria with our model-based joint clustering was tested, including the well-known Akaike Information Criterion (AIC) and its empirically determined derivatives (AICλ), the Bayesian Information Criterion (BIC) and its derivative (CAIC), and the Hannan-Quinn Criterion (HQC). We have experimentally shown with simulated data that AIC and AIC (λ=2.5) worked well with our method. We show that the resulting clusters lead to useful hypotheses: in the case of genetic regulation these concern regulation of groups of genes by specific sets of transcription factors, and in the case of cancer samples combinations of gene mutations are related to patterns of gene expression. 
The clusters have potential mechanistic significance and in the latter case are significantly linked to survival
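
As a rough illustration of the information criteria named in the abstract above, the sketch below scores candidate models from their maximized log-likelihoods using the standard textbook definitions of AIC, BIC, CAIC and HQC. The form used for AICλ (a penalty of λ·k in place of 2·k) is an assumption, since the abstract does not define it. The function names and the example numbers are hypothetical and are not taken from the paper.

```python
import math

def information_criteria(log_lik, k, n, lam=2.5):
    """Return penalized-likelihood scores; lower is better.

    log_lik: maximized log-likelihood of the fitted model
    k:       number of free parameters
    n:       number of observations (n > 1 so that log(log(n)) is defined)
    lam:     penalty weight for the AIC-lambda variant (assumed form)
    """
    deviance = -2.0 * log_lik
    return {
        "AIC": deviance + 2.0 * k,
        "AIC_lambda": deviance + lam * k,  # assumption: lambda replaces the factor 2
        "BIC": deviance + k * math.log(n),
        "CAIC": deviance + k * (math.log(n) + 1.0),
        "HQC": deviance + 2.0 * k * math.log(math.log(n)),
    }

def select_model(candidates, criterion="AIC_lambda", lam=2.5):
    """Pick the label of the candidate with the lowest score.

    candidates: iterable of (label, log_lik, k, n) tuples.
    """
    return min(
        candidates,
        key=lambda c: information_criteria(c[1], c[2], c[3], lam)[criterion],
    )[0]

# Hypothetical candidates: mixture models with 2, 3 and 4 clusters.
candidates = [
    ("2 clusters", -120.0, 4, 100),
    ("3 clusters", -110.0, 7, 100),
    ("4 clusters", -109.0, 10, 100),
]
best = select_model(candidates, criterion="AIC_lambda", lam=2.5)
```

A larger λ penalizes extra parameters more heavily, so it tends to favour models with fewer clusters.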

    Shining a light: Active participation in a mental health Internet support group

    Internet Support Groups (ISGs) are a valued and popular source of health information and support among consumers and carers. Although ISGs are premised upon mutual help, it has been observed that only a small minority of users, of the order of 1%, are responsible for the majority of activity. Despite their potential importance to the outcomes and sustainability of online groups, little is known about the characteristics of these participants or the nature of their participation. This thesis comprises a systematic review of the literature on styles of participation in ISGs followed by a series of five empirical studies focusing on the nature of participation in a Mental Health Internet Support Group (MHISG). These studies sought to address fundamental gaps in our knowledge regarding active participation in an MHISG, posing the questions: ‘Who participates?’, ‘With whom do they communicate?’, ‘What do they communicate about?’ and ‘How do these factors differ as a function of user engagement?’. These questions were addressed using log data generated by all active users (n=2932) of the MHISG ‘BlueBoard’ and a mixture of qualitative and quantitative methods including novel analyses, such as social network modularity and topic modelling algorithms. It was found that the demographic characteristics of higher- and lower-engaged users were broadly similar, although the members of the higher-engaged group were older and more likely to identify as consumers. Network analysis demonstrated users communicated with each other in a pattern that resembled five generational cohorts transcending disorder-specific subforums, in which the highest-engaged users of each cohort were central and registered earlier than the majority of other users. Topic modelling and qualitative content analysis revealed the content of the communications of the two groups differed. 
The communications of higher-engaged users appeared to reflect a consumer model of recovery and those of lower-engaged users a medical model of recovery. However, higher-engaged users modified the content of their responses when communicating with lower-engaged users. Qualitative analysis of users’ initial posts revealed that higher- and lower-engaged users differed in terms of their ‘awareness’ characteristics at the outset of participation, with higher-engaged users demonstrating greater interpersonal-, mental health- and self-awareness. Based on these findings, this thesis presents ‘The Tripartite Model of MHISG Participation’ which, contrary to prevailing assumptions, posits that differences in posting frequency are associated with different styles of active participation across the spectrum of engagement. The higher end comprises a minority group of users, referred to as ‘mutual helpers’, who are central, aware and proactive about participating in peer support for their ongoing recovery. At the lower end, the majority of users, referred to as ‘active help seekers’ and ‘active help providers’, participate in transient and asymmetrical exchanges, often with ‘mutual helpers’. Those who do not post are ‘passive followers and help seekers’. The model is iterated for each cohort. In addition to extending our scientific knowledge base and informing the above new model of user participation, these findings are of potential relevance to the design of future research studies and to managers of Internet support groups and policy makers.

    Scalable Feature Selection Applications for Genome-Wide Association Studies of Complex Diseases

    Personalized medicine will revolutionize our capabilities to combat disease. Working toward this goal, a fundamental task is the deciphering of genetic variants that are predictive of complex diseases. Modern studies, in the form of genome-wide association studies (GWAS), have afforded researchers the opportunity to reveal new genotype-phenotype relationships through the extensive scanning of genetic variants. These studies typically contain over half a million genetic features for thousands of individuals. Examining this with methods other than univariate statistics is a challenging task requiring advanced algorithms that are scalable to the genome-wide level. In the future, next-generation sequencing studies (NGS) will contain an even larger number of common and rare variants. Machine learning-based feature selection algorithms have been shown to have the ability to effectively create predictive models for various genotype-phenotype relationships. This work explores the problem of selecting genetic variant subsets that are the most predictive of complex disease phenotypes through various feature selection methodologies, including filter, wrapper and embedded algorithms. The examined machine learning algorithms were demonstrated to not only be effective at predicting the disease phenotypes, but also to do so efficiently through the use of computational shortcuts. While much of the work was able to be run on high-end desktops, some work was further extended so that it could be implemented on parallel computers, helping to assure that the algorithms will also scale to the NGS data sets. Further, these studies analyzed the relationships between various feature selection methods and demonstrated the need for careful testing when selecting an algorithm. 
It was shown that there is no universally optimal algorithm for variant selection in GWAS, but rather methodologies need to be selected based on the desired outcome, such as the number of features to be included in the prediction model. It was also demonstrated that without proper model validation, for example using nested cross-validation, the models can result in overly-optimistic prediction accuracies and decreased generalization ability. It is through the implementation and application of machine learning methods that one can extract predictive genotype–phenotype relationships and biological insights from genetic data sets.
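
The nested cross-validation mentioned in the abstract above can be sketched as an index-splitting scheme: choices such as the number of features are tuned on inner folds drawn only from each outer training set, so the outer test fold stays untouched until the final evaluation. This is a minimal standard-library sketch with hypothetical function names and fold counts. It is not the thesis's implementation.

```python
import random

def k_fold_indices(indices, k, seed=0):
    """Shuffle the indices and split them into k nearly equal folds."""
    shuffled = list(indices)
    random.Random(seed).shuffle(shuffled)
    return [shuffled[i::k] for i in range(k)]

def nested_cv_splits(n_samples, outer_k=5, inner_k=3, seed=0):
    """Yield (outer_train, outer_test, inner_splits) for nested CV.

    inner_splits is a list of (inner_train, inner_val) pairs built only from
    outer_train, so feature or model selection never sees outer_test.
    """
    outer_folds = k_fold_indices(range(n_samples), outer_k, seed)
    for i, outer_test in enumerate(outer_folds):
        outer_train = [idx for j, fold in enumerate(outer_folds)
                       if j != i for idx in fold]
        inner_folds = k_fold_indices(outer_train, inner_k, seed + i + 1)
        inner_splits = []
        for m, inner_val in enumerate(inner_folds):
            inner_train = [idx for j, fold in enumerate(inner_folds)
                           if j != m for idx in fold]
            inner_splits.append((inner_train, inner_val))
        yield outer_train, outer_test, inner_splits

# Usage: tune on the inner splits, refit on outer_train, score on outer_test.
splits = list(nested_cv_splits(n_samples=20, outer_k=5, inner_k=3))
```

Reporting the accuracy obtained on the inner validation folds instead of on the outer test folds is what leads to the overly optimistic estimates the abstract warns about.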

    Genetic Modification of Inherited Retinopathy in Mice

    The retina, as a critical component of the sensory system, consists of multiple cell types, of which photoreceptors play a key role in receiving, integrating and transmitting light signals. The biofunctions of photoreceptors rely on their proper growth and development, which is predominantly governed by a cluster of molecules responsible for the transcriptional regulation of photoreceptor development. Any disruption of these molecules potentially incurs retinal pathologies. It is known that deficiencies of nuclear receptor subfamily 2 group E member 3 (NR2E3) or neural retina leucine-zipper (NRL), two molecules that regulate photoreceptor cell development, cause photoreceptor dysplasia. In a sensitized chemical mutagenesis study to identify genetic modifiers in retinal degeneration (rd) 7 mice (Nr2e3rd7), Tvrm222 was established, in which photoreceptor dysplasia was significantly rescued compared to that in Nr2e3rd7 mutants. Notably, the Tvrm222 allele also ameliorates photoreceptor dysplasia in Nrl knockout mice. According to whole-genome mapping and exome sequencing, the modifier was localized to Chromosome 6 and was identified as a missense variant in the FERM domain containing 4B (Frmd4b) gene, which is predicted to cause the substitution of serine residue 938 with proline (S938P). Furthermore, we observed that the Frmd4bTvrm222 allele preserved the integrity of the external limiting membrane (ELM), which is fragmented in both rd7 and Nrl–/– mouse retinas. FRMD4B, as a binding partner of cytohesin 3 (CYTH3), has been proposed to participate in cell junction remodeling. However, its function in ELM maintenance and photoreceptor dysplasia has not been previously examined. This study revealed that the S938P variation significantly reduces in vitro membrane recruitment of FRMD4B. 
Notably, in an attempt to explore the molecular mechanisms underlying the modifying effect of FRMD4B938P on dysplastic retinas, we observed an increased activation of ADP-ribosylation factor 6, a direct substrate for CYTH3, both in vitro and in vivo, as well as decreased phosphorylation of AKT in Tvrm222 retinas. These changes were accompanied by an elevation in cell membrane-associated zonula adherens and occludens proteins in Tvrm222 retinas. Taken together, this study determines a critical role of FRMD4B in maintaining the integrity of adhesive support (at the ELM) and in rescuing photoreceptor dysplasia in mice