An Outranking Approach for Gene Prioritization Using Multinetworks
High-throughput experimental techniques such as genome-wide association studies have been instrumental in the identification of disease-associated genes. These methods often produce large lists of disease candidate genes that are time-consuming and expensive to validate experimentally. Computational gene prioritization methods are required to identify relevant genes from a larger pool of candidates. Research has shown that the integration of diverse “omic” evidence can reduce the candidate-gene search space. In this paper we present a general framework that integrates “omic” data using a multinetwork approach and topological analysis to prioritize disease-candidate genes. Specifically, we propose a data integration method within a multicriteria decision analysis context, using aggregation mechanisms based on decision rules that identify positive and negative criteria for judging the ranks of candidate genes. The proposed multinetwork disease gene prioritization method is applied to the prioritization of disease genes in ovarian cancer progression. Using this approach we uncovered the known ovarian cancer genes GSTA1, ERBB2, IL1A and MAGEB2, along with significantly enriched Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways, namely ErbB signaling and pathways in cancer. Relatively high predictive performance (area under the Receiver Operating Characteristic [ROC] curve, 0.704) was observed when classifying early- and late-stage RNA-Seq expression profiles of epithelial ovarian high-grade serous carcinoma from individuals using 10-fold cross-validation.
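The abstract does not spell out its aggregation rule, so the following is only a generic illustration of outranking with positive (benefit) and negative (cost) criteria, in the spirit of ELECTRE-style concordance and discordance tests. The gene names, the criteria (`degree`, `closeness`, `distance_to_seeds`), the weights and the thresholds are all hypothetical, scores are assumed to be pre-scaled to [0, 1], and this is not presented as the paper's method.

```python
from typing import Dict, List

# Hypothetical per-gene scores, assumed already scaled to [0, 1].
SCORES: Dict[str, Dict[str, float]] = {
    "GENE_A": {"degree": 0.9, "closeness": 0.7, "distance_to_seeds": 0.2},
    "GENE_B": {"degree": 0.6, "closeness": 0.8, "distance_to_seeds": 0.5},
    "GENE_C": {"degree": 0.3, "closeness": 0.4, "distance_to_seeds": 0.9},
}
WEIGHTS = {"degree": 0.4, "closeness": 0.3, "distance_to_seeds": 0.3}
POSITIVE = {"degree", "closeness"}   # benefit criteria: higher is better
NEGATIVE = {"distance_to_seeds"}     # cost criteria: lower is better


def at_least_as_good(a: float, b: float, criterion: str) -> bool:
    """True if value a is at least as good as value b on this criterion."""
    return a >= b if criterion in POSITIVE else a <= b


def concordance(x: str, y: str) -> float:
    """Total weight of the criteria on which x is at least as good as y."""
    return sum(w for c, w in WEIGHTS.items()
               if at_least_as_good(SCORES[x][c], SCORES[y][c], c))


def discordance(x: str, y: str) -> float:
    """Largest margin by which y beats x on any single criterion (0 if none)."""
    worst = 0.0
    for c in WEIGHTS:
        diff = SCORES[y][c] - SCORES[x][c]
        if c in NEGATIVE:
            diff = -diff  # on a cost criterion, a lower y value beats x
        worst = max(worst, diff)
    return worst


def outranks(x: str, y: str, c_min: float = 0.6, d_max: float = 0.5) -> bool:
    """x outranks y if enough weight agrees and no criterion vetoes it."""
    return concordance(x, y) >= c_min and discordance(x, y) <= d_max


def net_flow_ranking(genes: List[str]) -> List[str]:
    """Sort genes by (#genes they outrank) - (#genes that outrank them)."""
    def net(g: str) -> int:
        others = [h for h in genes if h != g]
        return sum(outranks(g, h) for h in others) - sum(outranks(h, g) for h in others)
    return sorted(genes, key=net, reverse=True)
```

Here `net_flow_ranking(list(SCORES))` returns `['GENE_A', 'GENE_B', 'GENE_C']`: GENE_A outranks both other genes and is outranked by none. The veto threshold `d_max` is what lets a single strongly unfavorable criterion block an otherwise favorable comparison.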
An integrative workflow to study large-scale biochemical networks
I propose an integrative workflow to study large-scale biochemical networks by combining omics data, network structure and dynamical analysis to unravel disease mechanisms. Using the workflow, I identified core regulatory networks from the E2F1 network underlying EMT in bladder and breast cancer and detected disease signatures and drug targets, which were experimentally validated. Further, I developed a hybrid modeling framework that combines ODE models with logical models to analyze the dynamics of large-scale non-linear systems. This thesis is a contribution to interdisciplinary cancer research.
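To make the idea of coupling a logical model with an ODE model concrete, here is a toy hybrid sketch. It does not reproduce the thesis's framework: the motif (nodes A, B, C with the rule B = A AND NOT C), the threshold `theta`, the rate constants and the forward-Euler step size are all illustrative assumptions. The Boolean node B switches the production term of one continuous variable x, and x feeds back into the Boolean layer through the threshold C = x > theta.

```python
from typing import List


def logical_layer(x: float, a_on: bool, theta: float) -> bool:
    """Discrete layer for a hypothetical motif: node C reads the continuous
    variable through a threshold (C = x > theta), and B = A AND NOT C."""
    c_on = x > theta
    return a_on and not c_on


def simulate(a_on: bool, x0: float = 0.0, theta: float = 1.0, k_prod: float = 1.0,
             k_deg: float = 0.5, dt: float = 0.01, steps: int = 2000) -> List[float]:
    """Continuous layer: dx/dt = k_prod * [B] - k_deg * x, integrated with
    forward Euler; B is re-evaluated by the logical layer at every step."""
    x = x0
    trajectory = [x]
    for _ in range(steps):
        b_on = logical_layer(x, a_on, theta)
        x += dt * ((k_prod if b_on else 0.0) - k_deg * x)
        trajectory.append(x)
    return trajectory
```

With `a_on=True`, the trajectory settles close to `theta` (1.0 here) rather than at the uncoupled steady state `k_prod / k_deg` (2.0), because the logical layer switches production off whenever x crosses the threshold. With `a_on=False`, x stays at 0.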
Topological Data Analysis of High-dimensional Correlation Structures with Applications in Epigenetics
This thesis comprises a comprehensive study of the correlation of high-dimensional datasets from a topological perspective. Motivated by the lack of efficient algorithms for big data analysis and by the importance of finding correlation structure in genomics, we have developed two analytical tools, inspired by the topological data analysis approach, that describe and predict the behavior of the correlated design. These models allowed us to study epigenetic interactions from both a local and a global perspective, taking into account the different levels of complexity. We applied graph-theoretic and algebraic topology principles to quantify structural patterns in local correlation networks and, based on them, proposed a network model that was able to predict the locally high correlations of DNA methylation data. This model provided an efficient tool to measure how the correlation evolves with the aging process. Furthermore, we developed a computational algorithm to analyze the correlation structure globally, which was able to detect differentiated methylation patterns across sample groups. This methodology is intended to serve as a diagnostic tool, as it provides selected epigenetic biomarkers associated with a specific phenotype of interest. Overall, this work establishes a novel perspective on the analysis and modeling of hidden correlation structures, specifically those of high dimension and complexity; it contributes to the understanding of epigenetic processes and is designed to be useful in non-biological fields as well.
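As a minimal sketch of quantifying structure in a local correlation network, the code below thresholds pairwise Pearson correlations into a graph and computes the graph's Betti numbers when it is viewed as a one-dimensional complex: b0 counts connected components and b1 = E − V + b0 counts independent cycles. The toy beta values and the 0.8 threshold are hypothetical, and the thesis's own topological models are more elaborate than this.

```python
import math
from itertools import combinations
from typing import List, Tuple

# Hypothetical methylation beta values: 4 CpG sites (rows) x 5 samples (columns).
DATA = [
    [0.10, 0.20, 0.30, 0.40, 0.50],
    [0.12, 0.21, 0.33, 0.41, 0.52],
    [0.50, 0.40, 0.30, 0.20, 0.10],
    [0.30, 0.10, 0.50, 0.20, 0.40],
]


def pearson(u: List[float], v: List[float]) -> float:
    """Pearson correlation of two non-constant vectors."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = math.sqrt(sum((a - mu) ** 2 for a in u))
    sv = math.sqrt(sum((b - mv) ** 2 for b in v))
    return cov / (su * sv)


def correlation_graph(features: List[List[float]], threshold: float) -> List[Tuple[int, int]]:
    """Edge (i, j) whenever |r_ij| >= threshold, so strong negative correlations count too."""
    return [(i, j) for i, j in combinations(range(len(features)), 2)
            if abs(pearson(features[i], features[j])) >= threshold]


def betti_numbers(n_vertices: int, edges: List[Tuple[int, int]]) -> Tuple[int, int]:
    """(b0, b1) of a graph: components via union-find, cycles = E - V + b0."""
    parent = list(range(n_vertices))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in edges:
        parent[find(i)] = find(j)
    b0 = len({find(i) for i in range(n_vertices)})
    return b0, len(edges) - n_vertices + b0
```

On the toy matrix, `correlation_graph(DATA, 0.8)` returns `[(0, 1), (0, 2), (1, 2)]`: sites 0, 1 and 2 form a triangle of strong positive or negative correlations while site 3 stays isolated, so `betti_numbers(4, ...)` returns `(2, 1)`, two components and one cycle.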
Statistical and integrative system-level analysis of DNA methylation data
Epigenetics plays a key role in cellular development and function. Alterations to the epigenome are thought to capture and mediate the effects of genetic and environmental risk factors on complex disease. Currently, DNA methylation is the only epigenetic mark that can be measured reliably and genome-wide in large numbers of samples. This Review discusses some of the key statistical challenges and algorithms associated with drawing inferences from DNA methylation data, including cell-type heterogeneity, feature selection, reverse causation and system-level analyses that require integration with other data types such as gene expression, genotype, transcription factor binding and other epigenetic information.
Flexible model-based joint probabilistic clustering of binary and continuous inputs and its application to genetic regulation and cancer
Clustering is used widely in “omics” studies and is often tackled with standard methods such as hierarchical clustering or k-means, which are limited to a single data type. These methods are further limited by requiring, respectively, a cut-off at a specific level of the dendrogram (a tree diagram) or a pre-defined number of clusters. The increasing need to integrate multiple data sets calls for clustering methods applicable to mixed data types, where the straightforward application of standard methods is not necessarily the best approach. A particularly common problem involves clustering entities characterized by a mixture of binary data (for example, presence or absence of mutations, binding, motifs and/or epigenetic marks) and continuous data (for example, gene expression, protein abundance and/or metabolite levels).
In this work, we present a generic method, based on a probabilistic model, for clustering this mixture of data types, and illustrate its application to genetic regulation and the clustering of cancer samples. It uses penalized maximum likelihood (ML) estimation of mixture model parameters, with information criteria as the model selection objective function and meta-heuristic searches for optimum clusters. We tested the compatibility of several information criteria with our model-based joint clustering, including the well-known Akaike Information Criterion (AIC) and its empirically determined derivatives (AICλ), the Bayesian Information Criterion (BIC) and its derivative (CAIC), and the Hannan-Quinn Criterion (HQC). Experiments on simulated data showed that AIC and AICλ with λ = 2.5 worked well with our method.
We show that the resulting clusters lead to useful hypotheses: in the case of genetic regulation, they concern the regulation of groups of genes by specific sets of transcription factors, and in the case of cancer samples, they relate combinations of gene mutations to patterns of gene expression. The clusters have potential mechanistic significance and, in the latter case, are significantly linked to survival.
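The information criteria named above differ only in how strongly they penalize the number of free parameters k, given the maximized log-likelihood ln L and the sample size n. The helper below uses the standard textbook forms of AIC, BIC, CAIC and HQC. Two things are assumptions rather than details from the abstract: AICλ is taken to be AIC with its penalty factor 2 replaced by λ, and the parameter count assumes independent Bernoulli features, diagonal Gaussian features and G − 1 free mixing weights. The log-likelihood values in `LOG_LIKS` are made up for illustration.

```python
import math
from typing import Dict


def information_criteria(log_lik: float, k: int, n: int, lam: float = 2.5) -> Dict[str, float]:
    """Penalized model-selection scores; lower is better for every criterion."""
    fit = -2.0 * log_lik
    return {
        "AIC": fit + 2.0 * k,
        "AIC_lambda": fit + lam * k,  # assumption: lambda replaces AIC's factor of 2
        "BIC": fit + k * math.log(n),
        "CAIC": fit + k * (math.log(n) + 1.0),
        "HQC": fit + 2.0 * k * math.log(math.log(n)),
    }


def n_params(g: int, n_binary: int, n_continuous: int) -> int:
    """Free parameters of an assumed G-cluster mixture: G - 1 mixing weights,
    one Bernoulli probability per binary feature, and a mean plus a variance
    per continuous feature, all per cluster."""
    return (g - 1) + g * n_binary + g * 2 * n_continuous


def pick_g(log_liks: Dict[int, float], n: int, n_binary: int, n_continuous: int,
           criterion: str = "AIC") -> int:
    """Return the cluster count whose fitted log-likelihood minimizes the criterion."""
    scores = {g: information_criteria(ll, n_params(g, n_binary, n_continuous), n)[criterion]
              for g, ll in log_liks.items()}
    return min(scores, key=scores.get)


# Hypothetical maximized log-likelihoods for G = 1..4 clusters on n = 200 entities.
LOG_LIKS = {1: -1500.0, 2: -1400.0, 3: -1370.0, 4: -1366.0}
```

With 5 binary and 3 continuous features, `pick_g(LOG_LIKS, 200, 5, 3, "AIC")` selects 3 clusters while `pick_g(LOG_LIKS, 200, 5, 3, "BIC")` selects 2, which shows how the heavier log(n) penalty of BIC can favor fewer clusters than AIC on the same fits.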
Shining a light: Active participation in a mental health Internet support group
Internet Support Groups (ISGs) are a valued and popular source of health information and support among consumers and carers. Although ISGs are premised upon mutual help, it has been observed that only a small minority of users, of the order of 1%, are responsible for the majority of activity. Despite their potential importance to the outcomes and sustainability of online groups, little is known about the characteristics of these participants or the nature of their participation.
This thesis comprises a systematic review of the literature on styles of participation in ISGs, followed by a series of five empirical studies focusing on the nature of participation in a Mental Health Internet Support Group (MHISG). These studies sought to address fundamental gaps in our knowledge regarding active participation in an MHISG, posing the questions: “Who participates?”, “With whom do they communicate?”, “What do they communicate about?” and “How do these factors differ as a function of user engagement?”. These questions were addressed using log data generated by all active users (n=2932) of the MHISG “BlueBoard” and a mixture of qualitative and quantitative methods, including novel analyses such as social network modularity and topic modelling algorithms.
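Social network modularity scores how strongly a partition of users into groups concentrates communication within those groups. The sketch below computes Newman's modularity, Q = Σ_c [L_c / m − (d_c / 2m)²], for an undirected, unweighted reply graph, where m is the number of edges, L_c the edges inside group c and d_c the total degree of its members. The user names, replies and cohort assignment are hypothetical, and the thesis's actual network construction is not reproduced here.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical reply graph: two tightly knit cohorts joined by one cross-cohort reply.
REPLIES = [("a", "b"), ("b", "c"), ("c", "a"),
           ("d", "e"), ("e", "f"), ("f", "d"), ("c", "d")]
COHORT = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}


def modularity(edges: List[Tuple[str, str]], community: Dict[str, int]) -> float:
    """Newman modularity Q = sum_c [L_c / m - (d_c / 2m)^2] for an undirected,
    unweighted graph without self-loops."""
    m = len(edges)
    internal: Dict[int, int] = defaultdict(int)    # L_c: edges with both ends in c
    degree_sum: Dict[int, int] = defaultdict(int)  # d_c: total degree of nodes in c
    for u, v in edges:
        cu, cv = community[u], community[v]
        degree_sum[cu] += 1
        degree_sum[cv] += 1
        if cu == cv:
            internal[cu] += 1
    return sum(internal[c] / m - (degree_sum[c] / (2 * m)) ** 2 for c in degree_sum)
```

For the two-cohort split, Q = 2 × (3/7 − 1/4) = 5/14 ≈ 0.357, whereas placing every user in a single group gives Q = 0; a positive Q indicates more within-group communication than expected by chance given the users' degrees.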
It was found that the demographic characteristics of higher- and lower-engaged users were broadly similar, although the members of the higher-engaged group were older and more likely to identify as consumers. Network analysis demonstrated that users communicated with each other in a pattern that resembled five generational cohorts transcending disorder-specific subforums, in which the highest-engaged users of each cohort were central and registered earlier than the majority of other users. Topic modelling and qualitative content analysis revealed that the content of the two groups’ communications differed: the communications of higher-engaged users appeared to reflect a consumer model of recovery, and those of lower-engaged users a medical model of recovery. However, higher-engaged users modified the content of their responses when communicating with lower-engaged users. Qualitative analysis of users’ initial posts revealed that higher- and lower-engaged users differed in terms of their “awareness” characteristics at the outset of participation, with higher-engaged users demonstrating greater interpersonal-, mental health- and self-awareness.
Based on these findings, this thesis presents “The Tripartite Model of MHISG Participation”, which, contrary to prevailing assumptions, posits that differences in posting frequency are associated with different styles of active participation across the spectrum of engagement. The higher end comprises a minority group of users, referred to as “mutual helpers”, who are central, aware and proactive about participating in peer support for their ongoing recovery. At the lower end, the majority of users, referred to as “active help seekers” and “active help providers”, participate in transient and asymmetrical exchanges, often with “mutual helpers”. Those who do not post are “passive followers and help seekers”. The model is iterated for each cohort. In addition to extending our scientific knowledge base and informing the new model of user participation described above, these findings are of potential relevance to the design of future research studies, to managers of Internet support groups and to policy makers.
Scalable Feature Selection Applications for Genome-Wide Association Studies of Complex Diseases
Personalized medicine will revolutionize our capabilities to combat disease. Working toward this goal, a fundamental task is deciphering the genetic variants that are predictive of complex diseases. Modern studies, in the form of genome-wide association studies (GWAS), have afforded researchers the opportunity to reveal new genotype-phenotype relationships through the extensive scanning of genetic variants. These studies typically contain over half a million genetic features for thousands of individuals. Analyzing such data with methods other than univariate statistics is challenging and requires advanced algorithms that scale to the genome-wide level. In the future, next-generation sequencing (NGS) studies will contain an even larger number of common and rare variants.
Machine learning-based feature selection algorithms have been shown to be able to create effective predictive models for various genotype-phenotype relationships. This work explores the problem of selecting the subsets of genetic variants that are most predictive of complex disease phenotypes through various feature selection methodologies, including filter, wrapper and embedded algorithms. The examined machine learning algorithms were shown to be not only effective at predicting the disease phenotypes but also efficient, through the use of computational shortcuts. While much of the work could be run on high-end desktops, some of it was further extended to run on parallel computers, helping to ensure that the methods will also scale to NGS data sets.
Further, these studies analyzed the relationships between various feature selection methods and demonstrated the need for careful testing when selecting an algorithm. It was shown that there is no universally optimal algorithm for variant selection in GWAS; rather, methodologies need to be selected based on the desired outcome, such as the number of features to be included in the prediction model. It was also demonstrated that without proper model validation, for example using nested cross-validation, the models can yield overly optimistic prediction accuracies and decreased generalization ability. It is through the implementation and application of machine learning methods that one can extract predictive genotype-phenotype relationships and biological insights from genetic data sets.
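The point about overly optimistic accuracies is easiest to see in code: both the feature selection and the choice of how many features to keep must happen inside each outer training fold, never on the rows used for evaluation. The pure-Python sketch below uses a simple filter (absolute difference of class means) and a nearest-centroid classifier as stand-ins. These choices, the fold counts, the candidate feature counts and the toy data generator are assumptions for illustration, not the methods evaluated in the thesis.

```python
import random
from typing import Dict, List, Sequence, Tuple

Matrix = List[List[float]]


def stratified_folds(rows: Sequence[int], y: Sequence[int], k: int, seed: int) -> List[List[int]]:
    """Split row indices into k folds, dealing each class out round-robin."""
    rng = random.Random(seed)
    out: List[List[int]] = [[] for _ in range(k)]
    for c in (0, 1):
        members = [r for r in rows if y[r] == c]
        rng.shuffle(members)
        for i, r in enumerate(members):
            out[i % k].append(r)
    return out


def rank_features(X: Matrix, y: Sequence[int], rows: Sequence[int]) -> List[int]:
    """Filter step: rank features by |mean(class 1) - mean(class 0)| using `rows` only."""
    def score(j: int) -> float:
        a = [X[r][j] for r in rows if y[r] == 1]
        b = [X[r][j] for r in rows if y[r] == 0]
        return abs(sum(a) / len(a) - sum(b) / len(b))
    return sorted(range(len(X[0])), key=score, reverse=True)


def accuracy(X: Matrix, y: Sequence[int], train: Sequence[int],
             test: Sequence[int], feats: Sequence[int]) -> float:
    """Nearest-centroid classifier on the selected features; returns test accuracy."""
    centroids: Dict[int, List[float]] = {}
    for c in (0, 1):
        members = [r for r in train if y[r] == c]
        centroids[c] = [sum(X[r][j] for r in members) / len(members) for j in feats]

    def predict(r: int) -> int:
        dist = {c: sum((X[r][j] - m) ** 2 for j, m in zip(feats, cen))
                for c, cen in centroids.items()}
        return min(dist, key=dist.get)

    return sum(predict(r) == y[r] for r in test) / len(test)


def nested_cv(X: Matrix, y: Sequence[int], candidate_ks: Sequence[int],
              outer: int = 5, inner: int = 3, seed: int = 0) -> float:
    """Outer folds estimate performance; the number of features is tuned in
    inner folds built from the outer training rows only."""
    rows = list(range(len(y)))
    scores = []
    for i, test in enumerate(stratified_folds(rows, y, outer, seed)):
        held_out = set(test)
        train = [r for r in rows if r not in held_out]
        inner_folds = stratified_folds(train, y, inner, seed + i + 1)

        def inner_score(n_keep: int) -> float:
            accs = []
            for val in inner_folds:
                val_set = set(val)
                fit = [r for r in train if r not in val_set]
                feats = rank_features(X, y, fit)[:n_keep]  # never sees val or test rows
                accs.append(accuracy(X, y, fit, val, feats))
            return sum(accs) / len(accs)

        best_k = max(candidate_ks, key=inner_score)
        feats = rank_features(X, y, train)[:best_k]  # re-select on all outer-training rows
        scores.append(accuracy(X, y, train, test, feats))
    return sum(scores) / len(scores)


def make_toy_data(n: int = 40, n_feat: int = 20, shift: float = 3.0,
                  seed: int = 42) -> Tuple[Matrix, List[int]]:
    """Hypothetical data: features 0 and 1 are shifted in class 1, the rest are noise."""
    rng = random.Random(seed)
    y = [i % 2 for i in range(n)]
    X = [[rng.gauss(0.0, 1.0) + (shift if j < 2 and y[i] == 1 else 0.0)
          for j in range(n_feat)] for i in range(n)]
    return X, y
```

For example, `X, y = make_toy_data()` followed by `nested_cv(X, y, [1, 2, 5, 10])` estimates accuracy without the test rows ever influencing which features are kept. Ranking features once on the full data set and only then cross-validating the classifier is the shortcut that produces the optimistic bias described above.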
Genetic Modification of Inherited Retinopathy in Mice
The retina, a critical component of the sensory system, consists of multiple cell types, among which photoreceptors play a key role in receiving, integrating and transmitting light signals. The biological functions of photoreceptors rely on their proper growth and development, which are predominantly governed by a set of molecules that make up the transcriptional regulatory network for photoreceptor development. Any disruption of these molecules can potentially cause retinal pathologies.
It is known that deficiencies of nuclear receptor subfamily 2 group E member 3 (NR2E3) or neural retina leucine-zipper (NRL), two molecules that regulate photoreceptor cell development, cause photoreceptor dysplasia. In a sensitized chemical mutagenesis study to identify genetic modifiers in retinal degeneration 7 (rd7) mice (Nr2e3rd7), a line, Tvrm222, was established in which photoreceptor dysplasia was significantly rescued compared with that in Nr2e3rd7 mutants. Notably, the Tvrm222 allele also ameliorates photoreceptor dysplasia in Nrl knockout mice. Whole-genome mapping and exome sequencing localized the modifier to Chromosome 6 and identified it as a missense variant in the FERM domain containing 4B (Frmd4b) gene, which is predicted to replace serine residue 938 with proline (S938P).
Furthermore, we observed that the Frmd4bTvrm222 allele preserved the integrity of the external limiting membrane (ELM), which is fragmented in both rd7 and Nrl−/− mouse retinas. FRMD4B, a binding partner of cytohesin 3 (CYTH3), has been proposed to participate in cell junction remodeling. However, its function in ELM maintenance and photoreceptor dysplasia had not previously been examined. This study revealed that the S938P variant significantly reduces in vitro membrane recruitment of FRMD4B.
In exploring the molecular mechanisms underlying the modifying effect of FRMD4B938P on dysplastic retinas, we observed increased activation of ADP-ribosylation factor 6, a direct substrate of CYTH3, both in vitro and in vivo, as well as decreased phosphorylation of AKT in Tvrm222 retinas. These changes were accompanied by elevated levels of cell membrane-associated zonula adherens and occludens proteins in Tvrm222 retinas. Taken together, this study establishes a critical role for FRMD4B in maintaining the integrity of adhesive support (at the ELM) and in rescuing photoreceptor dysplasia in mice.
- …