15 research outputs found

    Statistical methods for gene selection and genetic association studies

    This dissertation comprises five chapters, briefly described as follows. In Chapter One, we propose a signed bipartite genotype and phenotype network (GPN) that links phenotypes and genotypes based on their statistical associations. It provides new insight into the genetic architecture among multiple correlated phenotypes and into where phenotypes might be related at a higher level of cellular and organismal organization. We show that multiple-phenotype association studies that use the proposed network are improved by incorporating genetic information into the phenotype clustering. In Chapter Two, we first apply the proposed GPN to GWAS summary statistics. Then, we assess what contributes to constructing a well-defined GPN with a clear representation of genetic associations by comparing its network properties, including connectivity, centrality, and community structure, with those of a random network. The network topology annotations based on the sparse representations of the GPN can be used to understand disease heritability for highly correlated phenotypes. In applications to phenome-wide association studies, the proposed GPN can identify more significant pairs of genetic variants and phenotype categories. In Chapter Three, we propose a powerful and computationally efficient gene-based association test that aggregates information from different gene-based association tests and incorporates expression quantitative trait locus information. We show that the proposed method controls type I error rates well and has higher power in simulation studies, and that it identifies more significant genes in real data analyses. In Chapter Four, we develop six statistical selection methods based on penalized regression for inferring the target genes of a transcription factor (TF).
In this study, the proposed selection methods combine statistics, machine learning, and convex optimization approaches, and they are highly effective at identifying the true target genes. The methods fill the gap left by the lack of appropriate methods for predicting the target genes of a TF, and they are instrumental for validating experimental results from ChIP-seq and DAP-seq and, conversely, for selecting and annotating TFs based on their target genes. In Chapter Five, we propose a gene selection approach that captures gene-level signals in network-based regression for case-control association studies with DNA sequence data or DNA methylation data. It is inspired by the popular gene-based association tests that use a weighted combination of genetic variants to capture the combined effect of individual genetic variants within a gene. We show that the proposed gene selection approach has higher true positive rates than traditional dimension reduction techniques in simulation studies and selects potentially rheumatoid arthritis-related genes that are missed by existing methods.

    From Genes to Communities: An Integrative Approach to the Evolution of Varanidae

    Why do organisms look the way they do? Why do they live where they do? Why are some groups more diverse than others? These basic questions are often addressed at different scales, each with its own set of methods. For example, the first question could be addressed either by looking at phenotypes across a phylogeny in a comparative framework or by looking at fine-scale variation across the landscape within a species. However, it has been challenging to build a conceptual and methodological bridge linking ecological processes and population dynamics with evolutionary and biogeographic patterns above the species level. In this thesis, I present research spanning a broad range of the continuum between micro- and macroevolution. Appropriately, my study system is monitor lizards (Squamata: Varanidae), the terrestrial vertebrate genus showing the largest disparity in body size. These charismatic reptiles display notable variation in species richness, morphology, and ecology across the three continents and numerous oceanic islands they call home. I gathered large molecular, morphological, and environmental datasets and analysed them using process-based methods linking ecological and population-level processes with speciation and macroevolutionary patterns. I used this integrative approach to identify the drivers of genetic, phenotypic, and lineage diversification in Varanidae at different evolutionary scales. In Chapter I, I show that the diversification dynamics of three endemic varanid radiations in Indo-Australasia have been dictated by a combination of geography and interspecific interactions. In Chapter II, I demonstrate that ontogenetic lability is behind morphological diversification in varanids and their kin, and that ontogenetic shifts in ecology explain some of the ontogenetic variation in the group.
In Chapter III, I use a comprehensive approach to uncover signs of ancient hybridization between the iconic Komodo dragon and a group of Australian varanids, corroborating the Australian origin of the former. In Chapter IV, I evaluate species limits in spiny-tailed monitors and present genomic and phenotypic evidence for local adaptation despite extensive gene flow. Together, these chapters show how the integration of multiple sources of evidence can offer insight into the long-term evolutionary consequences of developmental, ecological, and population-level processes.

    A comparison of the CAR and DAGAR spatial random effects models with an application to diabetics rate estimation in Belgium

    When hierarchically modelling an epidemiological phenomenon on a finite collection of sites in space, one must always take a latent spatial effect into account in order to capture the correlation structure that links the phenomenon to the territory. In this work, we compare two autoregressive spatial models that can be used for this purpose: the classical CAR model and the more recent DAGAR model. Unlike the former, the latter has a desirable property: its ρ parameter can be naturally interpreted as the average neighbor pair correlation and, in addition, this parameter can be directly estimated when the effect is modelled using a DAGAR rather than a CAR structure. As an application, we model the diabetics rate in Belgium in 2014 and show the adequacy of these models in predicting the response variable when no covariates are available.
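
To make the CAR side of this comparison concrete, the following is a minimal sketch of the precision matrix of a proper CAR effect in one common parameterization, Q = τ(D − ρW), where W is the site adjacency matrix and D holds the neighbor counts. The function name `car_precision`, the four-site toy map, and the values of ρ and τ are illustrative assumptions and are not taken from the paper; the DAGAR construction is not shown.

```python
def car_precision(adjacency, rho, tau=1.0):
    """Precision matrix tau * (D - rho * W) of a proper CAR effect.

    adjacency: symmetric 0/1 matrix W (list of lists), zero diagonal.
    rho:       spatial dependence parameter, assumed to satisfy |rho| < 1.
    tau:       overall precision scale (assumed parameter name).
    """
    n = len(adjacency)
    precision = [[0.0] * n for _ in range(n)]
    for i in range(n):
        n_neighbors = sum(adjacency[i])  # diagonal entry of D
        for j in range(n):
            if i == j:
                precision[i][j] = tau * n_neighbors
            else:
                precision[i][j] = -tau * rho * adjacency[i][j]
    return precision


# Hypothetical toy map: four sites on a line, 0 - 1 - 2 - 3.
W = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
Q = car_precision(W, rho=0.9)
```

In this parameterization ρ enters the precision matrix rather than the covariance matrix, which is one way to see why it is not read off directly as a correlation between neighboring sites.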

    A Statistical Approach to the Alignment of fMRI Data

    Multi-subject functional Magnetic Resonance Imaging (fMRI) studies are critical. Anatomical and functional structure varies across subjects, so image alignment is necessary. We define a probabilistic model to describe functional alignment. By imposing a prior distribution, specifically the matrix von Mises-Fisher distribution, on the orthogonal transformation parameter, we embed the anatomical information in the estimation of the parameters, i.e., we penalize the combination of spatially distant voxels. Real applications show an improvement in the classification and interpretability of the results compared to various functional alignment methods.

    Dynamical Modeling Techniques for Biological Time Series Data

    The present thesis is organized around two main topics that share the modeling of the dynamical properties of complex biological systems from large-scale time-series data. On the one hand, this thesis analyzes the inverse problem of reconstructing Gene Regulatory Networks (GRNs) from gene expression data. This first topic seeks to reverse-engineer the transcriptional regulatory mechanisms involved in a few biological systems of interest, which is vital to understanding the specificities of their different responses. In light of recent mathematical developments, a novel, flexible, and interpretable modeling strategy is proposed to reconstruct the dynamical dependencies between genes from short time-series data. In addition, experimental trade-offs and optimal modeling strategies are investigated for given data availability. Consistent literature on these topics was previously surprisingly lacking. The proposed methodology is applied to the study of circadian rhythms, which consist of complex GRNs driving most daily biological activity across many species. On the other hand, this manuscript covers the characterization of dynamically differentiable brain states in zebrafish in the context of epilepsy and epileptogenesis. Zebrafish larvae represent a valuable animal model for the study of epilepsy due to both their genetic and dynamical resemblance to humans. The fundamental premise of this research is the early appearance of subtle functional changes preceding the clinical symptoms of seizures. More generally, this idea, based on bifurcation theory, can be described as a progressive loss of resilience of the brain and, ultimately, its transition from a healthy state to another state characterizing the disease. First, the morphological signatures of seizures generated by distinct pathological mechanisms are investigated. For this purpose, a range of mathematical biomarkers that characterize relevant dynamical aspects of the neurophysiological signals is considered.
Such mathematical markers are later used to address the subtle manifestations of early epileptogenic activity. Finally, the feasibility of a probabilistic prediction model that indicates the susceptibility to seizure emergence over time is investigated. The existence of alternative stable system states and their sudden and dramatic changes has notably been observed in a wide range of complex systems, such as ecosystems, the climate, and financial markets.

    Scaling Multidimensional Inference for Big Structured Data

    In information technology, big data is a collection of data sets so large and complex that it becomes difficult to process using traditional data processing applications [151]. In a world of increasing sensor modalities, cheaper storage, and more data-oriented questions, we are quickly passing the limits of tractable computation with traditional statistical analysis methods. Methods that often show great results on simple data have difficulty processing complicated multidimensional data. Accuracy alone can no longer justify unwarranted memory use and computational complexity. Improving the scaling properties of these methods for multidimensional data is the only way to keep them relevant. In this work we explore methods for improving the scaling properties of parametric and nonparametric models. Namely, we focus on the structure of the data to lower the complexity of a specific family of problems. The two types of structure considered in this work are distributive optimization with separable constraints (Chapters 2-3) and scaling Gaussian processes for multidimensional lattice input (Chapters 4-5). By improving the scaling of these methods, we can expand their use to a wide range of applications that were previously intractable and open the door to new research questions.

    Reverse engineering of gene regulatory networks governing cell-cell communication in the microenvironment of pancreatic cancer

    Background: Pancreatic ductal adenocarcinoma (PDAC) is one of the leading causes of cancer death, with a five-year survival rate of <5% and a median survival of 6 months. Extensive desmoplastic reaction is a characteristic feature and a prognostic factor of PDAC, and it confers the tumor's resistance. Desmoplastic stroma accounts for approx. 90% of tumor volume and consists predominantly of non-malignant fibroblasts (pancreatic stellate cells, PSC). Previous studies have revealed the mesenchymal origins of PSC, their capacity to switch between quiescent and activated states, their proinflammatory features, their expression of soluble factors, and their ability to migrate and phagocytize. State of the art: The abundance of stroma has sparked previous attempts to dissect the interactions between PSC and tumor cells (TC), producing a common picture of a microenvironment supporting PDAC development. Unfortunately, the focus on snapshot-like analysis has proven difficult to translate into therapeutic advances, as it discards the dynamic interactions in the microenvironment as well as the temporal dynamics of gene expression itself. Gene regulatory networks (GRN) adapt to environmental cues by rewiring connections between genes; these induced modulations effectively lead to state transitions, e.g. PSC activation, or produce mutually exclusive cell-fate decisions, e.g. differentiation, senescence, or death. We recognize that the cell-specific assignment of stimuli, the identification of genes forming the GRNs, and the identification of cellular state changes remain undiscovered. We hypothesize that at an early stage, the quiescent → activated PSC transition yields a steady-state PSC gene regulatory network (GRN), but the subsequent succession of impulse responses along the TC→PSC→TC interaction axis drives both cell types into unstable states maintained only for the duration of the direct TC-PSC contact.
Aims: Through the application of a high-throughput complexity reduction approach and in silico modeling, I aim to reconstruct the GRNs underlying the cell-cell communication and identify key soluble factors shaping the double-paracrine interactions. I aim to use the models to gain mechanistic and functional insight into how the cues are integrated and how they affect GRN maintenance. I hope to capture cell-fate decisions and identify key dynamic changes, with the ultimate goal of finding genetic markers to aid the development of novel therapeutic options for this deadly malignancy. Results: We individually stimulated PSC and TC with conditioned supernatant from the respective other cell type and recorded a time series (1-24h) from which genome-wide microarray expression data were generated. In this dissertation I used the time-resolved expression profiles to identify significant gene kinetics through an approach involving gene ranking, filtering, and clustering, followed by gene ontology and pathway analysis. I identified key gene interactions using a genetic algorithm embedded in a continuous time recurrent neural network (CTRNN) modeling scheme. Then I used the derived GRNs to produce a picture of unique intercellular interactions. Through in silico simulations with the created models, and subsequent data analysis and interpretation, I delivered targets for experimental testing on the inter- as well as intra-cellular levels. Experimental validation of the selected gene targets using gene silencing and qRT-PCR confirmed the in silico predicted TC network behavior; validation of the intercellular connections confirmed their dependence on the identified networks.
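
As an illustration of the CTRNN modeling scheme mentioned in this abstract, the sketch below integrates the textbook CTRNN equation τ_i dy_i/dt = −y_i + Σ_j w_ij σ(y_j + b_j) + I_i with a forward-Euler step. The abstract does not give the dissertation's exact equations, so this form is an assumption, and the two-gene network, all parameter values, and the function names `ctrnn_step` and `simulate` are hypothetical. The genetic algorithm that would fit the weights to expression data is not shown.

```python
import math


def sigmoid(x):
    """Logistic activation used in the standard CTRNN formulation."""
    return 1.0 / (1.0 + math.exp(-x))


def ctrnn_step(state, weights, biases, taus, inputs, dt):
    """One forward-Euler step of
    tau_i * dy_i/dt = -y_i + sum_j w_ij * sigmoid(y_j + b_j) + I_i.
    """
    activations = [sigmoid(y + b) for y, b in zip(state, biases)]
    new_state = []
    for i, y in enumerate(state):
        # weights[i][j] is the influence of node j on node i.
        drive = sum(w * a for w, a in zip(weights[i], activations))
        dydt = (-y + drive + inputs[i]) / taus[i]
        new_state.append(y + dt * dydt)
    return new_state


def simulate(state, weights, biases, taus, inputs, dt, n_steps):
    """Return the list of states visited, including the initial state."""
    trajectory = [list(state)]
    for _ in range(n_steps):
        state = ctrnn_step(state, weights, biases, taus, inputs, dt)
        trajectory.append(state)
    return trajectory


# Hypothetical two-gene network: gene 0 activates gene 1,
# gene 1 represses gene 0; gene 0 receives a constant external input.
weights = [[0.0, -2.0],
           [3.0, 0.0]]
trajectory = simulate(
    state=[0.0, 0.0],
    weights=weights,
    biases=[0.0, 0.0],
    taus=[1.0, 2.0],
    inputs=[0.5, 0.0],
    dt=0.1,
    n_steps=240,
)
```

In a fitting setting, a genetic algorithm would propose candidate `weights`, `biases`, and `taus`, simulate each candidate as above, and score it against the measured time series.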

    Intelligent Sensors for Human Motion Analysis

    The book, "Intelligent Sensors for Human Motion Analysis," contains 17 articles published in the Special Issue of the Sensors journal. These articles deal with many aspects of the analysis of human movement. New techniques and methods for pose estimation, gait recognition, and fall detection have been proposed and verified. Some of them will trigger further research, and some may become the backbone of commercial systems.