
    CAUSAL ANALYSIS THEORY AND APPLICATION TO ALZHEIMER’S DISEASE (AD) AND HEART FAILURE (HF)

    Alzheimer's disease (AD) and heart failure (HF) are two complex diseases caused by a combination of genetic, epigenetic, environmental and lifestyle factors. Understanding the relationships between genetic and epigenetic variants and the other factors involved in such complex diseases could help researchers discover disease mechanisms and develop targeted therapies. Much of the genetic and epigenetic research on AD and heart disease has focused on association analysis, and many researchers have identified genetic/epigenetic variants and phenotypes that are significantly associated with disease pathology. However, this reveals a limitation of genome-wide association studies (GWAS): the signals identified by association studies explain only a small proportion of the heritability of complex diseases, and a large proportion of risk factors remain undiscovered. In addition, biological systems usually function in a systematic, causal way, so causation analysis is key to uncovering the risk mechanisms of complex diseases. Causation can be used to infer association, but the reverse does not hold in general. Traditionally, the gold standard for causation analysis is intervention in randomized controlled trials (RCTs); however, RCTs are not feasible for genetic/epigenetic data for ethical or technical reasons. The major objective of this research is therefore to propose methods that uncover the causal mechanisms linking genetic/epigenetic factors and phenotypes, such as environmental and lifestyle factors, in complex diseases. First, I proposed a bivariate causal discovery method to uncover pairwise causal relationships between factors. Second, I proposed a network analysis framework to construct the causal network among genetic/epigenetic variants and phenotypic factors. Finally, I applied the bivariate causal discovery method and the causal network construction method to data from the two complex diseases, AD and HF. Simulation and application results are discussed in the following sections.
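    The abstract does not spell out how its bivariate causal discovery method works. To illustrate what pairwise causal discovery can look like, the sketch below implements a different, well-known technique, the slope-based information-geometric causal inference (IGCI) estimator; it is not the author's method. It assumes a (nearly) noise-free, non-linear relationship between two continuous variables, rescales both to [0, 1], and prefers the direction whose mean log slope is smaller. All function names are hypothetical.

```python
import math


def _rescale(values):
    # IGCI compares slopes after mapping both variables to [0, 1]
    # (the "uniform reference measure" variant of the estimator).
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def igci_slope_score(x, y):
    """Estimate C_{X->Y}, the mean log slope of y as a function of x."""
    pairs = sorted(zip(_rescale(x), _rescale(y)))
    total, count = 0.0, 0
    for (x0, y0), (x1, y1) in zip(pairs, pairs[1:]):
        dx, dy = x1 - x0, y1 - y0
        if dx != 0 and dy != 0:  # skip ties, where the log slope is undefined
            total += math.log(abs(dy / dx))
            count += 1
    return total / count


def infer_direction(x, y):
    """Return "X->Y" if the X->Y score is smaller, else "Y->X"."""
    return "X->Y" if igci_slope_score(x, y) < igci_slope_score(y, x) else "Y->X"
```

    For example, with `x` evenly spaced on (0, 1] and `y = x**3`, `infer_direction(x, y)` returns `"X->Y"`. For a purely linear relationship both scores are close to zero, so this estimator cannot decide the direction.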

    Evolutionary NAS with Gene Expression Programming of Cellular Encoding

    The renaissance of neural architecture search (NAS) has seen classical methods such as genetic algorithms (GA) and genetic programming (GP) exploited to design convolutional neural network (CNN) architectures. Although recent work has achieved promising performance on visual perception tasks, the direct encoding schemes of both GA and GP suffer from a functional complexity deficiency and do not scale well to large architectures such as CNNs. To address this, we present a new generative encoding scheme, symbolic linear generative encoding (SLGE): a simple yet powerful scheme that embeds local graph transformations in chromosomes of linear, fixed-length strings to develop CNN architectures of varying shapes and sizes through the evolutionary process of gene expression programming. In experiments, SLGE discovers architectures that improve on the performance of state-of-the-art handcrafted CNN architectures on the CIFAR-10 and CIFAR-100 image classification tasks, and it achieves a classification error rate competitive with existing NAS methods while using fewer GPU resources. Comment: Accepted at IEEE SSCI 2020 (7 pages, 3 figures)
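    The abstract does not define SLGE's symbol set, but the general gene expression programming mechanism it builds on, reading a linear fixed-length chromosome into a tree breadth-first (Karva notation), can be sketched. Here, as an assumption, arithmetic symbols stand in for SLGE's graph-transformation symbols; the `ARITY` table and all function names are hypothetical.

```python
from collections import deque

# Hypothetical symbol set: arity of each function symbol; any other symbol is a terminal.
ARITY = {"+": 2, "*": 2, "-": 2}


def _freeze(node):
    # Convert a [symbol, children] list node into nested tuples (terminals become strings).
    symbol, children = node
    return (symbol, tuple(_freeze(c) for c in children)) if children else symbol


def decode_kexpression(chromosome):
    """Decode a GEP chromosome (Karva notation) into a tree, breadth-first.

    Returns (tree, coding_length). Symbols after the coding region are ignored.
    GEP's head/tail length rule guarantees the string never runs out of terminals.
    """
    root = [chromosome[0], []]
    queue = deque([root])
    pos = 1
    while queue:
        node = queue.popleft()
        for _ in range(ARITY.get(node[0], 0)):  # attach as many children as the arity
            child = [chromosome[pos], []]
            pos += 1
            node[1].append(child)
            queue.append(child)
    return _freeze(root), pos


def evaluate(tree, env):
    """Evaluate a decoded arithmetic tree with terminal values taken from env."""
    if isinstance(tree, str):
        return env[tree]
    op, (left, right) = tree
    a, b = evaluate(left, env), evaluate(right, env)
    return {"+": a + b, "*": a * b, "-": a - b}[op]
```

    `decode_kexpression("+*-abcdab")` yields the tree for `(a * b) + (c - d)` with a coding length of 7; the trailing `ab` is non-coding. This is how chromosomes of one fixed length can still express trees of different shapes and sizes.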

    An expectation-maximization algorithm for probabilistic reconstructions of full-length isoforms from splice graphs.

    Reconstructing full-length transcript isoforms from sequence fragments (such as ESTs) is a major interest and challenge for the bioinformatic analysis of pre-mRNA alternative splicing. This problem has been formulated as finding traversals across the splice graph, a directed acyclic graph (DAG) representation of gene structure and alternative splicing. In this manuscript, we introduce a probabilistic formulation of the isoform reconstruction problem and provide an expectation-maximization (EM) algorithm for its maximum likelihood solution. Using a series of simulated data and expressed sequences from real human genes, we demonstrate that our EM algorithm correctly handles various situations of fragmentation and coupling in the input data. Our work establishes a general probabilistic framework for splice graph-based reconstruction of full-length isoforms.
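    A minimal sketch of the E and M steps, under a simplifying assumption not taken from the paper: each fragment is reduced to the set of candidate isoforms (paths through the splice graph) it is compatible with, and fragments are drawn from isoforms in proportion to unknown abundances θ, ignoring isoform length and fragment position. The function name is hypothetical.

```python
def em_isoform_abundance(compat, n_isoforms, max_iter=500, tol=1e-10):
    """Maximum-likelihood isoform proportions by expectation-maximization.

    compat[j] is the set of isoform indices that fragment j is compatible with;
    every fragment must be compatible with at least one isoform.
    """
    theta = [1.0 / n_isoforms] * n_isoforms  # start from uniform proportions
    for _ in range(max_iter):
        # E-step: split each fragment across its compatible isoforms
        # in proportion to the current abundance estimates.
        expected = [0.0] * n_isoforms
        for isoforms in compat:
            z = sum(theta[k] for k in isoforms)
            for k in isoforms:
                expected[k] += theta[k] / z
        # M-step: the new proportions are each isoform's expected share of fragments.
        new_theta = [e / len(compat) for e in expected]
        delta = max(abs(a - b) for a, b in zip(new_theta, theta))
        theta = new_theta
        if delta < tol:
            break
    return theta
```

    With three fragments unique to isoform 0, one unique to isoform 1 and four shared, the iteration converges to θ ≈ (0.75, 0.25): the shared fragments end up split in proportion to the evidence from the unique ones.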

    WebGestalt: an integrated system for exploring gene sets in various biological contexts

    High-throughput technologies have led to the rapid generation of large-scale datasets about genes and gene products. These technologies have also shifted our research focus from ‘single genes’ to ‘gene sets’. We have developed a web-based integrated data mining system, WebGestalt, to help biologists explore large sets of genes. WebGestalt is composed of four modules: gene set management, information retrieval, organization/visualization, and statistics. The management module uploads, saves, retrieves and deletes gene sets, and performs Boolean operations to generate the unions, intersections or differences of gene sets. The information retrieval module currently retrieves up to 20 attributes for every gene in a gene set. The organization/visualization module organizes and visualizes gene sets in various biological contexts, including Gene Ontology, tissue expression patterns, chromosome distribution, metabolic and signaling pathways, protein domain information and publications. The statistics module recommends and performs statistical tests to suggest biological areas that are important to a gene set and warrant further investigation. To demonstrate the use of WebGestalt, we have generated 48 gene sets of genes over-represented in various human tissue types. Exploration of all 48 gene sets using WebGestalt is publicly available at
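    The abstract does not name the tests the statistics module recommends. One common choice for this kind of over-representation analysis is the one-sided hypergeometric test, sketched below as an assumption (the function name is hypothetical). The management module's Boolean operations correspond directly to Python's set operators `|`, `&` and `-`.

```python
from math import comb


def hypergeom_enrichment_p(k, n, K, N):
    """One-sided over-representation p-value, P(X >= k).

    X ~ Hypergeometric: N genes in the reference set, K of them annotated to a
    category (e.g. a pathway), n genes in the user's gene set, k of which are annotated.
    """
    upper = min(n, K)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)
```

    In practice, the observed overlap would be computed with a set intersection, e.g. `k = len(my_genes & pathway_genes)` for two hypothetical Python sets of gene identifiers. When many categories are tested at once, the resulting p-values would also need a multiple-testing correction.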

    Integrated Development and Parallelization of Automated Dicentric Chromosome Identification Software to Expedite Biodosimetry Analysis

    Manual cytogenetic biodosimetry cannot keep up with the demands of mass casualty events. We present automated dicentric chromosome identification (ADCI) software that uses parallel computing technology. A parallelization strategy combining data and task parallelism, together with optimized I/O operations, has been designed, implemented and incorporated into ADCI. Experiments on an eight-core desktop show that our algorithm speeds up ADCI by at least a factor of four. Experiments on the Symmetric Computing, SHARCNET and Blue Gene/Q multiprocessor systems demonstrate that parallelized ADCI can process thousands of samples for cytogenetic biodosimetry in a few hours. This speedup underscores the effectiveness of parallelization in accelerating ADCI. Our software will be an important tool for handling mass casualty ionizing radiation events by expediting accurate detection of dicentric chromosomes.
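    The data-parallel half of the strategy described above, distributing independent samples to workers, can be sketched with Python's `concurrent.futures`. `dicentric_yield` is a hypothetical placeholder for the per-sample ADCI work (here it only averages per-cell dicentric counts); the real pipeline's image analysis, task parallelism and I/O optimizations are not shown.

```python
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def dicentric_yield(cell_counts):
    """Hypothetical per-sample work: dicentrics per scored cell."""
    return sum(cell_counts) / len(cell_counts)


def process_samples(samples, executor_cls=ProcessPoolExecutor, max_workers=8):
    """Score independent samples in parallel, preserving input order."""
    with executor_cls(max_workers=max_workers) as pool:
        # chunksize batches samples per task to cut inter-process overhead;
        # thread pools accept the argument but ignore it.
        return list(pool.map(dicentric_yield, samples, chunksize=4))
```

    The executor class is a parameter so the same code can be exercised with threads in a quick test; for CPU-bound image analysis, processes sidestep CPython's global interpreter lock. With processes, call `process_samples` from under an `if __name__ == "__main__":` guard on platforms that spawn workers.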

    Evolutionary approaches for the reverse-engineering of gene regulatory networks: A study on a biologically realistic dataset

    Background: Inferring gene regulatory networks from data requires the development of algorithms devoted to structure extraction. When only static data are available, gene interactions may be modelled by a Bayesian Network (BN) that represents the presence of direct interactions from regulators to regulees by conditional probability distributions. We used enhanced evolutionary algorithms to stochastically evolve a set of candidate BN structures and find the model that best fits the data without prior knowledge.
    Results: We proposed various evolutionary strategies suited to the task and tested our choices using simulated data drawn from a given bio-realistic network of 35 nodes, the so-called insulin network, which has been used in the literature for benchmarking. We assessed the inferred models against this reference to obtain statistical performance results. We then compared the performance of evolutionary algorithms using two kinds of recombination operators that operate at different scales in the graphs. We introduced a niching strategy that reinforces diversity through the population and prevents the algorithm from being trapped in a local minimum in the early steps of learning. We show that the mutation operator has a limited effect when niching is applied. Finally, we compared our best evolutionary approach with various well-known BN structure learning algorithms (MCMC, K2, greedy search, TPDA, MMHC).
    Conclusion: We studied the behaviour of an evolutionary approach enhanced by niching for learning gene regulatory networks with BNs. We show that this approach outperforms classical structure learning methods in elucidating the original model. These results were obtained for learning a bio-realistic network and, more importantly, on various small datasets. This is a suitable approach for learning transcriptional regulatory networks from real datasets without prior knowledge.
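    The abstract does not say which niching scheme was used. Fitness sharing is one classic option, sketched here as an assumption: candidate BN structures are sets of directed edges, the distance between two structures is their structural Hamming distance (SHD), and each structure's fitness is divided by a niche count so that crowded regions of the search space are penalized. The sketch assumes a positive fitness to be maximized (a log-likelihood or BIC score would need shifting first); `sigma`, `alpha` and the function names are illustrative.

```python
def shd(g1, g2):
    """Structural Hamming distance between two DAGs given as sets of (parent, child) edges.

    A missing, extra or reversed edge each counts as one difference.
    """
    dist = 0
    for pair in {frozenset(e) for e in g1 | g2}:
        a, b = tuple(pair)
        # Compare the edge status (a->b, b->a) of this node pair in both graphs.
        if ((a, b) in g1, (b, a) in g1) != ((a, b) in g2, (b, a) in g2):
            dist += 1
    return dist


def shared_fitness(population, fitness, sigma=3.0, alpha=1.0):
    """Divide each raw fitness by its niche count (fitness sharing)."""
    shared = []
    for g, f in zip(population, fitness):
        # Sharing function: 1 - (d / sigma)^alpha for d < sigma, else 0.
        # The structure itself (d = 0) contributes 1, so the niche count is >= 1.
        niche = sum(max(0.0, 1.0 - (shd(g, h) / sigma) ** alpha) for h in population)
        shared.append(f / niche)
    return shared
```

    With two copies of A→B and one B→C, all at raw fitness 1.0, the duplicated structure's shared fitness drops to 3/7 while the lone structure gets 3/5, so selection favours keeping the rarer structure in the population.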