926 research outputs found

    Comparative Phylogeographic, Population Genomic, and Selection Inference with Development of Hierarchical Co-Demographic Models

    Comparing demographic histories across assemblages of populations, species, and sister pairs has been a focus of phylogeography since its inception. Initial approaches used organelle genetic data and qualitative comparisons of genetic patterns to evaluate hypotheses of shared evolutionary responses to past environmental changes. The field has since progressed with coalescent model-based statistical techniques and advances in next-generation sequencing, yet there remains a need for methods that can analyze aggregated genomic-scale data from non-model organisms within a unified framework that accounts for uncertainty and variance in individual taxa. To this end, the aggregate site frequency spectrum (aSFS), an expansion of the site frequency spectrum that exploits SNP data collected from multiple independent populations, and the aggregate joint site frequency spectrum (ajSFS), an extension of the aSFS to population pairs, are introduced and explored here for assemblage-level demographic inference. Also introduced and described is the R package Multi-DICE, a wrapper program that exploits existing simulation software for straightforward and flexible execution of hierarchical co-demographic model-based inference given either the aSFS or single-locus sequence data. These methodological developments were validated through a succession of in silico experiments that tested a range of sampling configurations, alternative inferential frameworks, and prior specifications. Empirical demonstrations were conducted on published RAD-seq data from five threespine stickleback populations as well as eight local replicates of a lamprey species pair; synchronous demographic trajectories were detected in both analyses. Similar techniques were then used to investigate LINE selection among population-level whole-genome vertebrate datasets. In brief, a null demographic background was inferred from SNP data and used to simulate a putative null distribution of summary statistics, which was compared to LINE data to detect selection. Subsequently, the null demographic model was leveraged to evaluate the presence, directionality, and strength of selection. There was a robust signal of purifying selection, along with a pattern of LINE size affecting selection strength in two species. As large-scale SNP data become routine, the aSFS, Multi-DICE, the ajSFS, and the protocol employed here for detecting selection will collectively expand the potential for powerful comparative phylogeographic and population genomic inference.
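
    To make the aSFS idea concrete, here is a minimal Python sketch. It is not Multi-DICE code (that is an R package), and the construction is an assumption: each population's SFS is converted to proportions, and within each allele-frequency class the proportions are sorted across populations in descending order, which discards population identity. The function names and toy counts are hypothetical.

```python
from collections import Counter

def site_frequency_spectrum(derived_counts, n_chromosomes):
    """Unfolded SFS as proportions: entry k-1 covers SNPs with k derived copies."""
    tally = Counter(derived_counts)
    # Only segregating classes 1..n-1 are kept; fixed sites (0 or n) are ignored.
    bins = [tally.get(k, 0) for k in range(1, n_chromosomes)]
    total = sum(bins)
    return [b / total for b in bins] if total else [0.0] * len(bins)

def aggregate_sfs(spectra):
    """Assumed aSFS: sort each frequency class across populations, descending."""
    n_bins = len(spectra[0])
    columns = [sorted((s[k] for s in spectra), reverse=True) for k in range(n_bins)]
    # Row i holds the i-th largest proportion of every frequency class.
    return [[columns[k][i] for k in range(n_bins)] for i in range(len(spectra))]

# Toy data: derived-allele counts per SNP in two populations of 4 chromosomes.
pop_a = site_frequency_spectrum([1, 1, 2, 3], n_chromosomes=4)
pop_b = site_frequency_spectrum([1, 2, 2, 2], n_chromosomes=4)
asfs = aggregate_sfs([pop_a, pop_b])
print(asfs)
```

    Because the rows no longer correspond to particular populations, the same summary can be compared across simulated assemblages regardless of how taxa are labeled.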

    KELVIN: A Software Package for Rigorous Measurement of Statistical Evidence in Human Genetics

    This paper describes the software package KELVIN, which supports the PPL (posterior probability of linkage) framework for the measurement of statistical evidence in human (or more generally, diploid) genetic studies. In terms of scope, KELVIN supports two-point (trait-marker or marker-marker) and multipoint linkage analysis, based on either sex-averaged or sex-specific genetic maps, with an option to allow for imprinting; trait-marker linkage disequilibrium (LD), or association analysis, in case-control data, trio data, and/or multiplex family data, with options for joint linkage and trait-marker LD or conditional LD given linkage; dichotomous trait, quantitative trait, and quantitative trait threshold models; and certain types of gene-gene interactions and covariate effects. Features and data (pedigree) structures can be freely mixed and matched within analyses. The statistical framework is specifically tailored to accumulate evidence in a mathematically rigorous way across multiple data sets or data subsets while allowing for multiple sources of heterogeneity, and KELVIN itself uses sophisticated software engineering to provide a powerful and robust platform for studying the genetics of complex disorders.

    Automated, Parallel Optimization Algorithms for Stochastic Functions

    Optimization algorithms for stochastic functions are needed for real-world and simulation applications where results are obtained from sampling and contain experimental error or random noise. We have developed a series of stochastic optimization algorithms based on the well-known classical downhill simplex algorithm. Our parallel implementation of these optimization algorithms, using a framework called MW, is based on a master-worker architecture where each worker runs a massively parallel program. This parallel implementation allows the sampling to proceed independently on many processors, as demonstrated by scaling up to more than 100 vertices and 300 cores. This framework is highly suitable for clusters with an ever-increasing number of cores per node. The new algorithms have been successfully applied to the reparameterization of a model for liquid water, achieving thermodynamic and structural results for liquid water that are better than those of a standard model used in molecular simulations, with the advantage of a fully automated parameterization process.
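
    As a rough illustration of the general idea, the Python sketch below runs a textbook downhill simplex (Nelder-Mead) search in which every vertex value is an average of repeated noisy samples. This is not the authors' algorithm or the MW framework; the sample-averaging strategy, function names, and the toy objective are all assumptions, and the sampling here is serial rather than parallel.

```python
import random

def averaged(f, x, n_samples, rng):
    """Estimate a noisy objective at x by averaging repeated samples."""
    return sum(f(x, rng) for _ in range(n_samples)) / n_samples

def noisy_simplex(f, start, step=1.0, n_samples=20, iters=200, seed=0):
    rng = random.Random(seed)
    dim = len(start)
    # Initial simplex: the start point plus one offset point per dimension.
    simplex = [list(start)] + [
        [v + (step if i == j else 0.0) for j, v in enumerate(start)]
        for i in range(dim)
    ]
    values = [averaged(f, p, n_samples, rng) for p in simplex]
    for _ in range(iters):
        # Order vertices from best (lowest estimate) to worst.
        order = sorted(range(dim + 1), key=values.__getitem__)
        simplex = [simplex[i] for i in order]
        values = [values[i] for i in order]
        centroid = [sum(p[j] for p in simplex[:-1]) / dim for j in range(dim)]
        worst = simplex[-1]

        def along(t):
            # Point on the line through the worst vertex and the centroid.
            return [c + t * (c - w) for c, w in zip(centroid, worst)]

        reflected = along(1.0)
        f_r = averaged(f, reflected, n_samples, rng)
        if f_r < values[0]:
            expanded = along(2.0)
            f_e = averaged(f, expanded, n_samples, rng)
            simplex[-1], values[-1] = (expanded, f_e) if f_e < f_r else (reflected, f_r)
        elif f_r < values[-2]:
            simplex[-1], values[-1] = reflected, f_r
        else:
            contracted = along(-0.5)  # midpoint of centroid and worst vertex
            f_c = averaged(f, contracted, n_samples, rng)
            if f_c < values[-1]:
                simplex[-1], values[-1] = contracted, f_c
            else:
                # Shrink every vertex halfway toward the best and re-sample.
                best = simplex[0]
                simplex = [best] + [
                    [(b + v) / 2 for b, v in zip(best, p)] for p in simplex[1:]
                ]
                values = [values[0]] + [
                    averaged(f, p, n_samples, rng) for p in simplex[1:]
                ]
    best_index = min(range(dim + 1), key=values.__getitem__)
    return simplex[best_index]

def noisy_bowl(x, rng):
    """Toy objective: quadratic bowl with minimum at (1, -2) plus Gaussian noise."""
    return (x[0] - 1.0) ** 2 + (x[1] + 2.0) ** 2 + rng.gauss(0.0, 0.05)

best = noisy_simplex(noisy_bowl, [0.0, 0.0])
print(best)
```

    Each call to `averaged` is independent of the others, which is the property a master-worker layout can exploit by farming vertex evaluations out to separate workers.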

    Neuroengineering of Clustering Algorithms

    Cluster analysis can be broadly divided into multivariate data visualization, clustering algorithms, and cluster validation. This dissertation contributes neural network-based techniques to perform all three unsupervised learning tasks. Particularly, the first paper provides a comprehensive review on adaptive resonance theory (ART) models for engineering applications and provides context for the four subsequent papers. These papers are devoted to enhancements of ART-based clustering algorithms from (a) a practical perspective by exploiting the visual assessment of cluster tendency (VAT) sorting algorithm as a preprocessor for ART offline training, thus mitigating ordering effects; and (b) an engineering perspective by designing a family of multi-criteria ART models: dual vigilance fuzzy ART and distributed dual vigilance fuzzy ART (both of which are capable of detecting complex cluster structures), merge ART (aggregates partitions and lessens ordering effects in online learning), and cluster validity index vigilance in fuzzy ART (features a robust vigilance parameter selection and alleviates ordering effects in offline learning). The sixth paper consists of enhancements to data visualization using self-organizing maps (SOMs) by depicting in the reduced dimension and topology-preserving SOM grid information-theoretic similarity measures between neighboring neurons. This visualization's parameters are estimated using samples selected via a single-linkage procedure, thereby generating heatmaps that portray more homogeneous within-cluster similarities and crisper between-cluster boundaries. The seventh paper presents incremental cluster validity indices (iCVIs) realized by (a) incorporating existing formulations of online computations for clusters' descriptors, or (b) modifying an existing ART-based model and incrementally updating local density counts between prototypes. Moreover, this last paper provides the first comprehensive comparison of iCVIs in the computational intelligence literature. --Abstract, page iv

    Meta-learning computational intelligence architectures

    In computational intelligence, the term 'memetic algorithm' has come to be associated with the algorithmic pairing of a global search method with a local search method. In a sociological context, a 'meme' has been loosely defined as a unit of cultural information, the social analog of genes for individuals. Both of these definitions are inadequate: 'memetic algorithm' is too specific, and ultimately a misnomer, while 'meme' is defined too generally to be of scientific use. In this dissertation the notion of memes and meta-learning is extended from a computational viewpoint, and the purpose, definitions, design guidelines, and architecture for effective meta-learning are explored. The background and structure of meta-learning architectures are discussed, incorporating viewpoints from psychology, sociology, computational intelligence, and engineering. The benefits and limitations of meme-based learning are demonstrated through two experimental case studies: Meta-Learning Genetic Programming and Meta-Learning Traveling Salesman Problem Optimization. Additionally, the development and properties of several new algorithms, inspired by these case studies, are detailed. With applications ranging from cognitive science to machine learning, meta-learning has the potential to provide much-needed stimulation to the field of computational intelligence by providing a framework for higher-order learning. --Abstract, page iii

    A Study of Multiobjective Metaheuristics When Solving Parameter Scalable Problems


    A New Measure for Analyzing and Fusing Sequences of Objects

    This work is related to the combinatorial data analysis problem of seriation used for data visualization and exploratory analysis. Seriation re-sequences the data so that more similar samples or objects appear closer together, whereas dissimilar ones are farther apart. Despite the large number of current algorithms to realize such re-sequencing, there has not been a systematic way of analyzing the resulting sequences, comparing them, or fusing them to obtain a single unifying sequence. We propose a new positional proximity measure that evaluates the similarity of two arbitrary sequences based on their agreement on pairwise positional information of the sequenced objects. Furthermore, we present various statistical properties of this measure as well as its normalized version, modeled as an instance of the generalized correlation coefficient. Based on this measure, we define a new procedure for consensus seriation that fuses multiple arbitrary sequences based on a quadratic assignment problem formulation and an efficient way of approximating its solution. We also derive theoretical links with other permutation distance functions and present their associated combinatorial optimization forms for consensus tasks. The utility of the proposed contributions is demonstrated through the comparison and fusion of multiple seriation algorithms we have implemented, using many real-world datasets from different application domains.

    A Survey of Adaptive Resonance Theory Neural Network Models for Engineering Applications

    This survey samples from the ever-growing family of adaptive resonance theory (ART) neural network models used to perform the three primary machine learning modalities, namely, unsupervised, supervised, and reinforcement learning. It comprises a representative list from classic to modern ART models, thereby painting a general picture of the architectures developed by researchers over the past 30 years. The learning dynamics of these ART models are briefly described, and their distinctive characteristics, such as code representation, long-term memory, and corresponding geometric interpretation, are discussed. Useful engineering properties of ART (speed, configurability, explainability, parallelization, and hardware implementation) are examined along with current challenges. Finally, a compilation of online software libraries is provided. It is expected that this overview will be helpful to new and seasoned ART researchers.
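
    For readers new to ART, the following Python sketch shows the learning dynamics of one classic member of the family, fuzzy ART, in its standard textbook form: complement coding, a choice function to rank categories, a vigilance test, and a weight update. It is not taken from the survey or from any of the libraries it lists; the function names, parameter defaults, and toy samples are assumptions.

```python
def complement_code(x):
    """Append 1 - x so every coded input has the same city-block norm."""
    return list(x) + [1.0 - v for v in x]

def fuzzy_and(a, b):
    """Component-wise minimum (the fuzzy AND operator)."""
    return [min(u, v) for u, v in zip(a, b)]

def fuzzy_art(samples, rho=0.75, alpha=0.001, beta=1.0):
    """Cluster samples in [0, 1]^d; rho is the vigilance, beta the learning rate."""
    weights, labels = [], []
    for x in samples:
        i = complement_code(x)
        norm_i = sum(i)

        def choice(j):
            # Choice function T_j = |I ^ w_j| / (alpha + |w_j|).
            return sum(fuzzy_and(i, weights[j])) / (alpha + sum(weights[j]))

        # Visit existing categories from highest to lowest activation.
        for j in sorted(range(len(weights)), key=choice, reverse=True):
            overlap = fuzzy_and(i, weights[j])
            if sum(overlap) / norm_i >= rho:  # vigilance test passed: resonance
                weights[j] = [
                    beta * m + (1.0 - beta) * w for m, w in zip(overlap, weights[j])
                ]
                labels.append(j)
                break
        else:
            # No category passed the vigilance test: create a new one.
            weights.append(i)
            labels.append(len(weights) - 1)
    return labels, weights

samples = [[0.1, 0.1], [0.12, 0.09], [0.9, 0.9], [0.88, 0.92]]
labels, weights = fuzzy_art(samples)
print(labels)
```

    Raising `rho` makes the vigilance test stricter and yields more, smaller categories. Because categories are created in presentation order, shuffling `samples` can change the result.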