
    Machine Learning and Integrative Analysis of Biomedical Big Data.

    Recent developments in high-throughput technologies have accelerated the accumulation of massive amounts of omics data from multiple sources: genome, epigenome, transcriptome, proteome, metabolome, etc. Traditionally, data from each source (e.g., genome) is analyzed in isolation using statistical and machine learning (ML) methods. Integrative analysis of multi-omics and clinical data is key to new biomedical discoveries and advancements in precision medicine. However, data integration poses new computational challenges and exacerbates those associated with single-omics studies. Specialized computational approaches are required to effectively and efficiently perform integrative analysis of biomedical data acquired from diverse modalities. In this review, we discuss state-of-the-art ML-based approaches for tackling five specific computational challenges associated with integrative analysis: curse of dimensionality, data heterogeneity, missing data, class imbalance and scalability issues.

    One-Class Classification: Taxonomy of Study and Review of Techniques

    One-class classification (OCC) algorithms aim to build classification models when the negative class is either absent, poorly sampled or not well defined. This situation constrains the learning of efficient classifiers, because the class boundary must be defined using only knowledge of the positive class. The OCC problem has been considered and applied under many research themes, such as outlier/novelty detection and concept learning. In this paper we present a unified view of the general problem of OCC by presenting a taxonomy of study for OCC problems, which is based on the availability of training data, the algorithms used and the application domains. We further delve into each of the categories of the proposed taxonomy and present a comprehensive literature review of the OCC algorithms, techniques and methodologies with a focus on their significance, limitations and applications. We conclude our paper by discussing some open research problems in the field of OCC and present our vision for future research. Comment: 24 pages + 11 pages of references, 8 figures
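As a rough illustration of the OCC setting described in this abstract, the sketch below fits a boundary from positive samples only and rejects anything outside it. It is a deliberately simple centroid-plus-radius model, not one of the algorithms reviewed in the paper; the names `fit_centroid_occ`, `is_positive` and the `quantile` parameter are hypothetical and chosen here for illustration.

```python
import math
import random


def fit_centroid_occ(positives, quantile=0.95):
    """Fit a centroid-distance model using positive samples only.

    Returns (centroid, radius), where the radius encloses roughly
    `quantile` of the training positives.
    """
    dim = len(positives[0])
    centroid = [sum(p[i] for p in positives) / len(positives) for i in range(dim)]
    distances = sorted(math.dist(p, centroid) for p in positives)
    index = min(len(distances) - 1, int(quantile * len(distances)))
    return centroid, distances[index]


def is_positive(model, x):
    """Accept x if it lies inside the learned radius around the centroid."""
    centroid, radius = model
    return math.dist(x, centroid) <= radius


# Assumed toy data: positives drawn from a 2-D standard normal.
rng = random.Random(0)
train = [(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(500)]
model = fit_centroid_occ(train)

print(is_positive(model, (0.1, -0.2)))  # a point near the training mass
print(is_positive(model, (8.0, 8.0)))   # a point far from any training sample
```

No negative examples are used at any point; the only tunable choice is how much of the positive class the boundary should enclose, which trades false rejections against false acceptances.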

    Universal microscopic correlation functions for products of independent Ginibre matrices

    We consider the product of n complex non-Hermitian, independent random matrices, each of size N×N with independent identically distributed Gaussian entries (Ginibre matrices). The joint probability distribution of the complex eigenvalues of the product matrix is found to be given by a determinantal point process as in the case of a single Ginibre matrix, but with a more complicated weight given by a Meijer G-function depending on n. Using the method of orthogonal polynomials we compute all eigenvalue density correlation functions exactly for finite N and fixed n. They are given by the determinant of the corresponding kernel, which we construct explicitly. In the large-N limit at fixed n we first determine the microscopic correlation functions in the bulk and at the edge of the spectrum. After unfolding they are identical to those of the Ginibre ensemble with n=1 and thus universal. In contrast, the microscopic correlations we find at the origin differ for each n>1 and generalise the known Bessel-law in the complex plane for n=2 to a new hypergeometric kernel ${}_0F_{n-1}$. Comment: 20 pages, v2 published version: typos corrected and references added
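The ensemble in this abstract can be explored numerically. The sketch below samples n independent complex Ginibre matrices, multiplies them, and returns the eigenvalues of the product. It assumes NumPy is available and assumes unit-variance entries; the function names and the rescaling by N^(n/2), which is intended to keep the spectrum at order one, are choices made here and are not taken from the paper.

```python
import numpy as np


def ginibre(size, rng):
    """Complex Ginibre matrix with i.i.d. Gaussian entries, E|z|^2 = 1."""
    real = rng.standard_normal((size, size))
    imag = rng.standard_normal((size, size))
    return (real + 1j * imag) / np.sqrt(2.0)


def product_eigenvalues(n_factors, size, seed=0):
    """Eigenvalues of a product of `n_factors` independent Ginibre matrices."""
    rng = np.random.default_rng(seed)
    product = np.eye(size, dtype=complex)
    for _ in range(n_factors):
        product = product @ ginibre(size, rng)
    # Rescale by N^(n/2) so the eigenvalue moduli stay of order one.
    return np.linalg.eigvals(product) / size ** (n_factors / 2.0)


eigs = product_eigenvalues(n_factors=2, size=50)
print(eigs.shape, np.abs(eigs).max())
```

Plotting `eigs` in the complex plane for increasing `n_factors` is one way to see the growing concentration of eigenvalues near the origin, the region where the abstract reports n-dependent correlations.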

    Evolving Ensemble Fuzzy Classifier

    The concept of ensemble learning offers a promising avenue in learning from data streams under complex environments because it addresses the bias and variance dilemma better than its single-model counterpart and features a reconfigurable structure, which is well suited to the given context. While various extensions of ensemble learning for mining non-stationary data streams can be found in the literature, most of them are crafted under a static base classifier and revisit preceding samples in the sliding window for a retraining step. This feature causes computationally prohibitive complexity and is not flexible enough to cope with rapidly changing environments. Their complexity is often demanding because they involve a large collection of offline classifiers, owing to the absence of structural complexity reduction mechanisms and the lack of an online feature selection mechanism. A novel evolving ensemble classifier, namely Parsimonious Ensemble (pENsemble), is proposed in this paper. pENsemble differs from existing architectures in that it is built upon an evolving classifier from data streams, termed Parsimonious Classifier (pClass). pENsemble is equipped with an ensemble pruning mechanism, which estimates a localized generalization error of a base classifier. A dynamic online feature selection scenario is integrated into pENsemble. This method allows for dynamic selection and deselection of input features on the fly. pENsemble adopts a dynamic ensemble structure to output a final classification decision, where it features a novel drift detection scenario to grow the ensemble structure. The efficacy of pENsemble has been demonstrated through rigorous numerical studies with dynamic and evolving data streams, where it delivers the most encouraging performance in attaining a tradeoff between accuracy and complexity. Comment: this paper has been published by IEEE Transactions on Fuzzy Systems

    Online Tool Condition Monitoring Based on Parsimonious Ensemble+

    Accurate diagnosis of tool wear in the metal turning process remains an open challenge for both scientists and industrial practitioners because of inhomogeneities in workpiece material, nonstationary machining settings to suit production requirements, and nonlinear relations between measured variables and tool wear. Common methodologies for tool condition monitoring still rely on batch approaches, which cannot cope with the fast sampling rate of the metal cutting process. Furthermore, they require a retraining process to be completed from scratch when dealing with a new set of machining parameters. This paper presents an online tool condition monitoring approach based on Parsimonious Ensemble+, pENsemble+. The unique feature of pENsemble+ lies in its highly flexible principle, where both the ensemble structure and the base-classifier structure can automatically grow and shrink on the fly based on the characteristics of data streams. Moreover, an online feature selection scenario is integrated to actively sample relevant input attributes. The paper presents an advancement of a newly developed ensemble learning algorithm, pENsemble+, in which an online active learning scenario is incorporated to reduce operator labelling effort. An ensemble merging scenario is proposed, which allows reduction of ensemble complexity while retaining its diversity. Experimental studies utilising real-world manufacturing data streams and comparisons with well-known algorithms were carried out. Furthermore, the efficacy of pENsemble was examined using benchmark concept drift data streams. It has been found that pENsemble+ incurs low structural complexity and results in a significant reduction of operator labelling effort. Comment: this paper has been published by IEEE Transactions on Cybernetics

    Meaning of temperature in different thermostatistical ensembles

    Depending on the exact experimental conditions, the thermodynamic properties of physical systems can be related to one or more thermostatistical ensembles. Here, we survey the notion of thermodynamic temperature in different statistical ensembles, focusing in particular on subtleties that arise when ensembles become non-equivalent. The 'mother' of all ensembles, the microcanonical ensemble, uses entropy and internal energy (the most fundamental, dynamically conserved quantity) to derive temperature as a secondary thermodynamic variable. Over the past century, some confusion has been caused by the fact that several competing microcanonical entropy definitions are used in the literature, most commonly the volume and surface entropies introduced by Gibbs. It can be proved, however, that only the volume entropy satisfies exactly the traditional form of the laws of thermodynamics for a broad class of physical systems, including all standard classical Hamiltonian systems, regardless of their size. This mathematically rigorous fact implies that negative 'absolute' temperatures and Carnot efficiencies >1 are not achievable within a standard thermodynamical framework. As an important offspring of microcanonical thermostatistics, we shall briefly consider the canonical ensemble and comment on the validity of the Boltzmann weight factor. We conclude by addressing open mathematical problems that arise for systems with a discrete energy spectrum. Comment: 11 pages, 1 figure
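For orientation, the two microcanonical entropies named in this abstract are commonly written as below. The notation is assumed here rather than taken from the abstract: $\Omega$ is the integrated density of states, $\omega$ its derivative, $\epsilon$ an arbitrary constant with units of energy, and phase-space normalisation factors are omitted.

```latex
\begin{align}
\Omega(E) &= \int \mathrm{d}\Gamma\, \Theta\bigl(E - H\bigr), &
\omega(E) &= \frac{\partial \Omega}{\partial E}, \\
S_{\mathrm{vol}}(E) &= k_B \ln \Omega(E), &
S_{\mathrm{surf}}(E) &= k_B \ln\bigl[\epsilon\, \omega(E)\bigr], \\
T_{\mathrm{vol}}(E) &= \left(\frac{\partial S_{\mathrm{vol}}}{\partial E}\right)^{-1}
  = \frac{\Omega(E)}{k_B\, \omega(E)}, &
T_{\mathrm{surf}}(E) &= \left(\frac{\partial S_{\mathrm{surf}}}{\partial E}\right)^{-1}
  = \frac{\omega(E)}{k_B\, \omega'(E)}.
\end{align}
```

Because $\Omega$ is non-negative and non-decreasing in $E$, $T_{\mathrm{vol}}$ cannot be negative, whereas $T_{\mathrm{surf}}$ changes sign wherever the density of states $\omega$ decreases with energy. This is one way to read the abstract's statement about negative absolute temperatures.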