8 research outputs found

    Multivariate Data Modeling and Its Applications to Conditional Outlier Detection

    With recent advances in data technology, large amounts of data of various kinds and from various sources are being generated and collected every second. The increase in the amount of collected data is often accompanied by an increase in the complexity of the data types and objects we are able to store. The next challenge is the development of machine learning methods for their analysis. This thesis contributes to the effort by focusing on one such data type, complex input-output data objects with high-dimensional multivariate binary output spaces, and on two data-analytic problems: Multi-Label Classification and Conditional Outlier Detection. First, we study the Multi-Label Classification (MLC) problem, which concerns the classification of data instances into multiple binary output (class or response) variables that reflect different views, functions, or components describing the data. We present three MLC frameworks that effectively learn and predict the best output configuration for complex input-output data objects. Our experimental evaluation on a range of datasets shows that our solutions outperform several state-of-the-art MLC methods and produce more reliable posterior probability estimates. Second, we investigate the Conditional Outlier Detection (COD) problem, where our goal is to identify unusual patterns observed in the multi-dimensional binary output space given their input context. We make two important contributions to the definition and solution of COD. First, observing a gap between the development of unconditional and conditional outlier detection approaches, we propose a ratio of outlier scores (ROS) that uses a pair of unconditional scores to calculate the conditional score. Second, we show that by applying the chain decomposition of the probabilistic model, the probabilistic multivariate COD score decomposes into a set of probabilistic univariate COD scores. This decomposition can subsequently be generalized and extended to a broad spectrum of multivariate COD scores, including the new ROS score and its variants, leading to a new multivariate conditional outlier scoring framework. Through experiments on synthetic and real-world datasets with simulated outliers, we provide empirical results that support the validity of our COD methods.
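
    The chain decomposition mentioned in the abstract can be sketched as follows. This is an illustration only, not the thesis's own notation: it assumes the probabilistic COD score is the negative log conditional probability of the outputs, and the symbols x (input context), y = (y_1, ..., y_d) (binary outputs), and d (number of outputs) are introduced here for the example.

```latex
\begin{align*}
P(\mathbf{y} \mid \mathbf{x})
  &= \prod_{i=1}^{d} P(y_i \mid \mathbf{x}, y_1, \dots, y_{i-1}) \\
-\log P(\mathbf{y} \mid \mathbf{x})
  &= \sum_{i=1}^{d} -\log P(y_i \mid \mathbf{x}, y_1, \dots, y_{i-1})
\end{align*}
```

    Under that assumption, each term in the sum is a univariate conditional score for one output variable given the input and the preceding outputs, so the multivariate score is the sum of d univariate scores.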

    Multivariate Conditional Outlier Detection and Its Clinical Application

    This paper overviews and discusses our recent work on a multivariate conditional outlier detection framework for clinical applications.

    Multivariate Conditional Anomaly Detection and Its Clinical Application

    This paper overviews the background, goals, past achievements, and future directions of our research, which aims to build a multivariate conditional anomaly detection framework for clinical applications.