54 research outputs found

    Optimization with Sparsity-Inducing Penalties

    Sparse estimation methods are aimed at using or obtaining parsimonious representations of data or models. They were first dedicated to linear variable selection but numerous extensions have now emerged such as structured sparsity or kernel selection. It turns out that many of the related estimation problems can be cast as convex optimization problems by regularizing the empirical risk with appropriate non-smooth norms. The goal of this paper is to present from a general perspective optimization tools and techniques dedicated to such sparsity-inducing penalties. We cover proximal methods, block-coordinate descent, reweighted $\ell_2$-penalized techniques, working-set and homotopy methods, as well as non-convex formulations and extensions, and provide an extensive set of experiments to compare various algorithms from a computational point of view.
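
    As a rough illustration of the proximal methods mentioned in this abstract, the sketch below applies plain proximal gradient descent (ISTA) to an $\ell_1$-penalized least-squares problem. It is not taken from the paper: the function names `soft_threshold` and `ista`, the synthetic data, and the choice `lam=1.0` are all assumptions made for the example.

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1: shrink each entry toward zero by t."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def ista(X, y, lam, n_iter=500):
    """Minimize 0.5 * ||X w - y||^2 + lam * ||w||_1 by proximal gradient steps."""
    w = np.zeros(X.shape[1])
    # Step size 1/L, where L = ||X||_2^2 is the Lipschitz constant of the gradient.
    step = 1.0 / np.linalg.norm(X, 2) ** 2
    for _ in range(n_iter):
        grad = X.T @ (X @ w - y)                      # gradient of the smooth part
        w = soft_threshold(w - step * grad, step * lam)  # prox of the l1 part
    return w

def objective(X, y, w, lam):
    """Value of the penalized least-squares objective at w."""
    return 0.5 * np.sum((X @ w - y) ** 2) + lam * np.sum(np.abs(w))

# Hypothetical synthetic problem: 3 active coefficients out of 20.
rng = np.random.default_rng(0)
X = rng.standard_normal((50, 20))
w_true = np.zeros(20)
w_true[:3] = [2.0, -1.5, 1.0]
y = X @ w_true
w_hat = ista(X, y, lam=1.0)
```

    With the step size fixed at 1/L, each iteration cannot increase the objective, which is why this variant is a common baseline before trying accelerated or coordinate-wise schemes.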

    Robust and Adversarial Data Mining

    In data mining and machine learning, researchers have made significant contributions to algorithms for clustering and classification problems. We develop algorithms under assumptions that are not met by previous works. (i) Adversarial learning is the study of machine learning techniques deployed in non-benign environments. We design an algorithm to show how a classifier should be designed to be robust against sparse adversarial attacks. Our main insight is that sparse feature attacks are best defended by designing classifiers which use L1 regularizers. (ii) The different properties of L1 (Lasso) and L2 (Tikhonov or Ridge) regularization have been studied extensively. However, given a data set, a principle for choosing the suitable regularizer is yet to be developed. We use mathematical properties of the two regularization methods followed by detailed experimentation to understand their impact based on four characteristics. (iii) The identification of anomalies is an inherent component of knowledge discovery. In many cases, the many features of a data set can be traced to a much smaller set of features. We claim that algorithms applied in a latent space are more robust. This can lead to more accurate results, and potentially provide a natural medium to explain and describe outliers. (iv) We also apply data mining techniques to the health care industry. In many cases, health insurance companies cover unnecessary costs incurred by healthcare providers. The potential adversarial behaviours of surgeon physicians are addressed. We describe a specific context of private healthcare in Australia and describe our social network based approach (applied to health insurance claims) to understand the nature of collaboration among doctors treating hospital inpatients and explore the impact of collaboration on cost and quality of care. (v) We further develop models that predict the behaviours of orthopaedic surgeons in regard to surgery type and use of prosthetic device. An important feature of these models is that they can not only predict the behaviours of surgeons but also provide explanations for the predictions.

    Trends in Mathematical Imaging and Surface Processing

    Motivated both by industrial applications and the challenge of new problems, one observes an increasing interest in the field of image and surface processing in recent years. It has become clear that even though the application areas differ significantly, the methodological overlap is enormous. Even if contributions to the field come from almost any discipline in mathematics, a major role is played by partial differential equations and in particular by geometric and variational modeling and by their numerical counterparts. The aim of the workshop was to gather a group of leading experts coming from mathematics, engineering and computer graphics to cover the main developments.

    Bayesian image restoration and bacteria detection in optical endomicroscopy

    Optical microscopy systems can be used to obtain high-resolution microscopic images of tissue cultures and ex vivo tissue samples. This imaging technique can be translated for in vivo, in situ applications by using optical fibres and miniature optics. Fibred optical endomicroscopy (OEM) can enable optical biopsy in organs inaccessible by any other imaging systems, and hence can provide rapid and accurate diagnosis in a short time. The raw data the system produces are difficult to interpret as they are modulated by a fibre bundle pattern, producing what is called the “honeycomb effect”. Moreover, the data are further degraded due to the fibre core cross coupling problem. On the other hand, there is an unmet clinical need for automatic tools that can help clinicians detect fluorescently labelled bacteria in distal lung images. The aim of this thesis is to develop advanced image processing algorithms that can address the above-mentioned problems. First, we provide a statistical model for the fibre core cross coupling problem and the sparse sampling by imaging fibre bundles (honeycomb artefact), which are formulated here as a restoration problem for the first time in the literature. We then introduce a non-linear interpolation method, based on Gaussian process regression, in order to recover an interpretable scene from the deconvolved data. Second, we develop two bacteria detection algorithms with different characteristics. The first approach considers a joint formulation of the sparse coding and anomaly detection problems. The anomalies here are considered as candidate bacteria, which are annotated with the help of a trained clinician. Although this approach provides good detection performance and outperforms existing methods in the literature, the user has to carefully tune some crucial model parameters. Hence, we propose a more adaptive approach, for which a Bayesian framework is adopted. This approach not only outperforms the proposed supervised approach and existing methods in the literature but also provides computation time that competes with optimization-based methods.
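
    To make the interpolation step in this abstract more concrete, here is a minimal one-dimensional sketch of Gaussian process regression used to fill in values between sparse samples. It is an assumption-laden toy, not the thesis's method: the RBF kernel, the `length_scale` and `noise` values, the sine test signal, and the names `rbf_kernel` and `gp_posterior_mean` are all hypothetical, and real fibre-bundle data would be two-dimensional and irregularly sampled.

```python
import numpy as np

def rbf_kernel(a, b, length_scale=1.0):
    """Squared-exponential kernel between two 1-D arrays of inputs."""
    d2 = (a[:, None] - b[None, :]) ** 2
    return np.exp(-0.5 * d2 / length_scale**2)

def gp_posterior_mean(x_train, y_train, x_test, length_scale=1.0, noise=1e-6):
    """Posterior mean of a zero-mean GP at x_test given noisy observations."""
    K = rbf_kernel(x_train, x_train, length_scale) + noise * np.eye(len(x_train))
    K_s = rbf_kernel(x_test, x_train, length_scale)
    # Solve K alpha = y instead of forming the inverse explicitly.
    return K_s @ np.linalg.solve(K, y_train)

# Hypothetical sparse samples of a smooth signal (stand-in for fibre-core values).
x_train = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y_train = np.sin(x_train)
x_test = np.linspace(0.0, 4.0, 9)   # denser grid to interpolate onto
y_interp = gp_posterior_mean(x_train, y_train, x_test)
```

    With a very small noise term the posterior mean passes almost exactly through the observed samples, so the estimate behaves like a smooth interpolant between them.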

    Contributions to Robust Graph Clustering: Spectral Analysis and Algorithms

    This dissertation details the design of fast, parameter-free graph clustering methods to robustly determine set cluster assignments. It provides spectral analysis as well as algorithms that adapt the obtained theoretical results to the implementation of robust graph clustering techniques. Sparsity is of importance in graph clustering and a first contribution of the thesis is the definition of a sparse graph model consistent with the graph clustering objectives. This model is based on an advantageous property, arising from a block diagonal representation, of a matrix that promotes the density of connections within clusters and sparsity between them. Spectral analysis of the sparse graph model including the eigen-decomposition of the Laplacian matrix is conducted. The analysis of the Laplacian matrix is simplified by defining a vector that carries all the relevant information that is contained in the Laplacian matrix. The obtained spectral properties of sparse graphs are adapted to sparsity-aware clustering based on two methods that formulate the determination of the sparsity level as approximations to spectral properties of the sparse graph models. A second contribution of this thesis is to analyze the effects of outliers on graph clustering and to propose algorithms that address robustness and the level of sparsity jointly. The basis for this contribution is to specify fundamental outlier types that occur in the cases of extreme sparsity and the mathematical analysis of their effects on sparse graphs to develop graph clustering algorithms that are robust against the investigated outlier effects. Based on the obtained results, two different robust and sparsity-aware affinity matrix construction methods are proposed. Motivated by the outliers’ effects on eigenvectors, a robust Fiedler vector estimation method and a robust spectral clustering method are proposed. Finally, an outlier detection algorithm that is built upon the vertex degree is proposed and applied to gait analysis. The results of this thesis demonstrate the importance of jointly addressing robustness and the level of sparsity for graph clustering algorithms. Additionally, simplified Laplacian matrix analysis provides promising results to design graph construction methods that may be computed efficiently through optimization in a vector space instead of the usually used matrix space.
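
    For readers unfamiliar with the Fiedler vector mentioned in this abstract, the sketch below shows the standard, non-robust version of the idea: split a graph in two by the sign of the eigenvector belonging to the second-smallest Laplacian eigenvalue. The robust estimators proposed in the dissertation are not reproduced here; the function name `fiedler_partition` and the six-node example graph are assumptions made for illustration.

```python
import numpy as np

def fiedler_partition(A):
    """Bipartition a connected undirected graph by the sign of its Fiedler vector."""
    L = np.diag(A.sum(axis=1)) - A           # unnormalized Laplacian L = D - A
    eigvals, eigvecs = np.linalg.eigh(L)     # eigh returns eigenvalues in ascending order
    fiedler = eigvecs[:, 1]                  # eigenvector of the second-smallest eigenvalue
    return (fiedler > 0).astype(int), eigvals

# Hypothetical example: two triangles joined by a single weak edge.
A = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5)]:
    A[i, j] = A[j, i] = 1.0
A[2, 3] = A[3, 2] = 0.1                      # weak connection between the clusters
labels, eigvals = fiedler_partition(A)
```

    Which cluster receives label 0 or 1 depends on the arbitrary sign of the computed eigenvector, so only the grouping itself is meaningful. A single outlier edge can noticeably change the eigenvector, which is the sensitivity that motivates robust variants.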

    Vol. 15, No. 1 (Full Issue)
