2 research outputs found

    Distributions based Regression Techniques for Compositional Data

    This thesis presents a systematic study of regression methods for compositional data, which are unique and rare. We start with the basic machine learning concept of regression and use regression equations to solve classification problems: with partial least squares discriminant analysis (PLS-DA), we apply regression algorithms to tasks such as spam filtering and intrusion detection. Having established how regression works, we move on to more complex distribution-based regression algorithms. We first explore the one-dimensional case, beta regression, which shows how predictions can be made with regression equations when the outcome is assumed to follow a beta distribution. We then turn to the Dirichlet distribution, which covers the multi-dimensional case. Unlike traditional regression, the outcome predicted here is compositional. Two novel distribution-based regression approaches are proposed for compositional data, namely generalized Dirichlet regression and Beta-Liouville regression. Like Dirichlet regression, they extend beta regression to the multi-dimensional setting. The models are learned by maximum likelihood estimation using the Newton-Raphson method. We compare the proposed models with other popular solutions on both synthetic data and real data sets drawn from challenging applications, such as market share analysis using Google Trends and occupancy estimation in smart buildings, to show the merits of the proposed approaches. Our work can serve as a tool for product-based companies to estimate how their advertising investments have affected their market shares. Google Trends gives an estimate of the popularity of a company, which reflects the effect of advertisements. This thesis bridges the gap between open-source data from Google Trends and market shares.

    Statistical spatial color information modeling in images and applications

    Among its many applications, image processing has proven particularly effective in quality control systems. Quality control systems, such as those in the food, fruit and meat, and pharmaceutical industries and in hardness testing, depend heavily on the accuracy of the algorithms used to extract and process image feature vectors. Thus, the need to build better quality systems is tied to progress in the field of image processing. Color histograms have been widely and successfully used in many computer vision and image processing applications. However, they do not include any spatial information. We propose statistical models that integrate both color and spatial information. Our first model is based on finite mixture models, which have been applied to a range of computer vision, image processing and pattern recognition tasks. Most work on finite mixture models has focused on mixtures for continuous data. However, many applications involve and generate discrete data, for which discrete mixtures are better suited. In this thesis, we investigate the problem of discrete data modeling using finite mixture models. We propose a novel, well-motivated mixture that we call a multinomial generalized Dirichlet mixture. Our second model is based on finite multiple-Bernoulli mixtures. To estimate the model's parameters, we use a maximum a posteriori (MAP) approach through deterministic annealing expectation maximization (DAEM). Smoothing priors on the component parameters are introduced to stabilize the estimation. The selection of the number of clusters is based on stochastic complexity.