
    Clustering South African households based on their asset status using latent variable models

    The Agincourt Health and Demographic Surveillance System has since 2001 conducted a biannual household asset survey in order to quantify household socio-economic status (SES) in a rural population living in northeast South Africa. The survey contains binary, ordinal and nominal items. In the absence of income or expenditure data, the SES landscape in the study population is explored and described by clustering the households into homogeneous groups based on their asset status. A model-based approach to clustering the Agincourt households, based on latent variable models, is proposed. For binary or ordinal items, item response theory models are employed. For nominal survey items, a factor analysis model, similar in nature to a multinomial probit model, is used. Both model types have an underlying latent variable structure; this similarity is exploited and the models are combined to produce a hybrid model capable of handling mixed data types. Further, a mixture of the hybrid models is considered to provide clustering capabilities within the context of mixed binary, ordinal and nominal response data. The proposed model is termed a mixture of factor analyzers for mixed data (MFA-MD). The MFA-MD model is applied to the survey data to cluster the Agincourt households into homogeneous groups. The model is estimated within the Bayesian paradigm, using a Markov chain Monte Carlo algorithm. Intuitive groupings result, providing insight into the different socio-economic strata within the Agincourt region. Comment: Published at http://dx.doi.org/10.1214/14-AOAS726 in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org

    A novel clustering algorithm based on mathematical morphology for wind power generation prediction

    Wind power exhibits daily similarity. Furthermore, days with similar wind power variation trends reflect similar meteorological phenomena. Therefore, wind power prediction accuracy can be improved, and computational complexity during model simulation reduced, by choosing as training samples the historical days whose numerical weather prediction (NWP) information is similar to that of the predicted day. This paper proposes a new prediction model for wind power generation based on a novel dilation and erosion (DE) clustering algorithm. In the proposed model, the days with NWP information similar to that of the predicted day are selected via the proposed DE clustering algorithm, which is based on the basic operations of mathematical morphology. The proposed DE clustering algorithm can also cluster automatically without supervision. A case study conducted using data from the Yilan wind farm in northeast China indicates that the new generalized regression neural network (GRNN) prediction model based on the proposed DE clustering algorithm (DE clustering-GRNN) performs better than the DPK-medoids clustering-GRNN, the K-means clustering-GRNN, and the AM-GRNN in terms of day-ahead wind power prediction. Further, the proposed DE clustering-GRNN model is adaptive

    Cluster Oriented Image Retrieval System with Context Based Color Feature Subspace Selection

    This paper presents a cluster oriented image retrieval system with a context recognition mechanism for selecting subspaces of color features. Our idea for implementing context in the image retrieval system is to recognize the most important features in the image search by connecting the user's impression to the query. We apply context recognition with the Mathematical Model of Meaning (MMM) and then project onto the color features using a color impression metric. After a user gives a context, the MMM retrieves the words most highly correlated with the context. These representative words are projected onto the color impression metric to obtain the most significant colors for subspace feature selection. After applying subspace selection, the system clusters the image database using the Pillar-Kmeans algorithm. The centroids of the clustering results are used to calculate similarity to the image query. We evaluate the proposed system experimentally on the Ukiyo-e image datasets from the Tokyo Metropolitan Library, which represent Japanese cultural image collections.

    Continuous homophily and clustering in random networks

    Gauer F, Landwehr J. Continuous homophily and clustering in random networks. Center for Mathematical Economics Working Papers. Vol 515 July 2014. Bielefeld: Center for Mathematical Economics, Bielefeld University; 2014. We propose a random network model incorporating heterogeneity of agents and a continuous notion of homophily. Unlike the vast majority of the corresponding economic literature, we capture homophily in terms of similarity rather than equality of agents. We show that if links between similar agents are indeed more likely, our homophilous random network model exhibits clustering. Moreover, simulations indicate that the well-known small-world phenomenon is preserved even at high homophily levels. As a possible application we provide a stylized labor market model, where a firm can hire a worker via the social network.
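
The idea of similarity-based linking can be illustrated with a small simulation. The sketch below is not the authors' model: the uniform agent types on [0, 1], the link probability `base_prob * (1 - distance) ** homophily`, and the function names are all assumptions chosen for illustration. It only shows how making links between similar agents more likely can be set up, and how an average local clustering coefficient can then be measured.

```python
import random
from itertools import combinations


def homophilous_graph(n, base_prob, homophily, rng):
    """Draw n agent types on [0, 1] and link pairs with a probability
    that decreases with the distance between their types.
    The functional form is an illustrative assumption."""
    types = [rng.random() for _ in range(n)]
    adj = {i: set() for i in range(n)}
    for i, j in combinations(range(n), 2):
        distance = abs(types[i] - types[j])
        link_prob = base_prob * (1.0 - distance) ** homophily
        if rng.random() < link_prob:
            adj[i].add(j)
            adj[j].add(i)
    return types, adj


def average_clustering(adj):
    """Mean local clustering coefficient over nodes with at least two neighbours."""
    coefficients = []
    for neighbours in adj.values():
        k = len(neighbours)
        if k < 2:
            continue
        # Count how many pairs of neighbours are themselves linked.
        closed = sum(1 for a, b in combinations(neighbours, 2) if b in adj[a])
        coefficients.append(closed / (k * (k - 1) / 2))
    return sum(coefficients) / len(coefficients) if coefficients else 0.0


rng = random.Random(0)  # fixed seed so the run is repeatable
types, adj = homophilous_graph(n=60, base_prob=0.5, homophily=8.0, rng=rng)
clustering = average_clustering(adj)
print(f"average clustering coefficient: {clustering:.3f}")
```

Setting `homophily=0.0` in this sketch removes the dependence on type distance, which gives a plain random graph to compare against.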

    Real time clustering of time series using triangular potentials

    Motivated by the problem of computing investment portfolio weightings, we investigate various methods of clustering as alternatives to traditional mean-variance approaches. Such methods can have significant practical benefits since they remove the need to invert a sample covariance matrix, which can suffer from estimation error and will almost certainly be non-stationary. The general idea is to find groups of assets which share similar return characteristics over time and treat each group as a single composite asset. We then apply inverse volatility weightings to these new composite assets. In the course of our investigation we devise a method of clustering based on triangular potentials and we present associated theoretical results as well as various examples based on synthetic data. Comment: AIFU1
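
The weighting step described in this abstract can be sketched in a few lines. The example below assumes the clusters are already given (the triangular-potential clustering itself is not reproduced), forms each composite asset as an equal-weight average of its members (an assumption, since the abstract does not say how composites are built), and uses made-up return series. The function names are hypothetical.

```python
from statistics import pstdev


def composite_returns(returns, clusters):
    """Build one return series per cluster by averaging its members
    period by period (equal weighting is an illustrative assumption)."""
    composites = []
    for members in clusters:
        n_periods = len(returns[members[0]])
        composites.append([
            sum(returns[m][t] for m in members) / len(members)
            for t in range(n_periods)
        ])
    return composites


def inverse_volatility_weights(series):
    """Weight each series proportionally to 1 / volatility, normalised to sum to 1."""
    inverse_vols = [1.0 / pstdev(s) for s in series]
    total = sum(inverse_vols)
    return [v / total for v in inverse_vols]


# Synthetic per-period returns for four hypothetical assets.
returns = {
    "A": [0.01, -0.01, 0.02, -0.02],
    "B": [0.02, -0.02, 0.01, -0.01],
    "C": [0.05, -0.04, 0.06, -0.05],
    "D": [0.04, -0.05, 0.05, -0.06],
}
clusters = [["A", "B"], ["C", "D"]]  # assumed output of some clustering step

composites = composite_returns(returns, clusters)
weights = inverse_volatility_weights(composites)
print(weights)
```

No covariance matrix is inverted here: each composite only needs its own volatility, which is the practical benefit the abstract points to. The less volatile composite receives the larger weight.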

    From Frequency to Meaning: Vector Space Models of Semantics

    Computers understand very little of the meaning of human language. This profoundly limits our ability to give instructions to computers, the ability of computers to explain their actions to us, and the ability of computers to analyse and process text. Vector space models (VSMs) of semantics are beginning to address these limits. This paper surveys the use of VSMs for semantic processing of text. We organize the literature on VSMs according to the structure of the matrix in a VSM. There are currently three broad classes of VSMs, based on term-document, word-context, and pair-pattern matrices, yielding three classes of applications. We survey a broad range of applications in these three categories and we take a detailed look at a specific open source project in each category. Our goal in this survey is to show the breadth of applications of VSMs for semantics, to provide a new perspective on VSMs for those who are already familiar with the area, and to provide pointers into the literature for those who are less familiar with the field.
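
As a minimal illustration of the first of the three matrix classes named above, the sketch below builds a term-document matrix from raw word counts and compares documents by the cosine of their column vectors. The toy documents, the whitespace tokenisation, the use of unweighted counts, and the function names are simplifying assumptions made here, not details taken from the survey.

```python
import math
from collections import Counter

# Toy corpus: three hypothetical one-line documents.
docs = {
    "d1": "wind power prediction wind farm",
    "d2": "wind power forecast model",
    "d3": "household asset survey model",
}


def term_document_matrix(docs):
    """Return the sorted vocabulary and one count vector per document
    (each vector is a column of the term-document matrix)."""
    counts = {name: Counter(text.split()) for name, text in docs.items()}
    vocab = sorted({term for c in counts.values() for term in c})
    columns = {name: [counts[name][term] for term in vocab] for name in docs}
    return vocab, columns


def cosine(u, v):
    """Cosine similarity between two equal-length count vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


vocab, columns = term_document_matrix(docs)
sim_d1_d2 = cosine(columns["d1"], columns["d2"])
sim_d1_d3 = cosine(columns["d1"], columns["d3"])
print(f"d1 vs d2: {sim_d1_d2:.3f}, d1 vs d3: {sim_d1_d3:.3f}")
```

Documents that share vocabulary (d1 and d2) score higher than documents with no terms in common (d1 and d3), which is the basic premise of comparing texts through a term-document matrix.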