6,411 research outputs found

    Finding the different patterns in buildings data using bag of words representation with clustering

    Full text link
    The understanding of the buildings operation has become a challenging task due to the large amount of data recorded in energy efficient buildings. Still, today the experts use visual tools for analyzing the data. In order to make the task realistic, a method has been proposed in this paper to automatically detect the different patterns in buildings. The K Means clustering is used to automatically identify the ON (operational) cycles of the chiller. In the next step the ON cycles are transformed to symbolic representation by using Symbolic Aggregate Approximation (SAX) method. Then the SAX symbols are converted to bag of words representation for hierarchical clustering. Moreover, the proposed technique is applied to real life data of adsorption chiller. Additionally, the results from the proposed method and dynamic time warping (DTW) approach are also discussed and compared

    Proceedings of the 2nd Computer Science Student Workshop: Microsoft Istanbul, Turkey, April 9, 2011

    Get PDF

    A Study on Variational Component Splitting approach for Mixture Models

    Get PDF
    Increase in use of mobile devices and the introduction of cloud-based services have resulted in the generation of enormous amount of data every day. This calls for the need to group these data appropriately into proper categories. Various clustering techniques have been introduced over the years to learn the patterns in data that might better facilitate the classification process. Finite mixture model is one of the crucial methods used for this task. The basic idea of mixture models is to fit the data at hand to an appropriate distribution. The design of mixture models hence involves finding the appropriate parameters of the distribution and estimating the number of clusters in the data. We use a variational component splitting framework to do this which could simultaneously learn the parameters of the model and estimate the number of components in the model. The variational algorithm helps to overcome the computational complexity of purely Bayesian approaches and the over fitting problems experienced with Maximum Likelihood approaches guaranteeing convergence. The choice of distribution remains the core concern of mixture models in recent research. The efficiency of Dirichlet family of distributions for this purpose has been proved in latest studies especially for non-Gaussian data. This led us to study the impact of variational component splitting approach on mixture models based on several distributions. Hence, our contribution is the application of variational component splitting approach to design finite mixture models based on inverted Dirichlet, generalized inverted Dirichlet and inverted Beta-Liouville distributions. In addition, we also incorporate a simultaneous feature selection approach for generalized inverted Dirichlet mixture model along with component splitting as another experimental contribution. We evaluate the performance of our models with various real-life applications such as object, scene, texture, speech and video categorization

    Visual Landmark Recognition from Internet Photo Collections: A Large-Scale Evaluation

    Full text link
    The task of a visual landmark recognition system is to identify photographed buildings or objects in query photos and to provide the user with relevant information on them. With their increasing coverage of the world's landmark buildings and objects, Internet photo collections are now being used as a source for building such systems in a fully automatic fashion. This process typically consists of three steps: clustering large amounts of images by the objects they depict; determining object names from user-provided tags; and building a robust, compact, and efficient recognition index. To this date, however, there is little empirical information on how well current approaches for those steps perform in a large-scale open-set mining and recognition task. Furthermore, there is little empirical information on how recognition performance varies for different types of landmark objects and where there is still potential for improvement. With this paper, we intend to fill these gaps. Using a dataset of 500k images from Paris, we analyze each component of the landmark recognition pipeline in order to answer the following questions: How many and what kinds of objects can be discovered automatically? How can we best use the resulting image clusters to recognize the object in a query? How can the object be efficiently represented in memory for recognition? How reliably can semantic information be extracted? And finally: What are the limiting factors in the resulting pipeline from query to semantics? We evaluate how different choices of methods and parameters for the individual pipeline steps affect overall system performance and examine their effects for different query categories such as buildings, paintings or sculptures
    • …
    corecore