17,405 research outputs found

    Unsupervised Detection of Emergent Patterns in Large Image Collections

    With the advent of modern image acquisition and sharing technologies, billions of images are added to the Internet every day. This huge repository contains useful information, but it is very hard to analyze. If labels are available for this data, supervised learning techniques can be used to extract useful information. Visual pattern mining approaches, by contrast, provide a way to discover visual structures and patterns in an image collection without any supervision. The Internet contains images of various objects, scenes, patterns, and shapes. The majority of approaches for visual pattern discovery, however, find patterns that are related to object or scene categories. Emergent pattern mining techniques provide a way to extract generic, complex, and hidden structures in images. This thesis describes research, experiments, and analysis conducted to explore various approaches to mining emergent patterns from image collections in an unsupervised way. These approaches are based on itemset mining and graph-theoretic strategies. The itemset mining strategy uses frequent itemset mining and rare itemset mining techniques to discover patterns. The mining is performed on a transactional dataset obtained from the bag-of-words (BoW) representation of images. The graph-based approach represents visual word co-occurrences obtained from images in a co-occurrence graph. Emergent patterns form dense clusters in this graph, which are extracted using normalized cuts. The patterns discovered using the itemset mining approaches are: stripes and parallel lines; dots and checks; bright dots; single lines; intersections; and frames. The graph-based approach revealed various interesting patterns, including some that are related to object categories.
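As a rough illustration of the itemset-mining step described in this abstract, the sketch below turns BoW histograms into transactions and counts frequent itemsets. It is not the thesis's implementation: the abstract does not name a mining algorithm, so plain brute-force counting is assumed here, and the function names, the toy histograms, and the `min_support` and `max_size` values are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def bow_to_transaction(histogram, min_count=1):
    """Map a visual-word histogram to a transaction: the set of word ids
    whose count reaches min_count (an assumed binarization rule)."""
    return frozenset(w for w, c in enumerate(histogram) if c >= min_count)


def frequent_itemsets(transactions, min_support, max_size=3):
    """Brute-force support counting for itemsets of up to max_size items.

    Returns a dict mapping each itemset (a sorted tuple of word ids) to its
    support, i.e. the number of transactions that contain it.
    """
    counts = Counter()
    for transaction in transactions:
        items = sorted(transaction)
        for size in range(1, max_size + 1):
            for itemset in combinations(items, size):
                counts[itemset] += 1
    return {s: c for s, c in counts.items() if c >= min_support}


# Toy data: four images over a hypothetical 4-word visual vocabulary.
histograms = [
    [2, 1, 0, 3],
    [1, 4, 0, 0],
    [0, 2, 1, 1],
    [3, 1, 0, 2],
]
transactions = [bow_to_transaction(h) for h in histograms]
patterns = frequent_itemsets(transactions, min_support=3)
```

Rare itemsets could be selected from the same counts by keeping itemsets whose support falls below a threshold instead of above it. For realistic vocabularies, brute-force enumeration is too slow and a dedicated miner would be needed.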

    Theory and Applications for Advanced Text Mining

    Due to the growth of computer and web technologies, we can easily collect and store large amounts of text data, and it is reasonable to expect that the data contain useful knowledge. Text mining techniques have been studied intensively since the late 1990s in order to extract that knowledge. Although many important techniques have been developed, the text mining research field continues to expand to meet the needs arising from various application fields. This book is composed of 9 chapters introducing advanced text mining techniques, ranging from relation extraction to techniques for under-resourced or less-resourced languages. I believe that this book will bring new knowledge to the text mining field and help many readers open up new research fields.

    PicShark: mitigating metadata scarcity through large-scale P2P collaboration

    With the commoditization of digital devices, personal information and media sharing is becoming a key application on the pervasive Web. In such a context, data annotation rather than data production is the main bottleneck. Metadata scarcity represents a major obstacle preventing efficient information processing in large and heterogeneous communities. However, social communities also open the door to new possibilities for addressing local metadata scarcity by taking advantage of global collections of resources. We propose to tackle the lack of metadata in large-scale distributed systems through a collaborative process leveraging both content and metadata. We develop a community-based and self-organizing system called PicShark in which information entropy (in terms of missing metadata) is gradually alleviated through decentralized instance and schema matching. Our approach focuses on semi-structured metadata and confines computationally expensive operations to the edge of the network, while keeping distributed operations as simple as possible to ensure scalability. PicShark builds on structured Peer-to-Peer networks for distributed look-up operations, but extends the application of self-organization principles to the propagation of metadata and the creation of schema mappings. We demonstrate the practical applicability of our method in an image sharing scenario and provide experimental evidence illustrating the validity of our approach.

    Retrieval and Annotation of Music Using Latent Semantic Models

    PhD. This thesis investigates the use of latent semantic models for annotation and retrieval from collections of musical audio tracks. In particular, latent semantic analysis (LSA) and aspect models (or probabilistic latent semantic analysis, pLSA) are used to index words in descriptions of music drawn from hundreds of thousands of social tags. A new discrete audio feature representation is introduced to encode musical characteristics of automatically identified regions of interest within each track, using a vocabulary of audio muswords. Finally, a joint aspect model is developed that can learn from both tagged and untagged tracks by indexing both conventional words and muswords. This model is used as the basis of a music search system that supports query by example and by keyword, and of a simple probabilistic machine annotation system. The models are evaluated by their performance in a variety of realistic retrieval and annotation tasks, motivated by applications including playlist generation, internet radio streaming, music recommendation and catalogue search. Engineering and Physical Sciences Research Council.