12 research outputs found

    Indexing Metric Spaces for Exact Similarity Search

    With the continued digitalization of societal processes, we are seeing an explosion in available data, often referred to as big data. In a research setting, three aspects of the data are commonly viewed as the main sources of challenges when attempting to enable value creation from big data: volume, velocity, and variety. Many studies address volume or velocity, while far fewer concern variety. The metric space abstraction is well suited to addressing variety because it can accommodate any type of data as long as the associated distance notion satisfies the triangle inequality. To accelerate search in metric spaces, a collection of indexing techniques for metric data has been proposed. However, existing surveys each offer only narrow coverage, and no comprehensive empirical study of these techniques exists. We offer a survey of all existing metric indexes that support exact similarity search by i) summarizing the partitioning, pruning, and validation techniques used in metric indexes, ii) providing time and storage complexity analyses of index construction, and iii) reporting on a comprehensive empirical comparison of their similarity query processing performance. Empirical comparison is used to evaluate search performance because complexity analyses of similarity query processing differ little across indexes, and query performance depends on pruning and validation abilities that are tied to the data distribution. This article aims to reveal the strengths and weaknesses of different indexing techniques, in order to offer guidance on selecting an appropriate technique for a given setting and to direct future research on metric indexes.
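The pruning idea shared by the surveyed indexes can be illustrated with a minimal sketch: given precomputed distances from a pivot to every object, the triangle inequality lets a range query discard objects without computing their distance to the query. The function and parameter names below are illustrative, not taken from any particular index in the survey.

```python
import math

def euclidean(a, b):
    # Any distance satisfying the triangle inequality works here.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def range_query(data, pivot, pivot_dists, q, r, dist=euclidean):
    """Pivot-based range search: precomputed d(pivot, o) lets us prune any
    object o with |d(q, pivot) - d(pivot, o)| > r (triangle inequality)."""
    d_qp = dist(q, pivot)
    results = []
    for o, d_po in zip(data, pivot_dists):
        if abs(d_qp - d_po) > r:   # pruned without computing d(q, o)
            continue
        if dist(q, o) <= r:        # validation with an exact distance
            results.append(o)
    return results
```

The pruning rule never produces false dismissals, so the result is identical to a linear scan; the saving is in the number of exact distance computations, which is what the empirical comparison in the survey measures.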

    Study on predicting sentiment from images using categorical and sentimental keyword-based image retrieval

    Visual stimuli are the stimuli that most strongly affect human sentiment. Many studies have attempted to find the relationship between visual elements in images and sentimental elements using statistical approaches. In many cases, the range of sentiment that affects humans varies with the image category, such as landscapes, portraits, sports, and still life. Therefore, to enhance the performance of sentiment prediction, an individual prediction model must be established for each image category. However, collecting sufficient ground-truth sentiment data is one of the obstacles encountered by studies in this field. In this paper, we propose an approach that acquires a training data set for category classification and for predicting sentiments from images. Using this approach, we collect a training data set and establish a predictor of sentiment from images. First, we estimate the category of a given image, and then we predict the sentiment as coordinates in the arousal–valence space using the predictor for the estimated category. We show that the performance of our approach approximates that obtained using ground-truth data. Based on our experiments, we argue that our approach, which utilizes big data on the web as the training set for predicting content sentiment, is useful for practical purposes.
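The two-stage design described in the abstract can be sketched as follows. Everything here is a hypothetical placeholder: the feature extractor, the category classifier, and the per-category regressors stand in for models the paper would train on web-collected data.

```python
def predict_sentiment(image_features, category_clf, regressors):
    """Two-stage pipeline sketch: first estimate the image category, then
    map the features to (valence, arousal) coordinates using the
    predictor trained for that category."""
    category = category_clf(image_features)   # e.g. "landscape", "portrait"
    regressor = regressors[category]          # one model per image category
    return regressor(image_features)          # -> (valence, arousal) pair
```

The point of the structure is that each regressor only ever sees images from its own category, matching the observation that the sentiment range varies per category.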

    PyDBSCAN: A Software Package for Data Clustering

    The term clustering denotes the process by which objects can be grouped according to common characteristics (features). This approach, which underlies the extraction of knowledge from data sets (data mining), is of considerable importance in analysis techniques. As will be shown in this work, applying clustering techniques makes it possible to analyze data sets with the goal of finding structures that can provide useful information about the data under study. The fields in which such algorithms are employed are heterogeneous, ranging from the analysis of biomedical, astrophysical, and biological data to geophysical data. The literature is rich in case studies from which researchers can draw inspiration, adapting the different approaches to their own needs. The PyDBSCAN software, the subject of this work, applies density-based clustering techniques to objects (or points) belonging to sets defined in a metric space. The base algorithm is DBSCAN (Density-Based Spatial Clustering of Applications with Noise) [Ester et al., 1996], of which an optimized implementation is provided in order to improve the quality of data processing. Schematically, the proposed system can be represented as in Fig. 1. The software, developed in Python 2.6 [Python ref.], uses the scientific libraries NumPy [Numpy ref.] and Matplotlib [matplotlib ref.], together with the PyQt graphics library [PyQt ref.], employed to build the user interface. Python is a programming language that enables cross-platform applications able to run on several operating systems, including Windows, Unix, Linux, and Mac OS. The first part of this work briefly describes the techniques covered by the presented software, while the second part describes an example application on real data.
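For readers unfamiliar with the base algorithm, here is a minimal sketch of DBSCAN as described in [Ester et al., 1996]: points with at least `min_pts` neighbours within radius `eps` are core points, clusters grow outward from them, and everything unreachable is noise. This is an illustration of the algorithm, not the optimized PyDBSCAN implementation.

```python
import math

def dbscan(points, eps, min_pts, dist=math.dist):
    """Minimal DBSCAN sketch: returns a cluster id per point, -1 = noise."""
    def neighbors(i):
        return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]

    labels = [None] * len(points)   # None = unvisited, -1 = noise
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        nbrs = neighbors(i)
        if len(nbrs) < min_pts:     # not a core point: provisionally noise
            labels[i] = -1
            continue
        cluster += 1                # start a new density-based cluster
        labels[i] = cluster
        seeds = [j for j in nbrs if j != i]
        while seeds:                # expand the cluster from core points
            j = seeds.pop()
            if labels[j] == -1:     # former noise joins as a border point
                labels[j] = cluster
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_nbrs = neighbors(j)
            if len(j_nbrs) >= min_pts:
                seeds.extend(j_nbrs)
    return labels
```

The naive neighbourhood scan above is quadratic; an optimized implementation such as PyDBSCAN would replace it with a spatial index.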

    Antipole Tree Indexing to Support Range Search and K-Nearest-Neighbor Search in Metric Spaces

    Range and k-nearest neighbor searching are core problems in pattern recognition. Given a database S of objects in a metric space M and a query object q in M, in a range searching problem the goal is to find the objects of S within some threshold distance to q, whereas in a k-nearest neighbor searching problem, the k elements of S closest to q must be produced. These problems can obviously be solved with a linear number of distance calculations, by comparing the query object against every object in the database. However, the goal is to solve such problems much faster. We combine and extend ideas from the M-Tree, the Multivantage Point structure, and the FQ-Tree to create a new structure in the "bisector tree" class, called the Antipole Tree. Bisection is based on the proximity to an "Antipole" pair of elements generated by a suitable linear randomized tournament. The final winners (a, b) of such a tournament are far enough apart to approximate the diameter of the splitting set. If dist(a, b) is larger than the chosen cluster diameter threshold, then the cluster is split. The proposed data structure is an indexing scheme suitable for (exact and approximate) best match searching on generic metric spaces. The Antipole Tree outperforms existing structures such as the List of Clusters, M-Trees, and others by a factor of approximately two and, in many cases, achieves better clustering properties.
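The randomized tournament for choosing an Antipole pair can be sketched as below: candidates are repeatedly split into small groups, each group keeps only its two farthest-apart members, and the last survivors yield the pair (a, b) approximating the diameter. This is a simplified illustration under assumed defaults (group size, Euclidean distance), not the paper's exact procedure.

```python
import math
import random

def antipole_pair(points, group_size=3, dist=math.dist, seed=0):
    """Linear randomized tournament sketch: the winners (a, b)
    approximate the diameter of the input set."""
    rng = random.Random(seed)
    candidates = list(points)
    while len(candidates) > group_size:
        rng.shuffle(candidates)
        survivors = []
        for i in range(0, len(candidates), group_size):
            group = candidates[i:i + group_size]
            if len(group) < 2:
                survivors.extend(group)
                continue
            # Keep only the farthest-apart pair of each group.
            a, b = max(((p, q) for p in group for q in group),
                       key=lambda pq: dist(*pq))
            survivors.extend([a, b])
        if len(survivors) >= len(candidates):  # guard: round must shrink
            break
        candidates = survivors
    return max(((p, q) for p in candidates for q in candidates),
               key=lambda pq: dist(*pq))
```

Each round discards about a third of the candidates, so the total number of distance computations stays linear in the input size; the tree then splits a cluster whenever dist(a, b) exceeds the diameter threshold.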

    Advances in Data Mining Knowledge Discovery and Applications

    Advances in Data Mining Knowledge Discovery and Applications aims to help data miners, researchers, scholars, and PhD students who wish to apply data mining techniques. The primary contribution of this book is highlighting frontier fields and implementations of knowledge discovery and data mining. Although the same themes may seem to recur, the same approaches and techniques can serve in different fields and areas of expertise. This book presents knowledge discovery and data mining applications in two sections. As is well known, data mining covers areas of statistics, machine learning, data management and databases, pattern recognition, artificial intelligence, and other fields. In this book, most of these areas are covered by different data mining applications. The eighteen chapters are classified into two parts: Knowledge Discovery and Data Mining Applications.

    Tracing the Compositional Process. Sound art that rewrites its own past: formation, praxis and a computer framework

    The domain of this thesis is electroacoustic computer-based music and sound art. It investigates a facet of composition which is often neglected or ill-defined: the process of composing itself and its embedding in time. Previous research mostly focused on instrumental composition or, when electronic music was included, treated the computer as a tool which would eventually be subtracted from the equation. The aim was either to explain a resultant piece of music by reconstructing the intention of the composer, or to explain human creativity by building a model of the mind. Our aim instead is to understand composition as an irreducible unfolding of material traces which takes place in its own temporality. This understanding is formalised as a software framework that traces creation time as a version graph of transactions. The instantiation and manipulation of any musical structure implemented within this framework is thereby automatically stored in a database. Not only can it be queried ex post by an external researcher—providing a new quality for the empirical analysis of the activity of composing—but it is an integral part of the composition environment. Therefore, it can recursively become a source for the ongoing composition and introduce new ways of aesthetic expression. The framework aims to unify creation and performance time, fixed and generative composition, human and algorithmic “writing”, a writing that includes indeterminate elements which condense as concurrent vertices in the version graph. The second major contribution is a critical epistemological discourse on the question of observability and the function of observation. Our goal is to explore a new direction of artistic research which is characterised by a mixed methodology of theoretical writing, technological development and artistic practice.
The form of the thesis is an exercise in becoming process-like itself, wherein the epistemic thing is generated by translating the gaps between these three levels. This is my idea of the new aesthetics: that through the operation of a re-entry one may establish a sort of process “form”, yielding works which go beyond a categorical either “sound-in-itself” or “conceptualism”. Exemplary processes are revealed by deconstructing a series of existing pieces, as well as through the successful application of the new framework in the creation of new pieces.
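The idea of tracing creation time as a version graph of transactions can be sketched as follows. The class and method names are hypothetical and stand in for the thesis framework's actual database layer; the point is only that every transaction becomes a vertex linked to its parent versions, so concurrent (indeterminate) branches appear as vertices sharing a parent.

```python
import itertools
import time

class VersionGraph:
    """Sketch of a creation-time trace: each transaction on a musical
    structure is a vertex, linked to the versions it derives from."""
    def __init__(self):
        self._ids = itertools.count()
        self.vertices = {}   # id -> (timestamp, description, parent ids)

    def commit(self, description, parents=()):
        vid = next(self._ids)
        self.vertices[vid] = (time.time(), description, tuple(parents))
        return vid

    def history(self, vid):
        """Query ex post: walk back from a version to the first transaction."""
        trail = []
        while True:
            _ts, desc, parents = self.vertices[vid]
            trail.append(desc)
            if not parents:
                return list(reversed(trail))
            vid = parents[0]   # follow the first parent for a linear trail
```

Two commits sharing the same parent model the concurrent vertices mentioned above, and the stored trail is what an external researcher could query after the fact.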