Indexing Metric Spaces for Exact Similarity Search
With the continued digitalization of societal processes, we are seeing an
explosion in available data. This is referred to as big data. In a research
setting, three aspects of the data are often viewed as the main sources of
challenges when attempting to enable value creation from big data: volume,
velocity and variety. Many studies address volume or velocity, while far fewer
studies concern variety. A metric space is well suited to addressing variety
because it can accommodate any type of data as long as the associated distance
notion satisfies the triangle inequality. To accelerate search in metric spaces,
a collection of indexing techniques for metric data has been proposed.
However, existing surveys each offer only narrow coverage, and no
comprehensive empirical study of those techniques exists. We offer a survey of
all the existing metric indexes that can support exact similarity search, by i)
summarizing all the existing partitioning, pruning and validation techniques
used for metric indexes, ii) providing the time and storage complexity analysis
of index construction, and iii) reporting on a comprehensive empirical
comparison of their similarity query processing performance. Here, empirical
comparisons are used to evaluate index performance during search because
differences between the indexes are hard to see in the complexity analysis of
similarity query processing, and query performance depends on pruning and
validation abilities that vary with the data distribution. This article aims to
reveal the strengths and weaknesses of different indexing techniques in order to
offer guidance on selecting an appropriate indexing technique for a given
setting, and to direct future research on metric indexes.
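The role of the triangle inequality in pruning can be illustrated with a minimal sketch. It assumes a single pivot and a precomputed table of pivot distances; the names `build_pivot_table`, `range_search` and `abs_diff` are hypothetical and are not taken from the survey, which covers many more elaborate partitioning and pruning schemes.

```python
def abs_diff(a, b):
    """Example metric on numbers; any function obeying the triangle inequality works."""
    return abs(a - b)


def build_pivot_table(data, pivot, dist):
    """Precompute the distance from the pivot to every object (done once, at index time)."""
    return [dist(pivot, x) for x in data]


def range_search(data, pivot, table, dist, query, radius):
    """Return (matches, distance computations) for an exact range query.

    Pruning rule: by the triangle inequality, |d(q, p) - d(p, x)| <= d(q, x).
    If that lower bound already exceeds the radius, x cannot be a result,
    so d(q, x) is never computed.
    """
    d_qp = dist(query, pivot)
    computed = 1  # the query-to-pivot distance
    matches = []
    for x, d_px in zip(data, table):
        if abs(d_qp - d_px) > radius:
            continue  # pruned without computing d(q, x)
        computed += 1
        if dist(query, x) <= radius:  # validation step
            matches.append(x)
    return matches, computed


if __name__ == "__main__":
    data = [1, 4, 9, 15, 22]
    table = build_pivot_table(data, 0, abs_diff)
    print(range_search(data, 0, table, abs_diff, 10, 2))  # ([9], 2)
```

In this toy run only two distances are computed at query time instead of five. The result is still exact, because pruning discards only objects that provably lie outside the query radius.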
Study on predicting sentiment from images using categorical and sentimental keyword-based image retrieval
Visual stimuli are the stimuli to which human sentiment is most sensitive. Many studies have attempted to find the relationship between visual elements in images and sentimental elements using statistical approaches. In many cases, the range of sentiment that affects humans varies with image categories, such as landscapes, portraits, sports, and still life. Therefore, to enhance the performance of sentiment prediction, an individual prediction model must be established for each image category. However, collecting a large amount of ground truth sentiment data is one of the obstacles encountered by studies in this field. In this paper, we propose an approach that acquires a training data set for category classification and for predicting sentiments from images. Using this approach, we collect a training data set and establish a predictor for sentiments from images. First, we estimate the image category of a given image, and then we predict the sentiment as coordinates in the arousal-valence space using the predictor of the estimated category. We show that the performance of our approach approximates the performance obtained using ground truth data. Based on our experiments, we argue that our approach, which utilizes big data on the web as the training set for predicting content sentiment, is useful for practical purposes.
PyDBSCAN: Software for Data Clustering
The term clustering denotes the process by which objects can be grouped according to
common characteristics (features). This approach, which underlies the processes of knowledge extraction from
data sets (data mining), is of considerable importance in analysis techniques. As will be shown in
this work, applying clustering techniques makes it possible to analyze datasets with the aim of
finding structures that can provide useful information about the data under study. The fields in which such
algorithms are used are heterogeneous, ranging from the analysis of biomedical, astrophysical and
biological data to geophysical data. The literature is rich in case studies from which researchers
can draw inspiration and adapt the different approaches to their own needs.
The PyDBSCAN software, the subject of this work, makes it possible to apply density-based clustering
techniques to objects (or points) belonging to sets defined in a metric space.
The underlying algorithm is DBSCAN (Density-Based Spatial Clustering of Applications with Noise) [Ester et al.,
1996], of which an optimized implementation is presented in order to improve the quality of
data processing. Schematically, the proposed system can be represented as in Fig. 1. The
software, developed in Python 2.6 [Python ref.], uses the scientific libraries Numpy [Numpy ref.] and
Matplotlib [matplotlib ref.] and the graphics library PyQt [PyQt ref.], which is used to build the user
interface. Python is a programming language that allows the development of cross-platform applications
able to run on different operating systems such as Windows, Unix, Linux and Mac OS.
The first part of the work briefly describes the techniques implemented in the software,
while the second part describes an example application to real data.
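For readers unfamiliar with density-based clustering, the following is a minimal textbook-style sketch of DBSCAN in plain Python 3 (3.8 or later, for `math.dist`). It is not the PyDBSCAN implementation, which is described as optimized and built on Numpy; the function names and the brute-force neighbor search are assumptions made for illustration only.

```python
import math

NOISE = -1  # label for points that belong to no cluster


def region_query(points, i, eps):
    """Indices of all points within distance eps of points[i] (brute force)."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """Return one label per point: 0, 1, ... for clusters, NOISE otherwise."""
    labels = [None] * len(points)  # None means "not yet visited"
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        cluster += 1  # points[i] is a core point: start a new cluster
        labels[i] = cluster
        seeds = list(neighbors)
        k = 0
        while k < len(seeds):
            j = seeds[k]
            k += 1
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not None:
                continue  # already assigned, do not expand again
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                seeds.extend(j_neighbors)  # j is also a core point: keep growing
    return labels


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
    print(dbscan(pts, eps=1.5, min_pts=3))  # [0, 0, 0, 1, 1, 1, -1]
```

The brute-force `region_query` makes this sketch quadratic in the number of points. Replacing it with a spatial or metric index is the usual way to speed it up.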
Antipole Tree Indexing to Support Range Search and K-Nearest-Neighbor Search in Metric Spaces
Range and k-nearest neighbor searching are core problems in pattern recognition. Given a database S of objects in a metric space M and a query object q in M, in a range searching problem the goal is to find the objects of S within some threshold distance to q, whereas in a k-nearest neighbor searching problem, the k elements of S closest to q must be produced. These problems can obviously be solved with a linear number of distance calculations, by comparing the query object against every object in the database. However, the goal is to solve such problems much faster. We combine and extend ideas from the M-Tree, the Multivantage Point structure, and the FQ-Tree to create a new structure in the "bisector tree" class, called the Antipole Tree. Bisection is based on the proximity to an "Antipole" pair of elements generated by a suitable linear randomized tournament. The final winners a, b of such a tournament are far enough apart to approximate the diameter of the splitting set. If dist(a, b) is larger than the chosen cluster diameter threshold, then the cluster is split. The proposed data structure is an indexing scheme suitable for (exact and approximate) best match searching on generic metric spaces. The Antipole Tree outperforms existing structures such as List of Clusters, M-Trees, and others by a factor of approximately two and, in many cases, it achieves better clustering properties.
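The idea of a randomized tournament that ends with a far-apart pair can be sketched as follows. The abstract does not give the tournament rules, so the details here are assumptions: objects are shuffled into groups of three, each group keeps its two mutually farthest members, and the exact farthest pair is computed once few survivors remain. The names `farthest_pair`, `approx_antipole`, `group_size` and `final_size` are hypothetical.

```python
import itertools
import random


def farthest_pair(items, dist):
    """Exact farthest pair by brute force; only called on small sets (needs >= 2 items)."""
    return max(itertools.combinations(items, 2), key=lambda pair: dist(*pair))


def approx_antipole(items, dist, group_size=3, final_size=6, rng=None):
    """Approximate the farthest pair of `items` with a randomized tournament.

    Each round removes at least one object per full group, so the total
    number of distance computations stays linear in len(items).
    """
    if group_size < 3:
        raise ValueError("group_size must be at least 3 so each round eliminates objects")
    if rng is None:
        rng = random.Random()
    survivors = list(items)
    while len(survivors) > final_size:
        rng.shuffle(survivors)
        winners = []
        for start in range(0, len(survivors), group_size):
            group = survivors[start:start + group_size]
            if len(group) < 3:
                winners.extend(group)  # leftover group too small to eliminate anyone
            else:
                winners.extend(farthest_pair(group, dist))
        survivors = winners
    return farthest_pair(survivors, dist)


if __name__ == "__main__":
    a, b = approx_antipole(range(1000), lambda x, y: abs(x - y), rng=random.Random(0))
    print(a, b, abs(a - b))  # an approximation of the true diameter, 999
```

Under the split rule stated in the abstract, a cluster would then be divided around `a` and `b` whenever `dist(a, b)` exceeds the chosen diameter threshold. The returned pair is only an approximation, since the true farthest pair can be eliminated when its members land in different groups.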
Advances in Data Mining Knowledge Discovery and Applications
Advances in Data Mining Knowledge Discovery and Applications aims to help data miners, researchers, scholars, and PhD students who wish to apply data mining techniques. The primary contribution of this book is to highlight frontier fields and implementations of knowledge discovery and data mining. It may seem that the same things are repeated, but in general the same approaches and techniques may help in different fields and areas of expertise. This book presents knowledge discovery and data mining applications in two sections. As is well known, data mining covers areas of statistics, machine learning, data management and databases, pattern recognition, artificial intelligence, and other areas. In this book, most of these areas are covered with different data mining applications. The eighteen chapters have been classified in two parts: Knowledge Discovery and Data Mining Applications.
Tracing the Compositional Process. Sound art that rewrites its own past: formation, praxis and a computer framework
The domain of this thesis is electroacoustic computer-based music and sound art. It investigates
a facet of composition which is often neglected or ill-defined: the process of composing itself
and its embedding in time. Previous research mostly focused on instrumental composition or,
when electronic music was included, the computer was treated as a tool which would eventually
be subtracted from the equation. The aim was either to explain a resultant piece of music by
reconstructing the intention of the composer, or to explain human creativity by building a model
of the mind.
Our aim instead is to understand composition as an irreducible unfolding of material traces which
takes place in its own temporality. This understanding is formalised as a software framework
that traces creation time as a version graph of transactions. The instantiation and manipulation
of any musical structure implemented within this framework is thereby automatically stored
in a database. Not only can it be queried ex post by an external researcher, providing a new
quality for the empirical analysis of the activity of composing, but it is an integral part of
the composition environment. Therefore it can recursively become a source for the ongoing
composition and introduce new ways of aesthetic expression. The framework aims to unify
creation and performance time, fixed and generative composition, human and algorithmic
"writing", a writing that includes indeterminate elements which condense as concurrent vertices
in the version graph.
The second major contribution is a critical epistemological discourse on the question of
observability and the function of observation. Our goal is to explore a new direction of artistic
research which is characterised by a mixed methodology of theoretical writing, technological
development and artistic practice. The form of the thesis is an exercise in becoming process-like
itself, wherein the epistemic thing is generated by translating the gaps between these three levels.
This is my idea of the new aesthetics: That through the operation of a re-entry one may establish
a sort of process "form", yielding works which go beyond a categorical either "sound-in-itself"
or "conceptualism".
Exemplary processes are revealed by deconstructing a series of existing pieces, as well as
through the successful application of the new framework in the creation of new pieces.