9,930 research outputs found
Finding groups in data: Cluster analysis with ants
We present in this paper a modification of Lumer and Faieta’s algorithm for data clustering. This approach
mimics the clustering behavior observed in real ant colonies. The algorithm automatically discovers
clusters in numerical data without prior knowledge of the possible number of clusters. In this paper we focus
on ant-based clustering algorithms, a particular kind of swarm intelligence system, and on how the
final clustering is affected by the dissimilarity metric used during classification: the Euclidean, Cosine,
and Gower measures. Clustering with swarm-based algorithms is emerging as an alternative to more
conventional clustering methods such as k-means. Among the many bio-inspired techniques, ant
clustering algorithms have received special attention, especially because they still require much
investigation to improve performance, stability and other key features that would make such algorithms
mature tools for data mining.
As a case study, this paper focuses on the behavior of clustering procedures in those new approaches.
The proposed algorithm and its modifications are evaluated on a number of well-known benchmark
datasets. Empirical results clearly show that ant-based clustering algorithms perform well when
compared to other techniques.
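To make the role of the dissimilarity measure concrete, the following is a minimal sketch of the three measures named in the abstract, plugged into a neighbourhood density of the form commonly attributed to Lumer and Faieta. The function names, the numeric-only Gower variant, and the parameters `alpha` and `s` are assumptions for illustration; the paper's own modification is not described in the abstract and is not reproduced here.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length numeric vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine_dissimilarity(a, b):
    """One minus the cosine similarity of a and b."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0 or nb == 0:
        return 1.0  # assumption: a zero vector is treated as maximally dissimilar
    return 1.0 - dot / (na * nb)

def gower(a, b, ranges):
    """Gower dissimilarity restricted to numeric features (an assumption):
    the mean of absolute differences, each scaled by that feature's range."""
    terms = [abs(x - y) / r if r > 0 else 0.0 for x, y, r in zip(a, b, ranges)]
    return sum(terms) / len(terms)

def neighbourhood_density(item, neighbours, dissim, alpha, s):
    """Density f = max(0, (1/s^2) * sum(1 - d(item, n) / alpha)) over the
    items currently visible in an s-by-s neighbourhood of the grid."""
    total = sum(1.0 - dissim(item, n) / alpha for n in neighbours)
    return max(0.0, total / (s * s))
```

Swapping `dissim` changes which items count as similar, and therefore where an ant is likely to pick up or drop an item. Gower takes an extra `ranges` argument, so it would be wrapped, for example as `lambda a, b: gower(a, b, ranges)`.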
Dynamic feature selection for clustering high dimensional data streams
Change in a data stream can occur at the concept level and at the feature level. Change at the feature level can occur if new, additional features appear in the stream or if the importance and relevance of a feature changes as the stream progresses. This type of change has not received as much attention as concept-level change. Furthermore, many of the methods proposed for clustering streams (density-based, graph-based, and grid-based) rely on some form of distance as a similarity metric, which is problematic in high-dimensional data, where the curse of dimensionality renders distance measurements and any concept of “density” difficult. To address these two challenges we propose combining them and framing the problem as a feature selection problem, specifically a dynamic feature selection problem. We propose a dynamic feature mask for clustering high-dimensional data streams. Redundant features are masked and clustering is performed along unmasked, relevant features. If a feature's perceived importance changes, the mask is updated accordingly; previously unimportant features are unmasked and features which lose relevance become masked. The proposed method is algorithm-independent and can be used with any of the existing density-based clustering algorithms, which typically do not have a mechanism for dealing with feature drift and struggle with high-dimensional data. We evaluate the proposed method on four density-based clustering algorithms across four high-dimensional streams: two text streams and two image streams. In each case, the proposed dynamic feature mask improves clustering performance and reduces the processing time required by the underlying algorithm. Furthermore, change at the feature level can be observed and tracked.
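The masking idea can be sketched as follows. The abstract does not say how feature importance is measured, so this example assumes a stand-in criterion: a feature is unmasked while its variance over a sliding window stays above a threshold. The class name, the window size, and the threshold are all hypothetical.

```python
from collections import deque
from statistics import pvariance

class DynamicFeatureMask:
    """Sliding-window feature mask (illustrative assumption, not the paper's rule)."""

    def __init__(self, n_features, window=50, threshold=0.01):
        self.window = deque(maxlen=window)   # most recent points only
        self.threshold = threshold
        self.mask = [True] * n_features      # True means the feature is unmasked

    def update(self, x):
        """Add a new point and recompute which features are unmasked."""
        self.window.append(list(x))
        if len(self.window) >= 2:
            columns = zip(*self.window)      # one tuple of values per feature
            self.mask = [pvariance(col) >= self.threshold for col in columns]
        return self.mask

    def apply(self, x):
        """Project a point onto the currently unmasked features."""
        return [v for v, keep in zip(x, self.mask) if keep]
```

A clustering algorithm would then receive `mask.apply(x)` instead of `x`, which is what keeps the scheme independent of the underlying algorithm. Because the window forgets old points, a feature that starts to vary later in the stream becomes unmasked again.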
Vectors of Locally Aggregated Centers for Compact Video Representation
We propose a novel vector aggregation technique for compact video
representation, with application in accurate similarity detection within large
video datasets. The current state-of-the-art in visual search is formed by the
vector of locally aggregated descriptors (VLAD) of Jegou et al. VLAD generates
compact video representations based on scale-invariant feature transform (SIFT)
vectors (extracted per frame) and local feature centers computed over a
training set. With the aim to increase robustness to visual distortions, we
propose a new approach that operates at a coarser level in the feature
representation. We create vectors of locally aggregated centers (VLAC) by first
clustering SIFT features to obtain local feature centers (LFCs) and then
encoding the latter with respect to given centers of local feature centers
(CLFCs), extracted from a training set. The sum-of-differences between the LFCs
and the CLFCs are aggregated to generate an extremely compact video description
used for accurate video segment similarity detection. Experimentation using a
video dataset, comprising more than 1000 minutes of content from the Open Video
Project, shows that VLAC obtains substantial gains in terms of mean Average
Precision (mAP) against VLAD and the hyper-pooling method of Douze et al.,
under the same compaction factor and the same set of distortions.
Comment: Proc. IEEE International Conference on Multimedia and Expo, ICME
2015, Torino, Italy
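The aggregation step shared by VLAD and VLAC can be sketched as below: each input vector is assigned to its nearest centre, the differences are summed per centre, and the sums are concatenated. For VLAD the inputs would be SIFT descriptors and the centres the trained local feature centers; for VLAC, as the abstract describes it, the inputs would be the LFCs and the centres the CLFCs. The function names and the final L2 normalisation are assumptions, and the clustering that produces the centres is left out.

```python
import math

def nearest(v, centers):
    """Index of the centre closest to v in squared Euclidean distance."""
    return min(
        range(len(centers)),
        key=lambda k: sum((a - b) ** 2 for a, b in zip(v, centers[k])),
    )

def aggregate_residuals(vectors, centers, normalise=True):
    """Sum (vector - centre) per nearest centre and concatenate the sums."""
    dim = len(centers[0])
    acc = [[0.0] * dim for _ in centers]
    for v in vectors:
        k = nearest(v, centers)
        for d in range(dim):
            acc[k][d] += v[d] - centers[k][d]
    flat = [x for row in acc for x in row]
    if normalise:
        # Assumption: L2-normalise the descriptor, as is common for VLAD.
        norm = math.sqrt(sum(x * x for x in flat))
        if norm > 0:
            flat = [x / norm for x in flat]
    return flat
```

The output length is the number of centres times the vector dimension, whatever the number of inputs. That fixed size is what makes the representation compact.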
Learning and comparing functional connectomes across subjects
Functional connectomes capture brain interactions via synchronized
fluctuations in the functional magnetic resonance imaging signal. If measured
during rest, they map the intrinsic functional architecture of the brain. With
task-driven experiments they represent integration mechanisms between
specialized brain areas. Analyzing their variability across subjects and
conditions can reveal markers of brain pathologies and mechanisms underlying
cognition. Methods of estimating functional connectomes from the imaging signal
have undergone rapid developments and the literature is full of diverse
strategies for comparing them. This review aims to clarify links across
functional-connectivity methods as well as to expose different steps to perform
a group study of functional connectomes.
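As a point of reference for "synchronized fluctuations", one of the simplest connectome estimates is the matrix of pairwise Pearson correlations between regional time series. The sketch below assumes that estimator and hypothetical function names; the review itself covers a range of estimators, and this is not presented as its recommended one.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length time series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # assumption: a constant signal is treated as uncorrelated
    return cov / (sx * sy)

def connectome(signals):
    """Region-by-region correlation matrix; signals[i] is region i's series."""
    return [[pearson(a, b) for b in signals] for a in signals]
```

A group study would compute one such matrix per subject and then compare the matrices across subjects or conditions.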
Markets, herding and response to external information
We focus on the influence of external sources of information upon financial
markets. In particular, we develop a stochastic agent-based market model
characterized by a certain herding behavior as well as allowing traders to be
influenced by an external dynamic signal of information. This signal can be
interpreted as a time-varying advertising, public perception or rumor, in favor
or against one of two possible trading behaviors, thus breaking the symmetry of
the system and acting as a continuously varying exogenous shock. As an
illustration, we use a well-known German Indicator of Economic Sentiment as
information input and compare our results with Germany's leading stock market
index, the DAX, in order to calibrate some of the model parameters. We study
the conditions for the ensemble of agents to more accurately follow the
information input signal. The response of the system to the external
information is maximal for an intermediate range of values of a market
parameter, suggesting the existence of three different market regimes:
amplification, precise assimilation and undervaluation of incoming information.
Comment: 30 pages, 8 figures. Thoroughly revised and updated version of
arXiv:1302.647
Time-variation of higher moments in a financial market with heterogeneous agents: An analytical approach
A growing body of recent literature allows for heterogeneous trading strategies and limited rationality of agents in behavioral models of financial markets. More and more, this literature has been concerned with the explanation of some of the stylized facts of financial markets. It now seems that some previously mysterious time-series characteristics like fat tails of returns and temporal dependence of volatility can be observed in many of these models as macroscopic patterns resulting from the interaction among different groups of speculative traders. However, most of the available evidence stems from simulation studies of relatively complicated models which do not allow for analytical solutions. In this paper, this line of research is supplemented by analytical solutions of a simple variant of the seminal herding model introduced by Kirman [1993]. Embedding the herding framework into a simple equilibrium asset pricing model, we are able to derive closed-form solutions for the time-variation of higher moments as well as related quantities of interest enabling us to spell out under what circumstances the model gives rise to realistic behavior of the resulting time series --
- …
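For orientation, the herding mechanism referred to above can be sketched as a two-state recruitment process in the spirit of Kirman's model: at each step one agent either switches state spontaneously, with a small probability `eps`, or meets another agent and adopts that agent's state with probability `1 - delta`. The parameter names, default values, and this particular form of the transition probabilities are assumptions. The paper's "simple variant" and its asset pricing layer are not reproduced.

```python
import random

def transition_probs(k, n, eps, delta):
    """Return (p_up, p_down) when k of n agents are currently in state 1."""
    p_up = (1 - k / n) * (eps + (1 - delta) * k / (n - 1))
    p_down = (k / n) * (eps + (1 - delta) * (n - k) / (n - 1))
    return p_up, p_down

def simulate(n=100, steps=10_000, eps=0.002, delta=0.01, seed=0):
    """Simulate the number of agents in state 1 as a birth-death chain."""
    rng = random.Random(seed)
    k = n // 2
    path = [k]
    for _ in range(steps):
        p_up, p_down = transition_probs(k, n, eps, delta)
        u = rng.random()
        if u < p_up:
            k += 1          # one agent moves into state 1
        elif u < p_up + p_down:
            k -= 1          # one agent leaves state 1
        path.append(k)
    return path
```

A sentiment index such as `2 * k / n - 1` could then be computed from `path` and fed into a pricing rule; that step is omitted here.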