
    Finding groups in data: Cluster analysis with ants

    We present in this paper a modification of Lumer and Faieta’s algorithm for data clustering. This approach mimics the clustering behavior observed in real ant colonies. The algorithm automatically discovers clusters in numerical data without prior knowledge of the number of clusters. In this paper we focus on ant-based clustering algorithms, a particular kind of swarm intelligence system, and on how the dissimilarity metric used during classification (Euclidean, cosine, or Gower) affects the final clustering. Clustering with swarm-based algorithms is emerging as an alternative to more conventional clustering methods such as k-means. Among the many bio-inspired techniques, ant clustering algorithms have received special attention, especially because they still require much investigation to improve performance, stability, and other key features that would make them mature tools for data mining. As a case study, this paper focuses on the behavior of the clustering procedures in these new approaches. The proposed algorithm and its modifications are evaluated on a number of well-known benchmark datasets. Empirical results show that ant-based clustering algorithms perform well compared to other techniques.
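
    Lumer and Faieta's algorithm rests on a local density measure and a pair of probabilistic pick-up/drop rules, into which any dissimilarity metric can be plugged. The sketch below follows the commonly cited formulation (an s x s neighbourhood, a scaling constant alpha, and constants k1, k2); the paper's own modification is not described in the abstract, so the function names, parameter names, and the numeric-only Gower variant are assumptions for illustration.

```python
import math

def euclidean(x, y):
    """Euclidean distance between two numeric vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def cosine_dissimilarity(x, y):
    """1 - cosine similarity (0 = same direction, 2 = opposite)."""
    nx = math.sqrt(sum(a * a for a in x))
    ny = math.sqrt(sum(b * b for b in y))
    if nx == 0 or ny == 0:
        return 1.0  # assumption: treat a zero vector as unrelated to everything
    dot = sum(a * b for a, b in zip(x, y))
    return 1.0 - dot / (nx * ny)

def gower(x, y, ranges):
    """Gower dissimilarity restricted to numeric features (an assumption here):
    mean over features of |x_k - y_k| / R_k, where R_k is the feature's range."""
    terms = [abs(a - b) / r if r > 0 else 0.0 for a, b, r in zip(x, y, ranges)]
    return sum(terms) / len(terms)

def local_density(item, neighbours, dissim, alpha, s):
    """Lumer-Faieta neighbourhood function f(o_i):
    (1/s^2) * sum_j (1 - d(o_i, o_j) / alpha), clipped at 0.
    `neighbours` are the items lying in the s x s patch around the ant."""
    total = sum(1.0 - dissim(item, other) / alpha for other in neighbours)
    return max(0.0, total / (s * s))

def p_pick(f, k1):
    """Probability that an unladen ant picks up an item with local density f."""
    return (k1 / (k1 + f)) ** 2

def p_drop(f, k2):
    """Probability that a laden ant drops its item where the local density is f."""
    return 2.0 * f if f < k2 else 1.0
```

    Swapping metrics only changes the `dissim` argument; for Gower, precomputed feature ranges can be bound with `functools.partial(gower, ranges=...)`. The grid, the random movement of ants, and any memory or speed heuristics of the full algorithm are omitted.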

    Dynamic feature selection for clustering high dimensional data streams

    Change in a data stream can occur at the concept level and at the feature level. Change at the feature level can occur if new, additional features appear in the stream or if the importance and relevance of a feature changes as the stream progresses. This type of change has not received as much attention as concept-level change. Furthermore, many of the methods proposed for clustering streams (density-based, graph-based, and grid-based) rely on some form of distance as a similarity metric, which is problematic in high-dimensional data, where the curse of dimensionality makes distance measurements and any notion of “density” difficult to use. We address these two challenges together by framing the problem as a feature selection problem, specifically a dynamic feature selection problem. We propose a dynamic feature mask for clustering high-dimensional data streams. Redundant features are masked, and clustering is performed along the unmasked, relevant features. If a feature's perceived importance changes, the mask is updated accordingly: previously unimportant features are unmasked, and features that lose relevance become masked. The proposed method is algorithm-independent and can be used with any of the existing density-based clustering algorithms, which typically have no mechanism for dealing with feature drift and struggle with high-dimensional data. We evaluate the proposed method on four density-based clustering algorithms across four high-dimensional streams: two text streams and two image streams. In each case, the proposed dynamic feature mask improves clustering performance and reduces the processing time required by the underlying algorithm. Furthermore, change at the feature level can be observed and tracked.
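
    The masking idea can be sketched as a small wrapper that scores each feature over a sliding window, masks low-scoring features, and exposes a distance computed only over unmasked features, which a density-based stream clusterer could then use. The abstract does not say how feature importance is measured, so the per-feature variance score, the window length, the threshold, and the class name `DynamicFeatureMask` are all hypothetical stand-ins.

```python
import math
from collections import deque

class DynamicFeatureMask:
    """Hypothetical sketch: a feature is unmasked while its variance over a
    sliding window stays at or above `threshold`. Variance as the relevance
    score is an assumption, not the paper's criterion."""

    def __init__(self, n_features, window=100, threshold=1e-3):
        self.n_features = n_features
        self.window = deque(maxlen=window)  # most recent points only
        self.threshold = threshold
        self.mask = [True] * n_features     # True = feature is used

    def update(self, x):
        """Add one stream point and recompute the mask from the window."""
        self.window.append(x)
        n = len(self.window)
        for k in range(self.n_features):
            mean = sum(p[k] for p in self.window) / n
            var = sum((p[k] - mean) ** 2 for p in self.window) / n
            # features regain relevance (unmask) or lose it (mask) as the window moves
            self.mask[k] = var >= self.threshold
        return list(self.mask)

    def distance(self, x, y):
        """Euclidean distance restricted to the currently unmasked features."""
        return math.sqrt(sum((a - b) ** 2
                             for a, b, m in zip(x, y, self.mask) if m))
```

    Because the clusterer only ever sees `distance`, the wrapper stays independent of the underlying algorithm, which mirrors the algorithm-independence the abstract claims; logging changes in `mask` over time is one way to track feature-level change.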

    Vectors of Locally Aggregated Centers for Compact Video Representation

    We propose a novel vector aggregation technique for compact video representation, with application in accurate similarity detection within large video datasets. The current state of the art in visual search is the vector of locally aggregated descriptors (VLAD) of Jegou et al. VLAD generates compact video representations based on scale-invariant feature transform (SIFT) vectors (extracted per frame) and local feature centers computed over a training set. To increase robustness to visual distortions, we propose a new approach that operates at a coarser level in the feature representation. We create vectors of locally aggregated centers (VLAC) by first clustering SIFT features to obtain local feature centers (LFCs) and then encoding the latter with respect to given centers of local feature centers (CLFCs), extracted from a training set. The sums of differences between the LFCs and the CLFCs are aggregated to generate an extremely compact video description used for accurate video segment similarity detection. Experiments on a video dataset comprising more than 1000 minutes of content from the Open Video Project show that VLAC obtains substantial gains in mean Average Precision (mAP) over VLAD and the hyper-pooling method of Douze et al., under the same compaction factor and the same set of distortions.

    Comment: Proc. IEEE International Conference on Multimedia and Expo, ICME 2015, Torino, Italy
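
    The two-stage encoding can be sketched with NumPy: cluster a segment's SIFT vectors into LFCs, assign each LFC to its nearest CLFC, and sum the residuals per CLFC, as VLAD does for raw descriptors. The plain Lloyd's k-means, the number of LFCs, and the absence of any normalization step are assumptions; SIFT extraction and the training of the CLFC codebook (for example, by clustering LFCs from a training set) are not shown.

```python
import numpy as np

def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means (an assumed choice); returns k centres."""
    points = np.asarray(points, dtype=float)
    rng = np.random.default_rng(seed)
    centres = points[rng.choice(len(points), size=k, replace=False)].copy()
    for _ in range(iters):
        # assign each point to its nearest centre
        d = np.linalg.norm(points[:, None, :] - centres[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            members = points[labels == j]
            if len(members):  # an empty cluster keeps its previous centre
                centres[j] = members.mean(axis=0)
    return centres

def aggregate_residuals(vectors, codebook):
    """VLAD-style encoding: for each codeword, sum (vector - codeword) over the
    vectors assigned to it, then concatenate the per-codeword sums."""
    d = np.linalg.norm(vectors[:, None, :] - codebook[None, :, :], axis=2)
    nearest = d.argmin(axis=1)
    out = np.zeros_like(codebook, dtype=float)
    for v, j in zip(vectors, nearest):
        out[j] += v - codebook[j]
    return out.ravel()

def vlac(segment_sift, clfcs, n_lfc):
    """Cluster a segment's SIFT vectors into LFCs, then encode the LFCs
    against the CLFCs, so residuals are taken at the coarser centre level."""
    lfcs = kmeans(segment_sift, n_lfc)
    return aggregate_residuals(lfcs, clfcs)
```

    The descriptor length is the number of CLFCs times the feature dimension, independent of how many SIFT vectors a segment contains, which is what makes the representation compact.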

    Learning and comparing functional connectomes across subjects

    Functional connectomes capture brain interactions via synchronized fluctuations in the functional magnetic resonance imaging signal. If measured during rest, they map the intrinsic functional architecture of the brain. In task-driven experiments, they represent integration mechanisms between specialized brain areas. Analyzing their variability across subjects and conditions can reveal markers of brain pathologies and mechanisms underlying cognition. Methods of estimating functional connectomes from the imaging signal have developed rapidly, and the literature is full of diverse strategies for comparing them. This review aims to clarify the links between functional-connectivity methods and to lay out the steps involved in a group study of functional connectomes.
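
    A common baseline for estimating a connectome is the Pearson correlation between regional time series, with a Fisher z-transform applied before averaging across subjects. The sketch below assumes region-averaged signals have already been extracted as arrays of shape (time points, regions); it illustrates one simple estimator and group step, not a pipeline taken from the review. Setting the diagonal to zero is a convention chosen here to avoid infinite z-values.

```python
import numpy as np

def correlation_connectome(timeseries):
    """timeseries: (n_timepoints, n_regions) -> (n_regions, n_regions) Pearson matrix."""
    return np.corrcoef(timeseries, rowvar=False)

def fisher_z(conn, eps=1e-7):
    """Fisher r-to-z transform; values are clipped away from +/-1 and the
    diagonal is zeroed so arctanh stays finite."""
    z = np.clip(conn, -1 + eps, 1 - eps)
    np.fill_diagonal(z, 0.0)
    return np.arctanh(z)

def group_mean_connectome(subject_timeseries):
    """Average Fisher-z connectomes across subjects, then map back to r."""
    zs = [fisher_z(correlation_connectome(ts)) for ts in subject_timeseries]
    return np.tanh(np.mean(zs, axis=0))
```

    Averaging in z-space rather than r-space is the usual motivation for the transform, since correlations near +/-1 are compressed on the r scale.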

    Markets, herding and response to external information

    We focus on the influence of external sources of information on financial markets. In particular, we develop a stochastic agent-based market model characterized by herding behavior that also allows traders to be influenced by an external, dynamic information signal. This signal can be interpreted as time-varying advertising, public perception, or rumor in favor of or against one of two possible trading behaviors; it thus breaks the symmetry of the system and acts as a continuously varying exogenous shock. As an illustration, we use a well-known German Indicator of Economic Sentiment as the information input and compare our results with Germany's leading stock market index, the DAX, in order to calibrate some of the model parameters. We study the conditions under which the ensemble of agents follows the input signal more accurately. The response of the system to the external information is maximal for an intermediate range of values of a market parameter, suggesting the existence of three different market regimes: amplification, precise assimilation, and undervaluation of incoming information.

    Comment: 30 pages, 8 figures. Thoroughly revised and updated version of arXiv:1302.647
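
    The abstract does not give the model's transition rates, so the following is only a hypothetical Kirman-style sketch of herding under an external signal: each step, one agent may switch between two trading behaviors, either idiosyncratically (biased by a signal s(t) in [-1, 1]) or by imitating the other group. The parameter names `eps` (idiosyncratic switching) and `h` (herding strength) and the exact form of the bias are assumptions, chosen so that the transition probabilities never exceed 1.

```python
import random

def simulate_herding(n_agents, steps, eps, h, signal, seed=0):
    """Hypothetical herding model with an external signal (needs n_agents >= 2).
    signal(t) in [-1, 1] pushes idiosyncratic switches toward '+' (s > 0)
    or '-' (s < 0). Returns x(t) = (n_plus - n_minus) / n_agents per step."""
    assert 2 * eps + h <= 1, "keeps p_up + p_down <= 1"
    rng = random.Random(seed)
    n = n_agents // 2  # agents currently in the '+' state
    path = []
    for t in range(steps):
        s = max(-1.0, min(1.0, signal(t)))
        # a '-' agent turns '+': signal-biased switching plus imitation of '+' agents
        p_up = (n_agents - n) / n_agents * (eps * (1 + s) + h * n / (n_agents - 1))
        # a '+' agent turns '-': the mirror image of the rule above
        p_down = n / n_agents * (eps * (1 - s) + h * (n_agents - n) / (n_agents - 1))
        u = rng.random()
        if u < p_up:
            n += 1
        elif u < p_up + p_down:
            n -= 1
        path.append((2 * n - n_agents) / n_agents)
    return path
```

    Feeding a slowly varying `signal` (for example, a rescaled sentiment series) and comparing `path` with it is one way to explore how closely the ensemble tracks the input as `h` varies.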

    Time-variation of higher moments in a financial market with heterogeneous agents: An analytical approach

    A growing body of recent literature allows for heterogeneous trading strategies and limited rationality of agents in behavioral models of financial markets. Increasingly, this literature has sought to explain some of the stylized facts of financial markets. It now seems that some previously mysterious time-series characteristics, such as fat tails of returns and temporal dependence of volatility, can be observed in many of these models as macroscopic patterns resulting from the interaction among different groups of speculative traders. However, most of the available evidence stems from simulation studies of relatively complicated models that do not allow for analytical solutions. In this paper, this line of research is supplemented by analytical solutions of a simple variant of the seminal herding model introduced by Kirman [1993]. Embedding the herding framework into a simple equilibrium asset pricing model, we derive closed-form solutions for the time-variation of higher moments and related quantities of interest, which lets us spell out under what circumstances the model gives rise to realistic behavior of the resulting time series.
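
    The paper's results are analytical, but the quantities it targets can be checked against simulated or observed returns with rolling-window sample moments: positive excess kurtosis indicates fat tails, and its drift across windows shows time-variation. The sketch below is such an empirical estimator, not the paper's closed-form solution; the 1/n (biased) moment estimators and the window length are assumptions made for simplicity.

```python
def sample_moments(xs):
    """Return (mean, variance, skewness, excess kurtosis) using 1/n moments.
    Skewness and kurtosis are NaN when the window has zero variance."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    if m2 == 0:
        return mean, 0.0, float("nan"), float("nan")
    m3 = sum((x - mean) ** 3 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    # excess kurtosis is 0 for a normal distribution, > 0 for fat tails
    return mean, m2, m3 / m2 ** 1.5, m4 / m2 ** 2 - 3.0

def rolling_moments(returns, window):
    """Moments over each trailing window of `window` consecutive returns."""
    return [sample_moments(returns[i - window:i])
            for i in range(window, len(returns) + 1)]
```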