303 research outputs found
IGMM-CD: a Gaussian mixture classification algorithm for data streams with concept drifts
Learning concepts from data streams differs significantly from traditional batch learning, because in data streams the concepts to be learned may evolve over time. The incremental learning paradigm is a promising approach for learning in a data stream setting. However, in the presence of concept drifts, outdated concepts can cause misclassifications. Although several incremental Gaussian mixture model methods have been proposed in the literature, we note that these algorithms lack an explicit policy for discarding outdated concepts. In this paper, we propose a new incremental algorithm for data stream learning based on Gaussian mixture models. The proposed method is compared with several algorithms widely used in the literature, and the results show that it is competitive with them in various scenarios and outperforms them in some cases.
Music shapelets for fast cover song recognition
A cover song is a new performance or recording of a previously recorded piece of music by an artist other than the original one. The automatic identification of cover songs is useful for a wide range of tasks, from fans looking for new versions of their favorite songs to organizations involved in licensing copyrighted songs. This is a difficult task, given that a cover may differ from the original song in key, timbre, tempo, structure, arrangement and even the language of the vocals. Cover song identification has attracted some attention recently. However, most state-of-the-art approaches are based on similarity search, which involves a large number of similarity computations to retrieve potential cover versions for a query recording. In this paper, we adapt the idea of time series shapelets to content-based music retrieval. Our proposal adds a training phase that finds small excerpts of feature vectors that best describe each song. We demonstrate that such small segments can identify cover songs with higher identification rates, and more than one order of magnitude faster, than methods that use features describing the whole piece. Funding: FAPESP (grants #2011/17698-5, #2013/26151-5, and #2015/07628-0); CNPq (grants 446330/2014-0 and 303083/2013-1).
Signal classification by similarity and feature extraction allows an important application in insect recognition
Insects have a strong relationship with humanity, in both positive and negative ways. It is estimated that insects, particularly bees, pollinate at least two-thirds of all food consumed in the world. In contrast, mosquito-borne diseases kill millions of people every year. Given such a complex relationship, insect control attempts must be carefully planned. Otherwise, there is a risk of eliminating beneficial species, as illustrated by the recent threat of bee extinction. We are developing a novel sensor as a tool to control disease vectors and agricultural pests. This sensor captures insect flight information using laser light and classifies the insects according to their species. The sensor will therefore provide real-time population estimates by species. Such information is the key to enabling effective alarm systems for outbreaks and the intelligent use of insect control techniques, such as insecticides, and it will be the heart of the next generation of insect traps that capture only species of interest. In this paper, we demonstrate how we overcame the most important challenge in making this sensor practical: the creation of accurate classification systems. The sensor generates a very brief signal at the instant the insect crosses the laser. Such events last for tenths of a second and have a very simple structure, a consequence of the wing movements. Nevertheless, we managed to identify relevant features using speech and audio analysis techniques. Even with the described challenges, we show that we can achieve an accuracy of 98% in the task of identifying disease-vector mosquitoes. Funding: São Paulo Research Foundation (FAPESP) (grants #2011/04054-2 and #2012/50714-7).
Marketing, quality and innovation.
Text originally published in the newspaper Açoriano Oriental, "Bits & Bytes" section, on 18 November 2006. "[…]. Marketing is, according to the new definition of the American Marketing Association, a set of activities that aim to understand the needs of the consumer, customer or user and to make efforts to satisfy those needs in the best possible way, that is, so as to increase the value of products and services for customers, including by improving their degree of satisfaction, their loyalty and their overall satisfaction. Marketing thus emerges as a set of activities that are absolutely essential in modern organisations oriented towards the customer rather than towards the product or service they sell. Marketing is the end point of the logistics chain, being responsible for objectives defined in terms of sales volume or of the long-term relationship with the customer. […]"
A study of the use of complexity measures in the similarity search process adopted by kNN algorithm for time series prediction
In the last two decades, with the rise of data mining, there has been increasing interest in adapting machine learning methods to support non-parametric time series modeling and prediction. Non-parametric temporal data modeling can follow local or global approaches. Most local prediction strategies are based on the k-Nearest Neighbor (kNN) learning method. In this paper, we propose a modification of the kNN algorithm for time series prediction. Our proposal differs from the literature by incorporating three techniques: amplitude and offset invariance, complexity invariance, and treatment of trivial matches. We evaluate the proposed method with six complexity measures in order to verify their impact on the projection of future values. In addition, we compare our method with two machine learning regression algorithms. The experimental comparisons were performed using 55 data sets, which are available at the ICMC-USP Time Series Prediction Repository. Our results indicate that the developed method is competitive and that the use of a complexity-invariant distance measure generally improves predictive performance. Funding: FAPESP (grant 2013/10978-8); CNPq (grants 303083/2013-1 and 446330/2014-0).
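As a rough illustration of the three ingredients named in the abstract above, the following Python sketch combines z-normalisation (offset and amplitude invariance), a complexity-corrected Euclidean distance, and the exclusion of windows that overlap the query. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the particular complexity estimate, the overlap rule used for trivial matches, and the way neighbours' next values are averaged are all choices made here for illustration.

```python
import math


def znorm(values):
    """Rescale a window to zero mean and unit variance; also return mean and std."""
    mean = sum(values) / len(values)
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    if std == 0:
        return [0.0] * len(values), mean, std
    return [(v - mean) / std for v in values], mean, std


def complexity(values):
    """Complexity estimate: root of the summed squared consecutive differences."""
    return math.sqrt(sum((b - a) ** 2 for a, b in zip(values, values[1:])))


def cid(q, c):
    """Euclidean distance scaled by the ratio of the two complexity estimates."""
    ed = math.sqrt(sum((a - b) ** 2 for a, b in zip(q, c)))
    low, high = sorted((complexity(q), complexity(c)))
    if low == 0:
        return ed if high == 0 else math.inf
    return ed * high / low


def knn_predict(series, window, k):
    """Predict the next value from the k windows most similar to the last one."""
    if len(series) < 2 * window:
        raise ValueError("series too short for the requested window")
    query, q_mean, q_std = znorm(series[-window:])
    scored = []
    # Assumption: skipping windows that overlap the query stands in for the
    # treatment of trivial matches, which the abstract does not detail.
    for start in range(len(series) - 2 * window + 1):
        cand, c_mean, c_std = znorm(series[start:start + window])
        nxt = series[start + window]
        z_next = (nxt - c_mean) / c_std if c_std > 0 else 0.0
        scored.append((cid(query, cand), start, z_next))
    nearest = sorted(scored)[:k]
    # Average the neighbours' next values in normalised space, then map the
    # result back to the query's own offset and amplitude.
    mean_z = sum(z for _, _, z in nearest) / len(nearest)
    return q_mean + q_std * mean_z
```

For example, `knn_predict([0, 1, 2, 3] * 5, window=4, k=2)` looks for earlier windows shaped like the final `[0, 1, 2, 3]` and projects the value that followed them.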
Robust Multi-class Graph Transduction with higher order regularization
Graph transduction refers to a family of algorithms that learn from both labeled and unlabeled examples using a weighted graph and scarce label information via regularization or label propagation. A recent empirical study showed that the Robust Multi-class Graph Transduction (RMGT) algorithm achieves state-of-the-art performance on a variety of graph transduction tasks. Although RMGT achieves state-of-the-art performance and is parameter-free, it was specifically designed to use the combinatorial Laplacian within its regularization framework. Unfortunately, the combinatorial Laplacian may not be the most appropriate graph Laplacian for all real applications, and recent empirical studies showed that normalized and iterated Laplacians may be better suited than combinatorial Laplacians in general tasks. In this paper, we generalize the RMGT algorithm to any positive semidefinite matrix. We therefore provide a novel graph transduction method that can naturally deal with higher order regularization. To show the effectiveness of our method, we empirically evaluate it against five state-of-the-art graph-based semi-supervised learning algorithms with respect to graph construction and parameter selection on a number of benchmark data sets. Through a detailed experimental analysis using recently proposed empirical evaluation models, we see that our method achieved competitive performance on most data sets. In addition, our method achieved good stability with respect to the graph's parameter for most data sets and graph construction methods, which is a valuable property for real applications. However, the Laplacian's degree value may have a moderate influence on the performance of our method. Funding: CAPES; FAPESP (grant 2012/50714-7); CNPq (grant 446330/2014-0).
Time series classification with representation ensembles
Time series have attracted much attention in recent years, with thousands of methods for diverse tasks such as classification, clustering, prediction, and anomaly detection. Among all these tasks, classification is likely the most prominent, accounting for most of the applications and attention from the research community. However, in spite of the huge number of methods available, there is a significant body of empirical evidence indicating that the 1-nearest neighbor algorithm (1-NN) in the time domain is “extremely difficult to beat”. In this paper, we evaluate the use of different data representations in time series classification. Our work is motivated by methods used in related areas such as signal processing and music retrieval. In these areas, a change of representation frequently reveals features that are not apparent in the original data representation. Our approach consists of using different representations such as frequency, wavelets, and autocorrelation to transform the time series into alternative decision spaces. A classifier is then used to provide a classification for each test time series in the alternative domain. We investigate how features provided in different domains can help in time series classification. We also experiment with different ensembles to investigate whether the data representations are a good source of diversity for time series classification. Our extensive experimental evaluation addresses the issue of combining sets of representations and ensemble strategies, resulting in over 300 ensemble configurations. Funding: São Paulo Research Foundation (FAPESP) (grants #2012/08923-8, #2013/26151-5, and #2015/07628-0); CNPq (grants #446330/2014-0 and #303083/2013-1). Venue: International Symposium on Advances in Intelligent Data Analysis - IDA (14. 2015 Saint Etienne).
Extracting texture features for time series classification
Time series are present in many pattern recognition applications related to medicine, biology, astronomy, economics, and other fields. In particular, the classification task has attracted much attention from a large number of researchers. For this task, empirical research has shown that the 1-Nearest Neighbor rule with a distance measure in the time domain usually performs well in a variety of application domains. However, certain time series features are not evident in the time domain. A classical example is the classification of sound, in which representative features are usually present in the frequency domain. For these applications, an alternative representation is necessary. In this work, we investigate the use of recurrence plots as a data representation for time series classification. This representation has well-defined visual texture patterns, and its graphical nature exposes hidden patterns and structural changes in data. We therefore propose a method capable of extracting texture features from this graphical representation and use those features to classify time series data. We use traditional methods such as Grey Level Co-occurrence Matrix and Local Binary Patterns, which have shown good results in texture classification. In a comprehensive experimental evaluation, we show that our method outperforms the state-of-the-art methods for time series classification. Funding: CNPq; FAPESP (grants #2011/17698-5, #2012/07295-3, #2012/50714-7 and #2013/23037-7).
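To make the idea of a recurrence plot concrete, here is a minimal Python sketch that turns a series into a binary image. It assumes the simplest possible setting, which the abstract does not specify: each scalar observation is treated as a state (no time-delay embedding) and two time points are marked as recurrent when their values differ by at most a threshold `eps`. The function names and the `recurrence_rate` summary are illustrative only; the texture descriptors mentioned above (Grey Level Co-occurrence Matrix, Local Binary Patterns) are not implemented here.

```python
def recurrence_plot(series, eps):
    """Binary matrix R with R[i][j] = 1 when |series[i] - series[j]| <= eps."""
    n = len(series)
    return [
        [1 if abs(series[i] - series[j]) <= eps else 0 for j in range(n)]
        for i in range(n)
    ]


def recurrence_rate(plot):
    """Fraction of recurrent points, a crude summary of the image's density."""
    n = len(plot)
    return sum(map(sum, plot)) / (n * n)
```

For an alternating series such as `[0, 1, 0, 1]` with `eps=0.5`, the resulting matrix has a checkerboard texture, the kind of visual regularity a texture descriptor could then pick up.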
Music classification by transductive learning using bipartite heterogeneous networks
The popularization of music distribution in electronic format has increased the amount of music with incomplete metadata. The incompleteness of data can hamper some important tasks, such as music and artist recommendation. In this scenario, transductive classification can be used to classify the whole dataset using just a few labeled instances. Transductive classification is usually performed through label propagation, in which data are represented as networks and the examples propagate their labels through their connections. Similarity-based networks are usually used to model data as a network. However, this kind of representation requires the definition of parameters, which significantly affect the classification accuracy, and incurs a high cost due to the computation of similarities among all dataset instances. In contrast, bipartite heterogeneous networks have appeared as an alternative to similarity-based networks in text mining applications. In these networks, the words are connected to the documents in which they occur. Thus, there are no parameters or additional costs involved in generating such networks. In this paper, we propose the use of the bipartite network representation to perform transductive classification of music, using a bag-of-frames approach to describe music signals. We demonstrate that the proposed approach outperforms other music classification approaches when few labeled instances are available. Funding: São Paulo Research Foundation (FAPESP) (grants 2011/12823-6, 2012/50714-7, 2013/26151-5, and 2014/08996-0).
Adding diversity to rank examples in anytime nearest neighbor classification
In the last decade, we have witnessed a huge increase of interest in data stream learning algorithms. A stream is an ordered sequence of data records. It is characterized by properties such as the potentially infinite and rapid flow of instances. However, a property that is common to various application domains and is frequently disregarded is highly fluctuating data rates. In domains with fluctuating data rates, the events do not occur with a fixed frequency. This imposes an additional challenge for classifiers, since the next event can occur at any time after the previous one. Anytime classification provides a very convenient approach for fluctuating data rates. In summary, an anytime classifier can be interrupted at any time before its completion and still provide an intermediate solution. The popular k-nearest neighbor (k-NN) classifier can easily be made anytime by introducing a ranking of the training examples. A classification is achieved by scanning the training examples according to this ranking. In this paper, we show how the current state-of-the-art anytime k-NN classifier can be made more accurate by introducing diversity in the training set ranking. Our results show that, with this simple modification, the performance of the anytime version of the k-NN algorithm is consistently improved for a large number of datasets.
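The scanning scheme described in the abstract above can be sketched as a Python generator that yields its current best answer after every training example, so the caller can stop consuming it whenever time runs out. This is a minimal sketch under assumptions: the name `anytime_knn`, the squared Euclidean distance, and the majority vote are choices made here, and the examples are assumed to arrive already ranked. The diversity-based ranking proposed in the paper is not reproduced.

```python
import heapq
from collections import Counter


def anytime_knn(query, ranked_examples, k=3):
    """Yield the current k-NN prediction after each scanned training example.

    ranked_examples is an iterable of (features, label) pairs, assumed to be
    already sorted by some ranking (hypothetical here).
    """
    best = []  # heap of (-distance, scan order, label); root is the farthest kept
    for order, (features, label) in enumerate(ranked_examples):
        dist = sum((a - b) ** 2 for a, b in zip(query, features))
        item = (-dist, order, label)
        if len(best) < k:
            heapq.heappush(best, item)
        elif dist < -best[0][0]:
            # The new example is closer than the farthest neighbour kept so far.
            heapq.heapreplace(best, item)
        votes = Counter(lbl for _, _, lbl in best)
        yield votes.most_common(1)[0][0]
```

Interrupting the classifier amounts to no longer pulling from the generator: the last value yielded is the intermediate solution, and examples placed early in the ranking have the most influence when time is short.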