
    Unsupervised Feature Based Algorithms for Time Series Extrinsic Regression

    Time Series Extrinsic Regression (TSER) involves using a set of training time series to form a predictive model of a continuous response variable that is not directly related to the regressor series. The TSER archive for comparing algorithms was released in 2022 with 19 problems. We increase the size of this archive to 63 problems and reproduce the previous comparison of baseline algorithms. We then extend the comparison to include a wider range of standard regressors and the latest versions of the TSER models used in the previous study. We show that none of the previously evaluated regressors can outperform a regression adaptation of a standard classifier, rotation forest. We introduce two new TSER algorithms developed from related work in time series classification. FreshPRINCE is a pipeline estimator consisting of a transform into a wide range of summary features followed by a rotation forest regressor. DrCIF is a tree ensemble that creates features from summary statistics over random intervals. Our study demonstrates that both algorithms, along with InceptionTime, perform significantly better than the other 18 regressors tested. More importantly, these two proposals (DrCIF and FreshPRINCE) are the only models that significantly outperform the standard rotation forest regressor.
    Comment: 19 pages, 21 figures, 6 tables. Appendix include
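    The abstract does not spell out DrCIF's exact feature set, so the following is only a minimal sketch of the general idea of "summary statistics over random intervals". The three statistics (mean, standard deviation, least-squares slope) and the function names are assumptions for illustration. In an interval-based regressor, a matrix like the one returned here would then be fed to a tree-based model.

```python
import random
import statistics


def interval_features(series, start, end):
    """Mean, population standard deviation and least-squares slope of series[start:end]."""
    segment = series[start:end]
    n = len(segment)
    mean = statistics.fmean(segment)
    std = statistics.pstdev(segment)
    # Least-squares slope of the segment values against time indices 0..n-1.
    t_mean = (n - 1) / 2
    num = sum((t - t_mean) * (v - mean) for t, v in enumerate(segment))
    den = sum((t - t_mean) ** 2 for t in range(n))
    slope = num / den if den else 0.0
    return [mean, std, slope]


def sample_intervals(length, n_intervals, min_len=3, seed=0):
    """Draw random (start, end) pairs inside [0, length] with end - start >= min_len."""
    rng = random.Random(seed)
    intervals = []
    for _ in range(n_intervals):
        start = rng.randint(0, length - min_len)
        end = rng.randint(start + min_len, length)
        intervals.append((start, end))
    return intervals


def transform(dataset, intervals):
    """Concatenate the summary features of every interval, for every series."""
    return [
        [f for (s, e) in intervals for f in interval_features(series, s, e)]
        for series in dataset
    ]
```

    The same intervals must be reused for training and test series so that each column of the feature matrix means the same thing.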

    Optimizing Dynamic Time Warping’s Window Width for Time Series Data Mining Applications

    Dynamic Time Warping (DTW) is a highly competitive distance measure for most time series data mining problems. Obtaining the best performance from DTW requires setting its only parameter, the maximum amount of warping (w). In the supervised case with ample data, w is typically set by cross-validation in the training stage. However, this method is likely to yield suboptimal results for small training sets. In the unsupervised case, learning via cross-validation is not possible because we do not have access to labeled data. Many practitioners have thus resorted to assuming that “the larger the better” and use the largest value of w permitted by their computational resources. However, as we will show, in most circumstances this is a naïve approach that produces inferior clusterings. Moreover, the best warping window width is generally not transferable between the two tasks; i.e., for a single dataset, practitioners cannot simply apply the best w learned for classification to clustering, or vice versa. In addition, we will demonstrate that the appropriate amount of warping depends not only on the data structure but also on the dataset size. Thus, even if practitioners know the best setting for a given dataset, they will likely be at a loss if they apply that setting to a larger version of the same data. All these issues seem largely unknown, or at least unappreciated, in the community. In this work, we demonstrate the importance of setting DTW’s warping window width correctly, and we propose novel methods to learn this parameter in both supervised and unsupervised settings. The algorithms we propose to learn w can produce significant improvements in classification accuracy and clustering quality. We demonstrate the correctness of our novel observations and the utility of our ideas by testing them on more than one hundred publicly available datasets.
Our forceful results allow us to make a perhaps unexpected claim: an underappreciated “low-hanging fruit” in optimizing DTW’s performance can produce improvements that make it an even stronger baseline, closing most or all of the gap to the more sophisticated methods proposed in recent years.
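    For reference, the sketch below shows DTW constrained to a Sakoe-Chiba band of half-width w, together with the conventional supervised way of choosing w that the abstract describes as typical practice: leave-one-out 1-NN cross-validation over a set of candidate windows. It does not implement the new supervised or unsupervised methods the paper proposes. The squared point-wise cost, the tie-break toward the smallest w, and all function names are assumptions.

```python
import math


def dtw(x, y, w):
    """DTW distance between x and y within a Sakoe-Chiba band of half-width w.

    Cells (i, j) with |i - j| > w are never filled, so for equal-length series
    w = 0 gives the Euclidean distance, and larger w allows more warping.
    """
    n, m = len(x), len(y)
    w = max(w, abs(n - m))  # the band must reach the final cell (n, m)
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(max(1, i - w), min(m, i + w) + 1):
            d = (x[i - 1] - y[j - 1]) ** 2
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return math.sqrt(cost[n][m])


def loocv_accuracy(series, labels, w):
    """Leave-one-out 1-NN accuracy using DTW with window w."""
    correct = 0
    for i, query in enumerate(series):
        _, predicted = min(
            ((dtw(query, s, w), labels[j]) for j, s in enumerate(series) if j != i),
            key=lambda pair: pair[0],
        )
        correct += predicted == labels[i]
    return correct / len(series)


def select_window(series, labels, candidates):
    """Candidate w with the highest LOOCV accuracy; ties go to the smallest w."""
    return max(candidates, key=lambda w: (loocv_accuracy(series, labels, w), -w))
```

    As the abstract notes, this procedure needs labels and may be unreliable when the training set is small, and it does not carry over to clustering.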

    Music classification by transductive learning using bipartite heterogeneous networks

    The popularization of music distribution in electronic format has increased the amount of music with incomplete metadata. Incomplete data can hamper important tasks such as music and artist recommendation. In this scenario, transductive classification can be used to classify the whole dataset from only a few labeled instances. Transductive classification is usually performed through label propagation, in which the data are represented as networks and the examples propagate their labels through their connections. Similarity-based networks are usually applied to model data as networks. However, this kind of representation requires the definition of parameters that significantly affect classification accuracy, and it has a high cost due to the computation of similarities among all dataset instances. In contrast, bipartite heterogeneous networks have emerged as an alternative to similarity-based networks in text mining applications. In these networks, words are connected to the documents in which they occur, so generating such networks requires no parameters or additional costs. In this paper, we propose using the bipartite network representation to perform transductive classification of music, using a bag-of-frames approach to describe music signals. We demonstrate that the proposed approach outperforms other music classification approaches when few labeled instances are available.
    Sao Paulo Research Foundation (FAPESP) (grants 2011/12823-6, 2012/50714-7, 2013/26151-5, and 2014/08996-0)
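    To make the bipartite idea concrete, here is a minimal sketch of label propagation over a song-term network, where a "term" stands for an element of a bag-of-frames vocabulary (for example, a quantized frame codeword). The update rule is an assumption chosen for illustration, not the authors' algorithm: terms take the average label distribution of the songs they occur in, unlabeled songs take the average of their terms, and the few labeled songs stay fixed. Names such as `propagate` are hypothetical.

```python
from collections import defaultdict


def propagate(song_terms, seed_labels, classes, iterations=20):
    """Transductive label propagation on a song-term bipartite graph.

    song_terms: dict mapping each song to the set of terms it contains.
    seed_labels: dict mapping the few labeled songs to their class.
    Returns a dict mapping every song to a predicted class.
    """
    term_songs = defaultdict(set)
    for song, terms in song_terms.items():
        for term in terms:
            term_songs[term].add(song)

    zero = {c: 0.0 for c in classes}

    def mean(vectors):
        vectors = list(vectors)
        if not vectors:
            return dict(zero)
        return {c: sum(v[c] for v in vectors) / len(vectors) for c in classes}

    # Labeled songs start (and stay) as one-hot vectors; the rest start at zero.
    song_f = {
        s: {c: float(c == seed_labels[s]) for c in classes} if s in seed_labels else dict(zero)
        for s in song_terms
    }
    for _ in range(iterations):
        # Songs -> terms: each term averages the songs in which it occurs.
        term_f = {t: mean(song_f[s] for s in songs) for t, songs in term_songs.items()}
        # Terms -> songs: unlabeled songs average their terms; seeds are clamped.
        for s, terms in song_terms.items():
            if s not in seed_labels:
                song_f[s] = mean(term_f[t] for t in terms)

    return {s: max(classes, key=lambda c: song_f[s][c]) for s in song_terms}
```

    No song-to-song similarity matrix or similarity threshold is needed, which mirrors the abstract's point that bipartite networks avoid those parameters and costs.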

    Marketing, quality and innovation.

    Text originally published in the newspaper Açoriano Oriental, in the "Bits & Bytes" section, on 18 November 2006. "[…]. Marketing is, according to the new definition by the American Marketing Association, a set of activities aimed at understanding the needs of the consumer, customer, or user and at making efforts to satisfy those needs in the best possible way, that is, so as to increase the value of products and services for customers, including by improving their level of satisfaction, their loyalty, and their overall satisfaction. Marketing thus emerges as a set of activities that are absolutely essential in modern organizations oriented toward the customer rather than toward the product or service they sell. Marketing is the end point of the logistics chain and is responsible for objectives defined in terms of sales volume or of the long-term relationship with the customer. […]"

    Music shapelets for fast cover song recognition

    A cover song is a new performance or recording of a previously recorded song by an artist other than the original one. The automatic identification of cover songs is useful for a wide range of tasks, from fans looking for new versions of their favorite songs to organizations involved in licensing copyrighted songs. This is a difficult task, given that a cover may differ from the original song in key, timbre, tempo, structure, arrangement, and even the language of the vocals. Cover song identification has attracted some attention recently. However, most state-of-the-art approaches are based on similarity search, which involves a large number of similarity computations to retrieve potential cover versions for a query recording. In this paper, we adapt the idea of time series shapelets to content-based music retrieval. Our proposal adds a training phase that finds small excerpts of feature vectors that best describe each song. We demonstrate that we can use such small segments to identify cover songs with higher identification rates and more than an order of magnitude faster than methods that use features describing the whole song.
    FAPESP (grants #2011/17698-5, #2013/26151-5, and 2015/07628-0); CNPq (grants 446330/2014-0 and 303083/2013-1)
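    The core operation behind matching with shapelets is the subsequence distance: the smallest distance between a short excerpt and any equally long window of a longer recording. The minimal sketch below assumes each reference song is already summarized by a single shapelet (the training phase that finds it is not shown) and uses Euclidean distance over per-frame feature vectors (for instance, one chroma vector per frame); both choices and the function names are assumptions, not the paper's exact procedure.

```python
import math


def subsequence_distance(shapelet, song):
    """Smallest Euclidean distance between `shapelet` and any equal-length window of `song`.

    Both arguments are sequences of per-frame feature vectors (lists of floats).
    """
    width = len(shapelet)
    best = math.inf
    for start in range(len(song) - width + 1):
        window = song[start:start + width]
        d = sum(
            (a - b) ** 2
            for frame_s, frame_w in zip(shapelet, window)
            for a, b in zip(frame_s, frame_w)
        )
        best = min(best, d)
    return math.sqrt(best)


def rank_candidates(query, shapelets):
    """Order reference songs by how closely their shapelet matches the query recording."""
    return sorted(shapelets, key=lambda name: subsequence_distance(shapelets[name], query))
```

    Because each reference song contributes only a short shapelet, the per-query work is much smaller than comparing feature sequences for whole songs.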

    Signal classification by similarity and feature extraction allows an important application in insect recognition

    Insects have a strong relationship with humanity, in both positive and negative ways. It is estimated that insects, particularly bees, pollinate at least two-thirds of all food consumed in the world. In contrast, mosquito-borne diseases kill millions of people every year. Given such a complex relationship, insect control must be carefully planned; otherwise, there is a risk of eliminating beneficial species, as the recent threat of bee extinction illustrates. We are developing a novel sensor as a tool to control disease vectors and agricultural pests. This sensor captures insect flight information using laser light and classifies the insects according to their species. The sensor will therefore provide real-time population estimates of species. Such information is key to enabling effective alarm systems for outbreaks and the intelligent use of insect control techniques, such as insecticides, and it will be the heart of the next generation of insect traps, which will capture only species of interest. In this paper, we demonstrate how we overcame the most important challenge in making this sensor practical: the creation of accurate classification systems. The sensor generates a very brief signal as a result of the insect crossing the laser. Such events last for tenths of a second and have a very simple structure, a consequence of the wing movements. Nevertheless, we managed to identify relevant features using speech and audio analysis techniques. Despite these challenges, we show that we can achieve an accuracy of 98% in the task of identifying disease-vector mosquitoes.
    São Paulo Research Foundation (FAPESP) (Grants #2011/04054-2 and #2012/50714-7)
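    The abstract does not list the features used. As a hypothetical example of an audio-style feature for such brief, periodic signals, the sketch below estimates the dominant frequency of a recording from its magnitude spectrum; for a wing-driven signal this peak would plausibly relate to the wingbeat rhythm. The naive O(n²) DFT, the choice to skip the DC bin, and the function names are assumptions; a real pipeline would use an FFT and richer features.

```python
import cmath
import math


def magnitude_spectrum(signal):
    """Naive DFT magnitudes for frequency bins 0..n//2 of a real-valued signal."""
    n = len(signal)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(signal)))
        for k in range(n // 2 + 1)
    ]


def dominant_frequency(signal, sample_rate):
    """Frequency in Hz of the strongest spectral peak, ignoring the DC bin."""
    spectrum = magnitude_spectrum(signal)
    k = max(range(1, len(spectrum)), key=spectrum.__getitem__)
    return k * sample_rate / len(signal)
```

    The frequency resolution is sample_rate / n, so very short events yield coarse estimates; this is one reason brief signals make feature extraction harder.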

    Noninvasive Self-monitoring of Blood Glucose at Your Fingertips, Literally!: Smartphone-Based Photoplethysmography

    Diabetes is a chronic disease and one of the major public health problems worldwide. It is a multifactorial disease, caused by genetic factors and lifestyle habits. Brazil had ∼16.8 million individuals living with diabetes in 2019, a number expected to reach 26 million by 2045. There is a growing global need for noninvasive diagnostic methods and for mobile health, especially in light of the pandemic caused by coronavirus disease 2019 (COVID-19). For daily glycemic control, diabetic patients use a portable glucometer for self-monitoring and need to prick their fingertips three or more times a day, causing considerable discomfort throughout their lives. Our goal here is to review very recent emerging studies in the field of noninvasive diagnosis and to emphasize that smartphone-based photoplethysmography (spPPG), powered by artificial intelligence, might become a trend for self-monitoring blood glucose levels. In photoplethysmography, light from a source travels through the tissue and interacts with the interstitium and with cells and molecules present in the blood. Light is reflected as it passes through the biological tissues, and a photodetector can capture these interactions. When a smartphone is used, the built-in flashlight is a white LED and the camera works as the photodetector. The higher the concentration of circulating glucose, the greater the absorbance and, consequently, the lower the reflected light intensity. Due to these optical phenomena, the captured signal intensity will be inversely proportional to the blood glucose level. Furthermore, we highlight the microvascular changes in the progression of diabetes that can interfere with the signals captured by the photodetector in spPPG, due to the decrease in peripheral blood perfusion, which can be confused with high blood glucose levels.
It is necessary to create strategies to filter out or reduce the impact of these vascular changes on blood glucose level analysis. Deep learning strategies can help address these challenges, allowing accurate prediction of blood glucose and interstitial glucose levels.

    Time series classification with representation ensembles

    Time series analysis has attracted much attention in recent years, with thousands of methods for diverse tasks such as classification, clustering, prediction, and anomaly detection. Among these tasks, classification is likely the most prominent, accounting for most of the applications and of the research community's attention. However, despite the huge number of methods available, a significant body of empirical evidence indicates that the 1-nearest neighbor algorithm (1-NN) in the time domain is “extremely difficult to beat”. In this paper, we evaluate the use of different data representations in time series classification. Our work is motivated by methods used in related areas such as signal processing and music retrieval, where a change of representation frequently reveals features that are not apparent in the original data representation. Our approach uses different representations, such as frequency, wavelets, and autocorrelation, to transform the time series into alternative decision spaces. A classifier then provides a classification for each test time series in the alternative domain. We investigate how features provided in different domains can help in time series classification. We also experiment with different ensembles to investigate whether the data representations are a good source of diversity for time series classification. Our extensive experimental evaluation addresses the issue of combining sets of representations and ensemble strategies, resulting in over 300 ensemble configurations.
    São Paulo Research Foundation (FAPESP) (grant #2012/08923-8, #2013/26151-5, and #2015/07628-0); CNPq (grant #446330/2014-0 and #303083/2013-1)
    International Symposium on Advances in Intelligent Data Analysis - IDA (14. 2015 Saint Etienne
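    One simple instance of the idea is sketched below: transform each series into a few representations named in the abstract (the raw time domain, the frequency domain as DFT magnitudes, and autocorrelation; wavelets are omitted for brevity), run a Euclidean 1-NN classifier in each space, and combine the predictions by majority vote. The specific transforms, the unweighted vote, and the names are assumptions; this is one plausible configuration, not the authors' specific ensemble.

```python
import cmath
import math
from collections import Counter


def fourier(x):
    """DFT magnitudes for bins 0..n//2 (phase-invariant frequency representation)."""
    n = len(x)
    return [
        abs(sum(v * cmath.exp(-2j * math.pi * k * t / n) for t, v in enumerate(x)))
        for k in range(n // 2 + 1)
    ]


def autocorrelation(x):
    """Normalized autocorrelation for lags 0..n-1."""
    n = len(x)
    mean = sum(x) / n
    c = [v - mean for v in x]
    var = sum(v * v for v in c) or 1.0  # avoid dividing by zero for constant series
    return [sum(c[t] * c[t + lag] for t in range(n - lag)) / var for lag in range(n)]


REPRESENTATIONS = {"time": list, "fourier": fourier, "acf": autocorrelation}


def nn_predict(train_x, train_y, query):
    """Label of the training example closest to `query` in Euclidean distance."""
    dists = [math.dist(query, x) for x in train_x]
    return train_y[dists.index(min(dists))]


def ensemble_predict(train_x, train_y, query):
    """Majority vote of 1-NN classifiers, one per representation."""
    votes = []
    for transform in REPRESENTATIONS.values():
        transformed = [transform(x) for x in train_x]
        votes.append(nn_predict(transformed, train_y, transform(query)))
    return Counter(votes).most_common(1)[0][0]
```

    The frequency representation discards phase, so a phase-shifted copy of a training series is at (near) zero distance in that space even when it is far away in the time domain; differences like this are what make the representations a potential source of diversity.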

    Adding diversity to rank examples in anytime nearest neighbor classification

    In the last decade, we have witnessed a huge increase of interest in data stream learning algorithms. A stream is an ordered sequence of data records, characterized by properties such as a potentially infinite and rapid flow of instances. However, a property that is common to various application domains and is frequently disregarded is highly fluctuating data rates. In domains with fluctuating data rates, events do not occur with a fixed frequency. This imposes an additional challenge for classifiers, since the next event can occur at any time after the previous one. Anytime classification provides a very convenient approach for fluctuating data rates. In short, an anytime classifier can be interrupted at any time before its completion and still provide an intermediate solution. The popular k-nearest neighbor (k-NN) classifier can easily be made anytime by introducing a ranking of the training examples: a classification is achieved by scanning the training examples according to this ranking. In this paper, we show how the current state-of-the-art anytime k-NN classifier can be made more accurate by introducing diversity into the training set ranking. Our results show that, with this simple modification, the performance of the anytime version of the k-NN algorithm is consistently improved across a large number of datasets.
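    The scanning scheme described above can be sketched for k = 1 as a generator that yields the best prediction found so far after each ranked training example, so the caller can stop whenever the next event arrives. The ranking itself (including the diversity-based ranking the paper proposes) is assumed to be given; the function names are hypothetical.

```python
import math
from itertools import islice


def anytime_1nn(query, ranked_train):
    """Yield the current 1-NN prediction after each training example is scanned.

    ranked_train: list of (vector, label) pairs, already ordered by the ranking.
    """
    best_label, best_dist = None, math.inf
    for x, label in ranked_train:
        d = math.dist(query, x)
        if d < best_dist:
            best_label, best_dist = label, d
        yield best_label


def classify_with_budget(query, ranked_train, budget):
    """Interrupt after `budget` examples and return the intermediate answer."""
    prediction = None
    for prediction in islice(anytime_1nn(query, ranked_train), budget):
        pass
    return prediction
```

    The quality of the early answers depends entirely on the ranking: examples placed first should be the ones most likely to be useful nearest neighbors, which is the part the paper improves.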