
    ADBSCAN: Adaptive Density-Based Spatial Clustering of Applications with Noise for Identifying Clusters with Varying Densities

    Density-based spatial clustering of applications with noise (DBSCAN) is a data clustering algorithm that performs well on datasets whose clusters have a constant density of data points. One of its significant attributes is its ability to separate out noise. However, DBSCAN performs worse when clusters have different densities. This paper therefore proposes an adaptive DBSCAN that works well for identifying clusters with varying densities. Comment: To be published in the 4th IEEE International Conference on Electrical Engineering and Information & Communication Technology (iCEEiCT 2018).
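
    The density limitation described in this abstract can be illustrated with a small sketch of standard DBSCAN on one-dimensional data. This is not the adaptive method of the paper, whose details the abstract does not give. The function names, the toy points, and the values of `eps` and `min_pts` are all assumptions chosen for illustration; `min_pts` here counts the point itself.

```python
def region_query(points, i, eps):
    """Indices of all points within eps of points[i], including i itself."""
    return [j for j, q in enumerate(points) if abs(points[i] - q) <= eps]


def dbscan(points, eps, min_pts):
    """Standard DBSCAN on 1-D points; returns cluster ids, with -1 for noise."""
    UNVISITED, NOISE = None, -1
    labels = [UNVISITED] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:  # j is a core point: keep expanding
                queue.extend(j_neighbors)
    return labels


# Two dense groups (spacing 0.1, separated by a gap of 1.0) and one sparse group (spacing 1.0).
points = [0.0, 0.1, 0.2, 0.3, 0.4,
          1.4, 1.5, 1.6, 1.7, 1.8,
          10.0, 11.0, 12.0, 13.0, 14.0]

small_eps_labels = dbscan(points, eps=0.15, min_pts=3)
large_eps_labels = dbscan(points, eps=1.2, min_pts=3)
print(small_eps_labels)  # dense groups found, sparse group labeled as noise
print(large_eps_labels)  # sparse group found, the two dense groups merged
```

    With a single global `eps`, neither setting recovers all three groups in this toy data, which is the kind of behavior an adaptive variant aims to avoid.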

    Implications of Z-normalization in the matrix profile

    Companies are increasingly measuring their products and services, resulting in a rising amount of available time series data and a need for techniques that extract usable information from it. One state-of-the-art technique for time series is the Matrix Profile, which has been used for various applications including motif/discord discovery, visualizations and semantic segmentation. Internally, the Matrix Profile uses the z-normalized Euclidean distance to compare the shape of subsequences between two series. However, when comparing subsequences that are relatively flat and contain noise, the resulting distance is high despite the visual similarity of these subsequences. This property violates some of the assumptions made by Matrix Profile based techniques, resulting in worse performance when series contain flat and noisy subsequences. By studying the properties of the z-normalized Euclidean distance, we derived a method to eliminate this effect that requires only an estimate of the standard deviation of the noise. In this paper we describe various practical properties of the z-normalized Euclidean distance and show how these can be used to correct the performance of Matrix Profile related techniques. We demonstrate our technique on anomaly detection with a Yahoo! Webscope anomaly dataset, on semantic segmentation with the PAMAP2 activity dataset, and on data visualization with a UCI activity dataset, all containing real-world data, and obtain overall better results after applying our technique. Our technique is a straightforward extension of the distance calculation in the Matrix Profile and will benefit any derived technique dealing with time series containing flat and noisy subsequences.
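
    The effect described in this abstract can be reproduced with a short sketch of the z-normalized Euclidean distance. It only demonstrates the problem, not the paper's correction, which the abstract does not spell out. The helper names, the subsequence length `m`, the noise level `noise_sd`, and the synthetic signals are assumptions made for illustration.

```python
import math
import random
from statistics import fmean, pstdev


def znorm(seq):
    """Rescale a sequence to zero mean and unit (population) standard deviation."""
    mu, sigma = fmean(seq), pstdev(seq)
    return [(x - mu) / sigma for x in seq]


def znorm_distance(a, b):
    """Euclidean distance between the z-normalized versions of a and b."""
    return math.dist(znorm(a), znorm(b))


rng = random.Random(0)  # fixed seed so the run is repeatable
m = 50                  # hypothetical subsequence length
noise_sd = 0.05         # hypothetical noise standard deviation


def noise():
    return [rng.gauss(0.0, noise_sd) for _ in range(m)]


shape = [math.sin(2 * math.pi * t / m) for t in range(m)]

# Two flat subsequences: visually identical, differing only in noise.
flat_a = [1.0 + e for e in noise()]
flat_b = [1.0 + e for e in noise()]

# Two sine-shaped subsequences carrying the same amount of noise.
wave_a = [s + e for s, e in zip(shape, noise())]
wave_b = [s + e for s, e in zip(shape, noise())]

flat_d = znorm_distance(flat_a, flat_b)
wave_d = znorm_distance(wave_a, wave_b)
print(f"flat pair: {flat_d:.2f}, wave pair: {wave_d:.2f}")
```

    Because z-normalization rescales the flat subsequences until the noise fills the whole unit-variance range, the flat pair ends up far apart, while the sine pair stays close even though both pairs carry the same noise.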

    EC-CENTRIC: An Energy- and Context-Centric Perspective on IoT Systems and Protocol Design

    The radio transceiver of an IoT device is often where most of the energy is consumed. For this reason, most research so far has focused on low-power circuit and energy-efficient physical layer designs, with the goal of reducing the average energy per information bit required for communication. While these efforts are valuable per se, their actual effectiveness can be partially neutralized by ill-designed network, processing and resource management solutions, which can become a primary factor of performance degradation in terms of throughput, responsiveness and energy efficiency. The objective of this paper is to describe an energy-centric and context-aware optimization framework that accounts for the energy impact of the fundamental functionalities of an IoT system and that proceeds along three main technical thrusts: 1) balancing signal-dependent processing techniques (compression and feature extraction) and communication tasks; 2) jointly designing channel access and routing protocols to maximize the network lifetime; 3) providing self-adaptability to different operating conditions through the adoption of suitable learning architectures and of flexible/reconfigurable algorithms and protocols. After discussing this framework, we present some preliminary results that validate the effectiveness of our proposed line of action, and show how the use of adaptive signal processing and channel access techniques allows an IoT network to dynamically trade lifetime against signal distortion, according to the requirements dictated by the application.

    Mining Novel Multivariate Relationships in Time Series Data Using Correlation Networks

    In many domains, there is significant interest in capturing novel relationships between time series that represent activities recorded at different nodes of a highly complex system. In this paper, we introduce multipoles, a novel class of linear relationships between more than two time series. A multipole is a set of time series that have strong linear dependence among themselves, with the requirement that each time series makes a significant contribution to the linear dependence. We demonstrate that most interesting multipoles can be identified as cliques of negative correlations in a correlation network. Such cliques are typically rare in a real-world correlation network, which allows us to find almost all multipoles efficiently using a clique-enumeration approach. Using our proposed framework, we demonstrate the utility of multipoles in discovering new physical phenomena in two scientific domains: climate science and neuroscience. In particular, we discovered several multipole relationships that are reproducible in multiple other independent datasets and lead to novel domain insights. Comment: This is the accepted version of the article submitted to IEEE Transactions on Knowledge and Data Engineering 201
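
    As a toy illustration of the idea in this abstract (not the paper's formal definition or its clique-enumeration algorithm), the sketch below builds three synthetic series that sum to zero at every time step. They are linearly dependent as a group, yet each pairwise correlation is only moderately negative, so the three series form a clique of negative correlations. The construction, the helper `pearson`, and the sample size `n` are assumptions made for illustration.

```python
import math
import random
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


rng = random.Random(0)  # fixed seed so the run is repeatable
n = 500                 # hypothetical number of time steps

# Three independent noise series, then remove their per-step mean so that
# the three resulting series sum to zero at every time step.
raw = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(3)]
step_means = [sum(col) / 3 for col in zip(*raw)]
series = [[v - m for v, m in zip(row, step_means)] for row in raw]

# Pairwise correlations: the edges of a tiny correlation network.
pairwise = {(i, j): pearson(series[i], series[j])
            for i, j in combinations(range(3), 2)}
is_negative_clique = all(r < 0 for r in pairwise.values())

# The group-level linear dependence: the series cancel out step by step.
max_abs_sum = max(abs(sum(col)) for col in zip(*series))

for pair, r in pairwise.items():
    print(pair, round(r, 2))
print("negative clique:", is_negative_clique, "max |sum|:", max_abs_sum)
```

    No single pair looks strongly related here, which is why a relationship of this kind is only visible when the series are considered together.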

    Automated Process Discovery: A Literature Review and a Comparative Evaluation with Domain Experts

    Process mining methods allow analysts to use logs of historical executions of business processes in order to gain knowledge about the actual performance of these processes. One of the most widely studied process mining operations is automated process discovery. An automated process discovery method takes an event log as input and produces a business process model as output that captures the control-flow relations between the tasks described by the event log. Several automated process discovery methods have been proposed in the past two decades, striking different tradeoffs between scalability, accuracy and complexity of the resulting models. So far, automated process discovery methods have been evaluated in an ad hoc manner, with different authors employing different datasets, experimental setups, evaluation measures and baselines, often leading to incomparable conclusions and sometimes unreproducible results due to the use of non-publicly available datasets. In this setting, this thesis provides a systematic review of automated process discovery methods and a systematic comparative evaluation of existing implementations of these methods, carried out with domain experts using four quality metrics and a real-life event log extracted from an international software engineering company. The review and evaluation results highlight gaps and unexplored tradeoffs in the field in the context of four business process model quality metrics. The results of this master's thesis allow researchers to address the shortcomings of automated process discovery methods, and also answer the question of how usable process discovery techniques are in industry.

    Big data in hotel revenue management: exploring cancellation drivers to gain insights into booking cancellation behavior

    In the hospitality industry, demand forecast accuracy is highly impacted by booking cancellations, which makes demand-management decisions difficult and risky. In attempting to minimize losses, hotels tend to implement restrictive cancellation policies and employ overbooking tactics, which, in turn, reduce the number of bookings and reduce revenue. To tackle the uncertainty arising from booking cancellations, we combined the data from eight hotels’ property management systems with data from several sources (weather, holidays, events, social reputation, and online prices/inventory) and interpretable machine learning algorithms to develop booking cancellation prediction models for the hotels. In a real production environment, improvement of the forecast accuracy due to the use of these models could enable hoteliers to decrease the number of cancellations, thus increasing confidence in demand-management decisions. Moreover, this work shows that improvement of the demand forecast would allow hoteliers to better understand their net demand, that is, current demand minus predicted cancellations. Simultaneously, by focusing not only on forecast accuracy but also on its explicability, this work illustrates one other advantage of applying these types of techniques in forecasting: the interpretation of the predictions of the model. By exposing cancellation drivers, models help hoteliers to better understand booking cancellation patterns and enable the adjustment of a hotel’s cancellation policies and overbooking tactics according to the characteristics of its bookings.

    Supervised classification and mathematical optimization

    Data Mining techniques often require solving optimization problems. Supervised Classification, and, in particular, Support Vector Machines, can be seen as a paradigmatic instance. In this paper, some links between Mathematical Optimization methods and Supervised Classification are emphasized. It is shown that many different areas of Mathematical Optimization play a central role in off-the-shelf Supervised Classification methods. Moreover, Mathematical Optimization turns out to be extremely useful to address important issues in Classification, such as identifying relevant variables, improving the interpretability of classifiers or dealing with vagueness/noise in the data. Funding: Ministerio de Ciencia e Innovación; Junta de Andalucía.