19 research outputs found

    Using spatial outliers detection to assess balancing mechanisms in bike sharing systems

    Spatial outliers are objects whose behavior differs significantly from that of their spatial neighbors, in a context where neighbors are heavily correlated. The Moran scatterplot is a well-known method that exploits similarity between neighbors to detect spatial outliers. In this paper, we first proposed an improved version of the Moran scatterplot that uses a robust distance metric, Gower's similarity. We used this new version to study the homogeneity of the Parisian bike sharing system (Velib). We carried out several experiments on a real dataset from the Velib system and identified many spatial outlier stations that are very different from their neighboring stations (often with many more available bikes or many more empty docks during the day). We then designed and tested a new method that globally improves the distribution of resources (bikes and docks) among bike stations. This method is motivated by the existence of spatial outlier stations. It relies on a small local change in user behavior: users adapt their trips to the availability of resources around their departure and arrival stations. Results show that, even with partial user collaboration, the proposed method significantly enhances the global homogeneity of the bike sharing system and therefore users' satisfaction.
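
As a rough illustration of the idea in the abstract above, the sketch below classifies stations into the four quadrants of a classical Moran scatterplot (standardized value against the mean standardized value of the neighbors). It is not the paper's Gower-based variant. The function name `moran_quadrants`, the adjacency lists, and the toy bike counts are all assumptions made for the example.

```python
from statistics import mean, pstdev


def moran_quadrants(values, neighbors):
    """Label each site HH, LL, HL or LH (own value vs. neighbor average).

    values    -- one number per site (e.g. available bikes, hypothetical data)
    neighbors -- neighbors[i] is the list of site indices adjacent to site i
    HL and LH sites differ in sign from their neighborhood and are the
    candidate spatial outliers in a classical Moran scatterplot.
    """
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        raise ValueError("all values are identical; nothing to compare")
    z = [(v - mu) / sigma for v in values]  # standardized values

    labels = []
    for i, nbrs in enumerate(neighbors):
        lag = mean(z[j] for j in nbrs)  # row-normalized spatial lag
        own = "H" if z[i] >= 0 else "L"
        around = "H" if lag >= 0 else "L"
        labels.append(own + around)
    return labels


# Six hypothetical stations on a line; station 3 holds far more bikes.
bikes = [2, 3, 2, 15, 3, 2]
adjacent = [[1], [0, 2], [1, 3], [2, 4], [3, 5], [4]]
labels = moran_quadrants(bikes, adjacent)
candidates = [i for i, lab in enumerate(labels) if lab in ("HL", "LH")]
```

In this toy case station 3 lands in the HL quadrant, and its direct neighbors land in LH because the extreme value inflates their neighborhood average. That side effect is typical of the unmodified scatterplot.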

    Anomaly Detection for Symbolic Representations

    A fully autonomous agent recognizes new problems, explains what causes such problems, and generates its own goals to solve these problems. Our approach to this goal-driven model of autonomy uses a methodology called the Note-Assess-Guide procedure. It instantiates a monitoring process in which an agent notes an anomaly in the world, assesses the nature and cause of that anomaly, and guides appropriate modifications to behavior. This report describes a novel approach to the note phase of that procedure. A-distance, a sliding-window statistical distance metric, is applied to numerical vector representations of intermediate states from plans generated for two symbolic domains. Using these representations, the metric is able to detect anomalous world states caused by restricting the actions available to the planner.
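
A minimal sketch of sliding-window change detection with an A-distance, under simplifying assumptions: the stream is one-dimensional, the family of sets A is a fixed list of bins, and the distance is taken as twice the largest difference in empirical bin frequency between a reference window and the current window. The report works on numerical vectors derived from planner states. The names `a_distance` and `detect_anomalies`, the bin edges, and the threshold below are hypothetical.

```python
def a_distance(reference, current, edges):
    """2 * max over bins of |P_ref(bin) - P_cur(bin)|.

    Bins are the half-open intervals [edges[k], edges[k + 1]).
    Values outside every bin are ignored when counting.
    """
    def bin_freqs(sample):
        counts = [0] * (len(edges) - 1)
        for x in sample:
            for k in range(len(edges) - 1):
                if edges[k] <= x < edges[k + 1]:
                    counts[k] += 1
                    break
        return [c / len(sample) for c in counts]

    p = bin_freqs(reference)
    q = bin_freqs(current)
    return 2 * max(abs(a - b) for a, b in zip(p, q))


def detect_anomalies(stream, window, edges, threshold):
    """Return start indices of windows that drift from the first window."""
    reference = stream[:window]
    alarms = []
    for start in range(window, len(stream) - window + 1):
        current = stream[start:start + window]
        if a_distance(reference, current, edges) > threshold:
            alarms.append(start)
    return alarms


# Hypothetical scalar feature that jumps from about 0.1 to about 0.9.
stream = [0.1, 0.2, 0.15, 0.1, 0.2, 0.15, 0.9, 0.95, 0.9]
alarms = detect_anomalies(stream, window=3, edges=[0.0, 0.5, 1.0], threshold=0.5)
```

Keeping the reference window fixed means every alarm is measured against the initial, presumed-normal behavior. A variant that resets the reference after each alarm would instead report successive changes.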

    Study of First Local Maximum of Confidence in Mining Sequential Patterns

    Sequential data mining is increasingly important in many domains. WinMiner is a constraint-based algorithm that retrieves frequent episodes and high-confidence association rules and searches for first local maximum (FLM) rules. An algorithm for mining FLM rules from sequential datasets is implemented and applied to several datasets of different origins. The experiments show that FLM rules are rare in randomly generated datasets and that loosening the mining constraints increases the number of FLM rules. Correlations or dependencies among the constituent events, when introduced into a randomly generated dataset, can dramatically increase the number of FLM rules. Computer Science Departmen

    Centralized and distributed learning methods for predictive health analytics

    The U.S. health care system is considered costly and highly inefficient, devoting substantial resources to the treatment of acute conditions in a hospital setting rather than focusing on prevention and keeping patients out of the hospital. The potential for cost savings is large; in the U.S., more than $30 billion is spent each year on hospitalizations deemed preventable, 31% of which is attributed to heart diseases and 20% to diabetes. Motivated by this, our work focuses on developing centralized and distributed learning methods to predict future heart- or diabetes-related hospitalizations based on patient Electronic Health Records (EHRs). We explore a variety of supervised classification methods and present a novel likelihood-ratio-based method (K-LRT) that predicts hospitalizations and offers interpretability by identifying the K most significant features that lead to a positive prediction for each patient. Next, assuming that the positive class consists of multiple clusters (patients hospitalized for different reasons), while the negative class is drawn from a single cluster (non-hospitalized patients, healthy in every aspect), we present an alternating optimization approach that jointly discovers the clusters in the positive class and optimizes the classifiers that separate each positive cluster from the negative samples. We establish the convergence of the method and characterize its VC dimension. Last, we develop a decentralized cluster Primal-Dual Splitting (cPDS) method for large-scale problems that is computationally efficient and privacy-aware. Such a distributed learning scheme is relevant for multi-institutional collaborations or peer-to-peer applications, allowing the agents to collaborate while keeping every participant's data private. cPDS is proved to have an improved convergence rate compared to existing centralized and decentralized methods. We test all methods on real EHR data from the Boston Medical Center and compare results in terms of prediction accuracy and interpretability.