2,513 research outputs found

    Continuous Nearest Neighbor Queries over Sliding Windows

    This paper studies continuous monitoring of nearest neighbor (NN) queries over sliding window streams. According to this model, data points continuously stream in the system, and they are considered valid only while they belong to a sliding window that contains 1) the W most recent arrivals (count-based) or 2) the arrivals within a fixed interval W covering the most recent time stamps (time-based). The task of the query processor is to constantly maintain the result of long-running NN queries among the valid data. We present two processing techniques that apply to both count-based and time-based windows. The first one adapts conceptual partitioning, the best existing method for continuous NN monitoring over update streams, to the sliding window model. The second technique reduces the problem to skyline maintenance in the distance-time space and precomputes the future changes in the NN set. We analyze the performance of both algorithms and extend them to variations of NN search. Finally, we compare their efficiency through a comprehensive experimental evaluation. The skyline-based algorithm achieves lower CPU cost, at the expense of slightly larger space overhead. Index Terms: Location-dependent and sensitive, spatial databases, query processing, nearest neighbors, data streams, sliding windows.
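The skyline idea in the abstract above can be illustrated with a minimal sketch for a single query point and a count-based window. It is not the paper's algorithm: the class name `SlidingWindowNN` and its methods are hypothetical, and it tracks only the single nearest neighbor. A point is kept only if no later arrival is at least as close to the query, so the retained points form a skyline in distance-time space, and the oldest retained point is always the current NN.

```python
from collections import deque
import math


class SlidingWindowNN:
    """Hypothetical 1-NN monitor over a count-based sliding window."""

    def __init__(self, query, window):
        self.query = query      # fixed query point
        self.window = window    # W: number of most recent arrivals that are valid
        self.count = 0          # arrival counter, used as a time stamp
        # Skyline entries (arrival, distance, point); both arrival and
        # distance increase from the front to the back of the deque.
        self.skyline = deque()

    def insert(self, point):
        """Process one arrival and return the current nearest neighbor."""
        self.count += 1
        d = math.dist(self.query, point)
        # Older points that are not closer than the new one can never be the
        # NN again: the new point is at least as close and expires later.
        while self.skyline and self.skyline[-1][1] >= d:
            self.skyline.pop()
        self.skyline.append((self.count, d, point))
        # Drop points that have slid out of the window.
        while self.skyline[0][0] <= self.count - self.window:
            self.skyline.popleft()
        # The front entry is the closest valid point; the entries behind it
        # are the precomputed future NNs once the front expires.
        return self.skyline[0][2]
```

For example, with the query at the origin and W = 3, the stream (5,0), (1,0), (3,0), (4,0), (6,0) reports (1,0) as the NN until it expires on the fifth arrival, at which point (3,0) takes over without any rescan of the window.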

    Enabling near-term prediction of status for intelligent transportation systems: Management techniques for data on mobile objects

    Location Dependent Queries (LDQs) benefit from the rapid advances in communication and Global Positioning System (GPS) technologies to track moving objects' locations, and improve the quality of life by providing location-relevant services and information to end users. The enormity of the underlying data maintained by LDQ applications (a large quantity of mobile objects and their frequent mobility) is, however, a major obstacle to providing effective and efficient services. Motivated by this obstacle, this thesis sets out to find improved methods to efficiently index, access, retrieve, and update volatile LDQ-related mobile object data and information. Challenges and research issues are discussed in detail, and solutions are presented and examined.

    Automatic Hyperparameter Tuning Method for Local Outlier Factor, with Applications to Anomaly Detection

    In recent years, there have been many practical applications of anomaly detection such as in predictive maintenance, detection of credit fraud, network intrusion, and system failure. The goal of anomaly detection is to identify in the test data anomalous behaviors that are either rare or unseen in the training data. This is a common goal in predictive maintenance, which aims to forecast the imminent faults of an appliance given abundant samples of normal behaviors. Local outlier factor (LOF) is one of the state-of-the-art models used for anomaly detection, but the predictive performance of LOF depends greatly on the selection of hyperparameters. In this paper, we propose a novel, heuristic methodology to tune the hyperparameters in LOF. A tuned LOF model that uses the proposed method shows good predictive performance in both simulations and real data sets. Comment: 15 pages, 5 figures
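To make the role of the neighborhood-size hyperparameter concrete, the following is a minimal sketch of the standard LOF score in plain Python. It does not implement the tuning heuristic proposed in the paper. The function name `lof_scores` is hypothetical, and the sketch simplifies the definition by using exactly the k nearest points (distance ties are broken by index) and by assuming no k+1 points coincide.

```python
import math


def lof_scores(points, k):
    """Return the LOF score of every point for neighborhood size k."""
    n = len(points)
    # neighbours[i] holds (distance, index) for the k nearest other points.
    neighbours = []
    for i, p in enumerate(points):
        dists = sorted(
            (math.dist(p, q), j) for j, q in enumerate(points) if j != i
        )
        neighbours.append(dists[:k])
    # k-distance: distance from each point to its k-th nearest neighbour.
    k_dist = [nb[-1][0] for nb in neighbours]
    # Local reachability density: inverse of the mean reachability distance,
    # where reach-dist(p, o) = max(k-distance(o), d(p, o)).
    lrd = []
    for i in range(n):
        reach = [max(k_dist[j], d) for d, j in neighbours[i]]
        lrd.append(len(reach) / sum(reach))
    # LOF: mean density of the neighbours relative to the point's own density.
    return [
        sum(lrd[j] for _, j in neighbours[i]) / (len(neighbours[i]) * lrd[i])
        for i in range(n)
    ]
```

Scores near 1 indicate a point whose density matches that of its neighbours, while scores well above 1 indicate outliers. Rerunning `lof_scores` with different values of `k` on the same data is a quick way to see how strongly the ranking depends on that choice.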

    k-Nearest Neighbour Classifiers: 2nd Edition (with Python examples)

    Perhaps the most straightforward classifier in the arsenal of machine learning techniques is the Nearest Neighbour Classifier: classification is achieved by identifying the nearest neighbours to a query example and using those neighbours to determine the class of the query. This approach to classification is of particular importance because issues of poor run-time performance are not such a problem these days with the computational power that is available. This paper presents an overview of techniques for Nearest Neighbour classification focusing on: mechanisms for assessing similarity (distance), computational issues in identifying nearest neighbours, and mechanisms for reducing the dimension of the data. This paper is the second edition of a paper previously published as a technical report. Sections on similarity measures for time-series, retrieval speed-up and intrinsic dimensionality have been added. An Appendix is included providing access to Python code for the key methods. Comment: 22 pages, 15 figures: An updated edition of an older tutorial on kN
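The classification step described above can be sketched in a few lines. This is an illustrative brute-force version, not the code from the paper's appendix. The function name `knn_predict`, the Euclidean distance, and the unweighted majority vote are assumptions made for the example.

```python
from collections import Counter
import math


def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training examples.

    `train` is a list of (feature_vector, label) pairs.
    """
    # Brute-force retrieval: rank every training example by distance.
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    # Unweighted vote over the labels of the k nearest neighbours.
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]
```

Sorting the whole training set costs O(n log n) per query, which is the run-time issue the abstract alludes to; the speed-up techniques the paper surveys replace this step.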

    ILARS: An Improved Empirical Analysis for Lars* Using Partitioning and Travel Penalty

    In this paper we develop an improved web-based location-aware recommender system, ILARS, that uses location-based ratings to produce recommendations. Present recommender systems do not consider the spatial attributes of users or of items, whereas ILARS* considers three classes of location-based ratings: spatial ratings for non-spatial items, non-spatial ratings for spatial items, and spatial ratings for spatial items. ILARS* exploits user rating locations through user partitioning, a technique that influences recommendations with ratings spatially near the querying users in a way that maximizes system scalability without reducing recommendation quality. ILARS* exploits item locations using travel penalty, a technique that favors recommendation candidates closer in travel distance to querying users while avoiding exhaustive access to all spatial items. ILARS* can apply these techniques separately or together, depending on the type of rating available. The experiments use data from several location-based social networks. The results show that LARS* is efficient, scalable, and capable of producing recommendations that are more accurate than those of existing recommender systems. DOI: 10.17762/ijritcc2321-8169.15073

    Continuous Spatial Query Processing:A Survey of Safe Region Based Techniques

    In the past decade, positioning system-enabled devices such as smartphones have become prevalent. This functionality has increased the popularity of location-based services in business as well as in daily applications such as navigation, targeted advertising, and location-based social networking. Continuous spatial queries serve as a building block for location-based services. As an example, an Uber driver may want to be kept aware of the nearest customers or service stations. Continuous spatial queries require updates to the query result as the query or data objects are moving. This poses challenges to the query efficiency, which is crucial to the user experience of a service. A large number of approaches address this efficiency issue using the concept of a safe region. A safe region is a region within which arbitrary movement of an object leaves the query result unchanged. Such a region helps reduce the frequency of query result updates and hence improves query efficiency. As a result, safe region-based approaches have been popular for processing various types of continuous spatial queries. Safe regions have interesting theoretical properties and are worth in-depth analysis. We provide a comparative study of safe region-based approaches. We describe how safe regions are computed for different types of continuous spatial queries, showing how they improve query efficiency. We compare the different safe region-based approaches and discuss possible further improvements.
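As a small illustration of the safe-region concept, the sketch below handles a moving query point over static objects with a deliberately conservative circular region. It is not taken from the survey: the names `nearest_with_safe_radius` and `SafeRegionNN` are hypothetical. If d1 and d2 are the distances from the query to its first and second nearest objects, then moving the query by at most (d2 - d1) / 2 cannot change the nearest neighbor, because the distance to the current NN grows by at most that amount while the distance to every other object shrinks by at most that amount.

```python
import math


def nearest_with_safe_radius(query, objects):
    """Return the NN of `query` and the radius of a circular safe region.

    Assumes `objects` contains at least two points.
    """
    ranked = sorted(objects, key=lambda o: math.dist(query, o))
    d1 = math.dist(query, ranked[0])
    d2 = math.dist(query, ranked[1])
    return ranked[0], (d2 - d1) / 2.0


class SafeRegionNN:
    """Hypothetical tracker that recomputes the NN only on leaving the region."""

    def __init__(self, objects, start):
        self.objects = objects
        self.recomputations = 0
        self._recompute(start)

    def _recompute(self, position):
        self.anchor = position  # centre of the current safe region
        self.nn, self.radius = nearest_with_safe_radius(position, self.objects)
        self.recomputations += 1

    def move(self, position):
        """Report the NN at the new position, recomputing only if needed."""
        if math.dist(position, self.anchor) > self.radius:
            self._recompute(position)
        return self.nn
```

The exact safe region for this query type is the Voronoi cell of the current NN; the circle used here is a subset of it, so it triggers more recomputations than necessary but is much cheaper to test.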

    Scalable aggregation predictive analytics: a query-driven machine learning approach

    We introduce a predictive modeling solution that provides high quality predictive analytics over aggregation queries in Big Data environments. Our predictive methodology is generally applicable in environments in which large-scale data owners may or may not restrict access to their data and allow only aggregation operators like COUNT to be executed over their data. In this context, our methodology is based on historical queries and their answers to accurately predict ad-hoc queries' answers. We focus on the widely used set-cardinality, i.e., COUNT, aggregation query, as COUNT is a fundamental operator for both internal data system optimizations and for aggregation-oriented data exploration and predictive analytics. We contribute a novel, query-driven Machine Learning (ML) model whose goals are to: (i) learn the query-answer space from past issued queries, (ii) associate the query space with local linear regression & associative function estimators, (iii) define query similarity, and (iv) predict the cardinality of the answer set of unseen incoming queries, referred to as the Set Cardinality Prediction (SCP) problem. Our ML model incorporates incremental ML algorithms for ensuring high quality prediction results. The significance of our contribution lies in that it (i) is the only query-driven solution applicable over general Big Data environments, which include restricted-access data, (ii) offers incremental learning adjusted for arriving ad-hoc queries, which is well suited for query-driven data exploration, and (iii) offers a performance (in terms of scalability, SCP accuracy, processing time, and memory requirements) that is superior to data-centric approaches. We provide a comprehensive performance evaluation of our model, evaluating its sensitivity, scalability and efficiency for quality predictive analytics. In addition, we report on the development and incorporation of our ML model in Spark, showing its superior performance compared to Spark's COUNT method.
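The query-driven idea, predicting a COUNT answer from past queries and their answers without touching the data, can be illustrated with a much simpler stand-in than the model described above. The sketch below uses inverse-distance-weighted averaging over the k most similar past queries rather than the paper's local linear regression. The function name `predict_count`, the encoding of a query as a numeric vector, and the Euclidean similarity are all assumptions made for the example.

```python
import math


def predict_count(history, query, k=3):
    """Estimate the COUNT answer of `query` from similar past queries.

    `history` is a list of (query_vector, observed_count) pairs; a query
    vector could, for instance, hold the bounds of a range predicate.
    """
    # Query similarity: smaller Euclidean distance means more similar.
    ranked = sorted(history, key=lambda h: math.dist(h[0], query))[:k]
    # Inverse-distance weights; the small constant avoids division by zero
    # when the same query has been seen before.
    weights = [1.0 / (math.dist(q, query) + 1e-9) for q, _ in ranked]
    total = sum(w * count for w, (_, count) in zip(weights, ranked))
    return total / sum(weights)
```

Only the query log is consulted at prediction time, which is what makes this style of estimator usable when the underlying data is access-restricted.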

    Predictive maintenance of electrical grid assets: internship at EDP Distribuição - Energia S.A

    Internship Report presented as the partial requirement for obtaining a Master's degree in Information Management, specialization in Knowledge Management and Business Intelligence. This report describes the activities developed during an internship at EDP Distribuição, focusing on a Predictive Maintenance analytics project directed at high voltage electrical grid assets including Overhead Lines, Power Transformers and Circuit Breakers. The project's main goal is to support EDP's asset management processes by improving maintenance and investment planning. The project's main deliverables are the Probability of Failure metric, which forecasts asset failures 15 days ahead of time and is estimated through supervised machine learning models; the Health Index metric, which indicates an asset's current state and condition and is implemented through the Ofgem methodology; and two asset management dashboards. The project was implemented by an external service provider, a consulting company, and during the internship it was possible to join the team and participate in the development activities.

    Case-based Maintenance Model for handling Relevant and Irrelevant Cases in Case-based Reasoning System

    Case-based maintenance can be resource intensive and requires significant time and effort to collect and analyse all cases. This can lead to inefficiencies and high costs in the entire case-based reasoning system. Accordingly, the Relative Coverage Condensed Nearest Neighbour was created to reduce the number of cases in a dataset by selecting a subset of representative cases while maintaining the overall performance of the whole system. In addition, Footprint utility deletion is a case deletion algorithm that removes redundant or irrelevant cases from storage while maintaining the system's competency. Recently, a hybrid approach was proposed to ensure that the case-base remains up-to-date and relevant, while also reducing its size and complexity. However, the results of these approaches still leave room for better performance. Therefore, the proposed model is developed, which comprises two main phases: applying case-based reasoning, and identifying relevant and irrelevant cases to provide better results. The resulting case-base is approximately 1-9% smaller than in traditional studies, the percentage of problems solved is about 1-7% higher, and the average problem-solving time is up to nearly 8 times shorter.
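For readers unfamiliar with case-base condensation, the sketch below shows the classic Condensed Nearest Neighbour procedure that the Relative Coverage variant mentioned above builds on. It is not the Relative Coverage variant or the model proposed in the paper. The function names `nearest_label` and `condense` are hypothetical, cases are assumed to be (feature_vector, label) pairs, and identical feature vectors are assumed to carry the same label.

```python
import math


def nearest_label(store, x):
    """Label of the stored case closest to feature vector `x` (1-NN)."""
    return min(store, key=lambda item: math.dist(item[0], x))[1]


def condense(cases):
    """Select a subset of `cases` that still classifies every case correctly."""
    store = [cases[0]]  # seed the condensed case-base with the first case
    changed = True
    while changed:
        changed = False
        for case in cases:
            if case in store:
                continue
            # Keep a case only if the current store would misclassify it.
            if nearest_label(store, case[0]) != case[1]:
                store.append(case)
                changed = True
    return store
```

The loop stops once a full pass adds nothing, so every original case is then solved correctly by the smaller case-base, which is the competency-preserving property the abstract refers to.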