    Query-driven learning for predictive analytics of data subspace cardinality

    Fundamental to many predictive analytics tasks is the ability to estimate the cardinality (number of data items) of multi-dimensional data subspaces defined by query selections over datasets. This is crucial for data analysts engaged in, e.g., interactive data subspace exploration, data subspace visualization, and query processing optimization. However, in many modern data systems, predictive analytics may be (i) too expensive, e.g., in clouds, (ii) unreliable, e.g., in modern Big Data query engines, where accurate statistics are difficult to obtain and maintain, or (iii) infeasible, e.g., because of privacy concerns. We contribute a novel, query-driven function estimation model of analyst-defined data subspace cardinality. The proposed model is highly accurate in prediction and accommodates the well-known selection queries: multi-dimensional range queries and distance-nearest-neighbor (radius) queries. Our function estimation model (i) quantizes the vectorial query space by learning the analysts' access patterns over a data space, (ii) associates query vectors with the corresponding cardinalities of the analyst-defined data subspaces, (iii) abstracts and employs query vectorial similarity to predict the cardinality of an unseen/unexplored data subspace, and (iv) identifies and adapts to possible changes in the query subspaces based on the theory of optimal stopping. The proposed model is decentralized, which facilitates scaling out such predictive analytics queries. The research significance of the model is that (i) it is an attractive solution when data-driven statistical techniques are undesirable or infeasible, (ii) it offers a scale-out, decentralized training solution, (iii) it is applicable to different selection query types, and (iv) it offers performance superior to that of data-driven approaches.
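Steps (i) to (iii) of the abstract above can be illustrated with a minimal sketch. It is not the authors' model: it assumes a simple online vector quantizer with a fixed number of prototypes, an inverse-distance weighted prediction, and a synthetic workload of radius queries `(x, y, r)` whose cardinality is a made-up function of the radius. The names `train_prototypes` and `predict` are hypothetical, and the optimal-stopping adaptation of step (iv) is omitted.

```python
import math
import random

def train_prototypes(queries, cards, k=4, lr=0.1, epochs=20, seed=0):
    """Quantize the query space with k prototypes (assumed online VQ).

    Each prototype keeps a query-space position and a running estimate
    of the cardinality observed for the queries it wins.
    """
    rng = random.Random(seed)
    idx = rng.sample(range(len(queries)), k)
    protos = [list(queries[i]) for i in idx]
    est = [float(cards[i]) for i in idx]
    for _ in range(epochs):
        for q, y in zip(queries, cards):
            # Winner: the prototype closest to the query vector.
            j = min(range(k), key=lambda i: math.dist(protos[i], q))
            # Move the winner toward the query and its estimate toward y.
            protos[j] = [p + lr * (x - p) for p, x in zip(protos[j], q)]
            est[j] += lr * (y - est[j])
    return protos, est

def predict(q, protos, est, eps=1e-9):
    """Predict the cardinality of an unseen query from query similarity
    (inverse-distance weighted average of the prototype estimates)."""
    w = [1.0 / (math.dist(p, q) + eps) for p in protos]
    return sum(wi * e for wi, e in zip(w, est)) / sum(w)

# Hypothetical workload: radius queries over a unit square. The "true"
# cardinality is a synthetic stand-in proportional to the query area.
rng = random.Random(42)
queries = [(rng.random(), rng.random(), rng.uniform(0.05, 0.3))
           for _ in range(200)]
cards = [round(10000 * math.pi * r * r) for _, _, r in queries]

protos, est = train_prototypes(queries, cards, k=8)
estimate = predict((0.5, 0.5, 0.2), protos, est)
```

Because the model is trained only on past queries and their answers, no access to the underlying data or its statistics is needed at prediction time, which is the property the abstract emphasizes.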

    One-Class Classification: Taxonomy of Study and Review of Techniques

    One-class classification (OCC) algorithms aim to build classification models when the negative class is absent, poorly sampled, or not well defined. This situation constrains the learning of efficient classifiers, because the class boundary must be defined using knowledge of the positive class alone. The OCC problem has been considered and applied under many research themes, such as outlier/novelty detection and concept learning. In this paper we present a unified view of the general OCC problem through a taxonomy of study based on the availability of training data, the algorithms used, and the application domains. We then examine each category of the proposed taxonomy and present a comprehensive literature review of OCC algorithms, techniques, and methodologies, with a focus on their significance, limitations, and applications. We conclude by discussing some open research problems in the field of OCC and presenting our vision for future research.
    Comment: 24 pages + 11 pages of references, 8 figures
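To make the idea of learning a boundary from the positive class alone concrete, here is a minimal sketch. It assumes one of the simplest possible OCC schemes, a centroid with a distance threshold taken from a quantile of the training distances; it is not a method singled out by the review, and `fit_one_class`, `is_target`, and the toy data are hypothetical.

```python
import math

def fit_one_class(positives, quantile=0.95):
    """Fit a centroid and a radius using positive samples only."""
    dim = len(positives[0])
    centroid = [sum(p[i] for p in positives) / len(positives)
                for i in range(dim)]
    dists = sorted(math.dist(p, centroid) for p in positives)
    # Radius covers roughly the given fraction of the training samples.
    radius = dists[min(len(dists) - 1, int(quantile * len(dists)))]
    return centroid, radius

def is_target(x, model):
    """Accept x as the positive (target) class if it lies within the radius."""
    centroid, radius = model
    return math.dist(x, centroid) <= radius

# Toy positive class: a small grid of points around the origin.
positives = [(i * 0.1, j * 0.1) for i in range(-5, 6) for j in range(-5, 6)]
model = fit_one_class(positives)
```

No negative examples are used at any point; anything far enough from the positive samples is rejected as an outlier or novelty.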

    Predictive intelligence to the edge through approximate collaborative context reasoning

    We focus on Internet of Things (IoT) environments where a network of sensing and computing devices is responsible for locally processing contextual data, reasoning, and collaboratively inferring the appearance of a specific phenomenon (event). Pushing processing and knowledge inference to the edge of the IoT network allows the complexity of the event reasoning process to be distributed into many manageable pieces and to be physically located at the source of the contextual information. This enables a huge volume of rich data streams to be processed in real time, which would be prohibitively complex and costly to deliver on a traditional centralized Cloud system. We propose a lightweight, energy-efficient, distributed, adaptive, multiple-context-perspective model for event reasoning under uncertainty on each IoT device (sensor/actuator). Each device senses and processes context data and infers events based on different local context perspectives: (i) expert knowledge on event representation, (ii) outlier inference, and (iii) deviation from locally predicted context. This novel approximate reasoning paradigm is achieved through a contextualized, collaborative, belief-driven clustering process, where clusters of devices are formed according to their belief in the presence of events. Our distributed and federated intelligence model efficiently identifies any localized abnormality in the contextual data by aggregating local degrees of belief, and it updates and adjusts its knowledge through outlier and novelty detection on contextual data. We provide a comprehensive experimental and comparative assessment of our model against other localized and centralized event detection models over real contextual data, and show the benefits of its adoption: up to three orders of magnitude less energy consumption and high quality of inference.

    Fault Diagnosis of HVDC Systems Using Machine Learning Based Methods

    With the development of high-power electronic technology, HVDC systems are applied in power systems because of their advantages in large-capacity and long-distance transmission, stability, and flexibility. Fault diagnosis is therefore of great significance as the guarantee of reliable operation of an HVDC system. Among the various methods currently used in fault diagnosis, machine learning based methods have become a hotspot. To this end, the performance of several commonly used machine learning classifiers is compared on an HVDC system. First, nine faults in both the AC and DC systems of the HVDC system are set in an HVDC model in Simulink. Consequently, 10 operating states, corresponding to the nine faults and normal operation, are considered as the output classes of the classifier. Seven parameters, such as DC voltage and DC current, are selected as the fault feature parameters of each sample. By simulating the HVDC system in the 10 operating states (including the normal operating state), 20000 samples, each containing seven parameters, are obtained during the fault period. The training and test sample sets are then formed from 80% and 20% of the whole sample set, respectively. Subsequently, Decision Trees, the Support Vector Machine (SVM), the K-Nearest Neighbor classifier (KNN), Ensemble classifiers, Discriminant Analysis, the Back-Propagation Neural Network (BP-NN), the Long Short-Term Memory Neural Network (LSTM-NN), and the Extreme Learning Machine (ELM) were trained and tested. Test accuracy is used as the performance index of each model. In particular, for BP-NN, the impact of different combinations of transfer functions and learning rules on the accuracy of the model was tested. For ELM, the impact of different activation functions on accuracy was tested. The results show that ELM and Bagged Trees have the best performance in HVDC fault diagnosis, with accuracies of 92.23% and 96.5%, respectively. However, to achieve better accuracy with the ELM model, a large number of hidden-layer nodes must be set, so training time increases sharply.
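The evaluation protocol described above (an 80%/20% train/test split with test accuracy as the performance index) can be sketched as follows. This is an illustration under stated assumptions, not the authors' pipeline: the data are synthetic Gaussian clusters standing in for the Simulink samples (10 classes, seven features), only a plain k-nearest-neighbor classifier is shown, and `split_80_20`, `knn_predict`, and `accuracy` are hypothetical helper names.

```python
import math
import random
from collections import Counter

def split_80_20(samples, seed=0):
    """Shuffle and split samples into 80% training and 20% test sets."""
    rng = random.Random(seed)
    shuffled = samples[:]
    rng.shuffle(shuffled)
    cut = int(0.8 * len(shuffled))
    return shuffled[:cut], shuffled[cut:]

def knn_predict(train, x, k=5):
    """Majority vote among the k training samples closest to x."""
    nearest = sorted(train, key=lambda s: math.dist(s[0], x))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def accuracy(train, test, k=5):
    """Fraction of test samples whose predicted class is correct."""
    hits = sum(knn_predict(train, x, k) == y for x, y in test)
    return hits / len(test)

# Synthetic stand-in data (NOT the Simulink samples): 10 operating
# states, seven feature parameters per sample, 50 samples per state.
rng = random.Random(1)
data = [([rng.gauss(c, 0.3) for _ in range(7)], c)
        for c in range(10) for _ in range(50)]

train, test = split_80_20(data)
acc = accuracy(train, test)
```

The other classifiers listed in the abstract would be evaluated in the same way, by swapping the prediction function and keeping the split and the accuracy metric fixed.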
