
    On the Differential Privacy of Bayesian Inference

    We study how to communicate findings of Bayesian inference to third parties, while preserving the strong guarantee of differential privacy. Our main contributions are four different algorithms for private Bayesian inference on probabilistic graphical models. These include two mechanisms for adding noise to the Bayesian updates, either directly to the posterior parameters, or to their Fourier transform so as to preserve update consistency. We also utilise a recently introduced posterior sampling mechanism, for which we prove bounds for the specific but general case of discrete Bayesian networks; and we introduce a maximum-a-posteriori private mechanism. Our analysis includes utility and privacy bounds, with a novel focus on the influence of graph structure on privacy. Worked examples and experiments with Bayesian naïve Bayes and Bayesian linear regression illustrate the application of our mechanisms. Comment: AAAI 2016, Feb 2016, Phoenix, Arizona, United States.
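To make the idea of adding noise directly to the posterior parameters concrete, here is a minimal sketch for a Beta-Bernoulli model. It is an illustration only, not the paper's mechanism: the function names, the use of Laplace noise with L1 sensitivity 2, and the clamping step are all assumptions chosen for this example.

```python
import random


def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale`
    # follows a Laplace(0, scale) distribution.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_beta_posterior(data, epsilon, prior=(1.0, 1.0), rng=None):
    """Return noisy Beta posterior parameters for a list of 0/1 observations.

    Assumption: replacing one record moves one count down by 1 and the
    other up by 1, so the L1 sensitivity of (ones, zeros) is 2 and the
    Laplace scale is 2 / epsilon.
    """
    rng = rng or random.Random()
    ones = sum(data)
    zeros = len(data) - ones
    scale = 2.0 / epsilon
    # Clamping at zero is post-processing, so it does not weaken the
    # privacy guarantee; it keeps the parameters valid for a Beta.
    noisy_ones = max(0.0, ones + laplace_noise(scale, rng))
    noisy_zeros = max(0.0, zeros + laplace_noise(scale, rng))
    return prior[0] + noisy_ones, prior[1] + noisy_zeros


if __name__ == "__main__":
    observations = [1, 1, 0, 1, 0, 1, 1]
    print(private_beta_posterior(observations, epsilon=0.5, rng=random.Random(0)))
```

Smaller values of `epsilon` give stronger privacy but noisier posterior parameters, which is the utility and privacy trade-off the abstract refers to.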

    A lossless online Bayesian classifier.

    We are living in a world progressively driven by data. Big data cannot be stored entirely in main memory, as traditional offline learning methods require, and the problem of learning from data that can only be collected over time is also very prevalent. Consequently, there is a need for online methods that can handle sequentially arriving data and offer the same accuracy as offline methods. In this paper, we introduce a new lossless online Bayesian-based classifier which uses the arriving data one item at a time and discards each item right after use. The lossless property of our proposed method guarantees that it reaches the same prediction performance as its offline counterpart regardless of the incremental training order. Experimental results demonstrate its superior performance over many well-known state-of-the-art online learning methods in the literature.
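The lossless property can be illustrated with a classifier whose model is a set of counts, since counts do not depend on arrival order. The sketch below is a plain categorical naïve Bayes with Laplace smoothing, used here as a stand-in; the abstract does not specify the paper's classifier, and the class name `OnlineNaiveBayes` and its parameters are hypothetical.

```python
import math
from collections import defaultdict


class OnlineNaiveBayes:
    """Categorical naive Bayes trained one example at a time."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha                      # Laplace smoothing strength
        self.class_counts = defaultdict(int)    # label -> count
        self.feature_counts = defaultdict(int)  # (label, index, value) -> count
        self.values = defaultdict(set)          # index -> values seen so far

    def update(self, x, y):
        # Only the counts are kept; the example itself can be discarded.
        self.class_counts[y] += 1
        for i, v in enumerate(x):
            self.feature_counts[(y, i, v)] += 1
            self.values[i].add(v)

    def predict(self, x):
        total = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        # Sorting makes tie-breaking deterministic (labels must be comparable).
        for label in sorted(self.class_counts):
            n = self.class_counts[label]
            score = math.log(n / total)
            for i, v in enumerate(x):
                k = len(self.values.get(i, ())) or 1
                count = self.feature_counts.get((label, i, v), 0)
                score += math.log((count + self.alpha) / (n + self.alpha * k))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


if __name__ == "__main__":
    stream = [
        (("sunny", "hot"), "no"),
        (("rainy", "mild"), "yes"),
        (("sunny", "mild"), "yes"),
        (("rainy", "hot"), "no"),
    ]
    forward, backward = OnlineNaiveBayes(), OnlineNaiveBayes()
    for x, y in stream:
        forward.update(x, y)
    for x, y in reversed(stream):
        backward.update(x, y)
    # Both training orders produce identical counts, hence identical predictions.
    print(forward.predict(("rainy", "mild")), backward.predict(("rainy", "mild")))
```

Because the counts are sufficient statistics for this model, training on the stream in any order gives the same result as training on the whole dataset at once.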

    Assessing Completeness of Solvency and Financial Condition Reports through the use of Machine Learning and Text Classification

    Text mining is a method for extracting useful information from unstructured data through the identification and exploration of large amounts of text. It is a valuable support tool for organisations, enabling a greater understanding and identification of relevant business insights from text. Critically, it identifies connections between information within texts that would otherwise go unnoticed. Its application is prevalent in areas such as marketing and political science; however, until recently it has been largely overlooked within economics. Central banks are beginning to investigate the benefits of machine learning, sentiment analysis and natural language processing in light of the large amount of unstructured data available to them. This includes news articles, financial contracts, social media, supervisory and market intelligence and regulatory reports. In this research paper, a dataset consisting of Solvency and Financial Condition Reports (SFCRs) required by regulation is analysed to determine whether machine learning and text classification can assist in assessing the completeness of SFCRs. Completeness is determined by whether or not the document adheres to nine European guidelines. Natural language processing and supervised machine learning techniques are implemented to classify pages of the report as belonging to one of the guidelines.

    System (for) Tracking Equilibrium and Determining Incline (STEADI)

    The goal of this project was to design and implement a smartphone-based wearable system, named STEADI, to detect fall events in real time. Rather than relying on expensive customised hardware, STEADI was implemented cost-effectively on a generic mobile computing device. To detect a fall event, we propose a fall detector that uses the accelerometer available in a mobile phone. The system is divided into two stages: signal processing and classification. The signal processing stage uses both a median filter and a high-pass filter. The median filter enhances the signal by removing impulsive noise while preserving the signal shape, and the high-pass filter emphasises transitions in the signal. To recognise a fall event, STEADI implements two methods: a simple threshold analysis that determines whether or not a fall has occurred (threshold-based), and a more sophisticated Naïve-Bayes classification method that differentiates falling from other mobile activities. Our experimental results show that applying the signal processing and Naïve-Bayes classification together increases the accuracy by more than 20% compared with using the threshold-based method alone. The Naïve-Bayes classifier achieved an overall detection accuracy of 95%. Furthermore, an external sensor is introduced to enhance accuracy. In addition to fall detection, the system can report the location of the fall event through Google Maps, using the GPS available on the smartphone, and send a message to the caretaker via SMS.
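The signal processing stage followed by the threshold-based method can be sketched as below. This is a rough illustration under stated assumptions, not the STEADI implementation: the window size, the first-order high-pass filter with coefficient `alpha`, the threshold value, and all function names are hypothetical, and the Naïve-Bayes stage is not shown.

```python
from statistics import median


def median_filter(signal, window=3):
    # Replace each sample by the median of its neighbourhood;
    # windows are truncated at the edges of the signal.
    half = window // 2
    return [
        median(signal[max(0, i - half): i + half + 1])
        for i in range(len(signal))
    ]


def high_pass(signal, alpha=0.9):
    # First-order discrete high-pass filter: responds to changes between
    # consecutive samples and decays towards zero on a constant input.
    if not signal:
        return []
    out = [0.0]
    for i in range(1, len(signal)):
        out.append(alpha * (out[-1] + signal[i] - signal[i - 1]))
    return out


def detect_fall(accel_magnitude, threshold=1.0):
    # Threshold-based detection on the filtered acceleration magnitude
    # (assumed here to be expressed in units of g).
    filtered = high_pass(median_filter(accel_magnitude))
    return any(abs(v) > threshold for v in filtered)


if __name__ == "__main__":
    impulsive_noise = [1.0, 1.0, 1.0, 1.0, 5.0, 1.0, 1.0, 1.0, 1.0, 1.0]
    sustained_jump = [1.0, 1.0, 1.0, 1.0, 3.0, 3.0, 3.0, 1.0, 1.0, 1.0]
    print(detect_fall(impulsive_noise), detect_fall(sustained_jump))
```

In this toy data the one-sample spike is removed by the median filter before it can reach the threshold, while the sustained jump survives filtering and produces a large transition in the high-pass output.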

    Semi-Supervised Learning For Identifying Opinions In Web Content

    Thesis (Ph.D.) - Indiana University, Information Science, 2011. Opinions published on the World Wide Web (Web) offer opportunities for detecting personal attitudes regarding topics, products, and services. The opinion detection literature indicates that both a large body of opinions and a wide variety of opinion features are essential for capturing subtle opinion information. Although a large amount of opinion-labeled data is preferable for opinion detection systems, opinion-labeled data is often limited, especially at sub-document levels, and manual annotation is tedious, expensive and error-prone. This shortage of opinion-labeled data is less challenging in some domains (e.g., movie reviews) than in others (e.g., blog posts). While a simple method for improving accuracy in challenging domains is to borrow opinion-labeled data from a non-target data domain, this approach often fails because of the domain transfer problem: Opinion detection strategies designed for one data domain generally do not perform well in another domain. However, while it is difficult to obtain opinion-labeled data, unlabeled user-generated opinion data are readily available. Semi-supervised learning (SSL) requires only limited labeled data to automatically label unlabeled data and has achieved promising results in various natural language processing (NLP) tasks, including traditional topic classification; but SSL has been applied in only a few opinion detection studies. This study investigates application of four different SSL algorithms in three types of Web content: edited news articles, semi-structured movie reviews, and the informal and unstructured content of the blogosphere. SSL algorithms are also evaluated for their effectiveness in sparse data situations and domain adaptation. Research findings suggest that, when there is limited labeled data, SSL is a promising approach for opinion detection in Web content. Although the contributions of SSL varied across data domains, significant improvement was demonstrated for the most challenging data domain--the blogosphere--when a domain transfer-based SSL strategy was implemented.

    Improving Floating Search Feature Selection using Genetic Algorithm

    Classification, a process for predicting the class of a given input, is one of the most fundamental tasks in data mining. Classification performance is negatively affected by noisy data, and therefore selecting features relevant to the problem is a critical step in classification, especially when applied to large datasets. In this article, a novel filter-based floating search technique for selecting an optimal set of features for classification is proposed. A genetic algorithm is employed to improve the quality of the features selected by the floating search method in each iteration. A criterion function is applied to select relevant and high-quality features that can improve classification accuracy. The proposed method was evaluated using 20 standard machine learning datasets of various sizes and complexity. The results show that the proposed method is effective in general across different classifiers and performs well in comparison with recently reported techniques. In addition, applying the proposed method with a support vector machine provides the best performance among the classifiers studied and outperformed previous research on the majority of the datasets.
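For readers unfamiliar with floating search, the sketch below shows a plain sequential floating forward selection loop driven by a criterion function. It deliberately omits the genetic algorithm step that the article adds, and everything in it is assumed for illustration: the function `floating_forward_selection`, the toy criterion, and the feature names.

```python
import math


def floating_forward_selection(features, criterion, k):
    """Select k features by forward steps with conditional backward steps."""
    selected = []
    best_score = {}  # subset size -> best criterion value seen at that size
    while len(selected) < k:
        # Forward step: add the single feature that maximises the criterion.
        candidates = [f for f in features if f not in selected]
        best = max(candidates, key=lambda f: criterion(selected + [f]))
        selected = selected + [best]
        size = len(selected)
        best_score[size] = max(best_score.get(size, -math.inf), criterion(selected))
        # Floating step: drop a feature only if the reduced subset strictly
        # beats the best subset of that size found so far. Strict improvement
        # guarantees that the loop terminates.
        while len(selected) > 2:
            reduced = max(
                ([g for g in selected if g != f] for f in selected),
                key=criterion,
            )
            if criterion(reduced) > best_score.get(len(reduced), -math.inf):
                selected = reduced
                best_score[len(reduced)] = criterion(reduced)
            else:
                break
    return selected


# Toy filter criterion: summed relevance minus a penalty for a redundant pair.
RELEVANCE = {"a": 0.9, "b": 0.8, "c": 0.5, "d": 0.1}


def toy_criterion(subset):
    score = sum(RELEVANCE[f] for f in subset)
    if "a" in subset and "b" in subset:
        score -= 0.7  # "a" and "b" are assumed to carry overlapping information
    return score


if __name__ == "__main__":
    print(floating_forward_selection(list(RELEVANCE), toy_criterion, k=2))
```

The backward step is what distinguishes floating search from simple greedy forward selection: a feature added early can be discarded later if a better subset of the same size becomes reachable.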