5,412 research outputs found

    Query expansion with naive bayes for searching distributed collections

    Get PDF
    The proliferation of online information resources increases the importance of effective and efficient distributed searching. However, the problem of word mismatch seriously hurts the effectiveness of distributed information retrieval. Automatic query expansion has been suggested as a technique for dealing with the fundamental issue of word mismatch. In this paper, we propose a method - query expansion with Naive Bayes to address the problem, discuss its implementation in IISS system, and present experimental results demonstrating its effectiveness. Such technique not only enhances the discriminatory power of typical queries for choosing the right collections but also hence significantly improves retrieval results

    Naive Bayes Classification in The Question and Answering System

    Get PDF
    Abstract—Question and answering (QA) system is a system to answer question based on collections of unstructured text or in the form of human language. In general, QA system consists of four stages, i.e. question analysis, documents selection, passage retrieval and answer extraction. In this study we added two processes i.e. classifying documents and classifying passage. We use Naïve Bayes for classification, Dynamic Passage Partitioning for finding answer and Lucene for document selection. The experiment was done using 100 questions from 3000 documents related to the disease and the results were compared with a system that does not use the classification process. From the test results, the system works best with the use of 10 of the most relevant documents, 5 passage with the highest score and 10 answer the closest distance. Mean Reciprocal Rank (MMR) value for QA system with classification is 0.41960 which is 4.9% better than MRR value for QA system without classificatio

    Learning Fair Naive Bayes Classifiers by Discovering and Eliminating Discrimination Patterns

    Full text link
    As machine learning is increasingly used to make real-world decisions, recent research efforts aim to define and ensure fairness in algorithmic decision making. Existing methods often assume a fixed set of observable features to define individuals, but lack a discussion of certain features not being observed at test time. In this paper, we study fairness of naive Bayes classifiers, which allow partial observations. In particular, we introduce the notion of a discrimination pattern, which refers to an individual receiving different classifications depending on whether some sensitive attributes were observed. Then a model is considered fair if it has no such pattern. We propose an algorithm to discover and mine for discrimination patterns in a naive Bayes classifier, and show how to learn maximum likelihood parameters subject to these fairness constraints. Our approach iteratively discovers and eliminates discrimination patterns until a fair model is learned. An empirical evaluation on three real-world datasets demonstrates that we can remove exponentially many discrimination patterns by only adding a small fraction of them as constraints

    A systematic comparison of supervised classifiers

    Get PDF
    Pattern recognition techniques have been employed in a myriad of industrial, medical, commercial and academic applications. To tackle such a diversity of data, many techniques have been devised. However, despite the long tradition of pattern recognition research, there is no technique that yields the best classification in all scenarios. Therefore, the consideration of as many as possible techniques presents itself as an fundamental practice in applications aiming at high accuracy. Typical works comparing methods either emphasize the performance of a given algorithm in validation tests or systematically compare various algorithms, assuming that the practical use of these methods is done by experts. In many occasions, however, researchers have to deal with their practical classification tasks without an in-depth knowledge about the underlying mechanisms behind parameters. Actually, the adequate choice of classifiers and parameters alike in such practical circumstances constitutes a long-standing problem and is the subject of the current paper. We carried out a study on the performance of nine well-known classifiers implemented by the Weka framework and compared the dependence of the accuracy with their configuration parameter configurations. The analysis of performance with default parameters revealed that the k-nearest neighbors method exceeds by a large margin the other methods when high dimensional datasets are considered. When other configuration of parameters were allowed, we found that it is possible to improve the quality of SVM in more than 20% even if parameters are set randomly. Taken together, the investigation conducted in this paper suggests that, apart from the SVM implementation, Weka's default configuration of parameters provides an performance close the one achieved with the optimal configuration

    Anomaly Detection Based on Indicators Aggregation

    Full text link
    Automatic anomaly detection is a major issue in various areas. Beyond mere detection, the identification of the source of the problem that produced the anomaly is also essential. This is particularly the case in aircraft engine health monitoring where detecting early signs of failure (anomalies) and helping the engine owner to implement efficiently the adapted maintenance operations (fixing the source of the anomaly) are of crucial importance to reduce the costs attached to unscheduled maintenance. This paper introduces a general methodology that aims at classifying monitoring signals into normal ones and several classes of abnormal ones. The main idea is to leverage expert knowledge by generating a very large number of binary indicators. Each indicator corresponds to a fully parametrized anomaly detector built from parametric anomaly scores designed by experts. A feature selection method is used to keep only the most discriminant indicators which are used at inputs of a Naive Bayes classifier. This give an interpretable classifier based on interpretable anomaly detectors whose parameters have been optimized indirectly by the selection process. The proposed methodology is evaluated on simulated data designed to reproduce some of the anomaly types observed in real world engines.Comment: International Joint Conference on Neural Networks (IJCNN 2014), Beijing : China (2014). arXiv admin note: substantial text overlap with arXiv:1407.088
    • …
    corecore