Query expansion with naive bayes for searching distributed collections
The proliferation of online information resources increases the importance of effective and efficient distributed searching. However, the problem of word mismatch seriously hurts the effectiveness of distributed information retrieval. Automatic query expansion has been suggested as a technique for dealing with the fundamental issue of word mismatch. In this paper, we propose a method, query expansion with Naive Bayes, to address the problem, discuss its implementation in the IISS system, and present experimental results demonstrating its effectiveness. This technique not only enhances the discriminatory power of typical queries for choosing the right collections but also significantly improves retrieval results.
Naive Bayes Classification in The Question and Answering System
A question and answering (QA) system answers questions based on collections of unstructured text or text in human language. In general, a QA system consists of four stages: question analysis, document selection, passage retrieval, and answer extraction. In this study we added two processes: classifying documents and classifying passages. We use Naïve Bayes for classification, Dynamic Passage Partitioning for finding answers, and Lucene for document selection. The experiment was done using 100 questions over 3000 documents related to disease, and the results were compared with a system that does not use the classification process. From the test results, the system works best when using the 10 most relevant documents, the 5 highest-scoring passages, and the 10 answers with the closest distance. The Mean Reciprocal Rank (MRR) value for the QA system with classification is 0.41960, which is 4.9% better than the MRR value for the QA system without classification.
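As a point of reference for the MRR figure quoted above, the metric can be sketched as follows. This is a minimal illustration of the standard definition, not the authors' evaluation code; the function name `mean_reciprocal_rank` and the sample ranks are hypothetical.

```python
def mean_reciprocal_rank(ranks):
    """Average of 1/rank over all questions.

    ranks: one entry per question, holding the 1-based rank of the first
    correct answer, or None when no correct answer was returned
    (such a question contributes 0 to the sum).
    """
    total = sum(1.0 / r for r in ranks if r is not None)
    return total / len(ranks)


# Hypothetical ranks for four questions (not data from the paper).
example_ranks = [1, 2, None, 4]
print(mean_reciprocal_rank(example_ranks))  # (1 + 0.5 + 0 + 0.25) / 4 = 0.4375
```

The sketch assumes a non-empty list of questions; an empty list would need separate handling.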
Learning Fair Naive Bayes Classifiers by Discovering and Eliminating Discrimination Patterns
As machine learning is increasingly used to make real-world decisions, recent
research efforts aim to define and ensure fairness in algorithmic decision
making. Existing methods often assume a fixed set of observable features to
define individuals, but lack a discussion of certain features not being
observed at test time. In this paper, we study fairness of naive Bayes
classifiers, which allow partial observations. In particular, we introduce the
notion of a discrimination pattern, which refers to an individual receiving
different classifications depending on whether some sensitive attributes were
observed. Then a model is considered fair if it has no such pattern. We propose
an algorithm to discover and mine for discrimination patterns in a naive Bayes
classifier, and show how to learn maximum likelihood parameters subject to
these fairness constraints. Our approach iteratively discovers and eliminates
discrimination patterns until a fair model is learned. An empirical evaluation
on three real-world datasets demonstrates that we can remove exponentially many
discrimination patterns by only adding a small fraction of them as constraints.
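The idea of a discrimination pattern can be illustrated with a small sketch: in a naive Bayes model, an unobserved feature is simply left out of the product, so the posterior can be compared with and without a sensitive attribute. Everything below is an assumption made for illustration: the feature names, the probabilities, the threshold `delta`, and the choice to flag a pattern when the posterior shifts by more than `delta`. It does not reproduce the paper's discovery or learning algorithm.

```python
def posterior(prior, likelihoods, evidence):
    """P(D=1 | evidence) for a naive Bayes model with a binary class D.

    prior: P(D=1)
    likelihoods: {feature: (P(f=1 | D=0), P(f=1 | D=1))}
    evidence: {feature: 0 or 1}; unobserved features are left out, which in
    naive Bayes is equivalent to marginalizing over them.
    """
    p1, p0 = prior, 1.0 - prior
    for feat, value in evidence.items():
        q0, q1 = likelihoods[feat]
        p1 *= q1 if value == 1 else 1.0 - q1
        p0 *= q0 if value == 1 else 1.0 - q0
    return p1 / (p1 + p0)


# Hypothetical model parameters (not taken from the paper).
prior = 0.5
likelihoods = {
    "sensitive": (0.5, 0.8),
    "income_high": (0.3, 0.6),
}

without_s = posterior(prior, likelihoods, {"income_high": 1})
with_s = posterior(prior, likelihoods, {"income_high": 1, "sensitive": 1})

delta = 0.05  # assumed tolerance for how much the posterior may shift
is_pattern = abs(with_s - without_s) > delta
print(without_s, with_s, is_pattern)
```

Enumerating every combination of sensitive and non-sensitive observations this way grows exponentially with the number of features, which is why a dedicated search procedure is needed in practice.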
A systematic comparison of supervised classifiers
Pattern recognition techniques have been employed in a myriad of industrial,
medical, commercial and academic applications. To tackle such a diversity of
data, many techniques have been devised. However, despite the long tradition of
pattern recognition research, there is no technique that yields the best
classification in all scenarios. Therefore, considering as many techniques as
possible is a fundamental practice in applications
aiming at high accuracy. Typical works comparing methods either emphasize the
performance of a given algorithm in validation tests or systematically compare
various algorithms, assuming that the practical use of these methods is done by
experts. On many occasions, however, researchers have to deal with their
practical classification tasks without an in-depth knowledge about the
underlying mechanisms behind parameters. Actually, the adequate choice of
classifiers and parameters alike in such practical circumstances constitutes a
long-standing problem and is the subject of the current paper. We carried out a
study on the performance of nine well-known classifiers implemented by the Weka
framework and examined how their accuracy depends on their parameter
configurations. The analysis of performance with default parameters
revealed that the k-nearest neighbors method exceeds by a large margin the
other methods when high dimensional datasets are considered. When other
parameter configurations were allowed, we found that it is possible to
improve the quality of SVM by more than 20% even if parameters are set
randomly. Taken together, the investigation conducted in this paper suggests
that, apart from the SVM implementation, Weka's default configuration of
parameters provides a performance close to the one achieved with the optimal
configuration.
Anomaly Detection Based on Indicators Aggregation
Automatic anomaly detection is a major issue in various areas. Beyond mere
detection, the identification of the source of the problem that produced the
anomaly is also essential. This is particularly the case in aircraft engine
health monitoring where detecting early signs of failure (anomalies) and
helping the engine owner to efficiently implement the appropriate maintenance
operations (fixing the source of the anomaly) are of crucial importance to
reduce the costs attached to unscheduled maintenance. This paper introduces a
general methodology that aims at classifying monitoring signals into normal
ones and several classes of abnormal ones. The main idea is to leverage expert
knowledge by generating a very large number of binary indicators. Each
indicator corresponds to a fully parametrized anomaly detector built from
parametric anomaly scores designed by experts. A feature selection method is
used to keep only the most discriminant indicators, which are used as inputs to
a Naive Bayes classifier. This gives an interpretable classifier based on
interpretable anomaly detectors whose parameters have been optimized indirectly
by the selection process. The proposed methodology is evaluated on simulated
data designed to reproduce some of the anomaly types observed in real world
engines.
Comment: International Joint Conference on Neural Networks (IJCNN 2014),
Beijing, China (2014). arXiv admin note: substantial text overlap with
arXiv:1407.088
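The pipeline described in this abstract (binary indicators from thresholded anomaly scores, selection of the most discriminant indicators, then a Naive Bayes classifier) can be sketched roughly as below. This is a simplified illustration under stated assumptions, not the authors' method: it uses a single score with hypothetical thresholds, only two classes (0 = normal, 1 = abnormal), a stand-in selection criterion (the gap between class-wise firing rates), and a Laplace-smoothed Bernoulli Naive Bayes written from scratch.

```python
import math


def make_indicators(score, thresholds):
    # One binary indicator per threshold: 1 if the anomaly score exceeds it.
    return [1 if score > t else 0 for t in thresholds]


def select_indicators(X, y, k):
    # Stand-in criterion (an assumption): keep the k indicators whose firing
    # rate differs most between class 1 and class 0.
    def rate(j, label):
        rows = [x[j] for x, lab in zip(X, y) if lab == label]
        return sum(rows) / len(rows)

    gaps = [(abs(rate(j, 1) - rate(j, 0)), j) for j in range(len(X[0]))]
    return sorted(j for _, j in sorted(gaps, reverse=True)[:k])


def fit_bernoulli_nb(X, y):
    # Per class: prior and Laplace-smoothed P(indicator = 1 | class).
    model = {}
    for label in set(y):
        rows = [x for x, lab in zip(X, y) if lab == label]
        prior = len(rows) / len(X)
        probs = [(sum(r[j] for r in rows) + 1) / (len(rows) + 2)
                 for j in range(len(X[0]))]
        model[label] = (prior, probs)
    return model


def predict(model, x):
    # Pick the class with the highest log posterior (up to a constant).
    def log_score(label):
        prior, probs = model[label]
        return math.log(prior) + sum(
            math.log(p if v else 1.0 - p) for p, v in zip(probs, x))

    return max(model, key=log_score)


# Hypothetical training data: anomaly scores and labels (0 normal, 1 abnormal).
scores = [0.1, 0.2, 0.3, 0.8, 0.9, 1.0]
labels = [0, 0, 0, 1, 1, 1]
thresholds = [0.05, 0.5, 2.0]

X = [make_indicators(s, thresholds) for s in scores]
kept = select_indicators(X, labels, k=1)
model = fit_bernoulli_nb([[x[j] for j in kept] for x in X], labels)


def classify(score):
    x = make_indicators(score, thresholds)
    return predict(model, [x[j] for j in kept])


print(kept, classify(0.95), classify(0.15))
```

Because each retained indicator is a thresholded expert score, the fitted probabilities can be read directly as "how often this detector fires in each class", which is the kind of interpretability the abstract refers to.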