14,739 research outputs found
Query expansion with naive bayes for searching distributed collections
The proliferation of online information resources increases the importance of effective and efficient distributed searching. However, the problem of word mismatch seriously hurts the effectiveness of distributed information retrieval. Automatic query expansion has been suggested as a technique for dealing with the fundamental issue of word mismatch. In this paper, we propose a method, query expansion with Naive Bayes, to address the problem, discuss its implementation in the IISS system, and present experimental results demonstrating its effectiveness. This technique not only enhances the discriminatory power of typical queries for choosing the right collections but also significantly improves retrieval results.
Real estate portfolio construction and estimation risk
The use of modern portfolio theory (MPT) in the construction of real estate portfolios has two serious limitations when used in an ex-ante framework: (1) the intertemporal instability of the portfolio weights and (2) the sharp deterioration in performance of the optimal portfolios outside the sample period used to estimate asset mean returns. Both problems can be traced to wide fluctuations in sample means (Jorion, 1985). Thus the use of a procedure that ignores the estimation risk due to the uncertainty in mean returns is likely to produce sub-optimal results in subsequent periods. This suggests that considering estimation risk is crucial when using MPT to develop a successful real estate portfolio strategy. Therefore, following Eun & Resnick (1988), this study extends previous ex-ante based studies by evaluating optimal portfolio allocations in subsequent test periods, using methods that have been proposed to reduce the effect of measurement error on optimal portfolio allocations.
A systematic comparison of supervised classifiers
Pattern recognition techniques have been employed in a myriad of industrial,
medical, commercial and academic applications. To tackle such a diversity of
data, many techniques have been devised. However, despite the long tradition of
pattern recognition research, there is no technique that yields the best
classification in all scenarios. Therefore, considering as many techniques as
possible presents itself as a fundamental practice in applications
aiming at high accuracy. Typical works comparing methods either emphasize the
performance of a given algorithm in validation tests or systematically compare
various algorithms, assuming that the practical use of these methods is done by
experts. On many occasions, however, researchers have to deal with their
practical classification tasks without an in-depth knowledge about the
underlying mechanisms behind parameters. Actually, the adequate choice of
classifiers and parameters alike in such practical circumstances constitutes a
long-standing problem and is the subject of the current paper. We carried out a
study on the performance of nine well-known classifiers implemented by the Weka
framework and examined how their accuracy depends on their parameter
configurations. The analysis of performance with default parameters
revealed that the k-nearest neighbors method outperforms the other methods
by a large margin when high-dimensional datasets are considered. When other
parameter configurations were allowed, we found that it is possible to
improve the quality of SVM by more than 20% even if parameters are set
randomly. Taken together, the investigation conducted in this paper suggests
that, apart from the SVM implementation, Weka's default configuration of
parameters provides a performance close to the one achieved with the optimal
configuration.
Evaluation of Machine Learning Algorithms for Intrusion Detection System
An intrusion detection system (IDS) is one of the solutions deployed against
harmful attacks. However, attackers keep changing their tools and
techniques, so implementing an acceptable IDS is a challenging
task. In this paper, several experiments were performed and evaluated to
assess various machine learning classifiers on the KDD intrusion dataset.
Several performance metrics were computed in order to evaluate the
selected classifiers. The focus was on the false negative and false positive
performance metrics, in order to enhance the detection rate of the intrusion
detection system. The experiments demonstrated that the decision
table classifier achieved the lowest false negative value, while the random
forest classifier achieved the highest average accuracy rate.
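
The false negative and false positive metrics mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' code: the label names ("attack", "normal"), the function names, and the toy data are assumptions made only for illustration, and the rates use the standard confusion-matrix definitions.

```python
# Hypothetical sketch: confusion-matrix rates for a binary intrusion detector.
# The labels "attack"/"normal" and the data in the usage note are assumptions.

def confusion_counts(y_true, y_pred, positive="attack"):
    """Count true/false positives and negatives for one positive class."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1  # attack correctly flagged
            else:
                fp += 1  # normal traffic flagged as attack
        else:
            if truth == positive:
                fn += 1  # attack missed
            else:
                tn += 1  # normal traffic correctly passed
    return tp, fp, tn, fn


def rates(tp, fp, tn, fn):
    """Return accuracy, false positive rate, and false negative rate."""
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "false_positive_rate": fp / (fp + tn) if (fp + tn) else 0.0,
        "false_negative_rate": fn / (fn + tp) if (fn + tp) else 0.0,
    }
```

For example, calling `rates(*confusion_counts(y_true, y_pred))` on a classifier's predictions gives the three figures side by side, so a classifier with high accuracy but a high false negative rate (many missed attacks) is easy to spot.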