Comparing SVM and Naive Bayes classifiers for text categorization with Wikitology as knowledge enrichment
The activity of labeling documents according to their content is known as text categorization. Many experiments have been carried out to enhance text categorization by adding background knowledge to documents using knowledge repositories such as WordNet, the Open Directory Project (ODP), Wikipedia and Wikitology. In our previous work, we carried out extensive experiments by extracting knowledge from Wikitology and evaluating it with a Support Vector Machine under 10-fold cross-validation. The results clearly indicate that Wikitology is far better than the other knowledge bases. In this paper we compare Support Vector Machine (SVM) and Naïve Bayes (NB) classifiers under text enrichment through Wikitology. We validated the results with 10-fold cross-validation and show that NB gives an improvement of +28.78%, whereas SVM gives an improvement of +6.36%, relative to the baseline results. The Naïve Bayes classifier is the better choice when text is enriched through any external knowledge base.
Comment: 5 pages
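As a rough illustration of the Naïve Bayes side of this comparison, the sketch below implements a multinomial Naïve Bayes text classifier with add-one (Laplace) smoothing and evaluates it with k-fold cross-validation. It is not the authors' setup: it performs no Wikitology enrichment, includes no SVM, assigns folds by a simple interleaved split without shuffling, and the function names (`train_nb`, `predict_nb`, `k_fold_accuracy`) and toy corpus are hypothetical.

```python
import math
from collections import Counter, defaultdict


def train_nb(docs, labels):
    """Fit a multinomial Naive Bayes model; each doc is a list of tokens."""
    class_counts = Counter(labels)          # number of documents per class
    word_counts = defaultdict(Counter)      # class -> token frequencies
    vocab = set()
    for tokens, label in zip(docs, labels):
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab


def predict_nb(model, tokens):
    """Return the class with the highest log posterior for the given tokens."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_counts.items():
        # Log prior plus log likelihood of each token, with add-one smoothing.
        score = math.log(n_docs / total_docs)
        denom = sum(word_counts[label].values()) + len(vocab)
        for tok in tokens:
            score += math.log((word_counts[label][tok] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def k_fold_accuracy(docs, labels, k=10):
    """Mean accuracy over k folds; fold i holds out documents i, i+k, i+2k, ..."""
    accuracies = []
    for i in range(k):
        test_idx = set(range(i, len(docs), k))
        if not test_idx:
            continue
        train = [(d, l) for j, (d, l) in enumerate(zip(docs, labels)) if j not in test_idx]
        model = train_nb([d for d, _ in train], [l for _, l in train])
        correct = sum(predict_nb(model, docs[j]) == labels[j] for j in test_idx)
        accuracies.append(correct / len(test_idx))
    return sum(accuracies) / len(accuracies)


# Toy corpus (hypothetical): alternating "sport" and "tech" documents.
texts = [
    ("goal match team", "sport"), ("code cpu software", "tech"),
    ("team score goal", "sport"), ("chip code cpu", "tech"),
    ("match goal score", "sport"), ("software chip code", "tech"),
    ("score team match", "sport"), ("cpu software chip", "tech"),
    ("goal team score", "sport"), ("code chip software", "tech"),
]
docs = [t.split() for t, _ in texts]
labels = [l for _, l in texts]
```

With this interleaved toy corpus, `k_fold_accuracy(docs, labels, k=5)` holds out one document of each class per fold; because the two classes share no vocabulary, every held-out document is classified correctly and the mean accuracy is 1.0. Real experiments would of course shuffle or stratify the folds and use a far larger corpus.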
Peer to Peer Information Retrieval: An Overview
Peer-to-peer technology is widely used for file sharing. In the past decade a number of prototype peer-to-peer information retrieval systems have been developed. Unfortunately, none of these have seen widespread real-world adoption and thus, in contrast with file sharing, information retrieval is still dominated by centralised solutions. In this paper we provide an overview of the key challenges for peer-to-peer information retrieval and the work done so far. We want to stimulate and inspire further research to overcome these challenges. This will open the door to the development and large-scale deployment of real-world peer-to-peer information retrieval systems that rival existing centralised client-server solutions in terms of scalability, performance, user satisfaction and freedom.
Nomenclature and Contemporary Affirmation of the Unsupervised Learning in Text and Document Mining
Document clustering is primarily a method for simplifying document search, analysis and content review; it is a process that automatically groups documents of a similar type into relevant clusters within a clustering hierarchy. This paper reviews related work in the field of document clustering, from simple word- and phrase-based techniques to present-day complex techniques such as statistical analysis and machine learning, and illustrates their implications for future research.
User centred evaluation of a recommendation based image browsing system
In this paper, we introduce a novel approach to recommending images by mining user interactions based on implicit feedback from user browsing. The underlying hypothesis is that these interactions implicitly indicate users' interests when carrying out practical image retrieval tasks. The algorithm mines interaction data together with the low-level content of the clicked images, and chooses diverse images by clustering heterogeneous features. A user-centred, task-oriented, comparative evaluation was undertaken to verify the validity of our approach, in which two versions of the system were compared: one set up to enable diverse image recommendation, the other allowing browsing only. Users worked with both systems in simulated work task situations, and quantitative and qualitative data were collected as indicators of the recommendation results and of user satisfaction. The users' responses indicate that they find the more diverse recommendations highly useful.
Data Mining Techniques for Mining Query Logs in Web Search Engines
The Web is the biggest repository of documents humans have ever built, and it keeps growing in size every day. Users rely on Web search engines (WSEs) to find information on the Web. By submitting a textual query expressing their information need, WSE users obtain a list of documents that are highly relevant to the query. Moreover, WSEs store huge amounts of user activity in query logs. Query log mining is the set of techniques aimed at extracting valuable knowledge from query logs, and this knowledge is one of the most widely used means of enhancing users' search experience. The primary focus of this work is to introduce data mining techniques for mining query logs in Web search engines and to show how search engine applications may benefit from this mining.
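To make "extracting valuable knowledge from query logs" concrete, the following sketch shows one common query log mining application, query suggestion from session co-occurrence. It is an illustration under assumptions, not a technique taken from this work: the log format `(user, timestamp_seconds, query)`, the 30-minute inactivity gap used to split sessions, and the function names `sessionize`, `cooccurrence` and `suggest` are all hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations


def sessionize(log, gap=1800):
    """Split (user, timestamp, query) records into sessions.

    A new session starts when the same user has been idle for more than
    `gap` seconds (an assumed threshold, not a standard value)."""
    sessions = []
    last_seen = {}  # user -> (timestamp of last query, index of current session)
    for user, ts, query in sorted(log, key=lambda r: (r[0], r[1])):
        prev = last_seen.get(user)
        if prev is None or ts - prev[0] > gap:
            sessions.append([])
            idx = len(sessions) - 1
        else:
            idx = prev[1]
        sessions[idx].append(query)
        last_seen[user] = (ts, idx)
    return sessions


def cooccurrence(sessions):
    """Count how often two distinct queries are issued in the same session."""
    counts = defaultdict(Counter)
    for session in sessions:
        for a, b in combinations(sorted(set(session)), 2):
            counts[a][b] += 1
            counts[b][a] += 1
    return counts


def suggest(counts, query, n=3):
    """Return up to n queries that most often co-occur with `query`."""
    return [q for q, _ in counts.get(query, Counter()).most_common(n)]


# Hypothetical toy log: (user, timestamp in seconds, query).
log = [
    ("u1", 0, "jaguar"), ("u1", 60, "jaguar car"), ("u1", 120, "jaguar price"),
    ("u2", 10, "jaguar"), ("u2", 50, "jaguar car"), ("u2", 5000, "python"),
    ("u3", 0, "jaguar"), ("u3", 30, "jaguar animal"),
]
sessions = sessionize(log)
counts = cooccurrence(sessions)
```

Here user `u2`'s query `"python"` arrives long after the rest of that user's activity, so it forms its own session, giving four sessions in total; `"jaguar car"` co-occurs with `"jaguar"` in two sessions and is therefore the top suggestion for it. Production systems typically add normalization, frequency thresholds and privacy filtering on top of such raw counts.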
Topic Detection and Tracking in Personal Search History
This thesis describes a system for tracking and detecting topics in personal search history. In particular, we developed a time tracking tool that helps users analyze their time and discover their activity patterns. The system allows a user to specify topics of interest to monitor, each with a keyword description. The system then tracks the log and the time spent on each document, and produces a time graph showing how much time has been spent on each monitored topic. The system can also detect new topics and potentially recommend relevant information about them to the user. This work has been integrated with the UCAIR Toolbar, a client-side agent. Considering the limited resources on the client side, we designed an efficient incremental algorithm for topic tracking and detection. Various unsupervised learning approaches have been considered to improve the accuracy of categorizing the user log into appropriate categories. Experiments show that our tool is effective in categorizing documents into existing categories and in detecting useful new categories. Moreover, the quality of categorization improves over time as more log data becomes available.
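The tracking workflow described above (user-supplied keyword descriptions, per-topic time accounting, detection of new topics, incremental processing) can be sketched as follows. This is a deliberately naive illustration and not the UCAIR algorithm: the class name `TopicTracker`, the keyword-hit scoring, the `threshold` rule and the "seed a new topic from the most frequent terms" heuristic are all hypothetical assumptions. Each log entry is handled once, in time proportional to the number of tracked topics, which is what keeps it incremental.

```python
from collections import Counter


class TopicTracker:
    """Hypothetical incremental keyword-based topic tracker (illustrative only)."""

    def __init__(self, topics, threshold=1, new_topic_terms=3):
        # Topic name -> set of keywords supplied by the user.
        self.keywords = {name: set(words) for name, words in topics.items()}
        self.time_spent = Counter()           # topic name -> accumulated seconds
        self.threshold = threshold            # minimum keyword hits to assign a topic
        self.new_topic_terms = new_topic_terms
        self._new_id = 0

    def observe(self, text, seconds):
        """Assign one log entry to a topic and add its viewing time."""
        tokens = Counter(text.lower().split())
        # Score each topic by how many of its keywords occur in the entry.
        scores = {name: sum(tokens[w] for w in kws)
                  for name, kws in self.keywords.items()}
        best = max(scores, key=scores.get, default=None)
        if best is None or scores[best] < self.threshold:
            # No tracked topic matches: seed a new topic from the entry's
            # most frequent terms (no stop-word filtering in this sketch).
            self._new_id += 1
            best = f"new-topic-{self._new_id}"
            self.keywords[best] = {w for w, _ in tokens.most_common(self.new_topic_terms)}
        self.time_spent[best] += seconds
        return best


# Usage with hypothetical topics and log entries.
tracker = TopicTracker({"travel": ["flight", "hotel"], "ml": ["svm", "bayes"]})
assigned = [
    tracker.observe("Cheap flight and hotel deals", 120),
    tracker.observe("naive bayes vs svm", 300),
    tracker.observe("guitar chords guitar tabs", 60),
    tracker.observe("easy guitar songs", 90),
]
```

The third entry matches neither user-defined topic, so it seeds `new-topic-1` with the keywords `guitar`, `chords` and `tabs`; the fourth entry then matches that new topic, and `tracker.time_spent` ends up as 120 s for `travel`, 300 s for `ml` and 150 s for `new-topic-1`, the kind of per-topic totals a time graph would plot.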