Machine Learning in Automated Text Categorization
The automated categorization (or classification) of texts into predefined
categories has witnessed a booming interest in the last ten years, due to the
increased availability of documents in digital form and the ensuing need to
organize them. In the research community the dominant approach to this problem
is based on machine learning techniques: a general inductive process
automatically builds a classifier by learning, from a set of preclassified
documents, the characteristics of the categories. The advantages of this
approach over the knowledge engineering approach (consisting in the manual
definition of a classifier by domain experts) are a very good effectiveness,
considerable savings in terms of expert manpower, and straightforward
portability to different domains. This survey discusses the main approaches to
text categorization that fall within the machine learning paradigm. We will
discuss in detail issues pertaining to three different problems, namely
document representation, classifier construction, and classifier evaluation.
Comment: Accepted for publication in ACM Computing Surveys
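The three problems named in this abstract (document representation, classifier construction, classifier evaluation) can be illustrated with a minimal sketch. It assumes a bag-of-words representation and a multinomial Naive Bayes learner with Laplace smoothing, which is only one of many learners that fit the machine learning paradigm described. The toy documents, labels, and function names are hypothetical and are not taken from the survey.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Document representation: a lowercase bag of words."""
    return text.lower().split()


def train(labeled_docs):
    """Classifier construction: collect per-class document and word counts."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in labeled_docs:
        class_counts[label] += 1
        for tok in tokenize(text):
            word_counts[label][tok] += 1
            vocab.add(tok)
    return class_counts, word_counts, vocab


def predict(model, text):
    """Return the class with the highest log posterior under Naive Bayes."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(class_counts):
        score = math.log(class_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for tok in tokenize(text):
            if tok not in vocab:
                continue  # ignore words never seen in training
            # Laplace (add-one) smoothing avoids zero probabilities.
            count = word_counts[label][tok] + 1
            score += math.log(count / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def accuracy(model, labeled_docs):
    """Classifier evaluation: fraction of held-out documents labeled correctly."""
    hits = sum(predict(model, text) == label for text, label in labeled_docs)
    return hits / len(labeled_docs)


# Hypothetical preclassified documents (training set) and a held-out test set.
TRAIN = [
    ("stock markets fell sharply", "finance"),
    ("bank raises interest rates", "finance"),
    ("team wins the final match", "sport"),
    ("striker scores in the match", "sport"),
]
TEST = [
    ("interest rates and markets", "finance"),
    ("the team scores", "sport"),
]

model = train(TRAIN)
print(predict(model, "interest rates and markets"))
print(accuracy(model, TEST))
```

Accuracy is used here only because it is the simplest measure to show; other effectiveness measures could be substituted in `accuracy` without changing the rest of the sketch.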
Expert Finding by Capturing Organisational Knowledge from Legacy Documents
Organisations capitalise on their best knowledge through the improvement of shared expertise, which leads to a higher level of productivity and competency. The recognition of the need to foster the sharing of expertise has led to the development of expert finder systems that hold pointers to experts who possess specific knowledge in organisations. This paper discusses an approach to locating an expert through the application of information retrieval and analysis processes to an organisation’s existing information resources, with specific reference to the engineering design domain. The approach taken was realised through an expert finder system framework. It enables the relationships of heterogeneous information sources with experts to be factored into the modelling of individuals’ expertise. These valuable relationships are typically ignored by existing expert finder systems, which focus only on how documents relate to their content. The developed framework also provides an architecture that can be easily adapted to different organisational environments. In addition, it allows users to access the expertise recognition logic, giving them greater trust in the systems implemented using this framework. The framework was applied to a real-world application and evaluated within a major engineering company.
Making AI Meaningful Again
Artificial intelligence (AI) research enjoyed an initial period of enthusiasm in the 1970s and 1980s. But this enthusiasm was tempered by a long interlude of frustration when genuinely useful AI applications failed to be forthcoming. Today, we are once again experiencing a period of enthusiasm, fired above all by the successes of the technology of deep neural networks or deep machine learning. In this paper we draw attention to what we take to be serious problems underlying current views of artificial intelligence encouraged by these successes, especially in the domain of language processing. We then show an alternative approach to language-centric AI, in which we identify a role for philosophy.
Toward Optimal Feature Selection in Naive Bayes for Text Categorization
Automated feature selection is important for text categorization to reduce
the feature size and to speed up the learning process of classifiers. In this
paper, we present a novel and efficient feature selection framework based on
information theory, which aims to rank the features by their
discriminative capacity for classification. We first revisit two information
measures: Kullback-Leibler divergence and Jeffreys divergence for binary
hypothesis testing, and analyze their asymptotic properties relating to type I
and type II errors of a Bayesian classifier. We then introduce a new divergence
measure, called Jeffreys-Multi-Hypothesis (JMH) divergence, to measure
multi-distribution divergence for multi-class classification. Based on the
JMH-divergence, we develop two efficient feature selection methods, termed
maximum discrimination (MD) and MD-χ² methods, for text categorization.
The promising results of extensive experiments demonstrate the effectiveness of
the proposed approaches.
Comment: This paper has been submitted to the IEEE Trans. Knowledge and Data
Engineering. 14 pages, 5 figures
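The two information measures revisited in this abstract can be sketched for discrete distributions as follows. This is a simplified illustration only: it ranks terms by the Jeffreys divergence between two class-conditional Bernoulli distributions, and it is not the JMH-divergence or either of the paper's selection methods. The term names and document-frequency rates are hypothetical.

```python
import math


def kl(p, q):
    """Kullback-Leibler divergence KL(P || Q) for discrete distributions.

    Assumes q[i] > 0 wherever p[i] > 0.
    """
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def jeffreys(p, q):
    """Jeffreys divergence: the symmetrised sum KL(P || Q) + KL(Q || P)."""
    return kl(p, q) + kl(q, p)


def bernoulli(rate):
    """Distribution over (term present, term absent)."""
    return [rate, 1.0 - rate]


# Hypothetical smoothed rates: P(term present | class A), P(term present | class B).
TERM_RATES = {
    "match": (0.05, 0.60),
    "rates": (0.50, 0.05),
    "the": (0.90, 0.88),
}


def rank_terms(term_rates):
    """Order terms from most to least discriminative between the two classes."""
    scores = {
        term: jeffreys(bernoulli(a), bernoulli(b))
        for term, (a, b) in term_rates.items()
    }
    return sorted(scores, key=scores.get, reverse=True)


print(rank_terms(TERM_RATES))
```

A term such as "the", which occurs at nearly the same rate in both classes, gets a divergence close to zero and falls to the bottom of the ranking, which is the behaviour a discriminative feature ranking is meant to have.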
An architecture for the automated detection of textual indicators of reflection
Manual annotation of evidence of reflection expressed in texts is time-consuming, especially as fine-grained models of reflection require extensive training of coders; without such training, inter-coder reliability is low. Automated reflection detection provides a solution to this problem. This paper proposes a new basic architecture for detecting evidence of reflection that allows written accounts to be automatically marked up with certain observable elements of reflection. Furthermore, three promising example annotators of elements of reflection are identified, implemented, and demonstrated: detecting reflective keywords, premises and conclusions of arguments, and questions. Automated detection of reflection appears to have the potential to support learning with technology on at least three levels: it can foster awareness of the reflectivity of one's own writing, it can help in becoming aware of the reflective writing of others, and it can make visible the reflective writing of learning networks as a whole.
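Two of the three annotators mentioned in this abstract (reflective keywords and questions) can be sketched as simple rule-based functions. This is an assumption-laden illustration, not the architecture the paper proposes: the keyword list, the sentence splitter, and all function names are hypothetical, and the argument annotator is omitted.

```python
import re

# Hypothetical keyword list; the abstract does not give the one actually used.
REFLECTIVE_KEYWORDS = {"realised", "realized", "learned", "wonder", "reconsider"}


def split_sentences(text):
    """Naive splitter: cut after '.', '!' or '?'."""
    parts = re.findall(r"[^.!?]+[.!?]?", text)
    return [p.strip() for p in parts if p.strip()]


def annotate_keywords(sentence):
    """Return the reflective keywords found in the sentence, sorted."""
    words = set(re.findall(r"[a-z']+", sentence.lower()))
    return sorted(words & REFLECTIVE_KEYWORDS)


def annotate_question(sentence):
    """Mark a sentence as a question if it ends with a question mark."""
    return sentence.endswith("?")


def annotate(text):
    """Run both annotators over every sentence of a written account."""
    return [
        {
            "sentence": s,
            "keywords": annotate_keywords(s),
            "question": annotate_question(s),
        }
        for s in split_sentences(text)
    ]


ACCOUNT = "I realised my plan was weak. Why did I skip testing? Next time I will start earlier."
for mark in annotate(ACCOUNT):
    print(mark)
```

Keeping each annotator as an independent function means further annotators can be added to `annotate` without touching the existing ones.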
Hierarchical classification for multiple, distributed web databases
The proliferation of online information resources increases the importance of effective and efficient distributed searching. Our research aims to provide an alternative hierarchical categorization and search capability based on a Bayesian network learning algorithm. Our proposed approach, which is grounded in automatic textual analysis of the subject content of online web databases, attempts to address the database selection problem by first classifying web databases into a hierarchy of topic categories. The experimental results reported demonstrate that such a classification approach not only effectively reduces the class search space, but also helps to significantly improve classification accuracy.
KACST Arabic Text Classification Project: Overview and Preliminary Results
Electronically formatted Arabic free texts can be found in abundance these days on the World Wide Web, often linked to commercial enterprises and/or government organizations. Vast tracts of knowledge and relations lie hidden within these texts, knowledge that can be exploited once the correct intelligent tools have been identified and applied. For example, text mining may help with text classification and categorization. Text classification aims to automatically assign text to a predefined category based on identifiable linguistic features. Such a process has various useful applications including, but not restricted to, email spam detection, web page content filtering, and automatic message routing. This paper presents an overview of the King Abdulaziz City for Science and Technology (KACST) Arabic Text Classification Project, along with some preliminary results. The project will contribute to a better understanding and elaboration of Arabic text classification techniques.