89,332 research outputs found
Introduction to the special issue on cross-language algorithms and applications
With the increasingly global nature of our everyday interactions, the need for multilingual technologies to support efficient and effective information access and communication cannot be overemphasized. Computational modeling of language has been the focus of
Natural Language Processing, a subdiscipline of Artificial Intelligence. One of the current challenges for this discipline is to design methodologies and algorithms that are cross-language in order to create multilingual technologies rapidly. The goal of this JAIR special
issue on Cross-Language Algorithms and Applications (CLAA) is to present leading research in this area, with emphasis on developing unifying themes that could lead to the development of the science of multi- and cross-lingualism. In this introduction, we provide the reader with the motivation for this special issue and summarize the contributions of the papers that have been included. The selected papers cover a broad range of cross-lingual technologies including machine translation, domain and language adaptation for sentiment
analysis, cross-language lexical resources, dependency parsing, information retrieval and knowledge representation. We anticipate that this special issue will serve as an invaluable resource for researchers interested in topics of cross-lingual natural language processing. Postprint (published version)
Natural language processing
Beginning with the basic issues of NLP, this chapter aims to chart the major research activities in this area since the last ARIST Chapter in 1996 (Haas, 1996), including: (i) natural language text processing systems: text summarization, information extraction, information retrieval, etc., including domain-specific applications; (ii) natural language interfaces; (iii) NLP in the context of the WWW and digital libraries; and (iv) evaluation of NLP systems.
Machine Learning in Automated Text Categorization
The automated categorization (or classification) of texts into predefined
categories has witnessed a booming interest in the last ten years, due to the
increased availability of documents in digital form and the ensuing need to
organize them. In the research community the dominant approach to this problem
is based on machine learning techniques: a general inductive process
automatically builds a classifier by learning, from a set of preclassified
documents, the characteristics of the categories. The advantages of this
approach over the knowledge engineering approach (consisting in the manual
definition of a classifier by domain experts) are very good effectiveness,
considerable savings in terms of expert manpower, and straightforward
portability to different domains. This survey discusses the main approaches to
text categorization that fall within the machine learning paradigm. We will
discuss in detail issues pertaining to three different problems, namely
document representation, classifier construction, and classifier evaluation. Comment: Accepted for publication on ACM Computing Surveys
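The three problems named in this abstract (document representation, classifier construction, classifier evaluation) can be illustrated with a minimal sketch. It is not taken from the survey: the bag-of-words tokenizer, the multinomial Naive Bayes learner with add-one smoothing, the accuracy measure, and the four toy documents are all assumptions chosen for brevity.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Document representation (assumed): lowercase bag of whitespace tokens.
    return text.lower().split()


def train_naive_bayes(labeled_docs):
    # Classifier construction: learn per-category statistics from
    # preclassified (text, label) pairs.
    class_doc_counts = Counter()
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for text, label in labeled_docs:
        class_doc_counts[label] += 1
        for token in tokenize(text):
            word_counts[label][token] += 1
            vocabulary.add(token)
    return class_doc_counts, word_counts, vocabulary


def classify(model, text):
    class_doc_counts, word_counts, vocabulary = model
    total_docs = sum(class_doc_counts.values())
    best_label, best_score = None, float("-inf")
    for label in sorted(class_doc_counts):
        # Log prior plus add-one smoothed log likelihood of each known token.
        score = math.log(class_doc_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for token in tokenize(text):
            if token not in vocabulary:
                continue  # tokens never seen in training are ignored
            smoothed = (word_counts[label][token] + 1) / (total_words + len(vocabulary))
            score += math.log(smoothed)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def accuracy(model, labeled_docs):
    # Classifier evaluation: fraction of documents assigned their true label.
    correct = sum(1 for text, label in labeled_docs if classify(model, text) == label)
    return correct / len(labeled_docs)


# Hypothetical toy corpus, invented for this illustration only.
TRAINING_DOCS = [
    ("the team won the match", "sport"),
    ("a late goal decided the game", "sport"),
    ("the bank raised interest rates", "finance"),
    ("stocks fell as markets closed", "finance"),
]
MODEL = train_naive_bayes(TRAINING_DOCS)
```

Calling `classify(MODEL, "interest rates fell")` assigns the text to the `finance` category. A real evaluation would measure effectiveness on held-out documents rather than on the training set.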
Personalization via collaboration in web retrieval systems: a context based approach
The World Wide Web is a source of information, and searches on the Web can be analyzed to detect patterns in Web users' search behaviors and information needs to effectively handle the users' subsequent needs. The rationale is that the information need of a user at a particular time point occurs in a particular context, and queries are derived from that need. In this paper, we discuss an extension of our personalization approach that was originally developed for a traditional bibliographic retrieval system but has been adapted and extended with a collaborative model for the Web retrieval environment. We start with a brief introduction of our personalization approach in a traditional information retrieval system. Then, based on the differences in the nature of documents, users and search tasks between traditional and Web retrieval environments, we describe our extensions of integrating collaboration in personalization in the Web retrieval environment. The architecture for the extension integrates machine learning techniques for the purpose of better modeling users' search tasks. Finally, a user-oriented evaluation of Web-based adaptive retrieval systems is presented as an important aspect of the overall strategy for personalization.
Asymmetric bagging and random subspace for support vector machines-based relevance feedback in image retrieval
Relevance feedback schemes based on support vector machines (SVM) have been widely used in content-based image retrieval (CBIR). However, the performance of SVM-based relevance feedback is often poor when the number of labeled positive feedback samples is small. This is mainly due to three reasons: 1) an SVM classifier is unstable on a small-sized training set, 2) SVM's optimal hyperplane may be biased when the positive feedback samples are far fewer than the negative feedback samples, and 3) overfitting happens because the number of feature dimensions is much higher than the size of the training set. In this paper, we develop a mechanism to overcome these problems. To address the first two problems, we propose an asymmetric bagging-based SVM (AB-SVM). For the third problem, we combine the random subspace method and SVM for relevance feedback, which is named random subspace SVM (RS-SVM). Finally, by integrating AB-SVM and RS-SVM, an asymmetric bagging and random subspace SVM (ABRS-SVM) is built to solve these three problems and further improve the relevance feedback performance.
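The sampling scheme behind this abstract can be sketched as follows. The abstract does not spell out the procedure, so the details here are assumptions: each ensemble member keeps all positive samples, bootstraps an equally sized set of negatives (asymmetric bagging), and sees only a random subset of feature dimensions (random subspace). To stay dependency-free, a nearest-centroid rule stands in for the SVM base learner; it is a swapped-in technique, not the authors' classifier, and all function names and data are hypothetical.

```python
import random


def asymmetric_bag(positives, negatives, rng):
    # Keep every positive; bootstrap (sample with replacement) only the
    # negatives, so each member trains on a balanced set.
    sampled_negatives = [rng.choice(negatives) for _ in positives]
    return positives, sampled_negatives


def random_subspace(num_features, subspace_size, rng):
    # Pick a random subset of feature indices for one ensemble member.
    return sorted(rng.sample(range(num_features), subspace_size))


def centroid(vectors, feature_ids):
    return [sum(v[i] for v in vectors) / len(vectors) for i in feature_ids]


def squared_distance(vector, center, feature_ids):
    return sum((vector[i] - c) ** 2 for i, c in zip(feature_ids, center))


def train_ensemble(positives, negatives, num_models, subspace_size, seed=0):
    rng = random.Random(seed)
    num_features = len(positives[0])
    ensemble = []
    for _ in range(num_models):
        pos, neg = asymmetric_bag(positives, negatives, rng)
        feature_ids = random_subspace(num_features, subspace_size, rng)
        # Stand-in base learner: one centroid per class (an SVM in the paper).
        ensemble.append((feature_ids, centroid(pos, feature_ids), centroid(neg, feature_ids)))
    return ensemble


def predict_relevant(ensemble, vector):
    # Majority vote over members; each votes for the nearer class centroid.
    votes = sum(
        1
        for feature_ids, pos_c, neg_c in ensemble
        if squared_distance(vector, pos_c, feature_ids)
        < squared_distance(vector, neg_c, feature_ids)
    )
    return votes * 2 > len(ensemble)


# Hypothetical feedback samples: few positives, more negatives, 4 features.
POSITIVES = [[0.9, 1.0, 1.1, 0.9], [1.1, 0.9, 1.0, 1.0]]
NEGATIVES = [
    [0.0, 0.1, 0.0, 0.2], [0.1, 0.0, 0.2, 0.0], [0.2, 0.1, 0.0, 0.1],
    [0.0, 0.0, 0.1, 0.1], [0.1, 0.2, 0.1, 0.0], [0.0, 0.1, 0.2, 0.2],
]
ENSEMBLE = train_ensemble(POSITIVES, NEGATIVES, num_models=5, subspace_size=2)
```

With these toy samples, `predict_relevant(ENSEMBLE, [1.0, 1.0, 1.0, 1.0])` returns `True` because every member places the vector closer to the positive centroid.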
Local search: A guide for the information retrieval practitioner
There are a number of combinatorial optimisation problems in information retrieval in which the use of local search methods is worthwhile. The purpose of this paper is to show how local search can be used to solve some well-known tasks in information retrieval (IR), how previous research in the field is piecemeal, bereft of a structure and methodologically flawed, and to suggest more rigorous ways of applying local search methods to solve IR problems. We provide a query-based taxonomy for analysing the use of local search in IR tasks and an overview of issues such as fitness functions, statistical significance and test collections when conducting experiments on combinatorial optimisation problems. The paper gives a guide on the pitfalls and problems for IR practitioners who wish to use local search to solve their research issues, and gives practical advice on the use of such methods. The query-based taxonomy is a novel structure which can be used by the IR practitioner in order to examine the use of local search in IR.
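As a rough illustration of local search applied to an IR task, the sketch below uses steepest-ascent hill climbing to choose query terms. Everything in it is an assumption made for the example, not the paper's method: the Boolean OR retrieval model, the F1 fitness function, the one-term-flip neighbourhood, and the four-document toy collection with its relevance judgements.

```python
def retrieve(query_terms, documents):
    # Boolean OR retrieval (assumed): a document matches if it shares any term.
    return {doc_id for doc_id, terms in documents.items() if terms & query_terms}


def f1_fitness(query_terms, documents, relevant_ids):
    # Fitness function (assumed): F1 of the retrieved set against the judgements.
    retrieved = retrieve(query_terms, documents)
    hits = len(retrieved & relevant_ids)
    if hits == 0:
        return 0.0
    precision = hits / len(retrieved)
    recall = hits / len(relevant_ids)
    return 2 * precision * recall / (precision + recall)


def hill_climb(candidate_terms, documents, relevant_ids):
    # Start from the empty query; a neighbour differs by adding or removing
    # exactly one candidate term.
    current = frozenset()
    current_fitness = f1_fitness(current, documents, relevant_ids)
    while True:
        best, best_fitness = current, current_fitness
        for term in candidate_terms:
            neighbour = current ^ {term}  # flip membership of one term
            fitness = f1_fitness(neighbour, documents, relevant_ids)
            if fitness > best_fitness:
                best, best_fitness = neighbour, fitness
        if best == current:
            # No neighbour is strictly better: a local optimum was reached.
            return current, current_fitness
        current, current_fitness = best, best_fitness


# Hypothetical test collection with relevance judgements.
DOCUMENTS = {
    "d1": {"svm", "image", "retrieval"},
    "d2": {"image", "feedback"},
    "d3": {"drilling", "report"},
    "d4": {"retrieval", "web"},
}
RELEVANT = {"d1", "d2"}
CANDIDATE_TERMS = ["image", "retrieval", "drilling", "web"]

BEST_QUERY, BEST_FITNESS = hill_climb(CANDIDATE_TERMS, DOCUMENTS, RELEVANT)
```

On this collection the search stops at the single-term query `{"image"}`, which retrieves exactly the two relevant documents. Hill climbing only guarantees a local optimum, which is one reason the choice of fitness function and test collection matters when reporting such experiments.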
Sequence Mining and Pattern Analysis in Drilling Reports with Deep Natural Language Processing
Drilling activities in the oil and gas industry have been reported over
decades for thousands of wells on a daily basis, yet the analysis of this text
at large scale for information retrieval, sequence mining, and pattern analysis
is very challenging. Drilling reports contain interpretations written by
drillers from noting measurements in downhole sensors and surface equipment,
and can be used for operation optimization and accident mitigation. In this
initial work, a methodology is proposed for automatic classification of
sentences written in drilling reports into three relevant labels (EVENT,
SYMPTOM and ACTION) for hundreds of wells in an actual field. Some of the main
challenges in the text corpus were overcome, which include the high frequency
of technical symbols, mistyping/abbreviation of technical terms, and the
presence of incomplete sentences in the drilling reports. We obtain
state-of-the-art classification accuracy within this technical language and
illustrate advanced queries enabled by the tool. Comment: 7 pages, 14 figures, technical report
The 'what' and 'how' of learning in design, invited paper
Previous experiences hold a wealth of knowledge which we often take for granted and use unknowingly through our everyday working lives. In design, those experiences can play a crucial role in the success or failure of a design project, having a great deal of influence on the quality, cost and development time of a product. But how can we empower computer-based design systems to acquire this knowledge? How would we use such systems to support design? This paper outlines some of the work which has been carried out in applying and developing Machine Learning techniques to support the design activity, particularly in utilising previous designs and learning the design process.