27,227 research outputs found
Machine Learning in Automated Text Categorization
The automated categorization (or classification) of texts into predefined
categories has witnessed a booming interest in the last ten years, due to the
increased availability of documents in digital form and the ensuing need to
organize them. In the research community the dominant approach to this problem
is based on machine learning techniques: a general inductive process
automatically builds a classifier by learning, from a set of preclassified
documents, the characteristics of the categories. The advantages of this
approach over the knowledge engineering approach (consisting in the manual
definition of a classifier by domain experts) are a very good effectiveness,
considerable savings in terms of expert manpower, and straightforward
portability to different domains. This survey discusses the main approaches to
text categorization that fall within the machine learning paradigm. We will
discuss in detail issues pertaining to three different problems, namely
document representation, classifier construction, and classifier evaluation.Comment: Accepted for publication on ACM Computing Survey
Advanced language modeling approaches, case study: Expert search
This tutorial gives a clear and detailed overview of advanced language modeling approaches and tools, including the use of document priors, translation models, relevance models, parsimonious models and expectation maximization training. Expert search will be used as a case study to explain the consequences of modeling assumptions
Language Models
Contains fulltext :
227630.pdf (preprint version ) (Open Access
Learning to Rank Academic Experts in the DBLP Dataset
Expert finding is an information retrieval task that is concerned with the
search for the most knowledgeable people with respect to a specific topic, and
the search is based on documents that describe people's activities. The task
involves taking a user query as input and returning a list of people who are
sorted by their level of expertise with respect to the user query. Despite
recent interest in the area, the current state-of-the-art techniques lack in
principled approaches for optimally combining different sources of evidence.
This article proposes two frameworks for combining multiple estimators of
expertise. These estimators are derived from textual contents, from
graph-structure of the citation patterns for the community of experts, and from
profile information about the experts. More specifically, this article explores
the use of supervised learning to rank methods, as well as rank aggregation
approaches, for combing all of the estimators of expertise. Several supervised
learning algorithms, which are representative of the pointwise, pairwise and
listwise approaches, were tested, and various state-of-the-art data fusion
techniques were also explored for the rank aggregation framework. Experiments
that were performed on a dataset of academic publications from the Computer
Science domain attest the adequacy of the proposed approaches.Comment: Expert Systems, 2013. arXiv admin note: text overlap with
arXiv:1302.041
Disambiguation strategies for cross-language information retrieval
This paper gives an overview of tools and methods for Cross-Language Information Retrieval (CLIR) that are developed within the Twenty-One project. The tools and methods are evaluated with the TREC CLIR task document collection using Dutch queries on the English document base. The main issue addressed here is an evaluation of two approaches to disambiguation. The underlying question is whether a lot of effort should be put in finding the correct translation for each query term before searching, or whether searching with more than one possible translation leads to better results? The experimental study suggests that the quality of search methods is more important than the quality of disambiguation methods. Good retrieval methods are able to disambiguate translated queries implicitly during searching
Probabilistic latent semantic analysis as a potential method for integrating spatial data concepts
In this paper we explore the use of Probabilistic Latent Semantic Analysis (PLSA) as a method for quantifying semantic differences between land cover classes. The results are promising, revealing ‘hidden’ or not easily discernible data concepts. PLSA provides a ‘bottom up’ approach to interoperability problems for users in the face of ‘top down’ solutions provided by formal ontologies. We note the potential for a meta-problem of how to interpret the concepts and the need for further research to reconcile the top-down and bottom-up approaches
Neural Networks for Information Retrieval
Machine learning plays a role in many aspects of modern IR systems, and deep
learning is applied in all of them. The fast pace of modern-day research has
given rise to many different approaches for many different IR problems. The
amount of information available can be overwhelming both for junior students
and for experienced researchers looking for new research topics and directions.
Additionally, it is interesting to see what key insights into IR problems the
new technologies are able to give us. The aim of this full-day tutorial is to
give a clear overview of current tried-and-trusted neural methods in IR and how
they benefit IR research. It covers key architectures, as well as the most
promising future directions.Comment: Overview of full-day tutorial at SIGIR 201
CBR and MBR techniques: review for an application in the emergencies domain
The purpose of this document is to provide an in-depth analysis of current reasoning engine practice and the integration strategies of Case Based Reasoning and Model Based Reasoning that will be used in the design and development of the RIMSAT system.
RIMSAT (Remote Intelligent Management Support and Training) is a European Commission funded project designed to:
a.. Provide an innovative, 'intelligent', knowledge based solution aimed at improving the quality of critical decisions
b.. Enhance the competencies and responsiveness of individuals and organisations involved in highly complex, safety critical incidents - irrespective of their location.
In other words, RIMSAT aims to design and implement a decision support system that using Case Base Reasoning as well as Model Base Reasoning technology is applied in the management of emergency situations.
This document is part of a deliverable for RIMSAT project, and although it has been done in close contact with the requirements of the project, it provides an overview wide enough for providing a state of the art in integration strategies between CBR and MBR technologies.Postprint (published version
Composite load spectra for select space propulsion structural components
The work performed to develop composite load spectra (CLS) for the Space Shuttle Main Engine (SSME) using probabilistic methods. The three methods were implemented to be the engine system influence model. RASCAL was chosen to be the principal method as most component load models were implemented with the method. Validation of RASCAL was performed. High accuracy comparable to the Monte Carlo method can be obtained if a large enough bin size is used. Generic probabilistic models were developed and implemented for load calculations using the probabilistic methods discussed above. Each engine mission, either a real fighter or a test, has three mission phases: the engine start transient phase, the steady state phase, and the engine cut off transient phase. Power level and engine operating inlet conditions change during a mission. The load calculation module provides the steady-state and quasi-steady state calculation procedures with duty-cycle-data option. The quasi-steady state procedure is for engine transient phase calculations. In addition, a few generic probabilistic load models were also developed for specific conditions. These include the fixed transient spike model, the poison arrival transient spike model, and the rare event model. These generic probabilistic load models provide sufficient latitude for simulating loads with specific conditions. For SSME components, turbine blades, transfer ducts, LOX post, and the high pressure oxidizer turbopump (HPOTP) discharge duct were selected for application of the CLS program. They include static pressure loads and dynamic pressure loads for all four components, centrifugal force for the turbine blade, temperatures of thermal loads for all four components, and structural vibration loads for the ducts and LOX posts
- …