113,781 research outputs found
Introduction to the special issue on cross-language algorithms and applications
With the increasingly global nature of our everyday interactions, the need for multilingual technologies to support efficient and effective information access and communication cannot be overemphasized. Computational modeling of language has been the focus of
Natural Language Processing, a subdiscipline of Artificial Intelligence. One of the current challenges for this discipline is to design methodologies and algorithms that are cross-language in order to create multilingual technologies rapidly. The goal of this JAIR special
issue on Cross-Language Algorithms and Applications (CLAA) is to present leading research in this area, with emphasis on developing unifying themes that could lead to the development of the science of multi- and cross-lingualism. In this introduction, we provide the reader with the motivation for this special issue and summarize the contributions of the papers that have been included. The selected papers cover a broad range of cross-lingual technologies including machine translation, domain and language adaptation for sentiment
analysis, cross-language lexical resources, dependency parsing, information retrieval and knowledge representation. We anticipate that this special issue will serve as an invaluable resource for researchers interested in topics of cross-lingual natural language processing.
Local search: A guide for the information retrieval practitioner
There are a number of combinatorial optimisation problems in information retrieval in which the use of local search methods is worthwhile. The purpose of this paper is to show how local search can be used to solve some well-known tasks in information retrieval (IR), how previous research in the field is piecemeal, bereft of a structure and methodologically flawed, and to suggest more rigorous ways of applying local search methods to solve IR problems. We provide a query-based taxonomy for analysing the use of local search in IR tasks and an overview of issues such as fitness functions, statistical significance and test collections when conducting experiments on combinatorial optimisation problems. The paper gives a guide on the pitfalls and problems for IR practitioners who wish to use local search to solve their research issues, and gives practical advice on the use of such methods. The query-based taxonomy is a novel structure which can be used by the IR practitioner in order to examine the use of local search in IR.
Organising music for movies
Purpose - The purpose of this paper is to examine and discuss the classification of commercial popular music when large digital collections are organised for use in films.
Design/methodology/approach - A range of systems are investigated and their organization is discussed, focusing on an analysis of the metadata used by the systems and choices given to the end-user to construct a query. The indexing of the music is compared to a checklist of music facets which has been derived from recent musicological literature on semiotic analysis of popular music. These facets include aspects of communication, cultural and musical expression, codes and competences.
Findings - In addition to bibliographic detail, descriptive metadata is used to organise music in these systems. Genre, subject and mood are used widely; some musical facets also appear. The extent to which attempts are being made to reflect these facets in the organization of these systems is discussed. A number of recommendations are made which may help to improve this process.
Originality/value - This paper discusses an area of creative music search which has not previously been investigated in any depth and makes recommendations based on findings and the literature which may be used in the development of commercial systems as well as making a contribution to the literature.
Generalized Approximate Survey Propagation for High-Dimensional Estimation
In Generalized Linear Estimation (GLE) problems, we seek to estimate a signal
that is observed through a linear transform followed by a component-wise,
possibly nonlinear and noisy, channel. In the Bayesian optimal setting,
Generalized Approximate Message Passing (GAMP) is known to achieve optimal
performance for GLE. However, its performance can significantly degrade
whenever there is a mismatch between the assumed and the true generative model,
a situation frequently encountered in practice. In this paper, we propose a new
algorithm, named Generalized Approximate Survey Propagation (GASP), for solving
GLE in the presence of prior or model mis-specifications. As a prototypical
example, we consider the phase retrieval problem, where we show that GASP
outperforms the corresponding GAMP, reducing the reconstruction threshold and,
for certain choices of its parameters, approaching Bayesian optimal
performance. Furthermore, we present a set of State Evolution equations that
exactly characterize the dynamics of GASP in the high-dimensional limit.
Science Models as Value-Added Services for Scholarly Information Systems
The paper introduces scholarly Information Retrieval (IR) as a further
dimension that should be considered in the science modeling debate. The IR use
case is seen as a validation model of the adequacy of science models in
representing and predicting structure and dynamics in science. Particular
conceptualizations of scholarly activity and structures in science are used as
value-added search services to improve retrieval quality: a co-word model
depicting the cognitive structure of a field (used for query expansion), the
Bradford law of information concentration, and a model of co-authorship
networks (both used for re-ranking search results). An evaluation of the
retrieval quality when science-model-driven services are used showed that
the proposed models do benefit retrieval quality.
From an IR perspective, the models studied are therefore verified as expressive
conceptualizations of central phenomena in science. Thus, it could be shown
that the IR perspective can significantly contribute to a better understanding
of scholarly structures and activities. Comment: 26 pages, to appear in Scientometrics
The Most Influential Paper Gerard Salton Never Wrote
Gerard Salton is often credited with developing the vector space model
(VSM) for information retrieval (IR). Citations to Salton give the impression
that the VSM must have been articulated as an IR model sometime between
1970 and 1975. However, the VSM as it is understood today evolved over a
longer time period than is usually acknowledged, and an articulation of the
model and its assumptions did not appear in print until several years after
those assumptions had been criticized and alternative models proposed. An
often cited overview paper titled "A Vector Space Model for Information
Retrieval" (alleged to have been published in 1975) does not exist, and
citations to it represent a confusion of two 1975 articles, neither of which
was an overview of the VSM as a model of information retrieval. Until the
late 1970s, Salton did not present vector spaces as models of IR generally
but rather as models of specific computations. Citations to the phantom
paper reflect an apparently widely held misconception that the operational
features and explanatory devices now associated with the VSM must have
been introduced at the same time it was first proposed as an IR model.
Disambiguation strategies for cross-language information retrieval
This paper gives an overview of tools and methods for Cross-Language Information Retrieval (CLIR) that were developed within the Twenty-One project. The tools and methods are evaluated with the TREC CLIR task document collection using Dutch queries on the English document base. The main issue addressed here is an evaluation of two approaches to disambiguation. The underlying question is whether a lot of effort should be put into finding the correct translation for each query term before searching, or whether searching with more than one possible translation leads to better results. The experimental study suggests that the quality of search methods is more important than the quality of disambiguation methods. Good retrieval methods are able to disambiguate translated queries implicitly during searching.
A Survey of Volunteered Open Geo-Knowledge Bases in the Semantic Web
Over the past decade, rapid advances in web technologies, coupled with
innovative models of spatial data collection and consumption, have generated a
robust growth in geo-referenced information, resulting in spatial information
overload. Increasing 'geographic intelligence' in traditional text-based
information retrieval has become a prominent approach to respond to this issue
and to fulfill users' spatial information needs. Numerous efforts in the
Semantic Geospatial Web, Volunteered Geographic Information (VGI), and the
Linking Open Data initiative have converged in a constellation of open
knowledge bases, freely available online. In this article, we survey these open
knowledge bases, focusing on their geospatial dimension. Particular attention
is devoted to the crucial issue of the quality of geo-knowledge bases, as well
as of crowdsourced data. A new knowledge base, the OpenStreetMap Semantic
Network, is outlined as our contribution to this area. Research directions in
information integration and Geographic Information Retrieval (GIR) are then
reviewed, with a critical discussion of their current limitations and future
prospects.