7,122 research outputs found
A grid-based infrastructure for distributed retrieval
In large-scale distributed retrieval, challenges of latency, heterogeneity, and dynamicity emphasise the importance of infrastructural support in reducing the development costs of state-of-the-art solutions. We present a service-based infrastructure for distributed retrieval which blends middleware facilities and a design framework to âliftâ the resource sharing approach and the computational services of a European Grid platform into the domain of e-Science applications. In this paper, we give an overview of the DILIGENT Search Framework and illustrate its exploitation in the ïŹeld of Earth Science
Machine Learning in Automated Text Categorization
The automated categorization (or classification) of texts into predefined
categories has witnessed a booming interest in the last ten years, due to the
increased availability of documents in digital form and the ensuing need to
organize them. In the research community the dominant approach to this problem
is based on machine learning techniques: a general inductive process
automatically builds a classifier by learning, from a set of preclassified
documents, the characteristics of the categories. The advantages of this
approach over the knowledge engineering approach (consisting in the manual
definition of a classifier by domain experts) are a very good effectiveness,
considerable savings in terms of expert manpower, and straightforward
portability to different domains. This survey discusses the main approaches to
text categorization that fall within the machine learning paradigm. We will
discuss in detail issues pertaining to three different problems, namely
document representation, classifier construction, and classifier evaluation.Comment: Accepted for publication on ACM Computing Survey
Towards Cleaning-up Open Data Portals: A Metadata Reconciliation Approach
This paper presents an approach for metadata reconciliation, curation and
linking for Open Governamental Data Portals (ODPs). ODPs have been lately the
standard solution for governments willing to put their public data available
for the society. Portal managers use several types of metadata to organize the
datasets, one of the most important ones being the tags. However, the tagging
process is subject to many problems, such as synonyms, ambiguity or
incoherence, among others. As our empiric analysis of ODPs shows, these issues
are currently prevalent in most ODPs and effectively hinders the reuse of Open
Data. In order to address these problems, we develop and implement an approach
for tag reconciliation in Open Data Portals, encompassing local actions related
to individual portals, and global actions for adding a semantic metadata layer
above individual portals. The local part aims to enhance the quality of tags in
a single portal, and the global part is meant to interlink ODPs by establishing
relations between tags.Comment: 8 pages,10 Figures - Under Revision for ICSC201
Generating indicative-informative summaries with SumUM
We present and evaluate SumUM, a text summarization system that takes a raw technical text as input and produces an indicative informative summary. The indicative part of the summary identifies the topics of the document, and the informative part elaborates on some of these topics according to the reader's interest. SumUM motivates the topics, describes entities, and defines concepts. It is a first step for exploring the issue of dynamic summarization. This is accomplished through a process of shallow syntactic and semantic analysis, concept identification, and text regeneration. Our method was developed through the study of a corpus of abstracts written by professional abstractors. Relying on human judgment, we have evaluated indicativeness, informativeness, and text acceptability of the automatic summaries. The results thus far indicate good performance when compared with other summarization technologies
ACL Anthology Helper: A Tool to Retrieve and Manage Literature from ACL Anthology
The ACL Anthology is an online repository that serves as a comprehensive
collection of publications in the field of natural language processing (NLP)
and computational linguistics (CL). This paper presents a tool called ``ACL
Anthology Helper''. It automates the process of parsing and downloading papers
along with their meta-information, which are then stored in a local MySQL
database. This allows for efficient management of the local papers using a wide
range of operations, including "where," "group," "order," and more. By
providing over 20 operations, this tool significantly enhances the retrieval of
literature based on specific conditions. Notably, this tool has been
successfully utilised in writing a survey paper (Tang et al.,2022a). By
introducing the ACL Anthology Helper, we aim to enhance researchers' ability to
effectively access and organise literature from the ACL Anthology. This tool
offers a convenient solution for researchers seeking to explore the ACL
Anthology's vast collection of publications while allowing for more targeted
and efficient literature retrieval
From Frequency to Meaning: Vector Space Models of Semantics
Computers understand very little of the meaning of human language. This
profoundly limits our ability to give instructions to computers, the ability of
computers to explain their actions to us, and the ability of computers to
analyse and process text. Vector space models (VSMs) of semantics are beginning
to address these limits. This paper surveys the use of VSMs for semantic
processing of text. We organize the literature on VSMs according to the
structure of the matrix in a VSM. There are currently three broad classes of
VSMs, based on term-document, word-context, and pair-pattern matrices, yielding
three classes of applications. We survey a broad range of applications in these
three categories and we take a detailed look at a specific open source project
in each category. Our goal in this survey is to show the breadth of
applications of VSMs for semantics, to provide a new perspective on VSMs for
those who are already familiar with the area, and to provide pointers into the
literature for those who are less familiar with the field
SemEval-2016 Task 13: Taxonomy Extraction Evaluation (TExEval-2)
This paper describes the second edition of the shared task on Taxonomy Extraction Evaluation organised as part of SemEval 2016. This task aims to extract hypernym-hyponym relations between a given list of domain-specific terms and then to construct a domain taxonomy based on them. TExEval-2 introduced a multilingual setting for this task, covering four different languages including English, Dutch, Italian and French from domains as diverse as environment, food and science. A total of
62 runs submitted by 5 different teams were
evaluated using structural measures, by comparison with gold standard taxonomies and by manual quality assessment of novel relations.Science Foundation Ireland (SFI) under Grant Number SFI/12/RC/2289 (INSIGHT
CHORUS Deliverable 2.2: Second report - identification of multi-disciplinary key issues for gap analysis toward EU multimedia search engines roadmap
After addressing the state-of-the-art during the first year of Chorus and establishing the existing landscape in
multimedia search engines, we have identified and analyzed gaps within European research effort during our second year.
In this period we focused on three directions, notably technological issues, user-centred issues and use-cases and socio-
economic and legal aspects. These were assessed by two central studies: firstly, a concerted vision of functional breakdown
of generic multimedia search engine, and secondly, a representative use-cases descriptions with the related discussion on
requirement for technological challenges. Both studies have been carried out in cooperation and consultation with the
community at large through EC concertation meetings (multimedia search engines cluster), several meetings with our
Think-Tank, presentations in international conferences, and surveys addressed to EU projects coordinators as well as
National initiatives coordinators. Based on the obtained feedback we identified two types of gaps, namely core
technological gaps that involve research challenges, and âenablersâ, which are not necessarily technical research
challenges, but have impact on innovation progress. New socio-economic trends are presented as well as emerging legal
challenges
- âŠ