Automatic Normalization and Annotation for Discovering Semantic Mappings
Schema matching is the problem of finding relationships among concepts across heterogeneous data sources (heterogeneous in format and in structure). Starting from the “hidden meaning” associated with schema labels (i.e. class/attribute names), it is possible to discover relationships among the elements of different schemata. Lexical annotation (i.e. annotation w.r.t. a thesaurus/lexical resource) helps in associating a “meaning” with schema labels. However, the accuracy of semi-automatic lexical annotation methods on real-world schemata suffers from the abundance of non-dictionary words such as compound nouns and word abbreviations. In this work, we address this problem by proposing a method for schema label normalization which increases the number of comparable labels. Unlike other solutions, the method semi-automatically expands abbreviations and annotates compound terms with minimal manual effort. We empirically show that our normalization method helps in the identification of similarities among schema elements of different data sources, thus improving schema matching accuracy.
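The normalization idea described above can be sketched in a few lines. This is an illustrative approximation, not the paper's actual method: the abbreviation table and the camelCase/underscore tokenization rules are assumptions.

```python
import re

# Hypothetical abbreviation table; a real system would learn these or
# look them up semi-automatically rather than hard-code them.
ABBREVIATIONS = {"qty": "quantity", "amt": "amount",
                 "cust": "customer", "no": "number"}

def tokenize_label(label):
    """Split a schema label on underscores, spaces, and camelCase boundaries."""
    parts = re.split(r"[_\s]+", label)
    tokens = []
    for part in parts:
        tokens.extend(re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", part))
    return [t.lower() for t in tokens if t]

def normalize_label(label):
    """Expand known abbreviations so more label tokens become dictionary words."""
    return [ABBREVIATIONS.get(tok, tok) for tok in tokenize_label(label)]
```

For example, `normalize_label("custNo")` yields `["customer", "number"]`, turning a non-dictionary label into tokens a lexical resource can annotate.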
Uncertainty in data integration systems: automatic generation of probabilistic relationships
This paper proposes a method for the automatic discovery of probabilistic relationships in the environment of data integration systems. Dynamic data integration systems extend the architecture of current data integration systems by modeling uncertainty at their core. Our method is based on probabilistic word sense disambiguation (PWSD), which automatically performs lexical annotation (i.e. annotation w.r.t. a thesaurus/lexical resource) of the schemata of a given set of data sources to be integrated. From the annotated schemata and the relationships defined in the thesaurus, we derive the probabilistic lexical relationships among schema elements. Lexical relationships are collected in the Probabilistic Common Thesaurus (PCT), as well as structural relationships.
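The derivation step can be sketched as follows: given PWSD's probability distribution over senses for each label, the probability of a lexical relationship is the total probability mass of sense pairs that stand in that relation in the thesaurus. This is a minimal sketch; the sense identifiers and the `SENSE_RELATIONS` table are hypothetical stand-ins for a WordNet-like resource.

```python
# Hypothetical sense-level relations from a WordNet-like thesaurus:
# maps an ordered pair of senses to a lexical relation.
SENSE_RELATIONS = {
    ("person.n.01", "employee.n.01"): "hypernym",
    ("person.n.01", "worker.n.01"): "hypernym",
}

def relationship_probability(dist_a, dist_b, relation):
    """P(relation holds between two labels) = sum of p_a * p_b over all
    sense pairs whose thesaurus relation matches, where p_a and p_b are
    the PWSD sense probabilities of the two schema labels."""
    total = 0.0
    for sense_a, p_a in dist_a.items():
        for sense_b, p_b in dist_b.items():
            if SENSE_RELATIONS.get((sense_a, sense_b)) == relation:
                total += p_a * p_b
    return total
```

With `dist_a = {"person.n.01": 0.8, "character.n.01": 0.2}` and `dist_b = {"employee.n.01": 0.7, "worker.n.01": 0.3}`, the hypernym relationship gets probability 0.8, which would be recorded in the PCT.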
Robust Subgraph Generation Improves Abstract Meaning Representation Parsing
The Abstract Meaning Representation (AMR) is a representation for open-domain
rich semantics, with potential use in fields like event extraction and machine
translation. Node generation, typically done using a simple dictionary lookup,
is currently an important limiting factor in AMR parsing. We propose a small
set of actions that derive AMR subgraphs by transformations on spans of text,
which allows for more robust learning of this stage. Our set of construction
actions generalize better than the previous approach, and can be learned with a
simple classifier. We improve on the previous state-of-the-art result for AMR
parsing, boosting end-to-end performance by 3 F1 on both the LDC2013E117 and
LDC2014T12 datasets.
Comment: To appear in ACL 201
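The notion of deriving node subgraphs via a small action set, rather than pure dictionary lookup, can be illustrated schematically. The action names and the lookup table below are invented for illustration and are not the paper's actual inventory.

```python
# Hypothetical span -> AMR concept dictionary (the baseline lookup).
DICT = {"ran": "run-01"}

def apply_action(action, span):
    """Turn a text span into an AMR node or subgraph fragment.
    A classifier would choose which action to apply to each span."""
    if action == "IDENTITY":      # the concept is the lowercased span itself
        return span.lower()
    if action == "LOOKUP":        # fall back to a dictionary lookup
        return DICT.get(span.lower())
    if action == "NAME":          # wrap the span as a named-entity subgraph
        return {"instance": "name", "ops": span.split()}
    raise ValueError(f"unknown action: {action}")
```

Because the actions are transformations on spans rather than memorized entries, a learned action classifier can generalize to spans never seen in the dictionary.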
MELIS: an incremental method for the lexical annotation of domain ontologies
In this paper, we present MELIS (Meaning Elicitation and Lexical Integration System), a method and a software tool for enabling an incremental process of automatic annotation of local schemas (e.g. relational database schemas, directory trees) with lexical information. The distinguishing and original feature of MELIS is its incremental process: the higher the number of schemas which are processed, the more background/domain knowledge is accumulated in the system (a portion of the domain ontology is learned at every step), and the better the performance of the system in annotating new schemas. MELIS has been tested as a component of the MOMIS-Ontology Builder, a framework able to create a domain ontology representing a set of selected data sources, described with a standard W3C language wherein concepts and attributes are annotated according to a lexical reference database. We describe the MELIS component within the MOMIS-Ontology Builder framework and provide some experimental results of MELIS as a standalone tool and as a component integrated in MOMIS.
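The incremental principle can be sketched minimally: terms learned while annotating earlier schemas are remembered and reused on later ones. The class and its interface below are assumptions for illustration, not MELIS's actual design.

```python
class IncrementalAnnotator:
    """Minimal sketch of incremental lexical annotation: domain terms
    learned on earlier schemas are cumulated and reused on new ones."""

    def __init__(self, lexicon):
        self.lexicon = dict(lexicon)  # base resource: term -> sense
        self.learned = {}             # domain knowledge cumulated over runs

    def annotate(self, schema_labels):
        """Annotate labels found in the base lexicon or learned so far."""
        annotations = {}
        for label in schema_labels:
            sense = self.lexicon.get(label) or self.learned.get(label)
            if sense:
                annotations[label] = sense
        return annotations

    def learn(self, label, sense):
        """Feedback step: each processed schema may add domain knowledge."""
        self.learned[label] = sense
```

A label such as `sku` is missed on the first schema but, once learned, is annotated automatically on every subsequent schema, which is the source of the improving performance the abstract describes.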
Semantic Web Services Provisioning
Semantic Web Services constitute an important research area, where various underlying frameworks, such as WSMO and OWL-S, define Semantic Web
ontologies to describe Web services, so they can be automatically discovered,
composed, and invoked. Service discovery has been traditionally interpreted
as a functional filter in current Semantic Web Services frameworks, frequently
performed by Description Logics reasoners. However, semantic provisioning
has to be performed taking Quality-of-Service (QoS) into account, defining
user preferences that enable QoS-aware Semantic Web Service selection.
Nowadays, the research focus is on QoS-aware processes, so current proposals are developing the field by providing QoS support to semantic
provisioning, especially in selection processes. These processes lead to optimization problems, where the best service among a set of services has to be
selected, so Description Logics cannot be used in this context. Furthermore,
user preferences have to be semantically defined so they can be used within
selection processes.
There are several proposals that extend Semantic Web Services frameworks
to allow QoS-aware semantic provisioning. However, the proposed selection
techniques are tightly coupled with their proposed extensions, most of them
being implemented ad hoc. Thus, there is a semantic gap between functional
descriptions (usually using WSMO or OWL-S) and user preferences, which are
specific to each proposal, using different ontologies or even non-semantic descriptions, and depending on the corresponding ad hoc selection technique.
In this report, we give an overview of the most important Semantic Web Services frameworks, showing a comparison between them. Then, a thorough
analysis of state-of-the-art proposals on QoS-aware semantic provisioning and
user preferences descriptions is presented, discussing their applicability, advantages, and defects. Results from this analysis motivate our research
work, which has already been materialized in two early contributions.
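The optimization problem behind QoS-aware selection can be sketched with a standard weighted-utility formulation: normalize each QoS attribute, weight it by the user's preference, and pick the service with the highest score. The attribute names, weights, and bounds below are hypothetical; real proposals express the preferences semantically rather than as plain dictionaries.

```python
def qos_utility(service, weights, bounds):
    """Weighted sum of min-max normalized QoS attributes. Benefit
    attributes (e.g. availability) score the normalized value directly;
    cost attributes (e.g. latency) score its complement."""
    score = 0.0
    for attr, weight in weights.items():
        lo, hi, higher_is_better = bounds[attr]
        norm = (service[attr] - lo) / (hi - lo) if hi > lo else 1.0
        score += weight * (norm if higher_is_better else 1.0 - norm)
    return score

def select_best(services, weights, bounds):
    """The selection step: an argmax, not a Description Logics inference."""
    return max(services, key=lambda s: qos_utility(s, weights, bounds))
```

This makes concrete why Description Logics reasoners do not fit here: the task is ranking over numeric utilities, not checking subsumption between concept descriptions.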
Toward automatic extraction of expressive elements from motion pictures: tempo
This paper addresses the challenge of bridging the semantic gap that exists between the simplicity of features that can currently be computed in automated content indexing systems and the richness of semantics in user queries posed for media search and retrieval. It proposes a unique computational approach to the extraction of expressive elements of motion pictures for deriving high-level semantics of the stories portrayed, thus enabling rich video annotation and interpretation. This approach, motivated and directed by the existing cinematic conventions known as film grammar, as a first step toward demonstrating its effectiveness, uses the attributes of motion and shot length to define and compute a novel measure of the tempo of a movie. Tempo flow plots are defined and derived for a number of full-length movies, and edge analysis is performed, leading to the extraction of dramatic story sections and events signaled by their unique tempo. The results confirm tempo as a useful high-level semantic construct in its own right and a promising component of others such as the rhythm, tone, or mood of a film. In addition to the development of this computable tempo measure, a study is conducted on the usefulness of biasing it toward either of its constituents, namely motion or shot length. Finally, a refinement is made to the shot-length normalizing mechanism, driven by the peculiar characteristics of the shot-length distribution exhibited by movies. Results of these additional studies, as well as possible applications and limitations, are discussed.
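The core intuition, that high motion and short shots both raise tempo, can be sketched as a per-shot combination of the two normalized constituents. This is an illustrative formula only; the paper's actual measure, normalization, and bias parameter are not given in the abstract, so the `alpha` weighting below is an assumption.

```python
def tempo_curve(motions, shot_lengths, alpha=0.5):
    """One illustrative per-shot tempo measure: the motion constituent is
    normalized to [0, 1]; the shot-length constituent is inverted (shorter
    shots mean faster cutting, hence higher tempo). alpha biases the
    measure toward motion (1.0) or shot length (0.0)."""
    m_max = max(motions) or 1.0
    s_max = max(shot_lengths) or 1.0
    return [alpha * (m / m_max) + (1 - alpha) * (1 - s / s_max)
            for m, s in zip(motions, shot_lengths)]
```

Edge analysis over the resulting curve (e.g. thresholding its first differences) would then mark the boundaries of dramatic story sections, in the spirit of the tempo flow plots described above.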
Recovering Grammar Relationships for the Java Language Specification
Grammar convergence is a method that helps discover relationships between
different grammars of the same language or of different language versions. The key
element of the method is the operational, transformation-based representation
of those relationships. Given input grammars for convergence, these are
transformed until they are structurally equal. The transformations are composed
from primitive operators; properties of these operators and the composed chains
provide quantitative and qualitative insight into the relationships between the
grammars at hand. We describe a refined method for grammar convergence, and we
use it in a major study, where we recover the relationships between all the
grammars that occur in the different versions of the Java Language
Specification (JLS). The relationships are represented as grammar
transformation chains that capture all accidental or intended differences
between the JLS grammars. This method is mechanized and driven by nominal and
structural differences between pairs of grammars that are subject to
asymmetric, binary convergence steps. We present the underlying operator suite
for grammar transformation in detail, and we illustrate the suite with many
examples of transformations on the JLS grammars. We also describe the
extraction effort, which was needed to make the JLS grammars amenable to
automated processing. We include substantial metadata about the convergence
process for the JLS so that the effort becomes reproducible and transparent.
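The operator-chain idea can be sketched minimally: represent each grammar as a mapping from nonterminals to alternative productions, apply primitive operators, and test for structural equality as the convergence criterion. The `rename` operator and the toy grammars below are illustrative assumptions, not the paper's actual operator suite or the JLS grammars.

```python
def rename(grammar, old, new):
    """Primitive operator: rename a nonterminal everywhere it occurs,
    on both the defining and the using side."""
    sub = lambda sym: new if sym == old else sym
    return {sub(lhs): [[sub(s) for s in rhs] for rhs in alts]
            for lhs, alts in grammar.items()}

def converged(g1, g2):
    """Structural equality: the stopping condition for a convergence chain."""
    return g1 == g2
```

A chain of such operators applied to one grammar until it equals the other is itself the record of their relationship; its length and the kinds of operators used give the quantitative and qualitative insight the abstract mentions.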
Overview of BioASQ 2023: The eleventh BioASQ challenge on Large-Scale Biomedical Semantic Indexing and Question Answering
This is an overview of the eleventh edition of the BioASQ challenge in the
context of the Conference and Labs of the Evaluation Forum (CLEF) 2023. BioASQ
is a series of international challenges promoting advances in large-scale
biomedical semantic indexing and question answering. This year, BioASQ
consisted of new editions of the two established tasks b and Synergy, and a new
task (MedProcNER) on semantic annotation of clinical content in Spanish with
medical procedures, which have a critical role in medical practice. In this
edition of BioASQ, 28 competing teams submitted the results of more than 150
distinct systems in total for the three different shared tasks of the
challenge. Similarly to previous editions, most of the participating systems
achieved competitive performance, suggesting the continuous advancement of the
state-of-the-art in the field.
Comment: 24 pages, 12 tables, 3 figures. CLEF2023. arXiv admin note: text
overlap with arXiv:2210.0685
An Intelligent Online Shopping Guide Based On Product Review Mining
This position paper describes ongoing work on a novel recommendation framework for assisting online shoppers in choosing the most desired products, in accordance with requirements expressed in natural language. Existing feature-based shopping guidance systems fail when the customer lacks domain expertise. This framework enables the customer to use natural language in the query text to retrieve preferred products interactively. In addition, it is intelligent enough to allow a customer to use objective and subjective terms when querying, or even the purpose of purchase, to screen for the expected products.
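The screening idea can be sketched as mapping subjective query terms onto predicates over objective product features, with the mapping mined from reviews. The lexicon, feature names, and thresholds below are entirely hypothetical, standing in for what the framework would learn from review mining.

```python
# Hypothetical lexicon: subjective term -> (objective feature, predicate).
# A real system would mine these mappings and thresholds from reviews.
SUBJECTIVE_TERMS = {
    "lightweight": ("weight_g", lambda v: v < 1500),
    "long-lasting": ("battery_h", lambda v: v >= 10),
}

def screen_products(query, products):
    """Keep only products satisfying every subjective term in the query."""
    terms = [t for t in SUBJECTIVE_TERMS if t in query.lower()]
    return [p for p in products
            if all(pred(p[feat]) for feat, pred in
                   (SUBJECTIVE_TERMS[t] for t in terms))]
```

A query such as "a lightweight long-lasting laptop" thus filters on weight and battery life without the customer ever naming those features, which is what lets the framework serve customers lacking domain expertise.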