Search CORE

9 research outputs found

Hacia una integración de un sistema de búsqueda de respuestas sobre la inteligencia empresarial mediante el uso de ontologías

Author: Ferrández Antonio
Peral Jesús
Roger Sandra
Publication venue
Publication date: 01/05/2010
Field of study

The overall goal of Business Intelligence (BI) applications is to let users understand and analyze the data held by their organizations in order to gain useful knowledge and thereby make better decisions. At the heart of BI applications are Data Warehouses (DW), which integrate different data sources, mainly structured databases. However, a new trend has emerged of using the Web as a source of information about an organization's environment. As part of this line of research, we are working on applying a Question Answering system as a tool linked to DWs to obtain information that supports decision making, thereby building on the progress already made. Track: Agents and Intelligent Systems. Red de Universidades con Carreras en Informática (RedUNCI)

Hacia una integración de un sistema de búsqueda de respuestas sobre la inteligencia empresarial mediante el uso de ontologías

Author: Ferrández Antonio
Peral Jesús
Roger Sandra
Publication venue
Publication date: 10/08/2012
Field of study

Servicio de Difusión de la Creación Intelectual

Sistemas multiagentes en ambientes dinámicos

Author: Braun Germán
Cecchi Laura
Kogan Pablo
Moya Mario
Parra Gerardo
Roger Sandra
Publication venue
Publication date: 01/05/2011
Field of study

The fundamental goal of this project is to develop specialized knowledge in the area of Distributed Artificial Intelligence, studying knowledge representation and reasoning techniques together with planning methods and natural language technologies applied to the development of multi-agent systems. In the Planning line, the research topic is the development of an agent architecture that supports both reactive and deliberative control, so that the agent can act competently and effectively in a real environment. One objective of this research is to endow an intelligent agent with both capabilities, which will make it possible to choose the best way to act when facing a given problem. The other lines are based on natural language processing (NLP) techniques. The textual information available on the Web can be categorized into factual expressions and opinion expressions. Factual expressions relate to entities, events, and their properties. Opinion expressions, in contrast, are usually subjective expressions that describe a sentiment about people, or appraisals of or feelings toward entities, events, and their properties. Accordingly, each NLP research line is oriented toward one of these categories: the Opinion Mining line focuses on opinion expressions, while the Business Intelligence line, in this first stage, works only with factual expressions. Track: Agents and Intelligent Systems. Red de Universidades con Carreras en Informática (RedUNCI)

Servicio de Difusión de la Creación Intelectual

Passage retrieval in legal texts

Author: Buscaldi Davide
Correa García Santiago
Rosso Paolo
Publication venue: 'Elsevier BV'
Publication date: 17/03/2011
Field of study

[EN] Legal texts comprise many kinds of documents, such as contracts, patents and treaties. These texts usually include a huge quantity of unstructured information written in natural language. Thanks to automatic analysis and Information Retrieval (IR) techniques, it is possible to filter out information that is not relevant and, therefore, to reduce the number of documents that users need to browse to find the information they are looking for. In this paper we adapted the JIRS passage retrieval system to work with three kinds of legal texts (treaties, patents and contracts), studying the issues related to the processing of this kind of information. In particular, we studied how a passage retrieval system might be linked to automated analysis based on logic and algebraic programming for the detection of conflicts in contracts. In our set-up, a contract is translated into formal clauses, which are analysed by means of a model checking tool; then, the passage retrieval system is used to extract conflicting sentences from the original contract text. © 2011 Elsevier Inc. All rights reserved. We thank the MICINN (Plan I+D+i) TEXT-ENTERPRISE 2.0 (TIN2009-13391-C04-03) research project. The work of the second author has been possible thanks to a scholarship funded by Maat Gknowledge in the framework of the project with the Universidad Politécnica de Valencia Módulo de servicios semánticos de la plataforma G. Rosso, P.; Correa García, S.; Buscaldi, D. (2011). Passage retrieval in legal texts. Journal of Logic and Algebraic Programming. 80(3-5):139-153. doi:10.1016/j.jlap.2011.02.001

Elsevier - Publisher Connector

RiuNet

An Empirical Analysis of NMT-Derived Interlingual Embeddings and their Use in Parallel Sentence Identification

Author: Barrón-Cedeño Alberto
España-Bonet Cristina
van Genabith Josef
Varga Ádám Csaba
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 15/11/2017
Field of study

End-to-end neural machine translation has overtaken statistical machine translation in terms of translation quality for some language pairs, especially those with large amounts of parallel data. Besides this palpable improvement, neural networks provide several new properties. A single system can be trained to translate between many languages at almost no additional cost other than training time. Furthermore, internal representations learned by the network serve as a new semantic representation of words (or sentences) which, unlike standard word embeddings, are learned in an essentially bilingual or even multilingual context. In view of these properties, the contribution of the present work is two-fold. First, we systematically study the NMT context vectors, i.e. the output of the encoder, and their power as an interlingua representation of a sentence. We assess their quality and effectiveness by measuring similarities across translations, as well as across semantically related and semantically unrelated sentence pairs. Second, as an extrinsic evaluation of the first point, we identify parallel sentences in comparable corpora, obtaining F1 = 98.2% on data from a shared task when using only NMT context vectors. Using context vectors jointly with similarity measures, F1 reaches 98.9%. Comment: 11 pages, 4 figures
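The idea of scoring sentence pairs by the similarity of their vector representations can be illustrated with a minimal sketch. It assumes each sentence has already been reduced to one fixed-size vector (the abstract does not say how the encoder outputs are pooled), and the function names `cosine` and `find_parallel` and the threshold of 0.8 are hypothetical choices for illustration, not taken from the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def find_parallel(src_vecs, tgt_vecs, threshold=0.8):
    """Pair each source sentence vector with its most similar target vector.

    Returns (source index, target index, score) triples, keeping only pairs
    whose best score reaches the threshold (a hypothetical cut-off).
    """
    pairs = []
    for i, u in enumerate(src_vecs):
        best_j, best_score = None, -1.0
        for j, v in enumerate(tgt_vecs):
            score = cosine(u, v)
            if score > best_score:
                best_j, best_score = j, score
        if best_j is not None and best_score >= threshold:
            pairs.append((i, best_j, best_score))
    return pairs
```

Calling `find_parallel(src_vecs, tgt_vecs)` on two lists of sentence vectors returns the candidate parallel pairs; in practice the threshold would be tuned on held-out data.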

arXiv.org e-Print Archive

Archivio istituzionale della ricerca - Alma Mater Studiorum Università di Bologna

Web 2.0, Language Resources and standards to automatically build a multilingual Named Entity Lexicon

Author: Antonio Toral
Monica Monachini
Rafael Muñoz
Sergio Ferrández
Publication venue: 'Springer Science and Business Media LLC'
Publication date
Field of study

Crossref

Bootstrapping named entity resources for adaptive question answering systems

Author: de Pablo Sánchez César
Publication venue
Publication date: 22/10/2010
Field of study

Question Answering (QA) systems extend traditional search engines with the ability to find precise answers to user questions. Their objective is to ease information access by reducing the time and effort a user needs to find a specific piece of information in a list of relevant documents. This thesis addresses two lines of work related to QA systems. The first part introduces an architecture for Spanish QA systems based on the combination and adaptation of different techniques from Information Retrieval (IR) and Information Extraction (IE). The architecture is composed of three main modules: question analysis, relevant passage retrieval, and answer extraction and selection. The processing of Named Entities (NE) has received special attention because they are frequently the topic of questions or good answer candidates. The proposed architecture has been implemented in the MIRACLE QA system, which has been independently evaluated over several editions of the CLEF@QA track of the Cross-Language Evaluation Forum (CLEF). The participations and results from the 2004 to 2007 campaigns, as well as the evolution of the system, are described in depth. The MIRACLE QA system has obtained moderate performance, with first-answer accuracy ranging between 20% and 30%. Nevertheless, the results obtained in the 2005 main QA task and in the 2006 RealTimeQA pilot task stand out.
The latter task included response time as an additional factor in the evaluation. These results support the proposed architecture as a viable option for QA over textual collections and confirm similar findings obtained for English and other languages. On the other hand, the analysis of the results across evaluation campaigns and the comparison with other QA systems point to problems with current systems and to new challenges. In our experience, QA systems are more difficult to adapt to different domains and languages than IR systems. The problem is inherited from the use of complex language analysis tools such as POS taggers, parsers and semantic analyzers, including NE Recognition and Classification (NERC) and Relation Detection and Characterization (RDC) tools. The second part of this thesis tackles this problem and proposes a different approach to adapting QA systems to different languages and collections. The proposal focuses on acquiring knowledge for the semantic analyzers through lightly supervised approaches. The goal is to obtain useful resources for NERC and RDC using as few annotated resources as possible. In addition, we try to avoid dependencies on other language analysis tools so that the methods apply to different languages and domains. First, we have studied previous work on building NERC and RDC modules with little supervision, particularly bootstrapping methods. We propose a common framework for different bootstrapping systems that helps to unify the evaluation functions used for intermediate results. The main proposal is a new algorithm that simultaneously acquires instances and patterns associated with a relation of interest. It also uses mutual exclusion among relations to reduce concept drift and achieve better results. A distinctive characteristic is that it explores the text collection with a query-based strategy, which enables its use on larger collections.
Candidate selection and evaluation are based on incrementally building a graph of instances and patterns, which also justifies our evaluation function. The discovery approach is analogous to the exploration frontier of a web crawler and is able to find the instances most similar to the available seeds. This algorithm has been implemented in the SPINDEL system. For evaluation we selected the task of acquiring resources for the most common NE classes: Person, Location and Organization. The objective is to acquire name instances that belong to each class as well as contextual patterns that help to detect mentions of NEs of that class. We present results for the acquisition of resources from raw text in two languages, Spanish and English. We also performed experiments for Spanish on two different collections: news and texts from a collaborative encyclopedia, Wikipedia. Both cases are tackled with limited language analysis tools and resources. Starting from an initial list of fewer than 40 seed instances per class, the bootstrapping process acquires large name lists containing up to 30,000 instances of variable quality. Large lists of indicative patterns are obtained as well. Our indirect evaluation confirms the utility of both resources for classifying NEs with a simple dictionary-based recognition approach. The best configuration obtains an F-score of 67.17 for Spanish and 55.99 for English. The module requires much less development effort than annotating data for supervised algorithms, although its performance is not yet on par. This research is a first step towards semantic applications such as QA that require less adaptation effort for a new language or domain with no annotated corpora.
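The alternation between acquiring patterns and acquiring instances, with mutual exclusion between classes, can be sketched in a few lines. This is a toy illustration and not the SPINDEL algorithm: it assumes a pattern is simply the token preceding an instance, it omits the graph-based scoring and the query-based exploration described in the abstract, and the names `extract_pairs` and `bootstrap` are hypothetical.

```python
def extract_pairs(sentences):
    """Collect (previous token, token) pairs from whitespace-tokenised sentences."""
    pairs = set()
    for sentence in sentences:
        tokens = sentence.split()
        for prev, tok in zip(tokens, tokens[1:]):
            pairs.add((prev, tok))
    return pairs


def bootstrap(sentences, seeds, iterations=2):
    """Grow instance and pattern sets per class from a few seed instances.

    Mutual exclusion (simplified assumption): a pattern or instance proposed
    for more than one class in the same round is discarded for all of them.
    """
    pairs = extract_pairs(sentences)
    instances = {cls: set(s) for cls, s in seeds.items()}
    patterns = {cls: set() for cls in seeds}
    for _ in range(iterations):
        # Step 1: propose patterns that precede a known instance of each class.
        cand_pat = {cls: {p for p, t in pairs if t in instances[cls]}
                    for cls in instances}
        for cls in cand_pat:
            rivals = set().union(*(cand_pat[o] for o in cand_pat if o != cls))
            patterns[cls] |= cand_pat[cls] - rivals
        # Step 2: propose instances that follow an accepted pattern of each class.
        cand_inst = {cls: {t for p, t in pairs if p in patterns[cls]}
                     for cls in patterns}
        for cls in cand_inst:
            rivals = set().union(*(cand_inst[o] | instances[o]
                                   for o in cand_inst if o != cls))
            instances[cls] |= cand_inst[cls] - rivals
    return instances, patterns
```

With seeds such as `{"PER": {"Obama"}, "LOC": {"Paris"}}`, a context shared by both seeds is rejected as a pattern for either class, which is the intuition behind using exclusivity to limit drift.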

Universidad Carlos III de Madrid e-Archivo

A Review of the Analytics Techniques for an Efficient Management of Online Forums: An Architecture Proposal

Author: Ferrández Antonio
Gil David
Kauffmann Erick
Mora Higinio
Peral Jesús
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2019
Field of study

E-learning is a response to the new educational needs of society and an important development in information and communication technologies because it represents the future of the teaching and learning processes. However, this trend presents many challenges, such as the processing of online forums, which generate a huge number of messages with an unordered structure and a great variety of topics. These forums provide an excellent platform for learning and connecting students of a subject, but the difficulty of following and searching the vast volume of information that they generate may be counterproductive. The main goal of this paper is to review the approaches and techniques related to online courses in order to present a set of learning analytics techniques and a general architecture that solve the main challenges found in the state of the art by managing them in a more efficient way: 1) efficient tracking and monitoring of forums generated; 2) design of effective search mechanisms for questions and answers in the forums; and 3) extraction of relevant key performance indicators with the objective of carrying out an efficient management of online forums. In our proposal, natural language processing, clustering, information retrieval, question answering, and data mining techniques will be used. This work was supported in part by the Spanish Ministry of Economy and Competitiveness through the Project SEQUOIA-UA under Grant TIN2015-63502-C3-3-R, the Project RESCATA under Grant TIN2015-65100-R, and the Project PROMETEO/2018/089, and in part by the Spanish Research Agency (AEI) and the European Regional Development Fund (FEDER) through the Project CloudDriver4Industry under Grant TIN2017-89266-R

Repositorio Institucional de la Universidad de Alicante

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

AliQAn, Spanish QA system at multilingual QA@CLEF-2008

Author: Ferrández Antonio
Gómez José M.
Martínez-Barco Patricio
Muñoz Terol Rafael
Pardiño Juan María
Peral Jesús
Puchol Blasco Marcel
Roger Calzetti Sandra Emilce
Vila Rodríguez Katia
Publication venue
Publication date: 01/01/2008
Field of study

Paper presented at the Cross-Language Evaluation Forum (CLEF 2008), Aarhus, Denmark, September 17-19, 2008. In QA@CLEF 2008, we participate in the monolingual (Spanish) and multilingual (English-Spanish) tasks. Specifically, this paper addresses the English-Spanish QA task. In this edition we deal with two main problems: a heterogeneous document collection (news articles and Wikipedia) and a large number of topic-related questions, which make our participation somewhat difficult. We highlight two possible mechanisms in the translation module of our system: one based on logic forms and the other on machine translation techniques. In addition, the system uses an anaphora resolution component, which is described in the paper, and a QA system, AliQAn (also used this year in the monolingual task)

Repositorio Institucional de la Universidad de Alicante