446 research outputs found
Dublin City University at QA@CLEF 2008
We describe our participation in Multilingual Question Answering at CLEF 2008, using German as the source language and English as the target language. The system was built using UIMA (Unstructured Information Management Architecture) as the underlying framework.
A Novel Multidimensional Reference Model For Heterogeneous Textual Datasets Using Context, Semantic And Syntactic Clues
With advances in technology and the use of the latest devices, voluminous data are being produced. Of these data, 80% are unstructured and the remaining 20% are structured or semi-structured. The data are produced in heterogeneous formats that follow no standard. Among heterogeneous (structured, semi-structured and unstructured) data, textual data are now used by industry for prediction and visualization of future challenges. Extracting useful information from such data is challenging for stakeholders because of the difficulty of lexical and semantic matching. A few studies have addressed this issue using ontologies and semantic tools, but the main limitation of the proposed work was its limited coverage of multidimensional terms. To solve this problem, this study aims to produce a novel multidimensional reference model (MRM) using linguistic categories for heterogeneous textual datasets. Categories such as context, semantic and syntactic clues are the focus, along with their scores. The main contribution of MRM is that it checks each token against each term based on an index of linguistic categories such as synonym, antonym, formal, lexical word order and co-occurrence. The experiments show that MRM performs better than the state-of-the-art single-dimension reference model in terms of coverage, linguistic categories and heterogeneous datasets.
Comment: International Journal of Advanced Science and Applications, Volume 14, Issue 10, pp. 754-763, 202
A toolkit for a generative lexicon
In this paper we describe the design of a software toolkit for
the construction, maintenance and collaborative use of a Generative Lexicon. To
ease its portability and dissemination, the tool was built with free
and open source products. Finally, we tested the toolkit and showed that it
filters the appropriate form of anaphoric reference to the modifier in endocentric
compounds.
Comment: poster - 6 page