25 research outputs found
EDBL: a General Lexical Basis for the Automatic Processing of Basque
EDBL (Euskararen Datu-Base Lexikala) is a general-purpose lexical database used in Basque text-processing tasks. It is a large repository of lexical knowledge (currently around 80,000 entries) that acts as a basis and support for a number of different NLP tasks, providing lexical information for several language tools: morphological analysis, spell checking and correction, lemmatization and tagging, syntactic analysis, and so on. It has been designed to be neutral with respect to the different linguistic formalisms, and flexible and open enough to accept new types of information. A browser-based user interface makes it easy for the lexicographer to consult the database, correct and update entries, add new ones, and so on. The paper presents the conceptual schema and the main features of the database, along with some problems encountered in its design and implementation in a commercial DBMS. Given the diversity of the lexical entities and the complex relationships among them, three total specializations have been defined under the main class of the hierarchy that represents the conceptual schema. The first divides all the entries in EDBL into standard and non-standard Basque entries. The second divides the units in the database into dictionary entries (classified by part of speech) and other entries (mainly non-independent morphemes and irregularly inflected forms). Finally, a third total specialization has been established between single-word entries and multiword lexical units; this permits us to describe the morphotactics of single-word entries, and the constitution and surface realization schemas of multiword lexical units. A hierarchy of typed feature structures (FS) has been designed to map the entities and relationships in the database conceptual schema. The FSs are coded in TEI-conformant SGML, and Feature Structure Declarations (FSD) have been made for all the types of the hierarchy. Feature structures are used as a delivery format to export the lexical information from the database. The information coded in this way is subsequently used as input by the different language analysis tools.
Multilingual audio information management system based on semantic knowledge in complex environments
This paper proposes a multilingual audio information management system based on semantic knowledge in complex environments. The complex environment is defined by the limited resources (financial, material, human, and audio resources); the poor quality of the audio signal taken from an internet radio channel; the multilingual context (Spanish, French, and Basque, which is under-resourced in some areas); and the regular appearance of cross-lingual elements between the three languages. In addition, the system is constrained by the requirements of the local multilingual industrial sector. We present the first evolutionary system based on a scalable architecture that is able to fulfill these specifications with automatic adaptation based on automatic semantic speech recognition, folksonomies, automatic configuration selection, machine learning, neural computing methodologies, and collaborative networks. As a result, the initial goals have been accomplished and the usability of the final application has been tested successfully, even with inexperienced users.
Alzheimer Disease Diagnosis based on Automatic Spontaneous Speech Analysis
Alzheimer’s disease (AD) is the most prevalent form of progressive degenerative
dementia. It has a high socio-economic impact in Western countries and is therefore
one of the most active research areas today. Its diagnosis is sometimes made by excluding
other dementias, and definitive confirmation requires a post-mortem
study of the patient's brain tissue. The purpose of this paper is to contribute to improving
the early diagnosis of AD and the assessment of its degree of severity, through an automatic analysis
performed by non-invasive intelligent methods. The methods selected in this case are
Automatic Spontaneous Speech Analysis (ASSA) and Emotional Temperature (ET), which
have the great advantage of being non-invasive, low-cost, and free of side effects.
Multilingual audio information management system based on semantic knowledge in complex environments
This paper proposes a multilingual audio information management system based on semantic knowledge in complex environments. The complex environment is defined by the limited resources (financial, material, human, and audio resources); the poor quality of the audio signal taken from an internet radio channel; the multilingual context (Spanish, French, and Basque, which is under-resourced in some areas); and the regular appearance of cross-lingual elements between the three languages. In addition, the system is constrained by the requirements of the local multilingual industrial sector. We present the first evolutionary system based on a scalable architecture that is able to fulfill these specifications with automatic adaptation based on automatic semantic speech recognition, folksonomies, automatic configuration selection, machine learning, neural computing methodologies, and collaborative networks. As a result, the initial goals have been accomplished and the usability of the final application has been tested successfully, even with inexperienced users. This work is being funded by grants TEC201677791-C4 from Plan Nacional de I + D + i, Ministry of Economic Affairs and Competitiveness of Spain, and from the DomusVi Foundation Kms para recorder, the Basque Government (ELKARTEK KK-2018/00114, GEJ IT1189-19), the Government of Gipuzkoa (DG18/14 DG17/16), UPV/EHU (GIU19/090), and COST ACTION (CA18106, CA15225).
TweetLID: a benchmark for tweet language identification
Language identification, the task of determining the language a given text is written in, has progressed substantially in recent decades. However, three main issues remain unresolved: (1) distinguishing similar languages, (2) detecting multilingualism in a single document, and (3) identifying the language of short texts. In this paper, we describe our work on the development of a benchmark to encourage further research in these three directions, set forth an evaluation framework suitable for the task, and make a dataset of annotated tweets publicly available for research purposes. We also describe the shared task we organized to validate and assess the evaluation framework and dataset with systems submitted by seven different participants, and analyze the performance of these systems. The evaluation of the results submitted by the participants of the shared task helped us shed some light on the shortcomings of state-of-the-art language identification systems, and gives insight into the extent to which the brevity, multilingualism, and language similarity found in texts degrade the performance of language identifiers. Our dataset of nearly 35,000 tweets and the evaluation framework provide researchers and practitioners with suitable resources to further study these language identification issues within a common setting that allows results to be compared with one another.
ZMC 211-3 - KAEDAH MATEMATIK II MAC-APRIL 1989.pdf
The work presented here is part of a larger study to identify novel technologies
and biomarkers for early Alzheimer disease (AD) detection, and it focuses on evaluating the
suitability of a new approach for early AD diagnosis by non-invasive methods. The
purpose is to examine, in a pilot study, the potential of applying intelligent algorithms to
speech features obtained from suspected patients in order to contribute to improving
the diagnosis of AD and the assessment of its degree of severity. To this end, Artificial Neural Networks
(ANN) have been used for the automatic classification of the two classes (AD and control subjects). Two human aspects have been analyzed for feature selection: Spontaneous Speech
and Emotional Response. Not only linear features but also non-linear ones, such as Fractal
Dimension, have been explored. The approach is non-invasive, low-cost, and free of
side effects. The experimental results obtained were very satisfactory and promising for the early
diagnosis and classification of AD patients.
Feature selection for spontaneous speech analysis to aid in Alzheimer’s disease diagnosis: A fractal dimension approach
Alzheimer’s disease (AD) is the most prevalent form of degenerative dementia; it has a high socio-economic impact in Western countries. The purpose of our project is to contribute to earlier diagnosis of AD and allow better estimates of its severity by using automatic analysis performed through new biomarkers extracted through non-invasive intelligent methods. The method selected is based on speech biomarkers derived from the analysis of spontaneous speech (SS). Thus the main goal of the present work is feature search in SS, aiming at pre-clinical evaluation whose results can be used to select appropriate tests for AD diagnosis. The feature set employed in our earlier work offered some hopeful conclusions but failed to capture the nonlinear dynamics of speech that are present in the speech waveforms. The extra information provided by the nonlinear features could be especially useful when training data is limited. In this work, the fractal dimension (FD) of the observed time series is combined with linear parameters in the feature vector in order to enhance the performance of the original system while controlling the computational cost. © 2014 Elsevier Ltd. All rights reserved.
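To make the idea of a fractal dimension feature concrete, the following is a minimal sketch of one common estimator for a sampled time series, Higuchi's method. The abstract does not say which FD algorithm the authors used, so the choice of Higuchi's method, the function name higuchi_fd, and the default kmax value are all assumptions made for illustration only.

```python
import math
import random


def higuchi_fd(x, kmax=8):
    """Estimate the fractal dimension of a 1-D series with Higuchi's method.

    Illustrative sketch only: the estimator and the kmax default are
    assumptions, not the configuration used in the paper.
    """
    n = len(x)
    if n < 2 * kmax:
        raise ValueError("series too short for the chosen kmax")

    log_inv_k, log_len = [], []
    for k in range(1, kmax + 1):
        lengths = []
        for m in range(k):  # one sub-sampled curve per starting offset m
            steps = (n - 1 - m) // k
            dist = sum(abs(x[m + i * k] - x[m + (i - 1) * k])
                       for i in range(1, steps + 1))
            # Normalise so that curves with different step counts are comparable.
            lengths.append(dist * (n - 1) / (steps * k) / k)
        mean_len = sum(lengths) / len(lengths)
        if mean_len <= 0:
            raise ValueError("constant signal: curve length is zero")
        log_inv_k.append(math.log(1.0 / k))
        log_len.append(math.log(mean_len))

    # Least-squares slope of log L(k) against log(1/k) is the FD estimate.
    mean_x = sum(log_inv_k) / kmax
    mean_y = sum(log_len) / kmax
    num = sum((a - mean_x) * (b - mean_y) for a, b in zip(log_inv_k, log_len))
    den = sum((a - mean_x) ** 2 for a in log_inv_k)
    return num / den


if __name__ == "__main__":
    ramp = [float(i) for i in range(200)]          # smooth curve
    rng = random.Random(0)
    noise = [rng.gauss(0.0, 1.0) for _ in range(2000)]  # very irregular curve
    print("ramp FD :", higuchi_fd(ramp))
    print("noise FD:", higuchi_fd(noise))
```

A smooth ramp yields an estimate close to 1 and white noise an estimate close to 2, which is the range a waveform-based FD feature would occupy. In a setting like the one described, such a value would typically be computed per analysis window and appended to the linear parameters of the feature vector; the windowing itself is not shown here.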