Thematic Annotation: extracting concepts out of documents
Contrary to standard approaches to topic annotation, the technique used in
this work does not centrally rely on some sort of -- possibly statistical --
keyword extraction. In fact, the proposed annotation algorithm uses a large
scale semantic database -- the EDR Electronic Dictionary -- that provides a
concept hierarchy based on hyponym and hypernym relations. This concept
hierarchy is used to generate a synthetic representation of the document by
aggregating the words present in topically homogeneous document segments into a
set of concepts best preserving the document's content.
This new extraction technique uses an unexplored approach to topic selection.
Instead of using semantic similarity measures based on a semantic resource, the
latter is processed to extract the part of the conceptual hierarchy relevant to
the document content. Then this conceptual hierarchy is searched to extract the
most relevant set of concepts to represent the topics discussed in the
document. Notice that this algorithm is able to extract generic concepts that
are not directly present in the document.
Comment: Technical report EPFL/LIA. 81 pages, 16 figures
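The idea of representing a segment by a generic concept that does not itself occur in the text can be illustrated with a small sketch. It is not the report's algorithm and does not use EDR data: the `HYPERNYM` table, the function names, and the choice of "most specific shared ancestor" as the selection rule are all assumptions made for illustration.

```python
# Hypothetical toy hierarchy mapping each word to its hypernym (parent).
# These entries are invented for illustration; they are not EDR data.
HYPERNYM = {
    "dog": "mammal",
    "cat": "mammal",
    "sparrow": "bird",
    "mammal": "animal",
    "bird": "animal",
    "animal": "entity",
}


def ancestors(word):
    """Return the chain from the word itself up to the root of the hierarchy."""
    chain = [word]
    while chain[-1] in HYPERNYM:
        chain.append(HYPERNYM[chain[-1]])
    return chain


def covering_concept(words):
    """Return the most specific concept that subsumes every word, or None."""
    chains = [ancestors(w) for w in words]
    common = set(chains[0]).intersection(*chains[1:])
    # Walking one chain bottom-up, the first shared node is the deepest one.
    for node in chains[0]:
        if node in common:
            return node
    return None


print(covering_concept(["dog", "cat"]))
print(covering_concept(["dog", "sparrow"]))
```

In this toy setting a segment mentioning only "dog" and "sparrow" is summarised by "animal", a concept absent from the segment, which mirrors the property noted in the abstract.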
What Makes Delusions Pathological?
Bortolotti argues that we cannot distinguish delusions from other irrational beliefs in virtue of their epistemic features alone. Although her arguments are convincing, her analysis leaves an important question unanswered: What makes delusions pathological? In this paper I set out to answer this question by arguing that the pathological character of delusions arises from an executive dysfunction in a subject’s ability to detect relevance in the environment. I further suggest that this dysfunction derives from an underlying emotional imbalance—one that leads delusional subjects to regard some contextual elements as deeply puzzling or highly significant
A real time Named Entity Recognition system for Arabic text mining
Arabic is the most widely spoken language in the Arab World. Most people of the Islamic World understand Classical Arabic because it is the language of the Qur'an. Although the number of Arabic Internet users (Middle East and North and East Africa) has increased considerably in the last decade, systems to analyze Arabic digital resources automatically are not as easily available as they are for English. Therefore, in this work, an attempt is made to build a real time Named Entity Recognition system that can be used in web applications to detect the appearance of specific named entities and events in news written in Arabic. Arabic is a highly inflectional language, so we try to minimize the impact of Arabic affixes on the quality of the pattern recognition model applied to identify named entities. These patterns are built up by processing and integrating different gazetteers, from DBPedia (http://dbpedia.org/About, 2009) to GATE (A general architecture for text engineering, 2009) and ANERGazet (http://users.dsic.upv.es/grupos/nle/?file=kop4.php).
This work has been partially supported by the Spanish Center for Industry
Technological Development (CDTI, Ministry of Industry, Tourism and Trade), through the BUSCAMEDIA
Project (CEN-20091026), and also by the Spanish research projects: MA2VICMR: Improving the
access, analysis and visibility of the multilingual and multimedia information in web for the Region of
Madrid (S2009/TIC-1542), and MULTIMEDICA: Multilingual Information Extraction in Health domain
and application to scientific and informative documents (TIN2010-20644-C03-01). The authors would also
like to thank the IPSC of the European Commission’s Joint Research Centre for allowing us to include the
EMM search engine in our system.
Deep Learning and Music Adversaries
An adversary is essentially an algorithm intent on making a classification system perform in some particular way given an input, e.g., increase the probability of a false negative. Recent work builds adversaries for deep learning systems applied to image object recognition, which exploit the parameters of the system to find the minimal perturbation of the input image such that the network misclassifies it with high confidence. We adapt this approach to construct and deploy an adversary of deep learning systems applied to music content analysis. In our case, however, the input to the systems is magnitude spectral frames, which requires special care in order to produce valid input audio signals from network-derived perturbations. For two different train-test partitionings of two benchmark datasets, and two different deep architectures, we find that this adversary is very effective in defeating the resulting systems. We find the convolutional networks are more robust, however, compared with systems based on a majority vote over individually classified audio frames. Furthermore, we integrate the adversary into the training of new deep systems, but do not find that this improves their resilience against the same adversary.
Automatic Understanding of ATC Speech: Study of Prospectives and Field Experiments for Several Controller Positions
Although there has been a lot of interest in recognizing and understanding air traffic control (ATC) speech, none of the published works have obtained detailed field data results. We have developed a system able to identify the language spoken and to recognize and understand sentences in both Spanish and English. We also present field results for several in-tower controller positions. To the best of our knowledge, this is the first time that field ATC speech (not simulated) has been captured, processed, and analyzed. The use of stochastic grammars allows for the variations in the standard phraseology that appear in field data. The robust understanding algorithm developed has 95% concept accuracy from ATC text input. It also tolerates changes in the presentation order of the concepts and corrects errors introduced by the speech recognition engine: the percentage of fully correctly understood sentences exceeds the percentage of fully correctly recognized sentences by 17% and 25% absolute for English and Spanish, respectively. The errors due to the spontaneity of the speech are also analyzed and compared with those for read speech. A 96% word accuracy for read speech is reduced to 86% word accuracy for field ATC data for Spanish in the "clearances" task, confirming that field data is needed to estimate the performance of a system. A literature review and a critical discussion of the possibilities of speech recognition and understanding technology applied to ATC speech are also given.
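The abstract reports word accuracy without defining it. A minimal sketch follows, assuming the common definition of one minus the word-level edit distance (substitutions, deletions, insertions) divided by the number of reference words. The function names and the example sentences are hypothetical and are not taken from the paper's data.

```python
def edit_distance(ref, hyp):
    """Word-level Levenshtein distance between two token lists."""
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, 1):
        cur = [i]
        for j, hyp_word in enumerate(hyp, 1):
            cur.append(min(
                prev[j] + 1,                           # deletion
                cur[j - 1] + 1,                        # insertion
                prev[j - 1] + (ref_word != hyp_word),  # substitution or match
            ))
        prev = cur
    return prev[-1]


def word_accuracy(reference, hypothesis):
    """Assumed definition: 1 - (S + D + I) / N, with N reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    return 1.0 - edit_distance(ref, hyp) / len(ref)


# Hypothetical transcript pair: one substitution in five reference words.
print(word_accuracy("cleared to land runway two", "cleared to land runway too"))
```

Under this definition, a drop from 96% to 86% means the error count per 100 reference words rises from about 4 to about 14.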
EVALITA Evaluation of NLP and Speech Tools for Italian Proceedings of the Final Workshop
Editor of the proceedings of EVALITA 2016
Personalizing Human-Robot Dialogue Interactions using Face and Name Recognition
Task-oriented dialogue systems are computer systems that aim to provide an interaction
indistinguishable from ordinary human conversation with the goal of completing
user-defined tasks. They achieve this by analyzing the intents of users and choosing
appropriate responses. Recent studies show that personalizing the conversations with
these systems can positively affect their perception and long-term acceptance.
Personalised social robots have been widely applied in different fields to provide assistance.
In this thesis we develop a scientific conference assistant. The goal
of this assistant is to provide the conference participants with conference information and
to inform them about activities for their spare time during the conference. Moreover, to increase
engagement with the robot, our team has worked on personalizing the human-robot
interaction by means of face and name recognition.
To achieve this personalisation, the name recognition ability of the available physical
robot was first improved; next, with the consent of the participants, their pictures were taken
and used to remember returning users. As acquiring consent for personal data
storage is not an optimal solution, an alternative method for participant recognition
using QR codes on their badges was developed and compared to the pre-trained model in
terms of speed. Lastly, the personal details of each participant, such as university and country of
origin, were acquired prior to the conference or during the conversation and used in dialogues.
The developed robot, called DAGFINN, was displayed at two conferences held this
year in Stavanger, of which the first installment did not involve the personalization feature.
Hence, we conclude this thesis by discussing the influence of personalisation on dialogues
with the robot and on participants' satisfaction with the developed social robot.