From Commands to Goal-based Dialogs: A Roadmap to Achieve Natural Language Interaction in RoboCup@Home
On the one hand, speech is a key aspect of people's communication. On the other, it is widely acknowledged that language proficiency is related to intelligence. Therefore, intelligent robots should be able to understand, at least, people's orders within their application domain. These insights are not new in RoboCup@Home, but we lack a long-term plan to evaluate this approach. In this paper we conduct a brief review of the achievements in automated speech recognition and natural language understanding in RoboCup@Home. Furthermore, we discuss the main challenges to tackle in spoken human-robot interaction within the scope of this competition. Finally, we contribute by presenting a pipelined road map to engender research in the area of natural language understanding applied to domestic service robotics.
Comment: 12 pages, 2 tables, 1 figure. Accepted and presented (poster) at the RoboCup 2018 Symposium. In press.
Natural Language Processing for Under-resourced Languages: Developing a Welsh Natural Language Toolkit
Language technology is becoming increasingly important across a variety of application domains that are now commonplace in large, well-resourced languages. However, there is a danger that small, under-resourced languages are being increasingly pushed to the technological margins. Under-resourced languages face significant challenges in delivering the underlying language resources necessary to support such applications. This paper describes the development of a natural language processing toolkit for an under-resourced language, Cymraeg (Welsh). Rather than creating the Welsh Natural Language Toolkit (WNLT) from scratch, the approach involved adapting and enhancing the language processing functionality provided for other languages within an existing framework, and making use of external language resources where available. This paper begins by introducing the GATE NLP framework, which was used as the development platform for the WNLT. It then describes each of the core modules of the WNLT in turn, detailing the extensions and adaptations required for Welsh language processing. An evaluation of the WNLT is then reported. Following this, two demonstration applications are presented. The first is a simple text mining application that analyses wedding announcements. The second describes the development of a Twitter NLP application, which extends the core WNLT pipeline. As a relatively small-scale project, the WNLT makes use of existing external language resources where possible, rather than creating new resources. This approach of adaptation and reuse can provide a practical and achievable route to developing language resources for under-resourced languages.
Combining automatic speech recognition with semantic natural language processing in schizophrenia
Natural language processing (NLP) tools are increasingly used to quantify semantic anomalies in schizophrenia. Automatic speech recognition (ASR) technology, if robust enough, could significantly speed up the NLP research process. In this study, we assessed the performance of a state-of-the-art ASR tool and its impact on diagnostic classification accuracy based on an NLP model. We compared ASR to human transcripts quantitatively (Word Error Rate (WER)) and qualitatively, by analyzing error type and position. Subsequently, we evaluated the impact of ASR on classification accuracy using semantic similarity measures. Two random forest classifiers were trained with similarity measures derived from automatic and manual transcriptions, and their performance was compared. The ASR tool had a mean WER of 30.4%. Pronouns and words in sentence-final position had the highest WERs. The classification accuracy was 76.7% (sensitivity 70%; specificity 86%) using automated transcriptions and 79.8% (sensitivity 75%; specificity 86%) for manual transcriptions. The difference in performance between the models was not significant. These findings demonstrate that using ASR for semantic analysis is associated with only a small decrease in accuracy in classifying schizophrenia, compared to manual transcripts. Thus, combining ASR technology with semantic NLP models qualifies as a robust and efficient method for diagnosing schizophrenia.
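The WER figure reported in this abstract is conventionally computed as a word-level Levenshtein edit distance (substitutions + deletions + insertions) divided by the number of words in the reference transcript. A minimal sketch of that standard metric, not the specific tooling used in the study:

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate: word-level edit distance between the reference
    and hypothesis transcripts, normalized by reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    # dp[i][j] = minimum edits to turn ref[:i] into hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,        # deletion
                           dp[i][j - 1] + 1,        # insertion
                           dp[i - 1][j - 1] + cost)  # substitution or match
    return dp[len(ref)][len(hyp)] / len(ref)

# one missed word out of six reference words -> WER of 1/6 (about 0.167)
print(wer("the cat sat on the mat", "the cat sat on mat"))
```

A mean WER of 30.4%, as reported above, means roughly three edits per ten reference words on average.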
GailBot: An automatic transcription system for Conversation Analysis
Researchers studying human interaction, such as conversation analysts, psychologists, and linguists, all rely on detailed transcriptions of language use. Ideally, these should include so-called paralinguistic features of talk, such as overlaps, prosody, and intonation, as they convey important information. However, creating conversational transcripts that include these features by hand requires substantial amounts of time from trained transcribers. There are currently no Speech to Text (STT) systems that are able to integrate these features in the generated transcript. To reduce the resources needed to create detailed conversation transcripts that include representation of paralinguistic features, we developed a program called GailBot. GailBot combines STT services with plugins to automatically generate first drafts of transcripts that largely follow the transcription standards common in the field of Conversation Analysis. It also enables researchers to add new plugins to transcribe additional features, or to improve the plugins it currently uses. We describe GailBot's architecture and its use of computational heuristics and machine learning. We also evaluate its output in relation to transcripts produced by both human transcribers and comparable automated transcription systems. We argue that, despite its limitations, GailBot represents a substantial improvement over existing dialogue transcription software.
Five sources of bias in natural language processing
Recently, there has been an increased interest in demographically grounded bias in natural language processing (NLP) applications. Much of the recent work has focused on describing bias and providing an overview of bias in a larger context. Here, we provide a simple, actionable summary of this recent work. We outline five sources where bias can occur in NLP systems: (1) the data, (2) the annotation process, (3) the input representations, (4) the models, and finally (5) the research design (or how we conceptualize our research). We explore each of the bias sources in detail in this article, including examples and links to related work, as well as potential countermeasures.
FrameNet annotation for multimodal corpora: devising a methodology for the semantic representation of text-image interactions in audiovisual productions
Multimodal analyses have been growing in importance within several approaches to Cognitive Linguistics and in applied fields such as Natural Language Understanding. Nonetheless, fine-grained semantic representations of multimodal objects are still lacking, especially in terms of integrating areas such as Natural Language Processing and Computer Vision, which are key for the implementation of multimodality in Computational Linguistics. In this dissertation, we propose a methodology for extending FrameNet annotation to the multimodal domain, since FrameNet can provide fine-grained semantic representations, particularly with a database enriched by Qualia and other interframal and intraframal relations, as is the case of FrameNet Brasil. To make FrameNet Brasil able to conduct multimodal analysis, we outlined the hypothesis that, similarly to the way in which words in a sentence evoke frames and organize their elements in the syntactic locality accompanying them, visual elements in video shots may also evoke frames and organize their elements on the screen, or work complementarily with the frame evocation patterns of the sentences narrated simultaneously with their appearance on screen, providing different profiling and perspective options for meaning construction. The corpus annotated for testing the hypothesis is composed of episodes of a Brazilian TV travel series critically acclaimed as an exemplar of good practices in audiovisual composition. The TV genre chosen also constitutes a novel experimental setting for research on integrated image and text comprehension, since, in this corpus, the text is not a direct description of the image sequence but correlates with it indirectly in a myriad of ways. The dissertation also reports on an eye-tracking experiment conducted to validate the text-oriented annotation approach proposed. The experiment demonstrated that it is not possible to determine that text impacts gaze directly, and this was taken as reinforcement for the approach of valorizing the combination of modes. Last, we present the Frame2 dataset, the product of the annotation task carried out on the corpus following both the methodology and the guidelines proposed. The results achieved demonstrate that, at least for this TV genre but possibly also for others, a fine-grained semantic annotation tackling the diverse correlations that take place in a multimodal setting provides a new perspective on multimodal comprehension modeling. Moreover, multimodal annotation also enriches the development of FrameNets, to the extent that correlations found between modalities can attest to the modeling choices made by those building frame-based resources.
CAPES - Coordenação de Aperfeiçoamento de Pessoal de Nível Superior
Topic modelling of Finnish Internet discussion forums as a tool for trend identification and marketing applications
The increasing availability of public discussion text data on the Internet motivates the study of methods to identify current themes and trends. Being able to extract and summarize relevant information from public data in real time gives rise to competitive advantage and to applications in the marketing actions of a company. This thesis presents a method of topic modelling and trend identification to extract information from Finnish Internet discussion forums.
The development of text analytics, and especially of topic modelling techniques, is reviewed, and suitable methods are identified from the literature. The Latent Dirichlet Allocation topic model and the Dynamic Topic Model are applied to find underlying topics in the Internet discussion forum data. The collection of discussion data via web scraping and the text preprocessing methods are presented. Trends are identified with a method derived from outlier detection.
Real-world events, such as the news about the Finnish army's vegetarian meal day and the Helsinki summit of presidents Trump and Putin, were identified in an unsupervised manner. Applications for marketing are considered, e.g. automatic generation of search engine advert keywords and website content recommendation. Future prospects for further improving the developed topical trend identification method are proposed. These include the use of more complex topic models, an extensive framework for tuning trend identification parameters, and the study of more domain-specific text data sources such as blogs, social media feeds, or customer feedback.
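The outlier-derived trend identification described above can be illustrated with a simple z-score screen over a topic's weight time series: a time point whose topic weight deviates far from the series mean is flagged as a potential trend. This is an illustrative sketch under assumed data (the `weights` series is invented), not the thesis's actual method:

```python
from statistics import mean, stdev

def trending_points(series, threshold=2.0):
    """Flag time points whose topic weight exceeds the series mean
    by more than `threshold` standard deviations (z-score outlier test)."""
    mu, sigma = mean(series), stdev(series)
    if sigma == 0:
        return []  # flat series: nothing can be an outlier
    return [i for i, w in enumerate(series) if (w - mu) / sigma > threshold]

# weekly weight of one hypothetical forum topic; the spike at index 5
# corresponds to an event week and is the only point flagged
weights = [0.02, 0.03, 0.02, 0.03, 0.02, 0.20, 0.03, 0.02]
print(trending_points(weights))  # → [5]
```

In a full pipeline, a series like this would come from the per-document topic proportions of an LDA or Dynamic Topic Model, aggregated per time slice; the threshold is a tuning parameter of exactly the kind the abstract proposes a framework for.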
- …