Studies on the establishment of connections among spoken statements: what can they contribute to promoting students' construction of a coherent discourse representation?
The aim of this article is to provide an overview of how the establishment of discourse connections among spoken statements has been studied in discourse analysis and in psycholinguistics, in order to highlight which variables appear to be important for understanding how comprehension of spoken discourse can be facilitated. The discourse analysis approaches allow us to consider the role of establishing discourse connections among speech acts in the classroom, the uses of contextualization cues by bilingual students, the identification of social and cultural notions in teachers' discourse, and the interactional effects of teachers' interventions. Preliminary psycholinguistic studies contribute to our understanding of how establishing causal connections and integrating adjacent statements through discourse markers affect the comprehension of spoken discourse by college students. The results of these approaches and studies provide insight into students' comprehension of classroom discourse and may have implications for instruction.
Affiliation: Yomha Cevasco, Jazmin. Universidad de Buenos Aires; Argentina. Consejo Nacional de Investigaciones Científicas y Técnicas; Argentina
Affiliation: Broek, Paul van den. Leiden University; Netherlands
Towards a Knowledge Graph based Speech Interface
Applications that take human speech as input require a speech interface
with high recognition accuracy. The words or phrases in the recognised text are
annotated with a machine-understandable meaning and linked to knowledge graphs
for further processing by the target application. These semantic annotations of
recognised words can be represented as subject-predicate-object triples, which
collectively form a graph often referred to as a knowledge graph. This type of
knowledge representation makes it easier to use speech interfaces with any
spoken-input application: because the information is represented in a logical,
semantic form, it can be stored and retrieved using any standard web query
language. In this work, we develop a methodology for linking speech input to
knowledge graphs and study the impact of recognition errors on the overall
process. We show that for a corpus with lower WER, the extent of annotation and
linking of entities to the DBpedia knowledge graph is considerable. DBpedia
Spotlight, a tool for interlinking text documents with linked open data, is
used to link the speech recognition output to the DBpedia knowledge graph. Such
a knowledge-based speech recognition interface is useful for applications such
as question answering or spoken dialog systems.
Comment: Under Review in International Workshop on Grounding Language
Understanding, Satellite of Interspeech 201
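The subject-predicate-object representation described in this abstract can be illustrated with a minimal sketch. It is not the authors' pipeline and does not call DBpedia Spotlight; the `Triple` type, the `match` helper, and the `ex:` identifiers are hypothetical names introduced here, and the annotations are hand-written stand-ins for what an entity linker might return for a recognised utterance.

```python
from typing import Iterable, Iterator, NamedTuple, Optional


class Triple(NamedTuple):
    """One subject-predicate-object statement (hypothetical helper type)."""
    subject: str
    predicate: str
    obj: str


def match(
    triples: Iterable[Triple],
    subject: Optional[str] = None,
    predicate: Optional[str] = None,
    obj: Optional[str] = None,
) -> Iterator[Triple]:
    """Yield the triples whose fields equal every argument that is not None."""
    for t in triples:
        if subject is not None and t.subject != subject:
            continue
        if predicate is not None and t.predicate != predicate:
            continue
        if obj is not None and t.obj != obj:
            continue
        yield t


# Hand-written annotations (assumed, not real linker output) for the
# recognised utterance "Berlin is the capital of Germany".
graph = [
    Triple("ex:Berlin", "rdf:type", "ex:City"),
    Triple("ex:Berlin", "ex:capitalOf", "ex:Germany"),
    Triple("ex:Germany", "rdf:type", "ex:Country"),
]

# Pattern query: which subjects are the capital of ex:Germany?
capitals = [t.subject for t in match(graph, predicate="ex:capitalOf", obj="ex:Germany")]
print(capitals)  # prints ['ex:Berlin']
```

Leaving a field as `None` acts as a wildcard, which loosely mirrors how a variable behaves in a triple pattern of a web query language. A real system would more likely store the triples in an RDF store and query them there.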
Spoken content retrieval: A survey of techniques and technologies
Speech media, that is, digital audio and video containing spoken content, has blossomed in recent years. Large collections are accruing on the Internet as well as in private and enterprise settings. This growth has motivated extensive research on techniques and technologies that facilitate reliable indexing and retrieval. Spoken content retrieval (SCR) requires the combination of audio and speech processing technologies with methods from information retrieval (IR). SCR research initially investigated planned speech structured in document-like units, but has subsequently shifted focus to more informal spoken content produced spontaneously, outside of the studio and in conversational settings. This survey provides an overview of the field of SCR, encompassing component technologies, the relationship of SCR to text IR and automatic speech recognition, and user interaction issues. It is aimed at researchers with backgrounds in speech technology or IR who are seeking deeper insight into how these fields are integrated to support research and development, thus addressing the core challenges of SCR.
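As a rough illustration of how speech recognition output can feed a text IR component, the sketch below builds an inverted index over recogniser transcripts and answers Boolean AND queries. It is a deliberately simple stand-in, not a technique taken from the survey: the transcripts are invented, the function names are hypothetical, and real SCR systems typically also have to cope with recognition errors and out-of-vocabulary terms.

```python
import re
from collections import defaultdict
from typing import Dict, List, Set


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9']+", text.lower())


def build_index(transcripts: Dict[str, str]) -> Dict[str, Set[str]]:
    """Map each term to the set of clip ids whose transcript contains it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for clip_id, text in transcripts.items():
        for term in tokenize(text):
            index[term].add(clip_id)
    return index


def search(index: Dict[str, Set[str]], query: str) -> Set[str]:
    """Return the clips that contain every query term (Boolean AND)."""
    terms = tokenize(query)
    if not terms:
        return set()
    postings = [index.get(term, set()) for term in terms]
    return set.intersection(*postings)


# Invented 1-best transcripts standing in for recogniser output.
transcripts = {
    "clip1": "welcome to the weather report for monday",
    "clip2": "the report covers weather and traffic",
    "clip3": "traffic is heavy on monday",
}

index = build_index(transcripts)
print(sorted(search(index, "weather report")))  # prints ['clip1', 'clip2']
print(sorted(search(index, "monday traffic")))  # prints ['clip3']
```

Because the index only sees the recogniser's words, a misrecognised term is simply missing from it. This is one reason the combination of speech processing and IR methods is less straightforward than indexing clean text.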
Mobile Phone Text Processing and Question-Answering
Mobile phone text messaging between mobile users and information services is a growing area of Information Systems. Users may require the service to answer queries, or may, in wiki style, want to contribute to the service by texting in some information within the service's domain of discourse. Given the volume of such messaging, it is essential to do the processing through an automated service. Further, in the case of repeated use of the service, the quality of the response has the potential to benefit from a dynamic user profile that the service can build up from previous texts of the same user.
This project will investigate the potential for creating such intelligent mobile phone services and aims to produce a computational model to enable their efficient implementation. To make the project feasible, the scope of the automated service is considered to lie within a limited domain, for example, information about entertainment within a specific town centre. The project will assume the existence of a model of objects within the domain of discourse, allowing texts to be analysed in the context of a user model and a domain model. The project will therefore involve the subject areas of natural language processing, language engineering, machine learning, knowledge extraction, and ontological engineering.
On the voice-activated question answering
Question answering (QA) is probably one of the most challenging tasks in the field of natural language processing. It requires search engines that are capable of extracting concise, precise fragments of text that contain an answer to a question posed by the user. Adding voice interfaces to QA systems offers a more natural and very appealing way of interacting with them. This paper provides a comprehensive description of current state-of-the-art voice-activated QA systems. Finally, the scenarios that will emerge from the introduction of speech recognition in QA are discussed. © 2006 IEEE.
This work was supported in part by Research Projects TIN2009-13391-C04-03 and TIN2008-06856-C05-02. This paper was recommended by Associate Editor V. Marik.
Rosso, P.; Hurtado Oliver, LF.; Segarra Soriano, E.; Sanchís Arnal, E. (2012). On the voice-activated question answering. IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews. 42(1):75-85. https://doi.org/10.1109/TSMCC.2010.2089620
GEMINI: A Natural Language System for Spoken-Language Understanding
Gemini is a natural language understanding system developed for spoken
language applications. The paper describes the architecture of Gemini, paying
particular attention to resolving the tension between robustness and
overgeneration. Gemini features a broad-coverage unification-based grammar of
English, fully interleaved syntactic and semantic processing in an all-paths,
bottom-up parser, and an utterance-level parser to find interpretations of
sentences that might not be analyzable as complete sentences. Gemini also
includes novel components for recognizing and correcting grammatical
disfluencies, and for applying parse preferences. This paper presents a
component-by-component view of Gemini, providing detailed relevant measurements
of size, efficiency, and performance.
Comment: 8 pages, postscrip
DramaQA: Character-Centered Video Story Understanding with Hierarchical QA
Despite recent progress in computer vision and natural language processing,
video understanding intelligence is still hard to achieve because of the
intrinsic difficulty of stories in video. Moreover, there is no theoretical
metric for evaluating the degree of video understanding. In this paper, we
propose a novel video question answering (Video QA) task, DramaQA, for a
comprehensive understanding of the video story. DramaQA focuses on two
perspectives: 1) hierarchical QAs as an evaluation metric based on the
cognitive developmental stages of human intelligence, and 2) character-centered
video annotations to model local coherence of the story. Our dataset is built
upon the TV drama "Another Miss Oh" and contains 16,191 QA pairs from 23,928
video clips of various lengths, with each QA pair belonging to one of four
difficulty levels. We provide 217,308 annotated images with rich
character-centered annotations, including visual bounding boxes, behaviors, and
emotions of main characters, and coreference-resolved scripts. Additionally, we
provide analyses of the dataset as well as a Dual Matching Multistream model,
which effectively learns character-centered representations of video to answer
questions about the video. We are planning to release our dataset and model
publicly for research purposes and expect that our work will provide a new
perspective on video story understanding research.
Comment: 21 pages, 10 figures, submitted to ECCV 202