
    Preliminary Study of Validating Vocabulary Selection and Organization of a Manual Communication Board in Malay

    An integral component of a language-based augmentative and alternative communication (AAC) system is providing vocabulary typical of fluent native speakers. In the absence of reliable and valid research on Malay vocabulary for AAC, this descriptive study explored the validation process of vocabulary selection and organization for a 144-location manual communication board. An hour of aided language samples (talking while pointing to a prototype display), followed by self-administered surveys, was gathered from four typical native Malay speakers (n = 4), aged 22 to 36 years, at the University of Pittsburgh. Vocabulary frequency, word commonality, and overall perceptions of and feedback on the prototype display were compiled and analyzed. A total of 1,112 word tokens and 454 word types were analyzed to support preliminary validation of the selected vocabulary and word organization of the prototype. Approximately 40% of the words on the display were used during the interview, and the top 20 words were reported. Findings also suggest the importance of considering morphology and syntax at early design stages. The usability survey confirmed a positive overall perception of the display, including its vocabulary selection and cultural and ethnic appropriateness, and yielded suggestions for system improvement. Only minimal rearrangement of the icon display is needed to improve the usability of the system. The study findings thus support the early Malay manual communication board for AAC intervention. However, the sample size is a limitation, and additional research is required to support a final display that optimizes the vocabulary and morphosyntactic organization of a manual communication board in Malay.
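The token/type counts, top-word list, and display-coverage figure reported in this abstract can be computed with a few lines of code. The sketch below is illustrative only: the Malay `sample` string, the whitespace tokenizer, and the `display_vocab` list are all invented for the example, not taken from the study.

```python
from collections import Counter

def vocabulary_stats(sample, top_n=20):
    """Count word tokens, word types, and the most frequent words
    in a whitespace-tokenized language sample."""
    tokens = sample.lower().split()
    freqs = Counter(tokens)
    return {
        "tokens": len(tokens),           # total word tokens
        "types": len(freqs),             # distinct word types
        "top": freqs.most_common(top_n), # most frequent words
    }

def display_coverage(top_words, display_vocab):
    """Fraction of the display's vocabulary observed among the top words."""
    used = {w for w, _ in top_words} & set(display_vocab)
    return len(used) / len(display_vocab)

# Invented toy sample; the study analyzed an hour of aided language samples.
sample = "saya nak makan saya nak minum dia nak makan"
stats = vocabulary_stats(sample, top_n=3)
coverage = display_coverage(stats["top"], ["saya", "nak", "tidur"])
```

On a real corpus, `stats["tokens"]` and `stats["types"]` would correspond to the 1,112-token / 454-type figures, and `coverage` to the roughly 40% of display words observed in use.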

    Narrative voice in popular science in the British press: a corpus analysis on the construal of attributed meanings

    This dissertation approaches the construal of attributed meanings in the dissemination of science in the British press by analysing the resources used by journalists to integrate, in their narrations of scientific findings, what other sources have said. Attribution in scientific, academic, and media discourse has previously been described from an interpersonal viewpoint through the analysis of evaluation and appraisal. In addition, research has addressed how experiential elements such as the source, the process (verbal and/or mental), and the structure (direct speech or indirect report) contribute independently to the construal of attribution. However, the approach followed in this dissertation attempts to provide a more comprehensive description of how attribution is construed experientially. The assumption made is that in polyphonic texts it is possible to analyse attribution comprehensively, by identifying to whom we can attribute each idea which is quoted, reported, or narrated in the text. For this purpose, an analytical unit (the 'unit of voice') has been defined and studied, which distinguishes between external sources of attribution ('attribution') and the journalist's voice ('averral').
The aim of this dissertation is to explore how the experiential elements construing attribution co-occur in each unit of voice and contribute to the journalist's interaction with his/her readers, as well as to his/her epistemological positioning with respect to the attributed information. In order to obtain both quantitative and qualitative data, a corpus of 180 popularizations was compiled and 1,625 units of voice were identified from the perspective of systemic functional linguistics. In addition, an annotation scheme was proposed for the comprehensive analysis of the three crucial components (processes, participants, and logico-dependent relations) of the units of voice. Results show that the texts in the corpus display a high degree of polyphony: of the 1,625 units of voice, cases of attributed information almost double cases of averred information. Taking the analysed features of attribution in isolation, results suggest that attribution is construed in these texts mainly through a balance between reporting and quoting, through neutral projecting processes, and through Human participants. These results correspond to traditional expectations about the objectivity of journalists in science dissemination, and seem to suggest that the journalist performs his/her mediating role from an invisible or almost invisible position. However, the analysis also reveals that, within the unit of voice, the often complex intertwining of attribution and averral sometimes shows an ambiguous blurring between the voice of the journalist and the voice of the external source of attribution, suggesting that the journalist also positions him/herself as aligned with the external source, making the two voices literally indistinguishable.
In addition, the processes used by the journalist to project what others have said are varied, including stance processes that the journalist uses to construe his/her mediating role in a more visible way: not really showing personal views or opinions on the narrated information, but rather contextualising and interpreting its significance for readers, which is consistent with the pedagogic function expected of these texts. Results for the projection clusters show that journalists tend to construe the sources of attribution by labelling them either by their proper name or by their professional role when quoting them, whereas when reporting what sources have said, journalists show a much higher preference (up to one third of the total) for referring to material sources (e.g., the report, the study) instead. There is also a preference for neutral projecting processes in quoting, together with participants construed as Human Named, versus a higher reliance on stance processes in reporting, together with participants construed as Human Semi-named. Comparing the two reveals a clear difference in how the journalist represents his/her mediating role in each case: no mediating presence at all in the case of quotes, but a sounder presence as mediator in the case of reports. Finally, the journalist's mediating role is also construed through embedding, particularly through the use of nouns of projection, which construe the journalist's mediation as packaged and therefore not open to question, and which can be linked to a more prominent role for the journalist in controlling the information narrated.
This experiential account of the construal of attribution in science popularizations shows, in sum, that the intertwining of attribution and averral is used by the journalist to construe a representation of the narrated scientific findings that relies on his/her mediating role in guiding lay readers along the narration, an interplay that is essentially much more dynamic than previous accounts have shown.
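The corpus figures above (proportions of attribution vs. averral, quoting vs. reporting, and so on) boil down to tallying tagged units of voice. The sketch below is a toy illustration: the field names (`voice`, `structure`, `process`) and tag values are invented stand-ins for the dissertation's actual annotation scheme.

```python
# Hypothetical annotated units of voice; the real scheme annotates
# processes, participants, and logico-dependent relations in detail.
units = [
    {"voice": "attribution", "structure": "quote",  "process": "neutral"},
    {"voice": "attribution", "structure": "report", "process": "stance"},
    {"voice": "averral",     "structure": None,     "process": None},
    {"voice": "attribution", "structure": "quote",  "process": "neutral"},
]

def proportions(units, field):
    """Relative frequency of each non-null value of `field` across units."""
    counts = {}
    for u in units:
        value = u[field]
        if value is not None:
            counts[value] = counts.get(value, 0) + 1
    total = sum(counts.values())
    return {k: c / total for k, c in counts.items()}

voice_props = proportions(units, "voice")
structure_props = proportions(units, "structure")
```

Run over the 1,625 annotated units, `voice_props` would reproduce the finding that attributed cases almost double averred ones.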

    Knowledge Expansion of a Statistical Machine Translation System using Morphological Resources

    The translation capability of a phrase-based statistical machine translation (PBSMT) system depends mostly on its parallel training data; phrases not present in that data are not correctly translated. This paper describes a method that efficiently expands the existing knowledge of a PBSMT system, not by adding more parallel data but by using external morphological resources. A set of new phrase associations is added to the translation and reordering models; each corresponds to a morphological variation of the source phrase, the target phrase, or both, of an existing association. New associations are generated using a string similarity score based on morphosyntactic information. We tested our approach on En-Fr and Fr-En translation, and the results showed improved performance in terms of automatic scores (BLEU and Meteor) and a reduction of out-of-vocabulary (OOV) words. We believe that our knowledge-expansion framework is generic and could be used to add other types of information to the model. JRC.G.2 - Global security and crisis management
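A minimal sketch of the expansion idea, under stated assumptions: the phrase table is a plain dict, the morphological lexicon maps lemmas to inflected forms, single-word "phrases" keep the example small, and `difflib.SequenceMatcher` stands in for the paper's morphosyntactically informed similarity score. The real method varies the source side, the target side, or both, and also updates the reordering model, none of which is reproduced here.

```python
from difflib import SequenceMatcher

def similarity(a, b):
    """String similarity in [0, 1]; a crude stand-in for a score
    based on morphosyntactic information."""
    return SequenceMatcher(None, a, b).ratio()

def expand_phrase_table(phrase_table, morph_lexicon, threshold=0.75):
    """For each (source, target) pair, add entries whose source is a
    sufficiently similar morphological variant, reusing the original
    target translation."""
    expanded = dict(phrase_table)
    for src, tgt in phrase_table.items():
        for lemma, forms in morph_lexicon.items():
            if similarity(src, lemma) >= threshold:
                for form in forms:
                    expanded.setdefault(form, tgt)
    return expanded

# Invented toy entries: French inflections of "mange" share the translation.
expanded = expand_phrase_table({"mange": "eats"},
                               {"mange": ["mangent", "manges"]})
```

Inflected forms absent from the parallel data thus acquire translations, which is how the approach reduces OOV words without new parallel text.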

    Automating information extraction task for Turkish texts

    Ph.D. thesis, Department of Computer Engineering and the Institute of Engineering and Science, Bilkent University, Ankara, 2011. Includes bibliographical references (leaves 85-97).
    Throughout history, mankind has often suffered from a lack of necessary resources. In today's information world, the challenge can sometimes be a wealth of resources: an excessive amount of information implies the need to find and extract the necessary information. Information extraction can be defined as the identification of selected types of entities, relations, facts, or events in a set of unstructured text documents in a natural language. The goal of our research is to build a system that automatically locates and extracts information from Turkish unstructured texts. Our study focuses on two basic information extraction (IE) tasks: Named Entity Recognition and Entity Relation Detection. Named Entity Recognition, finding the named entities (persons, locations, organizations, etc.) in unstructured texts, is one of the most fundamental IE tasks. The Entity Relation Detection task tries to identify relationships between entities mentioned in text documents. Using a supervised learning strategy, the developed systems start with a set of examples collected from a training dataset and generate extraction rules from the given examples by using a carefully designed coverage algorithm. Moreover, several rule filtering and rule refinement techniques are utilized to maximize generalization and accuracy at the same time. In order to obtain accurate generalization, we use several syntactic and semantic features of the text, including orthographical, contextual, lexical, and morphological features. In particular, morphological features of the text are effectively used in this study to increase the extraction performance for Turkish, an agglutinative language.
    Since the system does not rely on handcrafted rules/patterns, it does not suffer heavily from the domain adaptability problem. The results of the conducted experiments show that (1) the developed systems are successfully applicable to the Named Entity Recognition and Entity Relation Detection tasks, and (2) exploiting morphological features can significantly improve the performance of information extraction from Turkish, an agglutinative language. (Serhan Tatar, Ph.D.)
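The learn-then-filter pattern described above can be sketched as a toy loop: extract simple features per token, propose one rule per (feature, value) pair, and keep only rules meeting a precision threshold on the training examples. The feature set, threshold, and example tokens are all invented simplifications of the thesis's coverage algorithm and its richer orthographical, contextual, lexical, and morphological features.

```python
def features(token):
    """Crude orthographic/morphological cues; suffixes carry much of the
    information in an agglutinative language like Turkish."""
    return {
        "init_cap": token[:1].isupper(),
        "suffix": token[-3:].lower(),
    }

def learn_rules(examples, min_precision=0.8):
    """Keep each (feature, value) -> label rule whose precision on the
    training examples meets the threshold (a toy filtering step)."""
    stats = {}
    for token, label in examples:
        for f, v in features(token).items():
            hits, total = stats.get((f, v), ({}, 0))
            hits[label] = hits.get(label, 0) + 1
            stats[(f, v)] = (hits, total + 1)
    rules = {}
    for key, (hits, total) in stats.items():
        best_label, n = max(hits.items(), key=lambda kv: kv[1])
        if n / total >= min_precision:
            rules[key] = best_label
    return rules

# Invented training examples: (token, entity label).
examples = [("Ankara", "LOC"), ("Bilkent", "ORG"),
            ("geldi", "O"), ("gitti", "O")]
rules = learn_rules(examples)
```

Here the ambiguous capitalization rule (both LOC and ORG tokens are capitalized) is filtered out, while the suffix rules survive, mirroring why morphological cues help.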

    Predictive modeling of human placement decisions in an English Writing Placement Test

    Writing is an important component of standardized tests used for admission decisions, class placement, and academic or professional development. Placement results of the English Writing Placement Test (the EPT Writing Test) at Iowa State University are used at the undergraduate level to determine whether international students meet English requirements for writing skills (i.e., Pass) and to direct students to appropriate ESL writing classes (i.e., 101B or 101C). Practical constraints in the test's evaluation processes, such as rater disagreement, rater turnover, and heavy administrative workload, have demonstrated the need to develop valid scoring models for an automated writing evaluation tool. Statistical algorithms for the scoring engines were essential to predict human raters' quality judgments of future EPT essays. Furthermore, in measuring L2 writing performance, previous research has focused heavily on writer-oriented text features of students' writing rather than on the reader-oriented linguistic features that influence human raters' quality judgments. To address the practical concerns of the EPT Writing Test and this gap in the literature, the current project aimed at developing a predictive model that best defines human placement decisions in the EPT Writing Test. A two-phase multistage mixed-methods design was adopted, with interconnected model-specification and model-construction phases. In the model-specification phase, results of a Multifaceted Rasch Measurement (MFRM) analysis allowed for the selection of five EPT expert raters representing a range of rating severity levels. Concurrent think-aloud protocols provided by the five participants while evaluating EPT sample essays were analyzed qualitatively to identify the text features to which raters attended.
Based on the qualitative findings, 52 evaluative variables and metrics were generated; of these, 36 were chosen to be analyzed across the whole EPT essay corpus. A corpus-based analysis of 297 EPT essays in terms of 37 text features was then conducted to obtain quantitative data on the 36 variables in the model-construction phase. Principal Component Analysis (PCA) helped extract seven principal components (PCs). Results of MANOVA and one-way ANOVA tests revealed 17 original variables and six PCs that significantly differentiated the three EPT placement levels (i.e., 101B, 101C, and Pass). A profile analysis suggested that the lowest level (101B) and the highest level (Pass) have distinct text-feature profiles, while test takers placed in 101C classes were likely to be characterized as an average group. Like 101B students, 101C students appeared to have some linguistic problems; however, students in 101C classes and those who passed the test similarly demonstrated an ability to develop an essay. In the model-construction phase, random forests (Breiman, 2001) were deployed as a data-mining technique to define predictive models of human raters' placement decisions for different task types. Results of the random forests indicated that fragments, part-of-speech-related errors, and PC2 (clear organization but limited paragraph development) were significant predictors of the 101B level, and PC6 (academic word use) of the Pass level. The generic classifier built on the 17 original variables was seemingly the best model: it perfectly predicted the training set (0% error) and successfully forecast the test set (8% error). Differences in prediction performance between the generic and task-specific models were negligible. Results of this project provided little evidence of the generalizability of the predictive models in classifying new EPT essays.
However, within-class examinations showed that the best classifier could recognize the highest and lowest essays, although crossover cases existed at adjacent levels. Implications of the project for placement assessment, pedagogical practices in ESL writing courses, and automated essay scoring (AES) development for the EPT Writing Test are discussed.
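The PCA step in the pipeline above can be sketched with NumPy. The feature matrix below is invented (rows are essays, columns are hypothetical text-feature values such as fragment counts or POS-error counts); the random-forest classifier that the project layered on top of the components is not reproduced here.

```python
import numpy as np

def pca_scores(X, n_components):
    """Project the centered feature matrix onto its first principal
    components via SVD; a minimal stand-in for the PCA that reduced
    36 text-feature variables to seven components."""
    Xc = X - X.mean(axis=0)
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T

# Toy essay-by-feature matrix with two loose clusters of essays.
X = np.array([[1.0, 0.2, 3.1],
              [0.9, 0.1, 3.0],
              [5.0, 2.2, 0.4],
              [5.1, 2.1, 0.5]])
scores = pca_scores(X, 2)
```

In the project, such component scores (rather than the raw correlated variables) were fed to the random forests as predictors of placement level.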

    Sustainability Conversations for Impact: Transdisciplinarity on Four Scales

    Sustainability is a dynamic, multi-scale endeavor. Coherence can be lost between scales – from project teams, to organizations, to networks, and, most importantly, down to conversations. Sustainability researchers have embraced transdisciplinarity, as it is grounded in science, shared language, broad participation, and respect for difference. Yet transdisciplinarity at these four scales is not well defined. In this dissertation I extend transdisciplinarity out from the project to networks and organizations, and down into conversation, adding novel lenses and quantitative approaches. In Chapter 2, I propose that transdisciplinarity incorporate academic disciplines which help cross scales: Organizational Learning, Knowledge Management, Applied Cooperation, and Data Science. In Chapter 3, I then use a mixed-method approach to study a transdisciplinary organization, the Maine Aquaculture Hub, as it develops strategy. Using social network analysis and conversation analytics, I evaluate how the Hub’s network-convening, strategic thinking, and conversation practices turn organization-scale transdisciplinarity into strategic advantage. In Chapters 4 and 5, conversation is the nexus of transdisciplinarity. I study seven public aquaculture lease scoping meetings (informal town halls) and classify conversation activity by “discussion discipline,” i.e., rhetorical and social intent. I compute the relationship between discussion discipline proportions and three sustainability outcomes: intent-to-act, options-generation, and relationship-building. I consider exogenous factors, such as signaling, gender balance, timing, and location. I show that where inquiry is high, so is innovation. Where acknowledgement is high, so is intent-to-act. Where respect is high, so is relationship-building. Indirectness and sarcasm dampen outcomes. I propose seven interventions to improve sustainability conversation capacity, such as nudging, networks, and using empirical models.
Chapter 5 explores those empirical models: I use natural language processing (NLP) to detect the discussion disciplines, training a model on the previously coded transcripts. I then use that model to classify 591 open-source conversation transcripts and regress the sustainability outcomes, per transcript, on discussion discipline proportions. I show that all three conversation outcomes can be predicted from the discussion disciplines, the most statistically significant being intent-to-act, which responds directly to acknowledgement and respect. Conversation AI is the next frontier of transdisciplinarity for sustainability solutions.
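The per-transcript regression can be illustrated with ordinary least squares in NumPy. The numbers below are invented: the two predictor columns stand in for whichever discussion-discipline proportions (e.g., inquiry, acknowledgement) are used, and the outcome vector stands in for an intent-to-act score.

```python
import numpy as np

# Hypothetical per-transcript data: each row holds proportions of two
# discussion disciplines; y is an outcome score per transcript.
X = np.array([[0.10, 0.30],
              [0.25, 0.10],
              [0.40, 0.20],
              [0.05, 0.45]])
y = np.array([0.6, 0.3, 0.5, 0.8])

# Ordinary least squares with an intercept: outcome ~ proportions.
A = np.column_stack([np.ones(len(X)), X])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
predicted = A @ coef
```

With 591 real transcripts, the fitted coefficients would quantify how strongly each discipline's proportion moves the outcome, which is the sense in which the outcomes are "predicted by" the discussion disciplines.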

    Dynamic language modeling for European Portuguese

    Ph.D. in Informatics Engineering. Most of today's methods for the transcription and indexation of broadcast audio data are manual. Broadcasters process thousands of hours of audio and video data on a daily basis in order to transcribe that data, to extract semantic information, and to interpret and summarize the content of those documents. Developing automatic and efficient support for these manual tasks has been a great challenge, and over the last decade there has been growing interest in the use of automatic speech recognition as a tool to provide automatic transcription and indexation of broadcast news, and random, relevant access to large broadcast news databases. However, due to the topic changes over time that characterize this kind of task, the appearance of new events leads to high out-of-vocabulary (OOV) word rates and consequently to degraded recognition performance. This is especially true for highly inflected languages like European Portuguese. Several innovative techniques can be exploited to reduce those errors. News-show-specific information, such as topic-based lexicons and the scripts previously produced by the pivot and other journalists, as well as other sources such as the written news made available daily on the Internet, can be added to the information sources employed by the automatic speech recognizer.
In this thesis we explore the use of additional sources of information for vocabulary optimization and language model adaptation of a European Portuguese broadcast news transcription system. The thesis makes three main contributions: a novel approach to vocabulary selection using part-of-speech (POS) tags to compensate for word-usage differences across the various training corpora; language model adaptation frameworks performed on a daily basis for single-stage and multistage recognition approaches; and a new method for including new words in the system vocabulary without the need for additional adaptation data or language model retraining.
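The OOV problem and the daily vocabulary update can be sketched in a few lines. This is a toy illustration, not the thesis's method: the POS-aware selection and language-model adaptation are reduced to a frequency cutoff over invented token lists.

```python
from collections import Counter

def oov_rate(transcript_tokens, vocabulary):
    """Fraction of spoken tokens not covered by the recognizer's
    finite vocabulary; new events push this rate up."""
    oov = [t for t in transcript_tokens if t not in vocabulary]
    return len(oov) / len(transcript_tokens)

def update_vocabulary(vocabulary, daily_news_tokens, max_new=5):
    """Add the most frequent unseen words from same-day written news:
    a toy version of daily, unsupervised vocabulary adaptation."""
    unseen = Counter(t for t in daily_news_tokens if t not in vocabulary)
    return vocabulary | {w for w, _ in unseen.most_common(max_new)}

# Invented data: one day's news text supplies a word the vocabulary lacks.
vocab = {"a", "b"}
rate_before = oov_rate(["a", "b", "c", "a"], vocab)
vocab = update_vocabulary(vocab, ["c", "c", "d"], max_new=1)
rate_after = oov_rate(["a", "b", "c", "a"], vocab)
```

In the thesis, adding such words also requires assigning them language-model probabilities; the contribution is doing so without retraining the full model.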