5,290 research outputs found

Linguistic Structure in Statistical Machine Translation

Author: Herrmann Teresa
Publication venue: KIT-Bibliothek, Karlsruhe
Publication date: 01/01/2015

This thesis investigates the influence of linguistic structure in statistical machine translation. We develop a word reordering model based on syntactic parse trees and address the issues of pronouns and morphological agreement with a source discriminative word lexicon predicting the translation for individual words using structural features. When used in phrase-based machine translation, the models improve the translation for language pairs with different word order and morphological variation

KITopen

Evaluation of Automatic Text Summarization Using Synthetic Facts

Author: Ahn Jaewook
Publication venue: DigitalCommons@CalPoly
Publication date: 01/06/2022

Automatic text summarization has achieved remarkable success with the development of deep neural networks and the availability of standardized benchmark datasets. It can generate fluent, human-like summaries. However, the unreliability of the existing evaluation metrics hinders its practical usage and slows down its progress. To address this issue, we propose an automatic reference-less text summarization evaluation system with dynamically generated synthetic facts. We hypothesize that if a system guarantees a summary that has all the facts that are 100% known in the synthetic document, it can provide natural interpretability and high feasibility in measuring factual consistency and comprehensiveness. To our knowledge, our system is the first that measures the overarching quality of text summarization models with factual consistency, comprehensiveness, and compression rate. We validate our system by comparing its correlation with human judgment against that of existing n-gram overlap-based metrics such as ROUGE and BLEU and a BERT-based evaluation metric, BERTScore. Our system's experimental evaluation of PEGASUS, BART, and T5 outperforms the current evaluation metrics in measuring factual consistency by a noticeable margin and demonstrates statistical significance in measuring comprehensiveness and overall summary quality

DigitalCommons@CalPoly
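
The three quantities named in the abstract above (factual consistency, comprehensiveness, compression rate) can be illustrated with a minimal sketch. The definitions below are assumptions made for illustration, not the thesis's actual formulas: consistency is taken as the share of summary facts that appear among the known document facts, comprehensiveness as the share of document facts recovered by the summary, and compression rate as summary length over document length. The function name `score_summary` and the representation of facts as plain strings are hypothetical.

```python
from typing import Dict, Iterable


def score_summary(
    document_facts: Iterable[str],
    summary_facts: Iterable[str],
    document_tokens: int,
    summary_tokens: int,
) -> Dict[str, float]:
    """Score a summary against a synthetic document whose facts are fully known.

    All three definitions are illustrative assumptions, not the source's metrics.
    """
    known = set(document_facts)
    stated = set(summary_facts)
    supported = known & stated  # facts the summary states that the document contains

    # Assumed definition: fraction of stated facts that are supported.
    consistency = len(supported) / len(stated) if stated else 1.0
    # Assumed definition: fraction of known facts that the summary covers.
    comprehensiveness = len(supported) / len(known) if known else 1.0
    # Assumed definition: summary length relative to document length.
    compression_rate = summary_tokens / document_tokens

    return {
        "consistency": consistency,
        "comprehensiveness": comprehensiveness,
        "compression_rate": compression_rate,
    }


# Toy input: four known facts; the summary states three, one of them unsupported.
scores = score_summary(
    document_facts=["A founded B", "B is in C", "A was born in 1970", "C has 5 offices"],
    summary_facts=["A founded B", "B is in C", "A was born in 1980"],
    document_tokens=100,
    summary_tokens=20,
)
print(scores)
```

Because every fact in a synthetic document is known in advance, the scores need no human-written reference summary, which is the property the abstract emphasizes.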

Cultures and Traditions of Wordplay and Wordplay Research

Publication venue: Walter de Gruyter GmbH
Publication date: 21/11/2022

This volume focuses on realisations of wordplay in different cultures and social and historical contexts, and brings together various research traditions of approaching wordplay. Together with the volume DWP 7, it assembles selected papers presented at the interdisciplinary conference The Dynamics of Wordplay / La dynamique du jeu de mots (Trier, 2016) and stresses the inherent dynamicity of wordplay and wordplay research

Directory of Open Access Books (DOAB)

Automatic generation of named entity taggers leveraging parallel corpora

Author: Chung Yi-Ling
Publication date: 26/09/2017

The lack of hand-curated data is a major impediment to developing statistical semantic processors for many of the world's languages. A major issue of semantic processors in Natural Language Processing (NLP) is that they require manually annotated data to perform accurately. Our work aims to address this issue by leveraging existing annotations and semantic processors from multiple source languages by projecting their annotations via statistical word alignments traditionally used in Machine Translation. Taking the Named Entity Recognition (NER) task as a use case of semantic processing, this work presents a method to automatically induce Named Entity taggers using parallel data, without any manual intervention. Our method leverages existing semantic processors and annotations to overcome the lack of annotation data for a given language. The intuition is to transfer or project semantic annotations, from multiple sources to a target language, by statistical word alignment methods applied to parallel texts (Och and Ney, 2000; Liang et al., 2006). The projected annotations can then be used to automatically generate semantic processors for the target language. In this way we would be able to provide NLP processors without training data for the target language. The experiments focus on 4 languages: German, English, Spanish and Italian, and our empirical evaluation results show that our method obtains competitive results when compared with models trained on gold-standard out-of-domain data. This shows that our projection algorithm is effective at transporting NER annotations across languages via parallel data, thus providing a fully automatic method to obtain NER taggers for as many languages as are aligned via parallel corpora

Archivo Digital para la Docencia y la Investigación
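
The projection idea in the abstract above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the thesis's algorithm: it uses a single source language, a hand-written word alignment given as `(source_index, target_index)` pairs, and plain labels without a BIO scheme. The function name `project_labels` and the toy sentence pair are hypothetical.

```python
from typing import List, Tuple


def project_labels(
    source_labels: List[str],
    alignment: List[Tuple[int, int]],
    target_length: int,
) -> List[str]:
    """Copy entity labels from source tokens to the target tokens aligned with them.

    Unaligned target tokens keep the outside label "O". When several source
    tokens align to one target token, the first entity label wins (an
    arbitrary tie-breaking choice made for this sketch).
    """
    target_labels = ["O"] * target_length
    for source_index, target_index in alignment:
        label = source_labels[source_index]
        if label != "O" and target_labels[target_index] == "O":
            target_labels[target_index] = label
    return target_labels


# Toy English -> Spanish pair; in practice the alignment would come from a
# statistical word aligner run over parallel text.
source_tokens = ["the", "European", "Commission", "met"]
source_labels = ["O", "ORG", "ORG", "O"]
target_tokens = ["la", "Comisión", "Europea", "se", "reunió"]
alignment = [(0, 0), (1, 2), (2, 1), (3, 4)]

projected = project_labels(source_labels, alignment, len(target_tokens))
print(list(zip(target_tokens, projected)))
```

The projected labels could then serve as automatically generated training data for a target-language tagger. Combining projections from several source languages, as the abstract describes, would need an additional voting or merging step that is not shown here.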

Knowledge Discovery and Management within Service Centers

Author: Zaman Nazia
Publication venue: North Dakota State University
Publication date: 01/01/2016

These days, most enterprise service centers deploy Knowledge Discovery and Management (KDM) systems to address the challenge of delivering a resourceful service request resolution in a timely manner while efficiently utilizing huge amounts of data. These KDM systems facilitate prompt responses to critical service requests and, where possible, try to prevent service requests from being triggered in the first place. Nevertheless, in most cases, the information required for a request resolution is dispersed and buried under a mountain of irrelevant information on the Internet, in unstructured and heterogeneous formats. These heterogeneous data sources and formats complicate access to reusable knowledge and increase the response time required to reach a resolution. Moreover, state-of-the-art methods neither support effective integration of domain knowledge with KDM systems nor promote the assimilation of reusable knowledge or Intellectual Capital (IC). With the goal of providing an improved service request resolution within the shortest possible time, this research proposes an IC Management System. The proposed tool efficiently utilizes domain knowledge in the form of semantic web technology to extract the most valuable information from raw unstructured data, and uses that knowledge to formulate a service resolution model as a combination of efficient data search, classification, clustering, and recommendation methods. Our proposed solution also handles the technology categorization of a service request, which is crucial in the request resolution process. The system has been extensively evaluated with several experiments and has been used in a real enterprise customer service center

NDSU Libraries Institutional Repository

Linguistic-based Patterns for Figurative Language Processing: The Case of Humor Recognition and Irony Detection

Author: Reyes Pérez Antonio
Publication venue: Universitat Politècnica de València
Publication date: 19/07/2012

Figurative language represents one of the most difficult tasks in natural language processing. Unlike literal language, figurative language makes use of linguistic devices such as irony, humor, sarcasm, metaphor, and analogy, among others, to communicate indirect meanings that most of the time cannot be interpreted in terms of syntactic or semantic information alone. Rather, figurative language reflects patterns of thought that acquire their full meaning in communicative and social contexts, which makes both its linguistic representation and its computational processing highly complex tasks. In this context, this doctoral thesis addresses a problem related to figurative language processing on the basis of linguistic patterns. In particular, our efforts focus on building a system capable of automatically detecting instances of humor and irony in texts drawn from social media. Our main hypothesis rests on the premise that language reflects patterns of conceptualization; that is, by studying language, we study those patterns. Therefore, by analyzing these two domains of figurative language, we aim to provide arguments about how people conceive of them and, above all, about how that conception causes both humor and irony to be verbalized in a particular way across various social media. In this context, one of our main interests is to demonstrate how knowledge derived from the analysis of different levels of linguistic study can represent a set of relevant patterns for automatically identifying figurative uses of language.
It is worth noting that, contrary to most approaches that have focused on the study of figurative language, in our research we do not seek to provide arguments based solely on prototypical examples, but rather on texts whose characteristics

Reyes Pérez, A. (2012). Linguistic-based Patterns for Figurative Language Processing: The Case of Humor Recognition and Irony Detection [Unpublished doctoral thesis]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/16692

Crossref

RiuNet

Learning to Live with Machine Translation

Author: Long Hoyt
Publication venue: Project Muse
Publication date: 08/06/2023

Rapid advancements in technologies of text and image generation have increasingly put the perceived autonomy of human creativity under threat. Even before ChatGPT and other large language models sent such anxieties into overdrive, literary critics were arguing for a hermeneutics of automatic writing and revisiting long-held assumptions about artistic originality. Few, however, gave much thought to these models' quirky cousins, a family branch that once ruled over the utopian dreams invested in AI: machine translation (MT). This essay reflects on why translation has been lost in all the recent talk about these models and offers a necessary corrective. It considers what a critical response to MT might look like when reframed around an understanding of current technologies and a vision of MT as potential collaborator rather than human replacement. First, it offers an overview of current neural-based MT and the theories of translation that underwrite it. It then uses literary texts as a limit case for surveying the technology's most visible gaps, providing a deep, qualitative analysis of Japanese literary texts machine translated into English. Finally, it takes a speculative turn and considers what "good enough" machine translation of a large corpus of world literature might be good for in a future of ubiquitous and ever more accessible MT. The results hint at more immediate ways that MT invites inquiry into the present conditions of world literature, but also at a future where the entanglement of human translation and agency with the material agency of the technology brings forth potentials in both

Knowledge UChicago