5,290 research outputs found
Linguistic Structure in Statistical Machine Translation
This thesis investigates the influence of linguistic structure in statistical machine translation. We develop a word reordering model based on syntactic parse trees and address the issues of pronouns and morphological agreement with a source discriminative word lexicon predicting the translation for individual words using structural features. When used in phrase-based machine translation, the models improve the translation for language pairs with different word order and morphological variation
Evaluation of Automatic Text Summarization Using Synthetic Facts
Automatic text summarization has achieved remarkable success with the development of deep neural networks and the availability of standardized benchmark datasets. It can generate fluent, human-like summaries. However, the unreliability of the existing evaluation metrics hinders its practical usage and slows down its progress. To address this issue, we propose an automatic reference-less text summarization evaluation system with dynamically generated synthetic facts. We hypothesize that if a system guarantees a summary that has all the facts that are 100% known in the synthetic document, it can provide natural interpretability and high feasibility in measuring factual consistency and comprehensiveness. To our knowledge, our system is the first that measures the overarching quality of text summarization models in terms of factual consistency, comprehensiveness, and compression rate. We validate our system by comparing its correlation with human judgment against that of existing N-gram overlap-based metrics such as ROUGE and BLEU and a BERT-based evaluation metric, BERTScore. In an experimental evaluation of PEGASUS, BART, and T5, our system outperforms the current evaluation metrics in measuring factual consistency by a noticeable margin and demonstrates statistical significance in measuring comprehensiveness and overall summary quality
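The idea of scoring a summary against facts that are known in advance can be illustrated with a minimal sketch. This is not the authors' system: the function names `fact_coverage` and `compression_rate`, the case-insensitive substring matching, and the whitespace tokenization are all assumptions chosen for brevity, and factual consistency is not modeled here.

```python
def fact_coverage(summary: str, facts: list[str]) -> float:
    """Fraction of known synthetic facts found in the summary.

    Assumption: a fact counts as covered when it appears as a
    case-insensitive substring. A real system would need a more
    robust matcher.
    """
    if not facts:
        return 0.0
    text = summary.lower()
    hits = sum(1 for fact in facts if fact.lower() in text)
    return hits / len(facts)


def compression_rate(document: str, summary: str) -> float:
    """One minus the ratio of summary length to document length.

    Assumption: length is measured in whitespace-separated tokens.
    """
    doc_len = len(document.split())
    if doc_len == 0:
        return 0.0
    return 1.0 - len(summary.split()) / doc_len
```

Because the facts are generated together with the synthetic document, no human-written reference summary is needed to compute either number.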
Cultures and Traditions of Wordplay and Wordplay Research
This volume focuses on realisations of wordplay in different cultures and social and historical contexts, and brings together various research traditions of approaching wordplay. Together with the volume DWP 7, it assembles selected papers presented at the interdisciplinary conference The Dynamics of Wordplay / La dynamique du jeu de mots (Trier, 2016) and stresses the inherent dynamicity of wordplay and wordplay research
Automatic generation of named entity taggers leveraging parallel corpora
The lack of hand-curated data is a major impediment to developing statistical semantic processors for many of the world's languages. A major issue of semantic processors in Natural Language Processing (NLP) is that they require manually annotated data to perform accurately. Our work aims to address this issue by leveraging existing annotations and semantic processors from multiple source languages by projecting their annotations via statistical word alignments traditionally used in Machine Translation. Taking the Named Entity Recognition (NER) task as a use case of semantic processing, this work presents a method to automatically induce Named Entity taggers using parallel data, without any manual intervention. Our method leverages existing semantic processors and annotations to overcome the lack of annotation data for a given language. The intuition is to transfer or project semantic annotations from multiple sources to a target language by statistical word alignment methods applied to parallel texts (Och and Ney, 2000; Liang et al., 2006). The projected annotations can then be used to automatically generate semantic processors for the target language. In this way we would be able to provide NLP processors without training data for the target language. The experiments focus on 4 languages: German, English, Spanish and Italian, and our empirical evaluation results show that our method obtains competitive results when compared with models trained on gold-standard out-of-domain data. This shows that our projection algorithm is effective at transporting NER annotations across languages via parallel data, thus providing a fully automatic method to obtain NER taggers for as many languages as are aligned via parallel corpora
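The projection step described in this abstract can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' algorithm: the function name `project_tags`, the BIO tag format, and the majority vote over aligned source tokens are hypothetical choices, and the word alignment is taken as given rather than computed.

```python
from collections import Counter


def project_tags(src_tags, alignment, tgt_len):
    """Project BIO entity tags from source tokens onto target tokens.

    src_tags:  BIO tags for the source sentence, e.g. ["B-PER", "I-PER", "O"]
    alignment: (source_index, target_index) pairs from a word aligner
    tgt_len:   number of tokens in the target sentence
    """
    # Each target token collects votes for the entity types of the
    # source tokens aligned to it.
    votes = [Counter() for _ in range(tgt_len)]
    for src_idx, tgt_idx in alignment:
        tag = src_tags[src_idx]
        if tag != "O":
            votes[tgt_idx][tag.split("-", 1)[1]] += 1

    # Rebuild BIO tags on the target side. Simplification: adjacent
    # target tokens with the same type are merged into one entity.
    projected, previous = [], None
    for vote in votes:
        entity_type = vote.most_common(1)[0][0] if vote else None
        if entity_type is None:
            projected.append("O")
        elif entity_type == previous:
            projected.append("I-" + entity_type)
        else:
            projected.append("B-" + entity_type)
        previous = entity_type
    return projected
```

For an English sentence tagged `["B-PER", "I-PER", "O", "B-LOC"]` and a five-token German translation, the alignment `[(0, 0), (1, 1), (2, 2), (3, 4)]` leaves the unaligned fourth target token as `O` and carries the location tag to the fifth. Voting also gives a natural place to combine projections from several source languages, since each source can contribute alignment pairs for the same target sentence.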
Knowledge Discovery and Management within Service Centers
These days, most enterprise service centers deploy Knowledge Discovery and Management (KDM) systems to address the challenge of delivering a resourceful service request resolution on time while efficiently utilizing the huge amount of available data. These KDM systems facilitate prompt responses to critical service requests and, where possible, try to prevent service requests from being triggered in the first place. Nevertheless, in most cases, the information required for a request resolution is dispersed and buried under a mountain of irrelevant information on the Internet in unstructured and heterogeneous formats. These heterogeneous data sources and formats complicate access to reusable knowledge and increase the response time required to reach a resolution. Moreover, state-of-the-art methods neither support effective integration of domain knowledge with the KDM systems nor promote the assimilation of reusable knowledge or Intellectual Capital (IC). With the goal of providing an improved service request resolution within the shortest possible time, this research proposes an IC Management System. The proposed tool efficiently utilizes domain knowledge in the form of semantic web technology to extract the most valuable information from the raw unstructured data and uses that knowledge to formulate a service resolution model as a combination of efficient data search, classification, clustering, and recommendation methods. Our proposed solution also handles the technology categorization of a service request, which is crucial in the request resolution process. The system has been extensively evaluated with several experiments and has been used in a real enterprise customer service center
Linguistic-based Patterns for Figurative Language Processing: The Case of Humor Recognition and Irony Detection
Figurative language is one of the most difficult tasks in natural language processing. Unlike literal language, figurative language draws on linguistic devices such as irony, humor, sarcasm, metaphor, and analogy, among others, to communicate indirect meanings that most of the time cannot be interpreted in terms of syntactic or semantic information alone. Rather, figurative language reflects patterns of thought that acquire full meaning in communicative and social contexts, which makes both its linguistic representation and its computational processing exceedingly complex tasks.
In this context, this doctoral thesis addresses a problem related to the processing of figurative language on the basis of linguistic patterns. In particular, our efforts focus on building a system capable of automatically detecting instances of humor and irony in texts drawn from social media. Our main hypothesis rests on the premise that language reflects patterns of conceptualization; that is, by studying language, we study those patterns. Therefore, by analyzing these two domains of figurative language, we aim to provide arguments about how people conceive of them and, above all, about how that conception causes both humor and irony to be verbalized in a particular way across social media. In this context, one of our main interests is to show how knowledge derived from the analysis of different levels of linguistic study can represent a set of patterns relevant for automatically identifying figurative uses of language. It is worth noting that, contrary to most approaches that have focused on the study of figurative language, in our research we do not seek to provide arguments based only on prototypical examples, but on texts whose characteristics
Reyes Pérez, A. (2012). Linguistic-based Patterns for Figurative Language Processing: The Case of Humor Recognition and Irony Detection [Unpublished doctoral thesis]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/16692
Palanci
Learning to Live with Machine Translation
Rapid advancements in technologies of text and image generation have increasingly put the perceived autonomy of human creativity under threat. Even before ChatGPT and other large language models sent such anxieties into overdrive, literary critics were arguing for a hermeneutics of automatic writing and revisiting long-held assumptions about artistic originality. Few, however, gave much thought to these models' quirky cousin, a family branch that once ruled over the utopian dreams invested in AI: machine translation (MT). This essay reflects on why translation has been lost in all the recent talk about these models and offers a necessary corrective. It considers what a critical response to MT might look like when reframed around an understanding of current technologies and a vision of MT as potential collaborator rather than human replacement. First, it offers an overview of current neural-based MT and the theories of translation that underwrite it. It then uses literary texts as a limit case for surveying the technology's most visible gaps, providing a deep, qualitative analysis of Japanese literary texts machine translated into English. Finally, it takes a speculative turn and considers what "good enough" machine translation of a large corpus of world literature might be good for in a future of ubiquitous and ever more accessible MT. The results hint at more immediate ways that MT invites inquiry into the present conditions of world literature, and also at a future where the entanglement of human translation and agency with the material agency of the technology brings forth potentials in both