
    Linguistic Structure in Statistical Machine Translation

    This thesis investigates the influence of linguistic structure in statistical machine translation. We develop a word reordering model based on syntactic parse trees, and we address the issues of pronouns and morphological agreement with a source discriminative word lexicon that predicts the translation of individual words using structural features. When used in phrase-based machine translation, the models improve translation for language pairs with different word order and morphological variation.

    Evaluation of Automatic Text Summarization Using Synthetic Facts

    Automatic text summarization has achieved remarkable success with the development of deep neural networks and the availability of standardized benchmark datasets. It can generate fluent, human-like summaries. However, the unreliability of existing evaluation metrics hinders its practical use and slows its progress. To address this issue, we propose an automatic, reference-less text summarization evaluation system with dynamically generated synthetic facts. We hypothesize that if a system guarantees a summary that has all the facts that are 100% known in the synthetic document, it can provide natural interpretability and high feasibility in measuring factual consistency and comprehensiveness. To our knowledge, our system is the first to measure the overarching quality of text summarization models in terms of factual consistency, comprehensiveness, and compression rate. We validate our system by comparing its correlation with human judgment against that of existing N-gram overlap-based metrics such as ROUGE and BLEU and a BERT-based evaluation metric, BERTScore. Our system's experimental evaluation of PEGASUS, BART, and T5 outperforms the current evaluation metrics in measuring factual consistency by a noticeable margin and demonstrates statistical significance in measuring comprehensiveness and overall summary quality.

    Cultures and Traditions of Wordplay and Wordplay Research

    This volume focuses on realisations of wordplay in different cultures and in different social and historical contexts, and brings together various research traditions of approaching wordplay. Together with the volume DWP 7, it assembles selected papers presented at the interdisciplinary conference The Dynamics of Wordplay / La dynamique du jeu de mots (Trier, 2016) and stresses the inherent dynamicity of wordplay and wordplay research.

    Automatic generation of named entity taggers leveraging parallel corpora

    The lack of hand-curated data is a major impediment to developing statistical semantic processors for many of the world's languages. A major issue of semantic processors in Natural Language Processing (NLP) is that they require manually annotated data to perform accurately. Our work aims to address this issue by leveraging existing annotations and semantic processors from multiple source languages, projecting their annotations via the statistical word alignments traditionally used in Machine Translation. Taking the Named Entity Recognition (NER) task as a use case of semantic processing, this work presents a method to automatically induce Named Entity taggers using parallel data, without any manual intervention. Our method leverages existing semantic processors and annotations to overcome the lack of annotated data for a given language. The intuition is to transfer, or project, semantic annotations from multiple sources to a target language by statistical word alignment methods applied to parallel texts (Och and Ney, 2000; Liang et al., 2006). The projected annotations can then be used to automatically generate semantic processors for the target language. In this way we would be able to provide NLP processors without training data for the target language. The experiments focus on 4 languages: German, English, Spanish and Italian, and our empirical evaluation shows that our method obtains competitive results when compared with models trained on gold-standard out-of-domain data. This shows that our projection algorithm is effective at transferring NER annotations across languages via parallel data, thus providing a fully automatic method to obtain NER taggers for as many languages as are aligned via parallel corpora.

    Knowledge Discovery and Management within Service Centers

    These days, most enterprise service centers deploy Knowledge Discovery and Management (KDM) systems to address the challenge of delivering a resourceful service request resolution in a timely manner while efficiently utilizing the huge amount of available data. These KDM systems facilitate a prompt response to critical service requests and, where possible, try to prevent service requests from being triggered in the first place. Nevertheless, in most cases, the information required for a request resolution is dispersed across the Internet and buried under a mountain of irrelevant information in unstructured and heterogeneous formats. These heterogeneous data sources and formats complicate access to reusable knowledge and increase the response time required to reach a resolution. Moreover, state-of-the-art methods neither support effective integration of domain knowledge with KDM systems nor promote the assimilation of reusable knowledge or Intellectual Capital (IC). With the goal of providing an improved service request resolution within the shortest possible time, this research proposes an IC Management System. The proposed tool efficiently utilizes domain knowledge, in the form of semantic web technology, to extract the most valuable information from raw unstructured data, and uses that knowledge to formulate a service resolution model as a combination of efficient data search, classification, clustering, and recommendation methods. Our proposed solution also handles the technology categorization of a service request, which is crucial in the request resolution process. The system has been extensively evaluated in several experiments and has been used in a real enterprise customer service center.

    Linguistic-based Patterns for Figurative Language Processing: The Case of Humor Recognition and Irony Detection

    Figurative language is one of the most difficult challenges in natural language processing. Unlike literal language, figurative language draws on linguistic devices such as irony, humor, sarcasm, metaphor, and analogy, among others, to communicate indirect meanings that most of the time cannot be interpreted in terms of syntactic or semantic information alone. Rather, figurative language reflects patterns of thought that acquire their full meaning in communicative and social contexts, which makes both its linguistic representation and its computational processing highly complex tasks. In this context, this doctoral thesis addresses a problem related to figurative language processing on the basis of linguistic patterns. In particular, our efforts focus on building a system capable of automatically detecting instances of humor and irony in texts drawn from social media. Our main hypothesis rests on the premise that language reflects patterns of conceptualization; that is, by studying language, we study those patterns. Therefore, by analyzing these two domains of figurative language, we aim to provide arguments about how people conceive of them and, above all, about how that conception causes both humor and irony to be verbalized in a particular way across various social media. In this context, one of our main interests is to demonstrate how knowledge derived from the analysis of different levels of linguistic study can yield a set of relevant patterns for automatically identifying figurative uses of language.
    It is worth noting that, unlike most approaches that have focused on the study of figurative language, in our research we do not seek to provide arguments based solely on prototypical examples, but rather on texts whose characteristics …
    Reyes Pérez, A. (2012). Linguistic-based Patterns for Figurative Language Processing: The Case of Humor Recognition and Irony Detection [Unpublished doctoral thesis]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/16692