
    A Survey of Paraphrasing and Textual Entailment Methods

    Paraphrasing methods recognize, generate, or extract phrases, sentences, or longer natural language expressions that convey almost the same information. Textual entailment methods, on the other hand, recognize, generate, or extract pairs of natural language expressions, such that a human who reads (and trusts) the first element of a pair would most likely infer that the other element is also true. Paraphrasing can be seen as bidirectional textual entailment, and methods from the two areas are often similar. Both kinds of methods are useful, at least in principle, in a wide range of natural language processing applications, including question answering, summarization, text generation, and machine translation. We summarize key ideas from the two areas by considering in turn recognition, generation, and extraction methods, also pointing to prominent articles and resources. Comment: Technical Report, Natural Language Processing Group, Department of Informatics, Athens University of Economics and Business, Greece, 201
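    The idea that paraphrasing is bidirectional textual entailment can be made concrete with a minimal Python sketch. The entailment judge below is a crude lexical-overlap baseline chosen only for illustration (an assumption, not a method taken from the survey), and the names `entails`, `is_paraphrase` and the `threshold` parameter are hypothetical.

```python
def tokens(sentence: str) -> set[str]:
    # Crude normalization: lowercase words with surrounding punctuation stripped.
    return {w.strip(".,;:!?").lower() for w in sentence.split() if w.strip(".,;:!?")}


def entails(text: str, hypothesis: str, threshold: float = 1.0) -> bool:
    """Toy lexical-overlap judge (assumed baseline, not the survey's method):
    the hypothesis 'follows' if at least `threshold` of its words occur in the text."""
    hyp = tokens(hypothesis)
    if not hyp:
        return True
    coverage = len(hyp & tokens(text)) / len(hyp)
    return coverage >= threshold


def is_paraphrase(a: str, b: str, threshold: float = 1.0) -> bool:
    # Paraphrase as bidirectional entailment: a entails b AND b entails a.
    return entails(a, b, threshold) and entails(b, a, threshold)


print(entails("The cat sat on the mat.", "The cat sat."))        # True
print(is_paraphrase("The cat sat on the mat.", "The cat sat."))  # False
print(is_paraphrase("The cat sat on the mat.", "On the mat the cat sat."))  # True
```

    The shorter sentence is entailed by the longer one but not the reverse, so the pair is not a paraphrase; a bag-of-words judge like this one ignores word order and meaning entirely, which is exactly why the methods surveyed go well beyond it.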

    A Novel Schema-Oriented Approach for Chinese New Word Identification


    A Corpus-based Approach to the Chinese Word Segmentation

    For a society based upon laws and reason, it has become too easy for us to believe that we live in a world without them. And given that our linguistic wisdom was originally motivated by the search for rules, it seems strange that we now consider these rules to be the exceptions and take exceptions as the norm. The current task of contemporary computational linguistics is to describe these exceptions. In particular, for most language processing needs it suffices to describe just the argument and predicate within an elementary sentence, under the framework of local grammar. Therefore, a corpus-based approach to the Chinese word segmentation problem is proposed as the first step towards a local grammar for the Chinese language. The two main issues with existing lexicon-based approaches are (a) the classification of unknown character sequences, i.e. sequences that are not listed in the lexicon, and (b) the disambiguation of situations where two candidate words overlap. For (a), we propose an automatic method of enriching the lexicon by comparing candidate sequences to occurrences of the same strings in a manually segmented reference corpus, and using machine learning methods to select the optimal segmentation for them. These methods are developed in the course of the thesis specifically for this task. The possibility of applying these machine learning methods to NP extraction and alignment will also be discussed. (b) is approached by designing a general processing framework for Chinese text, called multi-level processing. Under this framework, sentences are recursively split into fragments according to language-specific but domain-independent heuristics. The resulting fragments then define the ultimate boundaries between candidate words and therefore resolve any segmentation ambiguity caused by overlapping sequences. A new shallow semantic annotation is also proposed under the framework of multi-level processing.
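    Problem (a) can be illustrated with a minimal sketch, assuming the reference corpus stores one sentence per string with words separated by spaces. For a candidate sequence, it counts how often the corpus segments an occurrence as exactly one word versus cutting it up or embedding it in a longer word. The thesis selects segmentations with machine learning; the majority vote in `looks_like_word` is a deliberately simple stand-in, and all function names and the toy corpus are hypothetical.

```python
def token_boundaries(tokens: list[str]) -> set[int]:
    """Character offsets where a word starts or ends once the tokens are concatenated."""
    offsets, pos = {0}, 0
    for token in tokens:
        pos += len(token)
        offsets.add(pos)
    return offsets


def boundary_profile(candidate: str, segmented_corpus: list[str]) -> tuple[int, int]:
    """Return (whole, other): occurrences segmented as exactly one word vs.
    occurrences cut into pieces or embedded in a longer word."""
    whole = other = 0
    for sentence in segmented_corpus:
        tokens = sentence.split()  # assumed format: words separated by spaces
        text = "".join(tokens)
        bounds = token_boundaries(tokens)
        start = text.find(candidate)
        while start != -1:
            end = start + len(candidate)
            cut_inside = any(i in bounds for i in range(start + 1, end))
            if start in bounds and end in bounds and not cut_inside:
                whole += 1
            else:
                other += 1
            start = text.find(candidate, start + 1)  # also count overlapping occurrences
    return whole, other


def looks_like_word(candidate: str, segmented_corpus: list[str]) -> bool:
    """Majority vote: a simple stand-in for the thesis's machine-learned decision."""
    whole, other = boundary_profile(candidate, segmented_corpus)
    return whole > other


reference = ["研究生 很 忙", "他 是 研究生", "研究 生命 起源"]
print(boundary_profile("研究生", reference))  # (2, 1)
print(looks_like_word("研究生", reference))   # True
print(looks_like_word("究生", reference))     # False
```

    The third sentence contains the string 研究生 but segments it as 研究 + 生命, which is why that occurrence counts against the candidate rather than for it.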
A word segmentation algorithm based on these principles has been implemented and tested; evaluation results are given and compared with the performance of previous approaches as reported in the literature. The first chapter of this thesis discusses the goals of segmentation and introduces some background concepts. The second chapter analyses the current state-of-the-art approach to Chinese word segmentation. Chapter 3 proposes a new corpus-based approach to the identification of unknown words. In Chapter 4, a new shallow semantic annotation is proposed under the framework of multi-level processing.
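    The mechanics of multi-level processing for problem (b) can be sketched in Python: a sentence is split recursively, one heuristic level at a time, and candidate words are then looked up only inside each fragment, so no candidate can cross a fragment boundary. The abstract does not list the thesis's heuristics, so the two levels used here (punctuation, then whitespace), the toy lexicon, and all function names are hypothetical. The sketch only detects pairs of candidate words that still cross each other inside a fragment; how the thesis chooses fragment boundaries fine enough to resolve every such pair is not reproduced.

```python
import re

# Hypothetical splitting levels (not the thesis's own heuristics):
# level 1 splits at punctuation, level 2 at whitespace.
LEVELS = [re.compile(r"[，。！？；：、,.!?;:]+"), re.compile(r"\s+")]


def split_recursively(text: str, levels: list = LEVELS) -> list[str]:
    """Split with the first level's pattern, then recurse into each piece with the rest."""
    if not levels:
        return [text]
    fragments = []
    for piece in levels[0].split(text):
        if piece:  # drop empty pieces left by leading/trailing delimiters
            fragments.extend(split_recursively(piece, levels[1:]))
    return fragments


def candidate_spans(fragment: str, lexicon: set[str], max_len: int = 4) -> list[tuple[int, int]]:
    """All (start, end) spans of length >= 2 whose substring is a lexicon word."""
    spans = []
    for i in range(len(fragment)):
        for j in range(i + 2, min(len(fragment), i + max_len) + 1):
            if fragment[i:j] in lexicon:
                spans.append((i, j))
    return spans


def crossing_pairs(fragment: str, lexicon: set[str]) -> list[tuple[str, str]]:
    """Candidate words that partially overlap (neither contains the other)."""
    spans = candidate_spans(fragment, lexicon)  # already ordered by (start, end)
    return [
        (fragment[a0:a1], fragment[b0:b1])
        for k, (a0, a1) in enumerate(spans)
        for (b0, b1) in spans[k + 1:]
        if a0 < b0 < a1 < b1
    ]


lexicon = {"研究", "研究生", "生命", "起源"}
for frag in split_recursively("研究生命起源，很重要", LEVELS):
    print(frag, crossing_pairs(frag, lexicon))
# 研究生命起源 [('研究生', '生命')]
# 很重要 []
```

    The classic overlap 研究生 / 生命 in 研究生命起源 survives the punctuation split because both candidates lie in the same fragment; a finer, language-specific level would be needed to place a boundary between them.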

    Roots Reloaded. Culture, Identity and Social Development in the Digital Age

    This edited volume is designed to explore different perspectives of culture, identity and social development using the impact of the digital age as a common thread, aiming at interdisciplinary audiences. It covers cases of communities and individuals using new technology as a tool to preserve and explore their cultural heritage, alongside new media as a source of social orientation on topics ranging from language acquisition to health-related issues. Therefore, aspects such as Art and Cultural Studies, Media and Communication, Behavioral Science, Psychology, Philosophy and innovative approaches used by creative individuals are included. From the Aboriginal tribes of Australia, to the Maoris of New Zealand, to the mystical teachings of Sufi brotherhoods, the significance of the oral and written traditions and their current relation to online activities is discussed in the opening article. The book continues with a closer look at obesity awareness support groups and their impact on social media, Facebook usage in language-learning contexts, smartphone addiction and internet dependency, as well as online media reporting of controversial ethical issues. Digital progress had already left its dominant mark as the world entered the 21st century. Without a doubt, as technology continues its ascent, society will face new and changing values in its effort to catch up with this extraordinary digitization and to adapt well enough to make use of these strong developments in everyday life.

    CLiFF Notes: Research In Natural Language Processing at the University of Pennsylvania

    CLIFF is the Computational Linguists' Feedback Forum. We are a group of students and faculty who gather once a week to hear a presentation and discuss work currently in progress. The 'feedback' in the group's name is important: we are interested in sharing ideas, in discussing ongoing research, and in bringing together work done by the students and faculty in Computer Science and other departments. However, there are only so many presentations we can have in a year. We felt that it would be beneficial to have a report which would have, in one place, short descriptions of the work in Natural Language Processing at the University of Pennsylvania. This report, then, is a collection of abstracts from both faculty and graduate students in Computer Science, Psychology and Linguistics. We want to stress the close ties between these groups, as one of the things that we pride ourselves on here at Penn is the communication among different departments and the inter-departmental work. Rather than try to summarize the varied work currently underway at Penn, we suggest reading the abstracts to see how the students and faculty themselves describe their work. The report illustrates the diversity of interests among the researchers here, as well as the areas of common interest. In addition, since it was our intent to put together a document that would be useful both inside and outside of the university, we hope that this report will explain to everyone some of what we are about.