A Survey of Paraphrasing and Textual Entailment Methods
Paraphrasing methods recognize, generate, or extract phrases, sentences, or
longer natural language expressions that convey almost the same information.
Textual entailment methods, on the other hand, recognize, generate, or extract
pairs of natural language expressions, such that a human who reads (and trusts)
the first element of a pair would most likely infer that the other element is
also true. Paraphrasing can be seen as bidirectional textual entailment and
methods from the two areas are often similar. Both kinds of methods are useful,
at least in principle, in a wide range of natural language processing
applications, including question answering, summarization, text generation, and
machine translation. We summarize key ideas from the two areas by considering
in turn recognition, generation, and extraction methods, also pointing to
prominent articles and resources.
Comment: Technical Report, Natural Language Processing Group, Department of
Informatics, Athens University of Economics and Business, Greece, 201
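The abstract's remark that paraphrasing can be seen as bidirectional textual entailment can be made concrete with a toy sketch. The entailment test here is a hypothetical lexical-coverage heuristic chosen only for illustration (the names `tokens`, `entails`, `is_paraphrase` and the 0.8 threshold are assumptions); it is not a method from the survey, and real recognizers are far more sophisticated.

```python
# Toy sketch: paraphrase recognition as entailment checked in both directions.
# ASSUMPTION: the lexical-coverage "entailment" test below is a hypothetical
# stand-in chosen for illustration; it is not a method from the survey.

def tokens(text: str) -> set[str]:
    """Lowercased bag of words with surrounding punctuation stripped."""
    stripped = (w.strip(".,;:!?").lower() for w in text.split())
    return {w for w in stripped if w}

def entails(premise: str, hypothesis: str, threshold: float = 0.8) -> bool:
    """Toy test: the premise 'entails' the hypothesis if it covers
    at least `threshold` of the hypothesis's distinct words."""
    hyp = tokens(hypothesis)
    if not hyp:
        return True  # an empty hypothesis is trivially covered
    return len(hyp & tokens(premise)) / len(hyp) >= threshold

def is_paraphrase(a: str, b: str, threshold: float = 0.8) -> bool:
    """Paraphrase = entailment holding in both directions."""
    return entails(a, b, threshold) and entails(b, a, threshold)

if __name__ == "__main__":
    long_s = "The cat sat on the mat."
    print(entails(long_s, "The cat sat."))                    # True
    print(entails("The cat sat.", long_s))                    # False
    print(is_paraphrase(long_s, "On the mat, the cat sat."))  # True
```

The asymmetry is the point: the longer sentence "entails" the shorter one but not vice versa, so the pair is an entailment pair and not a paraphrase pair.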
A Corpus-based Approach to the Chinese Word Segmentation
For a society based upon laws and reason, it has become too easy for us to believe
that we live in a world without them. And given that our linguistic wisdom was
originally motivated by the search for rules, it seems strange that we now consider
these rules to be the exceptions and take exceptions as the norm.
The current task of contemporary computational linguistics is to describe these
exceptions. In particular, for most language processing needs it suffices to
describe the argument and predicate within an elementary sentence, under the
framework of local grammar. Therefore, a corpus-based approach to the Chinese
word segmentation problem is proposed as the first step towards a local grammar
for the Chinese language.
The two main issues with existing lexicon-based approaches are (a) the classification
of unknown character sequences, i.e. sequences that are not listed in
the lexicon, and (b) the disambiguation of situations where two candidate words
overlap.
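Issue (b) can be illustrated with a short sketch that lists every lexicon match in a string and reports the pairs that partially overlap, which is exactly where a purely lexicon-based segmenter has to choose. The lexicon and the string 研究生命起源 ("study the origin of life") are illustrative assumptions, not data from the thesis, and the function names are hypothetical.

```python
# Sketch: find lexicon matches and the crossing overlaps that a purely
# lexicon-based segmenter must disambiguate (issue (b) above).
# ASSUMPTION: the lexicon and string are illustrative, not from the thesis.

def candidate_words(text: str, lexicon: set[str], max_len: int = 4) -> list[tuple[int, int]]:
    """Return every (start, end) span of `text` that is a lexicon entry."""
    spans = []
    for i in range(len(text)):
        for j in range(i + 1, min(len(text), i + max_len) + 1):
            if text[i:j] in lexicon:
                spans.append((i, j))
    return spans

def crossing_overlaps(spans: list[tuple[int, int]]) -> list[tuple[tuple[int, int], tuple[int, int]]]:
    """Pairs of candidates that share characters without one containing the other."""
    return [(a, b) for a in spans for b in spans if a[0] < b[0] < a[1] < b[1]]

if __name__ == "__main__":
    text = "研究生命起源"
    lexicon = {"研究", "研究生", "生命", "命", "起源"}
    spans = candidate_words(text, lexicon)
    for a, b in crossing_overlaps(spans):
        print(text[a[0]:a[1]], "overlaps", text[b[0]:b[1]])  # 研究生 overlaps 生命
```

Both 研究生/命/起源 and 研究/生命/起源 are consistent with the lexicon; nothing in the lexicon alone says which reading is intended.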
For (a), we propose an automatic method of enriching the lexicon by comparing
candidate sequences to occurrences of the same strings in a manually segmented
reference corpus, and using machine learning methods to select the optimal
segmentation for them. These methods are developed in the course of the thesis
specifically for this task. The possibility of applying these machine learning
methods to NP extraction and alignment will also be discussed.
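The comparison step for (a) can be sketched as follows: for an unknown character sequence, collect how the reference corpus segments each occurrence of that string, then pick one segmentation and add its pieces to the lexicon. The thesis uses machine learning for the selection; this sketch swaps in a plain majority vote as a hypothetical stand-in, and the toy corpus and all function names are assumptions.

```python
from collections import Counter

# Sketch of lexicon enrichment from a manually segmented reference corpus.
# ASSUMPTION: majority vote replaces the thesis's machine-learning selection;
# corpus format (lists of words) and function names are hypothetical.

def boundaries(words: list[str]) -> set[int]:
    """Character offsets where words begin or end in a segmented sentence."""
    cuts, pos = {0}, 0
    for w in words:
        pos += len(w)
        cuts.add(pos)
    return cuts

def observed_segmentations(seq: str, corpus: list[list[str]]) -> Counter:
    """Count the internal cut patterns of each occurrence of `seq`
    that starts and ends on a word boundary in the reference corpus."""
    counts = Counter()
    for words in corpus:
        text, cuts = "".join(words), boundaries(words)
        start = text.find(seq)
        while start != -1:
            end = start + len(seq)
            if start in cuts and end in cuts:
                inner = tuple(c - start for c in sorted(cuts) if start < c < end)
                counts[inner] += 1
            start = text.find(seq, start + 1)  # allows overlapping occurrences
    return counts

def split_at(seq: str, inner: tuple[int, ...]) -> list[str]:
    """Split `seq` at the given internal offsets."""
    edges = [0, *inner, len(seq)]
    return [seq[a:b] for a, b in zip(edges, edges[1:])]

def enrich(lexicon: set[str], unknown_seqs: list[str], corpus: list[list[str]]) -> set[str]:
    """Add the majority segmentation of each unknown sequence; sequences
    never seen aligned in the corpus are left unresolved."""
    for seq in unknown_seqs:
        counts = observed_segmentations(seq, corpus)
        if counts:
            best, _ = counts.most_common(1)[0]
            lexicon.update(split_at(seq, best))
    return lexicon

if __name__ == "__main__":
    corpus = [["我", "喜欢", "北京大学"], ["北京大学", "很", "大"], ["北京", "大学生", "多"]]
    print(enrich(set(), ["北京大学", "很大"], corpus))
```

The third sentence contributes nothing for 北京大学 because that string crosses the word 大学生 there; only boundary-aligned occurrences vote.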
(b) is approached by designing a general processing framework for Chinese text,
which we call multi-level processing. Under this framework, sentences are
recursively split into fragments according to language-specific but domain-independent
heuristics. The resulting fragments then define the ultimate boundaries
between candidate words and therefore resolve any segmentation ambiguity
caused by overlapping sequences. A new shallow semantic annotation is also
proposed within the framework of multi-level processing.
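A rough sketch of the multi-level idea: split a sentence recursively, coarsest heuristic first, then segment each fragment on its own so that no candidate word can cross a fragment boundary. The heuristic levels are assumptions, not the thesis's rules: level 1 splits at punctuation, level 2 treats 的 as a standalone delimiter. That second rule is deliberately crude (it would wrongly split genuine words such as 的确 wherever they occur); it only shows how fragment boundaries override lexicon matches. Forward maximum matching is used inside fragments as a simple, commonly used baseline, and the lexicon is hypothetical.

```python
import re

# Sketch of multi-level processing: recursive fragmentation, then per-fragment
# forward maximum matching. ASSUMPTION: these heuristic levels are illustrative.
LEVELS = (
    "，。！？；：、",  # level 1: punctuation
    "的",             # level 2: crude function-word delimiter (hypothetical)
)

def fragment(text: str, levels: tuple[str, ...] = LEVELS) -> list[str]:
    """Recursively split text; each delimiter becomes a fragment by itself."""
    if not levels:
        return [text] if text else []
    delims = levels[0]
    pieces = [p for p in re.split("([" + re.escape(delims) + "])", text) if p]
    out = []
    for p in pieces:
        if len(p) == 1 and p in delims:
            out.append(p)                        # delimiter fixes a boundary
        else:
            out.extend(fragment(p, levels[1:]))  # refine with the next level
    return out

def forward_max_match(frag: str, lexicon: set[str], max_len: int = 4) -> list[str]:
    """Greedy longest-match segmentation; unknown characters stand alone."""
    words, i = [], 0
    while i < len(frag):
        for j in range(min(len(frag), i + max_len), i, -1):
            if frag[i:j] in lexicon or j == i + 1:
                words.append(frag[i:j])
                i = j
                break
    return words

def segment(text: str, lexicon: set[str]) -> list[str]:
    """Segment each fragment independently, so words never cross fragments."""
    return [w for frag in fragment(text) for w in forward_max_match(frag, lexicon)]

if __name__ == "__main__":
    lexicon = {"我", "的确", "确认"}
    print(forward_max_match("我的确认", lexicon))  # ['我', '的确', '认']
    print(segment("我的确认", lexicon))            # ['我', '的', '确认']
```

Without fragmentation the greedy matcher picks the overlapping candidate 的确 and strands 认; once 的 is fixed as a fragment of its own, the overlap can no longer arise.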
A word segmentation algorithm based on these principles has been implemented
and tested; results of the evaluation are given and compared to the performance of
previous approaches as reported in the literature.
The first chapter of this thesis discusses the goals of segmentation and introduces
some background concepts. The second chapter analyses the current state-of-the-art
approach to Chinese language segmentation. Chapter 3 proposes a new corpus-based
approach to the identification of unknown words. In Chapter 4, a new shallow
semantic annotation is proposed under the framework of multi-level
processing.
Roots Reloaded. Culture, Identity and Social Development in the Digital Age
This edited volume is designed to explore different perspectives on culture, identity and social development, using the impact of the digital age as a common thread and aiming at interdisciplinary audiences. It covers cases of communities and individuals using new technology as a tool to preserve and explore their cultural heritage, and using new media as a source of social orientation on topics ranging from language acquisition to health-related issues. Accordingly, it draws on Art and Cultural Studies, Media and Communication, Behavioral Science, Psychology and Philosophy, as well as innovative approaches used by creative individuals. The opening article discusses the significance of oral and written traditions, from the Aboriginal tribes of Australia to the Maori of New Zealand to the mystical teachings of Sufi brotherhoods, and their current relation to online activities. The book continues with a closer look at obesity awareness support groups and their impact on social media, Facebook use in language learning contexts, smartphone addiction and internet dependency, as well as online media reporting of controversial ethical issues. Digital progress had already left its dominant mark as the world entered the 21st century. As technology continues its ascent, society will undoubtedly face new and changing values as it tries to catch up with this extraordinary digitization and adapt well enough to make use of these powerful developments in everyday life.
CLiFF Notes: Research In Natural Language Processing at the University of Pennsylvania
CLIFF is the Computational Linguists' Feedback Forum. We are a group of students and faculty who gather once a week to hear a presentation and discuss work currently in progress. The 'feedback' in the group's name is important: we are interested in sharing ideas, in discussing ongoing research, and in bringing together work done by the students and faculty in Computer Science and other departments.
However, there are only so many presentations that we can hold in a year. We felt that it would be beneficial to have a report that collects, in one place, short descriptions of the work in Natural Language Processing at the University of Pennsylvania. This report, then, is a collection of abstracts from both faculty and graduate students in Computer Science, Psychology and Linguistics. We want to stress the close ties between these groups, as one of the things that we pride ourselves on here at Penn is the communication among different departments and the interdepartmental work.
Rather than try to summarize the varied work currently underway at Penn, we suggest reading the abstracts to see how the students and faculty themselves describe their work. The report illustrates the diversity of interests among the researchers here, as well as the areas of common interest. In addition, since it was our intent to put together a document that would be useful both inside and outside of the university, we hope that this report will explain to everyone some of what we are about.