A Survey of Paraphrasing and Textual Entailment Methods
Paraphrasing methods recognize, generate, or extract phrases, sentences, or
longer natural language expressions that convey almost the same information.
Textual entailment methods, on the other hand, recognize, generate, or extract
pairs of natural language expressions, such that a human who reads (and trusts)
the first element of a pair would most likely infer that the other element is
also true. Paraphrasing can be seen as bidirectional textual entailment and
methods from the two areas are often similar. Both kinds of methods are useful,
at least in principle, in a wide range of natural language processing
applications, including question answering, summarization, text generation, and
machine translation. We summarize key ideas from the two areas by considering
in turn recognition, generation, and extraction methods, also pointing to
prominent articles and resources. Comment: Technical Report, Natural Language Processing Group, Department of
Informatics, Athens University of Economics and Business, Greece, 201
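The framing above, that a paraphrase is entailment in both directions, can be sketched in a few lines. This is an illustrative stand-in, not a method from the survey: the entails() heuristic here is a toy word-containment check, whereas real recognizers use lexical, syntactic, and semantic features or trained classifiers.

```python
def entails(premise: str, hypothesis: str) -> bool:
    """Toy directional check (hypothetical stand-in for a real
    entailment recognizer): every word of the hypothesis must
    appear in the premise."""
    p = set(premise.lower().split())
    h = set(hypothesis.lower().split())
    return h <= p

def is_paraphrase(a: str, b: str) -> bool:
    """Paraphrase viewed as bidirectional textual entailment."""
    return entails(a, b) and entails(b, a)

print(is_paraphrase("the cat sat", "sat the cat"))    # True under the toy heuristic
print(is_paraphrase("the cat sat", "the cat slept"))  # False
```

Swapping a stronger entails() into is_paraphrase() leaves the bidirectional framing unchanged, which is why methods from the two areas are often similar.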
Reverse-Engineering Satire, or "Paper on Computational Humor Accepted Despite Making Serious Advances"
Humor is an essential human trait. Efforts to understand humor have called
out links between humor and the foundations of cognition, as well as the
importance of humor in social engagement. As such, it is a promising and
important subject of study, with relevance for artificial intelligence and
human-computer interaction. Previous computational work on humor has mostly
operated at a coarse level of granularity, e.g., predicting whether an entire
sentence, paragraph, document, etc., is humorous. As a step toward deep
understanding of humor, we seek fine-grained models of attributes that make a
given text humorous. Starting from the observation that satirical news
headlines tend to resemble serious news headlines, we build and analyze a
corpus of satirical headlines paired with nearly identical but serious
headlines. The corpus is constructed via Unfun.me, an online game that
incentivizes players to make minimal edits to satirical headlines with the goal
of making other players believe the results are serious headlines. The edit
operations used to successfully remove humor pinpoint the words and concepts
that play a key role in making the original, satirical headline funny. Our
analysis reveals that the humor tends to reside toward the end of headlines,
and primarily in noun phrases, and that most satirical headlines follow a
certain logical pattern, which we term false analogy. Overall, this paper
deepens our understanding of the syntactic and semantic structure of satirical
news headlines and provides insights for building humor-producing systems. Comment: Proceedings of the 33rd AAAI Conference on Artificial Intelligence,
201
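The paired-corpus analysis rests on aligning a satirical headline with its nearly identical serious counterpart and reading off the edited tokens. A minimal sketch of that alignment step, using a token-level diff (the headline pair below is invented, not from the Unfun.me corpus):

```python
import difflib

def changed_spans(satirical: str, serious: str):
    """Return (satirical span, serious span) pairs for every token
    run that differs between the two headlines."""
    a, b = satirical.split(), serious.split()
    sm = difflib.SequenceMatcher(a=a, b=b)
    return [(" ".join(a[i1:i2]), " ".join(b[j1:j2]))
            for tag, i1, i2, j1, j2 in sm.get_opcodes() if tag != "equal"]

pair = ("Local man wins argument with toaster",
        "Local man wins argument with neighbor")
print(changed_spans(*pair))  # [('toaster', 'neighbor')]
```

The edited spans returned here are exactly the positions the paper's analysis aggregates over, e.g., to observe that humor tends to reside toward the end of headlines and in noun phrases.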
Exploring regularities in software with statistical models and their applications
Software systems are increasingly ubiquitous, running on different platforms for different applications. They are developed with support from programming languages, which help developers work conveniently. Programming languages can differ in paradigm, form, syntactic structure, keywords, and notation. In many cases, however, they are similar in several important respects: 1. they are used to describe specific tasks; 2. source code written in them draws on a limited set of distinctive tokens, many of which recur frequently, such as keywords and function calls; and 3. they follow specific syntactic rules that make them machine-understandable. These points also suggest a similarity between programming languages and natural language.
Due to its critical role in many applications, natural language processing (NLP) has been studied extensively and has produced many promising results, such as automatic cross-language translation, speech-to-text, and information retrieval. It is therefore interesting to ask whether natural language and programming languages share similar characteristics, and whether NLP techniques can be reused for programming language processing (PLP). Recent work in software engineering (SE) shows that such similarities exist and that NLP techniques can indeed be reused for PLP.
This dissertation introduces my work, with contributions to the study of the characteristics of programming languages, the models that exploit those characteristics, and the main applications that demonstrate the usefulness of the proposed models. Work on all three aspects has drawn interest from the software engineering community and received awards for its innovation and applicability.
I hope that this dissertation provides a systematic view of how advanced techniques from natural language processing and machine learning can be reused to great benefit in programming language processing, and of how those techniques are adapted to the characteristics of programming languages and software systems.
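The regularity described in point 2 above, that code draws on a limited, highly repetitive token vocabulary, is what makes statistical language models effective on source code. A minimal sketch (not the dissertation's actual models): a bigram model over code tokens that predicts the next token from counts.

```python
from collections import Counter, defaultdict

def train_bigrams(token_stream):
    """Count, for each token, which tokens follow it."""
    model = defaultdict(Counter)
    for prev, nxt in zip(token_stream, token_stream[1:]):
        model[prev][nxt] += 1
    return model

def predict_next(model, token):
    """Most frequent successor of `token` in the training stream."""
    return model[token].most_common(1)[0][0] if model[token] else None

# A toy token stream; repeating it mimics the repetitiveness of real code.
tokens = "for i in range ( n ) : total += i".split()
model = train_bigrams(tokens * 3)
print(predict_next(model, "in"))  # 'range'
```

Real models of software use far larger corpora and richer smoothing, but the principle is the same: repeated idioms like "in range (" make the next token highly predictable, which applications such as code completion exploit.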
Anaphora Resolution in Business Process Requirement Engineering
Anaphora resolution (AR) is one of the most important tasks in natural language processing; it addresses the problem of determining what a pronoun or noun phrase refers to. AR plays an essential role when dealing with textual descriptions of business processes, whether discovering a process model from the text or validating an existing model, because it helps these systems identify the core components of any process model (actors and objects). In this paper, we propose a domain-specific AR system. The approach starts by automatically generating the concept map of the text; the system then uses this map to resolve references via the syntactic and semantic relations in the concept map. The approach outperforms the state-of-the-art in the domain of business process texts, with more than 73% accuracy. In addition, the approach can easily be adapted to resolve references in other domains
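The idea of using process components (actors and objects) recorded in a concept map to resolve references can be sketched as follows. This is a hedged illustration, not the paper's actual system: the mention representation and the nearest-preceding-actor rule below are invented simplifications of concept-map-based resolution.

```python
def resolve_pronoun(pronoun_index, mentions):
    """Resolve a personal pronoun to the nearest preceding actor mention.

    mentions: list of (token_index, text, kind) tuples extracted from a
    concept map, where kind is 'actor' or 'object' (a toy stand-in for
    the map's syntactic/semantic relations).
    """
    candidates = [m for m in mentions
                  if m[0] < pronoun_index and m[2] == "actor"]
    return max(candidates, key=lambda m: m[0])[1] if candidates else None

# "The clerk checks the invoice. Then he forwards it."
mentions = [(1, "the clerk", "actor"), (4, "the invoice", "object")]
print(resolve_pronoun(7, mentions))  # 'the clerk'
```

A real system would also use semantic compatibility from the concept map (e.g., that only actors can perform "forwards") rather than recency alone, which is what lets the domain-specific approach beat generic resolvers on business process texts.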