Natural language processing
Beginning with the basic issues of NLP, this chapter aims to chart the major research activities in this area since the last ARIST chapter in 1996 (Haas, 1996), including: (i) natural language text processing systems, such as text summarization, information extraction, and information retrieval, including domain-specific applications; (ii) natural language interfaces; (iii) NLP in the context of the WWW and digital libraries; and (iv) evaluation of NLP systems.
From Frequency to Meaning: Vector Space Models of Semantics
Computers understand very little of the meaning of human language. This
profoundly limits our ability to give instructions to computers, the ability of
computers to explain their actions to us, and the ability of computers to
analyse and process text. Vector space models (VSMs) of semantics are beginning
to address these limits. This paper surveys the use of VSMs for semantic
processing of text. We organize the literature on VSMs according to the
structure of the matrix in a VSM. There are currently three broad classes of
VSMs, based on term-document, word-context, and pair-pattern matrices, yielding
three classes of applications. We survey a broad range of applications in these
three categories and we take a detailed look at a specific open source project
in each category. Our goal in this survey is to show the breadth of
applications of VSMs for semantics, to provide a new perspective on VSMs for
those who are already familiar with the area, and to provide pointers into the
literature for those who are less familiar with the field.
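As a rough illustration of the first matrix class named in this abstract (term-document), the sketch below builds a raw count matrix for a toy corpus and compares documents by cosine similarity. The corpus, the whitespace tokenizer, and the helper names (`term_document_matrix`, `cosine`, `column`) are assumptions made for this example; they are not taken from the surveyed paper or from any of the open source projects it discusses.

```python
import math
from collections import Counter

def term_document_matrix(docs):
    """Return (vocab, matrix) where matrix[i][j] counts term i in document j."""
    # Naive whitespace tokenization; real systems would normalize and weight terms.
    counts = [Counter(doc.lower().split()) for doc in docs]
    vocab = sorted(set().union(*counts))
    matrix = [[c[term] for c in counts] for term in vocab]
    return vocab, matrix

def column(matrix, j):
    """Extract document j as a vector over the vocabulary."""
    return [row[j] for row in matrix]

def cosine(u, v):
    """Cosine of the angle between two equal-length count vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

# Hypothetical toy corpus: two related sentences and one unrelated sentence.
docs = [
    "the cat sat on the mat",
    "the cat lay on the rug",
    "stocks fell on monday",
]
vocab, matrix = term_document_matrix(docs)
sim_01 = cosine(column(matrix, 0), column(matrix, 1))
sim_02 = cosine(column(matrix, 0), column(matrix, 2))
print(f"doc0 vs doc1: {sim_01:.3f}")
print(f"doc0 vs doc2: {sim_02:.3f}")
```

Reading the same matrix by rows instead of columns gives term vectors. A word-context matrix would replace the document columns with context windows, but that variant is not shown here.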
Multimedia information technology and the annotation of video
The state of the art in multimedia information technology has not progressed to the point where a single solution is available to meet all reasonable needs of documentalists and users of video archives. In general, we do not have an optimistic view of the usability of new technology in this domain, but digitization and digital power can be expected to cause a small revolution in the area of video archiving. The volume of data leads to two views of the future: on the pessimistic side, overload of data will cause lack of annotation capacity, and on the optimistic side, there will be enough data from which to learn selected concepts that can be deployed to support automatic annotation. At the threshold of this interesting era, we make an attempt to describe the state of the art in technology. We sample the progress in text, sound, and image processing, as well as in machine learning.
A Survey of Paraphrasing and Textual Entailment Methods
Paraphrasing methods recognize, generate, or extract phrases, sentences, or
longer natural language expressions that convey almost the same information.
Textual entailment methods, on the other hand, recognize, generate, or extract
pairs of natural language expressions, such that a human who reads (and trusts)
the first element of a pair would most likely infer that the other element is
also true. Paraphrasing can be seen as bidirectional textual entailment and
methods from the two areas are often similar. Both kinds of methods are useful,
at least in principle, in a wide range of natural language processing
applications, including question answering, summarization, text generation, and
machine translation. We summarize key ideas from the two areas by considering
in turn recognition, generation, and extraction methods, also pointing to
prominent articles and resources.
Comment: Technical Report, Natural Language Processing Group, Department of
Informatics, Athens University of Economics and Business, Greece, 201
A Machine Learning Based Analytical Framework for Semantic Annotation Requirements
The Semantic Web is an extension of the current web in which information is
given well-defined meaning. The aim of the Semantic Web is to improve the
quality and intelligence of the current web by converting its contents into
machine-understandable form. Therefore, semantic-level information is one of
the cornerstones of the Semantic Web. The process of adding semantic metadata
to web resources is called Semantic Annotation. Semantic Annotation faces many
obstacles, such as multilinguality, scalability, and issues related to the
diversity and inconsistency of content across different web pages. Because
Semantic Annotation systems must operate across a wide range of domains and in
dynamic environments, automating the annotation process is one of the
significant challenges in this domain. To address this problem, different
machine learning approaches have been used, such as supervised learning,
unsupervised learning, and more recent ones like semi-supervised learning and
active learning. In this paper we present an inclusive layered classification
of Semantic Annotation challenges and discuss the most important issues in
this field. We also review and analyze machine learning applications for
solving semantic annotation problems. To this end, the article closely studies
and categorizes related research for better understanding and to arrive at a
framework that maps machine learning techniques onto Semantic Annotation
challenges and requirements.