Knowledge Extraction from Natural Language Requirements into a Semantic Relation Graph
Knowledge extraction and representation aims to identify information and to transform it into a machine-readable format. Knowledge representations support Information Retrieval tasks such as searching for single statements, documents, or metadata.
Requirements specifications of complex systems such as automotive software systems are usually divided into different subsystem specifications. Nevertheless, there are semantic relations between individual documents of the separated subsystems, which have to be considered in further processes (e.g. dependencies). If requirements engineers or other developers are not aware of these relations, this can lead to inconsistencies or malfunctions of the overall system. Therefore, there is a strong need for tool support in order to detect semantic relations in a set of large natural language requirements specifications.
In this work we present a knowledge extraction approach based on an explicit knowledge representation of the content of natural language requirements as a semantic relation graph. Our approach is fully automated and includes an NLP pipeline to transform unrestricted natural language requirements into a graph. We split the natural language into different parts and relate them to each other based on their semantic relation. In addition to semantic relations, other relationships can also be included in the graph. We envision using a semantic search algorithm such as spreading activation to let users search for different semantic relations in the graph.
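The abstract names spreading activation as a candidate search algorithm over the semantic relation graph but gives no details. The following is a minimal sketch of the general technique, not the authors' implementation: the graph encoding (a dict of weighted edges), the `decay`, `threshold`, and `max_steps` parameters, and the automotive example nodes are all assumptions made for illustration.

```python
from collections import defaultdict


def spread_activation(graph, seeds, decay=0.5, threshold=0.1, max_steps=3):
    """Propagate activation from seed nodes along weighted edges.

    graph: {node: [(neighbor, weight), ...]}  (hypothetical encoding)
    seeds: {node: initial activation}
    Returns the accumulated activation per reached node.
    """
    activation = defaultdict(float)
    activation.update(seeds)
    frontier = dict(seeds)
    for _ in range(max_steps):
        next_frontier = defaultdict(float)
        for node, energy in frontier.items():
            for neighbor, weight in graph.get(node, []):
                passed = energy * weight * decay
                # Drop contributions that are too weak to matter.
                if passed >= threshold:
                    next_frontier[neighbor] += passed
        if not next_frontier:
            break
        for node, energy in next_frontier.items():
            activation[node] += energy
        frontier = next_frontier
    return dict(activation)


# Hypothetical requirement fragments linked by semantic relations.
example_graph = {
    "brake": [("brake pedal", 1.0), ("ABS", 0.8)],
    "ABS": [("wheel speed sensor", 0.9)],
}
ranking = spread_activation(example_graph, {"brake": 1.0})
```

Nodes with higher accumulated activation would be presented first to a user searching from the seed term; the decay factor limits how far a query reaches into the graph.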
Building a Generation Knowledge Source using Internet-Accessible Newswire
In this paper, we describe a method for automatic creation of a knowledge
source for text generation using information extraction over the Internet. We
present a prototype system called PROFILE which uses a client-server
architecture to extract noun-phrase descriptions of entities such as people,
places, and organizations. The system serves two purposes: as an information
extraction tool, it allows users to search for textual descriptions of
entities; as a utility to generate functional descriptions (FD), it is used in
a functional-unification based generation system. We present an evaluation of
the approach and its applications to natural language generation and
summarization.
Comment: 8 pages, uses eps
Extracting Noun Phrases from Large-Scale Texts: A Hybrid Approach and Its Automatic Evaluation
Acquiring noun phrases from running text is useful for many applications,
such as word grouping, terminology indexing, etc. The reported literature adopts
either a purely probabilistic approach or a purely rule-based noun phrase grammar
to tackle this problem. In this paper, we apply a probabilistic chunker to decide
the implicit boundaries of constituents and use linguistic knowledge to
extract the noun phrases by a finite-state mechanism. The test texts are from the
SUSANNE Corpus, and the results are evaluated automatically by comparing them with
the parse field of the SUSANNE Corpus. The results of this preliminary experiment
are encouraging.
Comment: 8 pages, Postscript file, Unix compressed, uuencode
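To illustrate what extracting noun phrases "by a finite state mechanism" can look like, here is a small sketch that scans part-of-speech tagged tokens and emits maximal spans matching an optional determiner, any number of adjectives, and one or more nouns. It covers only the rule-based half of the hybrid approach; the probabilistic chunker is not modeled. The Penn-Treebank-style tags (`DT`, `JJ`, `NN*`) and the pattern itself are assumptions, not the paper's grammar or the SUSANNE tagset.

```python
def extract_noun_phrases(tagged):
    """Return spans matching DT? JJ* NN+ from a list of (word, tag) pairs."""
    phrases = []
    current = []
    has_noun = False

    def flush():
        # Emit the span only if it reached an accepting state (saw a noun).
        nonlocal current, has_noun
        if has_noun:
            phrases.append(" ".join(current))
        current = []
        has_noun = False

    for word, tag in tagged:
        if tag.startswith("NN"):
            current.append(word)
            has_noun = True
        elif tag == "JJ":
            # An adjective after a noun starts a new candidate phrase.
            if has_noun:
                flush()
            current.append(word)
        elif tag == "DT":
            # A determiner always starts a new candidate phrase.
            flush()
            current.append(word)
        else:
            flush()
    flush()
    return phrases


sentence = [
    ("The", "DT"), ("probabilistic", "JJ"), ("chunker", "NN"),
    ("finds", "VBZ"), ("implicit", "JJ"), ("constituent", "NN"),
    ("boundaries", "NNS"), (".", "."),
]
noun_phrases = extract_noun_phrases(sentence)
```

In a hybrid setup along the lines the abstract describes, the chunker would first propose constituent boundaries, and a matcher like this one would then run inside each chunk.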
Learning Analogies and Semantic Relations
We present an algorithm for learning from unlabeled text, based on the
Vector Space Model (VSM) of information retrieval, that can solve verbal
analogy questions of the kind found in the Scholastic Aptitude Test (SAT).
A verbal analogy has the form A:B::C:D, meaning "A is to B as C is to D";
for example, mason:stone::carpenter:wood. SAT analogy questions provide
a word pair, A:B, and the problem is to select the most analogous word
pair, C:D, from a set of five choices. The VSM algorithm correctly
answers 47% of a collection of 374 college-level analogy questions
(random guessing would yield 20% correct). We motivate this research by
relating it to work in cognitive science and linguistics, and by applying
it to a difficult problem in natural language processing, determining
semantic relations in noun-modifier pairs. The problem is to classify a
noun-modifier pair, such as "laser printer", according to the semantic
relation between the noun (printer) and the modifier (laser). We use a
supervised nearest-neighbour algorithm that assigns a class to a given
noun-modifier pair by finding the most analogous noun-modifier pair in
the training data. With 30 classes of semantic relations, on a collection
of 600 labeled noun-modifier pairs, the learning algorithm attains an F
value of 26.5% (random guessing: 3.3%). With 5 classes of semantic
relations, the F value is 43.2% (random: 20%). The performance is
state-of-the-art for these challenging problems.
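The selection step in the analogy task can be sketched as a cosine-similarity comparison between a vector for the stem pair A:B and a vector for each candidate pair C:D. This is a minimal illustration of the Vector Space Model idea only: the three-dimensional vectors below are invented toy values, whereas a real system would derive the features from corpus statistics, and the function names are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def most_analogous(stem_vector, choices):
    """Pick the candidate pair whose vector is closest to the stem pair's."""
    return max(choices, key=lambda pair: cosine(stem_vector, choices[pair]))


# Toy relation vectors (invented for illustration only).
stem = [3.0, 0.0, 1.0]  # mason:stone
candidates = {
    "carpenter:wood": [2.0, 0.0, 1.0],
    "teacher:chalk": [0.0, 2.0, 1.0],
    "soldier:gun": [0.0, 1.0, 3.0],
}
best = most_analogous(stem, candidates)
```

The same nearest-neighbour comparison carries over to the noun-modifier task: a pair such as "laser printer" would receive the class label of the most similar labeled pair in the training data.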