5,249 research outputs found
Automatic extraction of knowledge from web documents
A large amount of digital information available is written as text documents in the form of web pages, reports, papers, emails, etc. Extracting the knowledge of interest from such documents from multiple sources in a timely fashion is therefore crucial. This paper provides an update on the Artequakt system which uses natural language tools to automatically extract knowledge about artists from multiple documents based on a predefined ontology. The ontology represents the type and form of knowledge to extract. This knowledge is then used to generate tailored biographies. The information extraction process of Artequakt is detailed and evaluated in this paper
Synthesizing Program Input Grammars
We present an algorithm for synthesizing a context-free grammar encoding the
language of valid program inputs from a set of input examples and blackbox
access to the program. Our algorithm addresses shortcomings of existing grammar
inference algorithms, which both severely overgeneralize and are prohibitively
slow. Our implementation, GLADE, leverages the grammar synthesized by our
algorithm to fuzz test programs with structured inputs. We show that GLADE
substantially increases the incremental coverage on valid inputs compared to
two baseline fuzzers
Multimedia search without visual analysis: the value of linguistic and contextual information
This paper addresses the focus of this special issue by analyzing the potential contribution of linguistic content and other non-image aspects to the processing of audiovisual data. It summarizes the various ways in which linguistic content analysis contributes to enhancing the semantic annotation of multimedia content, and, as a consequence, to improving the effectiveness of conceptual media access tools. A number of techniques are presented, including the time-alignment of textual resources, audio and speech processing, content reduction and reasoning tools, and the exploitation of surface features
- âŠ