4,109 research outputs found
Towards a relation extraction framework for cyber-security concepts
In order to assist security analysts in obtaining information pertaining to
their network, such as novel vulnerabilities, exploits, or patches, information
retrieval methods tailored to the security domain are needed. As labeled text
data is scarce and expensive, we follow developments in semi-supervised Natural
Language Processing and implement a bootstrapping algorithm for extracting
security entities and their relationships from text. The algorithm requires
little input data, specifically, a few relations or patterns (heuristics for
identifying relations), and incorporates an active learning component which
queries the user on the most important decisions to prevent drifting from the
desired relations. Preliminary testing on a small corpus shows promising
results, obtaining precision of .82.Comment: 4 pages in Cyber & Information Security Research Conference 2015, AC
TopExNet: Entity-Centric Network Topic Exploration in News Streams
The recent introduction of entity-centric implicit network representations of
unstructured text offers novel ways for exploring entity relations in document
collections and streams efficiently and interactively. Here, we present
TopExNet as a tool for exploring entity-centric network topics in streams of
news articles. The application is available as a web service at
https://topexnet.ifi.uni-heidelberg.de/ .Comment: Published in Proceedings of the Twelfth ACM International Conference
on Web Search and Data Mining, WSDM 2019, Melbourne, VIC, Australia, February
11-15, 201
Automatic Synonym Discovery with Knowledge Bases
Recognizing entity synonyms from text has become a crucial task in many
entity-leveraging applications. However, discovering entity synonyms from
domain-specific text corpora (e.g., news articles, scientific papers) is rather
challenging. Current systems take an entity name string as input to find out
other names that are synonymous, ignoring the fact that often times a name
string can refer to multiple entities (e.g., "apple" could refer to both Apple
Inc and the fruit apple). Moreover, most existing methods require training data
manually created by domain experts to construct supervised-learning systems. In
this paper, we study the problem of automatic synonym discovery with knowledge
bases, that is, identifying synonyms for knowledge base entities in a given
domain-specific corpus. The manually-curated synonyms for each entity stored in
a knowledge base not only form a set of name strings to disambiguate the
meaning for each other, but also can serve as "distant" supervision to help
determine important features for the task. We propose a novel framework, called
DPE, to integrate two kinds of mutually-complementing signals for synonym
discovery, i.e., distributional features based on corpus-level statistics and
textual patterns based on local contexts. In particular, DPE jointly optimizes
the two kinds of signals in conjunction with distant supervision, so that they
can mutually enhance each other in the training stage. At the inference stage,
both signals will be utilized to discover synonyms for the given entities.
Experimental results prove the effectiveness of the proposed framework
Automated construction and analysis of political networks via open government and media sources
We present a tool to generate real world political networks from user provided lists of politicians and news sites. Additional output includes visualizations, interactive tools and maps that allow a user to better understand the politicians and their surrounding environments as portrayed by the media. As a case study, we construct a comprehensive list of current Texas politicians, select news sites that convey a spectrum of political viewpoints covering Texas politics, and examine the results. We propose a ”Combined” co-occurrence distance metric to better reflect the relationship between two entities. A topic modeling technique is also proposed as a novel, automated way of labeling communities that exist within a politician’s ”extended” network.Peer ReviewedPostprint (author's final draft
Building a Generation Knowledge Source using Internet-Accessible Newswire
In this paper, we describe a method for automatic creation of a knowledge
source for text generation using information extraction over the Internet. We
present a prototype system called PROFILE which uses a client-server
architecture to extract noun-phrase descriptions of entities such as people,
places, and organizations. The system serves two purposes: as an information
extraction tool, it allows users to search for textual descriptions of
entities; as a utility to generate functional descriptions (FD), it is used in
a functional-unification based generation system. We present an evaluation of
the approach and its applications to natural language generation and
summarization.Comment: 8 pages, uses eps
Comprehensive Review of Opinion Summarization
The abundance of opinions on the web has kindled the study of opinion summarization over the last few years. People have introduced various techniques and paradigms to solving this special task. This survey attempts to systematically investigate the different techniques and approaches used in opinion summarization. We provide a multi-perspective classification of the approaches used and highlight some of the key weaknesses of these approaches. This survey also covers evaluation techniques and data sets used in studying the opinion summarization problem. Finally, we provide insights into some of the challenges that are left to be addressed as this will help set the trend for future research in this area.unpublishednot peer reviewe
- …