Information Extraction, Data Integration, and Uncertain Data Management: The State of The Art
Information extraction, data integration, and uncertain data management are distinct research areas that have received considerable attention over the last two decades. Many studies have tackled these areas individually. However, information extraction systems should be integrated with data integration methods to make use of the extracted information. Handling uncertainty in the extraction and integration process is important for enhancing the quality of the data in such integrated systems. This article presents the state of the art in these areas, identifies their common ground, and shows how to integrate information extraction and data integration under uncertainty management cover
Overcoming data scarcity of Twitter: using tweets as bootstrap with application to autism-related topic content analysis
Notwithstanding recent work which has demonstrated the potential of using
Twitter messages for content-specific data mining and analysis, the depth of
such analysis is inherently limited by the scarcity of data imposed by the 140
character tweet limit. In this paper we describe a novel approach for targeted
knowledge exploration which uses tweet content analysis as a preliminary step.
This step is used to bootstrap more sophisticated data collection from directly
related but much richer content sources. In particular we demonstrate that
valuable information can be collected by following URLs included in tweets. We
automatically extract content from the corresponding web pages and, treating
each web page as a document linked to the original tweet, show how a temporal
topic model based on a hierarchical Dirichlet process can be used to track the
evolution of a complex topic structure of a Twitter community. Using
autism-related tweets we demonstrate that our method is capable of capturing a
much more meaningful picture of information exchange than user-chosen hashtags.
Comment: IEEE/ACM International Conference on Advances in Social Networks
Analysis and Mining, 201
Terminology Extraction for and from Communications in Multi-disciplinary Domains
Terminology extraction generally refers to methods and systems for identifying term candidates in a uni-disciplinary and uni-lingual
environment such as engineering, medical, physical and geological sciences, or administration, business and leisure. However, as
human enterprises become more complex, it has become increasingly important for teams in one discipline to collaborate with
others who not only come from a non-cognate discipline but also speak a different language. Disaster mitigation and recovery, and conflict
resolution are amongst the areas where there is a requirement to use standardised multilingual terminology for communication. This
paper presents a feasibility study conducted to build terminology (and ontology) in the domain of disaster management and is part of the
broader work conducted for the EU project Slándáil (FP7 607691). We have evaluated CiCui (for the Chinese name 词萃, which translates to
"words gathered"), a corpus-based text analytic system that combines frequency, collocation and linguistic analyses to extract candidate
terminologies from corpora comprised of domain texts from diverse sources. CiCui was assessed against four terminology extraction
systems and the initial results show that it has an above average precision in extracting terms
Growing Story Forest Online from Massive Breaking News
We describe our experience of implementing a news content organization system
at Tencent that discovers events from vast streams of breaking news and evolves
news story structures in an online fashion. Our real-world system has distinct
requirements in contrast to previous studies on topic detection and tracking
(TDT) and event timeline or graph generation, in that we 1) need to accurately
and quickly extract distinguishable events from massive streams of long text
documents that cover diverse topics and contain highly redundant information,
and 2) must develop the structures of event stories in an online manner,
without repeatedly restructuring previously formed stories, in order to
guarantee a consistent user viewing experience. In solving these challenges, we
propose Story Forest, a set of online schemes that automatically clusters
streaming documents into events, while connecting related events in growing
trees to tell evolving stories. We conducted extensive evaluation based on 60
GB of real-world Chinese news data, although our ideas are not
language-dependent and can easily be extended to other languages, through
detailed pilot user experience studies. The results demonstrate the superior
capability of Story Forest to accurately identify events and organize news text
into a logical structure that is appealing to human readers, compared to
multiple existing algorithm frameworks.
Comment: Accepted by CIKM 2017, 9 page
Designing an automated prototype tool for preservation quality metadata extraction for ingest into digital repository
We present a viable framework for the automated extraction of preservation quality metadata, which is adjusted to meet the needs of ingest into digital repositories. It has three distinctive features: wide coverage, specialisation and emphasis on quality. Wide coverage is achieved through the use of a distributed system of tool repositories, which helps to implement it over a broad range of document object types. Specialisation is maintained by selecting the most appropriate metadata extraction tool for each case, based on the identification of the digital object genre. Quality is sustained by introducing control points at selected stages of the system's workflow. The integration of these three features as components in the ingest of material into digital repositories is a defining step ahead in the current quest for improved management of digital resources
SEMA4A: An ontology for emergency notification systems accessibility
This is the post-print version of the final paper published in Expert Systems with Applications. The published article is available from the link below. Changes resulting from the publishing process, such as peer review, editing, corrections, structural formatting, and other quality control mechanisms may not be reflected in this document. Changes may have been made to this work since it was submitted for publication. Copyright © 2009 Elsevier B.V.
Providing alert communication in emergency situations is vital to reduce the number of victims. Reaching this goal is challenging because of the diversity of users: people with disabilities, the elderly, children, and other vulnerable groups. Notifications are critical when an emergency is about to happen (e.g. a typhoon approaching), so the ability to transmit notifications to different kinds of users is a crucial feature of such systems. In this work an ontology was developed by investigating different sources: accessibility guidelines, emergency response systems, and communication devices and technologies, taking into account the different abilities of people to react to different alarms (e.g. mobile phone vibration as an alarm for deafblind people). We think that the proposed ontology addresses the information needs for sharing and integrating emergency notification messages over distinct emergency response information systems, providing accessibility under different conditions and for different kinds of users.
Ministerio de Educación y Cienci