16,291 research outputs found
Semantics-Based Content Extraction in Typewritten Historical Documents
This paper presents a flexible approach to extracting content from scanned historical documents using semantic information. The final electronic document is the result of a "digital historical document lifecycle" process, where the expert knowledge of the historian/archivist user is incorporated at different stages. Results show that such a conversion strategy aided by (expert) user-specified semantic information and which enables the processing of individual parts of the document in a specialised way, produces superior (in a variety of significant ways) results than document analysis and understanding techniques devised for contemporary documents
Web Data Extraction, Applications and Techniques: A Survey
Web Data Extraction is an important problem that has been studied by means of
different scientific tools and in a broad range of applications. Many
approaches to extracting data from the Web have been designed to solve specific
problems and operate in ad-hoc domains. Other approaches, instead, heavily
reuse techniques and algorithms developed in the field of Information
Extraction.
This survey aims at providing a structured and comprehensive overview of the
literature in the field of Web Data Extraction. We provided a simple
classification framework in which existing Web Data Extraction applications are
grouped into two main classes, namely applications at the Enterprise level and
at the Social Web level. At the Enterprise level, Web Data Extraction
techniques emerge as a key tool to perform data analysis in Business and
Competitive Intelligence systems as well as for business process
re-engineering. At the Social Web level, Web Data Extraction techniques allow
to gather a large amount of structured data continuously generated and
disseminated by Web 2.0, Social Media and Online Social Network users and this
offers unprecedented opportunities to analyze human behavior at a very large
scale. We discuss also the potential of cross-fertilization, i.e., on the
possibility of re-using Web Data Extraction techniques originally designed to
work in a given domain, in other domains.Comment: Knowledge-based System
Recommended from our members
OBOME - Ontology based opinion mining in UBIPOL
Ontologies have a special role in the UBIPOL system, they help to structure the policy related context, provide conceptualization for policy domain and use in the opinion mining process. In this work we presented a system called Ontology Based Opinion Mining Engine (OBOME) for analyzing a domain-specific opinion corpus by first assisting the user with the creation of a domain ontology from the corpus. We determined the polarity of opinion on the various domain aspects. In the former step, the policy domain aspect has are identified (namely which policy category is represented by the concept). This identification is supported by the policy modelling ontology, which describe the most important policy â related classes and structure. Then the most informative documents from the corpus are extracted and asked the user to create a set of aspects and related keywords using these documents. In the latter step, we used the corpus specific ontology to model the domain and extracted aspect-polarity associations using grammatical dependencies between words. Later, summarized results are shown to the user to analyze and store. Finally, in an offline process policy modeling ontology is updated
Information Extraction, Data Integration, and Uncertain Data Management: The State of The Art
Information Extraction, data Integration, and uncertain data management are different areas of research that got vast focus in the last two decades. Many researches tackled those areas of research individually. However, information extraction systems should have integrated with data integration methods to make use of the extracted information. Handling uncertainty in extraction and integration process is an important issue to enhance the quality of the data in such integrated systems. This article presents the state of the art of the mentioned areas of research and shows the common grounds and how to integrate information extraction and data integration under uncertainty management cover
Generating adaptive hypertext content from the semantic web
Accessing and extracting knowledge from online documents is crucial for therealisation of the Semantic Web and the provision of advanced knowledge services.
The Artequakt project is an ongoing investigation tackling these issues to facilitate the creation of tailored biographies from information harvested from the web.
In this paper we will present the methods we currently use to model, consolidate and store knowledge extracted from the web so that it can be re-purposed as adaptive content. We look at how Semantic Web technology could be used within this process and also how such techniques might be used to provide content to be published via the Semantic Web
- âŠ