Web Data Extraction, Applications and Techniques: A Survey
Web Data Extraction is an important problem that has been studied by means of
different scientific tools and in a broad range of applications. Many
approaches to extracting data from the Web have been designed to solve specific
problems and operate in ad-hoc domains. Other approaches, instead, heavily
reuse techniques and algorithms developed in the field of Information
Extraction.
This survey aims to provide a structured and comprehensive overview of the
literature in the field of Web Data Extraction. We provide a simple
classification framework in which existing Web Data Extraction applications are
grouped into two main classes, namely applications at the Enterprise level and
at the Social Web level. At the Enterprise level, Web Data Extraction
techniques emerge as a key tool to perform data analysis in Business and
Competitive Intelligence systems as well as for business process
re-engineering. At the Social Web level, Web Data Extraction techniques make it
possible to gather large amounts of structured data continuously generated and
disseminated by Web 2.0, Social Media and Online Social Network users, which
offers unprecedented opportunities to analyze human behavior at a very large
scale. We also discuss the potential of cross-fertilization, i.e., the
possibility of re-using Web Data Extraction techniques originally designed to
work in a given domain in other domains.
Comment: Knowledge-based System
User sentiment detection: a YouTube use case
In this paper we propose an unsupervised lexicon-based approach to detect the sentiment polarity of user comments on YouTube. Polarity detection in social media content is challenging not only because of the limitations of current sentiment dictionaries but also because of the informal linguistic styles of users. Existing dictionaries fail to capture the sentiment of community-created terms. To address this challenge, we adopted a data-driven approach and prepared a social-media-specific list of terms and phrases expressing user sentiments and opinions. Experimental evaluation shows that the combinatorial approach has greater potential. Finally, we discuss many research challenges in social media sentiment analysis.
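The lexicon-based idea can be illustrated with a minimal sketch, assuming a simple additive scoring rule: each known term carries a polarity weight, and a comment's polarity is the sign of the summed weights. The lexicons, weights and names below (`GENERAL_LEXICON`, `SOCIAL_LEXICON`, `COMBINED_LEXICON`, `polarity_label`) are hypothetical illustrations, not the paper's term lists or its scoring method.

```python
import re

# Hypothetical general-purpose sentiment lexicon (term -> polarity weight).
GENERAL_LEXICON: dict[str, float] = {
    "good": 1.0, "great": 2.0, "love": 2.0,
    "bad": -1.0, "terrible": -2.0, "boring": -1.0,
}

# Hypothetical social-media-specific terms that a general dictionary misses.
SOCIAL_LEXICON: dict[str, float] = {
    "lol": 0.5, "epic": 1.5, "fail": -1.5, "meh": -0.5,
}

# Combined lexicon: social-media entries override general ones on conflict.
COMBINED_LEXICON: dict[str, float] = {**GENERAL_LEXICON, **SOCIAL_LEXICON}


def tokenize(comment: str) -> list[str]:
    """Lower-case the comment and split it into word tokens."""
    return re.findall(r"[a-z']+", comment.lower())


def score(comment: str, lexicon: dict[str, float]) -> float:
    """Sum the lexicon weights of all tokens; unknown tokens contribute 0."""
    return sum(lexicon.get(token, 0.0) for token in tokenize(comment))


def polarity_label(comment: str, lexicon: dict[str, float]) -> str:
    """Map the summed score to a polarity label (no training data needed)."""
    s = score(comment, lexicon)
    if s > 0:
        return "positive"
    if s < 0:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    comment = "This video is epic lol"
    print(polarity_label(comment, GENERAL_LEXICON))   # neutral: no known terms
    print(polarity_label(comment, COMBINED_LEXICON))  # positive: epic + lol
```

Because no labelled training data is involved, the sketch stays unsupervised; improving coverage of community-created terms only requires adding entries to the social-media lexicon.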
Working out a common task: design and evaluation of user-intelligent system collaboration
This paper describes the design and user evaluation of an intelligent user interface intended to mediate between users and an Adaptive Information Extraction (AIE) system. The design goal was to support synergistic and cooperative work. Laboratory tests showed that the approach was efficient and effective; focus groups were run to assess its ease of use. Logs, user satisfaction questionnaires and interviews were used to investigate the interaction experience.
We found that users' attitude is mainly hierarchical, with the user wishing to control and check the system's initiatives. However, when confidence in the system's capabilities rises, a more cooperative interaction is adopted.
Organisational challenges of the semantic web in digital libraries: A Norwegian case study
This is the post-print version of the article. Copyright © 2009 Emerald Group Publishing Limited.
Purpose – The purpose of this paper is to examine, from a socio-technical point of view, the impact of semantic web technology on the strategic, organisational and technological levels. The semantic web initiative holds great promise for the future of digital libraries. There is, however, a considerable gap in semantic web research between the contributions in the technological field and research in the organisational field.
Design/methodology/approach – A comprehensive case study of the National Library of Norway (NL) is conducted, building on two major sources of information: the documentation of the NL's digitising project, and interviews with nine different stakeholders at three levels of the NL's organisation between June and August 2007. Top managers are interviewed on strategy, middle managers and librarians on organisational issues, and ICT professionals on technology issues.
Findings – The findings indicate that the highest impact will be at the organisational level. This is mainly because inter-organisational and cross-organisational structures have to be established to address the problems of ontology engineering, and a development framework for ontology engineering in digital libraries must be examined.
Originality/value – ICT professionals and library practitioners should be more mindful of organisational issues when planning and executing semantic web projects in digital libraries. In particular, practitioners should be aware that the ontology engineering process and the production of semantic metadata will affect the entire organisation. For public digital libraries, this will probably also call for a more open policy towards user groups to properly manage the process of ontology engineering.
Multi Visualization and Dynamic Query for Effective Exploration of Semantic Data
Semantic formalisms represent content in a uniform way according to ontologies. This enables manipulation and reasoning by automated means (e.g. Semantic Web services), but it constrains users to explore the semantic data from a point of view shaped by knowledge-representation concerns. We show how, for user consumption, visualizing semantic data along a few easily graspable dimensions (e.g. space and time) supports effective sense-making. In this paper, we look holistically at the interaction between users and semantic data, and propose multiple visualization strategies and dynamic filters to support the exploration of semantically rich data.
We discuss a user evaluation and how interaction challenges could be overcome to create an effective user-centred framework for the visualization and manipulation of semantic data. The approach has been implemented and evaluated on a real company archive.
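A dynamic filter along the time and space dimensions can be sketched as follows, assuming the semantic resources have already been flattened into records carrying a year and a place. The `Item` record, the `DynamicQuery` class and the `year_range`/`in_places` helpers are hypothetical and not taken from the paper's implementation; a real system would query an RDF store and drive a visualization rather than return lists.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class Item:
    """Hypothetical flattened view of one semantic resource."""
    uri: str
    label: str
    year: int   # time dimension
    place: str  # space dimension


Predicate = Callable[[Item], bool]


class DynamicQuery:
    """Holds named filters; every change immediately recomputes the result set."""

    def __init__(self, items: Iterable[Item]) -> None:
        self.items = list(items)
        self.filters: dict[str, Predicate] = {}

    def set_filter(self, name: str, predicate: Predicate) -> list[Item]:
        self.filters[name] = predicate  # replaces any filter with the same name
        return self.results()

    def clear_filter(self, name: str) -> list[Item]:
        self.filters.pop(name, None)
        return self.results()

    def results(self) -> list[Item]:
        # An item is shown only if it satisfies every active filter.
        return [i for i in self.items if all(p(i) for p in self.filters.values())]


def year_range(lo: int, hi: int) -> Predicate:
    """Time filter: keep items whose year lies in [lo, hi]."""
    return lambda item: lo <= item.year <= hi


def in_places(places: Iterable[str]) -> Predicate:
    """Space filter: keep items located in one of the given places."""
    allowed = frozenset(places)
    return lambda item: item.place in allowed


if __name__ == "__main__":
    # Hypothetical sample records standing in for an archive.
    q = DynamicQuery([
        Item("ex:doc1", "Report A", 2004, "London"),
        Item("ex:doc2", "Report B", 2008, "Paris"),
        Item("ex:doc3", "Memo C", 2010, "London"),
    ])
    print([i.uri for i in q.set_filter("time", year_range(2005, 2012))])
    print([i.uri for i in q.set_filter("space", in_places(["London"]))])
    print([i.uri for i in q.clear_filter("time")])
```

Keeping filters as named, independent predicates lets each view (for example a timeline or a map) adjust its own dimension while the shared result set stays consistent across views.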