14,362 research outputs found

Web Data Extraction, Applications and Techniques: A Survey

Author: Emilio Ferrara
Pasquale De Meo
Giacomo Fiumara
Robert Baumgartner
Publication venue: 'Elsevier BV'
Publication date: 09/06/2014

Web Data Extraction is an important problem that has been studied by means of different scientific tools and in a broad range of applications. Many approaches to extracting data from the Web have been designed to solve specific problems and operate in ad-hoc domains. Other approaches, instead, heavily reuse techniques and algorithms developed in the field of Information Extraction. This survey aims to provide a structured and comprehensive overview of the literature in the field of Web Data Extraction. We provide a simple classification framework in which existing Web Data Extraction applications are grouped into two main classes, namely applications at the Enterprise level and at the Social Web level. At the Enterprise level, Web Data Extraction techniques emerge as a key tool for data analysis in Business and Competitive Intelligence systems as well as for business process re-engineering. At the Social Web level, Web Data Extraction techniques make it possible to gather large amounts of structured data continuously generated and disseminated by Web 2.0, Social Media and Online Social Network users, which offers unprecedented opportunities to analyze human behavior at a very large scale. We also discuss the potential for cross-fertilization, i.e., the possibility of reusing Web Data Extraction techniques originally designed for one domain in other domains.
Comment: Knowledge-based System

arXiv.org e-Print Archive

Crossref

BlogForever D2.6: Data Extraction Methodology

Author: Banos V.
Davis R.
Gkotsis G.
Pincent E.
Stepanyan K.
Publication date: 25/10/2013

This report outlines an inquiry into the area of web data extraction, conducted within the context of blog preservation. The report reviews theoretical advances and practical developments for implementing data extraction. The inquiry is extended through an experiment that demonstrates the effectiveness and feasibility of implementing some of the suggested approaches. More specifically, the report discusses an approach based on unsupervised machine learning that employs the RSS feeds and HTML representations of blogs. It outlines the possibilities of extracting semantics available in blogs and demonstrates the benefits of exploiting available standards such as microformats and microdata. The report then proposes a methodology for extracting and processing blog data to further inform the design and development of the BlogForever platform.
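The idea of pairing RSS feeds with the HTML representation of a blog can be illustrated with a minimal sketch. It assumes that a value taken from a feed item (for example the post title) also appears verbatim in the post's HTML page, and that the tag and class of the matching element can then serve as a reusable extraction rule. The names `TextLocator` and `infer_rule` and the sample markup are hypothetical; this is not the algorithm described in the report.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag; they are not tracked on the stack.
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


class TextLocator(HTMLParser):
    """Record (tag, class) of every element whose text equals a target string."""

    def __init__(self, target):
        super().__init__()
        self.target = target.strip()
        self.stack = []    # currently open elements as (tag, class attribute)
        self.matches = []  # (tag, class) pairs whose text matched the target

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.stack.append((tag, dict(attrs).get("class")))

    def handle_endtag(self, tag):
        # Pop back to the nearest matching open tag; tolerates unclosed elements.
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i][0] == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        if self.stack and data.strip() == self.target:
            self.matches.append(self.stack[-1])


def infer_rule(feed_value, html):
    """Return (tag, class) of the first element containing feed_value, or None."""
    locator = TextLocator(feed_value)
    locator.feed(html)
    return locator.matches[0] if locator.matches else None


# Made-up markup standing in for a blog post page.
html_page = (
    '<html><body><h2 class="post-title">Hello World</h2>'
    '<div class="post-body">Some text</div></body></html>'
)
print(infer_rule("Hello World", html_page))  # prints ('h2', 'post-title')
```

A rule inferred from one post could then be tried on other pages of the same blog that are no longer covered by the feed; exact string matching is the simplest possible choice here and would need to be relaxed for real pages.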

ZENODO

NEUROSURGERY ENTHUSIASTIC WOMEN SOCIETY

Applications and Uses of Dental Ontologies

Author: Sadraie Marjan
Smart Paul R.
Publication date: 07/03/2012

The development of a number of large-scale, semantically rich ontologies for biomedicine attests to the interest of life science researchers and clinicians in Semantic Web technologies. To date, however, the dental profession has lagged behind other areas of biomedicine in developing a commonly accepted, standardized ontology to support the representation of dental knowledge and information. This paper attempts to identify some of the potential uses of dental ontologies as part of an effort to motivate the development of ontologies for the dental domain. The identified uses of dental ontologies include support for advanced data analysis and knowledge discovery capabilities, the implementation of novel education and training technologies, the development of information exchange and interoperability solutions, the better integration of scientific and clinical evidence into clinical decision-making, and the development of better clinical decision support systems. Some of the social issues raised by these uses include the ethics of using patient data without consent, the role played by ontologies in enforcing compliance with regulatory criteria and legislative constraints, and the extent to which the advent of the Semantic Web introduces new training requirements for dental students. Some of the technological issues relate to the need to extract information from a variety of resources (for example, natural language texts), the need to automatically annotate information resources with ontology elements, and the need to establish mappings between a variety of existing dental terminologies.

Southampton (e-Prints Soton)

Language technologies and the evolution of the semantic web

Author: Motta Enrico
Sabou Marta
Publication date: 01/01/2006

The availability of huge amounts of semantic markup on the Web promises to enable a quantum leap in the level of support available to Web users for locating, aggregating, sharing, interpreting and customizing information. While we cannot claim that a large-scale Semantic Web already exists, a number of applications have been produced that generate and exploit semantic markup to provide advanced search and querying functionalities and to allow the visualization and management of heterogeneous, distributed data. While these tools provide evidence of the feasibility and tremendous potential value of the enterprise, they all suffer from major limitations, primarily the limited scale and heterogeneity of the semantic data they use. Nevertheless, we argue that we are at a key point in the brief history of the Semantic Web and that the very latest demonstrators already give us a glimpse of what future applications will look like. In this paper, we describe the already visible effects of these changes by analyzing the evolution of Semantic Web tools from smart databases towards applications that harness collective intelligence. We also point out that language technology plays an important role in making this evolution sustainable and we highlight the need for improved support, especially in the area of large-scale linguistic resources.

CiteSeerX

Open Research Online (The Open University)

Informatics Research Institute (IRIS) May 2005 newsletter

Author: Rezgui Y
Publication venue: University of Salford, UK
Publication date: 01/05/2005

University of Salford Institutional Repository

Tracking the History and Evolution of Entities: Entity-centric Temporal Analysis of Large Social Media Archives

Author: Fafalios Pavlos
Iosifidis Vasileios
Ntoutsi Eirini
Stefanidis Kostas
Publication date: 24/10/2018

How did the popularity of the Greek Prime Minister evolve in 2015? How did the predominant sentiment about him vary during that period? Were there any controversial sub-periods? What other entities were related to him during these periods? To answer these questions, one needs to analyze archived documents and data about the query entities, such as old news articles or social media archives. In particular, user-generated content posted in social networks, like Twitter and Facebook, can be seen as a comprehensive documentation of our society, and thus meaningful analysis methods over such archived data are of immense value for sociologists, historians and other interested parties who want to study the history and evolution of entities and events. To this end, in this paper we propose an entity-centric approach to analyze social media archives and we define measures that allow studying how entities were reflected in social media in different time periods and under different aspects, like popularity, attitude, controversiality, and connectedness with other entities. A case study using a large Twitter archive of four years illustrates the insights that can be gained by such an entity-centric and multi-aspect analysis.
Comment: This is a preprint of an article accepted for publication in the International Journal on Digital Libraries (2018).
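To make the notion of a per-period entity measure concrete, here is a minimal sketch of one possible popularity measure: the share of posts in each month that mention a given entity. This is an illustrative assumption, not the definition used in the paper; the function name `popularity_by_month`, the input layout (a date paired with the set of entities recognized in the post) and the sample posts are all made up.

```python
from collections import Counter
from datetime import date


def popularity_by_month(posts, entity):
    """Return {(year, month): share of that month's posts mentioning entity}.

    posts is an iterable of (date, set_of_entity_names) pairs.
    """
    total = Counter()     # number of posts per month
    mentions = Counter()  # number of posts per month that mention the entity
    for day, entities in posts:
        key = (day.year, day.month)
        total[key] += 1
        if entity in entities:
            mentions[key] += 1
    return {key: mentions[key] / total[key] for key in sorted(total)}


# Made-up toy archive: three posts with already-extracted entity mentions.
toy_posts = [
    (date(2015, 1, 5), {"A"}),
    (date(2015, 1, 9), {"B"}),
    (date(2015, 2, 1), {"A", "B"}),
]
print(popularity_by_month(toy_posts, "A"))  # prints {(2015, 1): 0.5, (2015, 2): 1.0}
```

Normalizing by the number of posts per period keeps months with different archive sizes comparable; other aspects named in the abstract, such as attitude or connectedness, would need sentiment scores or co-occurrence counts as additional inputs.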

arXiv.org e-Print Archive

Trepo - Institutional Repository of Tampere University

BlogForever D2.4: Weblog spider prototype and associated methodology

Author: Banos V.
Gulliksen M.
Joy M.
Manolopoulos I.
Rynning M.
Stepanyan K.
Tselepidis I.
Publication date: 25/10/2013

The purpose of this document is to present the evaluation of different solutions for capturing blogs and the established methodology, and to describe the developed blog spider prototype.

ZENODO