The Hidden Web, XML and Semantic Web: A Scientific Data Management Perspective

Author: Nayak Richi
Senellart Pierre
Suchanek Fabian
Varde Aparna
Publication date: 01/01/2011

The World Wide Web no longer consists just of HTML pages. Our work sheds light on a number of trends on the Internet that go beyond simple Web pages. The hidden Web provides a wealth of data in semi-structured form, accessible through Web forms and Web services. These services, as well as numerous other applications on the Web, commonly use XML, the eXtensible Markup Language. XML has become the lingua franca of the Internet that allows customized markups to be defined for specific domains. On top of XML, the Semantic Web grows as a common structured data source. In this work, we first explain each of these developments in detail. Using real-world examples from scientific domains of great interest today, we then demonstrate how these new developments can assist in managing, harvesting, and organizing data on the Web. On the way, we also illustrate the current research avenues in these domains. We believe that this effort would help bridge multiple database tracks, thereby attracting researchers with a view to extend database technology.

Comment: EDBT - Tutorial (2011)

arXiv.org e-Print Archive

HAL-CentraleSupelec

CiteSeerX

Montclair State University Digital Commons

INRIA a CCSD electronic archive server

Queensland University of Technology ePrints Archive

Hal-Diderot

HAL-Rennes 1

BlogForever D2.6: Data Extraction Methodology

Author: Banos V.
Davis R.
Gkotsis G.
Pincent E.
Stepanyan K.
Publication date: 25/10/2013

This report outlines an inquiry into the area of web data extraction, conducted within the context of blog preservation. The report reviews theoretical advances and practical developments for implementing data extraction. The inquiry is extended through an experiment that demonstrates the effectiveness and feasibility of implementing some of the suggested approaches. More specifically, the report discusses an approach based on unsupervised machine learning that employs the RSS feeds and HTML representations of blogs. It outlines the possibilities of extracting semantics available in blogs and demonstrates the benefits of exploiting available standards such as microformats and microdata. The report proceeds to propose a methodology for extracting and processing blog data to further inform the design and development of the BlogForever platform.

ZENODO

NEUROSURGERY ENTHUSIASTIC WOMEN SOCIETY

Research Directions, Challenges and Issues in Opinion Mining

Author: Hariharan Shanmugasundaram
Lu Joan
Sudhakaran Periakaruppan
Publication venue: 'Science and Engineering Research Support Society'
Publication date: 01/01/2013

The rapid growth of the Internet and the availability of user reviews on the web for almost any product have created a need for an effective system to analyze these reviews. Such reviews are useful, to some extent, to both customers and product manufacturers. For a popular product, the number of reviews can run into the hundreds or even thousands. This makes it difficult for a customer to analyze them and decide whether or not to purchase the product. Mining such product reviews or opinions is termed opinion mining, which is broadly classified into two main categories, namely facts and opinions. Though there are several approaches to opinion mining, deciding on the recommendation provided by the system remains a challenge. In this paper, we analyze the basics of opinion mining, its challenges, and the pros and cons of past opinion mining systems, and provide some directions for future research work, focusing on the challenges and issues.

Crossref

University of Huddersfield Repository

A Web Services Component Discovery and Deployment Architecture for Simulation Model Reuse

Author: Bell D
de Cesare S
Lycett M
Mustafee N
Taylor S J E
Publication date: 01/01/2006

CSPs are widely used in industry, although they have yet to operate across organizational boundaries. Reuse across organizations is restricted by the same semantic issues that restrict the inter-organization use of web services. The current representations of web components are predominantly syntactic in nature, lacking the fundamental semantic underpinning required to support discovery on the emerging semantic web. Semantic models, in the form of ontologies, utilized by a web service discovery and deployment architecture provide one approach to support simulation model reuse. Semantic interoperation is achieved through the use of a simulation component ontology to identify required components at varying levels of granularity (including both abstract and specialized components). Selected simulation components are loaded into a CSP, modified according to the requirements of the new model, and executed. The paper presents the development, carried out within CSPI-PDG and the Fluidity Group at Brunel University, of an ontology, connector software and a web service discovery architecture. The ontology is extracted from simulation scenarios involving airport, restaurant and kitchen service suppliers. The ontology engineering framework and discovery architecture provide a novel approach to inter-organization simulation, adopting a less intrusive interface between participants. Although specific to CSPs, the work has wider implications for the simulation community.

Brunel University Research Archive