1,731 research outputs found

    Web Data Extraction, Applications and Techniques: A Survey

    Web Data Extraction is an important problem that has been studied by means of different scientific tools and in a broad range of applications. Many approaches to extracting data from the Web have been designed to solve specific problems and operate in ad-hoc domains. Other approaches, instead, heavily reuse techniques and algorithms developed in the field of Information Extraction. This survey aims to provide a structured and comprehensive overview of the literature in the field of Web Data Extraction. We provide a simple classification framework in which existing Web Data Extraction applications are grouped into two main classes, namely applications at the Enterprise level and at the Social Web level. At the Enterprise level, Web Data Extraction techniques emerge as a key tool for data analysis in Business and Competitive Intelligence systems as well as for business process re-engineering. At the Social Web level, Web Data Extraction techniques make it possible to gather the large amount of structured data continuously generated and disseminated by Web 2.0, Social Media and Online Social Network users, which offers unprecedented opportunities to analyze human behavior at a very large scale. We also discuss the potential of cross-fertilization, i.e., the possibility of reusing Web Data Extraction techniques originally designed for a given domain in other domains. Comment: Knowledge-based System

    Preface of the Proceedings of WRAP 2004


    CXQuery: A novel XML query language

    XML is becoming the data exchange standard on the Internet. Previously proposed XML query languages, such as XQuery, Quilt, YALT, Lorel, and XML-QL, lack a schema definition for the query result; they are therefore of limited use for defining views, integrating data, updating, and further querying, all of which are often needed in e-Business applications. We propose a novel XML query language called CXQuery, which defines the schema of the query results explicitly and can easily define views, and integrate, update, and query XML data. In addition, CXQuery can express spatial and spatio-temporal queries using a constraint-based querying approach.

    Knowledge Rich Natural Language Queries over Structured Biological Databases

    Increasingly, keyword, natural language and NoSQL queries are being used for information retrieval from traditional as well as non-traditional databases such as web, document, image, GIS, legal, and health databases. While their popularity is undeniable for obvious reasons, their engineering is far from simple. For the most part, a semantics- and intent-preserving mapping of a well-understood natural language query expressed over a structured database schema to a structured query language is still a difficult task, and research to tame the complexity is intense. In this paper, we propose a multi-level knowledge-based middleware that facilitates such mappings and separates the conceptual level from the physical level. We augment these multi-level abstractions with a concept reasoner and a query strategy engine to dynamically link arbitrary natural language querying to well-defined structured queries. We demonstrate the feasibility of our approach by presenting a Datalog-based prototype system, called BioSmart, that can compute responses to arbitrary natural language queries over arbitrary databases once a syntactic classification of the natural language query is made.

    A virtual environment to support the distributed design of large made-to-order products

    An overview of a virtual design environment (virtual platform) developed as part of the European Commission-funded VRShips-ROPAX (VRS) project is presented. The main objectives for the development of the virtual platform are described, followed by a discussion of the techniques chosen to address those objectives, and finally a description of a use case for the platform. Whilst the focus of the VRS virtual platform was to facilitate the design of ROPAX (roll-on passengers and cargo) vessels, the components within the platform are entirely generic and may be applied to the distributed design of any type of vessel, or other complex made-to-order products.