
    Web Content Mining for Information on Information Scientists

    This paper presents a search system for information on scientists, implemented as a prototype for the field of information science using Web Content Mining techniques. The sources used in the implemented approach are online publication services and the personal homepages of scientists. The system contains wrappers that query the publication services and extract information from their result pages, as well as methods for extracting information from homepages, which are based on heuristics about the structure and composition of the pages. Moreover, a specialised technique for finding the personal homepages of information scientists was developed.

    Web Data Extraction, Applications and Techniques: A Survey

    Web Data Extraction is an important problem that has been studied by means of different scientific tools and in a broad range of applications. Many approaches to extracting data from the Web have been designed to solve specific problems and operate in ad-hoc domains. Other approaches, instead, heavily reuse techniques and algorithms developed in the field of Information Extraction. This survey aims to provide a structured and comprehensive overview of the literature in the field of Web Data Extraction. We provide a simple classification framework in which existing Web Data Extraction applications are grouped into two main classes, namely applications at the Enterprise level and at the Social Web level. At the Enterprise level, Web Data Extraction techniques emerge as a key tool for data analysis in Business and Competitive Intelligence systems as well as for business process re-engineering. At the Social Web level, Web Data Extraction techniques make it possible to gather the large amount of structured data continuously generated and disseminated by Web 2.0, Social Media and Online Social Network users, which offers unprecedented opportunities to analyze human behavior at a very large scale. We also discuss the potential for cross-fertilization, i.e., the possibility of reusing Web Data Extraction techniques originally designed for a given domain in other domains. Comment: Knowledge-based System

    Template Mining for Information Extraction from Digital Documents

    Published or submitted for publication

    Data mining and fusion


    Deliverable D2.6 LinkedTV Framework for Generating Video Enrichments with Annotations

    This deliverable describes the final LinkedTV framework, which provides a set of possible enrichment resources for seed video content using techniques such as text and web mining, information extraction and information retrieval. The enrichment content is obtained from four types of sources: a) by crawling and indexing web sites listed in a white list specified by the content partners, b) by querying the publicly exposed API or SPARQL endpoint of the Europeana digital library network, c) by querying multiple social networking APIs, d) by hyperlinking to other parts of TV programs within the same collection using a Solr index. This deliverable also describes an additional content annotation functionality, namely labelling enrichment (as well as seed) content with thematic topics, as well as the process of exposing content annotations to this module and to the filtering services of LinkedTV’s personalization workflow. We illustrate the enrichment workflow for the two main scenarios of LinkedTV, which have led to the development of the LinkedCulture and LinkedNews applications; these use the TVEnricher and TVNewsEnricher enrichment services, respectively. The original title of this deliverable in the DoW was "Advanced concept labelling by complementary Web mining".

    A review of the state of the art in Machine Learning on the Semantic Web: Technical Report CSTR-05-003
