Data Mining

Author: Parker Julian
Sloan Terence
Yau Hon
Publication date: 01/01/1998

BlogForever D2.6: Data Extraction Methodology

Author: Banos V.
Davis R.
Gkotsis G.
Pincent E.
Stepanyan K.
Publication date: 25/10/2013

This report outlines an inquiry into the area of web data extraction, conducted within the context of blog preservation. The report reviews theoretical advances and practical developments for implementing data extraction. The inquiry is extended through an experiment that demonstrates the effectiveness and feasibility of implementing some of the suggested approaches. More specifically, the report discusses an approach based on unsupervised machine learning that employs the RSS feeds and HTML representations of blogs. It outlines the possibilities of extracting semantics available in blogs and demonstrates the benefits of exploiting available standards such as microformats and microdata. The report then proposes a methodology for extracting and processing blog data to further inform the design and development of the BlogForever platform.
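As a rough illustration of what exploiting microdata in a blog page can look like, the sketch below collects `itemprop` values from an HTML fragment using only the Python standard library. It is not the BlogForever methodology itself: the class name `MicrodataExtractor`, the helper `extract_microdata`, and the sample markup are assumptions made for this example, and the sketch ignores `itemscope` nesting and URL-valued properties such as `img` or `a`.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they are not tracked as open.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class MicrodataExtractor(HTMLParser):
    """Collects itemprop name -> list of values (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.properties = {}
        self._open = []  # stack of (tag, itemprop name or None, text parts)

    def _record(self, name, value):
        self.properties.setdefault(name, []).append(value)

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        name = attr_map.get("itemprop")
        if name is not None and tag == "meta":
            # <meta itemprop=... content=...> carries its value in an attribute.
            self._record(name, attr_map.get("content") or "")
        elif name is not None and tag == "time" and attr_map.get("datetime"):
            # Prefer the machine-readable datetime over the displayed text.
            self._record(name, attr_map["datetime"])
            name = None
        if tag not in VOID_TAGS:
            self._open.append((tag, name, []))

    def handle_data(self, data):
        # Text belongs to every enclosing element that declares an itemprop.
        for _, name, parts in self._open:
            if name is not None:
                parts.append(data)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag, tolerating unclosed inner tags.
        for i in range(len(self._open) - 1, -1, -1):
            if self._open[i][0] == tag:
                for _, name, parts in reversed(self._open[i:]):
                    if name is not None:
                        self._record(name, "".join(parts).strip())
                del self._open[i:]
                break


def extract_microdata(html):
    parser = MicrodataExtractor()
    parser.feed(html)
    parser.close()
    return parser.properties


SAMPLE_POST = """
<article itemscope itemtype="http://schema.org/BlogPosting">
  <h1 itemprop="headline">Hello <em>world</em></h1>
  <time itemprop="datePublished" datetime="2013-10-25">25 Oct 2013</time>
  <div itemprop="articleBody"><div>First.</div><br>Second.</div>
  <meta itemprop="author" content="A. Blogger">
</article>
"""

post = extract_microdata(SAMPLE_POST)
```

Here `post["headline"]` holds the text of the `h1` element, including the nested `em`, while `post["datePublished"]` holds the value of the `datetime` attribute rather than the displayed date.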

ZENODO

NEUROSURGERY ENTHUSIASTIC WOMEN SOCIETY

Forum Session at the First International Conference on Service Oriented Computing (ICSOC03)

Author: Aiello Marco
Bussler Chris
D'Andrea Vincenzo
Yang Jian
Publication date: 01/11/2003

The First International Conference on Service Oriented Computing (ICSOC) was held in Trento, December 15-18, 2003. The focus of the conference, Service Oriented Computing (SOC), is the emerging paradigm for distributed computing and e-business processing that has evolved from object-oriented and component computing to enable building agile networks of collaborating business applications distributed within and across organizational boundaries. Of the 181 papers submitted to the ICSOC conference, 10 were selected for the forum session, which took place on December 16, 2003. The papers were chosen based on their technical quality, originality, relevance to SOC, and suitability for a poster presentation or a demonstration. This technical report contains the 10 papers presented during the forum session at the ICSOC conference. In particular, the last two papers in the report were submitted as industrial papers.

Unitn-eprints Research

A Semantic-based framework for discovering business process patterns

Author: Aldin L
de Cesare S
Lycett M
Publication venue: OOPSLA
Publication date: 01/01/2009

Patterns currently play an important role in modern information systems (IS) development, but their use has mainly been restricted to the design and implementation phases of the development lifecycle. Given the increasing significance of business modeling in IS development, patterns have the potential to provide a viable solution for promoting reusability of recurrent generalized models in the very early stages of development. This paper focuses on business process patterns and proposes an initial framework for the discovery and reuse of business process patterns within the IS development lifecycle. The framework synthesizes ideas from the domain engineering literature and proposes the use of semantics to drive both the discovery of patterns and their reuse.

Brunel University Research Archive

Cached Sufficient Statistics for Efficient Machine Learning with Large Datasets

Author: Lee M. S.
Moore A.
Publication date: 01/01/1997

This paper introduces new algorithms and data structures for quick counting for machine learning datasets. We focus on the counting task of constructing contingency tables, but our approach is also applicable to counting the number of records in a dataset that match conjunctive queries. Subject to certain assumptions, the costs of these operations can be shown to be independent of the number of records in the dataset and loglinear in the number of non-zero entries in the contingency table. We provide a very sparse data structure, the ADtree, to minimize memory use. We provide analytical worst-case bounds for this structure for several models of data distribution. We empirically demonstrate that tractably-sized data structures can be produced for large real-world datasets by (a) using a sparse tree structure that never allocates memory for counts of zero, (b) never allocating memory for counts that can be deduced from other counts, and (c) not bothering to expand the tree fully near its leaves. We show how the ADtree can be used to accelerate Bayes net structure finding algorithms, rule learning algorithms, and feature selection algorithms, and we provide a number of empirical results comparing ADtree methods against traditional direct counting approaches. We also discuss the possible uses of ADtrees in other machine learning methods, and discuss the merits of ADtrees in comparison with alternative representations such as kd-trees, R-trees and Frequent Sets.

Comment: See http://www.jair.org/ for any accompanying file.
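The idea of caching counts so that conjunctive queries no longer scan the records can be sketched as follows. This is a simplified illustration rather than the paper's ADtree: it applies only point (a), never creating a node for a zero count, and omits the savings from deducible counts (b) and from unexpanded leaves (c). The names `CountNode` and `contingency_table` and the toy records are assumptions made for this example.

```python
from itertools import product


class CountNode:
    """Cached count of the records matching a conjunction of attr=value tests."""

    def __init__(self, records, start_attr, n_attrs):
        self.count = len(records)
        # (attr, value) -> CountNode. Only combinations that occur in the
        # data get a node, so zero counts take no memory.
        self.children = {}
        for attr in range(start_attr, n_attrs):
            groups = {}
            for rec in records:
                groups.setdefault(rec[attr], []).append(rec)
            for value, subset in groups.items():
                # A child only refines on higher-numbered attributes.
                self.children[(attr, value)] = CountNode(subset, attr + 1, n_attrs)

    def query(self, conditions):
        """conditions maps attribute index -> required value."""
        if not conditions:
            return self.count
        attr = min(conditions)  # descend in increasing attribute order
        child = self.children.get((attr, conditions[attr]))
        if child is None:
            return 0  # no node means no matching record
        rest = {a: v for a, v in conditions.items() if a != attr}
        return child.query(rest)


def contingency_table(root, attrs, domains):
    """Non-zero cells of the contingency table over the chosen attributes."""
    table = {}
    for values in product(*(domains[a] for a in attrs)):
        n = root.query(dict(zip(attrs, values)))
        if n:
            table[values] = n
    return table


# Toy dataset: four records over three binary attributes.
records = [(0, 1, 0), (0, 1, 1), (1, 0, 1), (0, 0, 1)]
root = CountNode(records, 0, 3)
domains = {0: [0, 1], 1: [0, 1], 2: [0, 1]}
table_02 = contingency_table(root, [0, 2], domains)
```

Once the tree is built, each query touches at most one node per condition, so its cost does not depend on the number of records. Without optimizations (b) and (c), however, this naive version can grow very large as the number of attributes increases.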

arXiv.org e-Print Archive

CiteSeerX