A Complete Characterization of Statistical Query Learning with Applications to Evolvability
The statistical query (SQ) learning model of Kearns (1993) is a natural
restriction of the PAC learning model in which a learning algorithm is allowed
to obtain estimates of statistical properties of the examples but cannot see
the examples themselves. We describe a new and simple characterization of the
query complexity of learning in the SQ learning model. Unlike the previously
known bounds on SQ learning, our characterization preserves the accuracy and the
efficiency of learning. The preservation of accuracy implies that our
characterization gives the first characterization of SQ learning in the
agnostic learning framework. The preservation of efficiency is achieved using a
new boosting technique and allows us to derive a new approach to the design of
evolutionary algorithms in Valiant's (2006) model of evolvability. We use this
approach to demonstrate the existence of a large class of monotone evolutionary
learning algorithms based on square loss performance estimation. These results
differ significantly from the few known evolutionary algorithms and give
evidence that evolvability in Valiant's model is a more versatile phenomenon
than there had been previous reason to suspect. Comment: Simplified Lemma 3.8 and its application
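To make the SQ restriction described in this abstract concrete, here is a minimal sketch of a simulated statistical query oracle. It is not taken from the paper: the names `make_sq_oracle` and `stat`, the finite example set, and the uniform noise are all illustrative assumptions. The learner never sees individual examples; it only receives an estimate of an expectation, accurate to within a tolerance tau.

```python
import random

def make_sq_oracle(examples, rng=None):
    """Build a simulated SQ oracle over a finite list of (x, y) pairs.

    Assumption: the underlying distribution is uniform over `examples`.
    The returned function answers a query with the exact expectation
    perturbed by noise of magnitude at most tau, which is one valid
    behaviour for an oracle with tolerance tau.
    """
    rng = rng or random.Random(0)

    def stat(query, tau):
        # The exact expectation of query(x, y) under the uniform distribution.
        exact = sum(query(x, y) for x, y in examples) / len(examples)
        # Any answer within tau of the exact value is allowed.
        return exact + rng.uniform(-tau, tau)

    return stat

# Hypothetical target: the label is +1 when the first bit is 1, else -1.
EXAMPLES = [((a, b), 1 if a == 1 else -1) for a in (0, 1) for b in (0, 1)]

def correlation_query(x, y):
    """Correlation of the label with the hypothesis h(x) = +1 iff x[0] == 1."""
    h = 1 if x[0] == 1 else -1
    return y * h

stat = make_sq_oracle(EXAMPLES)
estimate = stat(correlation_query, tau=0.1)
```

Because the hypothesis in `correlation_query` matches the hypothetical target exactly, the true correlation is 1, so `estimate` lies within 0.1 of 1.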
SACOC: A spectral-based ACO clustering algorithm
The application of algorithms based on ant colony optimization (ACO) in data mining has grown over the last few years, and several supervised and unsupervised learning algorithms have been developed using this bio-inspired approach. Most recent work on unsupervised learning has focused on clustering, where ACO-based techniques have shown great potential. At the same time, new clustering techniques that seek the continuity of data, especially spectral-based approaches as opposed to classical centroid-based approaches, have attracted increasing research interest, an area still under study for ACO clustering techniques. This work presents a hybrid spectral-based ACO clustering algorithm inspired by the ACO Clustering (ACOC) algorithm. The proposed approach combines ACOC with the spectral Laplacian to generate a new search space for the algorithm in order to obtain more promising solutions. The new algorithm, called SACOC, has been compared against well-known algorithms (K-means and Spectral Clustering) and with ACOC. The experiments measure the accuracy of the algorithm on both synthetic datasets and real-world datasets extracted from the UCI Machine Learning Repository.
An automated ETL for online datasets
While using online datasets for machine learning is commonplace today, the quality of these datasets affects the performance
of prediction algorithms. One method for improving the semantics of new data sources is to map these sources to a common
data model or ontology. While semantic and structural heterogeneities must still be resolved, this provides a well-established
approach to providing clean datasets suitable for machine learning and analysis. However, when online data must be used in
close to real time, a method for dynamic Extract-Transform-Load (ETL) of new source data must be developed.
In this work, we present a framework for integrating online and enterprise data sources, in close to real time, to provide
datasets for machine learning and predictive algorithms. An exhaustive evaluation compares a human-built data transformation
process with our system’s machine-generated ETL process, with very favourable results, illustrating the value and impact of
an automated approach.
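The idea of mapping a new source to a common data model can be illustrated with a small sketch. It is not the authors' framework: the field names, the `SOURCE_TO_COMMON` rules, and the `transform` helper are hypothetical, chosen only to show one record being renamed and type-converted into a shared schema.

```python
from typing import Any, Callable, Dict, Tuple

# Hypothetical rule format: source field -> (common-model field, converter).
FieldRule = Tuple[str, Callable[[Any], Any]]

# Illustrative mapping for one imagined online source.
SOURCE_TO_COMMON: Dict[str, FieldRule] = {
    "dt": ("date", str),
    "price_usd": ("price", float),
    "qty": ("quantity", int),
}

def transform(record: Dict[str, Any], rules: Dict[str, FieldRule]) -> Dict[str, Any]:
    """Map one source record onto the common model.

    Fields without a rule are dropped; mapped fields are renamed and
    passed through their converter.
    """
    out: Dict[str, Any] = {}
    for source_field, value in record.items():
        if source_field in rules:
            target_field, convert = rules[source_field]
            out[target_field] = convert(value)
    return out

raw = {"dt": "2020-01-01", "price_usd": "3.5", "qty": "2", "tracking_id": "abc"}
clean = transform(raw, SOURCE_TO_COMMON)
```

In an automated setting the rule table would be generated rather than written by hand; here it is hard-coded to keep the sketch self-contained.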