Learning from Data with Heterogeneous Noise using SGD
We consider learning from data of variable quality that may be obtained from
different heterogeneous sources. Addressing learning from heterogeneous data in
its full generality is a challenging problem. In this paper, we adopt instead a
model in which data is observed through heterogeneous noise, where the noise
level reflects the quality of the data source. We study how to use stochastic
gradient algorithms to learn in this model. Our study is motivated by two
concrete examples where this problem arises naturally: learning with local
differential privacy based on data from multiple sources with different privacy
requirements, and learning from data with labels of variable quality.
The main contribution of this paper is to identify how heterogeneous noise
impacts performance. We show that given two datasets with heterogeneous noise,
the order in which to use them in standard SGD depends on the learning rate. We
propose a method for changing the learning rate as a function of the
heterogeneity, and prove new regret bounds for our method in two cases of
interest. Experiments on real data show that our method performs better than
using a single learning rate and using only the less noisy of the two datasets
when the noise level is low to moderate.
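
The abstract above does not give the actual learning-rate schedule, so the following is only a minimal sketch of the general idea: run SGD over two data sources with different noise levels and scale the step size per source. The function names, the 1/sqrt(t) decay, the `lr_scale` values, and the least-squares objective are all assumptions made for illustration; this is not the paper's algorithm.

```python
import random


def make_noisy_data(w_true, n, noise_std, rng):
    """Generate n pairs (x, y) with y = <w_true, x> + Gaussian noise."""
    data = []
    for _ in range(n):
        x = [rng.uniform(-1.0, 1.0) for _ in w_true]
        y = sum(wi * xi for wi, xi in zip(w_true, x)) + rng.gauss(0.0, noise_std)
        data.append((x, y))
    return data


def sgd_two_sources(phases, dim, base_lr=0.5):
    """Least-squares SGD over a sequence of (data, lr_scale) phases.

    The step size at global step t is lr_scale * base_lr / sqrt(t).
    The per-source factor lr_scale is a hypothetical stand-in for
    "changing the learning rate as a function of the heterogeneity".
    """
    w = [0.0] * dim
    t = 0
    for data, lr_scale in phases:
        for x, y in data:
            t += 1
            lr = lr_scale * base_lr / (t ** 0.5)
            # Gradient of 0.5 * (<w, x> - y)^2 with respect to w is err * x.
            err = sum(wi * xi for wi, xi in zip(w, x)) - y
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
    return w


rng = random.Random(0)
w_true = [1.0, -2.0]
clean = make_noisy_data(w_true, 2000, noise_std=0.1, rng=rng)  # higher-quality source
noisy = make_noisy_data(w_true, 2000, noise_std=1.0, rng=rng)  # lower-quality source

# Assumed choice: take smaller steps on the noisier source.
w_hat = sgd_two_sources([(clean, 1.0), (noisy, 0.1)], dim=len(w_true))
print(w_hat)
```

Swapping the order of the two phases, or changing the `lr_scale` values, is a simple way to experiment with the abstract's claim that the best order of use depends on the learning rate.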
Introducing Dynamic Behavior in Amalgamated Knowledge Bases
The problem of integrating knowledge from multiple and heterogeneous sources
is a fundamental issue in current information systems. To cope with
this problem, the concept of a mediator has been introduced as a software
component providing intermediate services, linking data resources and
application programs, and making transparent the heterogeneity of the
underlying systems. In designing a mediator architecture, we believe that an
important aspect is the definition of a formal framework by which one is able
to model integration according to a declarative style. To this purpose, the use
of a logical approach seems very promising. Another important aspect is the
ability to model both static integration aspects, concerning query execution,
and dynamic ones, concerning data updates and their propagation among the
various data sources. Unfortunately, as far as we know, no formal proposals for
logically modeling mediator architectures from both a static and a dynamic point
of view have yet been developed. In this paper, we extend the framework for
amalgamated knowledge bases, presented by Subrahmanian, to deal with dynamic
aspects. The language we propose is based on the Active U-Datalog language, and
extends it with annotated logic and amalgamation concepts. We model the sources
of information and the mediator (also called supervisor) as Active U-Datalog
deductive databases, thus modeling queries, transactions, and active rules,
interpreted according to the PARK semantics. By using active rules, the system
can efficiently perform update propagation among different databases. The
result is a logical environment, integrating active and deductive rules, to
perform queries and update propagation in a heterogeneous mediated framework.
Comment: Other Keywords: Deductive databases; Heterogeneous databases; Active
rules; Update
Uniform management of heterogeneous semi-structured information sources
Nowadays, data can be represented and stored in different formats,
ranging from unstructured data, typical of file systems, to semi-structured
data, typical of Web sources, to highly structured data, typical of relational
database systems. Therefore, the necessity arises to define new tools and
models for uniformly handling all these heterogeneous information sources. In
this paper we propose both a framework and a conceptual model that aim at
uniformly managing information sources of differing nature and structure,
in order to obtain a global, integrated, and uniform representation. We also
show how the proposed framework and conceptual model can be useful in many
application contexts.