
    Empirical Risk Minimization for Probabilistic Grammars: Sample Complexity and Hardness of Learning

    Probabilistic grammars are generative statistical models that are useful for compositional and sequential structures. They are used ubiquitously in computational linguistics. We present a framework, reminiscent of structural risk minimization, for empirical risk minimization of probabilistic grammars using the log-loss. We derive sample complexity bounds in this framework that apply to both the supervised and unsupervised settings. By making assumptions about the underlying distribution that are appropriate for natural language scenarios, we are able to derive distribution-dependent sample complexity bounds for probabilistic grammars. We also give simple algorithms for carrying out empirical risk minimization using this framework in both the supervised and unsupervised settings. In the unsupervised case, we show that the problem of minimizing empirical risk is NP-hard. We therefore suggest an approximate algorithm, similar to expectation-maximization, to minimize the empirical risk. Learning from data is central to contemporary computational linguistics. It is common in such learning to estimate a model in a parametric family using the maximum likelihood principle. This principle applies in the supervised case (i.e., using annotated data).
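    To make the supervised case concrete: under the log-loss, empirical risk minimization coincides with maximum likelihood estimation, which for probabilistic grammars has a closed form (relative-frequency estimates of rule probabilities). The following is a minimal sketch assuming a toy representation of derivations as lists of (lhs, rhs) rule applications; the function names and data layout are illustrative, not the paper's formalism.

```python
from collections import Counter, defaultdict
import math

def mle_rule_probs(derivations):
    """Supervised ERM under log-loss: the minimizer is the MLE, i.e. each
    rule's probability is its relative frequency among rules that share
    the same left-hand side (toy representation, not the paper's)."""
    counts = Counter(rule for d in derivations for rule in d)
    lhs_totals = defaultdict(int)
    for (lhs, _rhs), c in counts.items():
        lhs_totals[lhs] += c
    return {rule: c / lhs_totals[rule[0]] for rule, c in counts.items()}

def empirical_log_loss(derivations, probs):
    """Empirical risk: average negative log-probability of the derivations."""
    return sum(-math.log(probs[r]) for d in derivations for r in d) / len(derivations)

# Toy data: each derivation is a list of (lhs, rhs) rule applications.
data = [
    [("S", ("NP", "VP")), ("NP", ("she",)), ("VP", ("runs",))],
    [("S", ("NP", "VP")), ("NP", ("he",)), ("VP", ("runs",))],
]
theta = mle_rule_probs(data)
print(theta[("NP", ("she",))])          # 0.5
print(empirical_log_loss(data, theta))  # average log-loss under the MLE
```

    In the unsupervised setting the derivations are latent, the closed form disappears, and (per the abstract) exact minimization becomes NP-hard; the paper's EM-like procedure approximates it with expected rule counts in place of observed ones.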

    A Unified Characterization of Private Learnability via Graph Theory

    We provide a unified framework for characterizing pure and approximate differentially private (DP) learnability. The framework uses the language of graph theory: for a concept class $\mathcal{H}$, we define the contradiction graph $G$ of $\mathcal{H}$. Its vertices are realizable datasets, and two datasets $S, S'$ are connected by an edge if they contradict each other (i.e., there is a point $x$ that is labeled differently in $S$ and $S'$). Our main finding is that the combinatorial structure of $G$ is deeply related to learning $\mathcal{H}$ under DP. Learning $\mathcal{H}$ under pure DP is captured by the fractional clique number of $G$. Learning $\mathcal{H}$ under approximate DP is captured by the clique number of $G$. Consequently, we identify graph-theoretic dimensions that characterize DP learnability: the clique dimension and fractional clique dimension. Along the way, we reveal properties of the contradiction graph which may be of independent interest. We also suggest several open questions and directions for future research.
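    As a concrete illustration of the central object, the sketch below builds the contradiction graph for a tiny finite concept class by brute force. The representation (hypotheses as dicts over a finite domain, datasets as frozensets of labeled points) is an assumption for illustration only, not the paper's formalism.

```python
from itertools import combinations

def realizable_datasets(H, X, n):
    """All size-n datasets (frozensets of (x, label) pairs) consistent
    with some hypothesis h in the finite class H over domain X."""
    return {frozenset((x, h[x]) for x in xs)
            for h in H for xs in combinations(X, n)}

def contradict(S1, S2):
    """S1 and S2 contradict if some point x is labeled differently."""
    labels = dict(S1)
    return any(x in labels and labels[x] != y for x, y in S2)

def contradiction_graph(H, X, n):
    """Vertices: realizable size-n datasets; edges: contradicting pairs."""
    V = list(realizable_datasets(H, X, n))
    E = [(S, T) for S, T in combinations(V, 2) if contradict(S, T)]
    return V, E

# Toy class: threshold functions on the domain {0, 1, 2}.
X = [0, 1, 2]
H = [{x: int(x >= t) for x in X} for t in range(4)]
V, E = contradiction_graph(H, X, 2)
print(len(V), len(E))  # realizable datasets and contradicting pairs
```

    Per the abstract, the clique number of this graph tracks learnability under approximate DP, and its fractional clique number tracks learnability under pure DP.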

    Double Coverage with Machine-Learned Advice

    We study the fundamental online k-server problem in a learning-augmented setting. While in the traditional online model an algorithm has no information about the request sequence, we assume that some advice (e.g., machine-learned predictions) on the algorithm's decisions is given. There is, however, no guarantee on the quality of the predictions, and they might be far from correct. Our main result is a learning-augmented variation of the well-known Double Coverage algorithm for k-server on the line (Chrobak et al., SIDMA 1991) in which we integrate the predictions as well as our trust in their quality. We give an error-dependent competitive ratio, which is a function of a user-defined confidence parameter and interpolates smoothly between an optimal consistency, the performance when all predictions are correct, and the best possible robustness regardless of prediction quality. When given good predictions, we improve upon known lower bounds for online algorithms without advice. We further show that, for any k, our algorithm achieves an almost optimal consistency-robustness tradeoff within a class of deterministic algorithms respecting local and memoryless properties. Our algorithm outperforms a previously proposed (more general) learning-augmented algorithm. It is remarkable that the previous algorithm crucially exploits memory, whereas ours is memoryless. Finally, we demonstrate in experiments the practicability and superior performance of our algorithm on real-world data. (Comment: Accepted at ITCS 2022)
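    For reference, one step of the classic, prediction-free Double Coverage algorithm on the line (Chrobak et al., SIDMA 1991), which the paper augments with predictions, can be sketched as follows. This is the baseline algorithm only, not the paper's learning-augmented variant, and the code layout is illustrative.

```python
def double_coverage_step(servers, r):
    """Serve request r with classic Double Coverage for k-server on the
    line: `servers` is a list of server positions; returns the new
    positions and the total distance moved."""
    servers = sorted(servers)
    if r <= servers[0]:                 # request left of all servers:
        moved = servers[0] - r          # only the nearest server moves
        servers[0] = r
    elif r >= servers[-1]:              # request right of all servers
        moved = r - servers[-1]
        servers[-1] = r
    else:
        # Request between two adjacent servers: both move toward it at
        # equal speed until the nearer one reaches the request.
        i = max(j for j in range(len(servers)) if servers[j] <= r)
        left, right = servers[i], servers[i + 1]
        d = min(r - left, right - r)    # distance each server travels
        servers[i] += d
        servers[i + 1] -= d
        moved = 2 * d
    return servers, moved

positions, cost = double_coverage_step([0.0, 10.0], 4.0)
print(positions, cost)  # [4.0, 6.0] 8.0
```

    The paper's variant additionally biases the server movement toward the predicted decision, weighted by the user-defined confidence parameter; that logic is not reproduced here.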