Online Optimization Methods for the Quantification Problem
The estimation of class prevalence, i.e., the fraction of a population that
belongs to a certain class, is a very useful tool in data analytics and
learning, and finds applications in many domains such as sentiment analysis,
epidemiology, etc. For example, in sentiment analysis, the objective is often
not to estimate whether a specific text conveys a positive or a negative
sentiment, but rather to estimate the overall distribution of positive and
negative sentiments during an event window. A popular way of performing the
above task, often dubbed quantification, is to use supervised learning to train
a prevalence estimator from labeled data.
The contemporary literature proposes several performance measures for assessing
the success of such prevalence estimators. In this paper we propose the first
online stochastic algorithms for directly optimizing these
quantification-specific performance measures. We also provide algorithms that
optimize hybrid performance measures that seek to balance quantification and
classification performance. Our algorithms present a significant advancement in
the theory of multivariate optimization and we show, by a rigorous theoretical
analysis, that they exhibit optimal convergence. We also report extensive
experiments on benchmark and real data sets which demonstrate that our methods
significantly outperform existing optimization techniques used for these
performance measures.
Comment: 26 pages, 6 figures. A short version of this manuscript will appear
in the proceedings of the 22nd ACM SIGKDD Conference on Knowledge Discovery
and Data Mining, KDD 201
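To make the quantification task above concrete, here is a minimal sketch of a prevalence estimator and one quantification-specific performance measure. The "classify and count" estimator, the use of KL divergence (KLD) as the measure, the clipping-based smoothing, and the function names are all assumptions for illustration; they are not necessarily the estimators or measures the paper optimizes, and no online optimization is shown.

```python
import math

def classify_and_count(classifier, items):
    """Estimate positive-class prevalence as the fraction of items labelled positive."""
    if not items:
        raise ValueError("need at least one item")
    return sum(1 for x in items if classifier(x)) / len(items)

def kld(p_true, p_hat, eps=1e-6):
    """KL divergence between the true and estimated binary class distributions.

    Both prevalences are clipped to [eps, 1 - eps] so the logarithms stay finite
    (an assumed smoothing choice).
    """
    p = min(max(p_true, eps), 1 - eps)
    q = min(max(p_hat, eps), 1 - eps)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))

# Usage with a toy keyword "classifier" (hypothetical data).
texts = ["great match", "awful referee", "great goal", "great crowd"]
estimate = classify_and_count(lambda t: "great" in t, texts)  # 0.75
loss = kld(0.5, estimate)  # positive whenever the estimate differs from 0.5
```

Note that a classifier can have poor per-item accuracy yet a low KLD if its false positives and false negatives cancel out, which is why quantification and classification performance are treated as separate objectives.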
A sufficient condition for a number to be the order of a nonsingular derivation of a Lie algebra
A study of the set N_p of positive integers which occur as orders of
nonsingular derivations of finite-dimensional non-nilpotent Lie algebras of
characteristic p>0 was initiated by Shalev and continued by the present author.
The main goal of this paper is to show the abundance of elements of N_p. Our
main result shows that any divisor n of q-1, where q is a power of p, such that
, belongs to N_p. This extends its special
case for p=2, which was proved in a previous paper by a different method.
Comment: 10 pages. This version has been revised according to a referee's
suggestions. The additions include a discussion of the (lower) density of the
set N_p, and the results of more extensive machine computations. Note that
the title has also changed. To appear in Israel J. Mat
Subgraphs and network motifs in geometric networks
Many real-world networks describe systems in which interactions decay with
the distance between nodes. Examples include systems constrained in real space
such as transportation and communication networks, as well as systems
constrained in abstract spaces such as multivariate biological or economic
datasets and models of social networks. These networks often display network
motifs: subgraphs that recur in the network much more often than in randomized
networks. To understand the origin of the network motifs in these networks, it
is important to study the subgraphs and network motifs that arise solely from
geometric constraints. To address this, we analyze geometric network models, in
which nodes are arranged on a lattice and edges are formed with a probability
that decays with the distance between nodes. We present analytical solutions
for the numbers of all 3- and 4-node subgraphs, in both directed and
undirected geometric networks. We also analyze geometric networks with
arbitrary degree sequences, and models with a field that biases for directed
edges in one direction. Rules for how subgraph numbers scale with
system size, lattice dimension and interaction range are given. Several
invariant measures are found, such as the ratio of feedback to feed-forward
loops, which do not depend on system size, dimension or connectivity function.
We find that network motifs in many real-world networks, including social
networks and neuronal networks, are not captured solely by these geometric
models. This is in line with recent evidence that biological network motifs
were selected as basic circuit elements with defined information-processing
functions.
Comment: 9 pages, 6 figure
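A small simulation can illustrate the kind of geometric network model described above and the triad counts it concerns. This is a sketch under stated assumptions: the nodes sit on a one-dimensional ring lattice, the connectivity function is an exponential decay exp(-d / length_scale), and the names `geometric_digraph` and `count_ffl_and_fbl` are hypothetical. The paper's models (higher-dimensional lattices, arbitrary degree sequences, biasing fields) and its analytical solutions are not reproduced here.

```python
import itertools
import math
import random

def geometric_digraph(n, length_scale, seed=0):
    """Directed geometric network on a 1D ring lattice of n nodes.

    Each ordered pair (i, j), i != j, is linked i -> j independently with
    probability exp(-d / length_scale), where d is the ring distance.
    """
    rng = random.Random(seed)
    edges = set()
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            d = min(abs(i - j), n - abs(i - j))
            if rng.random() < math.exp(-d / length_scale):
                edges.add((i, j))
    return edges

def count_ffl_and_fbl(edges, n):
    """Count induced feed-forward loops and feedback (3-cycle) loops.

    A node triple qualifies when each of its three pairs is joined by exactly
    one directed edge; it is a feedback loop if every node has out-degree 1
    inside the triple, and a feed-forward loop otherwise.
    """
    ffl = fbl = 0
    for triple in itertools.combinations(range(n), 3):
        sub = [(a, b) for a in triple for b in triple if a != b and (a, b) in edges]
        if len(sub) != 3 or len({frozenset(e) for e in sub}) != 3:
            continue  # missing pair or reciprocal edge: a different triad
        out_degrees = [sum(1 for a, _ in sub if a == v) for v in triple]
        if out_degrees == [1, 1, 1]:
            fbl += 1
        else:
            ffl += 1
    return ffl, fbl
```

For example, `count_ffl_and_fbl(geometric_digraph(60, 2.0), 60)` returns the two counts for one random realization; repeating this for several system sizes or length scales is one way to probe the size-independence of the feedback to feed-forward ratio numerically. The triple loop is O(n^3), so this is only suitable for small networks.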
Private Incremental Regression
Data is continuously generated by modern data sources, and a recent challenge
in machine learning has been to develop techniques that perform well in an
incremental (streaming) setting. In this paper, we investigate the problem of
private machine learning where, as is common in practice, the data is not given
all at once but rather arrives incrementally over time.
We introduce the problems of private incremental ERM and private incremental
regression where the general goal is to always maintain a good empirical risk
minimizer for the history observed under differential privacy. Our first
contribution is a generic transformation of private batch ERM mechanisms into
private incremental ERM mechanisms, based on a simple idea of invoking the
private batch ERM procedure at some regular time intervals. We take this
construction as a baseline for comparison. We then provide two mechanisms for
the private incremental regression problem. Our first mechanism is based on
privately constructing a noisy incremental gradient function, which is then
used in a modified projected gradient procedure at every timestep. This
mechanism has an excess empirical risk of , where is the
dimensionality of the data. While the results of [Bassily et al. 2014] show that
this bound is tight in the worst case, we show that certain geometric
properties of the input and constraint set can be used to derive significantly
better results for certain interesting regression problems.
Comment: To appear in PODS 201
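The baseline transformation described above (rerunning a private batch ERM procedure at regular time intervals) can be sketched as a small wrapper. Everything concrete here is an assumption: the batch mechanism `noisy_least_squares` is a hypothetical stand-in (ordinary least squares plus Gaussian output perturbation), its `noise_scale` is NOT calibrated to any differential-privacy guarantee, and the privacy budget accounting across the repeated invocations is omitted. It illustrates only the scheduling idea, not the paper's mechanisms.

```python
import numpy as np

def noisy_least_squares(X, y, noise_scale, rng):
    """Hypothetical stand-in for a private batch ERM mechanism.

    Least-squares fit plus Gaussian output noise; noise_scale is NOT calibrated
    to an (epsilon, delta) guarantee in this sketch.
    """
    theta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return theta + rng.normal(0.0, noise_scale, size=theta.shape)

class PeriodicPrivateERM:
    """Rerun the batch mechanism every `period` arrivals; otherwise reuse the last model."""

    def __init__(self, dim, period, noise_scale, seed=0):
        self.period = period
        self.noise_scale = noise_scale
        self.rng = np.random.default_rng(seed)
        self.X, self.y = [], []
        self.theta = np.zeros(dim)  # model released before the first invocation

    def observe(self, x, y):
        """Record one (x, y) pair and return the currently released model."""
        self.X.append(np.asarray(x, dtype=float))
        self.y.append(float(y))
        if len(self.X) % self.period == 0:
            self.theta = noisy_least_squares(
                np.vstack(self.X), np.array(self.y), self.noise_scale, self.rng
            )
        return self.theta
```

The design trade-off is visible in `period`: a short period keeps the released model close to the current empirical risk minimizer but invokes the batch mechanism more often, which in a real mechanism means splitting the privacy budget across more releases.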
Etching of random solids: hardening dynamics and self-organized fractality
When a finite volume of an etching solution comes in contact with a
disordered solid, a complex dynamics of the solid-solution interface develops.
Since only the weak parts are corroded, the solid surface hardens
progressively. If the etchant is consumed in the chemical reaction, the
corrosion dynamics slows down and stops spontaneously leaving a fractal solid
surface, which reveals the latent percolation criticality hidden in any random
system. Here we introduce and study, both analytically and numerically, a
simple model for this phenomenon. In this way we obtain a detailed description
of the process in terms of percolation theory. In particular we explain the
mechanism of hardening of the surface and connect it to Gradient Percolation.
Comment: LaTeX, aipproc, 6 pages, 3 figures, Proceedings of 6th Granada
Seminar on Computational Physic
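The hardening mechanism described above can be illustrated with a simplified lattice simulation. This is a sketch in the spirit of the abstract, not the paper's model: the update rule (a surface site dissolves when its random resistance is below an etching power n / volume, with n the remaining etchant particles), the periodic boundary along x, and the parameter names are all assumptions.

```python
import random

def etch(width, depth, n_etchant, volume, seed=0):
    """Simplified etching of a random 2D solid (assumed rules, see lead-in).

    Each solid site gets a random resistance in [0, 1). At every time step the
    etching power is n / volume; every surface site whose resistance is below
    that power dissolves, and each dissolution consumes one etchant particle.
    Returns (dissolved grid, remaining etchant, number of time steps).
    """
    rng = random.Random(seed)
    resistance = [[rng.random() for _ in range(width)] for _ in range(depth)]
    dissolved = [[False] * width for _ in range(depth)]
    n = n_etchant
    surface = {(0, x) for x in range(width)}  # row 0 initially touches the solution
    steps = 0
    while n > 0:
        power = n / volume  # etching power falls as etchant is consumed
        weak = sorted((y, x) for (y, x) in surface if resistance[y][x] < power)
        if not weak:
            break  # every exposed site is too strong: corrosion stops
        for y, x in weak:
            if n == 0:
                break
            dissolved[y][x] = True
            surface.discard((y, x))
            n -= 1
            # Neighbours of a dissolved site become exposed to the solution.
            for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                ny, nx = y + dy, (x + dx) % width  # periodic along x
                if 0 <= ny < depth and not dissolved[ny][nx]:
                    surface.add((ny, nx))
        steps += 1
    return dissolved, n, steps
```

Because the power only decreases, the sites left on the surface at each step are those strong enough to resist it, which is the progressive hardening the abstract describes; analysing the geometry of the final surface is left to the reader.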
Male frequent attenders of general practice and their help seeking preferences
Background: Low rates of health service usage by men are commonly linked to masculine values and traditional male gender roles. However, not all men conform to these stereotypical notions of masculinity; some men choose to attend health services frequently, for a variety of reasons. This study draws upon the accounts of male frequent attenders of the General Practitioner's (GP) surgery, examining their help-seeking preferences and their reasons for choosing services within general practice over other sources of support.
Methods: The study extends a thematic analysis of interview data from the Self Care in Primary Care study (SCinPC), a large-scale, multi-method evaluation of a self-care programme delivered to frequent attenders of general practice. Data were collected from 34 semi-structured interviews conducted with men before their exposure to the intervention.
Results: The interviewed men were aged 16 to 72 years, and 91% of the sample (n = 31) stated that they had a current health condition. The thematic analysis revealed diverse perspectives within male help-seeking preferences and the decision-making behind men's choice of services. The study also draws attention to the large variation in men's knowledge of available health services, particularly alternatives to general practice. Furthermore, the data revealed some men's lack of confidence in existing alternatives to general practice.
Conclusions: The study highlights the complex nature of male help-seeking preferences and provides evidence that there should be no 'one size fits all' approach to male service provision. It also provides impetus for further studies in this under-researched area.
© 2011 WPMH GmbH
Premise Selection for Mathematics by Corpus Analysis and Kernel Methods
Smart premise selection is essential when using automated reasoning as a tool
for large-theory formal proof development. A good method for premise selection
in complex mathematical libraries is the application of machine learning to
large corpora of proofs. This work develops learning-based premise selection in
two ways. First, a newly available minimal dependency analysis of existing
high-level formal mathematical proofs is used to build a large knowledge base
of proof dependencies, providing precise data for ATP-based re-verification and
for training premise selection algorithms. Second, a new machine learning
algorithm for premise selection based on kernel methods is proposed and
implemented. To evaluate the impact of both techniques, a benchmark consisting
of 2078 large-theory mathematical problems is constructed, extending the older
MPTP Challenge benchmark. The combined effect of the techniques results in a
50% improvement on the benchmark over the Vampire/SInE state-of-the-art system
for automated reasoning in large theories.
Comment: 26 page
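Learning-based premise selection, as described above, can be sketched as ranking library facts by a learned relevance score. The sketch below is a generic kernel method, not the paper's algorithm: it assumes each theorem is encoded as a binary vector of symbol features, that `Y[i, j] = 1` when the proof of theorem `i` used premise `j` (the kind of dependency data the abstract mentions), and it uses multi-output kernel ridge regression with a Gaussian kernel. The function names and the `lam` and `sigma` parameters are hypothetical.

```python
import numpy as np

def gaussian_kernel(A, B, sigma=1.0):
    """Gaussian kernel matrix between the rows of A and the rows of B."""
    d2 = (A ** 2).sum(1)[:, None] + (B ** 2).sum(1)[None, :] - 2 * A @ B.T
    return np.exp(-np.maximum(d2, 0.0) / (2 * sigma ** 2))

def fit_premise_scorer(X, Y, lam=0.1, sigma=1.0):
    """Fit multi-output kernel ridge regression from theorem features to premise usage.

    X: (n_theorems, n_features) symbol features; Y: (n_theorems, n_premises) 0/1 usage.
    Returns a function mapping a conjecture's features to one score per premise.
    """
    X = np.asarray(X, dtype=float)
    K = gaussian_kernel(X, X, sigma)
    # One column of dual coefficients per premise.
    alpha = np.linalg.solve(K + lam * np.eye(len(X)), np.asarray(Y, dtype=float))

    def score(x):
        k = gaussian_kernel(np.atleast_2d(np.asarray(x, dtype=float)), X, sigma)
        return (k @ alpha).ravel()

    return score

def top_premises(score, x, k):
    """Indices of the k highest-scoring premises for the conjecture x."""
    return [int(i) for i in np.argsort(-score(x))[:k]]
```

The top-ranked premises would then be passed to an ATP such as Vampire; choosing how many to pass (the `k` above) is a separate tuning decision not addressed here.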