Closing the Gap Between Short and Long XORs for Model Counting
Many recent algorithms for approximate model counting are based on a
reduction to combinatorial searches over random subsets of the space defined by
parity or XOR constraints. Long parity constraints (involving many variables)
provide strong theoretical guarantees but are computationally difficult. Short
parity constraints are easier to solve but have weaker statistical properties.
It is currently not known how long these parity constraints need to be. We
close the gap by providing matching necessary and sufficient conditions on the
required asymptotic length of the parity constraints. Further, we provide a new
family of lower bounds and the first non-trivial upper bounds on the model
count that are valid for arbitrarily short XORs. We empirically demonstrate the
effectiveness of these bounds on model counting benchmarks and in a
Satisfiability Modulo Theory (SMT) application motivated by the analysis of
contingency tables in statistics.
Comment: The 30th Association for the Advancement of Artificial Intelligence
(AAAI-16) Conference
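The reduction described in this abstract can be illustrated with a small brute-force sketch. It is not the authors' algorithm: the function names (`count_models`, `random_xor`, `estimate`) and the toy formula are assumptions made for illustration, and real tools delegate the search to a SAT solver rather than enumerating assignments. Each random XOR with a uniformly random right-hand side keeps any fixed assignment with probability 1/2, so scaling the surviving count by 2^m gives an unbiased estimate for any XOR length; the length affects only how much the estimate varies.

```python
import itertools
import random


def count_models(n_vars, formula, xors):
    """Brute-force count of assignments satisfying `formula` and every XOR.

    `xors` is a list of (indices, rhs) pairs meaning
    sum(bits[i] for i in indices) % 2 == rhs.
    """
    count = 0
    for bits in itertools.product((0, 1), repeat=n_vars):
        if not formula(bits):
            continue
        if all(sum(bits[i] for i in idx) % 2 == rhs for idx, rhs in xors):
            count += 1
    return count


def random_xor(n_vars, length, rng):
    """Draw a parity constraint over `length` distinct variables."""
    idx = rng.sample(range(n_vars), length)
    return idx, rng.randint(0, 1)


def estimate(n_vars, formula, n_xors, length, trials, rng):
    """Average the surviving count over `trials` draws and rescale by 2^n_xors."""
    total = 0
    for _ in range(trials):
        xors = [random_xor(n_vars, length, rng) for _ in range(n_xors)]
        total += count_models(n_vars, formula, xors)
    return (2 ** n_xors) * total / trials


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical toy formula: at least one of the first two variables is true.
    toy = lambda bits: bits[0] == 1 or bits[1] == 1
    print("exact:", count_models(6, toy, []))
    print("short XORs (length 2):", estimate(6, toy, 2, 2, 200, rng))
    print("long XORs (length 5):", estimate(6, toy, 2, 5, 200, rng))
```

Running the script with different seeds gives a rough feel for how much the short-XOR and long-XOR estimates fluctuate around the exact count.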
Quantum machine learning: a classical perspective
Recently, increased computational power and data availability, as well as
algorithmic advances, have led machine learning techniques to impressive
results in regression, classification, data-generation and reinforcement
learning tasks. Despite these successes, the proximity to the physical limits
of chip fabrication alongside the increasing size of datasets are motivating a
growing number of researchers to explore the possibility of harnessing the
power of quantum computation to speed up classical machine learning algorithms.
Here we review the literature in quantum machine learning and discuss
perspectives for a mixed readership of classical machine learning and quantum
computation experts. Particular emphasis will be placed on clarifying the
limitations of quantum algorithms, how they compare with their best classical
counterparts and why quantum resources are expected to provide advantages for
learning problems. Learning in the presence of noise and certain
computationally hard problems in machine learning are identified as promising
directions for the field. Practical questions, like how to upload classical
data into quantum form, will also be addressed.
Comment: v3 33 pages; typos corrected and references added
Can Querying for Bias Leak Protected Attributes? Achieving Privacy With Smooth Sensitivity
Existing regulations prohibit model developers from accessing protected
attributes (gender, race, etc.), often resulting in fairness assessments on
populations without knowing their protected groups. In such scenarios,
institutions often adopt a separation between the model developers (who train
models with no access to the protected attributes) and a compliance team (who
may have access to the entire dataset for auditing purposes). However, the model
developers might be allowed to test their models for bias by querying the
compliance team for group fairness metrics. In this paper, we first demonstrate
that simply querying for fairness metrics, such as statistical parity and
equalized odds, can leak the protected attributes of individuals to the model
developers. We demonstrate that there always exist strategies by which the
model developers can identify the protected attribute of a targeted individual
in the test dataset from just a single query. In particular, we show that one
can reconstruct the protected attributes of all the individuals from
$O(N_k \log(n/N_k))$ queries when $N_k \ll n$ using techniques from compressed
sensing ($n$: size of the test dataset, $N_k$: size of smallest group). Our
results pose an interesting
debate in algorithmic fairness: should querying for fairness metrics be viewed
as a neutral-valued solution to ensure compliance with regulations? Or, does it
constitute a violation of regulations and privacy if the number of queries
answered is enough for the model developers to identify the protected
attributes of specific individuals? To address this supposed violation, we also
propose Attribute-Conceal, a novel technique that achieves differential privacy
by calibrating noise to the smooth sensitivity of our bias query, outperforming
naive techniques such as the Laplace mechanism. We also include experimental
results on the Adult dataset and synthetic data (broad range of parameters).
Comment: Accepted at NeurIPS 2022 workshop on Algorithmic Fairness through the
Lens of Causality and Privacy
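The single-query leak claimed in this abstract can be made concrete with a minimal sketch. This is an illustrative strategy under stated assumptions, not necessarily the construction used in the paper: it assumes two non-empty groups, an exact (noise-free) statistical parity answer, and hypothetical helper names (`statistical_parity`, `infer_attribute`). The developer submits predictions that are positive only for the targeted individual, and the sign of the returned gap reveals that individual's group.

```python
def statistical_parity(preds, groups):
    """P(pred = 1 | group 0) - P(pred = 1 | group 1).

    Stands in for the compliance team's answer; assumes both groups are non-empty.
    """
    rates = []
    for g in (0, 1):
        members = [p for p, a in zip(preds, groups) if a == g]
        rates.append(sum(members) / len(members))
    return rates[0] - rates[1]


def infer_attribute(target, n, query):
    """Recover the target's group from a single fairness query.

    `query` maps a list of n binary predictions to the statistical parity gap.
    """
    # Predict positive only for the targeted individual.
    preds = [1 if i == target else 0 for i in range(n)]
    gap = query(preds)
    # A positive gap means the lone positive prediction fell in group 0.
    return 0 if gap > 0 else 1


if __name__ == "__main__":
    # Hypothetical protected attributes, hidden from the model developer.
    hidden = [0, 1, 1, 0, 1, 1, 0]
    oracle = lambda preds: statistical_parity(preds, hidden)
    recovered = [infer_attribute(i, len(hidden), oracle) for i in range(len(hidden))]
    print(recovered == hidden)
```

Adding calibrated noise to the returned gap, as the abstract proposes, is what would stop the sign from reliably revealing the group.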
An efficient graph generative model for navigating ultra-large combinatorial synthesis libraries
Virtual, make-on-demand chemical libraries have transformed early-stage drug
discovery by unlocking vast, synthetically accessible regions of chemical
space. Recent years have witnessed rapid growth in these libraries from
millions to trillions of compounds, hiding undiscovered, potent hits for a
variety of therapeutic targets. However, they are quickly approaching a size
beyond that which permits explicit enumeration, presenting new challenges for
virtual screening. To overcome these challenges, we propose the Combinatorial
Synthesis Library Variational Auto-Encoder (CSLVAE). The proposed generative
model represents such libraries as a differentiable, hierarchically-organized
database. Given a compound from the library, the molecular encoder constructs a
query for retrieval, which is utilized by the molecular decoder to reconstruct
the compound by first decoding its chemical reaction and subsequently decoding
its reactants. Our design minimizes autoregression in the decoder, facilitating
the generation of large, valid molecular graphs. Our method performs fast and
parallel batch inference for ultra-large synthesis libraries, enabling a number
of important applications in early-stage drug discovery. Compounds proposed by
our method are guaranteed to be in the library, and thus synthetically and
cost-effectively accessible. Importantly, CSLVAE can encode out-of-library
compounds and search for in-library analogues. In experiments, we demonstrate
the capabilities of the proposed method in the navigation of massive
combinatorial synthesis libraries.
Comment: 36th Conference on Neural Information Processing Systems (NeurIPS
2022)
Tools and Algorithms for the Construction and Analysis of Systems
This open access book constitutes the proceedings of the 28th International Conference on Tools and Algorithms for the Construction and Analysis of Systems, TACAS 2022, which was held during April 2-7, 2022, in Munich, Germany, as part of the European Joint Conferences on Theory and Practice of Software, ETAPS 2022. The 46 full papers and 4 short papers presented in this volume were carefully reviewed and selected from 159 submissions. The proceedings also contain 16 tool papers of the affiliated competition SV-Comp and 1 paper consisting of the competition report. TACAS is a forum for researchers, developers, and users interested in rigorously based tools and algorithms for the construction and analysis of systems. The conference aims to bridge the gaps between different communities with this common interest and to support them in their quest to improve the utility, reliability, flexibility, and efficiency of tools and algorithms for building computer-controlled systems.
Neural Methods for Effective, Efficient, and Exposure-Aware Information Retrieval
Neural networks with deep architectures have demonstrated significant
performance improvements in computer vision, speech recognition, and natural
language processing. The challenges in information retrieval (IR), however, are
different from these other application areas. A common form of IR involves
ranking of documents--or short passages--in response to keyword-based queries.
Effective IR systems must deal with the query-document vocabulary mismatch problem,
by modeling relationships between different query and document terms and how
they indicate relevance. Models should also consider lexical matches when the
query contains rare terms--such as a person's name or a product model
number--not seen during training, and should avoid retrieving semantically related
but irrelevant results. In many real-life IR tasks, the retrieval involves
extremely large collections--such as the document index of a commercial Web
search engine--containing billions of documents. Efficient IR methods should
take advantage of specialized IR data structures, such as the inverted index, to
efficiently retrieve from large collections. Given an information need, the IR
system also mediates how much exposure an information artifact receives by
deciding whether it should be displayed, and where it should be positioned,
among other results. Exposure-aware IR systems may optimize for additional
objectives, besides relevance, such as parity of exposure for retrieved items
and content publishers. In this thesis, we present novel neural architectures
and methods motivated by the specific needs and challenges of IR tasks.
Comment: PhD thesis, Univ College London (2020)
Adaptive probability scheme for behaviour monitoring of the elderly using a specialised ambient device
A Hidden Markov Model (HMM) modified to work in combination with a Fuzzy System is utilised to determine the current behavioural state of the user from information obtained with specialised hardware. Due to the high dimensionality and not-linearly-separable nature of the Fuzzy System and the sensor data obtained with the hardware which informs the state decision, a new method is devised to update the HMM and replace the initial Fuzzy System such that subsequent state decisions are based on the most recent information. The resultant system first reduces the dimensionality of the original information by using a manifold representation in the high dimension which is unfolded in the lower dimension. The data is then linearly separable in the lower dimension where a simple linear classifier, such as the perceptron used here, is applied to determine the probability of the observations belonging to a state. Experiments using the new system verify its applicability in a real scenario.