    Closing the Gap Between Short and Long XORs for Model Counting

    Many recent algorithms for approximate model counting are based on a reduction to combinatorial searches over random subsets of the space defined by parity or XOR constraints. Long parity constraints (involving many variables) provide strong theoretical guarantees but are computationally difficult. Short parity constraints are easier to solve but have weaker statistical properties. It is currently not known how long these parity constraints need to be. We close the gap by providing matching necessary and sufficient conditions on the required asymptotic length of the parity constraints. Further, we provide a new family of lower bounds and the first non-trivial upper bounds on the model count that are valid for arbitrarily short XORs. We empirically demonstrate the effectiveness of these bounds on model counting benchmarks and in a Satisfiability Modulo Theory (SMT) application motivated by the analysis of contingency tables in statistics. Comment: The 30th Association for the Advancement of Artificial Intelligence Conference (AAAI-16)
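
    The primitive these algorithms share is easy to state: intersect the solution set with m random parity constraints, each over k variables, and check whether any solution survives; the paper's question is how small k can be. Below is a minimal, brute-force Python sketch of that primitive on a toy formula. The formula, the parameter names, and the majority-vote estimator are illustrative assumptions, not the paper's algorithm.

        import itertools
        import random

        def random_xor(n, k, rng):
            # A parity constraint over k of the n variables, plus a random parity bit.
            return rng.sample(range(n), k), rng.randrange(2)

        def satisfies_xors(assignment, xors):
            return all(sum(assignment[v] for v in variables) % 2 == parity
                       for variables, parity in xors)

        def toy_formula(a):
            # Placeholder formula: (x0 or x1) and (not x2 or x3); true model count is 9.
            return (a[0] or a[1]) and ((not a[2]) or a[3])

        def survival_rate(n, m, k, trials=200, seed=0):
            # Fraction of trials in which some model survives m random XORs of
            # length k; a high rate suggests the model count exceeds roughly 2^m.
            rng = random.Random(seed)
            hits = 0
            for _ in range(trials):
                xors = [random_xor(n, k, rng) for _ in range(m)]
                if any(toy_formula(a) and satisfies_xors(a, xors)
                       for a in itertools.product((0, 1), repeat=n)):
                    hits += 1
            return hits / trials

        for m in range(1, 5):
            print(m, survival_rate(n=4, m=m, k=2))

    With k well below n/2 the survival statistics degrade, which is exactly the short-XOR regime whose bounds the paper characterizes.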

    Blunt Force Trauma to the Ribs: Creating Predictive Models

    Forensic anthropologists receive more requests for trauma analysis than for any other aspect of the biological profile. Blunt force trauma to the ribs is among the most common trauma recorded in a medical examiner's setting; however, the structural complexity of ribs makes it difficult to move beyond descriptive documentation of injuries. The purpose of this study is to identify common rib fracture patterns and influential variables, and to provide probabilistic statements to guide rib fracture interpretations. A sample of 1,415 deceased individuals with known blunt force trauma to the torso was collected from four geographically diverse medical examiner offices. Demographic data and fracture variables were recorded per individual. Frequency distributions, chi-squared tests of independence, Kruskal-Wallis tests, Dunn's tests, and multiple correspondence analysis were employed to understand variable relationships. Conditional probabilities were calculated to provide probabilistic statements. Additionally, random forest analysis was conducted to classify the location and type of fracture based on covariates. A total of 24,853 fractures were recorded. The most common fractures were displaced and simple fractures on ribs three through seven in the anterolateral and posterolateral locations. The less common fracture patterns revealed significant relationships with demographic data and provided empirical evidence for previously untested statements. BMI had a significant relationship with location, such that fractures were more frequently recorded in the lower ribs of individuals in the obese BMI category. Age had a significant relationship with fracture type and fracture location in all analyses: younger individuals were more likely to have incomplete fractures and to incur fractures anteriorly, and older individuals were more likely to have multi-fragmentary fractures. The current study indicates that rib fracture types and locations are dependent on the demographics of the individual. Demographics such as age and health inform the material properties and structural geometry of bone, which is the recommended route for incorporating bone biomechanics into trauma analysis. Furthermore, the results of this research can be applied to motor vehicle safety research, experimental research avenues, and bioarcheological trauma analysis. Future rib fracture research should take a more holistic view of the individual when interpreting fracture patterns.
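
    The two quantitative moves described here, conditional probability tables and a random forest over demographic covariates, are simple to reproduce in outline. The Python sketch below is a hedged illustration only: the DataFrame, its column names, and its rows are hypothetical placeholders, not the study's data or variable coding.

        import pandas as pd
        from sklearn.ensemble import RandomForestClassifier

        # Hypothetical records: one row per fracture, with demographic covariates.
        df = pd.DataFrame({
            "age_group":   ["young", "young", "old", "old", "old", "young"],
            "bmi_class":   ["normal", "obese", "normal", "obese", "obese", "normal"],
            "fx_type":     ["incomplete", "displaced", "multifragmentary",
                            "displaced", "multifragmentary", "incomplete"],
            "fx_location": ["anterior", "anterolateral", "posterolateral",
                            "anterolateral", "posterolateral", "anterior"],
        })

        # Conditional probability table: P(fracture type | age group).
        print(pd.crosstab(df["age_group"], df["fx_type"], normalize="index"))

        # Random forest classifying fracture location from demographic covariates.
        X = pd.get_dummies(df[["age_group", "bmi_class"]])
        y = df["fx_location"]
        clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
        print(dict(zip(X.columns, clf.feature_importances_)))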

    Functional Sensory Representations of Natural Stimuli: the Case of Spatial Hearing

    In this thesis I attempt to explain mechanisms of neuronal coding in the auditory system as a form of adaptation to the statistics of natural stereo sounds. To this end I analyse recordings of real-world auditory environments and construct novel statistical models of these data. I further compare regularities present in natural stimuli with known, experimentally observed neuronal mechanisms of spatial hearing. In a more general perspective, I use the binaural auditory system as a starting point to consider the notion of function implemented by sensory neurons. In particular I argue for two closely related tenets: 1. The function of sensory neurons cannot be fully elucidated without understanding the statistics of the natural stimuli they process. 2. The function of sensory representations is determined by redundancies present in the natural sensory environment. I present evidence for the first tenet by describing and analysing marginal statistics of natural binaural sound. I compare the observed empirical distributions with knowledge from reductionist experiments. This comparison shows that the spatial hearing task in the natural environment is far more complex than analytic, physics-based predictions suggest. I discuss the possibility that early brainstem circuits such as the LSO and MSO do not "compute sound localization", as is often claimed in the experimental literature. I propose that they instead perform a signal transformation which constitutes the first step of a complex inference process. To support the second tenet I develop a hierarchical statistical model which learns a joint sparse representation of amplitude and phase information from natural stereo sounds. I demonstrate that the learned higher-order features reproduce properties of auditory cortical neurons when probed with spatial sounds. The reproduced aspects had been hypothesized to be a manifestation of a fine-tuned computation specific to the sound-localization task; here it is demonstrated that they instead reflect redundancies present in the natural stimulus. Taken together, the results presented in this thesis suggest that efficient coding is a useful strategy for discovering structures (redundancies) in the input data; their meaning has to be determined by the organism via environmental feedback.
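
    The raw quantities behind "marginal statistics of natural binaural sound" are the interaural level and phase differences per time-frequency bin. A minimal Python sketch of extracting them is given below; the synthetic sinusoid stands in for a real stereo recording, and the sampling rate and STFT parameters are arbitrary assumptions.

        import numpy as np
        from scipy.signal import stft

        fs = 16000
        t = np.arange(fs) / fs  # one second of signal
        left = np.sin(2 * np.pi * 500 * t)
        right = 0.7 * np.sin(2 * np.pi * 500 * t + 0.3)  # attenuated and phase-shifted

        # Short-time Fourier transform of each ear's signal.
        _, _, L = stft(left, fs=fs, nperseg=512)
        _, _, R = stft(right, fs=fs, nperseg=512)

        eps = 1e-12  # avoid log(0) in silent bins
        ild = 20 * np.log10((np.abs(L) + eps) / (np.abs(R) + eps))  # level difference, dB
        ipd = np.angle(L * np.conj(R))                              # phase difference, rad

        print("mean ILD (dB):", float(ild.mean()))
        print("mean IPD (rad):", float(ipd.mean()))

    The thesis's point is that the empirical distributions of such cues in natural scenes are far messier than the textbook single-source geometry predicts.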

    Recessions Or Partisanship: What Explains Climate Skepticism in the U.S.?

    This paper investigates variations in public mood pertaining to climate skepticism and empirically assesses whether economic recessions or partisanship better explain aggregate-level trends and movements across a 16-year time horizon. Public survey data from iPoll and the Gallup Organization were used to construct the Climate Change Skeptic Index (CCSI), a proxy capturing public opinion trends in skepticism across the U.S. A two-part vector autoregressive model suggests that while economic recessions might be causally linked to climate skepticism, partisanship plays the more influential role in explaining it over time. The key result is that, holding all included variables constant, anti-climate change statements made by Republican Congresspersons three quarters earlier raise the CCSI by 0.17 percentage points on average in the current quarter.
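
    The headline estimate, a three-quarter-lagged effect of partisan statements on the skeptic index, is the kind of coefficient a standard VAR fit recovers. Below is a hedged Python sketch using statsmodels on synthetic quarterly data; the series, lag structure, and variable names are placeholders, and a plain VAR is used in place of the paper's two-part specification.

        import numpy as np
        import pandas as pd
        from statsmodels.tsa.api import VAR

        rng = np.random.default_rng(0)
        n = 64  # 16 years of quarterly observations

        # Synthetic series: the index responds to statements made 3 quarters earlier.
        statements = rng.poisson(5, n).astype(float)
        ccsi = np.zeros(n)
        for t in range(3, n):
            ccsi[t] = 0.5 * ccsi[t - 1] + 0.17 * statements[t - 3] + rng.normal(0, 0.5)

        df = pd.DataFrame({"ccsi": ccsi, "gop_statements": statements})
        res = VAR(df).fit(maxlags=4, ic="aic")

        # Response of the index to a one-unit statements shock, 3 quarters out.
        irf = res.irf(8)
        i, j = df.columns.get_loc("ccsi"), df.columns.get_loc("gop_statements")
        print(irf.irfs[3, i, j])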

    Pyramid: Enhancing Selectivity in Big Data Protection with Count Featurization

    Protecting vast quantities of data poses a daunting challenge for the growing number of organizations that collect, stockpile, and monetize it. The ability to distinguish data that is actually needed from data collected "just in case" would help these organizations limit the latter's exposure to attack. A natural approach might be to monitor data use and retain only the working set of in-use data in accessible storage; unused data can be evicted to a highly protected store. However, many of today's big data applications rely on machine learning (ML) workloads that are periodically retrained by accessing, and thus exposing to attack, the entire data store. Training-set minimization methods, such as count featurization, are often used to limit the data needed to train ML workloads, improving performance or scalability. We present Pyramid, a limited-exposure data management system that builds upon count featurization to enhance data protection. Pyramid uniquely introduces both the idea of, and a proof of concept for, leveraging training-set minimization methods to instill rigor and selectivity into big data management. We integrated Pyramid into Spark Velox, a framework for ML-based targeting and personalization, and evaluated it on three applications, showing that Pyramid approaches state-of-the-art models while training on less than 1% of the raw data.
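
    Count featurization itself is simple enough to sketch: each categorical value is replaced by per-label occurrence counts accumulated in a compact table, so the raw rows need not be retained for retraining. The Python below is an illustrative reimplementation of that idea under assumed names; it is not Pyramid's code or API.

        from collections import defaultdict

        class CountFeaturizer:
            def __init__(self, labels=(0, 1)):
                self.labels = labels
                # counts[feature_value][label] -> number of occurrences
                self.counts = defaultdict(lambda: defaultdict(int))

            def update(self, value, label):
                self.counts[value][label] += 1

            def featurize(self, value):
                # Replace the raw value with its per-label counts plus a total,
                # which is all a downstream model sees at training time.
                per_label = [self.counts[value][l] for l in self.labels]
                return per_label + [sum(per_label)]

        cf = CountFeaturizer()
        for domain, clicked in [("a.com", 1), ("a.com", 0), ("b.com", 0)]:
            cf.update(domain, clicked)

        print(cf.featurize("a.com"))  # [1, 1, 2]
        print(cf.featurize("b.com"))  # [1, 0, 1]

    Because only these small count tables are needed at training time, the raw data can live in the highly protected store the abstract describes.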