2,018 research outputs found
Learning and Designing Stochastic Processes from Logical Constraints
Stochastic processes offer a flexible mathematical formalism to model and
reason about systems. Most analysis tools, however, start from the premises
that models are fully specified, so that any parameters controlling the
system's dynamics must be known exactly. As this is seldom the case, many
methods have been devised over the last decade to infer (learn) such parameters
from observations of the state of the system. In this paper, we depart from
this approach by assuming that our observations are {\it qualitative}
properties encoded as satisfaction of linear temporal logic formulae, as
opposed to quantitative observations of the state of the system. An important
feature of this approach is that it unifies naturally the system identification
and the system design problems, where the properties, instead of observations,
represent requirements to be satisfied. We develop a principled statistical
estimation procedure based on maximising the likelihood of the system's
parameters, using recent ideas from statistical machine learning. We
demonstrate the efficacy and broad applicability of our method on a range of
simple but non-trivial examples, including rumour spreading in social networks
and hybrid models of gene regulation
Bayesian Logistic Regression Model for Siting Biomass-using Facilities
Key sources of oil for western markets are located in complex geopolitical environments that increase economic and social risk. The amalgamation of economic, environmental, social and national security concerns for petroleum-based economies have created a renewed emphasis on alternative sources of energy which include biomass. The stability of sustainable biomass markets hinges on improved methods to predict and visualize business risk and cost to the supply chain.
This thesis develops Bayesian logistic regression models, with comparisons of classical maximum likelihood models, to quantify significant factors that influence the siting of biomass-using facilities and predict potential locations in the 13-state Southeastern United States for three types of biomass-using facilities. Group I combines all biomass-using mills, biorefineries using agricultural residues and wood-using bioenergy/biofuels plants. Group II included pulp and paper mills, and biorefineries that use agricultural and wood residues. Group III included food processing mills and biorefineries that use agricultural and wood residues. The resolution of this research is the 5-digit ZIP Code Tabulation Area (ZCTA), and there are 9,416 ZCTAs in the 13-state Southeastern study region.
For both classical and Bayesian approaches, a training set of data was used plus a separate validation (hold out) set of data using a pseudo-random number-generating function in SAS® Enterprise Miner. Four predefined priors are constructed. Bayesian estimation assuming a Gaussian prior distribution provides the highest correct classification rate of 86.40% for Group I; Bayesian methods assuming the non-informative uniform prior has the highest correct classification rate of 95.97% for Group II; and Bayesian methods assuming a Gaussian prior gives the highest correct classification rate of 92.67% for Group III. Given the comparative low sensitivity for Group II and Group III, a hybrid model that integrates classification trees and local Bayesian logistic regression was developed as part of this research to further improve the predictive power. The hybrid model increases the sensitivity of Group II from 58.54% to 64.40%, and improves both of the specificity and sensitivity significantly for Group III from 98.69% to 99.42% and 39.35% to 46.45%, respectively. Twenty-five optimal locations for the biomass-using facility groupings at the 5-digit ZCTA resolution, based upon the best fitted Bayesian logistic regression model and the hybrid model, are predicted and plotted for the 13-state Southeastern study region
Disguise without Disruption: Utility-Preserving Face De-Identification
With the rise of cameras and smart sensors, humanity generates an exponential
amount of data. This valuable information, including underrepresented cases
like AI in medical settings, can fuel new deep-learning tools. However, data
scientists must prioritize ensuring privacy for individuals in these untapped
datasets, especially for images or videos with faces, which are prime targets
for identification methods. Proposed solutions to de-identify such images often
compromise non-identifying facial attributes relevant to downstream tasks. In
this paper, we introduce Disguise, a novel algorithm that seamlessly
de-identifies facial images while ensuring the usability of the modified data.
Unlike previous approaches, our solution is firmly grounded in the domains of
differential privacy and ensemble-learning research. Our method involves
extracting and substituting depicted identities with synthetic ones, generated
using variational mechanisms to maximize obfuscation and non-invertibility.
Additionally, we leverage supervision from a mixture-of-experts to disentangle
and preserve other utility attributes. We extensively evaluate our method using
multiple datasets, demonstrating a higher de-identification rate and superior
consistency compared to prior approaches in various downstream tasks.Comment: Accepted at AAAI 2024. Paper + supplementary materia
- …