An Online Decision-Theoretic Pipeline for Responder Dispatch
The problem of dispatching emergency responders to traffic accidents, fires,
distress calls and crimes plagues urban areas across the globe. While such
problems have been studied extensively, most approaches are offline.
Such methodologies fail to capture the dynamically changing environments under
which critical emergency response occurs, and therefore, fail to be implemented
in practice. Any holistic approach towards creating a pipeline for effective
emergency response must also look at other challenges that it subsumes -
predicting when and where incidents happen and understanding the changing
environmental dynamics. We describe a system that collectively deals with all
these problems in an online manner, meaning that the models get updated with
streaming data sources. We highlight why such an approach is crucial to the
effectiveness of emergency response, and present an algorithmic framework that
can compute promising actions for a given decision-theoretic model for
responder dispatch. We argue that carefully crafted heuristic measures can
balance the trade-off between computational time and the quality of solutions
achieved and highlight why such an approach is more scalable and tractable than
traditional approaches. We also present an online mechanism for incident
prediction, as well as an approach based on recurrent neural networks for
learning and predicting environmental features that affect responder dispatch.
We compare our methodology with the prior state of the art and with dispatch
strategies used in the field; the comparison shows that our approach reduces
response time while also drastically reducing computational time.
Comment: Appeared in ICCPS 201
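As a point of reference for the dispatch problem described above, a myopic greedy baseline (the kind of field strategy such decision-theoretic pipelines are typically compared against) can be sketched as follows. All data structures and names here are illustrative, not the paper's implementation:

```python
# Hypothetical sketch of a myopic "nearest free responder" dispatch baseline.
# The responder/incident representation is invented for illustration.
import math

def dispatch_nearest(incident, responders):
    """Assign the closest free responder to an incident; None if all busy."""
    free = [r for r in responders if r["free"]]
    if not free:
        return None
    best = min(free, key=lambda r: math.dist(r["pos"], incident["pos"]))
    best["free"] = False  # responder is now committed to this incident
    return best["id"]

responders = [
    {"id": "R1", "pos": (0.0, 0.0), "free": True},
    {"id": "R2", "pos": (5.0, 5.0), "free": True},
]
print(dispatch_nearest({"pos": (4.0, 4.0)}, responders))  # R2 is closer
```

Such a greedy rule is fast but ignores future demand, which is exactly the gap a decision-theoretic, online approach aims to close.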
Sharing Social Network Data: Differentially Private Estimation of Exponential-Family Random Graph Models
Motivated by a real-life problem of sharing social network data that contain
sensitive personal information, we propose a novel approach to release and
analyze synthetic graphs in order to protect the privacy of individual
relationships captured by the social network while maintaining the validity of
statistical results. A case study using a version of the Enron e-mail corpus
dataset demonstrates the application and usefulness of the proposed techniques
in solving the challenging problem of maintaining privacy and supporting
open access to network data, which ensures the reproducibility of existing
studies and enables new scientific insights to be obtained by analyzing such
data. We use a simple yet effective randomized response mechanism to generate
synthetic networks under edge differential privacy, and then use
likelihood-based inference for missing data and Markov chain Monte Carlo
techniques to fit exponential-family random graph models to the generated
synthetic networks.
Comment: Updated, 39 pages
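The randomized-response step the abstract mentions can be sketched as follows. This is the generic edge-flipping mechanism for edge differential privacy, not the authors' code; the privacy parameter `eps` and the adjacency-list representation are illustrative:

```python
# Sketch of randomized response for edge differential privacy: each potential
# edge indicator of a simple undirected graph is flipped independently with
# probability 1/(1 + exp(eps)), which satisfies eps-edge differential privacy.
import math
import random

def randomized_response_graph(adj, eps, rng=random.Random(0)):
    """Perturb a symmetric 0/1 adjacency matrix and return a synthetic copy."""
    p_flip = 1.0 / (1.0 + math.exp(eps))
    n = len(adj)
    out = [row[:] for row in adj]
    for i in range(n):
        for j in range(i + 1, n):          # upper triangle only, kept symmetric
            if rng.random() < p_flip:
                out[i][j] = out[j][i] = 1 - out[i][j]
    return out

g = [[0, 1], [1, 0]]
print(randomized_response_graph(g, eps=5.0))
```

Because the flip probability is known, downstream inference (as in the abstract, via missing-data likelihoods and MCMC) can correct for the perturbation when fitting ERGMs.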
BlinkML: Efficient Maximum Likelihood Estimation with Probabilistic Guarantees
The rising volume of datasets has made training machine learning (ML) models
a major computational cost in the enterprise. Given the iterative nature of
model and parameter tuning, many analysts use a small sample of their entire
data during their initial stage of analysis to make quick decisions (e.g., what
features or hyperparameters to use) and use the entire dataset only in later
stages (i.e., when they have converged to a specific model). This sampling,
however, is performed in an ad-hoc fashion. Most practitioners cannot precisely
capture the effect of sampling on the quality of their model, and eventually on
their decision-making process during the tuning phase. Moreover, without
systematic support for sampling operators, many optimizations and reuse
opportunities are lost.
In this paper, we introduce BlinkML, a system for fast, quality-guaranteed ML
training. BlinkML allows users to make error-computation tradeoffs: instead of
training a model on their full data (i.e., full model), BlinkML can quickly
train an approximate model with quality guarantees using a sample. The quality
guarantees ensure that, with high probability, the approximate model makes the
same predictions as the full model. BlinkML currently supports any ML model
that relies on maximum likelihood estimation (MLE), which includes Generalized
Linear Models (e.g., linear regression, logistic regression, max entropy
classifier, Poisson regression) as well as PPCA (Probabilistic Principal
Component Analysis). Our experiments show that BlinkML can speed up the
training of large-scale ML tasks by 6.26x-629x while guaranteeing the same
predictions, with 95% probability, as the full model.
Comment: 22 pages, SIGMOD 201
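The sample-versus-full-model idea can be illustrated with a toy maximum-likelihood model. This sketch is not BlinkML and provides no formal guarantee; it only measures the empirical prediction agreement between a model trained on a sample and one trained on the full data, with all sizes and data invented:

```python
# Train the same MLE model (logistic regression by gradient descent) on a
# 10% sample and on the full data, then compare their predictions.
import numpy as np

def fit_logreg(X, y, lr=0.1, steps=500):
    """Plain gradient-ascent MLE for logistic regression."""
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-X @ w))
        w += lr * X.T @ (y - p) / len(y)
    return w

rng = np.random.default_rng(0)
X = rng.normal(size=(5000, 3))
y = (X @ np.array([1.5, -2.0, 0.5]) + rng.normal(size=5000) > 0).astype(float)

w_full = fit_logreg(X, y)                        # "full model"
idx = rng.choice(len(X), size=500, replace=False)
w_samp = fit_logreg(X[idx], y[idx])              # approximate model from a sample

agree = np.mean((X @ w_full > 0) == (X @ w_samp > 0))
print(f"prediction agreement: {agree:.3f}")
```

BlinkML's contribution, per the abstract, is to choose the sample size so that this agreement holds with a user-specified probability, rather than leaving it ad hoc.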
Improving automatic source code summarization via deep reinforcement learning
Code summarization provides a high-level natural language description of the function performed by code, which benefits software maintenance, code categorization and retrieval. To the best of our knowledge, most state-of-the-art approaches follow an encoder-decoder framework that encodes the code into a hidden space and then decodes it into natural language, suffering from two major drawbacks: (a) their encoders consider only the sequential content of code, ignoring the tree structure, which is also critical for code summarization; (b) their decoders are typically trained to predict the next word by maximizing the likelihood of the next ground-truth word given the previous ground-truth words, yet at test time the entire sequence must be generated from scratch. This discrepancy causes an exposure bias, making the learnt decoder suboptimal. In this paper, we incorporate an abstract syntax tree structure as well as the sequential content of code snippets into a deep reinforcement learning framework (an actor-critic network). The actor network provides the confidence of predicting the next word according to the current state, while the critic network evaluates the reward value of all possible extensions of the current state and can provide global guidance for exploration. We employ an advantage reward based on the BLEU metric to train both networks. Comprehensive experiments on a real-world dataset show the effectiveness of our proposed model compared with state-of-the-art methods.
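The advantage-style reward described in the abstract (the BLEU score of a sampled summary minus a critic's baseline value) can be sketched as follows. The BLEU here is a simplified 1-2-gram variant and all names are illustrative, not the paper's implementation:

```python
# Toy sketch of a BLEU-based advantage reward for sequence-level RL training.
import math
from collections import Counter

def simple_bleu(candidate, reference, max_n=2):
    """Geometric mean of clipped n-gram precisions with a brevity penalty."""
    precisions = []
    for n in range(1, max_n + 1):
        cand = Counter(tuple(candidate[i:i+n]) for i in range(len(candidate)-n+1))
        ref = Counter(tuple(reference[i:i+n]) for i in range(len(reference)-n+1))
        overlap = sum(min(c, ref[g]) for g, c in cand.items())
        total = max(sum(cand.values()), 1)
        precisions.append(max(overlap, 1e-9) / total)   # floor avoids log(0)
    bp = math.exp(min(0.0, 1.0 - len(reference) / max(len(candidate), 1)))
    return bp * math.exp(sum(math.log(p) for p in precisions) / max_n)

def advantage(sampled, reference, critic_value):
    """Reward minus critic baseline, used to reduce gradient variance."""
    return simple_bleu(sampled, reference) - critic_value

ref = "returns the maximum value in the list".split()
print(advantage("returns the maximum value in the list".split(), ref, 0.5))  # 0.5
```

A positive advantage pushes the actor toward the sampled summary; a negative one pushes it away, which is how sequence-level rewards sidestep the exposure-bias problem of word-level likelihood training.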
Convex optimization of programmable quantum computers
A fundamental model of quantum computation is the programmable quantum gate array: a quantum processor that is fed a program state, which induces a corresponding quantum operation on input states. While programmable, any finite-dimensional design of this model is known to be non-universal, meaning that the processor cannot perfectly simulate an arbitrary quantum channel over the input. Characterizing how close the simulation can be, and finding the optimal program state, have been open questions for the past 20 years. Here, we answer these questions by showing that the search for the optimal program state is a convex optimization problem that can be solved via semi-definite programming and the gradient-based methods commonly employed in machine learning. We apply this general result to different types of processors, from a shallow design based on quantum teleportation to deeper schemes relying on port-based teleportation and parametric quantum circuits.
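The convexity claim can be illustrated numerically: the processor's output is linear in the program state, so minimizing the distance to a target is convex in that state, and projected gradient descent over density matrices applies. The toy "processor" (a fixed Kraus map), target and step size below are invented for illustration and are not the paper's processors:

```python
# Projected gradient descent over program states (density matrices) for a toy
# linear processor pi -> sum_k K_k pi K_k^dagger, minimizing the Frobenius
# distance to a target. All numbers here are illustrative.
import numpy as np

def project_density(M):
    """Project a matrix onto the set of density matrices (PSD, trace one)."""
    M = (M + M.conj().T) / 2
    vals, vecs = np.linalg.eigh(M)
    u = np.sort(vals)[::-1]                      # eigenvalues, descending
    css = np.cumsum(u)
    rho = np.max(np.nonzero(u + (1 - css) / np.arange(1, len(u) + 1) > 0))
    theta = (1 - css[rho]) / (rho + 1)           # simplex-projection shift
    lam = np.maximum(vals + theta, 0)
    return (vecs * lam) @ vecs.conj().T

def cost(pi, kraus, target):
    out = sum(K @ pi @ K.conj().T for K in kraus)
    return np.linalg.norm(out - target) ** 2

rng = np.random.default_rng(1)
d = 2
kraus = [np.eye(d) / np.sqrt(2), rng.normal(size=(d, d)) / 2]
target = np.diag([0.7, 0.3]).astype(complex)

pi = np.eye(d, dtype=complex) / d                # start from the maximally mixed state
c0 = cost(pi, kraus, target)
for _ in range(200):
    out = sum(K @ pi @ K.conj().T for K in kraus)
    grad = sum(2 * K.conj().T @ (out - target) @ K for K in kraus)
    pi = project_density(pi - 0.05 * grad)
print(c0, cost(pi, kraus, target))
```

The projection step is what keeps the iterate a valid program state; replacing it with a semidefinite-programming solver, as the abstract suggests, gives the same optimum for this convex objective.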
Primary versus secondary contributions to particle number concentrations in the European boundary layer
It is important to understand the relative contribution of primary and secondary particles to regional and global aerosol so that models can attribute aerosol radiative forcing to different sources. In large-scale models, there is considerable uncertainty associated with treatments of particle formation (nucleation) in the boundary layer (BL) and in the size distribution of emitted primary particles, leading to uncertainties in predicted cloud condensation nuclei (CCN) concentrations. Here we quantify how primary particle emissions and secondary particle formation influence size-resolved particle number concentrations in the BL using a global aerosol microphysics model and aircraft and ground site observations made during the May 2008 campaign of the European Integrated Project on Aerosol Cloud Climate Air Quality Interactions (EUCAARI). We tested four different parameterisations for BL nucleation and two assumptions for the emission size distribution of anthropogenic and wildfire carbonaceous particles. When we emit carbonaceous particles at small sizes (as recommended by the Aerosol Intercomparison project, AEROCOM), the spatial distributions of campaign-mean number concentrations of particles with diameter >50 nm (N50) and >100 nm (N100) were well captured by the model (R2≥0.8) and the normalised mean bias (NMB) was also small (−18% for N50 and −1% for N100). Emission of carbonaceous particles at larger sizes, which we consider to be more realistic for low spatial resolution global models, results in equally good correlation but larger bias (R2≥0.8, NMB = −52% and −29%), which could be partly but not entirely compensated by BL nucleation. Within the uncertainty of the observations and accounting for the uncertainty in the size of emitted primary particles, BL nucleation makes a statistically significant contribution to CCN-sized particles at less than a quarter of the ground sites. 
Our results show that a major source of uncertainty in CCN-sized particles in polluted European air is the emitted size of primary carbonaceous particles. New information is required, not only from direct observations but also to determine the "effective emission size" and composition of primary particles appropriate for models of different resolutions.
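The two model-skill metrics quoted above, normalised mean bias and R², can be computed as follows. The definitions are the ones commonly used in aerosol model evaluation, and the data values are illustrative, not EUCAARI observations:

```python
# Normalised mean bias NMB = sum(model - obs) / sum(obs), and the coefficient
# of determination R^2 of model values against observations.
def nmb(model, obs):
    """Normalised mean bias, usually reported as a percentage."""
    return sum(m - o for m, o in zip(model, obs)) / sum(obs)

def r_squared(model, obs):
    """One minus residual sum of squares over total sum of squares."""
    mean_o = sum(obs) / len(obs)
    ss_res = sum((o - m) ** 2 for m, o in zip(model, obs))
    ss_tot = sum((o - mean_o) ** 2 for o in obs)
    return 1.0 - ss_res / ss_tot

obs = [1200.0, 900.0, 1500.0, 800.0]     # e.g. observed N50 (cm^-3), made up
model = [1000.0, 850.0, 1300.0, 700.0]   # modelled N50, made up
print(f"NMB = {nmb(model, obs):+.0%}, R2 = {r_squared(model, obs):.2f}")
```

A negative NMB, as in the abstract's −52% case, indicates the model systematically underpredicts the observed concentrations even when the spatial correlation (R²) remains high.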
SURF1 knockout cloned pigs: early onset of a severe lethal phenotype
Leigh syndrome (LS) associated with cytochrome c oxidase (COX) deficiency is an early-onset, fatal mitochondrial encephalopathy, leading to multiple neurological failures and eventually death, usually in the first decade of life. Mutations in SURF1, a nuclear gene encoding a mitochondrial protein involved in COX assembly, are among the most common causes of LS. LS SURF1 patients display severe, isolated COX deficiency in all tissues, including cultured fibroblasts and skeletal muscle. Recombinant, constitutive SURF1−/− mice show diffuse COX deficiency but fail to recapitulate the severity of the human clinical phenotype. Pigs are an attractive alternative model for human diseases because of their size, as well as their metabolic, physiological and genetic similarity to humans. Here, we determined the complete sequence of the swine SURF1 gene, disrupted it in pig primary fibroblast cell lines using both TALENs and CRISPR/Cas9 genome-editing systems, and finally generated SURF1−/− and SURF1−/+ pigs by Somatic Cell Nuclear Transfer (SCNT). SURF1−/− pigs were characterized by failure to thrive, muscle weakness and a highly reduced life span with elevated perinatal mortality, compared to heterozygous SURF1−/+ and wild-type littermates. Surprisingly, no obvious COX deficiency was detected in SURF1−/− tissues, although histochemical analysis revealed the presence of COX deficiency in jejunum villi, and total mRNA sequencing (RNA-seq) showed that several COX subunit-encoding genes were significantly down-regulated in SURF1−/− skeletal muscles. In addition, neuropathological findings indicated a delay in central nervous system development of newborn SURF1−/− piglets. Our results suggest a broader role of SURF1 in mitochondrial bioenergetics.