Social Media Analytics Reporting Toolkit
With the fast growth of social media services, vast amounts of user-generated content with time and location stamps are produced every day. A considerable amount of these data is publicly available online, some of which collectively conveys information of interest to data analysts. Social media data are dynamic and unstructured by nature, which makes it very hard for analysts to retrieve useful information efficiently and effectively. The Social Media Analytics Reporting Toolkit (SMART), a system developed at the Purdue VACCINE lab, aims to support such analysis. The current framework collects real-time Twitter messages and visualizes volume densities on a map. It uses Latent Dirichlet Allocation (LDA) to extract regional topics and can optionally apply Seasonal-Trend decomposition using Loess (STL) to detect abnormal events. While Twitter has a fair number of active users, they account for a small portion of all active social media users, and data generated by many other social media services are not currently utilized by SMART. Therefore, my work focused on expanding the data sources of the SMART system by creating means to collect data from other services such as Facebook and Instagram. During a test run searching on a collection of 88 specified keywords, over two million Facebook posts were collected in one week. In addition, the current SMART framework uses only one topic model, LDA, which is considered slower than Non-negative Matrix Factorization (NMF), so I also worked on integrating the NMF algorithm into the system. The improved SMART system can be used for a variety of analysis tasks, such as monitoring regional social media responses to disastrous events across different sources and detecting user-reported crimes. SMART is an ongoing and promising project that can be further improved by integrating new features.
Defining the Resolution of a Network for Transportation Analyses: a New Method to Improve Transportation Planning Decisions
Travel demand models are important tools used in the analysis of transportation plans, projects, and policies. The modeling results are useful for transportation planners making transportation decisions and for policy makers developing transportation policies. Defining the level of detail (i.e., the number of roads) of the transport network consistently with the travel demand model's zone system is crucial to the accuracy of modeling results. However, travel demand modelers have not had tools to determine how much detail is needed in a transport network for a travel demand model. This dissertation seeks to fill this knowledge gap by (1) providing a methodology to define an appropriate level of detail for a transport network in a given travel demand model; (2) implementing this methodology in a travel demand model in the Baltimore area; and (3) identifying how this methodology improves modeling accuracy.
All analyses identify that the spatial resolution of the transport network has great impacts on the modeling results. For example, when compared to observed traffic data, a very detailed network underestimates traffic congestion in the Baltimore area, while a network developed by this dissertation provides a more accurate modeling result of traffic conditions. Through the evaluation of the impacts a new transportation project has on both networks, the differences in their analysis results underscore the importance of having an appropriate level of network detail for making improved planning decisions.
The results corroborate a suggested guideline concerning the development of a transport network consistent with the travel demand model's zone system. To conclude this dissertation, limitations are identified in the data sources and methodology, based on which a plan for future studies is laid out.
Masked Imitation Learning: Discovering Environment-Invariant Modalities in Multimodal Demonstrations
Multimodal demonstrations provide robots with an abundance of information to
make sense of the world. However, such abundance may not always lead to good
performance when it comes to learning sensorimotor control policies from human
demonstrations.
Extraneous data modalities can lead to state over-specification, where the
state contains modalities that are not only useless for decision-making but
also can change data distribution across environments. State over-specification
leads to issues such as the learned policy not generalizing outside of the
training data distribution.
In this work, we propose Masked Imitation Learning (MIL) to address state
over-specification by selectively using informative modalities. Specifically,
we design a masked policy network with a binary mask to block certain
modalities. We develop a bi-level optimization algorithm that learns this mask
to accurately filter over-specified modalities. We demonstrate empirically that
MIL outperforms baseline algorithms in simulated domains including MuJoCo and a
robot arm environment using the Robomimic dataset, and effectively recovers the
environment-invariant modalities on a multimodal dataset collected on a real
robot. Our project website presents supplemental details and videos of our
results at: https://tinyurl.com/masked-il
Comment: 13 pages
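The binary modality mask at the core of MIL can be illustrated with a minimal sketch; the modality names, feature sizes, and mask values below are hypothetical, and in MIL the mask itself is learned by bi-level optimization rather than set by hand:

```python
import numpy as np

# Hypothetical per-modality feature sizes, e.g. proprioception plus learned
# RGB and depth embeddings.
modality_dims = {"proprio": 4, "rgb": 8, "depth": 8}
offsets = np.cumsum([0] + list(modality_dims.values()))

def apply_mask(state, mask):
    """Zero out the feature slice of every modality whose mask bit is 0.

    state: 1-D array of concatenated modality features.
    mask:  dict modality name -> 0/1, standing in for MIL's learned binary mask.
    """
    out = state.copy()
    for (name, dim), start in zip(modality_dims.items(), offsets):
        if mask[name] == 0:
            out[start:start + dim] = 0.0
    return out

rng = np.random.default_rng(0)
state = rng.normal(size=offsets[-1])

# Block the (here, over-specified) depth modality; the policy network then
# only sees the remaining environment-invariant modalities.
masked = apply_mask(state, {"proprio": 1, "rgb": 1, "depth": 0})
print(masked)
```

Zeroing a slice this way keeps the policy network's input dimension fixed, so the same network can be trained under different candidate masks during the bi-level search.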
Autoregressive Diffusion Model for Graph Generation
Diffusion-based graph generative models have recently obtained promising
results for graph generation. However, existing diffusion-based graph
generative models are mostly one-shot generative models that apply Gaussian
diffusion in the dequantized adjacency matrix space. Such a strategy can suffer
from difficulty in model training, slow sampling speed, and incapability of
incorporating constraints. We propose an \emph{autoregressive diffusion} model
for graph generation. Unlike existing methods, we define a node-absorbing
diffusion process that operates directly in the discrete graph space. For
forward diffusion, we design a \emph{diffusion ordering network}, which learns
a data-dependent node absorbing ordering from graph topology. For reverse
generation, we design a \emph{denoising network} that uses the reverse node
ordering to efficiently reconstruct the graph by predicting, one node at a
time, the type of the new node and its edges to previously denoised nodes.
Based on the permutation invariance of graphs, we show that the two networks
can be jointly trained by optimizing a simple lower bound of the data
likelihood. Our
experiments on six diverse generic graph datasets and two molecule datasets
show that our model achieves generation performance better than or comparable
to the previous state of the art, while enjoying fast generation speed.
Comment: 18 pages
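The node-absorbing process and its autoregressive reversal can be illustrated on a toy graph; here the absorbing order is fixed and the reverse step simply replays recorded edges, whereas in the model the order comes from the diffusion ordering network and the edges are predicted by the denoising network:

```python
import numpy as np

def absorb(adj, order):
    """Forward diffusion sketch: absorb (remove) nodes one at a time following
    `order`, recording the edges each absorbed node had to the nodes that remain."""
    remaining = list(range(len(adj)))
    steps = []
    for node in order:
        remaining.remove(node)
        steps.append((node, [(node, j) for j in remaining if adj[node, j]]))
    return steps

def regenerate(n, steps):
    """Reverse generation sketch: re-insert nodes in reverse order, restoring
    each node's edges to the previously denoised nodes (copied from `steps`
    here; predicted by the denoising network in the model)."""
    adj = np.zeros((n, n), dtype=int)
    for node, edges in reversed(steps):
        for i, j in edges:
            adj[i, j] = adj[j, i] = 1
    return adj

# A 4-node path graph 0-1-2-3.
adj = np.zeros((4, 4), dtype=int)
for i, j in [(0, 1), (1, 2), (2, 3)]:
    adj[i, j] = adj[j, i] = 1

steps = absorb(adj, order=[3, 0, 2, 1])
print(regenerate(4, steps))  # round-trip recovers the original adjacency matrix
```

Because the process acts on discrete nodes and edges directly, there is no dequantized adjacency-matrix space and hard constraints can be checked as each node is inserted.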
End-to-End Stochastic Optimization with Energy-Based Model
Decision-focused learning (DFL) was recently proposed for stochastic
optimization problems that involve unknown parameters. By integrating
predictive modeling with an implicitly differentiable optimization layer, DFL
has shown superior performance to the standard two-stage predict-then-optimize
pipeline. However, most existing DFL methods are only applicable to convex
problems or a subset of nonconvex problems that can be easily relaxed to convex
ones. Further, they can be inefficient in training due to the requirement of
solving and differentiating through the optimization problem in every training
iteration. We propose SO-EBM, a general and efficient DFL method for stochastic
optimization using energy-based models. Instead of relying on KKT conditions to
induce an implicit optimization layer, SO-EBM explicitly parameterizes the
original optimization problem using a differentiable optimization layer based
on energy functions. To better approximate the optimization landscape, we
propose a coupled training objective that uses a maximum likelihood loss to
capture the optimum location and a distribution-based regularizer to capture
the overall energy landscape. Finally, we propose an efficient training
procedure for SO-EBM with a self-normalized importance sampler based on a
Gaussian mixture proposal. We evaluate SO-EBM in three applications: power
scheduling, COVID-19 resource allocation, and non-convex adversarial security
game, demonstrating the effectiveness and efficiency of SO-EBM.
Comment: NeurIPS 2022 Oral
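The self-normalized importance sampler with a Gaussian mixture proposal can be sketched in one dimension; the quadratic energy and the mixture parameters below are toy assumptions standing in for the learned energy function and proposal:

```python
import numpy as np

rng = np.random.default_rng(0)

def energy(y):
    """Toy energy; low energy near y = 2, standing in for the learned E(x, y)."""
    return 0.5 * (y - 2.0) ** 2

# Gaussian mixture proposal q(y) with components near the presumed optimum.
means, stds, weights = np.array([1.5, 2.5]), np.array([1.0, 1.0]), np.array([0.5, 0.5])
comp = rng.choice(2, size=5000, p=weights)
samples = rng.normal(means[comp], stds[comp])

def q_pdf(y):
    """Density of the Gaussian mixture proposal."""
    return sum(w * np.exp(-0.5 * ((y - m) / s) ** 2) / (s * np.sqrt(2 * np.pi))
               for m, s, w in zip(means, stds, weights))

# Self-normalized importance weights for the Boltzmann density
# p(y) proportional to exp(-energy(y)); no normalizing constant is needed.
w = np.exp(-energy(samples)) / q_pdf(samples)
w /= w.sum()

# Estimate E_p[y]; for this quadratic energy the exact value is 2.
estimate = np.sum(w * samples)
print(estimate)
```

Self-normalization is what lets the sampler work with an unnormalized energy-based density, since the unknown partition function cancels when the weights are divided by their sum.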
Combining river replenishment and restrictions on groundwater pumping to achieve groundwater balance in the Juma River Plain, North China Plain
In recent years, to alleviate the decline in groundwater levels, extensive restrictions on groundwater pumping have been implemented in the North China Plain (NCP). In September 2018, a large-scale ecological water replenishment project involving 22 rivers and lakes was executed. How to adjust the layout of groundwater pumping reductions within the context of ecological water replenishment is a key issue to be addressed in the study of groundwater level recovery in the NCP. This study adopted the Juma River Plain in Baoding city as a case study, established a numerical model of river replenishment of groundwater, predicted groundwater level changes over the next 15 years (2021–2035), and quantitatively calculated the impact of river replenishment on groundwater levels. To achieve the goal of an overall groundwater balance by 2035, a suitable groundwater pumping restriction scenario was defined based on the impact of river replenishment on groundwater levels. The results indicated that by 2035, the relative rises in groundwater levels attributed to river replenishment and to restrictions on groundwater pumping could reach 3.51 m and 2.28 m, respectively. River replenishment significantly impacts groundwater levels, especially near the river. Under the current groundwater exploitation conditions, river replenishment could ensure groundwater level recovery near the river, which accounts for 15% of the total study area. The goal of an overall groundwater balance by 2035 could be achieved if restrictions on groundwater pumping were superimposed, with an average annual reduction of 56 million m³. This study provides valuable insights into groundwater management across the NCP, and the proposed methods are useful for the management of other depleted aquifers recharged via ecological water replenishment.
A photon counting reconstructive spectrometer combining metasurfaces and superconducting nanowire single-photon detectors
Faint-light spectroscopy has many important applications, such as fluorescence
spectroscopy, lidar, and astronomical observations. However, long measurement
times limit its application in real-time measurement. In this work, a photon
counting reconstructive spectrometer combining metasurfaces and superconducting
nanowire single-photon detectors (SNSPDs) is proposed. A prototype device was
fabricated on a silicon-on-insulator (SOI) substrate, and its performance was
characterized. Experimental results show that the device supports spectral
reconstruction of monochromatic light with a resolution of 2 nm in the
wavelength region of 1500–1600 nm. The detection efficiency of the device is
1.4%–3.2% in this wavelength region. The measurement time required by this
photon counting reconstructive spectrometer was also investigated
experimentally, showing its potential for scenarios requiring real-time
measurement.
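The "reconstructive" step of such a spectrometer amounts to inverting a calibrated channel-response matrix; a minimal noiseless sketch using non-negative least squares, with a random response matrix standing in for one measured during device calibration:

```python
import numpy as np
from scipy.optimize import nnls

rng = np.random.default_rng(1)

# Wavelength grid over the 1500-1600 nm band reported in the abstract.
wavelengths = np.linspace(1500, 1600, 50)

# Hypothetical response matrix R: counts registered by each of 60 distinct
# metasurface/SNSPD channels per unit power at each wavelength. In a real
# device R is measured by calibration, not drawn at random.
R = rng.uniform(0.0, 1.0, size=(60, wavelengths.size))

# A monochromatic input line at roughly 1550 nm (bin 25 of the grid).
true_spectrum = np.zeros(wavelengths.size)
true_spectrum[25] = 1.0

counts = R @ true_spectrum  # noiseless photon-count vector for simplicity

# Reconstruct: solve counts ~= R @ s subject to s >= 0.
spectrum, residual = nnls(R, counts)
print(wavelengths[np.argmax(spectrum)])
```

With photon counting the measured vector is Poisson-noisy and the inversion is usually regularized, but the non-negativity constraint alone already captures why spectrally distinct channel responses permit reconstruction at resolutions finer than any single channel's bandwidth.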