Social Media Analytics Reporting Toolkit
With the fast growth of social media services, vast amounts of user-generated content with time and location stamps are produced every day. A considerable portion of these data is publicly available online, and some of it collectively conveys information of interest to data analysts. Social media data are dynamic and unstructured by nature, which makes it hard for analysts to retrieve useful information efficiently and effectively. The Social Media Analytics Reporting Toolkit (SMART), a system developed at the Purdue VACCINE lab, aims to support such analysis. The current framework collects real-time Twitter messages and visualizes volume densities on a map. It uses Latent Dirichlet Allocation (LDA) to extract regional topics and can optionally apply Seasonal-Trend decomposition using Loess (STL) to detect abnormal events. While Twitter has a fair number of active users, they account for a small portion of all active social media users, and data generated by many other social media services are not currently utilized by SMART. My work therefore focused on expanding the data sources of the SMART system by creating means to collect data from other services such as Facebook and Instagram. During a test run searching on a collection of 88 specified keywords, over two million Facebook posts were collected in one week. In addition, the current SMART framework utilizes only one topic model, LDA, which is generally slower than Non-negative Matrix Factorization (NMF), so I also integrated an NMF-based topic model into the system. The improved SMART system can be used for a variety of analysis tasks, such as monitoring regional social media responses from different sources during disasters and detecting user-reported crimes. SMART is an ongoing and promising project that can be further improved by integrating new features.
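The abstract names NMF topic extraction but gives no code; as a rough illustration of that step, the sketch below fits an NMF model to a handful of post texts with scikit-learn. The sample posts and parameter choices (number of topics, TF-IDF settings) are illustrative assumptions, not SMART's actual configuration.

```python
# Minimal sketch of NMF topic extraction, in the spirit of SMART's
# topic-modeling step; corpus and parameters are illustrative only.
from sklearn.decomposition import NMF
from sklearn.feature_extraction.text import TfidfVectorizer

posts = [
    "flooding reported downtown after heavy rain",
    "road closed due to flooding near the river",
    "concert tonight at the city park stage",
    "great live music at the park this evening",
]

# NMF factorizes the TF-IDF matrix X (posts x terms) into W (posts x topics)
# and H (topics x terms); top-weighted terms in each row of H name the topic.
vectorizer = TfidfVectorizer(stop_words="english")
X = vectorizer.fit_transform(posts)

nmf = NMF(n_components=2, init="nndsvd", random_state=0)
W = nmf.fit_transform(X)   # per-post topic weights
H = nmf.components_        # per-topic term weights

terms = vectorizer.get_feature_names_out()
for k, row in enumerate(H):
    top = [terms[i] for i in row.argsort()[::-1][:3]]
    print(f"topic {k}: {', '.join(top)}")
```

Unlike LDA's iterative probabilistic inference, this factorization is a deterministic linear-algebra fit, which is the speed advantage the abstract alludes to.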
Defining the Resolution of a Network for Transportation Analyses: a New Method to Improve Transportation Planning Decisions
Travel demand models are important tools used in the analysis of transportation plans, projects, and policies. The modeling results are useful for transportation planners making planning decisions and for policy makers developing transportation policies. Defining the level of detail (i.e., the number of roads) of the transport network consistently with the travel demand model’s zone system is crucial to the accuracy of modeling results. However, travel demand modelers have not had tools to determine how much detail is needed in a transport network for a travel demand model. This dissertation seeks to fill this knowledge gap by (1) providing a methodology to define an appropriate level of detail for a transport network in a given travel demand model; (2) implementing this methodology in a travel demand model for the Baltimore area; and (3) identifying how this methodology improves modeling accuracy.
All analyses find that the spatial resolution of the transport network has a great impact on the modeling results. For example, when compared to observed traffic data, a very detailed network underestimates traffic congestion in the Baltimore area, while a network developed by this dissertation models the traffic conditions more accurately. Evaluating the impacts of a new transportation project on both networks shows, through the differences in their analysis results, the importance of an appropriate level of network detail for making improved planning decisions.
The results corroborate a suggested guideline for developing a transport network consistent with the travel demand model’s zone system. The dissertation concludes by identifying limitations in the data sources and methodology, from which a plan for future studies is laid out.
Masked Imitation Learning: Discovering Environment-Invariant Modalities in Multimodal Demonstrations
Multimodal demonstrations provide robots with an abundance of information to
make sense of the world. However, such abundance may not always lead to good
performance when it comes to learning sensorimotor control policies from human
demonstrations.
Extraneous data modalities can lead to state over-specification, where the
state contains modalities that are not only useless for decision-making but
can also change the data distribution across environments. State over-specification
leads to issues such as the learned policy not generalizing outside of the
training data distribution.
In this work, we propose Masked Imitation Learning (MIL) to address state
over-specification by selectively using informative modalities. Specifically,
we design a masked policy network with a binary mask to block certain
modalities. We develop a bi-level optimization algorithm that learns this mask
to accurately filter over-specified modalities. We demonstrate empirically that
MIL outperforms baseline algorithms in simulated domains including MuJoCo and a
robot arm environment using the Robomimic dataset, and effectively recovers the
environment-invariant modalities on a multimodal dataset collected on a real
robot. Our project website presents supplemental details and videos of our
results at: https://tinyurl.com/masked-il
Comment: 13 pages
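The abstract outlines MIL's design without code; the sketch below is a simplified guess at the core idea: a policy network whose per-modality inputs are gated by a learnable mask. A sigmoid relaxation stands in for the paper's binary mask and bi-level optimization, and all module names and dimensions are hypothetical.

```python
# Toy sketch of a masked policy network: a learnable gate per input
# modality suppresses over-specified modalities. A sigmoid relaxation
# stands in for the binary mask / bi-level optimization in the paper.
import torch
import torch.nn as nn

class MaskedPolicy(nn.Module):
    def __init__(self, modality_dims, action_dim, hidden=64):
        super().__init__()
        self.mask_logits = nn.Parameter(torch.zeros(len(modality_dims)))
        self.encoders = nn.ModuleList(
            nn.Linear(d, hidden) for d in modality_dims
        )
        self.head = nn.Sequential(nn.ReLU(), nn.Linear(hidden, action_dim))

    def forward(self, modalities):
        # Gate each encoded modality by its (relaxed) mask value,
        # then sum the gated features before the action head.
        gates = torch.sigmoid(self.mask_logits)
        feats = [
            g * enc(x)
            for g, enc, x in zip(gates, self.encoders, modalities)
        ]
        return self.head(torch.stack(feats).sum(dim=0))

policy = MaskedPolicy(modality_dims=[12, 6, 3], action_dim=4)
obs = [torch.randn(1, d) for d in (12, 6, 3)]
print(policy(obs).shape)  # torch.Size([1, 4])
```

Driving the gates toward hard zeros, as the paper's learned binary mask does, is what blocks over-specified modalities rather than merely down-weighting them.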
Autoregressive Diffusion Model for Graph Generation
Diffusion-based graph generative models have recently obtained promising
results for graph generation. However, existing diffusion-based graph
generative models are mostly one-shot generative models that apply Gaussian
diffusion in the dequantized adjacency matrix space. Such a strategy can suffer
from difficulty in model training, slow sampling speed, and incapability of
incorporating constraints. We propose an \emph{autoregressive diffusion} model
for graph generation. Unlike existing methods, we define a node-absorbing
diffusion process that operates directly in the discrete graph space. For
forward diffusion, we design a \emph{diffusion ordering network}, which learns
a data-dependent node absorbing ordering from graph topology. For reverse
generation, we design a \emph{denoising network} that uses the reverse node
ordering to efficiently reconstruct the graph, predicting one node at a time: the new node's type and its edges to previously denoised nodes. Based on the permutation invariance of graphs, we show that the two networks can be
jointly trained by optimizing a simple lower bound of data likelihood. Our
experiments on six diverse generic graph datasets and two molecule datasets
show that our model achieves generation performance better than or comparable to the previous state of the art, while enjoying fast generation speed.
Comment: 18 pages
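The reverse process the abstract describes, adding one node and its edges at a time, can be sketched as a generation loop. The skeleton below is an illustrative outline only; `denoise_step` is a hypothetical stand-in for the paper's denoising network, and the graph is kept as plain type and edge lists.

```python
# Skeleton of a node-by-node reverse (generation) pass, mirroring the
# autoregressive structure described in the abstract. `denoise_step` is
# a hypothetical placeholder for the learned denoising network.
import random

def denoise_step(node_types, edges, step):
    """Placeholder: sample the new node's type and its edges to
    previously generated nodes. A trained network goes here."""
    new_type = random.choice(["C", "N", "O"])
    new_edges = [i for i in range(len(node_types)) if random.random() < 0.3]
    return new_type, new_edges

def generate_graph(num_nodes):
    node_types, edges = [], []
    for t in range(num_nodes):
        # Each step un-absorbs one node: predict its type and its
        # connections to all nodes generated so far.
        new_type, new_edges = denoise_step(node_types, edges, t)
        node_types.append(new_type)
        edges.extend((t, j) for j in new_edges)
    return node_types, edges

print(generate_graph(5))
```

Because each step conditions only on already-denoised nodes, validity constraints (e.g., valence rules for molecules) could be checked and enforced inside the loop, which is the flexibility one-shot Gaussian diffusion lacks.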
Rethinking Resource Management in Edge Learning: A Joint Pre-training and Fine-tuning Design Paradigm
In some applications, edge learning is shifting from conventional learning from scratch to a new two-stage paradigm that unifies pre-training and task-specific fine-tuning. This paper considers the problem of
joint communication and computation resource management in a two-stage edge
learning system. In this system, model pre-training is first conducted at an
edge server via centralized learning on local pre-stored general data, and then
task-specific fine-tuning is performed at edge devices based on the pre-trained
model via federated edge learning. For the two-stage learning model, we first
analyze the convergence behavior (in terms of the average squared gradient norm
bound), which characterizes the impacts of various system parameters such as
the number of learning rounds and batch sizes in the two stages on the
convergence rate. Based on our analytical results, we then propose a joint
communication and computation resource management design to minimize an average
squared gradient norm bound, subject to constraints on the transmit power,
overall system energy consumption, and training delay. The decision variables
include the number of learning rounds, batch sizes, clock frequencies, and
transmit power control for both pre-training and fine-tuning stages. Finally,
numerical results are provided to evaluate the effectiveness of our proposed
design. It is shown that the proposed joint resource management over the
pre-training and fine-tuning stages well balances the system performance
trade-off among the training accuracy, delay, and energy consumption. The
proposed design is also shown to effectively leverage the inherent trade-off
between pre-training and fine-tuning, which arises from the differences in data
distribution between pre-stored general data versus real-time task-specific
data, thus efficiently optimizing overall system performance.
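As a toy illustration of the kind of constrained allocation the paper studies, the sketch below minimizes a generic convergence-bound surrogate over the per-stage numbers of rounds and batch sizes, subject to simple energy and delay budgets. The bound's form and every constant are invented for illustration and are not the paper's actual expressions.

```python
# Toy stand-in for joint resource management over pre-training and
# fine-tuning: minimize a made-up convergence-bound surrogate over the
# rounds (R1, R2) and batch sizes (B1, B2) of the two stages, under
# simple energy and delay budgets. All constants are illustrative.
import numpy as np
from scipy.optimize import minimize

E_BUDGET, T_BUDGET = 500.0, 200.0  # hypothetical energy / delay budgets

def bound(x):
    R1, B1, R2, B2 = x
    # Generic bound shape: more rounds and larger batches shrink the
    # gradient-norm bound, with diminishing returns in each stage.
    return 10.0 / (R1 * np.sqrt(B1)) + 20.0 / (R2 * np.sqrt(B2))

def energy(x):
    R1, B1, R2, B2 = x
    return 0.5 * R1 * B1 + 1.0 * R2 * B2  # per-round cost grows with batch

def delay(x):
    R1, B1, R2, B2 = x
    return 0.2 * R1 + 0.8 * R2            # fine-tuning rounds cost more time

res = minimize(
    bound,
    x0=np.array([40.0, 8.0, 40.0, 8.0]),
    bounds=[(1, 500), (1, 64), (1, 500), (1, 64)],
    constraints=[
        {"type": "ineq", "fun": lambda x: E_BUDGET - energy(x)},
        {"type": "ineq", "fun": lambda x: T_BUDGET - delay(x)},
    ],
)
print(res.x, bound(res.x))
```

Even in this toy form, the optimizer trades rounds between the two stages against the shared energy budget, the qualitative balance the paper's design targets.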
End-to-End Stochastic Optimization with Energy-Based Model
Decision-focused learning (DFL) was recently proposed for stochastic
optimization problems that involve unknown parameters. By integrating
predictive modeling with an implicitly differentiable optimization layer, DFL
has shown superior performance to the standard two-stage predict-then-optimize
pipeline. However, most existing DFL methods are only applicable to convex
problems or a subset of nonconvex problems that can be easily relaxed to convex
ones. Further, they can be inefficient in training due to the requirement of
solving and differentiating through the optimization problem in every training
iteration. We propose SO-EBM, a general and efficient DFL method for stochastic
optimization using energy-based models. Instead of relying on KKT conditions to
induce an implicit optimization layer, SO-EBM explicitly parameterizes the
original optimization problem using a differentiable optimization layer based
on energy functions. To better approximate the optimization landscape, we
propose a coupled training objective that uses a maximum likelihood loss to
capture the optimum location and a distribution-based regularizer to capture
the overall energy landscape. Finally, we propose an efficient training
procedure for SO-EBM with a self-normalized importance sampler based on a
Gaussian mixture proposal. We evaluate SO-EBM in three applications: power
scheduling, COVID-19 resource allocation, and non-convex adversarial security
game, demonstrating the effectiveness and efficiency of SO-EBM.
Comment: NeurIPS 2022 Oral
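The self-normalized importance sampling with a Gaussian mixture proposal used in SO-EBM's training can be illustrated in isolation. The sketch below estimates an expectation under an unnormalized energy-based density with such a proposal; the energy function and mixture parameters are invented for the example.

```python
# Sketch of self-normalized importance sampling with a Gaussian mixture
# proposal, the estimator style SO-EBM's training procedure relies on.
# The energy function and mixture parameters are illustrative.
import numpy as np

rng = np.random.default_rng(0)

def energy(x):
    return 0.5 * (x - 1.0) ** 2  # toy energy; p(x) is proportional to exp(-E(x))

# Gaussian mixture proposal q(x) with two components.
means = np.array([0.0, 2.0])
stds = np.array([1.0, 1.0])
weights = np.array([0.5, 0.5])

n = 10_000
comp = rng.choice(2, size=n, p=weights)
x = rng.normal(means[comp], stds[comp])

def q_pdf(x):
    comps = weights * np.exp(-0.5 * ((x[:, None] - means) / stds) ** 2) / (
        stds * np.sqrt(2 * np.pi)
    )
    return comps.sum(axis=1)

# Self-normalized weights: w_i proportional to exp(-E(x_i)) / q(x_i);
# the unknown normalizer of p cancels in the ratio.
w = np.exp(-energy(x)) / q_pdf(x)
w /= w.sum()

est_mean = np.sum(w * x)  # estimate of E_p[x]; the true value here is 1.0
print(est_mean)
```

Self-normalization is what makes the estimator usable with an EBM, since only the unnormalized density exp(-E(x)) is ever evaluated.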
Two-Phase Multi-Dose-Level PET Image Reconstruction with Dose Level Awareness
To obtain high-quality positron emission tomography (PET) while minimizing
radiation exposure, a range of methods have been designed to reconstruct
standard-dose PET (SPET) from corresponding low-dose PET (LPET) images.
However, most current methods merely learn the mapping between
single-dose-level LPET and SPET images, but omit the dose disparity of LPET
images in clinical scenarios. In this paper, to reconstruct high-quality SPET
images from multi-dose-level LPET images, we design a novel two-phase
multi-dose-level PET reconstruction algorithm with dose level awareness,
containing a pre-training phase and an SPET prediction phase. Specifically, the
pre-training phase is devised to explore both fine-grained discriminative
features and effective semantic representation. The SPET prediction phase
adopts a coarse prediction network that utilizes the pre-learned dose-level prior to generate a preliminary result, and a refinement network to precisely preserve the details. Experiments on the MICCAI 2022 Ultra-low Dose PET Imaging Challenge
dataset demonstrate the superiority of our method.
Comment: Accepted by ISBI 2024
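A coarse-then-refine design with dose-level conditioning, as the abstract outlines, can be sketched at a high level. The modules below are simplified, hypothetical stand-ins; the paper's actual architectures, losses, and dose encoding are not specified in the abstract.

```python
# High-level sketch of a two-phase, dose-aware reconstruction pipeline:
# a coarse network conditioned on a dose-level embedding produces a
# preliminary SPET estimate, which a refinement network then sharpens.
# All architectures here are simplified, hypothetical stand-ins.
import torch
import torch.nn as nn

class CoarseNet(nn.Module):
    def __init__(self, num_dose_levels=4):
        super().__init__()
        self.dose_embed = nn.Embedding(num_dose_levels, 8)
        self.body = nn.Conv2d(1 + 8, 1, kernel_size=3, padding=1)

    def forward(self, lpet, dose_level):
        # Broadcast the dose embedding over the spatial grid and
        # concatenate it with the LPET image as extra input channels.
        b, _, h, w = lpet.shape
        d = self.dose_embed(dose_level).view(b, 8, 1, 1).expand(b, 8, h, w)
        return self.body(torch.cat([lpet, d], dim=1))

class RefineNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.body = nn.Conv2d(2, 1, kernel_size=3, padding=1)

    def forward(self, lpet, coarse):
        # Predict a residual correction from the LPET input and the
        # coarse estimate, preserving fine details.
        return coarse + self.body(torch.cat([lpet, coarse], dim=1))

lpet = torch.randn(2, 1, 64, 64)
dose = torch.tensor([0, 2])    # per-sample dose-level indices
coarse = CoarseNet()(lpet, dose)
spet_pred = RefineNet()(lpet, coarse)
print(spet_pred.shape)         # torch.Size([2, 1, 64, 64])
```

Conditioning the coarse stage on a dose-level index is one plausible way to handle the multi-dose-level setting the abstract emphasizes, since the mapping from LPET to SPET differs by dose.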
Combining river replenishment and restrictions on groundwater pumping to achieve groundwater balance in the Juma River Plain, North China Plain
In recent years, to alleviate the decline in groundwater levels, extensive restrictions on groundwater pumping have been implemented in the North China Plain (NCP). In September 2018, a large-scale ecological water replenishment project was executed involving 22 rivers and lakes. How to adjust the layout of groundwater pumping reductions within the context of ecological water replenishment is a key issue to be addressed in the study of groundwater level recovery in the NCP. This study adopted the Juma River Plain in Baoding city as a case study, established a numerical model of river replenishment of groundwater, predicted groundwater level changes over the next 15 years (2021–2035), and quantitatively calculated the impact of river replenishment on groundwater levels. To achieve the goal of an overall groundwater balance by 2035, a suitable groundwater pumping restriction scenario was defined based on the impact of river replenishment on groundwater levels. The results indicated that by 2035, the relative rises in groundwater levels attributed to river replenishment and restrictions on groundwater pumping could reach 3.51 m and 2.28 m, respectively. River replenishment significantly impacts groundwater levels, especially near the river. Under the current groundwater exploitation conditions, river replenishment could ensure groundwater level recovery near the river, which accounts for 15% of the total study area. The goal of an overall groundwater balance by 2035 could be achieved if restrictions on groundwater pumping were superimposed, with an average annual reduction of 56 million m³. This study provides valuable insights into groundwater management across the NCP, and the proposed methods are useful for the management of other depleted aquifers recharged via ecological water replenishment.