AcTune: Uncertainty-aware Active Self-Training for Semi-Supervised Active Learning with Pretrained Language Models
While pre-trained language model (PLM) fine-tuning has achieved strong
performance in many NLP tasks, the fine-tuning stage can still be demanding in
terms of labeled data. Recent works have resorted to active fine-tuning to
improve the label efficiency of PLM fine-tuning, but none of them investigate
the potential of unlabeled data. We propose AcTune, a new framework that
leverages unlabeled
data to improve the label efficiency of active PLM fine-tuning. AcTune switches
between data annotation and model self-training based on uncertainty: it
selects high-uncertainty unlabeled samples for active annotation and
low-uncertainty ones for model self-training. Under this framework, we design
(1) a region-aware sampling strategy that reduces redundancy when actively
querying for annotations and (2) a momentum-based memory bank that dynamically
aggregates the model's pseudo labels to suppress label noise in self-training.
Experiments on 6 text classification datasets show that AcTune outperforms the
strongest active learning and self-training baselines and improves the label
efficiency of PLM fine-tuning by 56.2\% on average. Our implementation will be
available at \url{https://github.com/yueyu1030/actune}.
Comment: NAACL 2022 Main Conference.
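To make the uncertainty-based switching concrete, here is a minimal sketch of one AcTune-style round, assuming a hypothetical `model` that exposes `predict_proba`; the region-aware sampling strategy is omitted, and the memory bank is reduced to its momentum core.

```python
# Minimal sketch of AcTune-style uncertainty switching, not the authors'
# implementation. `model.predict_proba` and the index bookkeeping are
# hypothetical placeholders.
import numpy as np

def entropy(probs: np.ndarray) -> np.ndarray:
    """Predictive entropy per sample; probs has shape (n, num_classes)."""
    return -np.sum(probs * np.log(probs + 1e-12), axis=1)

def actune_round(model, unlabeled_x, memory_bank, budget=32,
                 momentum=0.9, low_q=0.2):
    probs = model.predict_proba(unlabeled_x)            # (n, num_classes)
    unc = entropy(probs)

    # (1) Highest-uncertainty samples are sent for human annotation.
    query_idx = np.argsort(-unc)[:budget]

    # (2) Low-uncertainty samples feed self-training; a momentum memory
    # bank smooths pseudo-labels across rounds to suppress label noise.
    pseudo_idx = np.where(unc <= np.quantile(unc, low_q))[0]
    memory_bank[pseudo_idx] = (momentum * memory_bank[pseudo_idx]
                               + (1 - momentum) * probs[pseudo_idx])
    pseudo_labels = memory_bank[pseudo_idx].argmax(axis=1)
    return query_idx, pseudo_idx, pseudo_labels
```

With this update rule, a sample's pseudo label only flips after the model has been consistently confident about the new class across rounds, which is what suppresses self-training noise.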
When Rigidity Hurts: Soft Consistency Regularization for Probabilistic Hierarchical Time Series Forecasting
Probabilistic hierarchical time-series forecasting is an important variant of
time-series forecasting, where the goal is to model and forecast multivariate
time-series that have underlying hierarchical relations. Most methods focus on
point predictions and do not provide well-calibrated probabilistic forecast
distributions. Recent state-of-the-art probabilistic forecasting methods also
impose hierarchical relations on point predictions and on samples of the
distribution, which does not account for the coherency of the forecast
distributions. Previous works also silently assume that datasets are always
consistent with the given hierarchical relations and do not adapt to
real-world datasets that deviate from this assumption. We close both of these
gaps and propose PROFHiT, a fully probabilistic hierarchical forecasting model
that jointly models the forecast distribution of the entire hierarchy. PROFHiT
uses a flexible probabilistic Bayesian approach and introduces a novel
Distributional Coherency regularization, which learns from hierarchical
relations over the entire forecast distribution, enabling robust and
calibrated forecasts that adapt to datasets of varying hierarchical
consistency. Evaluating PROFHiT over a wide range of datasets, we observed
41-88% better accuracy and significantly better calibration. Because it models
coherency over the full distribution, PROFHiT can robustly provide reliable
forecasts even when up to 10% of the input time-series data is missing,
whereas other methods' performance degrades severely, by over 70%.
Comment: Accepted at KDD 2022.
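As an illustration of what a distributional coherency penalty can look like, the sketch below regularizes Gaussian forecasts so that each parent's predicted distribution stays close to the distribution implied by summing its children, assumed independent here; PROFHiT's exact formulation may differ.

```python
# Hedged sketch of a distributional-coherency-style regularizer for
# Gaussian forecasts. Illustrative only, not PROFHiT's exact loss.
import torch

def kl_gaussian(mu_p, var_p, mu_q, var_q):
    """KL( N(mu_p, var_p) || N(mu_q, var_q) )."""
    return 0.5 * (torch.log(var_q / var_p)
                  + (var_p + (mu_p - mu_q) ** 2) / var_q - 1.0)

def coherency_loss(mu, var, children):
    """mu, var: (num_series,) forecast params; children: parent -> child indices."""
    loss = torch.tensor(0.0)
    for parent, kids in children.items():
        agg_mu = mu[kids].sum()     # sum of independent Gaussians: means add
        agg_var = var[kids].sum()   # ...and variances add
        loss = loss + kl_gaussian(mu[parent], var[parent], agg_mu, agg_var)
    return loss
```

Added to the usual likelihood loss, this pulls the whole forecast distribution, not just the point predictions, toward hierarchy consistency.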
End-to-End Stochastic Optimization with Energy-Based Model
Decision-focused learning (DFL) was recently proposed for stochastic
optimization problems that involve unknown parameters. By integrating
predictive modeling with an implicitly differentiable optimization layer, DFL
has shown superior performance to the standard two-stage predict-then-optimize
pipeline. However, most existing DFL methods are only applicable to convex
problems or a subset of nonconvex problems that can be easily relaxed to convex
ones. Further, they can be inefficient in training due to the requirement of
solving and differentiating through the optimization problem in every training
iteration. We propose SO-EBM, a general and efficient DFL method for stochastic
optimization using energy-based models. Instead of relying on KKT conditions to
induce an implicit optimization layer, SO-EBM explicitly parameterizes the
original optimization problem using a differentiable optimization layer based
on energy functions. To better approximate the optimization landscape, we
propose a coupled training objective that uses a maximum likelihood loss to
capture the optimum location and a distribution-based regularizer to capture
the overall energy landscape. Finally, we propose an efficient training
procedure for SO-EBM with a self-normalized importance sampler based on a
Gaussian mixture proposal. We evaluate SO-EBM in three applications: power
scheduling, COVID-19 resource allocation, and a non-convex adversarial
security game, demonstrating the effectiveness and efficiency of SO-EBM.
Comment: NeurIPS 2022 Oral.
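The sampling component can be illustrated in isolation: the sketch below uses self-normalized importance sampling with a Gaussian-mixture proposal to approximate expectations under the distribution p(a) ∝ exp(-E(a)) induced by an energy function. The energy here is a toy quadratic, not a learned SO-EBM objective.

```python
# Self-normalized importance sampling (SNIS) with a Gaussian-mixture
# proposal, in the spirit of SO-EBM's training procedure. Toy example.
import torch
from torch.distributions import (Categorical, Independent,
                                 MixtureSameFamily, Normal)

def snis_expectation(energy_fn, proposal, f, num_samples=1024):
    a = proposal.sample((num_samples,))             # (S, d)
    log_w = -energy_fn(a) - proposal.log_prob(a)    # unnormalized log-weights
    w = torch.softmax(log_w, dim=0)                 # self-normalization
    return (w.unsqueeze(-1) * f(a)).sum(dim=0)

# Toy energy: a quadratic bowl centered at 2, so E[a] should be near 2.
energy = lambda a: ((a - 2.0) ** 2).sum(dim=-1)
proposal = MixtureSameFamily(
    Categorical(torch.ones(3)),
    Independent(Normal(torch.tensor([[0.0], [2.0], [4.0]]),
                       torch.ones(3, 1)), 1))

print(snis_expectation(energy, proposal, lambda a: a))  # ~ tensor([2.0])
```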
Autoregressive Diffusion Model for Graph Generation
Diffusion-based graph generative models have recently obtained promising
results for graph generation. However, existing diffusion-based graph
generative models are mostly one-shot generative models that apply Gaussian
diffusion in the dequantized adjacency matrix space. Such a strategy can suffer
from difficulty in model training, slow sampling speed, and incapability of
incorporating constraints. We propose an \emph{autoregressive diffusion} model
for graph generation. Unlike existing methods, we define a node-absorbing
diffusion process that operates directly in the discrete graph space. For
forward diffusion, we design a \emph{diffusion ordering network}, which learns
a data-dependent node absorbing ordering from graph topology. For reverse
generation, we design a \emph{denoising network} that uses the reverse node
ordering to efficiently reconstruct the graph one node at a time, predicting
the new node's type and its edges to previously denoised nodes. Based on the
permutation invariance of graphs, we show that the two networks can be jointly
trained by optimizing a simple lower bound of the data likelihood. Our
experiments on six diverse generic graph datasets and two molecule datasets
show that our model achieves generation performance better than or comparable
to previous state-of-the-art methods, while enjoying fast generation speed.
Comment: 18 pages.
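A schematic of the reverse (generation) pass described above, with `denoiser` standing in for the paper's denoising network: each step samples the new node's type and its edges to the nodes generated so far.

```python
# Schematic autoregressive reverse generation: nodes are de-absorbed one
# at a time. `denoiser` is a hypothetical callable returning logits for
# the next node's type and for its edges to the t existing nodes.
import torch

@torch.no_grad()
def generate_graph(denoiser, num_nodes):
    node_types, edges = [], []
    for t in range(num_nodes):
        type_logits, edge_logits = denoiser(node_types, edges, t)
        new_type = torch.distributions.Categorical(logits=type_logits).sample()
        node_types.append(int(new_type))
        for u in range(t):  # connect the new node t to earlier nodes
            if torch.bernoulli(torch.sigmoid(edge_logits[u])):
                edges.append((u, t))
    return node_types, edges
```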
MUBen: Benchmarking the Uncertainty of Pre-Trained Models for Molecular Property Prediction
Large Transformer models pre-trained on massive unlabeled molecular data have
shown great success in predicting molecular properties. However, these models
can be prone to overfitting during fine-tuning, resulting in over-confident
predictions on test data that fall outside of the training distribution. To
address this issue, uncertainty quantification (UQ) methods can be used to
improve the models' calibration of predictions. Although many UQ approaches
exist, not all of them lead to improved performance. While some studies have
used UQ to improve molecular pre-trained models, the process of selecting
suitable backbone and UQ methods for reliable molecular uncertainty estimation
remains underexplored. To address this gap, we present MUBen, which evaluates
different combinations of backbone and UQ models to quantify their performance
for both property prediction and uncertainty estimation. By fine-tuning various
backbone molecular representation models using different molecular descriptors
as inputs with UQ methods from different categories, we critically assess the
influence of architectural decisions and training strategies. Our study offers
insights for selecting UQ and backbone models, which can facilitate research on
uncertainty-critical applications in fields such as materials science and drug
discovery.
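As a flavor of the calibration side of such a benchmark, here is a small expected calibration error (ECE) routine for binary classification; the binning scheme is illustrative rather than MUBen's exact protocol.

```python
# Expected calibration error (ECE): the gap between confidence and
# accuracy, averaged over confidence bins. Illustrative implementation.
import numpy as np

def expected_calibration_error(probs, labels, n_bins=10):
    """probs: predicted P(y=1); labels: 0/1 ground truth."""
    probs, labels = np.asarray(probs, float), np.asarray(labels, float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (probs > lo) & (probs <= hi)
        if mask.any():
            conf = probs[mask].mean()   # mean confidence in the bin
            acc = labels[mask].mean()   # empirical accuracy in the bin
            ece += mask.mean() * abs(acc - conf)
    return ece

print(expected_calibration_error([0.9, 0.8, 0.3, 0.1], [1, 1, 0, 0]))
```

A well-calibrated model keeps the per-bin confidence-accuracy gap, and hence the ECE, close to zero even on shifted test data.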
Energy-Efficient 60GHz Phased-Array Design for Multi-Gb/s Communication Systems
Recent advances in wireless technologies have enabled the rapid growth of mobile devices. Consequently, emerging applications for mobile devices have begun demanding data rates up to multiple Gb/s. Although advanced WiFi systems are approaching such data rates, the narrow bandwidth of the ISM band fundamentally limits the achievable data rate. The unlicensed 7GHz of bandwidth at the 60GHz band therefore provides an opportunity to implement these communication systems efficiently, with the potential to achieve >10Gb/s throughput. Besides the wider bandwidth, operating at a higher frequency theoretically offers a higher achievable signal-to-noise ratio in area-limited applications, because the maximum achievable antenna gain within a limited aperture increases with frequency and can be realized using phased-array techniques. This thesis therefore focuses on the design of 60GHz phased-array transceivers to support energy-efficient, high data-rate communication systems.

Despite the advantages of 60GHz, mobile applications often require low power consumption as well as low-cost implementation, making the design of 60GHz phased-array systems challenging. Taking the limited power budget into account, this research investigates the choice of the number of elements in phased-array transceivers and identifies overhead power as the bottleneck of energy efficiency. To reduce the overhead power in the transmitter, a new architecture using a fast start-up oscillator is proposed, which eliminates the need for an explicit modulator and 60GHz LO delivery. Measurements have shown that the transmitter efficiency is boosted by more than 2X. More importantly, the overhead power is reduced to just 2mW, making this architecture a good candidate for phased arrays with a large number of elements. The receiver, suffering from a similar overhead problem, unfortunately cannot share the same architecture. A different architecture that stacks the mixer on top of the LO generation is thus proposed to reduce the receiver's power consumption. This approach demonstrated a 2X power reduction in receiver overhead, and the resulting optimum number of receiver elements is close to 4.

Besides using CMOS technologies, on-chip antennas are also studied in order to further reduce the system cost. The slot-loop antenna is identified as a good candidate because its intrinsic ground plane eases integration with the rest of the circuitry. Although simulation shows a high efficiency, the planar nature of the on-chip antenna limits its coverage in end-fire directions. Antenna diversity is thus proposed to overcome this limitation by utilizing multiple drive points on the same antenna. Because the antenna is fully integrated on-chip, antenna diversity can be implemented without extra high-frequency I/Os, eliminating the loss that would otherwise be introduced.

Using the proposed transceiver architectures, a 4-element phased array with on-chip antennas was fabricated in TSMC's 65nm CMOS technology as a test vehicle. Consuming 50mW in the transmitter and 65mW in the receiver, this 10.4Gb/s phased array covers a range larger than 45cm in all directions, achieving a state-of-the-art energy efficiency of 11pJ/bit. The 29mW/element power consumption also demonstrates the lowest power for a single phased-array element.
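The headline efficiency figure follows directly from the stated numbers, as this quick check shows (energy per bit = total power / data rate):

```python
# Sanity check of the reported energy efficiency from the stated figures.
tx_power_w = 50e-3        # transmitter power: 50 mW
rx_power_w = 65e-3        # receiver power:    65 mW
data_rate_bps = 10.4e9    # 10.4 Gb/s

energy_per_bit = (tx_power_w + rx_power_w) / data_rate_bps
print(f"{energy_per_bit / 1e-12:.1f} pJ/bit")   # ~11.1 pJ/bit, matching 11pJ/bit
print(f"{(tx_power_w + rx_power_w) / 4 / 1e-3:.1f} mW/element")  # ~28.8, quoted as 29
```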
Momentum Stiefel Optimizer, with Applications to Suitably-Orthogonal Attention, and Optimal Transport
The problem of optimization on Stiefel manifold, i.e., minimizing functions
of (not necessarily square) matrices that satisfy orthogonality constraints,
has been extensively studied, partly due to rich machine learning applications.
Yet a new approach is proposed here, based for the first time on an interplay
between thoughtfully designed continuous and discrete dynamics. It leads to a
gradient-based optimizer with intrinsically added momentum. This method
exactly preserves the manifold structure but does not require the commonly
used projection or retraction, and thus has low computational costs compared
to existing algorithms. Its generalization to adaptive learning rates is also
demonstrated. Strong performance is observed in various practical tasks. For
instance, we discover that placing orthogonality constraints on the attention
heads of a trained-from-scratch Vision Transformer [Dosovitskiy et al. 2022]
can remarkably improve its performance when our optimizer is used, and that
it is better to make each head orthogonal within itself but not necessarily
to other heads. This optimizer also makes the useful notion of Projection
Robust Wasserstein Distance [Paty & Cuturi 2019][Lin et al. 2020] for
high-dimensional optimal transport even more effective.
Comment: Comments are welcome.
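For contrast with the retraction-free method proposed here, the sketch below implements the textbook baseline it improves on: a Riemannian gradient step on the Stiefel manifold, using the tangent-space projection (Euclidean metric) followed by a QR retraction. This is explicitly not the paper's algorithm.

```python
# Baseline Riemannian gradient step on the Stiefel manifold
# {X : X^T X = I}, with QR retraction. Shown for contrast only; the
# paper's optimizer avoids the retraction entirely.
import torch

def stiefel_step(X, euclid_grad, lr=1e-2):
    G = euclid_grad
    # Project the Euclidean gradient onto the tangent space at X:
    # rgrad = G - X * sym(X^T G).
    rgrad = G - X @ (X.T @ G + G.T @ X) / 2
    # Retract back onto the manifold via reduced QR.
    Q, R = torch.linalg.qr(X - lr * rgrad)
    return Q * torch.sign(torch.diagonal(R))  # fix QR sign ambiguity

X, _ = torch.linalg.qr(torch.randn(5, 3))    # a random Stiefel point
X_new = stiefel_step(X, torch.randn(5, 3))
print(torch.allclose(X_new.T @ X_new, torch.eye(3), atol=1e-6))  # True
```

The retraction (the QR factorization here) is exactly the per-step cost that the paper's momentum dynamics avoid.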
Be Wary of Rich Manipulators: Differences in the Performance of Different Corporate Structures in the Face of Hostile Takeovers
In light of the pressing concerns surrounding mergers and acquisitions (M&A) in recent times, this study addresses the question: “What sort of ownership structure is more likely to be acquired in bad faith (a hostile takeover)?” The disparities in company structures and the prospect of hostile takeovers are the primary topics discussed in this article. The research applies a regression model to a substantial number of domestic M&A cases and overseas M&A cases involving Chinese firms from the past several years. It finds that businesses with high equity dispersion, high equity liquidity, poor operational capability, small total equity, and no dual-class equity structure are more susceptible to hostile takeover. The findings are more reliable because the study considers not only domestic firms listed on the A-share market but also Chinese businesses listed on international markets. The findings can help owners enhance their management practices, optimize their equity structures, and gain experience in warding off hostile takeovers.
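The paper does not spell out its specification here, but the kind of regression it describes can be sketched as a logistic model of hostile-takeover likelihood on ownership-structure features; all column names and data below are hypothetical stand-ins.

```python
# Hypothetical sketch of a takeover-susceptibility regression; the
# study's actual variables, data, and model may differ.
import pandas as pd
from sklearn.linear_model import LogisticRegression

df = pd.DataFrame({
    "hostile":           [1, 0, 1, 0, 1, 0, 0, 1],   # 1 = hostile takeover
    "equity_dispersion": [0.8, 0.2, 0.7, 0.3, 0.9, 0.1, 0.4, 0.6],
    "equity_liquidity":  [0.7, 0.3, 0.8, 0.2, 0.6, 0.4, 0.3, 0.9],
    "log_total_equity":  [2.1, 5.3, 2.8, 6.0, 1.9, 5.5, 4.8, 2.4],
    "dual_class":        [0, 1, 0, 1, 0, 1, 1, 0],   # dual-class structure
})
X, y = df.drop(columns="hostile"), df["hostile"]
clf = LogisticRegression().fit(X, y)   # L2-regularized by default
print(dict(zip(X.columns, clf.coef_[0].round(3))))
```

On data matching the study's findings, dispersion and liquidity would carry positive coefficients while total equity and the dual-class indicator would carry negative ones.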