39 research outputs found

    AcTune: Uncertainty-aware Active Self-Training for Semi-Supervised Active Learning with Pretrained Language Models

    Full text link
    While pre-trained language model (PLM) fine-tuning has achieved strong performance in many NLP tasks, the fine-tuning stage can be still demanding in labeled data. Recent works have resorted to active fine-tuning to improve the label efficiency of PLM fine-tuning, but none of them investigate the potential of unlabeled data. We propose {\ours}, a new framework that leverages unlabeled data to improve the label efficiency of active PLM fine-tuning. AcTune switches between data annotation and model self-training based on uncertainty: it selects high-uncertainty unlabeled samples for active annotation and low-uncertainty ones for model self-training. Under this framework, we design (1) a region-aware sampling strategy that reduces redundancy when actively querying for annotations and (2) a momentum-based memory bank that dynamically aggregates the model's pseudo labels to suppress label noise in self-training. Experiments on 6 text classification datasets show that AcTune outperforms the strongest active learning and self-training baselines and improves the label efficiency of PLM fine-tuning by 56.2\% on average. Our implementation will be available at \url{https://github.com/yueyu1030/actune}.Comment: NAACL 2022 Main Conference (Code: https://github.com/yueyu1030/actune

    When Rigidity Hurts: Soft Consistency Regularization for Probabilistic Hierarchical Time Series Forecasting

    Full text link
    Probabilistic hierarchical time-series forecasting is an important variant of time-series forecasting, where the goal is to model and forecast multivariate time-series that have underlying hierarchical relations. Most methods focus on point predictions and do not provide well-calibrated probabilistic forecasts distributions. Recent state-of-art probabilistic forecasting methods also impose hierarchical relations on point predictions and samples of distribution which does not account for coherency of forecast distributions. Previous works also silently assume that datasets are always consistent with given hierarchical relations and do not adapt to real-world datasets that show deviation from this assumption. We close both these gap and propose PROFHiT, which is a fully probabilistic hierarchical forecasting model that jointly models forecast distribution of entire hierarchy. PROFHiT uses a flexible probabilistic Bayesian approach and introduces a novel Distributional Coherency regularization to learn from hierarchical relations for entire forecast distribution that enables robust and calibrated forecasts as well as adapt to datasets of varying hierarchical consistency. On evaluating PROFHiT over wide range of datasets, we observed 41-88% better performance in accuracy and significantly better calibration. Due to modeling the coherency over full distribution, we observed that PROFHiT can robustly provide reliable forecasts even if up to 10% of input time-series data is missing where other methods' performance severely degrade by over 70%.Comment: Accepted at KDD 202

    End-to-End Stochastic Optimization with Energy-Based Model

    Full text link
    Decision-focused learning (DFL) was recently proposed for stochastic optimization problems that involve unknown parameters. By integrating predictive modeling with an implicitly differentiable optimization layer, DFL has shown superior performance to the standard two-stage predict-then-optimize pipeline. However, most existing DFL methods are only applicable to convex problems or a subset of nonconvex problems that can be easily relaxed to convex ones. Further, they can be inefficient in training due to the requirement of solving and differentiating through the optimization problem in every training iteration. We propose SO-EBM, a general and efficient DFL method for stochastic optimization using energy-based models. Instead of relying on KKT conditions to induce an implicit optimization layer, SO-EBM explicitly parameterizes the original optimization problem using a differentiable optimization layer based on energy functions. To better approximate the optimization landscape, we propose a coupled training objective that uses a maximum likelihood loss to capture the optimum location and a distribution-based regularizer to capture the overall energy landscape. Finally, we propose an efficient training procedure for SO-EBM with a self-normalized importance sampler based on a Gaussian mixture proposal. We evaluate SO-EBM in three applications: power scheduling, COVID-19 resource allocation, and non-convex adversarial security game, demonstrating the effectiveness and efficiency of SO-EBM.Comment: NeurIPS 2022 Ora

    Autoregressive Diffusion Model for Graph Generation

    Full text link
    Diffusion-based graph generative models have recently obtained promising results for graph generation. However, existing diffusion-based graph generative models are mostly one-shot generative models that apply Gaussian diffusion in the dequantized adjacency matrix space. Such a strategy can suffer from difficulty in model training, slow sampling speed, and incapability of incorporating constraints. We propose an \emph{autoregressive diffusion} model for graph generation. Unlike existing methods, we define a node-absorbing diffusion process that operates directly in the discrete graph space. For forward diffusion, we design a \emph{diffusion ordering network}, which learns a data-dependent node absorbing ordering from graph topology. For reverse generation, we design a \emph{denoising network} that uses the reverse node ordering to efficiently reconstruct the graph by predicting the node type of the new node and its edges with previously denoised nodes at a time. Based on the permutation invariance of graph, we show that the two networks can be jointly trained by optimizing a simple lower bound of data likelihood. Our experiments on six diverse generic graph datasets and two molecule datasets show that our model achieves better or comparable generation performance with previous state-of-the-art, and meanwhile enjoys fast generation speed.Comment: 18 page

    MUBen: Benchmarking the Uncertainty of Pre-Trained Models for Molecular Property Prediction

    Full text link
    Large Transformer models pre-trained on massive unlabeled molecular data have shown great success in predicting molecular properties. However, these models can be prone to overfitting during fine-tuning, resulting in over-confident predictions on test data that fall outside of the training distribution. To address this issue, uncertainty quantification (UQ) methods can be used to improve the models' calibration of predictions. Although many UQ approaches exist, not all of them lead to improved performance. While some studies have used UQ to improve molecular pre-trained models, the process of selecting suitable backbone and UQ methods for reliable molecular uncertainty estimation remains underexplored. To address this gap, we present MUBen, which evaluates different combinations of backbone and UQ models to quantify their performance for both property prediction and uncertainty estimation. By fine-tuning various backbone molecular representation models using different molecular descriptors as inputs with UQ methods from different categories, we critically assess the influence of architectural decisions and training strategies. Our study offers insights for selecting UQ and backbone models, which can facilitate research on uncertainty-critical applications in fields such as materials science and drug discovery

    Energy-Efficient 60GHz Phased-Array Design for Multi-Gb/s Communication Systems

    No full text
    Recent advance in wireless technologies has enabled rapid growth of mobile devices. Consequently, emerging applications for mobile devices have begun demanding data rates up to multiple Gb/s. Although advanced WiFi systems are approaching such data rates, the narrow bandwidth at ISM band fundamentally limits the achievable data-rate. Therefore, the unlicensed 7GHz of bandwidth at 60GHz band provides an opportunity to efficiently implement these communication systems with a potential to achieve >10Gb/s throughput. Besides the wider bandwidth, operating at higher frequency theoretically has higher achievable signal-to-noise ratio in area limited applications. This is because the maximum achievable antenna gain within limited aperture increases with frequency and it can be achieved using phased-array technique. This thesis therefore focuses on the design of 60GHz phased-array transceivers to support energy-efficient high data-rate communication systems.Despite the advantages of 60GHz, mobile applications often require low power consumption as well as low cost implementation, making the design of 60GHz phased-array systems challenging. Taking into account the limited power budget, this research investigates the design choices of the number of elements in phased-array transceivers, and identifies that the overhead power is the bottleneck of energy efficiency. In order to reduce the overhead power in the transmitter, a new architecture using a fast start-up oscillator is proposed, which eliminates the need of explicit modulator and 60GHz LO delivery. Measurements has shown that the transmitter efficiency is boosted by more than 2X. More importantly, the overhead power is significantly reduced down to 2mW, making this architecture a good candidate for large number phased-array. On the other hand, suffering from the similar overhead problem, the receiver unfortunately could not share the same architecture. A different architecture that stacks the mixer on top of LO generation is thus proposed to reduce the power consumption in the receiver. This approach demonstrated a 2X power reduction in receiver overhead, and the resulted optimum number of receiver elements is close to 4.Besides using CMOS technologies, on-chip antenna is also studied in order to further reduce the system cost. Slot-loop antenna is identified as a good candidate because that its intrinsic ground plane eases the integration with the rest of circuitry. Although the simulation shows an efficiency as high as 30%30\%, the planar nature of the on-chip antenna limits its coverage in end-fire directions. Antenna diversity is thus proposed to overcome this limitation by utilizing multiple drive points on the same antenna. Because the antenna is fully integrated on-chip, antenna diversity can be implemented without extra high frequency I/Os, eliminating the loss that would be introduced otherwise.Using the proposed transceiver architectures, a 4-element phased-array with on-chip antennas was fabricated on TSMC's 65nm CMOS technology as a test vehicle. Consuming 50mW in the transmitter and 65mW in the receiver, this 10.4Gb/s phased-array covers a range larger than 45cm in all directions. This achieves a state-of-art energy-efficiency of 11pJ/bit. The 29mW/element power consumption also demonstrates the lowest power of a single phased-array element

    Momentum Stiefel Optimizer, with Applications to Suitably-Orthogonal Attention, and Optimal Transport

    Full text link
    The problem of optimization on Stiefel manifold, i.e., minimizing functions of (not necessarily square) matrices that satisfy orthogonality constraints, has been extensively studied, partly due to rich machine learning applications. Yet, a new approach is proposed based on, for the first time, an interplay between thoughtfully designed continuous and discrete dynamics. It leads to a gradient-based optimizer with intrinsically added momentum. This method exactly preserves the manifold structure but does not require commonly used projection or retraction, and thus having low computational costs when compared to existing algorithms. Its generalization to adaptive learning rates is also demonstrated. Pleasant performances are observed in various practical tasks. For instance, we discover that placing orthogonal constraints on attention heads of trained-from-scratch Vision Transformer [Dosovitskiy et al. 2022] could remarkably improve its performance, when our optimizer is used, and it is better that each head is made orthogonal within itself but not necessarily to other heads. This optimizer also makes the useful notion of Projection Robust Wasserstein Distance [Paty & Cuturi 2019][Lin et al. 2020] for high-dim. optimal transport even more effective.Comment: Comments are welcom

    Be Wary of Rich Manipulators: Differences in the Performance of Different Corporate Structures in the Face of Hostile Takeovers

    No full text
    In light of the pressing concerns surrounding mergers and acquisitions (M&A) in recent times, the question “What sort of ownership structure is more likely to be bought in bad faith (hostile takeover)?” is addressed in this study. The disparities in company structures and the prospect of hostile takeovers are the primary topics discussed in this article. The research applies a regression model to the analysis of a substantial number of domestic M&A cases and overseas M&A cases involving Chinese firms that have occurred within the past several years. It has been discovered that businesses that have a high equity dispersion, high equity liquidity, poor operational capability of the firm, small total equity, and no dual equity structure are more susceptible to being taken over by an adversary. The findings of this study are more reliable because, in addition to taking into account local firms listed on the A-share market, it also takes into account Chinese businesses that are listed on international markets. The findings of the study can assist owners in enhancing their management practices, optimizing their equity structures, and gaining experience in warding off hostile takeovers
    corecore