25,431 research outputs found

    H-TSP: Hierarchically Solving the Large-Scale Travelling Salesman Problem

    We propose an end-to-end learning framework based on hierarchical reinforcement learning, called H-TSP, for addressing the large-scale Travelling Salesman Problem (TSP). H-TSP constructs a solution of a TSP instance from scratch using two components: the upper-level policy chooses a small subset of nodes (up to 200 in our experiment) from all nodes that are to be traversed, while the lower-level policy takes the chosen nodes as input and outputs a tour connecting them to the existing partial route (initially containing only the depot). After jointly training the upper-level and lower-level policies, our approach can directly generate solutions for the given TSP instances without relying on any time-consuming search procedures. To demonstrate the effectiveness of the proposed approach, we have conducted extensive experiments on randomly generated TSP instances with different numbers of nodes. We show that H-TSP achieves results comparable to SOTA search-based approaches (gap 3.42% vs. 7.32%) and, more importantly, reduces time consumption by up to two orders of magnitude (3.32s vs. 395.85s). To the best of our knowledge, H-TSP is the first end-to-end deep reinforcement learning approach that can scale to TSP instances of up to 10000 nodes. Although there are still gaps to SOTA results with respect to solution quality, we believe that H-TSP will be useful for practical applications, particularly those that are time-sensitive, e.g., on-call routing and ride-hailing services. Comment: Accepted by AAAI 2023, February 202

    First constraints of dense molecular gas at z~7.5 from the quasar Pōniuā'ena

    We report the detection of CO(6-5) and CO(7-6) and their underlying continua from the host galaxy of quasar J100758.264+211529.207 (Pōniuā'ena) at z=7.5419, obtained with the NOrthern Extended Millimeter Array (NOEMA). Pōniuā'ena belongs to the HYPerluminous quasars at the Epoch of ReionizatION (HYPERION) sample of 17 $z>6$ quasars selected to be powered by supermassive black holes (SMBH) which experienced the fastest mass growth in the first Gyr of the Universe. The one reported here is the highest-redshift measurement of the cold and dense molecular gas to date. The host galaxy is unresolved and the line luminosity implies a molecular reservoir of $\rm M(H_2)=(2.2\pm0.2)\times 10^{10}$ $\rm M_\odot$, assuming a CO spectral line energy distribution typical of high-redshift quasars and a conversion factor $\alpha=0.8$ $\rm M_{\odot} (K\,km \, s^{-1} \,pc^{2})^{-1}$. We model the cold dust spectral energy distribution (SED) to derive a dust mass of $M_{\rm dust} =(2.1\pm 0.7)\times 10^8$ $\rm M_\odot$, and thus a gas-to-dust ratio of $\sim100$. Both the gas and dust mass are not dissimilar from the reservoir found for luminous quasars at $z\sim6$. We use the CO detection to derive an estimate of the cosmic mass density of $\rm H_2$, $\Omega_{H_2} \simeq 1.31 \times 10^{-5}$. This value is in line with the general trend suggested by literature estimates at $z < 7$ and agrees fairly well with the latest theoretical expectations of non-equilibrium molecular-chemistry cosmological simulations of cold gas at early times. Comment: Submitted to ApJ Letter
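
    As a quick consistency check on the figures quoted in this abstract, the gas-to-dust ratio follows directly from the two masses, and the quoted conversion factor implies a CO(1-0) line luminosity. The symbol $L'_{\rm CO(1-0)}$ is introduced here for illustration and does not appear in the abstract; the relation $M(\mathrm{H_2}) = \alpha\, L'_{\rm CO(1-0)}$ is the usual definition of the conversion factor. Only central values are used, with no propagation of the quoted uncertainties.

```latex
\[
\frac{M(\mathrm{H_2})}{M_{\rm dust}}
  = \frac{2.2\times10^{10}\,\mathrm{M_\odot}}{2.1\times10^{8}\,\mathrm{M_\odot}}
  \approx 105 \sim 100,
\qquad
L'_{\rm CO(1-0)} = \frac{M(\mathrm{H_2})}{\alpha}
  = \frac{2.2\times10^{10}}{0.8}
  \approx 2.75\times10^{10}\ \mathrm{K\,km\,s^{-1}\,pc^{2}}.
\]
```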

    Sensitivity analysis for ReaxFF reparameterization using the Hilbert-Schmidt independence criterion

    We apply a global sensitivity method, the Hilbert-Schmidt independence criterion (HSIC), to the reparameterization of a Zn/S/H ReaxFF force field to identify the most appropriate parameters for reparameterization. Parameter selection remains a challenge in this context, as high-dimensional optimizations are prone to overfitting and take a long time, but selecting too few parameters leads to poor-quality force fields. We show that the HSIC correctly and quickly identifies the most sensitive parameters, and that optimizations done using a small number of sensitive parameters outperform those done using a higher-dimensional reasonable-user parameter selection. Optimizations using only sensitive parameters: 1) converge faster, 2) have loss values comparable to those found with the naive selection, 3) have similar accuracy in validation tests, and 4) do not suffer from problems of overfitting. We demonstrate that an HSIC global sensitivity analysis is a cheap optimization pre-processing step with both qualitative and quantitative benefits, which can substantially simplify and speed up ReaxFF reparameterizations. Comment: author accepted manuscrip

    MERMAIDE: Learning to Align Learners using Model-Based Meta-Learning

    We study how a principal can efficiently and effectively intervene on the rewards of a previously unseen learning agent in order to induce desirable outcomes. This is relevant to many real-world settings like auctions or taxation, where the principal may know neither the learning behavior nor the rewards of real people. Moreover, the principal should be few-shot adaptable and minimize the number of interventions, because interventions are often costly. We introduce MERMAIDE, a model-based meta-learning framework to train a principal that can quickly adapt to out-of-distribution agents with different learning strategies and reward functions. We validate this approach step by step. First, in a Stackelberg setting with a best-response agent, we show that meta-learning enables quick convergence to the theoretically known Stackelberg equilibrium at test time, although noisy observations severely increase the sample complexity. We then show that our model-based meta-learning approach is cost-effective in intervening on bandit agents with unseen explore-exploit strategies. Finally, we outperform baselines that use either meta-learning or agent behavior modeling, in both $0$-shot and $K=1$-shot settings with partial agent information.

    Reinforcement Learning Based Minimum State-flipped Control for the Reachability of Boolean Control Networks

    To realize reachability while reducing the control costs of Boolean Control Networks (BCNs) with state-flipped control, we propose a reinforcement learning based method that obtains flip kernels and the optimal policy achieving reachability with minimal flipping actions. The proposed method is model-free and has low computational complexity. In particular, Q-learning (QL), fast QL, and small memory QL are proposed to find flip kernels. Fast QL and small memory QL are two novel algorithms. Specifically, fast QL, namely QL combined with transfer learning and special initial states, is more efficient, and small memory QL is applicable to large-scale systems. Meanwhile, we present a novel reward setting under which the optimal policy achieving reachability with minimal flipping actions is the one with the highest return. Then, to obtain the optimal policy, we propose QL, and fast small memory QL for large-scale systems. Specifically, building on the small memory QL mentioned above, fast small memory QL uses a changeable reward setting to speed up learning while ensuring the optimality of the policy. For parameter settings, we give some system properties for reference. Finally, two examples, a small-scale system and a large-scale one, are considered to verify the proposed method.

    Self-Supervised Learning to Prove Equivalence Between Straight-Line Programs via Rewrite Rules

    We target the problem of automatically synthesizing proofs of semantic equivalence between two programs made of sequences of statements. We represent programs using abstract syntax trees (AST), where a given set of semantics-preserving rewrite rules can be applied on a specific AST pattern to generate a transformed and semantically equivalent program. In our system, two programs are equivalent if there exists a sequence of applications of these rewrite rules that rewrites one program into the other. We propose a neural network architecture based on a transformer model to generate proofs of equivalence between program pairs. The system outputs a sequence of rewrites, and the validity of the sequence is checked simply by verifying that it can be applied. If no valid sequence is produced by the neural network, the system reports the programs as non-equivalent, ensuring by design that no programs may be incorrectly reported as equivalent. Our system is fully implemented for a given grammar which can represent straight-line programs with function calls and multiple types. To efficiently train the system to generate such sequences, we develop an original incremental training technique, named self-supervised sample selection. We extensively study the effectiveness of this novel training approach on proofs of increasing complexity and length. Our system, S4Eq, achieves 97% proof success on a curated dataset of 10,000 pairs of equivalent programs. Comment: 30 pages including appendi

    Offline and Online Models for Learning Pairwise Relations in Data

    Pairwise relations between data points are essential for numerous machine learning algorithms. Many representation learning methods consider pairwise relations to identify the latent features and patterns in the data. This thesis investigates the learning of pairwise relations from two different perspectives: offline learning and online learning. The first part of the thesis focuses on offline learning, starting with an investigation of the performance modeling of a synchronization method in concurrent programming using a Markov chain whose state transition matrix models pairwise relations between the cores involved in a computer process. Then the thesis focuses on a particular pairwise distance measure, the minimax distance, and explores memory-efficient approaches to computing this distance by proposing a hierarchical representation of the data with a memory requirement linear in the number of data points, from which the exact pairwise minimax distances can be derived in a memory-efficient manner. Then, a memory-efficient sampling method is proposed that follows the aforementioned hierarchical representation of the data and samples the data points in a way that maximally preserves the minimax distances between all data points. Finally, the thesis proposes a practical non-parametric clustering of vehicle motion trajectories to annotate traffic scenarios based on transitive relations between trajectories in an embedded space. The second part of the thesis takes an online learning perspective and starts by presenting an online learning method for identifying bottlenecks in a road network by extracting the minimax path, where bottlenecks are considered to be the road segments with the highest cost, e.g., in the sense of travel time. Inspired by real-world road networks, the thesis assumes a stochastic traffic environment in which the road-specific probability distribution of travel time is unknown. Therefore, the method needs to learn the parameters of the probability distribution through observations, which it does by modeling the bottleneck identification task as a combinatorial semi-bandit problem. The proposed approach takes prior knowledge into account and follows a Bayesian approach to update the parameters. Moreover, it develops a combinatorial variant of Thompson Sampling and derives an upper bound for the corresponding Bayesian regret. Furthermore, the thesis proposes an approximate algorithm to address the respective computational intractability issue. Finally, the thesis considers contextual information of road network segments by extending the proposed model to a contextual combinatorial semi-bandit framework, and investigates and develops various algorithms for this contextual combinatorial setting.
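
    The minimax distance named in this abstract can be illustrated with a small sketch. The minimax distance between two points is the smallest achievable value of the largest edge weight over all paths joining them. The code below uses a generic Kruskal-style sweep and is not the thesis's memory-efficient algorithm: it enumerates all pairwise edges, so it needs quadratic memory. The function name `minimax_distances` and its arguments are hypothetical, chosen for illustration only.

```python
from itertools import combinations


def minimax_distances(points, dist):
    """Return {(i, j): minimax distance} for all index pairs i < j.

    Edges are processed by increasing weight. When an edge first joins two
    components, its weight is the minimax distance between every pair of
    points split across those two components.
    """
    n = len(points)
    edges = sorted(
        (dist(points[i], points[j]), i, j)
        for i, j in combinations(range(n), 2)
    )
    members = {i: [i] for i in range(n)}  # component root -> member indices
    root = list(range(n))                 # point index -> component root
    result = {}
    for w, i, j in edges:
        ri, rj = root[i], root[j]
        if ri == rj:
            continue  # already connected by lighter edges
        for a in members[ri]:
            for b in members[rj]:
                result[(min(a, b), max(a, b))] = w
        # Merge the smaller component into the larger one.
        if len(members[ri]) < len(members[rj]):
            ri, rj = rj, ri
        for b in members[rj]:
            root[b] = ri
        members[ri].extend(members.pop(rj))
    return result


if __name__ == "__main__":
    pts = [0.0, 1.0, 2.0, 10.0]
    d = minimax_distances(pts, lambda a, b: abs(a - b))
    # Points 0 and 2 are 2.0 apart directly, but the path via point 1
    # only uses hops of length 1.0.
    print(d[(0, 2)], d[(0, 3)])
```

    In this sketch only n - 1 edges ever trigger a merge, which gives some intuition for why a hierarchy of merges with linear size can encode all pairwise minimax distances.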

    Advancing Model Pruning via Bi-level Optimization

    The deployment constraints in practical applications necessitate the pruning of large-scale deep learning models, i.e., promoting their weight sparsity. As illustrated by the Lottery Ticket Hypothesis (LTH), pruning also has the potential of improving their generalization ability. At the core of LTH, iterative magnitude pruning (IMP) is the predominant pruning method to successfully find 'winning tickets'. Yet, the computation cost of IMP grows prohibitively as the targeted pruning ratio increases. To reduce the computation overhead, various efficient 'one-shot' pruning methods have been developed, but these schemes are usually unable to find winning tickets as good as IMP. This raises the question of how to close the gap between pruning accuracy and pruning efficiency. To tackle it, we pursue the algorithmic advancement of model pruning. Specifically, we formulate the pruning problem from a fresh and novel viewpoint, bi-level optimization (BLO). We show that the BLO interpretation provides a technically grounded optimization base for an efficient implementation of the pruning-retraining learning paradigm used in IMP. We also show that the proposed bi-level optimization-oriented pruning method (termed BiP) is a special class of BLO problems with a bi-linear problem structure. By leveraging such bi-linearity, we theoretically show that BiP can be solved as easily as first-order optimization, thus inheriting the computation efficiency. Through extensive experiments on both structured and unstructured pruning with 5 model architectures and 4 data sets, we demonstrate that BiP can find better winning tickets than IMP in most cases, and is computationally as efficient as the one-shot pruning schemes, demonstrating 2-7 times speedup over IMP for the same level of model accuracy and sparsity. Comment: Thirty-sixth Conference on Neural Information Processing Systems (NeurIPS 2022)
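
    To make the contrast between one-shot and iterative magnitude pruning concrete, here is a minimal sketch on a flat list of weights. It does not implement BiP, and it simplifies IMP (for example, it omits any weight rewinding). The `retrain` callback, the per-round pruning `rate`, and all function names are assumptions introduced for illustration.

```python
import math


def magnitude_mask(weights, sparsity):
    """One-shot pruning: 0/1 mask zeroing the `sparsity` fraction of smallest |w|."""
    n_prune = int(len(weights) * sparsity)
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    mask = [1] * len(weights)
    for i in order[:n_prune]:
        mask[i] = 0
    return mask


def iterative_magnitude_pruning(weights, retrain, rounds, rate):
    """IMP-style loop: each round prunes a fraction `rate` of the surviving
    weights by magnitude, then calls the (hypothetical) `retrain` callback."""
    mask = [1] * len(weights)
    for _ in range(rounds):
        alive = [i for i, m in enumerate(mask) if m]
        n_prune = int(len(alive) * rate)
        alive.sort(key=lambda i: abs(weights[i]))
        for i in alive[:n_prune]:
            mask[i] = 0
        weights = retrain(weights, mask)  # one full retraining per round
    return weights, mask


def rounds_needed(target_sparsity, rate):
    """Number of prune-retrain rounds until (1 - rate)**k <= 1 - target_sparsity."""
    return math.ceil(math.log(1.0 - target_sparsity) / math.log(1.0 - rate))


if __name__ == "__main__":
    # Stand-in for retraining: just apply the mask to the weights.
    apply_mask = lambda w, m: [wi * mi for wi, mi in zip(w, m)]
    print(magnitude_mask([0.5, -0.1, 2.0, 0.05], 0.5))
    print(iterative_magnitude_pruning([0.5, -0.1, 2.0, 0.05], apply_mask, 2, 0.5))
    print(rounds_needed(0.9, 0.2))
```

    Under these assumptions, every round costs one retraining run, and `rounds_needed` grows as the target sparsity approaches 1, which is one way to read the abstract's remark that the cost of IMP grows with the targeted pruning ratio.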

    Reinforcement Learning-based User-centric Handover Decision-making in 5G Vehicular Networks

    The advancement of 5G technologies and Vehicular Networks opens a new paradigm for Intelligent Transportation Systems (ITS) in safety and infotainment services in urban and highway scenarios. Connected vehicles are vital for enabling massive data sharing and supporting such services. Consequently, a stable connection is compulsory to transmit data across the network successfully. The new 5G technology introduces more bandwidth, stability, and reliability, but it has a short communication range and therefore suffers from more frequent handovers and connection drops. The shift from the base station-centric view to the user-centric view helps to cope with the smaller communication range and ultra-density of 5G networks. In this thesis, we propose a series of strategies to improve connection stability through efficient handover decision-making. First, we present a modified probabilistic approach, M-FiVH, aimed at reducing 5G handovers and enhancing network stability. Next, an adaptive learning approach employs Connectivity-oriented SARSA Reinforcement Learning (CO-SRL) for user-centric Virtual Cell (VC) management to enable efficient handover (HO) decisions. Following that, a user-centric Factor-distinct SARSA Reinforcement Learning (FD-SRL) approach combines time series data-oriented LSTM and adaptive SRL for VC and HO management by considering both historical and real-time data. The direction of vehicular movement, mobility, network load, road traffic situation, and signal strength from cellular transmission towers vary over time and cannot always be predicted. Our proposed approaches maintain stable connections by reducing the number of HOs through appropriate VC sizing and HO management. Realistic simulations show that M-FiVH, CO-SRL, and FD-SRL successfully reduce the number of HOs and the average cumulative HO time. We provide an analysis and comparison of several approaches and demonstrate that our proposed approaches perform better in terms of network connectivity.

    Neural Architecture Search: Insights from 1000 Papers

    In the past decade, advances in deep learning have resulted in breakthroughs in a variety of areas, including computer vision, natural language understanding, speech recognition, and reinforcement learning. Specialized, high-performing neural architectures are crucial to the success of deep learning in these areas. Neural architecture search (NAS), the process of automating the design of neural architectures for a given task, is an inevitable next step in automating machine learning and has already outpaced the best human-designed architectures on many tasks. In the past few years, research in NAS has been progressing rapidly, with over 1000 papers released since 2020 (Deng and Lindauer, 2021). In this survey, we provide an organized and comprehensive guide to neural architecture search. We give a taxonomy of search spaces, algorithms, and speedup techniques, and we discuss resources such as benchmarks, best practices, other surveys, and open-source libraries.