12 research outputs found

    Least singular value and condition number of a square random matrix with i.i.d. rows

    We consider a square random matrix with i.i.d. rows of arbitrary distribution and prove that, for any given dimension, the probability that the least singular value lies in $[0, \epsilon)$ is at least of order $\epsilon$. This allows us to generalize a result on the expectation of the condition number that was previously proved for centered Gaussian i.i.d. entries: this expectation is always infinite. Moreover, we obtain additional results for some well-known random matrix ensembles, in particular for the isotropic log-concave case, which is shown to be the best behaved in terms of conditioning.
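A rough numerical illustration of the statement above, not taken from the paper: the sketch below estimates by Monte Carlo how often the least singular value of a square matrix falls below a threshold. The Gaussian entries, the function name `least_singular_value_fraction`, and the parameters `dim`, `eps`, and `trials` are assumptions chosen for illustration only.

```python
import numpy as np

def least_singular_value_fraction(dim, eps, trials=2000, seed=0):
    """Fraction of sampled dim x dim matrices whose least singular value is below eps."""
    rng = np.random.default_rng(seed)
    # Assumption: i.i.d. standard Gaussian entries, which in particular gives i.i.d. rows.
    mats = rng.standard_normal((trials, dim, dim))
    # Singular values come back in descending order, so the last one is the smallest.
    s_min = np.linalg.svd(mats, compute_uv=False)[:, -1]
    return float(np.mean(s_min < eps))

if __name__ == "__main__":
    for eps in (0.05, 0.1, 0.2):
        print(eps, least_singular_value_fraction(dim=5, eps=eps))
```

With a fixed seed the same matrices are reused for every threshold, so the estimated fraction can only grow as `eps` grows. Comparing the printed fractions with `eps` gives an informal feel for the "order $\epsilon$" behaviour, without verifying it.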

    Tight Performance Guarantees of Imitator Policies with Continuous Actions

    Behavioral Cloning (BC) aims at learning a policy that mimics the behavior demonstrated by an expert. The current theoretical understanding of BC is limited to the case of finite actions. In this paper, we study BC with the goal of providing theoretical guarantees on the performance of the imitator policy in the case of continuous actions. We start by deriving a novel bound on the performance gap based on the Wasserstein distance, applicable to continuous-action experts and holding under the assumption that the value function is Lipschitz continuous. Since this latter condition is hardly fulfilled in practice, even for Lipschitz Markov Decision Processes and policies, we propose a relaxed setting, proving that the value function is always Hölder continuous. This result is of independent interest and yields a general bound on the performance of the imitator policy in BC. Finally, we analyze noise injection, a common practice in which the expert's action is executed in the environment after the application of a noise kernel. We show that this practice allows deriving stronger performance guarantees, at the price of a bias due to the noise addition.

    Autoregressive Bandits

    Autoregressive processes naturally arise in a large variety of real-world scenarios, including stock markets, sales forecasting, weather prediction, advertising, and pricing. When addressing a sequential decision-making problem in such a context, the temporal dependence between consecutive observations should be properly accounted for in order to converge to the optimal decision policy. In this work, we propose a novel online learning setting, named Autoregressive Bandits (ARBs), in which the observed reward follows an autoregressive process of order $k$, whose parameters depend on the action the agent chooses within a finite set of $n$ actions. Then, we devise an optimistic regret minimization algorithm, AutoRegressive Upper Confidence Bounds (AR-UCB), that suffers regret of order $\widetilde{\mathcal{O}} \left( \frac{(k+1)^{3/2}\sqrt{nT}}{(1-\Gamma)^2} \right)$, where $T$ is the optimization horizon and $\Gamma < 1$ is an index of the stability of the system. Finally, we present a numerical validation in several synthetic settings and one real-world setting, in comparison with general-purpose and specific-purpose bandit baselines, showing the advantages of the proposed approach.
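To make the setting described above concrete, here is a minimal simulation sketch of a reward that follows an autoregressive process whose parameters depend on the chosen action. It is not the AR-UCB algorithm and is not taken from the paper: the linear form of the recursion, the parameter layout in `params`, the Gaussian noise, and the function name `simulate_ar_rewards` are all assumptions made for illustration.

```python
import numpy as np

def simulate_ar_rewards(params, actions, noise_std=0.1, seed=0):
    """Simulate x_t = g0[a_t] + sum_i gi[a_t] * x_{t-i} + noise (assumed form).

    params:  array of shape (n_actions, k + 1); column 0 holds the offset and
             columns 1..k hold the autoregressive coefficients (hypothetical layout).
    actions: sequence of action indices, one per time step.
    """
    params = np.asarray(params, dtype=float)
    k = params.shape[1] - 1
    rng = np.random.default_rng(seed)
    history = np.zeros(k)  # most recent reward first; assumed to start at zero
    rewards = []
    for a in actions:
        x = params[a, 0] + params[a, 1:] @ history + rng.normal(0.0, noise_std)
        rewards.append(x)
        if k > 0:
            # Shift the window: the newest reward enters, the oldest drops out.
            history = np.concatenate(([x], history[:-1]))
    return np.array(rewards)

if __name__ == "__main__":
    # Two actions, order k = 1; coefficients below 1 keep the process stable.
    params = [[1.0, 0.5], [0.5, 0.8]]
    print(simulate_ar_rewards(params, actions=[0] * 5 + [1] * 5))
```

In this toy model, playing one action forever with offset 1.0 and coefficient 0.5 drives the noiseless reward toward 1.0 / (1 - 0.5) = 2.0, which shows why the current choice affects future rewards and why the temporal dependence matters for the decision policy.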

    Delayed Reinforcement Learning by Imitation

    When the agent's observations or interactions are delayed, classic reinforcement learning tools usually fail. In this paper, we propose a simple yet new and efficient solution to this problem. We assume that, in the undelayed environment, an efficient policy is known or can be easily learned, but the task may suffer from delays in practice, and we thus want to take them into account. We present a novel algorithm, Delayed Imitation with Dataset Aggregation (DIDA), which builds upon imitation learning methods to learn how to act in a delayed environment from undelayed demonstrations. We provide a theoretical analysis of the approach that guides the practical design of DIDA. These results are also of general interest in the delayed reinforcement learning literature, as they provide bounds on the performance gap between delayed and undelayed tasks under smoothness conditions. We show empirically that DIDA achieves high performance with remarkable sample efficiency on a variety of tasks, including robotic locomotion, classic control, and trading.

    Development of a new extraction technique and HPLC method for the analysis of non-psychoactive cannabinoids in fibre-type Cannabis sativa L. (hemp)

    The present work was aimed at the development and validation of a new, efficient and reliable technique for the analysis of the main non-psychoactive cannabinoids in fibre-type Cannabis sativa L. (hemp) inflorescences belonging to different varieties. This study was designed to identify samples with a high content of bioactive compounds, with a view to underscoring the importance of quality control in derived products as well. Different extraction methods, including dynamic maceration (DM), ultrasound-assisted extraction (UAE), microwave-assisted extraction (MAE) and supercritical-fluid extraction (SFE), were applied and compared in order to obtain a high yield of the target analytes from hemp. Dynamic maceration for 45 min with ethanol (EtOH) at room temperature proved to be the most suitable technique for the extraction of cannabinoids from hemp samples. The analysis of the target analytes in hemp extracts was carried out by developing a new reversed-phase high-performance liquid chromatography (HPLC) method coupled with diode array (UV/DAD) and electrospray ionization-mass spectrometry (ESI-MS) detection, using an ion trap mass analyser. An Ascentis Express C18 column (150 mm × 3.0 mm I.D., 2.7 μm) was selected for the HPLC analysis, with a mobile phase composed of 0.1% formic acid in both water and acetonitrile, under gradient elution. The application of the fused-core technology allowed us to obtain a significant improvement in HPLC performance compared with that of conventional particulate stationary phases, with a shorter analysis time and a remarkable reduction in solvent usage. The analytical method optimized in this study was fully validated to show compliance with international requirements. Furthermore, it was applied to the characterization of nine hemp samples and six hemp-based pharmaceutical products. As such, it was demonstrated to be a very useful tool for the analysis of cannabinoids in both the plant material and its derivatives for pharmaceutical and nutraceutical applications.

    When Love Just Ends: An Investigation of the Relationship Between Dysfunctional Behaviors, Attachment Styles, Gender, and Education Shortly After a Relationship Dissolution

    Much is known about the long-term consequences of separation and divorce, whereas there is a paucity of studies about the short-term consequences of such experiences. This study investigates the adoption of dysfunctional behaviors (e.g., insistent telephone calls and text messages, verbal threats, and sending unwanted objects) shortly after a relationship dissolution. A total of 136 participants who reported having been left by their former partner in the previous 6 months were included in this study (females: n = 84; males: n = 52; mean age = 30.38; SD = 4.19). Attachment styles were evaluated as explanatory variables when facing a relationship dissolution, in connection with (1) a set of demographic variables (i.e., gender, education, and current marital/relationship status), (2) dysfunctional behaviors, and (3) the motivations underlying those behaviors. Results showed that a secure or dismissing attachment style, a higher education, and a currently married (but awaiting separation) status were protective factors against adopting such dysfunctional behaviors, while preoccupied and fearful-avoidant subjects, especially females, tended to adopt dysfunctional behaviors (i.e., communication attempts and defamation) and reported fear of abandonment and need for attention as underlying motivations. Future research on longitudinal aspects of relationship dissolution processes is required to gain deeper insights into this phenomenon. This study sheds light on the relationship between adult attachment styles and the motivations behind the adoption of dysfunctional behaviors after a relationship dissolution.