12 research outputs found
Least singular value and condition number of a square random matrix with i.i.d. rows
We consider a square random matrix made by i.i.d. rows with any distribution
and prove that, for any given dimension, the probability for the least singular
value to be in $[0, \epsilon)$ is at least of order $\epsilon$. This allows us
to generalize a result about the expectation of the condition number that was
proved in the case of centered gaussian i.i.d. entries: such an expectation is
always infinite. Moreover, we get some additional results for some well-known
random matrix ensembles, in particular for the isotropic log-concave case,
which is proved to have the best behavior in terms of well-conditioning.
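As a rough numerical illustration of the first claim in this abstract, the sketch below estimates how often the least singular value of a small square random matrix falls below a threshold eps. It assumes standard Gaussian entries, dimension 5, and 20000 trials purely for convenience; these choices and the function names are illustrative and not taken from the paper, which covers rows with any distribution.

```python
import numpy as np

def least_singular_values(n, trials, rng):
    """Sample the least singular value of `trials` n-by-n standard Gaussian matrices."""
    # Assumption: Gaussian entries, chosen only as a convenient test case.
    mats = rng.standard_normal((trials, n, n))
    # np.linalg.svd returns singular values in descending order,
    # so the last column holds the least singular value of each matrix.
    return np.linalg.svd(mats, compute_uv=False)[:, -1]

def small_value_frequency(s_min, eps_values):
    """Empirical frequency of the event {s_min < eps} for each eps."""
    return [float(np.mean(s_min < eps)) for eps in eps_values]

rng = np.random.default_rng(0)
s_min = least_singular_values(n=5, trials=20000, rng=rng)
eps_values = [0.01, 0.02, 0.04, 0.08]
freqs = small_value_frequency(s_min, eps_values)
for eps, f in zip(eps_values, freqs):
    print(f"eps={eps:.2f}  frequency={f:.4f}  frequency/eps={f / eps:.3f}")
```

If the ratio frequency/eps stays roughly constant as eps shrinks, that is consistent with a probability of order eps; a simulation like this only illustrates the statement and does not verify it.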
Tight Performance Guarantees of Imitator Policies with Continuous Actions
Behavioral Cloning (BC) aims at learning a policy that mimics the behavior demonstrated by an expert. The current theoretical understanding of BC is limited to the case of finite actions. In this paper, we study BC with the goal of providing theoretical guarantees on the performance of the imitator policy in the case of continuous actions. We start by deriving a novel bound on the performance gap based on Wasserstein distance, applicable for continuous-action experts, holding under the assumption that the value function is Lipschitz continuous. Since this latter condition is hardly fulfilled in practice, even for Lipschitz Markov Decision Processes and policies, we propose a relaxed setting, proving that the value function is always Hölder continuous. This result is of independent interest and allows obtaining in BC a general bound for the performance of the imitator policy. Finally, we analyze noise injection, a common practice in which the expert's action is executed in the environment after the application of a noise kernel. We show that this practice allows deriving stronger performance guarantees, at the price of a bias due to the noise addition.
Autoregressive Bandits
Autoregressive processes naturally arise in a large variety of real-world
scenarios, including, e.g., stock markets, sales forecasting, weather prediction,
advertising, and pricing. When addressing a sequential decision-making problem
in such a context, the temporal dependence between consecutive observations
should be properly accounted for to converge to the optimal decision policy. In
this work, we propose a novel online learning setting, named Autoregressive
Bandits (ARBs), in which the observed reward follows an autoregressive process
of order $k$, whose parameters depend on the action the agent chooses, within a
finite set of $n$ actions. Then, we devise an optimistic regret minimization
algorithm, AutoRegressive Upper Confidence Bounds (AR-UCB), that suffers regret
of order $\widetilde{\mathcal{O}}\left(\frac{(k+1)^{3/2}\sqrt{nT}}{(1-\Gamma)^2}\right)$, where $T$ is the optimization
horizon and $\Gamma < 1$ is an index of the stability of the system. Finally, we
present a numerical validation in several synthetic settings and one real-world
setting, comparing against general- and specific-purpose bandit baselines and
showing the advantages of the proposed approach.
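To make the setting concrete, the sketch below simulates rewards that follow an action-dependent autoregressive process. It is not AR-UCB: it only generates rewards under a fixed action sequence. The linear form (an intercept plus a weighted sum of the most recent rewards plus Gaussian noise), the parameter values, and the function name are assumptions made for illustration and are not taken from the paper.

```python
import numpy as np

def simulate_ar_rewards(params, actions, noise_std, rng):
    """Simulate x_t = params[a_t, 0] + sum_i params[a_t, i] * x_{t-i} + noise.

    params: array of shape (n_actions, k + 1); column 0 is the intercept and
            columns 1..k are autoregressive coefficients (hypothetical values).
    actions: sequence of action indices, one per round.
    """
    k = params.shape[1] - 1
    history = np.zeros(k)  # most recent reward first; the process starts at zero
    rewards = []
    for a in actions:
        x = params[a, 0] + params[a, 1:] @ history + rng.normal(0.0, noise_std)
        rewards.append(x)
        # Push the new reward to the front and keep only the last k values.
        history = np.concatenate(([x], history))[:k]
    return np.array(rewards)

rng = np.random.default_rng(0)
# Two hypothetical actions with an order-2 process each.
params = np.array([[1.0, 0.5, 0.2],
                   [2.0, 0.1, 0.0]])
for a in range(2):
    late_avg = simulate_ar_rewards(params, [a] * 2000, 0.1, rng)[-500:].mean()
    print(f"always action {a}: late-run average reward = {late_avg:.2f}")
```

In this toy example action 1 pays more in the first round (intercept 2.0 against 1.0), but action 0 settles at a higher level because its past rewards feed back more strongly: the noise-free steady states are 1.0 / (1 - 0.7) and 2.0 / (1 - 0.1). This is one way to see why the temporal dependence has to be accounted for when choosing actions.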
Delayed Reinforcement Learning by Imitation
When the agent's observations or interactions are delayed, classic
reinforcement learning tools usually fail. In this paper, we propose a simple
yet new and efficient solution to this problem. We assume that, in the
undelayed environment, an efficient policy is known or can be easily learned,
but the task may suffer from delays in practice and we thus want to take them
into account. We present a novel algorithm, Delayed Imitation with Dataset
Aggregation (DIDA), which builds upon imitation learning methods to learn how
to act in a delayed environment from undelayed demonstrations. We provide a
theoretical analysis of the approach that will guide the practical design of
DIDA. These results are also of general interest in the delayed reinforcement
learning literature by providing bounds on the performance between delayed and
undelayed tasks, under smoothness conditions. We show empirically that DIDA
achieves high performance with remarkable sample efficiency on a variety of
tasks, including robotic locomotion, classic control, and trading.
Development of a new extraction technique and HPLC method for the analysis of non-psychoactive cannabinoids in fibre-type Cannabis sativa L. (hemp)
The present work was aimed at the development and validation of a new, efficient and reliable technique for the analysis of the main non-psychoactive cannabinoids in fibre-type Cannabis sativa L. (hemp) inflorescences belonging to different varieties. This study was designed to identify samples with a high content of bioactive compounds, with a view to underscoring the importance of quality control in derived products as well. Different extraction methods, including dynamic maceration (DM), ultrasound-assisted extraction (UAE), microwave-assisted extraction (MAE) and supercritical-fluid extraction (SFE) were applied and compared in order to obtain a high yield of the target analytes from hemp. Dynamic maceration for 45 min with ethanol (EtOH) at room temperature proved to be the most suitable technique for the extraction of cannabinoids in hemp samples. The analysis of the target analytes in hemp extracts was carried out by developing a new reversed-phase high-performance liquid chromatography (HPLC) method coupled with diode array (UV/DAD) and electrospray ionization-mass spectrometry (ESI-MS) detection, by using an ion trap mass analyser. An Ascentis Express C18 column (150 mm × 3.0 mm I.D., 2.7 μm) was selected for the HPLC analysis, with a mobile phase composed of 0.1% formic acid in both water and acetonitrile, under gradient elution. The application of the fused-core technology allowed us to obtain a significant improvement of the HPLC performance compared with that of conventional particulate stationary phases, with a shorter analysis time and a remarkable reduction of solvent usage. The analytical method optimized in this study was fully validated to show compliance with international requirements. Furthermore, it was applied to the characterization of nine hemp samples and six hemp-based pharmaceutical products. As such, it was demonstrated to be a very useful tool for the analysis of cannabinoids in both the plant material and its derivatives for pharmaceutical and nutraceutical applications.
When Love Just Ends: An Investigation of the Relationship Between Dysfunctional Behaviors, Attachment Styles, Gender, and Education Shortly After a Relationship Dissolution
Much is known about the long-term consequences of separation and divorce, whereas there is a paucity of studies about the short-term consequences of such experiences. This study investigates the adoption of dysfunctional behaviors (e.g., insistent telephone calls and text messages, verbal threats, and sending unwanted objects) shortly after a relationship dissolution. A total of 136 participants who reported having been left by their former partner in the previous 6 months were included in this study (i.e., females: n = 84; males: n = 52; mean age = 30.38; SD = 4.19). Attachment styles were evaluated as explanatory variables when facing a relationship dissolution, in connection with a set of (1) demographic variables (i.e., gender, education, and current marital/relationship status), (2) dysfunctional behaviors, and (3) motivations underlying those behaviors. Results showed that a secure or dismissing attachment style, a higher education, and a currently married (but awaiting separation) status were protective factors against adopting such dysfunctional behaviors, while the preoccupied and fearful-avoidant subjects, especially females, tended to adopt dysfunctional behaviors (i.e., communication attempts and defamation) and reported fear of abandonment and need for attention as underlying motivations. Future research on the longitudinal aspects of relationship dissolution processes is required to gain deeper insight into this phenomenon. This study sheds light on the relationship between adult attachment styles and the motivations behind the adoption of dysfunctional behaviors after a relationship dissolution.