544 research outputs found

    Zero-shot Preference Learning for Offline RL via Optimal Transport

    Full text link
    Preference-based Reinforcement Learning (PbRL) has demonstrated remarkable efficacy in aligning rewards with human intentions. However, a significant challenge lies in the need of substantial human labels, which is costly and time-consuming. Additionally, the expensive preference data obtained from prior tasks is not typically reusable for subsequent task learning, leading to extensive labeling for each new task. In this paper, we propose a novel zero-shot preference-based RL algorithm that leverages labeled preference data from source tasks to infer labels for target tasks, eliminating the requirement for human queries. Our approach utilizes Gromov-Wasserstein distance to align trajectory distributions between source and target tasks. The solved optimal transport matrix serves as a correspondence between trajectories of two tasks, making it possible to identify corresponding trajectory pairs between tasks and transfer the preference labels. However, learning directly from inferred labels that contains a fraction of noisy labels will result in an inaccurate reward function, subsequently affecting policy performance. To this end, we introduce Robust Preference Transformer, which models the rewards as Gaussian distributions and incorporates reward uncertainty in addition to reward mean. The empirical results on robotic manipulation tasks of Meta-World and Robomimic show that our method has strong capabilities of transferring preferences between tasks and learns reward functions from noisy labels robustly. Furthermore, we reveal that our method attains near-oracle performance with a small proportion of scripted labels

    Privacy in rfid and mobile objects

    Get PDF
    Los sistemas RFID permiten la identificación rápida y automática de etiquetas RFID a través de un canal de comunicación inalámbrico. Dichas etiquetas son dispositivos con cierto poder de cómputo y capacidad de almacenamiento de información. Es por ello que los objetos que contienen una etiqueta RFID adherida permiten la lectura de una cantidad rica y variada de datos que los describen y caracterizan, por ejemplo, un código único de identificación, el nombre, el modelo o la fecha de expiración. Además, esta información puede ser leída sin la necesidad de un contacto visual entre el lector y la etiqueta, lo cual agiliza considerablemente los procesos de inventariado, identificación, o control automático. Para que el uso de la tecnología RFID se generalice con éxito, es conveniente cumplir con varios objetivos: eficiencia, seguridad y protección de la privacidad. Sin embargo, el diseño de protocolos de identificación seguros, privados, y escalables es un reto difícil de abordar dada las restricciones computacionales de las etiquetas RFID y su naturaleza inalámbrica. Es por ello que, en la presente tesis, partimos de protocolos de identificación seguros y privados, y mostramos cómo se puede lograr escalabilidad mediante una arquitectura distribuida y colaborativa. De este modo, la seguridad y la privacidad se alcanzan mediante el propio protocolo de identificación, mientras que la escalabilidad se logra por medio de novedosos métodos colaborativos que consideran la posición espacial y temporal de las etiquetas RFID. Independientemente de los avances en protocolos inalámbricos de identificación, existen ataques que pueden superar exitosamente cualquiera de estos protocolos sin necesidad de conocer o descubrir claves secretas válidas ni de encontrar vulnerabilidades en sus implementaciones criptográficas. La idea de estos ataques, conocidos como ataques de “relay”, consiste en crear inadvertidamente un puente de comunicación entre una etiqueta legítima y un lector legítimo. De este modo, el adversario usa los derechos de la etiqueta legítima para pasar el protocolo de autenticación usado por el lector. Nótese que, dada la naturaleza inalámbrica de los protocolos RFID, este tipo de ataques representa una amenaza importante a la seguridad en sistemas RFID. En esta tesis proponemos un nuevo protocolo que además de autenticación realiza un chequeo de la distancia a la cual se encuentran el lector y la etiqueta. Este tipo de protocolos se conocen como protocolos de acotación de distancia, los cuales no impiden este tipo de ataques, pero sí pueden frustrarlos con alta probabilidad. Por último, afrontamos los problemas de privacidad asociados con la publicación de información recogida a través de sistemas RFID. En particular, nos concentramos en datos de movilidad que también pueden ser proporcionados por otros sistemas ampliamente usados tales como el sistema de posicionamiento global (GPS) y el sistema global de comunicaciones móviles. Nuestra solución se basa en la conocida noción de k-anonimato, alcanzada mediante permutaciones y microagregación. Para este fin, definimos una novedosa función de distancia entre trayectorias con la cual desarrollamos dos métodos diferentes de anonimización de trayectorias.Els sistemes RFID permeten la identificació ràpida i automàtica d’etiquetes RFID a través d’un canal de comunicació sense fils. Aquestes etiquetes són dispositius amb cert poder de còmput i amb capacitat d’emmagatzematge de informació. Es per això que els objectes que porten una etiqueta RFID adherida permeten la lectura d’una quantitat rica i variada de dades que els descriuen i caracteritzen, com per exemple un codi únic d’identificació, el nom, el model o la data d’expiració. A més, aquesta informació pot ser llegida sense la necessitat d’un contacte visual entre el lector i l’etiqueta, la qual cosa agilitza considerablement els processos d’inventariat, identificació o control automàtic. Per a que l’ús de la tecnologia RFID es generalitzi amb èxit, es convenient complir amb diversos objectius: eficiència, seguretat i protecció de la privacitat. No obstant això, el disseny de protocols d’identificació segurs, privats i escalables, es un repte difícil d’abordar dades les restriccions computacionals de les etiquetes RFID i la seva naturalesa sense fils. Es per això que, en la present tesi, partim de protocols d’identificació segurs i privats, i mostrem com es pot aconseguir escalabilitat mitjançant una arquitectura distribuïda i col•laborativa. D’aquesta manera, la seguretat i la privacitat s’aconsegueixen mitjançant el propi protocol d’identificació, mentre que l’escalabilitat s’aconsegueix per mitjà de nous protocols col•laboratius que consideren la posició espacial i temporal de les etiquetes RFID. Independentment dels avenços en protocols d’identificació sense fils, existeixen atacs que poden passar exitosament qualsevol d’aquests protocols sense necessitat de conèixer o descobrir claus secretes vàlides, ni de trobar vulnerabilitats a les seves implantacions criptogràfiques. La idea d’aquestos atacs, coneguts com atacs de “relay”, consisteix en crear inadvertidament un pont de comunicació entre una etiqueta legítima i un lector legítim. D’aquesta manera, l’adversari utilitza els drets de l’etiqueta legítima per passar el protocol d’autentificació utilitzat pel lector. Es important tindre en compte que, dada la naturalesa sense fils dels protocols RFID, aquests tipus d’atacs representen una amenaça important a la seguretat en sistemes RFID. En aquesta dissertació proposem un nou protocol que, a més d’autentificació, realitza una revisió de la distància a la qual es troben el lector i l’etiqueta. Aquests tipus de protocols es coneixen com a “distance-boulding protocols”, els quals no prevenen aquests tipus d’atacs, però si que poden frustrar-los amb alta probabilitat. Per últim, afrontem els problemes de privacitat associats amb la publicació de informació recol•lectada a través de sistemes RFID. En concret, ens concentrem en dades de mobilitat, que també poden ser proveïdes per altres sistemes àmpliament utilitzats tals com el sistema de posicionament global (GPS) i el sistema global de comunicacions mòbils. La nostra solució es basa en la coneguda noció de privacitat “k-anonymity” i parcialment en micro-agregació. Per a aquesta finalitat, definim una nova funció de distància entre trajectòries amb la qual desenvolupen dos mètodes diferents d’anonimització de trajectòries.Radio Frequency Identification (RFID) is a technology aimed at efficiently identifying and tracking goods and assets. Such identification may be performed without requiring line-of-sight alignment or physical contact between the RFID tag and the RFID reader, whilst tracking is naturally achieved due to the short interrogation field of RFID readers. That is why the reduction in price of the RFID tags has been accompanied with an increasing attention paid to this technology. However, since tags are resource-constrained devices sending identification data wirelessly, designing secure and private RFID identification protocols is a challenging task. This scenario is even more complex when scalability must be met by those protocols. Assuming the existence of a lightweight, secure, private and scalable RFID identification protocol, there exist other concerns surrounding the RFID technology. Some of them arise from the technology itself, such as distance checking, but others are related to the potential of RFID systems to gather huge amount of tracking data. Publishing and mining such moving objects data is essential to improve efficiency of supervisory control, assets management and localisation, transportation, etc. However, obvious privacy threats arise if an individual can be linked with some of those published trajectories. The present dissertation contributes to the design of algorithms and protocols aimed at dealing with the issues explained above. First, we propose a set of protocols and heuristics based on a distributed architecture that improve the efficiency of the identification process without compromising privacy or security. Moreover, we present a novel distance-bounding protocol based on graphs that is extremely low-resource consuming. Finally, we present two trajectory anonymisation methods aimed at preserving the individuals' privacy when their trajectories are released

    Machine learning in the real world with multiple objectives

    Full text link
    Machine learning (ML) is ubiquitous in many real-world applications. Existing ML systems are based on optimizing a single quality metric such as prediction accuracy. These metrics typically do not fully align with real-world design constraints such as computation, latency, fairness, and acquisition costs that we encounter in real-world applications. In this thesis, we develop ML methods for optimizing prediction accuracy while accounting for such real-world constraints. In particular, we introduce multi-objective learning in two different setups: resource-efficient prediction and algorithmic fairness in language models. First, we focus on decreasing the test-time computational costs of prediction systems. Budget constraints arise in many machine learning problems. Computational costs limit the usage of many models on small devices such as IoT or mobile phones and increase the energy consumption in cloud computing. We design systems that allow on-the-fly modification of the prediction model for each input sample. These sample-adaptive systems allow us to leverage wide variability in sample complexity where we learn policies for selecting cheap models for low complexity instances and using descriptive models only for complex ones. We utilize multiple--objective approach where one minimizes the system cost while preserving predictive accuracy. We demonstrate significant speed-ups in the fields of computer vision, structured prediction, natural language processing, and deep learning. In the context of fairness, we first demonstrate that a naive application of ML methods runs the risk of amplifying social biases present in data. This danger is particularly acute for methods based on word embeddings, which are increasingly gaining importance in many natural language processing applications of ML. We show that word embeddings trained on Google News articles exhibit female/male gender stereotypes. We demonstrate that geometrically, gender bias is captured by unique directions in the word embedding vector space. To remove bias we formulate a empirical risk objective with fairness constraints to remove stereotypes from embeddings while maintaining desired associations. Using crowd-worker evaluation as well as standard benchmarks, we empirically demonstrate that our algorithms significantly reduces gender bias in embeddings, while preserving its useful properties such as the ability to cluster related concepts

    Look Before You Leap: Bridging Model-Free and Model-Based Reinforcement Learning for Planned-Ahead Vision-and-Language Navigation

    Full text link
    Existing research studies on vision and language grounding for robot navigation focus on improving model-free deep reinforcement learning (DRL) models in synthetic environments. However, model-free DRL models do not consider the dynamics in the real-world environments, and they often fail to generalize to new scenes. In this paper, we take a radical approach to bridge the gap between synthetic studies and real-world practices---We propose a novel, planned-ahead hybrid reinforcement learning model that combines model-free and model-based reinforcement learning to solve a real-world vision-language navigation task. Our look-ahead module tightly integrates a look-ahead policy model with an environment model that predicts the next state and the reward. Experimental results suggest that our proposed method significantly outperforms the baselines and achieves the best on the real-world Room-to-Room dataset. Moreover, our scalable method is more generalizable when transferring to unseen environments.Comment: 21 pages, 7 figures, with supplementary materia

    Conditional Goal-oriented Trajectory Prediction for Interacting Vehicles with Vectorized Representation

    Full text link
    This paper aims to tackle the interactive behavior prediction task, and proposes a novel Conditional Goal-oriented Trajectory Prediction (CGTP) framework to jointly generate scene-compliant trajectories of two interacting agents. Our CGTP framework is an end to end and interpretable model, including three main stages: context encoding, goal interactive prediction and trajectory interactive prediction. First, a Goals-of-Interest Network (GoINet) is designed to extract the interactive features between agent-to-agent and agent-to-goals using a graph-based vectorized representation. Further, the Conditional Goal Prediction Network (CGPNet) focuses on goal interactive prediction via a combined form of marginal and conditional goal predictors. Finally, the Goaloriented Trajectory Forecasting Network (GTFNet) is proposed to implement trajectory interactive prediction via the conditional goal-oriented predictors, with the predicted future states of the other interacting agent taken as inputs. In addition, a new goal interactive loss is developed to better learn the joint probability distribution over goal candidates between two interacting agents. In the end, the proposed method is conducted on Argoverse motion forecasting dataset, In-house cut-in dataset, and Waymo open motion dataset. The comparative results demonstrate the superior performance of our proposed CGTP model than the mainstream prediction methods.Comment: 14 pages, 4 figure
    corecore