1,040 research outputs found

    Minimization of Sensor Activation in Discrete-Event Systems with Control Delays and Observation Delays

    Full text link
    In discrete-event systems, to save sensor resources, the agent continuously adjusts sensor activation decisions according to a sensor activation policy based on the changing observations. However, new challenges arise for sensor activations in networked discrete-event systems, where observation delays and control delays exist between the sensor systems and the agent. In this paper, a new framework for activating sensors in networked discrete-event systems is established. In this framework, we construct a communication automaton that explicitly expresses the interaction process between the agent and the sensor systems over the observation channel and the control channel. Based on the communication automaton, we can define dynamic observations of a communicated string. To guarantee that a sensor activation policy is physically implementable and insensitive to random control delays and observation delays, we further introduce the definition of delay feasibility. We show that a delay feasible sensor activation policy can be used to dynamically activate sensors even if control delays and observation delays exist. A set of algorithms are developed to minimize sensor activations in a transition-based domain while ensuring a given specification condition is satisfied. A practical example is provided to show the application of the developed sensor activation methods. Finally, we briefly discuss how to extend the proposed framework to a decentralized sensing architecture

    Property Enforcement for Partially-Observed Discrete-Event Systems

    Full text link
    Engineering systems that involve physical elements, such as automobiles, aircraft, or electric power pants, that are controlled by a computational infrastructure that consists of several computers that communicate through a communication network, are called Cyber-Physical Systems. Ever-increasing demands for safety, security, performance, and certi cation of these critical systems put stringent constraints on their design and necessitate the use of formal model-based approaches to synthesize provably-correct feedback controllers. This dissertation aims to tackle these challenges by developing a novel methodology for synthesis of control and sensing strategies for Discrete Event Systems (DES), an important class of cyber-physical systems. First, we develop a uniform approach for synthesizing property enforcing supervisors for a wide class of properties called information-state-based (IS-based) properties. We then consider the enforcement of non-blockingness in addition to IS-based properties. We develop a nite structure called the All Enforcement Structure (AES) that embeds all valid supervisors. Furthermore, we propose novel and general approaches to solve the sensor activation problem for partially-observed DES. We extend our results for the sensor activation problem from the centralized case to the decentralized case. The methodology in the dissertation has the following novel features: (i) it explicitly considers and handles imperfect state information, due to sensor noise, and limited controllability, due to unexpected environmental disturbances; (ii) it is a uniform information-state-based approach that can be applied to a variety of user-speci ed requirements; (iii) it is a formal model-based approach, which results in provably correct solutions; and (iv) the methodology and associated theoretical foundations developed are generic and applicable to many types of networked cyber-physical systems with safety-critical requirements, in particular networked systems such as aircraft electric power systems and intelligent transportation systems.PHDElectrical Engineering: SystemsUniversity of Michigan, Horace H. Rackham School of Graduate Studieshttps://deepblue.lib.umich.edu/bitstream/2027.42/137097/1/xiangyin_1.pd

    Discrete Event System Methods for Control Problems Arising in Cyber-physical Systems.

    Full text link
    We consider two problems in cyber-physical systems. The first is that of dynamic fault diagnosis. Specifically, we assume that a plant model is available in the form of a discrete event system (DES) containing special fault events whose occurrences are to be diagnosed. Furthermore, it is assumed that there exist sensors that can be turned on or off and are capable of detecting some subset of the system’s non-faulty events. The problem to be solved consists of constructing a compact structure, called the most permissive observer (MPO), containing the set of all sequences of sensor activations that ensure the timely diagnosis of any fault event’s occurrence. We solve this problem by defining an appropriate notion of information state summarizing the information obtained from the past sequence of observations and sensor activations. The resulting MPO has a better space complexity than that of the previous approach in the literature. The second problem considered in this thesis is that of controlling vehicles through an intersection. Specifically, we wish to obtain a supervisor for the vehicles that is safe, non-deadlocking, and maximally permissive. Furthermore, we solve this problem in the presence of uncontrolled vehicles, bounded disturbances in the dynamics, and measurement uncertainty. Our approach consists of discretizing the system in time and space, obtaining a DES abstraction, solving for maximally permissive supervisors in the abstracted domain, and refining the supervisor to one for the original, continuous, problem domain. We provide general results under which this approach yields maximally permissive memoryless supervisors for the original system and show that, under certain conditions, the resulting supervisor will be maximally permissive over the class of all supervisors, not merely memoryless ones. Our contributions are as follows. First, by constructing DES abstractions from continuous systems, we can leverage the supervisory control theory of DES, which is well-suited to finding maximally permissive supervisors under safety and non-blocking constraints. Second, we define different types of relations between transition systems and their abstractions and, for each relation, characterize the class of supervisors over which the supervisors obtained under our approach are maximally permissive.PHDElectrical Engineering: SystemsUniversity of Michigan, Horace H. Rackham School of Graduate Studieshttp://deepblue.lib.umich.edu/bitstream/2027.42/108720/1/edallal_1.pd

    Supervisory Control and Analysis of Partially-observed Discrete Event Systems

    Get PDF
    Nowadays, a variety of real-world systems fall into discrete event systems (DES). In practical scenarios, due to facts like limited sensor technique, sensor failure, unstable network and even the intrusion of malicious agents, it might occur that some events are unobservable, multiple events are indistinguishable in observations, and observations of some events are nondeterministic. By considering various practical scenarios, increasing attention in the DES community has been paid to partially-observed DES, which in this thesis refer broadly to those DES with partial and/or unreliable observations. In this thesis, we focus on two topics of partially-observed DES, namely, supervisory control and analysis. The first topic includes two research directions in terms of system models. One is the supervisory control of DES with both unobservable and uncontrollable events, focusing on the forbidden state problem; the other is the supervisory control of DES vulnerable to sensor-reading disguising attacks (SD-attacks), which is also interpreted as DES with nondeterministic observations, addressing both the forbidden state problem and the liveness-enforcing problem. Petri nets (PN) are used as a reference formalism in this topic. First, we study the forbidden state problem in the framework of PN with both unobservable and uncontrollable transitions, assuming that unobservable transitions are uncontrollable. For ordinary PN subject to an admissible Generalized Mutual Exclusion Constraint (GMEC), an optimal on-line control policy with polynomial complexity is proposed provided that a particular subnet, called observation subnet, satisfies certain conditions in structure. It is then discussed how to obtain an optimal on-line control policy for PN subject to an arbitrary GMEC. Next, we still consider the forbidden state problem but in PN vulnerable to SD-attacks. Assuming the control specification in terms of a GMEC, we propose three methods to derive on-line control policies. The first two lead to an optimal policy but are computationally inefficient for large-size systems, while the third method computes a policy with timely response even for large-size systems but at the expense of optimality. Finally, we investigate the liveness-enforcing problem still assuming that the system is vulnerable to SD-attacks. In this problem, the plant is modelled as a bounded PN, which allows us to off-line compute a supervisor starting from constructing the reachability graph of the PN. Then, based on repeatedly computing a more restrictive liveness-enforcing supervisor under no attack and constructing a basic supervisor, an off-line method that synthesizes a liveness-enforcing supervisor tolerant to an SD-attack is proposed. In the second topic, we care about the verification of properties related to system security. Two properties are considered, i.e., fault-predictability and event-based opacity. The former is a property in the literature, characterizing the situation that the occurrence of any fault in a system is predictable, while the latter is a newly proposed property in the thesis, which describes the fact that secret events of a system cannot be revealed to an external observer within their critical horizons. In the case of fault-predictability, DES are modeled by labeled PN. A necessary and sufficient condition for fault-predictability is derived by characterizing the structure of the Predictor Graph. Furthermore, two rules are proposed to reduce the size of a PN, which allow us to analyze the fault-predictability of the original net by verifying that of the reduced net. When studying event-based opacity, we use deterministic finite-state automata as the reference formalism. Considering different scenarios, we propose four notions, namely, K-observation event-opacity, infinite-observation event-opacity, event-opacity and combinational event-opacity. Moreover, verifiers are proposed to analyze these properties

    Supervisory control in health care systems

    Get PDF

    Hierarchical Reasoning about Faults in Cyber-Physical Energy Systems using Temporal Causal Diagrams

    Get PDF
    Fault management systems that observe the state of the system, decide if there is an anomaly and then take automated actions to isolate faults. For example, in electrical networks relays and breaks isolate faults in order to arrest failure propagation and protect the healthy parts of the system. However, due to the limited situational awareness and hidden failures the protection devices themselves, through their operation (or mis-operation) may cause overloading and the disconnection of parts of an otherwise healthy system. Additionally, often there can be faults in the management system itself leading to situations where it is difficult to isolate failures. Our work presented in this paper is geared towards solution of this problem by describing the formalism of Temporal Causal Diagrams (TCD-s) that augment the failure models for the physical systems with discrete time models of protection elements, accounting for the complex interactions between the protection devices and the physical plants. We use the case study of the standard Western System Coordinating Council (WSCC) 9 bus system to describe four different fault scenarios and illustrate how our approach can help isolate these failures. Though, we use power networks as exemplars in this paper our approach can be applied to other distributed cyberphysical systems, for example water networks

    Análisis del juego en Baloncesto: relaciones entre las acciones de balón jugado y balón parado en diferentes niveles de competición

    Get PDF
    El propósito de este estudio fue investigar la duración de las acciones jugadas y el tiempo de pausa (ST) y comparar la distribución de las duraciones entre los diferentes niveles y cuartos. La duración se ha registrado durante 21 partidos oficiales en cuatro niveles competitivos: Cadetes (U16), 3ª división española (LEB), 1ª división (ACB) y Euroliga (EUR). Se implementó un análisis de varianza, la d de Cohen (d) para interpretar la magnitud de las diferencias y un análisis secuencial. El 72.8% de las ST duraron menos de 44 s y solo el 1% fueron más largos que 120 s. En EUR las ST fueron más largas que en LEB (d=0.25) y U16 (d=0.36). El último cuarto exhibe las duraciones más largas de ST (d=-0.32) debido a un mayor número de tiempos muertos y eventos de tiros libres. El análisis secuencial mostró cómo las duraciones cortas de acciones jugadas (46 s) activaron largos períodos pausa (>38 s). La aplicación práctica está orientada al diseño de programas de entrenamiento basados en el juego, especialmente para comprender los escenarios más demandantes, con aplicabilidad en diferentes niveles de baloncesto.The purpose of this study was to investigate durations of live-time actions (LA) and stoppages (ST) and to compare the distribution of durations among different levels of competition and quarters. LA and ST have been recorded during 21 official games in four different levels of competition: Under 16-years-old (Spanish U16), 3rd division competition (Spanish LEB), 1st division competition (Spanish ACB) and Euroleague (EUR). One-way analysis of variance, Cohen's d (d) were implement to interpret the magnitude of differences and a sequential analysis was also performed. 72.8% of all ST events lasted less than 44 s and only 1% were longer than 120 s. EUR level has longer ST durations compared with LEB (d=0.25) and U16 (d=0.36). Fourth quarter exhibit the longest durations of ST (d=-0.32) due to a higher number of time outs and free throws events. Sequential analysis showed how short durations of LA (46 s) activated long periods of SP (>38 s). The practical application of these results should be considered for the design of game-based conditioning programs, especially by understanding the most demanding scenario, with applicability in different levels of basketball.Ministerio de Ciencia, Innovación y Universidades (MCIU), Agencia Estatal de Investigación (AEI) y Fondo Europeo de Desarrollo Regional (FEDER)]: proyecto “New approach of research in physical activity and sport from mixed methods perspective” (NARPAS_MM) [SPGC201800X098742CV0].peerReviewe

    Protecting Contextual Information in WSNs: Source- and Receiver-Location Privacy Solutions

    Get PDF
    La privacidad es un derecho fundamental recogido por numerosas leyes y tratados entre los que destaca la Declaración Universal de los Derechos Humanos de las Naciones Unidas. Sin embargo, este derecho fundamental se ha visto vulnerado en numerosas ocasiones a lo largo de la historia; y el desarrollo de la tecnología, en especial la mejora de los sistemas de recolección, analisis y diseminación de información, han tenido gran parte de culpa. En la actualidad nos encontramos en un punto en el que el desarrollo y despliegue de sistemas ubicuos, encabezados por las redes inalámbricas de sensores, puede llegar a suponer un riesgo de privacidad sin precedentes dada su capacidad para recolectar información en cantidades y situaciones hasta el momento insospechadas. Existe, por tanto, una urgente necesidad de desarrollar mecanismos capaces de velar por nuestra información más sensible. Es precisamente éste uno de los objetivos principales de la presente tesis doctoral: facilitar la integración de las redes inalámbricas de sensores en nuestro día a día sin que éstas supongan un grave riesgo de privacidad. Esta tesis se centra en un problema de privacidad particular que viene derivado de la naturaleza inalámbrica de las comunicaciones y de la necesidad imperiosa de ahorrar energía que existe en estas redes de recursos restringidos. Para las redes de sensores, las comunicaciones suponen un gran porcentaje del presupuesto energético y, por ello, los protocolos de encaminamiento empleados tienden a minimizarlas, utilizando protocolos de camino óptimo. Aprovechándose de esta situación, un observador podría, mediante técnicas de análisis de tráfico no demasiado sofisticadas, y sin necesidad de descifrar el contenido de los paquete, determinar el origen y el destino de las comunicaciones. Esto supone, al igual que en los sistemas de comunicación tradicionales, un grave riesgo para la privacidad. Dado que el problema de la privacidad de localización en redes de sensores se reduce a una cuestión de análisis de tráfico, parece razonable pensar que las soluciones desarrolladas a tal fin en redes de computadores pueden ser de utilida. Sin embargo, esta hipótesis ha sido rechazada en varias ocasiones con argumentos vagos al respecto de las limitaciones computacionales y energéticas de las redes de sensores. Nosotros consideramos que esto no es motivo suficiente para descartar estas soluciones ya que, a pesar de la tendencia actual, en el futuro podríamos tener nodos sensores de gran capacidad. Por ello, uno de los objetivos de esta tesis ha sido realizar un análisis exhaustivo sobre la aplicabilidad de estas soluciones al ámbito de las redes de sensores, centrándonos no sólo en los requisitos computacionales sino también en las propiedades de anonimato que se persiguen, en los modelos de atacante y en las posibles limitaciones que podrían derivarse de su aplicación. Por otra parte, se ha realizado un amplio análisis de las soluciones de privacidad de localización existentes para redes de sensores. Este análisis no se ha centrado únicamente en estudiar las técnicas de protección de empleadas sino que además se ha esforzado en destacar las ventajas e inconvenientes de las distintas soluciones. Esto ha permitido desarrollar una completa taxonomía en varios niveles basada en los recursos que se desean proteger, los modelos de adversario a los que hacer frente y las principales características o técnicas empleadas por las diferentes soluciones. Además, a partir de esto se han detectado una serie de problemas abiertos y puntos de mejora del estado del arte actual, que se han plasmado en dos nuevas soluciones; una de las soluciones se ha centrado en la protección de la localización del origen de datos, mientras que la otra se ha enfocado a la protección de la estación base. Ambas soluciones tienen en cuenta atacantes con un rango de escucha parcial y capaces de desplazarse en el terreno para observar las comunicaciones en diferentes zonas de la red. La primera de las soluciones desarrolladas parte de la observación de que los mecanismos actuales se basan principalmente en el envío de paquetes siguiendo caminos aleatorios sin ningún conocimiento acerca de si estos caminos son realmente efectivos para hacer frente a un atacante local. La idea detrás de CALP es aprovechar la capacidad que tienen las redes de sensores para sentir lo que pasa en su entorno para desarrollar mecanismos de protección más inteligentes utilizando información acerca del atacante. De esta forma, se consigue reducir drásticamente el consumo energético de la solución y al mismo tiempo se reduce el retraso de las comunicaciones, ya que el mecanismo sólo se activa ante la presencia de un atacante. Aunque esta idea se ha aplicado únicamente a la protección de los nodos origen de datos, sus características indican que también sería posible aplicarla con éxito a la protección de la estación base. La segunda solución surge tras observar que las soluciones para proteger la estación base son demasiado costosas a nivel energético o, en su defecto, revelan información sobre su localización. Además, hasta la fecha ninguna solución había tenido en cuenta que si un atacante obtiene las tablas de rutas de un nodo obtiene información sobre la estación base. Nuestra solución, HISP-NC, se basa en dos mecanismos complementarios que, por un lado, hacen frente a ataques de análisis de tráfico y, por otro lado, protegen frente al nuevo modelo de atacante desarrollado. El primer mecanismo se basa en la homogeneización del tráfico en el entorno del camino y el segundo en la perturbación de la tabla de rutas, de manera que se dificulta el ataque al tiempo que se asegura la llegada de datos a la estación base

    Bimanual robot skills: MP encoding, dimensionality reduction and reinforcement learning

    Get PDF
    In our culture, robots have been in novels and cinema for a long time, but it has been specially in the last two decades when the improvements in hardware - better computational power and components - and advances in Artificial Intelligence (AI), have allowed robots to start sharing spaces with humans. Such situations require, aside from ethical considerations, robots to be able to move with both compliance and precision, and learn at different levels, such as perception, planning, and motion, being the latter the focus of this work. The first issue addressed in this thesis is inverse kinematics for redundant robot manipulators, i.e: positioning the robot joints so as to reach a certain end-effector pose. We opt for iterative solutions based on the inversion of the kinematic Jacobian of a robot, and propose to filter and limit the gains in the spectral domain, while also unifying such approach with a continuous, multipriority scheme. Such inverse kinematics method is then used to derive manipulability in the whole workspace of an antropomorphic arm, and the coordination of two arms is subsequently optimized by finding their best relative positioning. Having solved the kinematic issues, a robot learning within a human environment needs to move compliantly, with limited amount of force, in order not to harm any humans or cause any damage, while being as precise as possible. Therefore, we developed two dynamic models for the same redundant arm we had analysed kinematically: The first based on local models with Gaussian projections, and the second characterizing the most problematic term of the dynamics, namely friction. Such models allowed us to implement feed-forward controllers, where we can actively change the weights in the compliance-precision tradeoff. Moreover, we used such models to predict external forces acting on the robot, without the use of force sensors. Afterwards, we noticed that bimanual robots must coordinate their components (or limbs) and be able to adapt to new situations with ease. Over the last decade, a number of successful applications for learning robot motion tasks have been published. However, due to the complexity of a complete system including all the required elements, most of these applications involve only simple robots with a large number of high-end technology sensors, or consist of very simple and controlled tasks. Using our previous framework for kinematics and control, we relied on two types of movement primitives to encapsulate robot motion. Such movement primitives are very suitable for using reinforcement learning. In particular, we used direct policy search, which uses the motion parametrization as the policy itself. In order to improve the learning speed in real robot applications, we generalized a policy search algorithm to give some importance to samples yielding a bad result, and we paid special attention to the dimensionality of the motion parametrization. We reduced such dimensionality with linear methods, using the rewards obtained through motion repetition and execution. We tested such framework in a bimanual task performed by two antropomorphic arms, such as the folding of garments, showing how a reduced dimensionality can provide qualitative information about robot couplings and help to speed up the learning of tasks when robot motion executions are costly.A la nostra cultura, els robots han estat presents en novel·les i cinema des de fa dècades, però ha sigut especialment en les últimes dues quan les millores en hardware (millors capacitats de còmput) i els avenços en intel·ligència artificial han permès que els robots comencin a compartir espais amb els humans. Aquestes situacions requereixen, a banda de consideracions ètiques, que els robots siguin capaços de moure's tant amb suavitat com amb precisió, i d'aprendre a diferents nivells, com són la percepció, planificació i moviment, essent l'última el centre d'atenció d'aquest treball. El primer problema adreçat en aquesta tesi és la cinemàtica inversa, i.e.: posicionar les articulacions del robot de manera que l'efector final estigui en una certa posició i orientació. Hem estudiat el camp de les solucions iteratives, basades en la inversió del Jacobià cinemàtic d'un robot, i proposem un filtre que limita els guanys en el seu domini espectral, mentre també unifiquem tal mètode dins un esquema multi-prioritat i continu. Aquest mètode per a la cinemàtica inversa és usat a l'hora d'encapsular tota la informació sobre l'espai de treball d'un braç antropomòrfic, i les capacitats de coordinació entre dos braços són optimitzades, tot trobant la seva millor posició relativa en l'espai. Havent resolt les dificultats cinemàtiques, un robot que aprèn en un entorn humà necessita moure's amb suavitat exercint unes forces limitades per tal de no causar danys, mentre es mou amb la màxima precisió possible. Per tant, hem desenvolupat dos models dinàmics per al mateix braç robòtic redundant que havíem analitzat des del punt de vista cinemàtic: El primer basat en models locals amb projeccions de Gaussianes i el segon, caracteritzant el terme més problemàtic i difícil de representar de la dinàmica, la fricció. Aquests models ens van permetre utilitzar controladors coneguts com "feed-forward", on podem canviar activament els guanys buscant l'equilibri precisió-suavitat que més convingui. A més, hem usat aquests models per a inferir les forces externes actuant en el robot, sense la necessitat de sensors de força. Més endavant, ens hem adonat que els robots bimanuals han de coordinar els seus components (braços) i ser capaços d'adaptar-se a noves situacions amb facilitat. Al llarg de l'última dècada, diverses aplicacions per aprendre tasques motores robòtiques amb èxit han estat publicades. No obstant, degut a la complexitat d'un sistema complet que inclogui tots els elements necessaris, la majoria d'aquestes aplicacions consisteixen en robots més aviat simples amb costosos sensors d'última generació, o a resoldre tasques senzilles en un entorn molt controlat. Utilitzant el nostre treball en cinemàtica i control, ens hem basat en dos tipus de primitives de moviment per caracteritzar la motricitat robòtica. Aquestes primitives de moviment són molt adequades per usar aprenentatge per reforç. En particular, hem usat la búsqueda directa de la política, un camp de l'aprenentatge per reforç que usa la parametrització del moviment com la pròpia política. Per tal de millorar la velocitat d'aprenentatge en aplicacions amb robots reals, hem generalitzat un algoritme de búsqueda directa de política per a donar importància a les mostres amb mal resultat, i hem donat especial atenció a la reducció de dimensionalitat en la parametrització dels moviments. Hem reduït la dimensionalitat amb mètodes lineals, utilitzant les recompenses obtingudes EN executar els moviments. Aquests mètodes han estat provats en tasques bimanuals com són plegar roba, usant dos braços antropomòrfics. Els resultats mostren com la reducció de dimensionalitat pot aportar informació qualitativa d'una tasca, i al mateix temps ajuda a aprendre-la més ràpid quan les execucions amb robots reals són costoses

    Bimanual robot skills: MP encoding, dimensionality reduction and reinforcement learning

    Get PDF
    Aplicat embargament des de la data de defensa fins 1/7/2018Premio a la mejor Tesis Doctoral sobre Robótica, Edición 2017, atorgat pel Comité Español de Automática.Finalista del 2018 George Girault PhD Award, from EuRoboticsIn our culture, robots have been in novels and cinema for a long time, but it has been specially in the last two decades when the improvements in hardware - better computational power and components - and advances in Artificial Intelligence (AI), have allowed robots to start sharing spaces with humans. Such situations require, aside from ethical considerations, robots to be able to move with both compliance and precision, and learn at different levels, such as perception, planning, and motion, being the latter the focus of this work. The first issue addressed in this thesis is inverse kinematics for redundant robot manipulators, i.e: positioning the robot joints so as to reach a certain end-effector pose. We opt for iterative solutions based on the inversion of the kinematic Jacobian of a robot, and propose to filter and limit the gains in the spectral domain, while also unifying such approach with a continuous, multipriority scheme. Such inverse kinematics method is then used to derive manipulability in the whole workspace of an antropomorphic arm, and the coordination of two arms is subsequently optimized by finding their best relative positioning. Having solved the kinematic issues, a robot learning within a human environment needs to move compliantly, with limited amount of force, in order not to harm any humans or cause any damage, while being as precise as possible. Therefore, we developed two dynamic models for the same redundant arm we had analysed kinematically: The first based on local models with Gaussian projections, and the second characterizing the most problematic term of the dynamics, namely friction. Such models allowed us to implement feed-forward controllers, where we can actively change the weights in the compliance-precision tradeoff. Moreover, we used such models to predict external forces acting on the robot, without the use of force sensors. Afterwards, we noticed that bimanual robots must coordinate their components (or limbs) and be able to adapt to new situations with ease. Over the last decade, a number of successful applications for learning robot motion tasks have been published. However, due to the complexity of a complete system including all the required elements, most of these applications involve only simple robots with a large number of high-end technology sensors, or consist of very simple and controlled tasks. Using our previous framework for kinematics and control, we relied on two types of movement primitives to encapsulate robot motion. Such movement primitives are very suitable for using reinforcement learning. In particular, we used direct policy search, which uses the motion parametrization as the policy itself. In order to improve the learning speed in real robot applications, we generalized a policy search algorithm to give some importance to samples yielding a bad result, and we paid special attention to the dimensionality of the motion parametrization. We reduced such dimensionality with linear methods, using the rewards obtained through motion repetition and execution. We tested such framework in a bimanual task performed by two antropomorphic arms, such as the folding of garments, showing how a reduced dimensionality can provide qualitative information about robot couplings and help to speed up the learning of tasks when robot motion executions are costly.A la nostra cultura, els robots han estat presents en novel·les i cinema des de fa dècades, però ha sigut especialment en les últimes dues quan les millores en hardware (millors capacitats de còmput) i els avenços en intel·ligència artificial han permès que els robots comencin a compartir espais amb els humans. Aquestes situacions requereixen, a banda de consideracions ètiques, que els robots siguin capaços de moure's tant amb suavitat com amb precisió, i d'aprendre a diferents nivells, com són la percepció, planificació i moviment, essent l'última el centre d'atenció d'aquest treball. El primer problema adreçat en aquesta tesi és la cinemàtica inversa, i.e.: posicionar les articulacions del robot de manera que l'efector final estigui en una certa posició i orientació. Hem estudiat el camp de les solucions iteratives, basades en la inversió del Jacobià cinemàtic d'un robot, i proposem un filtre que limita els guanys en el seu domini espectral, mentre també unifiquem tal mètode dins un esquema multi-prioritat i continu. Aquest mètode per a la cinemàtica inversa és usat a l'hora d'encapsular tota la informació sobre l'espai de treball d'un braç antropomòrfic, i les capacitats de coordinació entre dos braços són optimitzades, tot trobant la seva millor posició relativa en l'espai. Havent resolt les dificultats cinemàtiques, un robot que aprèn en un entorn humà necessita moure's amb suavitat exercint unes forces limitades per tal de no causar danys, mentre es mou amb la màxima precisió possible. Per tant, hem desenvolupat dos models dinàmics per al mateix braç robòtic redundant que havíem analitzat des del punt de vista cinemàtic: El primer basat en models locals amb projeccions de Gaussianes i el segon, caracteritzant el terme més problemàtic i difícil de representar de la dinàmica, la fricció. Aquests models ens van permetre utilitzar controladors coneguts com "feed-forward", on podem canviar activament els guanys buscant l'equilibri precisió-suavitat que més convingui. A més, hem usat aquests models per a inferir les forces externes actuant en el robot, sense la necessitat de sensors de força. Més endavant, ens hem adonat que els robots bimanuals han de coordinar els seus components (braços) i ser capaços d'adaptar-se a noves situacions amb facilitat. Al llarg de l'última dècada, diverses aplicacions per aprendre tasques motores robòtiques amb èxit han estat publicades. No obstant, degut a la complexitat d'un sistema complet que inclogui tots els elements necessaris, la majoria d'aquestes aplicacions consisteixen en robots més aviat simples amb costosos sensors d'última generació, o a resoldre tasques senzilles en un entorn molt controlat. Utilitzant el nostre treball en cinemàtica i control, ens hem basat en dos tipus de primitives de moviment per caracteritzar la motricitat robòtica. Aquestes primitives de moviment són molt adequades per usar aprenentatge per reforç. En particular, hem usat la búsqueda directa de la política, un camp de l'aprenentatge per reforç que usa la parametrització del moviment com la pròpia política. Per tal de millorar la velocitat d'aprenentatge en aplicacions amb robots reals, hem generalitzat un algoritme de búsqueda directa de política per a donar importància a les mostres amb mal resultat, i hem donat especial atenció a la reducció de dimensionalitat en la parametrització dels moviments. Hem reduït la dimensionalitat amb mètodes lineals, utilitzant les recompenses obtingudes EN executar els moviments. Aquests mètodes han estat provats en tasques bimanuals com són plegar roba, usant dos braços antropomòrfics. Els resultats mostren com la reducció de dimensionalitat pot aportar informació qualitativa d'una tasca, i al mateix temps ajuda a aprendre-la més ràpid quan les execucions amb robots reals són costoses.Award-winningPostprint (published version
    corecore