27 research outputs found

    Design Techniques for Energy-Quality Scalable Digital Systems

    Get PDF
    Energy efficiency is one of the key design goals in modern computing. Increasingly complex tasks are being executed in mobile devices and Internet of Things end-nodes, which are expected to operate for long time intervals, on the order of months or years, with the limited energy budgets provided by small form-factor batteries. Fortunately, many such tasks are error resilient, meaning that they can tolerate some relaxation in the accuracy, precision or reliability of internal operations without a significant impact on the overall output quality. The error resilience of an application may derive from a number of factors. The processing of analog sensor inputs measuring quantities from the physical world may not always require maximum precision, as the amount of information that can be extracted is limited by the presence of external noise. Outputs destined for human consumption may also contain small or occasional errors, thanks to the limited capabilities of our vision and hearing systems. Finally, some computational patterns commonly found in domains such as statistics, machine learning and operational research naturally tend to reduce or eliminate errors. Energy-Quality (EQ) scalable digital systems systematically trade off the quality of computations for energy efficiency, by relaxing the precision, the accuracy, or the reliability of internal software and hardware components in exchange for energy reductions. This design paradigm is believed to offer one of the most promising solutions to the pressing need for low-energy computing.

    Despite these high expectations, the current state of the art in EQ scalable design suffers from important shortcomings. First, the great majority of techniques proposed in the literature focus only on processing hardware and software components. Nonetheless, in many real devices processing contributes only a small portion of the total energy consumption, which is dominated by other components (e.g. I/O, memory or data transfers). Second, in order to fulfill its promises and become widespread in commercial devices, EQ scalable design needs to achieve industrial-level maturity. This involves moving from purely academic research based on high-level models and theoretical assumptions to engineered flows compatible with existing industry standards. Third, the time-varying nature of error tolerance, both among different applications and within a single task, should become more central in the proposed design methods. This involves designing "dynamic" systems, in which the precision or reliability of operations (and consequently their energy consumption) can be tuned at runtime, rather than "static" solutions, in which the output quality is fixed at design time.

    This thesis introduces several new EQ scalable design techniques for digital systems that take the previous observations into account. Besides processing, the proposed methods apply the principles of EQ scalable design also to interconnects and peripherals, which are often relevant contributors to the total energy in sensor nodes and mobile systems, respectively. Regardless of the target component, the presented techniques pay special attention to the accurate evaluation of the benefits and overheads deriving from EQ scalability, using industrial-level models, and to the integration with existing standard tools and protocols. Moreover, all the works presented in this thesis allow the dynamic reconfiguration of output quality and energy consumption.
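    To make the notion of a runtime quality knob concrete, the following minimal sketch (an illustration under assumed fixed-point arithmetic, not code from the thesis) models a multiply-accumulate datapath whose effective precision can be lowered at runtime; in real EQ scalable hardware the discarded low-order bits would translate into reduced switching energy, whereas here only the numerical effect is modeled.

```cpp
#include <cstdint>
#include <cstdio>
#include <initializer_list>

// Illustrative sketch (not from the thesis): a fixed-point multiply-accumulate
// whose effective precision is a runtime knob. In an EQ scalable datapath the
// masked low-order bits would be gated off to save switching energy.
struct EqMac {
    int drop_bits = 0;  // runtime quality knob: low-order bits to discard

    int32_t truncate(int32_t x) const {
        return (x >> drop_bits) << drop_bits;  // zero the least significant bits
    }
    int64_t mac(int64_t acc, int32_t a, int32_t b) const {
        // Operands are truncated before the multiply, mimicking a reduced-
        // precision operating mode that can be re-tuned at any time.
        return acc + (int64_t)truncate(a) * truncate(b);
    }
};

int main() {
    int32_t a[4] = {1000, -2003, 517, 9999};
    int32_t b[4] = {7, 13, -21, 2};
    for (int drop : {0, 4, 8}) {           // sweep the energy-quality knob
        EqMac mac{drop};
        int64_t acc = 0;
        for (int i = 0; i < 4; ++i) acc = mac.mac(acc, a[i], b[i]);
        std::printf("drop=%d -> dot product = %lld\n", drop, (long long)acc);
    }
}
```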
    More specifically, the contribution of this thesis is divided into three parts. In the first body of work, the design of EQ scalable modules for processing hardware datapaths is considered. Three design flows are presented, targeting different technologies and exploiting different ways to achieve EQ scalability, i.e. timing-induced errors and precision reduction. These works are inspired by previous approaches from the literature, namely Reduced-Precision Redundancy and Dynamic Accuracy Scaling, which are re-thought to make them compatible with standard Electronic Design Automation (EDA) tools and flows, providing solutions to overcome their main limitations.

    The second part of the thesis investigates the application of EQ scalable design to serial interconnects, which are the de facto standard for data exchanges between processing hardware and sensors. In this context, two novel bus encodings are proposed, called Approximate Differential Encoding and Serial-T0, which exploit the statistical characteristics of data produced by sensors to reduce the energy consumption on the bus at the cost of controlled data approximations. The two techniques achieve different results for data of different origins, but share the common features of allowing runtime reconfiguration of the allowed error and of being compatible with standard serial bus protocols.

    Finally, the last part of the manuscript is devoted to the application of EQ scalable design principles to displays, which are often among the most energy-hungry components in mobile systems. The two proposals in this context leverage the emissive nature of Organic Light-Emitting Diode (OLED) displays to save energy by altering the displayed image, thus inducing an output quality reduction that depends on the amount of such alteration. The first technique implements an image-adaptive form of brightness scaling, whose outputs are optimized in terms of the balance between power consumption and similarity with the input. The second approach achieves concurrent power reduction and image enhancement by means of an adaptive polynomial transformation. Both solutions focus on minimizing the overheads associated with a real-time implementation of the transformations in software or hardware, so that these do not offset the savings in the display.

    For each of these three topics, results show that the aforementioned goal of building EQ scalable systems that are compatible with existing best practices and mature enough for integration in commercial devices can be effectively achieved. Moreover, they also show that very simple and similar principles can be applied to design EQ scalable versions of different system components (processing, peripherals and I/O), and to equip these components with knobs for the runtime reconfiguration of the energy-versus-quality tradeoff.
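    As an illustration of the serial-interconnect idea, the following is a minimal sketch of a Serial-T0-style approximate transmitter (a simplification for illustration, not the exact encoding from the thesis): a sample that differs from the previously transmitted word by no more than a runtime-tunable tolerance is simply not retransmitted, so slowly varying sensor data causes little bus activity, and the tolerance knob sets the energy-versus-quality operating point.

```cpp
#include <cstdint>
#include <cstdlib>
#include <cstdio>
#include <vector>

// Simplified Serial-T0-style approximate encoding (sketch of the general
// idea only): when a new sensor sample is within 'tol' of the previously
// transmitted one, the word is skipped and the receiver reuses its last
// value. Skipped words cost no bus transitions, trading a bounded
// per-sample error for energy.
struct ApproxTx {
    uint16_t last = 0;
    unsigned tol = 0;    // runtime knob: maximum allowed per-sample error
    unsigned sent = 0, skipped = 0;

    uint16_t transmit(uint16_t sample) {
        if ((unsigned)std::abs((int)sample - (int)last) <= tol) {
            ++skipped;       // receiver keeps 'last'; no bus activity
            return last;
        }
        ++sent;
        last = sample;       // send the exact word over the bus
        return sample;
    }
};

int main() {
    std::vector<uint16_t> samples = {512, 513, 515, 540, 541, 900, 901, 899};
    for (unsigned tol : {0u, 4u, 32u}) {
        ApproxTx tx{0, tol};
        for (uint16_t s : samples) tx.transmit(s);
        std::printf("tol=%u: sent=%u skipped=%u\n", tol, tx.sent, tx.skipped);
    }
}
```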
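    The display techniques can be illustrated in the same spirit. The sketch below (a toy model under an assumed linear OLED power behavior, not the thesis's optimizer) picks a global brightness-scaling factor that minimizes a weighted sum of panel power and mean squared error against the original image, with the weight acting as the runtime energy-versus-quality knob.

```cpp
#include <cstdio>
#include <vector>

// Toy image-adaptive brightness scaling for an emissive OLED panel
// (illustrative assumptions: power grows linearly with pixel intensity,
// fidelity is measured by MSE). Picks the scale factor s in [0.5, 1.0]
// minimizing alpha * power + (1 - alpha) * mse.
double pick_scale(const std::vector<double>& pix, double alpha) {
    double best_s = 1.0, best_cost = 1e30;
    for (double s = 0.5; s <= 1.0001; s += 0.01) {
        double power = 0.0, mse = 0.0;
        for (double p : pix) {
            power += s * p;                      // linear OLED power model
            mse   += (p - s * p) * (p - s * p);  // error vs. original pixel
        }
        power /= pix.size();
        mse   /= pix.size();
        double cost = alpha * power + (1.0 - alpha) * mse;
        if (cost < best_cost) { best_cost = cost; best_s = s; }
    }
    return best_s;
}

int main() {
    std::vector<double> image = {0.9, 0.8, 0.85, 0.2, 0.95, 0.7};  // luminances in [0,1]
    for (double alpha : {0.1, 0.5, 0.9})     // more weight on power -> dimmer output
        std::printf("alpha=%.1f -> scale=%.2f\n", alpha, pick_scale(image, alpha));
}
```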

    Optimización de un Algoritmo de Detección de Líneas para Vehículos Autónomos en un RISC-V con Acelerador

    Get PDF
    In recent years, autonomous vehicles have attracted the attention of many research groups, both in academia and industry, including researchers from leading companies such as Google, Uber and Tesla. These vehicles are equipped with systems that are subject to very strict requirements, essentially aimed at performing safe operations, both for potential passengers and for pedestrians, as well as at carrying out the processing needed for decision making in real time. In many instances, general-purpose processors alone cannot ensure that these safety, reliability and real-time requirements are met, so it is common to implement heterogeneous systems by including accelerators. This paper explores the acceleration of a line detection application in the autonomous car environment using a heterogeneous system consisting of a general-purpose RISC-V core and a domain-specific accelerator. In particular, the application is analyzed to identify the most computationally intensive parts of the code, which is then adapted accordingly for more efficient processing. Furthermore, the code is executed on the aforementioned hardware platform to verify that it effectively meets the requirements present in autonomous vehicles, achieving a 3.7x speedup with respect to running without the accelerator.
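    The offload pattern described above can be sketched as follows; note that the register map, addresses and names below are hypothetical, invented purely for illustration, and the software fallback implements only a generic gradient filter rather than the paper's actual line detection kernel.

```cpp
#include <cstdint>

// Hypothetical sketch of the profile-then-offload pattern (the MMIO layout
// and addresses are invented for illustration; the paper's real accelerator
// interface is not specified here). A profiled hotspot such as a filter over
// an image row is handed to a memory-mapped domain-specific accelerator
// instead of running on the RISC-V core.
namespace accel {
    // Assumed MMIO registers of the fictional accelerator.
    volatile uint32_t* const SRC  = (uint32_t*)0x40000000;  // input buffer address
    volatile uint32_t* const DST  = (uint32_t*)0x40000004;  // output buffer address
    volatile uint32_t* const LEN  = (uint32_t*)0x40000008;  // element count
    volatile uint32_t* const CTRL = (uint32_t*)0x4000000C;  // write 1 to start
    volatile uint32_t* const STAT = (uint32_t*)0x40000010;  // reads 1 while busy
}

// Process one row of the line-detection pipeline; fall back to software if
// the accelerator is absent (a compile-time flag in this sketch).
void filter_row(const uint8_t* src, uint8_t* dst, uint32_t n) {
#ifdef HAVE_ACCEL
    *accel::SRC  = (uint32_t)(uintptr_t)src;
    *accel::DST  = (uint32_t)(uintptr_t)dst;
    *accel::LEN  = n;
    *accel::CTRL = 1;                      // kick off the accelerated kernel
    while (*accel::STAT) { /* spin until done */ }
#else
    for (uint32_t i = 1; i + 1 < n; ++i)   // generic software fallback:
        dst[i] = (uint8_t)((src[i + 1] - src[i - 1] + 255) / 2);  // x-gradient
#endif
}

int main() {
    uint8_t src[8] = {0, 10, 20, 40, 80, 120, 160, 200}, dst[8] = {0};
    filter_row(src, dst, 8);               // runs the software path on a host build
}
```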

    Approximation Opportunities in Edge Computing Hardware: A Systematic Literature Review

    Get PDF
    With the increasing popularity of the Internet of Things and massive Machine Type Communication technologies, the number of connected devices is rising. However, while these devices enable valuable services, bandwidth and latency constraints challenge Cloud processing of the data volumes they generate. A promising solution to these challenges is the combination of Edge and approximate computing techniques, which allows data to be processed nearer to the user. This paper aims to survey the potential benefits of the intersection of these two paradigms. We provide a state-of-the-art review of circuit-level and architecture-level hardware techniques and popular applications. We also outline essential future research directions.
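    As a concrete taste of the circuit-level techniques such surveys cover, the following behavioral model sketches a lower-part OR adder (LOA), a classic approximate-adder design: the k least significant bits are ORed instead of added, truncating the carry chain that dominates adder delay and energy, at the cost of a small bounded error. The code is illustrative and not taken from the paper.

```cpp
#include <cstdint>
#include <cstdio>
#include <initializer_list>

// Behavioral model of a lower-part OR adder (LOA). The k low bits are
// combined with a plain OR (no carries), and no carry propagates into the
// exactly-added upper part; in hardware this shortens the critical path and
// saves area and energy.
uint32_t loa_add(uint32_t a, uint32_t b, unsigned k) {
    uint32_t low_mask = (k >= 32) ? 0xFFFFFFFFu : ((1u << k) - 1u);
    uint32_t low  = (a | b) & low_mask;                   // approximate lower part
    uint32_t high = (a & ~low_mask) + (b & ~low_mask);    // exact upper part
    return high | low;                                    // no carry into upper part
}

int main() {
    uint32_t a = 12345, b = 6789;
    for (unsigned k : {0u, 4u, 8u}) {   // k = 0 reproduces the exact sum
        uint32_t approx = loa_add(a, b, k);
        std::printf("k=%u: exact=%u approx=%u error=%d\n",
                    k, a + b, approx, (int)(a + b) - (int)approx);
    }
}
```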

    Scalable Task Schedulers for Many-Core Architectures

    Get PDF
    This thesis develops schedulers for many-core architectures with different optimization objectives. The proposed schedulers are designed to scale as the number of cores in many-cores increases, while continuing to provide guarantees on the quality of the schedule.

    Resource-aware scheduling for 2D/3D multi-/many-core processor-memory systems

    Get PDF
    This dissertation addresses the complexities of 2D/3D multi-/many-core processor-memory systems, focusing on two key areas: enhancing timing predictability in real-time multi-core processors and optimizing performance within thermal constraints. The integration of an increasing number of transistors into compact chip designs, while boosting computational capacity, presents challenges in resource contention and thermal management. The first part of the thesis improves timing predictability. We enhance shared cache interference analysis for set-associative caches, advancing the calculation of Worst-Case Execution Time (WCET). This development enables accurate assessment of cache interference and of the effectiveness of partitioned schedulers in real-world scenarios. We introduce TCPS, a novel task- and cache-aware partitioned scheduler that optimizes cache partitioning based on task-specific WCET sensitivity, leading to improved schedulability and predictability. Our research explores various cache and scheduling configurations, providing insights into their performance trade-offs.

    The second part focuses on thermal management in 2D/3D many-core systems. Recognizing the limitations of Dynamic Voltage and Frequency Scaling (DVFS) in S-NUCA many-core processors, we propose synchronous thread migrations as a thermal management strategy. This approach culminates in the HotPotato scheduler, which balances performance and thermal safety. We also introduce 3D-TTP, a transient temperature-aware power budgeting strategy for 3D-stacked systems, reducing the need for Dynamic Thermal Management (DTM) activation. Finally, we present 3QUTM, a novel method for 3D-stacked systems that combines core DVFS and memory bank Low Power Modes with a learning algorithm, optimizing response times within thermal limits. This research contributes significantly to enhancing performance and thermal management in advanced processor-memory systems.
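    A toy policy in the spirit of the synchronous-thread-migration idea (a simplification for illustration, not the HotPotato scheduler itself) is sketched below: at each epoch the highest-power threads are migrated onto the coolest cores, spreading heat across the die without resorting to DVFS.

```cpp
#include <algorithm>
#include <cstdio>
#include <vector>

// Toy epoch-based migration policy (illustrative only): pair the hottest
// threads with the coolest cores so that heat is balanced across the chip.
struct Core   { int id; double temp_c;  };
struct Thread { int id; double power_w; };

std::vector<int> assign(std::vector<Core> cores, std::vector<Thread> threads) {
    // Sort cores by temperature (coolest first) and threads by power
    // (hottest first), then pair them up greedily.
    std::sort(cores.begin(), cores.end(),
              [](const Core& a, const Core& b){ return a.temp_c < b.temp_c; });
    std::sort(threads.begin(), threads.end(),
              [](const Thread& a, const Thread& b){ return a.power_w > b.power_w; });
    std::vector<int> core_of_thread(threads.size());
    for (size_t i = 0; i < threads.size(); ++i)   // thread ids assumed 0..n-1
        core_of_thread[threads[i].id] = cores[i % cores.size()].id;
    return core_of_thread;
}

int main() {
    std::vector<Core> cores     = {{0, 72.0}, {1, 55.0}, {2, 63.0}, {3, 49.0}};
    std::vector<Thread> threads = {{0, 4.5}, {1, 1.2}, {2, 3.1}, {3, 0.8}};
    auto placement = assign(cores, threads);      // one migration epoch
    for (size_t t = 0; t < placement.size(); ++t)
        std::printf("thread %zu -> core %d\n", t, placement[t]);
}
```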

    Computational Argumentation for the Automatic Analysis of Argumentative Discourse and Human Persuasion

    Full text link
    Computational argumentation is the area of research that studies and analyses the use of different techniques and algorithms that approximate human argumentative reasoning from a computational viewpoint. In this doctoral thesis we study the use of different techniques proposed under the framework of computational argumentation to perform an automatic analysis of argumentative discourse, and to develop argument-based computational persuasion techniques. With these objectives in mind, we first present a complete review of the state of the art and propose a classification of existing works in the area of computational argumentation. This review allows us to contextualise and understand the previous research more clearly from the human perspective of argumentative reasoning, and to identify the main limitations and future trends of the research done in computational argumentation. Secondly, to overcome some of these limitations, we create and describe a new corpus that allows us to address new challenges and investigate previously unexplored problems (e.g., the automatic evaluation of spoken debates). In conjunction with this data, a new system for argument mining is proposed and a comparative analysis of different techniques for this same task is carried out. In addition, we propose a new algorithm for the automatic evaluation of argumentative debates and we evaluate it with real human debates. Thirdly, a series of studies and proposals are presented to improve the persuasiveness of computational argumentation systems in the interaction with human users. In this way, this thesis presents advances in each of the main parts of the computational argumentation process (i.e., argument mining, argument-based knowledge representation and reasoning, and argument-based human-computer interaction), and proposes some of the essential foundations for the complete automatic analysis of natural language argumentative discourses.

    This thesis has been partially supported by the Generalitat Valenciana project PROMETEO/2018/002 and by the Spanish Government projects TIN2017-89156-R and PID2020-113416RB-I00. Ruiz Dolz, R. (2023). Computational Argumentation for the Automatic Analysis of Argumentative Discourse and Human Persuasion [Doctoral thesis]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/194806
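    As background for the argument-based reasoning step mentioned above (an illustration of the standard abstract-argumentation formalism, not an algorithm from the thesis), the sketch below computes the grounded extension of a set of mined arguments and attack relations: starting from the unattacked arguments, it repeatedly accepts every argument all of whose attackers have been defeated.

```cpp
#include <cstdio>
#include <map>
#include <set>
#include <string>
#include <vector>

// Grounded semantics for an abstract argumentation framework: iterate the
// characteristic function from the empty set, accepting each argument whose
// attackers have all been defeated by already-accepted arguments.
using AF = std::map<std::string, std::vector<std::string>>;  // argument -> its attackers

std::set<std::string> grounded(const AF& af) {
    std::set<std::string> in, out;   // accepted / defeated arguments
    bool changed = true;
    while (changed) {
        changed = false;
        for (const auto& [arg, attackers] : af) {
            if (in.count(arg)) continue;
            bool all_defeated = true;
            for (const auto& atk : attackers)
                if (!out.count(atk)) { all_defeated = false; break; }
            if (all_defeated) {      // arg is defended: accept it, then defeat
                in.insert(arg);      // everything it attacks
                for (const auto& [other, atks] : af)
                    for (const auto& atk : atks)
                        if (atk == arg) out.insert(other);
                changed = true;
            }
        }
    }
    return in;
}

int main() {
    // a attacks b, b attacks c: the grounded extension is {a, c}.
    AF af = {{"a", {}}, {"b", {"a"}}, {"c", {"b"}}};
    for (const auto& arg : grounded(af)) std::printf("%s ", arg.c_str());
    std::printf("\n");
}
```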

    New geometric algorithms and data structures for collision detection of dynamically deforming objects

    Get PDF
    Any virtual environment that supports interactions between virtual objects and/or between a user and objects needs a collision detection system to handle all interactions in a physically correct or plausible way. A collision detection system is needed to determine whether objects are in contact or interpenetrate; these interpenetrations are then resolved by a collision handling system. Because in nearly all simulations objects can interact with each other, collision detection is a fundamental technology needed across domains such as physically based simulation, robotic path and motion planning, virtual prototyping, and many more. Most virtual environments aim to represent the real world as realistically as possible, and are therefore becoming more and more complex. Furthermore, all models in a virtual environment should react to applied forces as real objects do. Nearly all real-world objects deform or break into their individual parts when forces act upon them. Deformable objects are thus becoming more and more common in virtual environments that strive for realism, and they present new challenges to the collision detection system. The necessary collision detection computations can be very complex, with the effect that collision detection is the performance bottleneck in most simulations. Most rigid-body collision detection approaches use a bounding volume hierarchy (BVH) as acceleration data structure. This technique is perfectly suitable as long as the object does not change its shape. For a soft body, an update step is necessary to ensure that the underlying acceleration data structure is still valid after each simulation step. This update step can be very time-consuming, is often hard to implement, and in most cases will produce a degenerated BVH after some simulation steps if the objects deform substantially. Therefore, the collision detection approach presented here works entirely without an acceleration data structure and supports both rigid and soft bodies. Furthermore, we can compute inter-object and intra-object collisions of rigid and deformable objects consisting of many tens of thousands of triangles in a few milliseconds. To realize this, the scene is subdivided into parts using a fuzzy clustering approach. Based on that, all further steps for each cluster can be performed in parallel and, if desired, distributed to different GPUs. Tests have been performed to judge the performance of our approach against other state-of-the-art collision detection algorithms. Additionally, we integrated our approach into Bullet, a commonly used physics engine, to evaluate our algorithm. In order to make a fair comparison of different rigid-body collision detection algorithms, we propose a new collision detection Benchmarking Suite. Our Benchmarking Suite can evaluate both the performance and the quality of the collision response, and is therefore subdivided into a Performance Benchmark and a Quality Benchmark. In the future, this approach needs to be extended to support soft-body collision detection algorithms.
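    The pipeline described above can be sketched as follows, with two simplifications made purely for illustration: plain nearest-center clustering stands in for the fuzzy clustering, and a bounding-sphere overlap test stands in for the exact triangle-triangle test. The point of the structure is that, with no BVH to maintain, each cluster's candidate pairs can be tested independently and in parallel.

```cpp
#include <cstdio>
#include <vector>

// Simplified sketch of a hierarchy-free collision pipeline: cluster triangle
// centroids, then brute-force candidate tests within each cluster. Clusters
// are independent, so they can run in parallel or on different GPUs.
struct Vec3 { float x, y, z; };
struct Tri  { Vec3 centroid; float radius; };   // bounding sphere of a triangle

static float dist2(const Vec3& a, const Vec3& b) {
    float dx = a.x - b.x, dy = a.y - b.y, dz = a.z - b.z;
    return dx * dx + dy * dy + dz * dz;
}

// Assign each triangle to the nearest of k fixed cluster centers (one hard
// assignment step; a real implementation would iterate with fuzzy memberships).
std::vector<int> cluster(const std::vector<Tri>& tris,
                         const std::vector<Vec3>& centers) {
    std::vector<int> label(tris.size());
    for (size_t i = 0; i < tris.size(); ++i) {
        int best = 0;
        for (size_t c = 1; c < centers.size(); ++c)
            if (dist2(tris[i].centroid, centers[c]) <
                dist2(tris[i].centroid, centers[best])) best = (int)c;
        label[i] = best;
    }
    return label;
}

// Brute-force candidate tests inside one cluster (parallelizable per cluster).
int count_collisions(const std::vector<Tri>& tris,
                     const std::vector<int>& label, int c) {
    int hits = 0;
    for (size_t i = 0; i < tris.size(); ++i)
        for (size_t j = i + 1; j < tris.size(); ++j)
            if (label[i] == c && label[j] == c) {
                float r = tris[i].radius + tris[j].radius;
                if (dist2(tris[i].centroid, tris[j].centroid) <= r * r)
                    ++hits;   // spheres overlap: the exact test would run here
            }
    return hits;
}

int main() {
    std::vector<Tri> tris = {{{0, 0, 0}, 0.5f}, {{0.4f, 0, 0}, 0.5f}, {{5, 5, 5}, 0.5f}};
    std::vector<Vec3> centers = {{0, 0, 0}, {5, 5, 5}};
    auto label = cluster(tris, centers);
    for (int c = 0; c < 2; ++c)
        std::printf("cluster %d: %d potential collisions\n",
                    c, count_collisions(tris, label, c));
}
```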