34 research outputs found

    Performance Evaluation of In-storage Processing Architectures for Diverse Applications and Benchmarks

    University of Minnesota Ph.D. dissertation. June 2018. Major: Electrical/Computer Engineering. Advisor: David Lilja. 1 computer file (PDF); vii, 100 pages. As we inch towards the future, the storage needs of the world are going to be massive and diversified. To meet the needs of the next generation, storage systems must be studied closely and call for innovative solutions. These solutions need to solve a multitude of issues, including the high power consumption of traditional systems, manageability, easy scaling out, and integration into existing systems. Therefore, we need to rethink the new technologies from the ground up. To keep the energy signature under control, we devised a new architecture called the Storage Processing Unit (SPU). To model this architecture, we incorporate a processing element inside the storage medium to limit the data movement between the storage device and the host processor. This resulted in a hierarchical architecture which required an extensive design space exploration along with an in-depth study of the applications. We found that this new architecture provides energy savings of 11-423X and performance gains of 4-66X for applications including k-means, Sparse BLAS, and others. Moreover, to understand the diverse nature of the applications and newer technologies, we tried the concept of in-storage processing for unstructured data. This type of data is growing rapidly and will continue to do so. Seagate's new class of drives, Kinetic Drives, addresses the rise of unstructured data. They have a processing element inside the disk drive that executes LevelDB, a key-value store. We evaluated this off-the-shelf device using micro and macro benchmarks for in-depth throughput and latency benchmarking. We observed a sequential write throughput of 63 MB/sec and a sequential read throughput of 78 MB/sec for 1 MB value sizes. We tested several unique features, including the P2P transfer that takes place in a Kinetic Drive. This new class of drives outperformed traditional servers in several test cases. Finally, a large number of these devices is needed for huge amounts of data. To demonstrate that Kinetic Drives reduce the management complexity of large-scale deployments, we conducted a study. We allocated large amounts of data on Kinetic Drives and then evaluated the performance of the system when migrating data among drives. Previously developed key indexing schemes were evaluated, which gave important insights into their performance differences. Based on this study, we conclude that an efficient mapping of key-value pairs to drives can be obtained. This led to an understanding of the trade-offs between the number of empty drives and the mapping of different key ranges to different drives. In conclusion, in-storage processing architectures bring an interesting aspect where processing is moved closer to the data. This leads to a paradigm shift which often results in major software and hardware architectural changes. Furthermore, the new architectures have the potential to perform better than traditional systems but require easy integration with existing systems.
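The idea of mapping key ranges to drives, and the resulting count of empty drives, can be illustrated with a minimal sketch. It is not one of the indexing schemes evaluated in the dissertation: the split keys, the drive numbering, and the function names `drive_for_key` and `count_empty_drives` are all hypothetical and chosen only for illustration.

```python
import bisect

def drive_for_key(key, split_keys):
    """Return the index of the drive whose key range contains `key`.

    `split_keys` is a sorted list of range boundaries (an assumption of this
    sketch). Keys below split_keys[0] go to drive 0, keys in
    [split_keys[0], split_keys[1]) go to drive 1, and so on.
    """
    return bisect.bisect_right(split_keys, key)

def count_empty_drives(keys, split_keys):
    """Count drives that receive no key under the given range mapping."""
    num_drives = len(split_keys) + 1
    used = {drive_for_key(k, split_keys) for k in keys}
    return num_drives - len(used)

# Hypothetical boundaries: four drives covering [..g), [g..n), [n..t), [t..]
split_keys = ["g", "n", "t"]
keys = ["apple", "grape", "zebra"]
placement = {k: drive_for_key(k, split_keys) for k in keys}
empty = count_empty_drives(keys, split_keys)
```

With these boundaries, no key falls in the third range, so one drive stays empty. Moving a boundary changes both the placement and the number of empty drives, which is the kind of trade-off the abstract refers to.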

    Exploiting task-based programming models for resilience

    Hardware errors become more common as silicon technologies shrink and become more vulnerable, especially in memory cells, which are the most exposed to errors. Permanent and intermittent faults are caused by manufacturing variability and circuit ageing. While these can be mitigated once they are identified, their continuous rate of appearance throughout the lifetime of memory devices will always cause unexpected errors. In addition, transient faults are caused by effects such as radiation or small voltage/frequency margins, and there is no efficient way to shield against these events. Other constraints related to the diminishing sizes of transistors, such as power consumption and memory latency, have caused the microprocessor industry to turn to increasingly complex processor architectures. To solve the difficulties arising from programming such architectures, programming models have emerged that rely on runtime systems. These systems form a new intermediate layer in the hardware-software abstraction stack that performs tasks such as distributing work across computing resources: processor cores, accelerators, etc. These runtime systems have access to a great deal of information, both from the hardware and from the applications, and thus offer many possibilities for optimisation. This thesis proposes solutions to the increasing fault rates in memory across multiple resilience disciplines, from algorithm-based fault tolerance to hardware error-correcting codes, through OS reliability strategies. These solutions rely for their efficiency on the opportunities presented by runtime systems. The first contribution of this thesis is an algorithm-based resilience technique that makes it possible to tolerate detected errors in memory. This technique recovers lost data by performing computations that rely on simple redundancy relations identified in the program. The recovery is demonstrated for a family of iterative solvers, the Krylov subspace methods, and evaluated for the conjugate gradient solver. The runtime can transparently overlap the recovery with the computations of the algorithm, which masks the already low overheads of this technique. The second part of this thesis proposes a metric to characterise the impact of faults in memory, which outperforms state-of-the-art metrics in precision and assurances on the error rate. This metric reveals a key insight into data that is not relevant to the program, and we propose an OS-level strategy to ignore errors in such data by delaying the reporting of detected errors. This reduces the failure rates of running programs by ignoring errors that have no impact. The architectural-level contribution of this thesis is a dynamically adaptable Error Correcting Code (ECC) scheme that can increase the protection of memory regions where the impact of errors is highest. A methodology is presented to estimate the fault rate at runtime using our metric, through the performance monitoring tools of current commodity processors. Guiding the dynamic ECC scheme online using the methodology's vulnerability estimates decreases the error rates of programs at a fraction of the redundancy cost required for a uniformly stronger ECC. This provides a useful and wide range of trade-offs between redundancy and error rates. The work presented in this thesis demonstrates that runtime systems make it possible to get the most out of redundancy stored in memory, to help tackle increasing error rates in DRAM. This exploited redundancy can be an inherent part of algorithms, which allows them to tolerate higher fault rates, or can take the form of dead data stored in memory. Redundancy can also be added to a program, in the form of ECC. In all cases, the runtime decreases failure rates efficiently, by diminishing recovery costs, identifying redundant data, or targeting critical data. 
It is thus a very valuable tool for future computing systems, as it can perform optimisations across different layers of abstraction.
    Memory errors become more common as silicon technologies shrink. Manufacturing variability and circuit ageing cause permanent and intermittent faults. Although these can be mitigated once identified, their continuous rate of appearance always causes unexpected errors. In addition, memory also suffers from transient faults against which there is no efficient protection. These faults are caused by effects such as radiation or reduced voltage and frequency margins. Other concurrent constraints, such as power consumption and memory latency, have forced computer architectures to become increasingly complex. To program such processors, programming models based on runtime systems were developed. These systems form a new abstraction between hardware and software, performing tasks such as distributing work across computing resources: processor cores, accelerators, etc. These runtime systems have a great deal of information about both the hardware and the applications, and thus offer many opportunities for optimisation. This thesis proposes solutions to memory faults across multiple resilience disciplines, from algorithm-based fault tolerance to hardware error-correcting codes, including operating-system resilience strategies. The efficiency of these solutions depends on the opportunities that runtime systems present. The first contribution of this thesis is an algorithm-level technique that corrects faults encountered while the program is running. To correct faults, simple redundancies in the program's data have been identified for a whole class of algorithms, the Krylov subspace methods (conjugate gradient, GMRES, etc.). The data recovery strategy developed corrects errors without having to restart the algorithm, and takes advantage of the programming model to overlap the computations of the algorithm with those of the data recovery. The second part of this thesis proposes a metric to characterise the impact of faults in memory. This metric is more precise than state-of-the-art metrics and identifies data that is less relevant to the program. An operating-system-level strategy is proposed that delays the reporting of detected errors, which makes it possible to ignore faults in such data and reduce the program's failure rate. Finally, the architecture-level contribution of this thesis is a dynamically adaptable Error Correcting Code (ECC) scheme. This scheme can increase the protection of the memory regions where the impact of errors is greatest. A methodology is presented to estimate the risk of failure at runtime using our metric, through the performance monitoring tools available in current processors. The ECC scheme, guided dynamically by these vulnerability estimates, reduces the failure rate of programs at a fraction of the redundancy cost required for a uniformly stronger ECC. The work presented in this thesis demonstrates that runtime systems make it possible to take full advantage of the redundancy contained in memory, in order to contain the rise of errors in it. This exploited redundancy can be an inherent part of algorithms that allows more faults to be tolerated, can take the form of unused data stored in memory, or can be added to a program's memory in the form of ECC. In all cases, the runtime system reduces the effects of faults efficiently, by reducing recovery costs, identifying redundant data, or focusing protection efforts on critical data. Postprint (published version)
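The notion of recovering lost data from a simple redundancy relation can be illustrated on the conjugate gradient solver. The sketch below is not the thesis's implementation: it uses only the textbook invariant r = b - Ax, which holds between the residual r and the iterate x throughout the iterations, and the helper names (`cg_steps`, `recover_residual_entry`) and the small test matrix are assumptions made for illustration.

```python
def matvec(A, x):
    """Dense matrix-vector product."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def cg_steps(A, b, steps):
    """Run `steps` conjugate gradient iterations from x = 0; return (x, r)."""
    x = [0.0] * len(b)
    r = list(b)          # residual r = b - A x, with x = 0 initially
    p = list(r)          # search direction
    rs = dot(r, r)
    for _ in range(steps):
        Ap = matvec(A, p)
        alpha = rs / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = dot(r, r)
        p = [ri + (rs_new / rs) * pi for ri, pi in zip(r, p)]
        rs = rs_new
    return x, r

def recover_residual_entry(A, b, x, i):
    """Rebuild a lost entry r[i] from the redundancy relation r = b - A x."""
    return b[i] - dot(A[i], x)

# Small symmetric positive definite system (hypothetical example data).
A = [[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]]
b = [1.0, 2.0, 3.0]
x, r = cg_steps(A, b, 2)

lost = 1
true_value = r[lost]
r[lost] = float("nan")                                # simulate a detected memory error
r[lost] = recover_residual_entry(A, b, x, lost)       # recompute from intact data
recovery_error = abs(r[lost] - true_value)
```

Only the row of A matching the lost entry is needed, so the recovery touches a small part of the data and the solver can continue without restarting. In floating-point arithmetic the recovered value agrees with the lost one up to rounding error.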

    Internet of Things and Sensors Networks in 5G Wireless Communications

    This book is a printed edition of the Special Issue Internet of Things and Sensors Networks in 5G Wireless Communications that was published in Sensors

    Internet of Things and Sensors Networks in 5G Wireless Communications

    The Internet of Things (IoT) has attracted much attention from society, industry and academia as a promising technology that can enhance day-to-day activities, support the creation of new business models, products and services, and serve as a broad source of research topics and ideas. A future digital society is envisioned, composed of numerous wirelessly connected sensors and devices. Driven by huge demand, massive IoT (mIoT), or massive machine type communication (mMTC), has been identified as one of the three main communication scenarios for 5G. In addition to connectivity, computing, storage and data management are also long-standing issues for low-cost devices and sensors. The book is a collection of outstanding technical research and industrial papers covering new research results, with a wide range of features within the 5G-and-beyond framework. It provides a range of discussions of the major research challenges and achievements within this topic.

    Machine learning enabled millimeter wave cellular system and beyond

    Millimeter-wave (mmWave) communication, with its advantages of abundant bandwidth and immunity to interference, has been deemed a promising technology for the next generation network and beyond. With the help of mmWave, the requirements envisioned for future mobile networks could be met, such as addressing the massive growth required in coverage, capacity and traffic, providing a better quality of service and experience to users, supporting ultra-high data rates and reliability, and ensuring ultra-low latency. However, the characteristics of mmWave, such as short transmission distance, high sensitivity to blockage, and large propagation path loss, pose challenges for mmWave cellular network design. In this context, to enjoy the benefits of mmWave networks, the architecture of the next generation cellular network will be more complex. A more complex network brings more complex problems. The plethora of possibilities makes planning and managing a complex network system more difficult. Specifically, to provide better Quality of Service and Quality of Experience for users in such a network, efficient and effective handover for mobile users is important. The probability of a handover being triggered will increase significantly in the next generation network, due to the dense small cell deployment. Since the resources in the base station (BS) are limited, handover management will be a great challenge. Further, to achieve the maximum transmission rate for the users, the line-of-sight (LOS) channel would be the main transmission channel. However, due to the characteristics of mmWave and the complexity of the environment, a LOS channel is not always feasible. The non-line-of-sight channel should be explored and used as the backup link to serve the users. With all these problems tending to be complex and nonlinear, and data traffic increasing dramatically, conventional methods are no longer effective or efficient. In this case, solving the problems in the most efficient manner becomes important. Therefore, new concepts, as well as novel technologies, need to be explored. Among them, one promising solution is the use of machine learning (ML) in the mmWave cellular network. On the one hand, with the aid of ML approaches, the network can learn from the mobile data, which allows the system to use adaptable strategies while avoiding unnecessary human intervention. On the other hand, when ML is integrated into the network, the complexity and workload can be reduced, while the huge number of devices and the volume of data can be managed efficiently. Therefore, in this thesis, different ML techniques that assist in optimizing different areas of the mmWave cellular network are explored, in terms of non-line-of-sight (NLOS) beam tracking, handover management, and beam management. Specifically, first, a procedure based on a deep neural network is proposed to predict the angle of arrival (AOA) and angle of departure (AOD), in both azimuth and elevation, in non-line-of-sight mmWave communications. Moreover, along with the AOA and AOD prediction, a trajectory prediction based on the dynamic window approach (DWA) is employed. The simulation scenario is built with ray tracing technology, which is used to generate data. Two deep neural networks (DNNs) are trained on the generated data to predict AOA/AOD in the azimuth (AAOA/AAOD) and AOA/AOD in the elevation (EAOA/EAOD). Furthermore, under the assumption that the UE mobility and precise location are unknown, the UE trajectory is predicted and input into the trained DNNs as a parameter to predict the AAOA/AAOD and EAOA/EAOD, showing the performance under a realistic assumption. The robustness of both procedures is evaluated in the presence of errors, and we conclude that a DNN is a promising tool to predict AOA and AOD in an NLOS scenario. Second, a novel handover scheme is designed that aims to optimize the overall system throughput and the total system delay while guaranteeing the quality of service (QoS) of each user equipment (UE). Specifically, the proposed handover scheme, called O-MAPPO, integrates a reinforcement learning (RL) algorithm and optimization theory. An RL algorithm known as multi-agent proximal policy optimization (MAPPO) determines the handover trigger conditions. Further, an optimization problem is formulated in conjunction with MAPPO to select the target base station and determine beam selection. It aims to evaluate and optimize the system performance in terms of total throughput and delay while guaranteeing the QoS of each UE after the handover decision is made. Third, a multi-agent RL-based beam management scheme is proposed, where multi-agent deep deterministic policy gradient (MADDPG) is applied on each small-cell base station (SCBS) to maximize the system throughput while guaranteeing the quality of service. With MADDPG, smart beam management methods can serve the UEs more efficiently and accurately. Specifically, the mobility of UEs causes dynamic changes in the network environment, and the MADDPG algorithm learns from these changes. Based on that, the beam management in the SCBS is optimized according to the reward or penalty received when serving different UEs. The approach could improve the overall system throughput and delay performance compared with traditional beam management methods. The work presented in this thesis demonstrates the potential of ML for addressing problems in the mmWave cellular network. Moreover, it provides specific solutions for optimizing NLOS beam tracking, handover management and beam management. For NLOS beam tracking, simulation results show that the prediction errors of the AOA and AOD can be maintained within an acceptable range of ±2. Further, for handover optimization, the numerical results show that the system throughput and delay are improved by 10% and 25%, respectively, when compared with two typical RL algorithms, Deep Deterministic Policy Gradient (DDPG) and Deep Q-learning (DQL). Lastly, for intelligent beam management, numerical results reveal the convergence performance of MADDPG and its superiority in improving the system throughput compared with other typical RL algorithms and the traditional beam management method.

    Intelligent Sensing and Learning for Advanced MIMO Communication Systems


    Soundtrack recommendation for images

    The drastic increase in the production of multimedia content has emphasized the importance of research concerning its organization and retrieval. In this thesis, we address the problem of music retrieval when a set of images is given as the input query, i.e., the problem of soundtrack recommendation for images. The task at hand is to recommend appropriate music to be played during the presentation of a given set of query images. To tackle this problem, we formulate the hypothesis that the knowledge appropriate for the task is contained in publicly available contemporary movies. Our approach, Picasso, employs similarity search techniques inside the image and music domains, harvesting movies to form a link between the domains. To achieve a fair and unbiased comparison between different soundtrack recommendation approaches, we propose an evaluation benchmark. The evaluation results are reported for Picasso and the baseline approach, using the proposed benchmark. We further address two efficiency aspects that arise from the Picasso approach. First, we investigate the problem of processing top-K queries with set-defined selections and propose an index structure that aims at minimizing the query answering latency. Second, we address the problem of similarity search in high-dimensional spaces and propose two enhancements to the Locality Sensitive Hashing (LSH) scheme. We also investigate the prospects of a distributed similarity search algorithm based on LSH using the MapReduce framework. Finally, we give an overview of PicasSound, a smartphone application based on the Picasso approach.
    The drastic increase in available multimedia content has highlighted the importance of research on its organization and on search within the data. In this doctoral thesis, we consider the problem of finding suitable pieces of music as background music for slideshows. We formulate the hypothesis that the knowledge required for the problem is contained in publicly available contemporary movies. Our approach, Picasso, uses similarity search techniques within the image and music domains to learn, based on movie scenes, a connection between arbitrary images and pieces of music. To achieve a fair and unbiased comparison between different music recommendation approaches, we propose an evaluation benchmark. The evaluation results are presented, using the proposed benchmark, for Picasso and a further, emotion-based approach. In addition, we address two efficiency aspects that arise from the Picasso approach. (i) We investigate the problem of executing top-K queries in which the result set is restricted ad hoc to a small subset of the entire index. (ii) We address the problem of similarity search in high-dimensional spaces and propose two extensions of the Locality Sensitive Hashing (LSH) scheme. In addition, we investigate the prospects of a distributed similarity search algorithm based on LSH using the MapReduce framework. Besides the aforementioned scientific results, we also describe the design and implementation of PicasSound, a smartphone application based on Picasso.
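For readers unfamiliar with Locality Sensitive Hashing, the baseline idea behind similarity search with LSH can be sketched as follows. This is a plain random-hyperplane variant for cosine similarity, not either of the two enhancements the thesis proposes. The feature vectors, item names, and function names are hypothetical.

```python
import random

def make_hyperplanes(num_bits, dim, seed=0):
    """Draw `num_bits` random hyperplane normals with Gaussian components."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_bits)]

def signature(vec, planes):
    """Hash a vector to a bit tuple: one bit per hyperplane side."""
    return tuple(
        1 if sum(p * v for p, v in zip(plane, vec)) >= 0 else 0
        for plane in planes
    )

def build_index(vectors, planes):
    """Group item names into buckets keyed by their signature."""
    buckets = {}
    for name, vec in vectors.items():
        buckets.setdefault(signature(vec, planes), []).append(name)
    return buckets

def query(vec, planes, buckets):
    """Return the candidate items that share the query's bucket."""
    return buckets.get(signature(vec, planes), [])

# Hypothetical 3-dimensional feature vectors.
vectors = {"a": [1.0, 0.2, 0.0], "b": [-1.0, 0.5, 0.3]}
planes = make_hyperplanes(num_bits=8, dim=3)
index = build_index(vectors, planes)

# A positively scaled copy of "a" points in the same direction,
# so it falls on the same side of every hyperplane and lands in the same bucket.
scaled_a = [2.0 * v for v in vectors["a"]]
candidates = query(scaled_a, planes, index)
```

Vectors separated by a small angle agree on most bits with high probability, so a query only needs to compare against the candidates in its own bucket instead of scanning every item. Practical systems typically use several hash tables to raise recall.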