9 research outputs found

    Improving decision tree and neural network learning for evolving data-streams

    Get PDF
    High-throughput real-time Big Data stream processing requires fast incremental algorithms that keep models consistent with most recent data. In this scenario, Hoeffding Trees are considered the state-of-the-art single classifier for processing data streams and they are widely used in ensemble combinations. This thesis is devoted to the improvement of the performance of algorithms for machine learning/artificial intelligence on evolving data streams. In particular, we focus on improving the Hoeffding Tree classifier and its ensemble combinations, in order to reduce its resource consumption and its response time latency, achieving better throughput when processing evolving data streams. First, this thesis presents a study on using Neural Networks (NN) as an alternative method for processing data streams. The use of random features for improving NNs training speed is proposed and important issues are highlighted about the use of NN on a data stream setup. These issues motivated this thesis to go in the direction of improving the current state-of-the-art methods: Hoeffding Trees and their ensemble combinations. Second, this thesis proposes the Echo State Hoeffding Tree (ESHT), as an extension of the Hoeffding Tree to model time-dependencies typically present in data streams. The capabilities of the new proposed architecture on both regression and classification problems are evaluated. Third, a new methodology to improve the Adaptive Random Forest (ARF) is developed. ARF has been introduced recently, and it is considered the state-of-the-art classifier in the MOA framework (a popular framework for processing evolving data streams). This thesis proposes the Elastic Swap Random Forest, an extension to ARF that reduces the number of base learners in the ensemble down to one third on average, while providing similar accuracy than the standard ARF with 100 trees. And finally, a last contribution on a multi-threaded high performance scalable ensemble design that is highly adaptable to a variety of hardware platforms, ranging from server-class to edge computing. The proposed design achieves throughput improvements of 85x (Intel i7), 143x (Intel Xeon parsing from memory), 10x (Jetson TX1, ARM) and 23x (X-Gene2, ARM) compared to single-threaded MOA on i7. In addition, the proposal achieves 75% parallel efficiency when using 24 cores on the Intel Xeon.Procesar grandes flujos de datos (Big Data Streams, BDS) en tiempo real requiere el uso de algoritmos incrementales rápidos que mantengan los modelos consistentes con los datos más recientes. En este escenario, los Hoeffding Trees (HT) se consideran el clasificador simple más avanzado para procesar BDS, razon por la cual son ampliamente usados como base a la hora de combinar clasificadores en Ensembles. Esta tesis está dedicada a la mejora del rendimiento de algoritmos para Machine Learning/Iteligencia Artificial en BDS que evolucionan con el tiempo (es decir, BDS cuya distribución estadística cambia con el tiempo). En particular, nuestro objetivo es mejorar el Hoeffding Tree y sus combinaciones en Ensembles, con el objetivo de reducir el consumo de recursos y la latencia en el tiempo de respuesta, logrando un mejor rendimiento al procesar BDS que evolucionan en el tiempo. Primero, se presenta un estudio sobre el uso de redes neuronales (NN) con parámetros aleatorios como un método alternativo para procesar BDS con el objetivo de mejorar la velocidad de entrenamiento de Nns. También se destacan problemas importantes derivados del uso de NN para BDS. Como consecuencia, esta tesis tomo la dirección de mejorar los métodos de vanguardia en BDS: Hoeffding Trees y sus combinaciones en Ensembles. Segundo, se propone el Echo State Hoeffding Tree (ESHT), como una extensión del HT para modelar las dependencias temporales típicamente presentes en BDS. La nueva arquitectura propuesta se evalúa tanto en problemas de regresión como de clasificación. Tercero, se propone una extensión para el Adaptive Random Forest (ARF), publicado recientemente y considerado como el clasificador mas potente implementado en MOA (un framework muy popular para procesar BDS). Proponemos el Elastic Swap Random Forest para reducir el número de clasificadores en el ensemble a un tercio en promedio, al tiempo se mantiene un accuracy similar a la de un ARF estándar con 100 árboles. Finalmente, la última contribución de esta tesis es una arquitectura de Ensembles multi hilo para procesar BDS. Nuestro diseño es altamente adaptable a una variedad de plataformas de hardware, que van desde servidores hasta pequeños dispositivos en el Edge Computing (pej, Internet de las Cosas). El diseño propuesto logra mejoras de rendimiento de 85x (Intel i7), 143x (análisis de Intel Xeon desde la memoria), 10x (Jetson TX1, ARM) y 23x (X-Gene2, ARM) en comparación con MOA (un solo proceso) en un Intel i7. Además, la propuesta logra una eficiencia paralela del 75 \% cuando se usan 24 núcleos en el Intel Xeon.Postprint (published version

    Unsupervised tracking of time-evolving data streams and an application to short-term urban traffic flow forecasting

    Get PDF
    I am indebted to many people for their help and support I receive during my Ph.D. study and research at DIBRIS-University of Genoa. First and foremost, I would like to express my sincere thanks to my supervisors Prof.Dr. Masulli, and Prof.Dr. Rovetta for the invaluable guidance, frequent meetings, and discussions, and the encouragement and support on my way of research. I thanks all the members of the DIBRIS for their support and kindness during my 4 years Ph.D. I would like also to acknowledge the contribution of the projects Piattaforma per la mobili\ue0 Urbana con Gestione delle INformazioni da sorgenti eterogenee (PLUG-IN) and COST Action IC1406 High Performance Modelling and Simulation for Big Data Applications (cHiPSet). Last and most importantly, I wish to thanks my family: my wife Shaimaa who stays with me through the joys and pains; my daughter and son whom gives me happiness every-day; and my parents for their constant love and encouragement

    Design and validation of novel methods for long-term road traffic forecasting

    Get PDF
    132 p.Road traffic management is a critical aspect for the design and planning of complex urban transport networks for which vehicle flow forecasting is an essential component. As a testimony of its paramount relevance in transport planning and logistics, thousands of scientific research works have covered the traffic forecasting topic during the last 50 years. In the beginning most approaches relied on autoregressive models and other analysis methods suited for time series data. During the last two decades, the development of new technology, platforms and techniques for massive data processing under the Big Data umbrella, the availability of data from multiple sources fostered by the Open Data philosophy and an ever-growing need of decision makers for accurate traffic predictions have shifted the spotlight to data-driven procedures. Even in this convenient context, with abundance of open data to experiment and advanced techniques to exploit them, most predictive models reported in literature aim for shortterm forecasts, and their performance degrades when the prediction horizon is increased. Long-termforecasting strategies are more scarce, and commonly based on the detection and assignment to patterns. These approaches can perform reasonably well unless an unexpected event provokes non predictable changes, or if the allocation to a pattern is inaccurate.The main core of the work in this Thesis has revolved around datadriven traffic forecasting, ultimately pursuing long-term forecasts. This has broadly entailed a deep analysis and understanding of the state of the art, and dealing with incompleteness of data, among other lesser issues. Besides, the second part of this dissertation presents an application outlook of the developed techniques, providing methods and unexpected insights of the local impact of traffic in pollution. The obtained results reveal that the impact of vehicular emissions on the pollution levels is overshadowe

    IoT and Sensor Networks in Industry and Society

    Get PDF
    The exponential progress of Information and Communication Technology (ICT) is one of the main elements that fueled the acceleration of the globalization pace. Internet of Things (IoT), Artificial Intelligence (AI) and big data analytics are some of the key players of the digital transformation that is affecting every aspect of human's daily life, from environmental monitoring to healthcare systems, from production processes to social interactions. In less than 20 years, people's everyday life has been revolutionized, and concepts such as Smart Home, Smart Grid and Smart City have become familiar also to non-technical users. The integration of embedded systems, ubiquitous Internet access, and Machine-to-Machine (M2M) communications have paved the way for paradigms such as IoT and Cyber Physical Systems (CPS) to be also introduced in high-requirement environments such as those related to industrial processes, under the forms of Industrial Internet of Things (IIoT or I2oT) and Cyber-Physical Production Systems (CPPS). As a consequence, in 2011 the German High-Tech Strategy 2020 Action Plan for Germany first envisioned the concept of Industry 4.0, which is rapidly reshaping traditional industrial processes. The term refers to the promise to be the fourth industrial revolution. Indeed, the first industrial revolution was triggered by water and steam power. Electricity and assembly lines enabled mass production in the second industrial revolution. In the third industrial revolution, the introduction of control automation and Programmable Logic Controllers (PLCs) gave a boost to factory production. As opposed to the previous revolutions, Industry 4.0 takes advantage of Internet access, M2M communications, and deep learning not only to improve production efficiency but also to enable the so-called mass customization, i.e. the mass production of personalized products by means of modularized product design and flexible processes. Less than five years later, in January 2016, the Japanese 5th Science and Technology Basic Plan took a further step by introducing the concept of Super Smart Society or Society 5.0. According to this vision, in the upcoming future, scientific and technological innovation will guide our society into the next social revolution after the hunter-gatherer, agrarian, industrial, and information eras, which respectively represented the previous social revolutions. Society 5.0 is a human-centered society that fosters the simultaneous achievement of economic, environmental and social objectives, to ensure a high quality of life to all citizens. This information-enabled revolution aims to tackle today’s major challenges such as an ageing population, social inequalities, depopulation and constraints related to energy and the environment. Accordingly, the citizens will be experiencing impressive transformations into every aspect of their daily lives. This book offers an insight into the key technologies that are going to shape the future of industry and society. It is subdivided into five parts: the I Part presents a horizontal view of the main enabling technologies, whereas the II-V Parts offer a vertical perspective on four different environments. The I Part, dedicated to IoT and Sensor Network architectures, encompasses three Chapters. In Chapter 1, Peruzzi and Pozzebon analyse the literature on the subject of energy harvesting solutions for IoT monitoring systems and architectures based on Low-Power Wireless Area Networks (LPWAN). The Chapter does not limit the discussion to Long Range Wise Area Network (LoRaWAN), SigFox and Narrowband-IoT (NB-IoT) communication protocols, but it also includes other relevant solutions such as DASH7 and Long Term Evolution MAchine Type Communication (LTE-M). In Chapter 2, Hussein et al. discuss the development of an Internet of Things message protocol that supports multi-topic messaging. The Chapter further presents the implementation of a platform, which integrates the proposed communication protocol, based on Real Time Operating System. In Chapter 3, Li et al. investigate the heterogeneous task scheduling problem for data-intensive scenarios, to reduce the global task execution time, and consequently reducing data centers' energy consumption. The proposed approach aims to maximize the efficiency by comparing the cost between remote task execution and data migration. The II Part is dedicated to Industry 4.0, and includes two Chapters. In Chapter 4, Grecuccio et al. propose a solution to integrate IoT devices by leveraging a blockchain-enabled gateway based on Ethereum, so that they do not need to rely on centralized intermediaries and third-party services. As it is better explained in the paper, where the performance is evaluated in a food-chain traceability application, this solution is particularly beneficial in Industry 4.0 domains. Chapter 5, by De Fazio et al., addresses the issue of safety in workplaces by presenting a smart garment that integrates several low-power sensors to monitor environmental and biophysical parameters. This enables the detection of dangerous situations, so as to prevent or at least reduce the consequences of workers accidents. The III Part is made of two Chapters based on the topic of Smart Buildings. In Chapter 6, Petroșanu et al. review the literature about recent developments in the smart building sector, related to the use of supervised and unsupervised machine learning models of sensory data. The Chapter poses particular attention on enhanced sensing, energy efficiency, and optimal building management. In Chapter 7, Oh examines how much the education of prosumers about their energy consumption habits affects power consumption reduction and encourages energy conservation, sustainable living, and behavioral change, in residential environments. In this Chapter, energy consumption monitoring is made possible thanks to the use of smart plugs. Smart Transport is the subject of the IV Part, including three Chapters. In Chapter 8, Roveri et al. propose an approach that leverages the small world theory to control swarms of vehicles connected through Vehicle-to-Vehicle (V2V) communication protocols. Indeed, considering a queue dominated by short-range car-following dynamics, the Chapter demonstrates that safety and security are increased by the introduction of a few selected random long-range communications. In Chapter 9, Nitti et al. present a real time system to observe and analyze public transport passengers' mobility by tracking them throughout their journey on public transport vehicles. The system is based on the detection of the active Wi-Fi interfaces, through the analysis of Wi-Fi probe requests. In Chapter 10, Miler et al. discuss the development of a tool for the analysis and comparison of efficiency indicated by the integrated IT systems in the operational activities undertaken by Road Transport Enterprises (RTEs). The authors of this Chapter further provide a holistic evaluation of efficiency of telematics systems in RTE operational management. The book ends with the two Chapters of the V Part on Smart Environmental Monitoring. In Chapter 11, He et al. propose a Sea Surface Temperature Prediction (SSTP) model based on time-series similarity measure, multiple pattern learning and parameter optimization. In this strategy, the optimal parameters are determined by means of an improved Particle Swarm Optimization method. In Chapter 12, Tsipis et al. present a low-cost, WSN-based IoT system that seamlessly embeds a three-layered cloud/fog computing architecture, suitable for facilitating smart agricultural applications, especially those related to wildfire monitoring. We wish to thank all the authors that contributed to this book for their efforts. We express our gratitude to all reviewers for the volunteering support and precious feedback during the review process. We hope that this book provides valuable information and spurs meaningful discussion among researchers, engineers, businesspeople, and other experts about the role of new technologies into industry and society

    Design and validation of novel methods for long-term road traffic forecasting

    Get PDF
    132 p.Road traffic management is a critical aspect for the design and planning of complex urban transport networks for which vehicle flow forecasting is an essential component. As a testimony of its paramount relevance in transport planning and logistics, thousands of scientific research works have covered the traffic forecasting topic during the last 50 years. In the beginning most approaches relied on autoregressive models and other analysis methods suited for time series data. During the last two decades, the development of new technology, platforms and techniques for massive data processing under the Big Data umbrella, the availability of data from multiple sources fostered by the Open Data philosophy and an ever-growing need of decision makers for accurate traffic predictions have shifted the spotlight to data-driven procedures. Even in this convenient context, with abundance of open data to experiment and advanced techniques to exploit them, most predictive models reported in literature aim for shortterm forecasts, and their performance degrades when the prediction horizon is increased. Long-termforecasting strategies are more scarce, and commonly based on the detection and assignment to patterns. These approaches can perform reasonably well unless an unexpected event provokes non predictable changes, or if the allocation to a pattern is inaccurate.The main core of the work in this Thesis has revolved around datadriven traffic forecasting, ultimately pursuing long-term forecasts. This has broadly entailed a deep analysis and understanding of the state of the art, and dealing with incompleteness of data, among other lesser issues. Besides, the second part of this dissertation presents an application outlook of the developed techniques, providing methods and unexpected insights of the local impact of traffic in pollution. The obtained results reveal that the impact of vehicular emissions on the pollution levels is overshadowe

    Technology 2002: the Third National Technology Transfer Conference and Exposition, Volume 1

    Get PDF
    The proceedings from the conference are presented. The topics covered include the following: computer technology, advanced manufacturing, materials science, biotechnology, and electronics

    Women in Artificial intelligence (AI)

    Get PDF
    This Special Issue, entitled "Women in Artificial Intelligence" includes 17 papers from leading women scientists. The papers cover a broad scope of research areas within Artificial Intelligence, including machine learning, perception, reasoning or planning, among others. The papers have applications to relevant fields, such as human health, finance, or education. It is worth noting that the Issue includes three papers that deal with different aspects of gender bias in Artificial Intelligence. All the papers have a woman as the first author. We can proudly say that these women are from countries worldwide, such as France, Czech Republic, United Kingdom, Australia, Bangladesh, Yemen, Romania, India, Cuba, Bangladesh and Spain. In conclusion, apart from its intrinsic scientific value as a Special Issue, combining interesting research works, this Special Issue intends to increase the invisibility of women in AI, showing where they are, what they do, and how they contribute to developments in Artificial Intelligence from their different places, positions, research branches and application fields. We planned to issue this book on the on Ada Lovelace Day (11/10/2022), a date internationally dedicated to the first computer programmer, a woman who had to fight the gender difficulties of her times, in the XIX century. We also thank the publisher for making this possible, thus allowing for this book to become a part of the international activities dedicated to celebrating the value of women in ICT all over the world. With this book, we want to pay homage to all the women that contributed over the years to the field of AI

    Epidemiology of Injury in English Women's Super league Football: A Cohort Study

    Get PDF
    INTRODUCTION: The epidemiology of injury in male professional football has been well documented (Ekstrand, Hägglund, & Waldén, 2011) and used as a basis to understand injury trends for a number of years. The prevalence and incidence of injuries occurring in womens super league football is unknown. The aim of this study is to estimate the prevalence and incidence of injury in an English Super League Women’s Football squad. METHODS: Following ethical approval from Leeds Beckett University, players (n = 25) signed to a Women’s Super League Football club provided written informed consent to complete a self-administered injury survey. Measures of exposure, injury and performance over a 12-month period was gathered. Participants were classified as injured if they reported a football injury that required medical attention or withdrawal from participation for one day or more. Injuries were categorised as either traumatic or overuse and whether the injury was a new injury and/or re-injury of the same anatomical site RESULTS: 43 injuries, including re-injury were reported by the 25 participants providing a clinical incidence of 1.72 injuries per player. Total incidence of injury was 10.8/1000 h (95% CI: 7.5 to 14.03). Participants were at higher risk of injury during a match compared with training (32.4 (95% CI: 15.6 to 48.4) vs 8.0 (95% CI: 5.0 to 10.85)/1000 hours, p 28 days) of which there were three non-contact anterior cruciate ligament (ACL) injuries. The epidemiological incidence proportion was 0.80 (95% CI: 0.64 to 0.95) and the average probability that any player on this team will sustain at least one injury was 80.0% (95% CI: 64.3% to 95.6%) CONCLUSION: This is the first report capturing exposure and injury incidence by anatomical site from a cohort of English players and is comparable to that found in Europe (6.3/1000 h (95% CI 5.4 to 7.36) Larruskain et al 2017). The number of ACL injuries highlights a potential injury burden for a squad of this size. Multi-site prospective investigations into the incidence and prevalence of injury in women’s football are require

    A Systematic Review and Meta-Analysis of the Incidence of Injury in Professional Female Soccer

    Get PDF
    The epidemiology of injury in male professional football is well documented and has been used as a basis to monitor injury trends and implement injury prevention strategies. There are no systematic reviews that have investigated injury incidence in women’s professional football. Therefore, the extent of injury burden in women’s professional football remains unknown. PURPOSE: The primary aim of this study was to calculate an overall incidence rate of injury in senior female professional soccer. The secondary aims were to provide an incidence rate for training and match play. METHODS: PubMed, Discover, EBSCO, Embase and ScienceDirect electronic databases were searched from inception to September 2018. Two reviewers independently assessed study quality using the Strengthening the Reporting of Observational Studies in Epidemiology statement using a 22-item STROBE checklist. Seven prospective studies (n=1137 professional players) were combined in a pooled analysis of injury incidence using a mixed effects model. Heterogeneity was evaluated using the Cochrane Q statistic and I2. RESULTS: The epidemiological incidence proportion over one season was 0.62 (95% CI 0.59 - 0.64). Mean total incidence of injury was 3.15 (95% CI 1.54 - 4.75) injuries per 1000 hours. The mean incidence of injury during match play was 10.72 (95% CI 9.11 - 12.33) and during training was 2.21 (95% CI 0.96 - 3.45). Data analysis found a significant level of heterogeneity (total Incidence, X2 = 16.57 P < 0.05; I2 = 63.8%) and during subsequent sub group analyses in those studies reviewed (match incidence, X2 = 76.4 (d.f. = 7), P <0.05; I2 = 90.8%, training incidence, X2 = 16.97 (d.f. = 7), P < 0.05; I2 = 58.8%). Appraisal of the study methodologies revealed inconsistency in the use of injury terminology, data collection procedures and calculation of exposure by researchers. Such inconsistencies likely contribute to the large variance in the incidence and prevalence of injury reported. CONCLUSIONS: The estimated risk of sustaining at least one injury over one football season is 62%. Continued reporting of heterogeneous results in population samples limits meaningful comparison of studies. Standardising the criteria used to attribute injury and activity coupled with more accurate methods of calculating exposure will overcome such limitations
    corecore