
    From detection to optimization: impact of soft errors on high-performance computing applications

    As high-performance computing (HPC) continues to progress, constraints on HPC system design force the handling of errors to higher levels of the software stack. Of the types of errors facing HPC, soft errors that silently corrupt system or application state are among the most severe. Understanding how HPC applications behave in the presence of soft errors is critical for using HPC systems effectively. This understanding can guide the development of algorithm-based error detection, informed by application characteristics derived from fault injection and error propagation studies. Furthermore, the realization that applications tolerate small errors enables optimizations such as lossy compression of high-cost data transfers. Lossy compression introduces small, user-controllable amounts of error when compressing data, reducing data size before expensive transfers and thereby saving time. This dissertation investigates and improves the resiliency of HPC applications to soft errors, and explores lossy compression as a new form of optimization for expensive, time-consuming data transfers.
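    The abstract describes lossy compression that adds a small, user-controlled error to shrink data before a transfer. A minimal Python sketch of that idea is shown below, assuming a simple scheme of uniform scalar quantization followed by lossless zlib coding. The functions `compress` and `decompress` and the `error_bound` parameter are illustrative names, not the dissertation's compressor, and many error-bounded compressors additionally predict each value from its neighbours before quantizing.

```python
import struct
import zlib


def compress(values, error_bound):
    """Lossy-compress a list of floats so each value is recovered within error_bound.

    Each value is replaced by the index of a bin of width 2 * error_bound, so
    |value - reconstruction| <= error_bound (up to floating-point rounding).
    The integer bin indices are then packed and compressed losslessly with zlib.
    """
    if error_bound <= 0:
        raise ValueError("error_bound must be positive")
    step = 2.0 * error_bound
    codes = [round(v / step) for v in values]        # uniform quantization
    packed = struct.pack(f"<{len(codes)}q", *codes)  # 64-bit signed bin indices
    return zlib.compress(packed)


def decompress(blob, error_bound):
    """Invert compress(): decode the bin indices and map each back to its bin centre."""
    step = 2.0 * error_bound
    packed = zlib.decompress(blob)
    codes = struct.unpack(f"<{len(packed) // 8}q", packed)
    return [c * step for c in codes]


if __name__ == "__main__":
    data = [(i % 200) * 0.037 for i in range(1000)]       # synthetic sawtooth signal
    blob = compress(data, error_bound=0.01)
    restored = decompress(blob, error_bound=0.01)
    raw_size = len(struct.pack(f"<{len(data)}d", *data))  # size stored as float64
    max_err = max(abs(a - b) for a, b in zip(data, restored))
    print(f"raw {raw_size} B -> compressed {len(blob)} B, max error {max_err:.4f}")
```

    Because every value moves to the centre of a bin of width `2 * error_bound`, the user-chosen bound caps the per-value error, while a coarser bound yields fewer distinct bin indices and typically better lossless compression.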

    Principled design of evolutionary learning systems for large scale data mining

    Currently, the data mining and machine learning fields face new challenges because of the amount of information that is collected and needs processing. Many sophisticated learning approaches simply cannot cope with large and complex domains, because of unmanageable execution times or the loss of predictive and generalization capacity that occurs as domains become more complex. Therefore, to cope with the volumes of information in current real-world problems, there is a need to push forward the boundaries of sophisticated data mining techniques. This thesis focuses on improving the efficiency of Evolutionary Learning systems in large-scale domains. Specifically, its objective is to improve the efficiency of the Bioinformatic Hierarchical Evolutionary Learning (BioHEL) system, a system designed to handle large domains. BioHEL is a classifier system that uses an Iterative Rule Learning approach, generating a set of rules one by one with consecutive Genetic Algorithms. It has so far proven very competitive in large and complex domains. In particular, BioHEL has obtained very important results on protein structure prediction problems and has received related distinctions: it was placed among the best algorithms for this purpose at the Critical Assessment of Techniques for Protein Structure Prediction (CASP) in 2008 and 2010, and it won the bronze medal at the HUMIES Awards for human-competitive results in 2007. However, the system still needs to be analysed in a principled way to determine how its current mechanisms work together to solve larger domains and which of its aspects can be improved towards this aim. To fulfil the objective of this thesis, the work is divided into two parts. In the first part, exhaustive experimentation was carried out to determine ways in which the system could be improved.
    This exhaustive analysis pointed out three main weaknesses: a) the problem dependency of the parameters in BioHEL's fitness function, which makes the system difficult to set up and requires extensive preliminary experimentation to determine adequate values for these parameters; b) the execution time of the learning process, which currently does not use any parallelisation techniques and depends on the size of the training set; and c) the lack of global supervision over the generated solutions, which stems from the Iterative Rule Learning paradigm and produces larger rule sets with no guarantee of minimality or maximal generality. The second part of the thesis tackles each of these weaknesses to obtain a system capable of handling larger domains. First, a heuristic approach to setting the parameters of BioHEL's fitness function is developed. Second, a new parallel evaluation process that runs on General-Purpose Graphics Processing Units (GPGPUs) is developed. Finally, post-processing operators that address the generality and cardinality of the generated solutions are proposed. By means of these enhancements, we improved BioHEL to reduce both the learning time and the preliminary experimentation time, increase the generality of the final solutions, and make the system more accessible to end users. Moreover, as the techniques discussed in this thesis can easily be extended to other Evolutionary Learning systems, we consider them important additions to research in this field towards tackling large-scale domains.
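    The Iterative Rule Learning loop described above can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not BioHEL itself: BioHEL searches for each rule with a Genetic Algorithm, whereas the hypothetical `best_rule` helper here searches exhaustively over single-attribute interval rules, so only the outer structure (learn a rule, remove the examples it covers, repeat) mirrors the text. The `Rule` type, the scoring function, and `max_rules` are likewise assumptions.

```python
from typing import NamedTuple


class Rule(NamedTuple):
    attribute: int   # index of the attribute the rule tests
    low: float       # inclusive lower bound of the interval
    high: float      # inclusive upper bound of the interval
    label: str       # class predicted when the rule matches

    def matches(self, x):
        return self.low <= x[self.attribute] <= self.high


def best_rule(examples):
    """Hypothetical stand-in for BioHEL's Genetic Algorithm: exhaustive search
    over single-attribute interval rules, scored as correctly covered examples
    minus wrongly covered ones."""
    best, best_score = None, float("-inf")
    labels = sorted({y for _, y in examples})
    for a in range(len(examples[0][0])):
        cuts = sorted({x[a] for x, _ in examples})
        for i, low in enumerate(cuts):
            for high in cuts[i:]:
                for label in labels:
                    rule = Rule(a, low, high, label)
                    covered = [y for x, y in examples if rule.matches(x)]
                    score = sum(y == label for y in covered) - sum(y != label for y in covered)
                    if score > best_score:
                        best, best_score = rule, score
    return best


def iterative_rule_learning(examples, max_rules=10):
    """Iterative Rule Learning: learn one rule, drop the examples it covers, repeat."""
    rules, remaining = [], list(examples)
    while remaining and len(rules) < max_rules:
        rule = best_rule(remaining)
        rules.append(rule)
        # Every candidate interval covers at least one remaining example, so this shrinks.
        remaining = [(x, y) for x, y in remaining if not rule.matches(x)]
    return rules


def classify(rules, x, default=None):
    """Apply the rules as an ordered decision list: the first matching rule decides."""
    for rule in rules:
        if rule.matches(x):
            return rule.label
    return default
```

    On four one-attribute examples with values 1 and 2 (class "a") and 3 and 4 (class "b"), the loop first learns the interval rule [1, 2] → "a", removes those two examples, and then learns [3, 4] → "b".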
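    The abstract does not specify how the post-processing operators work. Purely as a hypothetical illustration of reducing rule-set cardinality, the Python sketch below greedily deletes rules from an ordered rule list whenever doing so does not reduce the number of correctly classified training examples. The tuple rule format `(attribute, low, high, label)`, the `default` class, and all function names are assumptions.

```python
def predict(rules, x, default):
    """Decision-list prediction: the first rule whose interval contains x[attr] wins."""
    for attr, low, high, label in rules:
        if low <= x[attr] <= high:
            return label
    return default


def correct_count(rules, examples, default):
    """Number of training examples the rule list classifies correctly."""
    return sum(predict(rules, x, default) == y for x, y in examples)


def prune_rules(rules, examples, default):
    """Hypothetical cardinality-reduction operator.

    Tries to delete each rule in turn and keeps the deletion whenever the number
    of correctly classified training examples does not decrease.
    """
    kept = list(rules)
    best = correct_count(kept, examples, default)
    i = 0
    while i < len(kept):
        candidate = kept[:i] + kept[i + 1:]
        score = correct_count(candidate, examples, default)
        if score >= best:
            kept, best = candidate, score  # rule i was redundant; do not advance i
        else:
            i += 1
    return kept
```

    For example, with rules `[(0, 0, 5, "a"), (0, 0, 2, "a"), (0, 6, 9, "b")]`, training values 1 and 3 labelled "a", 7 and 8 labelled "b", and default class "b", the sketch keeps only `(0, 0, 5, "a")`: the second rule is shadowed by the first, and the third predicts what the default class already predicts. Accepting ties (`>=`) is the design choice that favours smaller rule sets.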

    Ramon Llull's Ars Magna

    Proceedings of the 3rd International Conference on Models and Technologies for Intelligent Transportation Systems 2013

    Challenges arising from increasing traffic demand, limited resource availability and customers' growing quality expectations can only be met successfully if each transport mode is regarded both as an intelligent transportation system in itself and as part of one overall intelligent transportation system with “intelligent” intramodal and intermodal interfaces. This topic is well reflected in the Third International Conference on “Models and Technologies for Intelligent Transportation Systems”, which took place in Dresden in 2013 (previous editions: Rome 2009, Leuven 2011). Covering a variety of traffic management problems that can be solved using similar methods and technologies, but with application-specific models, objective functions and constraints, the conference stands for an intensive exchange between theory and practice and for the presentation of case studies for all transport modes, and it offers a discussion forum for control engineers, computer scientists, mathematicians and other researchers and practitioners. The present book comprises fifty short papers accepted for presentation at the third edition of the conference. All submissions underwent intensive review by the organisers of the special sessions, the members of the scientific and technical advisory committees, and further external experts in the field. Like the conference itself, the proceedings are structured in twelve streams: the more model-oriented streams of Road-Bound Public Transport Management, Modelling and Control of Urban Traffic Flow, Railway Traffic Management (in four different sessions), Air Traffic Management, Water Traffic, and Traffic and Transit Assignment, as well as the technology-oriented streams of Floating Car Data, Localisation Technologies for Intelligent Transportation Systems, and Image Processing in Transportation.
    With this broad range of topics, this book will be of interest to several groups: ITS experts in research and industry, and students of transport and control engineering, operations research and computer science. The case studies will also be of interest to transport operators and members of traffic administrations.

    Scalable Algorithms for Outlier Detection

    Outlier detection is an important problem for the data mining community, as outliers often embody potentially new and valuable information. Nowadays, in the face of exponential growth in data generation, extracting outliers from massive data sets is a non-trivial task that requires the design and implementation of new scalable algorithms, which is the main focus of this thesis. More specifically, we make the following contributions. We propose a new algorithm for detecting emerging outliers in traffic data by extending the Likelihood Ratio Test Statistics (LRT) framework. We also propose a general and efficient pattern mining approach for spatio-temporal outlier detection that is based on our statistical models. We propose a unified parallel approach for LRT computation in GPGPU, multi-core and cloud cluster environments, and we present new algorithmic techniques for computing the LRT in parallel for a large spatial data grid using these architectures. As a separate contribution, we present novel approaches that simultaneously perform clustering and outlier detection without requiring the number of clusters to be specified in advance. These methods are formulated as integer programming optimisation tasks.
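    For the spatial-grid setting, a minimal serial sketch of an LRT scan is shown below in Python. It assumes a Kulldorff-style Poisson model (an observed count and a positive baseline count per grid cell) and scores every axis-aligned rectangle using 2-D prefix sums; the thesis's extension of the LRT framework to emerging outliers and its parallel implementations are not reproduced here, and all function names are hypothetical.

```python
import math
from itertools import combinations_with_replacement


def prefix_sums(grid):
    """2-D prefix sums: P[i][j] is the sum of grid rows < i and columns < j."""
    rows, cols = len(grid), len(grid[0])
    P = [[0.0] * (cols + 1) for _ in range(rows + 1)]
    for i in range(rows):
        for j in range(cols):
            P[i + 1][j + 1] = grid[i][j] + P[i][j + 1] + P[i + 1][j] - P[i][j]
    return P


def rect_sum(P, top, left, bottom, right):
    """Sum over rows top..bottom and columns left..right (inclusive) in O(1)."""
    return P[bottom + 1][right + 1] - P[top][right + 1] - P[bottom + 1][left] + P[top][left]


def poisson_llr(c, b, C, B):
    """Log-likelihood ratio for an elevated rate inside a region (assumed Poisson model).

    llr = c*log(c/b) + (C-c)*log((C-c)/(B-b)) - C*log(C/B), where c, b are the
    observed count and baseline inside the region and C, B are the grid totals.
    Returns 0 when the inside rate is not higher than the outside rate.
    """
    if B - b <= 0 or c / b <= (C - c) / (B - b):
        return 0.0
    llr = c * math.log(c / b) - C * math.log(C / B)
    if C - c > 0:
        llr += (C - c) * math.log((C - c) / (B - b))
    return llr


def scan_grid(counts, baselines):
    """Return (best_llr, (top, left, bottom, right)) over all axis-aligned rectangles."""
    rows, cols = len(counts), len(counts[0])
    Pc, Pb = prefix_sums(counts), prefix_sums(baselines)
    C, B = Pc[rows][cols], Pb[rows][cols]
    best = (0.0, None)
    # Each (top, bottom) row band is scored independently of the others, which
    # makes bands a natural unit to hand out to threads, cores, or nodes.
    for top, bottom in combinations_with_replacement(range(rows), 2):
        for left, right in combinations_with_replacement(range(cols), 2):
            c = rect_sum(Pc, top, left, bottom, right)
            b = rect_sum(Pb, top, left, bottom, right)
            llr = poisson_llr(c, b, C, B)
            if llr > best[0]:
                best = (llr, (top, left, bottom, right))
    return best
```

    On a 4×4 grid with unit baselines, unit counts, and a 2×2 block of counts equal to 10 in rows 1 and 2 and columns 1 and 2, the scan returns that block as the highest-scoring rectangle. The number of rectangles grows as O(rows² · cols²), which is why a large grid benefits from the kind of parallel evaluation the abstract describes.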
