
    HEC: Collaborative Research: SAM^2 Toolkit: Scalable and Adaptive Metadata Management for High-End Computing

    The increasing demand for exabyte-scale storage capacity by high-end computing applications requires a higher level of scalability and dependability than current file and storage systems provide. The proposal addresses file systems research on metadata management for scalable cluster-based parallel and distributed file storage systems in the HEC environment. It aims to develop a scalable and adaptive metadata management (SAM2) toolkit that extends the features of, and fully leverages the peak performance promised by, state-of-the-art cluster-based parallel and distributed file storage systems used by the high-performance computing community. There is a large body of research on scaling data movement and management; however, the need to scale the handling of the attributes of cluster-based file systems and I/O, that is, metadata, has been underestimated. Understanding the characteristics of metadata traffic, and applying suitable load-balancing, caching, prefetching, and grouping mechanisms to metadata management accordingly, will lead to high scalability. It is anticipated that plugging the scalable and adaptive metadata management components into state-of-the-art cluster-based parallel and distributed file storage systems could increase the performance of applications and file systems, and help translate the high peak performance such systems promise into real application performance improvements. The project involves the following components:
    1. Develop multi-variable forecasting models to analyze and predict file metadata access patterns.
    2. Develop scalable and adaptive file name mapping schemes using the duplicative Bloom filter array technique to enforce load balance and increase scalability (a minimal sketch of the underlying idea follows this list).
    3. Develop decentralized, locality-aware metadata grouping schemes to facilitate bulk metadata operations such as prefetching.
    4. Develop an adaptive cache coherence protocol using a distributed shared object model for client-side and server-side metadata caching.
    5. Prototype the SAM2 components in the state-of-the-art parallel virtual file system PVFS2 and a distributed storage data caching system, set up an experimental framework for a DOE CMS Tier 2 site at the University of Nebraska-Lincoln, and conduct benchmark, evaluation, and validation studies.
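
    As an illustration of the name-mapping idea in component 2, the following is a minimal Python sketch of a plain Bloom filter array: one filter per metadata server, so that a file name lookup yields a small candidate set of servers instead of requiring a broadcast. The class names and parameters are illustrative only; the "duplicative" variant in the proposal adds replication and adaptivity beyond this sketch.

```python
import hashlib

class BloomFilter:
    """Simple Bloom filter: k hash functions over an m-bit array."""
    def __init__(self, m=1 << 16, k=4):
        self.m, self.k = m, k
        self.bits = bytearray(m // 8)

    def _positions(self, item):
        for i in range(self.k):
            h = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(h[:8], "big") % self.m

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))

class BloomFilterArray:
    """One filter per metadata server; a lookup scans the array and
    returns candidate servers, avoiding a broadcast on most lookups."""
    def __init__(self, n_servers):
        self.filters = [BloomFilter() for _ in range(n_servers)]

    def insert(self, path, server_id):
        self.filters[server_id].add(path)

    def candidates(self, path):
        # False positives are possible, so the result is a candidate
        # set to confirm at the server; false negatives are not.
        return [i for i, f in enumerate(self.filters) if path in f]

bfa = BloomFilterArray(n_servers=4)
bfa.insert("/scratch/run42/output.dat", server_id=2)
print(bfa.candidates("/scratch/run42/output.dat"))  # -> [2]
```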

    Parallelization Strategies for Graph-Code-Based Similarity Search

    The volume of multimedia assets in collections is growing exponentially, and the retrieval of information is becoming more complex. The indexing and retrieval of multimedia content is generally implemented by employing feature graphs, which contain semantic information about multimedia assets. Machine learning can produce detailed semantic information about multimedia assets, reflected in a high volume of nodes and edges in the feature graphs. This high level of detail increases the effectiveness of information retrieval, but, together with growing collections, it also increases processing time. Addressing this problem, Multimedia Feature Graphs (MMFGs) and Graph Codes (GCs) have proven to be fast and effective structures for information retrieval. However, the huge volume of data still requires more processing time. Because Graph Code algorithms were designed to be parallelizable, different parallelization strategies can be employed to evaluate the scalability of Graph Code processing. These include horizontal and vertical scaling using Graphics Processing Units (GPUs), multicore Central Processing Units (CPUs), and distributed computing. In this paper, we show how different parallelization strategies based on Graph Codes can be combined to provide a significant improvement in efficiency. Our modeling work shows excellent scalability, with a theoretical speedup of 16,711 on a top-of-the-line Nvidia H100 GPU with 16,896 cores. Our experiments on a modest GPU achieve a speedup of 225, lending credence to the theoretical estimate. Thus, Graph Codes provide fast and effective multimedia indexing and retrieval, even in billion-scale use cases.
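
    As a rough illustration of why Graph Code similarity search parallelizes so well, the sketch below assumes that a Graph Code can be flattened into an integer matrix and that a collection-wide query reduces to independent element-wise comparisons; the matrix size and similarity metric are placeholders, not the paper's exact definitions.

```python
import numpy as np

# Assumption (hedged): each Graph Code is an n x n integer matrix, and a
# query against the collection reduces to element-wise comparisons. Every
# (code, position) pair is independent, so the same kernel maps directly
# onto GPU threads or CPU cores.
rng = np.random.default_rng(0)
n, collection_size = 32, 10_000
collection = rng.integers(0, 5, size=(collection_size, n * n))
query = rng.integers(0, 5, size=n * n)

matches = (collection == query).sum(axis=1)   # per-code match counts
similarity = matches / (n * n)                # normalized similarity
top10 = np.argsort(similarity)[-10:][::-1]    # best candidates first
print(top10, similarity[top10])
```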

    Implementing generalized deep-copy in MPI

    In this paper, we introduce a framework for implementing deep copy on top of MPI. The process is initiated by passing just the root object of the dynamic data structure; our framework takes care of all pointer traversal, communication, copying, and reconstruction on the receiving nodes. The benefit of our approach is that MPI users can deep copy complex dynamic data structures without writing bespoke communication or serialization/deserialization methods for each object. Such methods present a challenging implementation problem that quickly becomes unwieldy to maintain when working with complex structured data. This paper demonstrates our generic implementation, which encapsulates both approaches. We analyze the approach with a variety of structures (trees, graphs including complete graphs, and rings) and demonstrate that it performs comparably to handwritten implementations while using a vastly simplified programming interface. We make the complete source code available as a single convenient header file.
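
    The framework itself targets C/C++ MPI, where pointer-based structures cannot be sent directly. As a purely conceptual illustration of the user experience it aims for, the Python sketch below uses mpi4py, whose pickle-based messaging already performs the pointer traversal and reconstruction that such a framework must automate for C++ users; the filename in the run command is hypothetical.

```python
# Conceptual analogy only: pass the root object, receive a fully
# reconstructed deep copy, cycles included.
# Run with: mpirun -n 2 python deepcopy_demo.py
from mpi4py import MPI

class Node:
    def __init__(self, value):
        self.value = value
        self.next = None

comm = MPI.COMM_WORLD
if comm.Get_rank() == 0:
    # Build a small ring: a -> b -> a (a cyclic structure).
    a, b = Node(1), Node(2)
    a.next, b.next = b, a
    comm.send(a, dest=1)           # only the root object is passed
else:
    root = comm.recv(source=0)     # traversal and rebuild are automatic
    assert root.next.next is root  # the cycle is reconstructed intact
    print("received ring:", root.value, root.next.value)
```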

    A comparison of statistical machine learning methods in heartbeat detection and classification

    In health care, patients with heart problems require quick responsiveness in a clinical setting or in the operating theatre. Towards that end, automated classification of heartbeats is vital, as some heartbeat irregularities are time-consuming to detect. Therefore, analysis of electrocardiogram (ECG) signals is an active area of research. The methods proposed in the literature depend on the structure of a heartbeat cycle. In this paper, we use interval- and amplitude-based features together with a few samples from the ECG signal as a feature vector. We studied a variety of classification algorithms, focusing especially on a type of arrhythmia known as the ventricular ectopic beat (VEB). We compare the performance of the classifiers against algorithms proposed in the literature and make recommendations regarding features, sampling rate, and choice of classifier for a real-time clinical setting. The extensive study is based on the MIT-BIH arrhythmia database. Our main contributions are the evaluation of existing classifiers over a range of sampling rates, the recommendation of a detection methodology to employ in a practical setting, and the extension of the notion of a mixture of experts to a larger class of algorithms.
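
    A hedged sketch of the kind of feature construction described above (interval and amplitude features concatenated with a few raw samples around the R-peak) is shown below; the specific intervals, sample counts, and classifier are assumptions, and the synthetic data stand in for MIT-BIH beats, so the score only checks the pipeline.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(1)
n_beats = 2000
pre_rr = rng.normal(0.8, 0.1, n_beats)       # RR interval before beat (s)
post_rr = rng.normal(0.8, 0.1, n_beats)      # RR interval after beat (s)
r_amp = rng.normal(1.0, 0.2, n_beats)        # R-peak amplitude (mV)
samples = rng.normal(0, 0.3, (n_beats, 16))  # 16 samples around the peak

# Feature vector: intervals + amplitude + raw samples, one row per beat.
X = np.column_stack([pre_rr, post_rr, r_amp, samples])
y = rng.integers(0, 2, n_beats)              # 0 = normal, 1 = VEB (dummy)

clf = RandomForestClassifier(n_estimators=100, random_state=0)
print(cross_val_score(clf, X, y, cv=5).mean())  # ~0.5 on random labels
```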

    Eighth Biennial Report: April 2005 – March 2007


    Generalization of Decremental Performance Analysis toward Differential Analysis

    A crucial step in the process of application performance analysis is the accurate detection of program bottlenecks. A bottleneck is any event that contributes to extending the execution time. Determining their causes is important for application developers, as it enables them to detect code design and generation flaws. Bottleneck detection is becoming a difficult art. Techniques such as event counts, which easily found bottlenecks in the past, have become less effective because of the increasing complexity of modern microprocessors and the introduction of parallelism at several levels. Consequently, there is a real need for new analysis approaches to face these challenges. Our work focuses on performance analysis and bottleneck detection for compute-intensive loops in scientific applications. We work on Decan, a performance analysis and bottleneck detection tool, which offers an interesting and promising approach called decremental analysis. The tool, which operates at the binary level, is based on the idea of performing controlled modifications on the instructions of a loop and comparing the new version (called a variant) to the original one. The goal is to assess the cost of specific events, and thus the presence or absence of bottlenecks. Our first contribution consists of extending Decan with new variants that we designed, tested, and validated. Based on these variants, we developed analysis methods that we used to characterize hot loops and find their bottlenecks; the variants were subsequently integrated into a deeper form of performance analysis called differential analysis. We then integrated the tool and the analysis into a performance analysis methodology (Pamda) which coordinates several analysis tools to achieve a more efficient application performance analysis. Second, we introduce several improvements to the Decan tool. Techniques developed to preserve the control flow of the modified programs allowed the tool to be used on real applications instead of extracted kernels. Support for parallel programs (both thread- and process-based) was also added. Finally, since the tool primarily relies on execution time as the metric for its analysis, we study the opportunity of also using other hardware event counters as comparison metrics between variants, through a study of their stability, precision, and overhead.
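
    The differential principle behind Decan can be illustrated without binary patching: time a loop against a variant with one class of work removed and attribute the difference to that class. The Python sketch below is only an analogy; Decan performs the equivalent transformation on assembly instructions of the compiled loop.

```python
import timeit

def original(data):
    s = 0.0
    for x in data:        # memory traffic + floating-point work
        s += x * 1.0000001
    return s

def variant_no_fp(data):
    s = 0.0
    for _ in data:        # same trip count, arithmetic removed
        pass
    return s

data = list(range(1_000_000))
t_orig = timeit.timeit(lambda: original(data), number=5)
t_var = timeit.timeit(lambda: variant_no_fp(data), number=5)
# If the gap dominates, the removed work class is the bottleneck.
print(f"arithmetic accounts for ~{100 * (t_orig - t_var) / t_orig:.0f}%"
      f" of loop time")
```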

    Evaluating the Robustness of Resource Allocations Obtained through Performance Modeling with Stochastic Process Algebra

    Recent developments in the field of parallel and distributed computing have led to a proliferation of efforts to solve large and computationally intensive mathematical, scientific, or engineering problems that consist of several parallelizable parts and several non-parallelizable (sequential) parts. In a parallel and distributed computing environment, the performance goal is to optimize the execution of the parallelizable parts of an application on concurrent processors. This requires efficient application scheduling and resource allocation for mapping applications to a set of suitable parallel processors such that the overall performance goal is achieved. However, such computational environments are often prone to unpredictable variations in application (problem and algorithm) and system characteristics. Therefore, a robustness study is required to guarantee a desired level of performance. Given an initial workload, a mapping of applications to resources is considered robust if it optimizes execution performance and guarantees a desired level of performance in the presence of unpredictable perturbations at runtime. In this research, a stochastic process algebra, Performance Evaluation Process Algebra (PEPA), is used to obtain resource allocations via numerical analysis of performance models of the parallel execution of applications on parallel computing resources. The PEPA performance model is translated into an underlying mathematical Markov chain model from which performance measures are obtained. Further, a robustness analysis of the allocation techniques is performed to find a robust mapping from a set of initial mapping schemes. The numerical analysis of the performance models confirms agreement with simulation results from earlier research in the literature. Compared to direct experiments and simulations, numerical models and the corresponding analyses are easier to reproduce, do not incur setup or installation costs, do not require learning a simulation framework, and are not limited by the complexity of the underlying infrastructure or simulation libraries.
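
    The PEPA-to-performance pipeline bottoms out in a continuous-time Markov chain: solve pi Q = 0 subject to sum(pi) = 1 for the steady-state distribution, then read performance measures (utilization, throughput) from it. The sketch below does this for a three-state generator matrix whose rates are illustrative, not taken from the thesis.

```python
import numpy as np

# Generator matrix Q of a small cyclic CTMC; rows sum to zero.
Q = np.array([
    [-2.0,  2.0,  0.0],   # leave state 0 at rate 2 (to state 1)
    [ 0.0, -3.0,  3.0],   # leave state 1 at rate 3 (to state 2)
    [ 4.0,  0.0, -4.0],   # leave state 2 at rate 4 (back to state 0)
])

# One balance equation is redundant; replace it with normalization.
A = np.vstack([Q.T[:-1], np.ones(len(Q))])
b = np.zeros(len(Q)); b[-1] = 1.0
pi = np.linalg.solve(A, b)

print("steady-state distribution:", pi)   # [6/13, 4/13, 3/13]
print("long-run fraction of time in state 1:", pi[1])
```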

    Multi-tasking scheduling for heterogeneous systems

    Heterogeneous platforms play an increasingly important role in modern computer systems. They combine high performance with low power consumption. From mobiles to supercomputers, we see an increasing number of computer systems that are heterogeneous. The most well-known heterogeneous systems, CPU+GPU platforms, have been widely used in recent years. As they become more mainstream, serving multiple tasks from multiple users is an emerging challenge. A good scheduler can greatly improve performance. However, indiscriminately allocating tasks based on availability leads to poor performance. As modern GPUs have a large number of hardware resources, most tasks cannot efficiently utilize all of them. Concurrent task execution on the GPU is a promising solution; however, indiscriminately running tasks in parallel causes a slowdown. This thesis focuses on scheduling OpenCL kernels. A runtime framework is developed to determine where to schedule OpenCL kernels. It predicts the best-fit device using a machine-learning-based classifier, then schedules each kernel accordingly onto either the CPU or the GPU. To improve GPU utilization, a kernel merging approach is proposed: kernels are merged if their predicted co-execution can provide better performance than sequential execution. A machine-learning-based classifier is developed to find the best kernel pairs for co-execution on the GPU. Finally, a runtime framework is developed to schedule kernels separately on either the CPU or the GPU, and to run kernels in pairs if their co-execution can improve performance. The approaches developed in this thesis significantly improve system performance and outperform all existing techniques.
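
    As a sketch of the device-placement step, the snippet below trains a small classifier on static kernel features to predict the best-fit device. The feature names, the labeling rule, and the training data are all illustrative assumptions, not the thesis's actual model.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(2)
# Assumed features per kernel (normalized): [compute intensity,
# memory-op ratio, work-group size, branch density].
X_train = rng.random((500, 4))
# Dummy labeling rule: compute-heavy, branch-light kernels -> GPU (1).
y_train = ((X_train[:, 0] > 0.5) & (X_train[:, 3] < 0.5)).astype(int)

model = DecisionTreeClassifier(max_depth=5).fit(X_train, y_train)

def schedule(kernel_features):
    """Return the predicted best-fit device for one OpenCL kernel."""
    return "GPU" if model.predict([kernel_features])[0] else "CPU"

print(schedule([0.9, 0.2, 0.7, 0.1]))  # compute-bound kernel -> GPU
```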

    Advances in Information Security and Privacy

    With the recent pandemic emergency, many people have shifted to remote ("smart") working and increased their use of digital resources for both work and entertainment. The result is that the amount of digital information handled online has increased dramatically, along with a significant increase in the number of attacks, breaches, and hacks. This Special Issue aims to establish the state of the art in protecting information by mitigating information risks. This objective is reached by presenting both surveys on specific topics and original approaches and solutions to specific problems. In total, 16 papers have been published in this Special Issue.