479 research outputs found

    The Parallelism Motifs of Genomic Data Analysis

    Get PDF
    Genomic data sets are growing dramatically as the cost of sequencing continues to decline and small sequencing devices become available. Enormous community databases store and share this data with the research community, but some of these genomic data analysis problems require large scale computational platforms to meet both the memory and computational requirements. These applications differ from scientific simulations that dominate the workload on high end parallel systems today and place different requirements on programming support, software libraries, and parallel architectural design. For example, they involve irregular communication patterns such as asynchronous updates to shared data structures. We consider several problems in high performance genomics analysis, including alignment, profiling, clustering, and assembly for both single genomes and metagenomes. We identify some of the common computational patterns or motifs that help inform parallelization strategies and compare our motifs to some of the established lists, arguing that at least two key patterns, sorting and hashing, are missing

    Synergizing Roadway Infrastructure Investment with Digital Infrastructure for Infrastructure-Based Connected Vehicle Applications: Review of Current Status and Future Directions

    Get PDF
    The file attached to this record is the author's final peer reviewed version. The Publisher's final version can be found by following the DOI link.The safety, mobility, environmental and economic benefits of Connected and Autonomous Vehicles (CAVs) are potentially dramatic. However, realization of these benefits largely hinges on the timely upgrading of the existing transportation system. CAVs must be enabled to send and receive data to and from other vehicles and drivers (V2V communication) and to and from infrastructure (V2I communication). Further, infrastructure and the transportation agencies that manage it must be able to collect, process, distribute and archive these data quickly, reliably, and securely. This paper focuses on current digital roadway infrastructure initiatives and highlights the importance of including digital infrastructure investment alongside more traditional infrastructure investment to keep up with the auto industry's push towards this real time communication and data processing capability. Agencies responsible for transportation infrastructure construction and management must collaborate, establishing national and international platforms to guide the planning, deployment and management of digital infrastructure in their jurisdictions. This will help create standardized interoperable national and international systems so that CAV technology is not deployed in a haphazard and uncoordinated manner

    A Comprehensive Survey on Distributed Training of Graph Neural Networks

    Full text link
    Graph neural networks (GNNs) have been demonstrated to be a powerful algorithmic model in broad application fields for their effectiveness in learning over graphs. To scale GNN training up for large-scale and ever-growing graphs, the most promising solution is distributed training which distributes the workload of training across multiple computing nodes. At present, the volume of related research on distributed GNN training is exceptionally vast, accompanied by an extraordinarily rapid pace of publication. Moreover, the approaches reported in these studies exhibit significant divergence. This situation poses a considerable challenge for newcomers, hindering their ability to grasp a comprehensive understanding of the workflows, computational patterns, communication strategies, and optimization techniques employed in distributed GNN training. As a result, there is a pressing need for a survey to provide correct recognition, analysis, and comparisons in this field. In this paper, we provide a comprehensive survey of distributed GNN training by investigating various optimization techniques used in distributed GNN training. First, distributed GNN training is classified into several categories according to their workflows. In addition, their computational patterns and communication patterns, as well as the optimization techniques proposed by recent work are introduced. Second, the software frameworks and hardware platforms of distributed GNN training are also introduced for a deeper understanding. Third, distributed GNN training is compared with distributed training of deep neural networks, emphasizing the uniqueness of distributed GNN training. Finally, interesting issues and opportunities in this field are discussed.Comment: To Appear in Proceedings of the IEE

    Resiliency in numerical algorithm design for extreme scale simulations

    Get PDF
    This work is based on the seminar titled ‘Resiliency in Numerical Algorithm Design for Extreme Scale Simulations’ held March 1–6, 2020, at Schloss Dagstuhl, that was attended by all the authors. Advanced supercomputing is characterized by very high computation speeds at the cost of involving an enormous amount of resources and costs. A typical large-scale computation running for 48 h on a system consuming 20 MW, as predicted for exascale systems, would consume a million kWh, corresponding to about 100k Euro in energy cost for executing 1023 floating-point operations. It is clearly unacceptable to lose the whole computation if any of the several million parallel processes fails during the execution. Moreover, if a single operation suffers from a bit-flip error, should the whole computation be declared invalid? What about the notion of reproducibility itself: should this core paradigm of science be revised and refined for results that are obtained by large-scale simulation? Naive versions of conventional resilience techniques will not scale to the exascale regime: with a main memory footprint of tens of Petabytes, synchronously writing checkpoint data all the way to background storage at frequent intervals will create intolerable overheads in runtime and energy consumption. Forecasts show that the mean time between failures could be lower than the time to recover from such a checkpoint, so that large calculations at scale might not make any progress if robust alternatives are not investigated. More advanced resilience techniques must be devised. The key may lie in exploiting both advanced system features as well as specific application knowledge. Research will face two essential questions: (1) what are the reliability requirements for a particular computation and (2) how do we best design the algorithms and software to meet these requirements? While the analysis of use cases can help understand the particular reliability requirements, the construction of remedies is currently wide open. One avenue would be to refine and improve on system- or application-level checkpointing and rollback strategies in the case an error is detected. Developers might use fault notification interfaces and flexible runtime systems to respond to node failures in an application-dependent fashion. Novel numerical algorithms or more stochastic computational approaches may be required to meet accuracy requirements in the face of undetectable soft errors. These ideas constituted an essential topic of the seminar. The goal of this Dagstuhl Seminar was to bring together a diverse group of scientists with expertise in exascale computing to discuss novel ways to make applications resilient against detected and undetected faults. In particular, participants explored the role that algorithms and applications play in the holistic approach needed to tackle this challenge. This article gathers a broad range of perspectives on the role of algorithms, applications and systems in achieving resilience for extreme scale simulations. The ultimate goal is to spark novel ideas and encourage the development of concrete solutions for achieving such resilience holistically.Peer Reviewed"Article signat per 36 autors/es: Emmanuel Agullo, Mirco Altenbernd, Hartwig Anzt, Leonardo Bautista-Gomez, Tommaso Benacchio, Luca Bonaventura, Hans-Joachim Bungartz, Sanjay Chatterjee, Florina M. Ciorba, Nathan DeBardeleben, Daniel Drzisga, Sebastian Eibl, Christian Engelmann, Wilfried N. Gansterer, Luc Giraud, Dominik G ̈oddeke, Marco Heisig, Fabienne Jezequel, Nils Kohl, Xiaoye Sherry Li, Romain Lion, Miriam Mehl, Paul Mycek, Michael Obersteiner, Enrique S. Quintana-Ortiz, Francesco Rizzi, Ulrich Rude, Martin Schulz, Fred Fung, Robert Speck, Linda Stals, Keita Teranishi, Samuel Thibault, Dominik Thonnes, Andreas Wagner and Barbara Wohlmuth"Postprint (author's final draft

    Weiterentwicklung analytischer Datenbanksysteme

    Get PDF
    This thesis contributes to the state of the art in analytical database systems. First, we identify and explore extensions to better support analytics on event streams. Second, we propose a novel polygon index to enable efficient geospatial data processing in main memory. Third, we contribute a new deep learning approach to cardinality estimation, which is the core problem in cost-based query optimization.Diese Arbeit trägt zum aktuellen Forschungsstand von analytischen Datenbanksystemen bei. Wir identifizieren und explorieren Erweiterungen um Analysen auf Eventströmen besser zu unterstützen. Wir stellen eine neue Indexstruktur für Polygone vor, die eine effiziente Verarbeitung von Geodaten im Hauptspeicher ermöglicht. Zudem präsentieren wir einen neuen Ansatz für Kardinalitätsschätzungen mittels maschinellen Lernens

    Multi-Factor Authentication: A Survey

    Get PDF
    Today, digitalization decisively penetrates all the sides of the modern society. One of the key enablers to maintain this process secure is authentication. It covers many different areas of a hyper-connected world, including online payments, communications, access right management, etc. This work sheds light on the evolution of authentication systems towards Multi-Factor Authentication (MFA) starting from Single-Factor Authentication (SFA) and through Two-Factor Authentication (2FA). Particularly, MFA is expected to be utilized for human-to-everything interactions by enabling fast, user-friendly, and reliable authentication when accessing a service. This paper surveys the already available and emerging sensors (factor providers) that allow for authenticating a user with the system directly or by involving the cloud. The corresponding challenges from the user as well as the service provider perspective are also reviewed. The MFA system based on reversed Lagrange polynomial within Shamir’s Secret Sharing (SSS) scheme is further proposed to enable more flexible authentication. This solution covers the cases of authenticating the user even if some of the factors are mismatched or absent. Our framework allows for qualifying the missing factors by authenticating the user without disclosing sensitive biometric data to the verification entity. Finally, a vision of the future trends in MFA is discussed.Peer reviewe

    Review of Recent Trends

    Get PDF
    This work was partially supported by the European Regional Development Fund (FEDER), through the Regional Operational Programme of Centre (CENTRO 2020) of the Portugal 2020 framework, through projects SOCA (CENTRO-01-0145-FEDER-000010) and ORCIP (CENTRO-01-0145-FEDER-022141). Fernando P. Guiomar acknowledges a fellowship from “la Caixa” Foundation (ID100010434), code LCF/BQ/PR20/11770015. Houda Harkat acknowledges the financial support of the Programmatic Financing of the CTS R&D Unit (UIDP/00066/2020).MIMO-OFDM is a key technology and a strong candidate for 5G telecommunication systems. In the literature, there is no convenient survey study that rounds up all the necessary points to be investigated concerning such systems. The current deeper review paper inspects and interprets the state of the art and addresses several research axes related to MIMO-OFDM systems. Two topics have received special attention: MIMO waveforms and MIMO-OFDM channel estimation. The existing MIMO hardware and software innovations, in addition to the MIMO-OFDM equalization techniques, are discussed concisely. In the literature, only a few authors have discussed the MIMO channel estimation and modeling problems for a variety of MIMO systems. However, to the best of our knowledge, there has been until now no review paper specifically discussing the recent works concerning channel estimation and the equalization process for MIMO-OFDM systems. Hence, the current work focuses on analyzing the recently used algorithms in the field, which could be a rich reference for researchers. Moreover, some research perspectives are identified.publishersversionpublishe

    A Systematic Review of the State of Cyber-Security in Water Systems

    Get PDF
    Critical infrastructure systems are evolving from isolated bespoke systems to those that use general-purpose computing hosts, IoT sensors, edge computing, wireless networks and artificial intelligence. Although this move improves sensing and control capacity and gives better integration with business requirements, it also increases the scope for attack from malicious entities that intend to conduct industrial espionage and sabotage against these systems. In this paper, we review the state of the cyber-security research that is focused on improving the security of the water supply and wastewater collection and treatment systems that form part of the critical national infrastructure. We cover the publication statistics of the research in this area, the aspects of security being addressed, and future work required to achieve better cyber-security for water systems
    corecore