
    Aspects of practical implementations of PRAM algorithms

    The PRAM is a shared-memory model of parallel computation that abstracts away inessential engineering details. It provides a very simple, architecture-independent model and a good programming environment. Theoreticians in the computer science community have proved that it is possible to emulate the theoretical PRAM model using current technology. Solutions have been found for effectively interconnecting processing elements, for routing data on these networks, and for distributing the data among memory modules without hotspots. This thesis reviews this emulation and the possibilities it provides for large-scale general-purpose parallel computation. The emulation employs a bridging model which acts as an interface between the actual hardware and the PRAM model. We review the evidence that such a scheme can achieve scalable parallel performance and portable parallel software, and that PRAM algorithms can be optimally implemented on such practical models. In the course of this review we present the following new results: 1. Concerning parallel approximation algorithms, we describe an NC algorithm for finding an approximation to a minimum weight perfect matching in a complete weighted graph. The algorithm is conceptually very simple and it is also the first NC-approximation algorithm for the task with a sub-linear performance ratio. 2. Concerning graph embedding, we describe dense edge-disjoint embeddings of the complete binary tree with n leaves in the following n-node communication networks: the hypercube, the de Bruijn and shuffle-exchange networks, and the 2-dimensional mesh. In the embeddings the maximum distance from a leaf to the root of the tree is asymptotically optimally short. The embeddings facilitate efficient implementation of many PRAM algorithms on networks employing these graphs as interconnection networks. 3. Concerning bulk synchronous algorithmics, we describe scalable transportable algorithms for the following three commonly required types of computation: balanced tree computations, Fast Fourier Transforms, and matrix multiplications.
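
As a rough illustration of what a balanced tree computation looks like, the sketch below combines n values pairwise in rounds, so that each round could in principle run as one parallel superstep. It is a sequential simulation written for this listing, not the thesis's algorithm; the function name `tree_reduce` and the round counter are assumptions introduced here.

```python
from typing import Callable, List, Tuple, TypeVar

T = TypeVar("T")


def tree_reduce(values: List[T], op: Callable[[T, T], T]) -> Tuple[T, int]:
    """Combine values pairwise, level by level, like a balanced binary tree.

    Returns the combined result and the number of rounds used. Every pair in
    a round is independent of the others, so a parallel machine could process
    a whole round at once (hypothetical mapping: one round per superstep).
    """
    if not values:
        raise ValueError("values must be non-empty")
    level = list(values)
    rounds = 0
    while len(level) > 1:
        # Combine neighbours (0,1), (2,3), ...; these pairs do not interact.
        nxt = [op(level[i], level[i + 1]) for i in range(0, len(level) - 1, 2)]
        if len(level) % 2 == 1:
            # An unpaired last element is carried up to the next level.
            nxt.append(level[-1])
        level = nxt
        rounds += 1
    return level[0], rounds


if __name__ == "__main__":
    total, rounds = tree_reduce(list(range(1, 9)), lambda a, b: a + b)
    print(total, rounds)
```

For n inputs the loop runs ceil(log2 n) rounds, which is why tree-shaped computations of this kind are a common building block in parallel algorithm design.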

    Decompose and Conquer: Addressing Evasive Errors in Systems on Chip

    Modern computer chips comprise many components, including microprocessor cores, memory modules, on-chip networks, and accelerators. Such system-on-chip (SoC) designs are deployed in a variety of computing devices: from internet-of-things, to smartphones, to personal computers, to data centers. In this dissertation, we discuss evasive errors in SoC designs and how these errors can be addressed efficiently. In particular, we focus on two types of errors: design bugs and permanent faults. Design bugs originate from the limited amount of time allowed for design verification and validation. Thus, they are often found in functional features that are rarely activated. Complete functional verification, which can eliminate design bugs, is extremely time-consuming, thus impractical in modern complex SoC designs. Permanent faults are caused by failures of fragile transistors in nano-scale semiconductor manufacturing processes. Indeed, weak transistors may wear out unexpectedly within the lifespan of the design. Hardware structures that reduce the occurrence of permanent faults incur significant silicon area or performance overheads, thus they are infeasible for most cost-sensitive SoC designs. To tackle and overcome these evasive errors efficiently, we propose to leverage the principle of decomposition to lower the complexity of the software analysis or the hardware structures involved. To this end, we present several decomposition techniques, specific to major SoC components. We first focus on microprocessor cores, by presenting a lightweight bug-masking analysis that decomposes a program into individual instructions to identify if a design bug would be masked by the program's execution. We then move to memory subsystems: there, we offer an efficient memory consistency testing framework to detect buggy memory-ordering behaviors, which decomposes the memory-ordering graph into small components based on incremental differences. 
We also propose a microarchitectural patching solution for memory subsystem bugs, which augments each core node with a small distributed programmable logic, instead of including a global patching module. In the context of on-chip networks, we propose two routing reconfiguration algorithms that bypass faulty network resources. The first computes short-term routes in a distributed fashion, localized to the fault region. The second decomposes application-aware routing computation into simple routing rules so as to quickly find deadlock-free, application-optimized routes in a fault-ridden network. Finally, we consider general accelerator modules in SoC designs. When a system includes many accelerators, there are a variety of interactions among them that must be verified to catch buggy interactions. To this end, we decompose such inter-module communication into basic interaction elements, which can be reassembled into new, interesting tests. Overall, we show that the decomposition of complex software algorithms and hardware structures can significantly reduce overheads: up to three orders of magnitude in the bug-masking analysis and the application-aware routing, approximately 50 times in the routing reconfiguration latency, and 5 times on average in the memory-ordering graph checking. These overhead reductions come with losses in error coverage: 23% undetected bug-masking incidents, 39% non-patchable memory bugs, and occasionally overlooked rare patterns of multiple faults. In this dissertation, we discuss the ideas and their trade-offs, and present future research directions. PHD, Computer Science & Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies. https://deepblue.lib.umich.edu/bitstream/2027.42/147637/1/doowon_1.pd

    Multiscale Markov Decision Problems: Compression, Solution, and Transfer Learning

    Many problems in sequential decision making and stochastic control have natural multiscale structure: sub-tasks are assembled together to accomplish complex goals. Systematically inferring and leveraging hierarchical structure, particularly beyond a single level of abstraction, has remained a longstanding challenge. We describe a fast multiscale procedure for repeatedly compressing, or homogenizing, Markov decision processes (MDPs), wherein a hierarchy of sub-problems at different scales is automatically determined. Coarsened MDPs are themselves independent, deterministic MDPs, and may be solved using existing algorithms. The multiscale representation delivered by this procedure decouples sub-tasks from each other and can lead to substantial improvements in convergence rates both locally within sub-problems and globally across sub-problems, yielding significant computational savings. A second fundamental aspect of this work is that these multiscale decompositions yield new transfer opportunities across different problems, where solutions of sub-tasks at different levels of the hierarchy may be amenable to transfer to new problems. Localized transfer of policies and potential operators at arbitrary scales is emphasized. Finally, we demonstrate compression and transfer in a collection of illustrative domains, including examples involving discrete and continuous state spaces. Comment: 86 pages, 15 figures

    Scalable fault management architecture for dynamic optical networks : an information-theoretic approach

    Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2008. MIT Barker Engineering Library copy: printed in pages. Also issued printed in pages. Includes bibliographical references (leaves 255-262). All-optical switching, in place of electronic switching, of high data-rate lightpaths at intermediate nodes is one of the key enabling technologies for economically scalable future data networks. This replacement of electronic switching with optical switching at intermediate nodes, however, presents new challenges for fault detection and localization in reconfigurable all-optical networks. Presently, fault detection and localization techniques, as implemented in SONET/G.709 networks, rely on electronic processing of parity checks at intermediate nodes. If similar techniques are adapted to all-optical reconfigurable networks, optical signals need to be tapped out at intermediate nodes for parity checks. This additional electronic processing would break the all-optical transparency paradigm and thus significantly diminish the cost advantages of all-optical networks. In this thesis, we propose new fault-diagnosis approaches specifically tailored to all-optical networks, with the objective of keeping the diagnostic capital expenditure and the diagnostic operation effort low. Instead of the aforementioned passive monitoring paradigm based on parity checks, we propose a proactive lightpath probing paradigm: optical probing signals are sent along a set of lightpaths in the network, and the network state (i.e., failure pattern) is then inferred from the results of this set of end-to-end lightpath measurements. Moreover, we assume that a subset of network nodes (up to all the nodes) is equipped with diagnostic agents (including both transmitters/receivers for probe transmission/detection and software processes for probe management) to perform fault detection and localization. 
The design objectives of this proposed proactive probing paradigm are twofold: i) to minimize the number of lightpath probes to keep the diagnostic operational effort low, and ii) to minimize the amount of diagnostic hardware to keep the diagnostic capital expenditure low. The network fault-diagnosis problem can be mathematically modeled with a group-testing-over-graphs framework. In particular, the network is abstracted as a graph in which the failure status of each node/link is modeled with a random variable (e.g. Bernoulli distribution). A probe over any path in the graph results in a value, defined as the probe syndrome, which is a function of all the random variables associated with that path. A network failure pattern is inferred through a set of probe syndromes resulting from a set of optimally chosen probes. This framework enriches the traditional group-testing problem by introducing a topological structure, and can be extended to model many other network-monitoring problems (e.g., packet delay, packet drop ratio, noise, etc.) by choosing appropriate state variables. Under the group-testing-over-graphs framework with a probabilistic failure model, we initiate an information-theoretic approach to minimizing the average number of lightpath probes to identify all possible network failure patterns. Specifically, we have established an isomorphic mapping between the fault-diagnosis problem in network management and the source-coding problem in Information Theory. This mapping suggests that the minimum average number of lightpath probes required is lower bounded by the information entropy of the network state, and that efficient source-coding algorithms (e.g. the run-length code) can be translated into scalable fault-diagnosis schemes under some additional probe feasibility constraint. 
Our analytical and numerical investigations yield a guideline for designing scalable fault-diagnosis algorithms: each probe should provide approximately 1 bit of state information, and thus the total number of probes required is approximately equal to the entropy of the network state. To address the hardware cost of diagnosis, we also developed a probabilistic analysis framework to characterize the trade-off between hardware cost (i.e., the number of nodes equipped with Tx/Rx pairs) and diagnosis capability (i.e., the probability of successful failure detection and localization). Our results suggest that, for practical situations, the hardware cost can be reduced significantly by accepting a small amount of uncertainty about the failure status. by Yonggang Wen. Ph.D.
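
The entropy guideline above can be made concrete with a small calculation. The sketch below assumes, purely for illustration, that each link fails independently with a known Bernoulli probability; the abstract mentions Bernoulli variables but does not state independence, and the function names are hypothetical. Under that assumption the entropy of the network state is the sum of the per-link binary entropies, which is the quantity the abstract gives as a lower bound on the average number of probes.

```python
import math
from typing import Iterable


def bernoulli_entropy_bits(p: float) -> float:
    """Binary entropy H(p) in bits, with H(0) = H(1) = 0 by convention."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)


def network_state_entropy_bits(failure_probs: Iterable[float]) -> float:
    """Entropy of the joint failure pattern.

    Assumption (not from the abstract): elements fail independently, so the
    joint entropy is the sum of the individual binary entropies.
    """
    return sum(bernoulli_entropy_bits(p) for p in failure_probs)


if __name__ == "__main__":
    # Hypothetical network: 100 links, each failing with probability 0.01.
    probs = [0.01] * 100
    bound = network_state_entropy_bits(probs)
    print(f"entropy lower bound on average probes: {bound:.2f} bits")
```

In this made-up example the bound is roughly 8 probes on average, far fewer than the 100 tests needed to check each link separately, which is the kind of saving the information-theoretic view points to when failures are rare.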

    Design Space Exploration and Resource Management of Multi/Many-Core Systems

    The increasing demand for processing a larger number of applications and related data on computing platforms has resulted in reliance on multi-/many-core chips, as they facilitate parallel processing. However, there is a desire for these platforms to be energy-efficient and reliable, and they need to perform secure computations in the interest of the whole community. This book provides perspectives on the aforementioned aspects from leading researchers in terms of state-of-the-art contributions and upcoming trends.

    Exploitation of information propagation patterns in social sensing

    Online social media presents a new opportunity for sensing the physical world. The sensors are essentially humans who share information on broadcast social media. Such human sensors impose challenges like influence, bias, polarization, and data overload, unseen in traditional sensor networks. This dissertation addresses the aforementioned challenges by exploiting the propagation or preferential attachment patterns of the human sensors to distill a factual view of the events transpiring in the physical world. Our first contribution explores the correlated errors caused by dependent sources. When people follow others, they are prone to broadcast information with unknown provenance. We show that using an admission control mechanism to select an independent set of sensors improves the quality of reconstruction. The next contribution explores a different kind of correlated error caused by polarization and bias. During events related to conflict or disagreement, people take sides, and take a selective or preferential approach when broadcasting information. For example, a source might be less credible when it shares information conforming to its own bias. We present a maximum-likelihood estimation model to reconstruct the factual information in such cases, given that the individual biases of the sources are already known. Our next two contributions relate to modeling polarization and unveiling polarization using maximum-likelihood and matrix-factorization-based mechanisms. These mechanisms allow us to automate the process of separating polarized content, and obtain a more faithful view of the events being sensed. Finally, we design and implement 'SocialTrove', a summarization service that continuously executes in the cloud, as a platform to compute the reconstructions at scale. Our contributions have been integrated with the 'Apollo Social Sensing Toolkit', which builds a pipeline to collect, summarize, and analyze information from Twitter, and serves more than 40 users.

    Proceedings of the 21st Conference on Formal Methods in Computer-Aided Design – FMCAD 2021

    The Conference on Formal Methods in Computer-Aided Design (FMCAD) is an annual conference on the theory and applications of formal methods in hardware and system verification. FMCAD provides a leading forum to researchers in academia and industry for presenting and discussing groundbreaking methods, technologies, theoretical results, and tools for reasoning formally about computing systems. FMCAD covers formal aspects of computer-aided system design including verification, specification, synthesis, and testing

    Switching considerations in storage networks.

    by Leung Yiu Tong. Thesis (M.Phil.)--Chinese University of Hong Kong, 2003. Includes bibliographical references (leaves 96-98). Abstracts in English and Chinese.
    Chapter 1. Introduction
    Chapter 1.1 Motivation
    Chapter 1.2 Thesis Organization
    Chapter 2. Storage Network Fundamentals
    Chapter 2.1 Storage Network Topology
    Chapter 2.1.1 Direct Attached Storage (DAS)
    Chapter 2.1.2 Network Attached Storage (NAS)
    Chapter 2.1.3 Storage Area Network (SAN)
    Chapter 2.1.3.1 SAN and the Fibre Channel Protocol
    Chapter 2.1.4 Summary on Storage Network Topology
    Chapter 2.2 Storage Protocol
    Chapter 2.2.1 Fibre Channel
    Chapter 2.2.1.1 Fibre Channel over IP (FCIP)
    Chapter 2.2.1.2 Internet Fibre Channel Protocol (iFCP)
    Chapter 2.2.2 Internet SCSI (iSCSI)
    Chapter 2.2.3 InfiniBand
    Chapter 2.2.4 Review on Storage Network Protocol
    Chapter 2.3 Standard Organization
    Chapter 2.4 Summary
    Chapter 3. Switching Design for Storage Networks
    Chapter 3.1 Shared Bus Design
    Chapter 3.2 Time Division Switch
    Chapter 3.3 Share Buffer Memory Switch
    Chapter 3.3.1 Parallel Memory Array
    Chapter 3.3.2 Distributive Storage
    Chapter 3.4 Crossbar Switch
    Chapter 3.4.1 Arbitrated Crossbar vs. Buffered Crossbar
    Chapter 3.4.1.1 Arbitrated Crossbar Switch
    Chapter 3.4.1.2 Buffered Crossbar Switch
    Chapter 3.4.2 Switch Scheduling
    Chapter 3.4.2.1 Bipartite Matching
    Chapter 3.4.2.2 Token-based Distributive Scheduling
    Chapter 3.4.2.3 Resource Counting using Semaphore
    Chapter 3.5 Algebraic Switches
    Chapter 3.5.1 Switching by Conditionally Nonblocking Properties
    Chapter 3.5.2 Self-Routing Mechanism with Zero-Bit Buffering
    Chapter 3.5.3 Multistage Interconnection of Self-routing Concentrators
    Chapter 3.6 Summary
    Chapter 4. Investigating Switching Issue in Storage Networks
    Chapter 4.1 Choosing a Suitable Switch
    Chapter 4.2 Quality of Service (QoS)
    Chapter 4.3 Multicasting
    Chapter 4.3.1 Crossbar Switch
    Chapter 4.3.2 Shared-Buffer Memory Switches
    Chapter 4.3.3 Algebraic Switch
    Chapter 4.3.4 Application on Multicast Transmission
    Chapter 4.4 Load Balancing Mechanism
    Chapter 4.5 Optimization on Storage Utilization
    Chapter 4.6 Summary
    Chapter 5. Conclusion and Summary of Original Contributions

    19th SC@RUG 2022 proceedings 2021-2022
