805 research outputs found

    WebWave: Globally Load Balanced Fully Distributed Caching of Hot Published Documents

    Document publication service over a network as large as the Internet challenges us to harness available server and network resources to meet fast-growing demand. In this paper, we show that large-scale dynamic caching can be employed to globally minimize server idle time, and hence maximize the aggregate server throughput of the whole service. To be efficient, scalable and robust, a successful caching mechanism must have three properties: (1) maximize the global throughput of the system, (2) find cache copies without recourse to a directory service or a discovery protocol, and (3) be completely distributed in the sense of operating only on the basis of local information. We develop a precise definition, which we call tree load-balance (TLB), of what it means for a mechanism to satisfy these three goals. We present an algorithm that computes TLB off-line, and a distributed protocol that induces a load distribution that converges quickly to a TLB one. Both algorithms place cache copies of immutable documents on the routing tree that connects the cached document's home server to its clients, thus enabling requests to stumble on cache copies en route to the home server. Harvard University; The Saudi Cultural Mission to the U.S.A.

    The liquid model load balancing method


    Local load balancing according to a simple liquid model


    Embedded dynamic programming networks for networks-on-chip

    PhD Thesis. Relentless technology downscaling and recent advancements in three-dimensional integrated circuits (3D-IC) provide a promising prospect to realize heterogeneous system-on-chip (SoC) and homogeneous chip multiprocessor (CMP) designs based on the networks-on-chip (NoC) paradigm with augmented scalability, modularity and performance. In many such systems, scheduling and managing communication resources, rather than computing resources, are the major design and implementation challenges. Past research efforts focused mainly on complex design-time or simple heuristic run-time approaches to on-chip network resource management, using only local or partial information about the network. This can yield poor communication resource utilization and diminish the benefits of the emerging technologies and design methods. Thus, the provision of efficient run-time resource management in large-scale on-chip systems becomes critical. This thesis proposes a design methodology for a novel run-time resource management infrastructure that can be realized efficiently using a distributed architecture closely coupled with the distributed NoC infrastructure. The proposed infrastructure exploits the global information and status of the network to optimize and manage the on-chip communication resources at run-time. There are four major contributions in this thesis. First, it presents a novel deadlock detection method that utilizes run-time transitive closure (TC) computation to discover the existence of deadlock-equivalence sets, which imply loops of requests in NoCs. This detection scheme, TC-network, guarantees the discovery of all true deadlocks without false alarms, in contrast to state-of-the-art approximation and heuristic approaches. 
Second, it investigates the advantages of implementing future on-chip systems using three-dimensional (3D) integration and presents the design, fabrication and testing results of a TC-network implemented in a fully stacked three-layer 3D architecture using a through-silicon via (TSV) complementary metal-oxide semiconductor (CMOS) technology. Testing results demonstrate the effectiveness of such a TC-network for deadlock detection with minimal computational delay in a large-scale network. Third, it introduces an adaptive strategy to effectively diffuse heat throughout the three-dimensional network-on-chip (3D-NoC) geometry. This strategy employs a dynamic programming technique to select and optimize the direction of data manoeuvre in the NoC. It leads to a tool, based on the accurate HotSpot thermal model and a SystemC cycle-accurate model, to simulate the thermal system and evaluate the proposed approach. Fourth, it presents a new dynamic programming-based run-time thermal management (DPRTM) system, including reactive and proactive schemes, to effectively diffuse heat throughout NoC-based CMPs by routing packets through the coolest paths when the temperature does not exceed the chip's thermal limit. When the thermal limit is exceeded, throttling is employed to mitigate heat in the chip, and DPRTM changes its course to avoid throttled paths and to minimize the impact of throttling on chip performance. This thesis opens a new avenue for exploring run-time resource management infrastructure for NoCs, in which new methodologies and concepts are proposed to enhance the on-chip networks for future large-scale 3D integration. Iraqi Ministry of Higher Education and Scientific Research (MOHESR)
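
The deadlock-detection idea in the abstract above (a transitive closure over request dependencies exposes loops of requests) can be illustrated in software. The following is a minimal sketch only, assuming a hypothetical wait-for graph given as a boolean adjacency matrix and using Warshall's algorithm; the function names are illustrative, and this is not the thesis's hardware TC-network.

```python
def transitive_closure(adj):
    """Warshall's algorithm: reach[i][j] is True if j is reachable from i
    along one or more edges of the wait-for graph."""
    n = len(adj)
    reach = [row[:] for row in adj]  # copy so the input is not modified
    for k in range(n):
        for i in range(n):
            if reach[i][k]:
                for j in range(n):
                    if reach[k][j]:
                        reach[i][j] = True
    return reach


def deadlocked_nodes(adj):
    """Nodes that can reach themselves lie on a loop of requests."""
    reach = transitive_closure(adj)
    return [i for i in range(len(adj)) if reach[i][i]]


# Hypothetical example: 0 waits on 1, 1 on 2, 2 on 0 (a loop); 3 waits on 0.
edges = [(0, 1), (1, 2), (2, 0), (3, 0)]
adj = [[False] * 4 for _ in range(4)]
for u, v in edges:
    adj[u][v] = True

print(deadlocked_nodes(adj))  # nodes on the request loop
```

In this sketch, node 3 is blocked behind the loop but is not reported, because it cannot reach itself; only members of the loop are flagged.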

    Silicon nanowire field-effect transistors for the detection of proteins

    In this dissertation I present results on our efforts to increase the sensitivity and selectivity of silicon nanowire ion-sensitive field-effect transistors for the detection of biomarkers, as well as a novel method for wireless power transfer based on metamaterial rectennas for their potential use as implantable sensors. The sensing scheme is based on changes in the conductance of the semiconducting nanowires upon binding of charged entities to the surface, which induces a field effect. Monitoring the differential conductance thus provides information on the selective binding of biological molecules of interest to previously covalently linked counterparts on the nanowire surface. To improve the performance of the nanowire sensing, we devised and fabricated a nanowire Wheatstone bridge, which allows canceling out of signal drift due to thermal fluctuations and dynamics of fluid flow. We showed that balancing the bridge significantly improves the signal-to-noise ratio. Further, we demonstrated the sensing of the novel melanoma biomarker TROY at clinically relevant concentrations and distinguished it from nonspecific binding by comparing the reaction kinetics. For increased sensitivity, an amplification method was employed using an enzyme which catalyzes a signal-generating reaction by changing the redox potential of a redox pair. In addition, we investigated the electric double layer, which forms around charges in an electrolytic solution. It causes electrostatic screening of the proteins of interest, which puts a fundamental limitation on biomarker detection in solutions with high salt concentrations, such as blood. We solved the coupled Nernst-Planck and Poisson equations for the electrolyte under the influence of an oscillating electric field and discovered oscillations of the counterion concentration at a characteristic frequency. 
In addition to exploring different methods for improved sensing capabilities, we studied an innovative method to supply power to implantable biosensors wirelessly, eliminating the need for batteries. A metamaterial split ring resonator is integrated with a rectifying circuit for efficient conversion of microwave radiation to direct electrical power. We studied the near-field behavior of this rectenna with respect to distance, polarization, power, and frequency. Using a 100 mW microwave power source, we demonstrated operating a simple silicon nanowire pH sensor with a light indicator.

    Topology Optimization with Text-Guided Stylization

    We propose an approach for the generation of topology-optimized structures with text-guided appearance stylization. This methodology aims to enrich the concurrent design of a structure's physical functionality and aesthetic appearance. Users input descriptive text to govern the style of the structure. Our system employs a hash-encoded neural network as the implicit structure representation backbone, which serves as the foundation for the co-optimization of structural mechanical performance, style, and connectivity, to ensure full-color, high-quality 3D-printable solutions. We substantiate the effectiveness of our system through extensive comparisons, demonstrations, and a 3D printing test.

    Run-time thread management for large-scale distributed-memory multiprocessors

    Thesis (Ph.D.), Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 1993. Includes bibliographical references (p. 214-216). By Daniel Nussbaum.

    Solution of partial differential equations on vector and parallel computers

    The present status of numerical methods for partial differential equations on vector and parallel computers is reviewed. The relevant aspects of these computers are discussed and a brief review of their development is included, with particular attention paid to those characteristics that influence algorithm selection. Both direct and iterative methods are given for elliptic equations, as well as explicit and implicit methods for initial boundary value problems. The intent is to point out attractive methods as well as areas where this class of computer architecture cannot be fully utilized because of either hardware restrictions or the lack of adequate algorithms. Application areas utilizing these computers are briefly discussed.

    Sparse matrix-vector multiplication on GPGPUs

    The multiplication of a sparse matrix by a dense vector (SpMV) is a centerpiece of scientific computing applications: it is the essential kernel for the solution of sparse linear systems and sparse eigenvalue problems by iterative methods. The efficient implementation of sparse matrix-vector multiplication is therefore crucial and has been the subject of an immense amount of research, with interest renewed with every major new trend in high performance computing architectures. The introduction of General Purpose Graphics Processing Units (GPGPUs) is no exception, and many articles have been devoted to this problem. In this paper we review the techniques for implementing the SpMV kernel on GPGPUs that have appeared in the literature of the last few years. We discuss the issues and trade-offs encountered by the various researchers, and present a list of solutions, organized in categories according to common features. We also provide a performance comparison across different GPGPU models and on a set of test matrices coming from various application domains.
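
For readers unfamiliar with the SpMV kernel named in the abstract above, the following is a minimal sequential sketch, assuming the widely used compressed sparse row (CSR) storage format. The function name and the small example matrix are illustrative assumptions; this is a plain CPU reference and not one of the GPGPU implementations the paper surveys.

```python
def spmv_csr(row_ptr, col_idx, values, x):
    """Compute y = A @ x for a sparse matrix A stored in CSR form.

    row_ptr[i]..row_ptr[i+1] delimits the nonzeros of row i;
    col_idx[k] and values[k] give the column and value of nonzero k.
    """
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for i in range(n_rows):
        acc = 0.0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc += values[k] * x[col_idx[k]]
        y[i] = acc
    return y


# Illustrative 3x3 matrix:
# A = [[4, 0, 1],
#      [0, 3, 0],
#      [2, 0, 5]]
row_ptr = [0, 2, 3, 5]
col_idx = [0, 2, 1, 0, 2]
values = [4.0, 1.0, 3.0, 2.0, 5.0]
x = [1.0, 2.0, 3.0]

y = spmv_csr(row_ptr, col_idx, values, x)
print(y)
```

Each row's dot product is independent of the others, and the reads of `x` are indirect through `col_idx`; both properties matter when mapping the loop onto parallel hardware.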