Project Report on DOE Young Investigator Grant (Contract No. DE-FG02-02ER25525) Dynamic Scheduling and Fusion of Irregular Computation (August 15, 2002 to August 14, 2005)
Computer simulation has become increasingly important in many scientific disciplines, but its performance and scalability are severely limited by the memory throughput of today's computer systems. With the support of this grant, we first designed training-based prediction, which accurately predicts the memory performance of large applications before their execution. We then developed optimization techniques using dynamic computation fusion and large-scale data transformation. The research work has three major components. The first is modeling and prediction of cache behavior. We have developed a new technique that uses reuse distance information from training inputs and then extracts a parameterized model of the program's cache miss rates for any input size and for any size of fully associative cache. Using the model, we have built a web-based tool with three-dimensional visualization. The new model can help to build cost-effective computer systems, design better benchmark suites, and improve task scheduling on heterogeneous systems. The second component is global computation for improving cache performance. We have developed an algorithm for dynamic data partitioning using sampling theory and probability distributions. Recent work from a number of groups shows that manual or semi-manual computation fusion has significant benefits in physical, mechanical, and biological simulations as well as in information retrieval and machine verification. We have developed an automatic tool that measures the potential of computation fusion. The new system can be used by high-performance application programmers to estimate the potential for locality improvement in a program before trying complex transformations for a specific cache system. The last component studies models of spatial locality and the problem of data layout. In scientific programs, most data are stored in arrays.
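The link between reuse distance and cache misses that underlies the model can be illustrated with a small sketch (simplified; the actual technique fits a parameterized model across training inputs). An access to a fully associative LRU cache of C lines hits exactly when its reuse distance, the number of distinct elements touched since the previous access to the same element, is below C:

```python
def reuse_distances(trace):
    """Reuse (LRU stack) distance of every access: the number of distinct
    elements accessed since the previous use of the same element,
    infinity for first-time (cold) accesses."""
    last_seen = {}   # element -> index of its previous access
    distances = []
    for i, x in enumerate(trace):
        if x in last_seen:
            # distinct elements touched strictly between the two uses
            distances.append(len(set(trace[last_seen[x] + 1:i])))
        else:
            distances.append(float("inf"))  # cold miss
        last_seen[x] = i
    return distances

def miss_rate(trace, cache_size):
    """Miss rate of a fully associative LRU cache: an access misses
    iff its reuse distance is at least the cache size."""
    ds = reuse_distances(trace)
    return sum(1 for d in ds if d >= cache_size) / len(ds)

trace = ["a", "b", "c", "a", "b", "c", "d", "a"]
print(miss_rate(trace, 2))  # cache too small for the working set
print(miss_rate(trace, 4))  # cache holds the working set: only cold misses
```

Because the reuse distance histogram is a property of the program and input alone, one measurement per training input yields the miss rate for every cache size at once, which is what makes the parameterized model possible.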
Grand challenge problems such as hydrodynamics simulation and data mining may use an enormous number of data elements. To optimize the layout across multiple arrays, we have developed a formal model called reference affinity. We collaborated with the IBM production compiler group and designed an efficient compiler analysis that performs as well as data or code profiling does. Based on these results, the IBM group has filed a patent and is including this technique in their product compiler. A major part of the project is the development of software tools. We have developed web-based visualization for program locality. In addition, we have implemented a prototype of array regrouping in the IBM compiler. The full implementation is expected to come out of IBM in the near future and to benefit scientific applications running on IBM supercomputers. We have also developed a test environment for studying the limit of computation fusion. Finally, our work has directly influenced the design of the Intel Itanium compiler. The project has strengthened the research relationship between the PI's group and groups in DoE labs. The PI was an invited speaker at the Center for Applied Scientific Computing Seminar Series at an early stage of the project. The question the audience was most curious about was the limit of computation fusion, which has been studied in depth in this research. In addition, the seminar directly helped a group at Lawrence Livermore to achieve a four-times speedup on an important DoE code. The PI helped to organize a number of high-performance computing forums, including the founding of a workshop on memory system performance (MSP). In the past two years, one fourth of the papers in the workshop came from researchers at the Lawrence Livermore, Argonne, Los Alamos, and Lawrence Berkeley national laboratories. The PI lectured frequently on DoE-funded research.
In a broader context, high-performance computing is central to America's scientific and economic stature in the world and addresses many of the most scientifically and socially important problems of our day. This research has improved the programming support for a variety of computational paradigms, including dynamic mesh, hydrodynamics, molecular dynamics, multi-grid methods, matrix algebra, and sequential and parallel sorting. In the process, the PI's group has developed and strengthened relationships with DoE laboratories and major hardware and software vendors.
Learning Media for the First Come First Serve-Ejecting Based Dynamic Scheduling (FCFS-EDS) Method for MPI Jobs in a Grid System
First Come First Serve-Ejecting Based Dynamic Scheduling (FCFS-EDS) is a method for scheduling MPI jobs based on reservations. The Message Passing Interface (MPI) is an API specification that allows parallel applications to communicate with one another by exchanging messages. FCFS-EDS scheduling falls into the category of flexible advance reservation and is dynamic from a logical point of view. It is therefore difficult to understand and visualize, because it exists only at the logical level. For this reason, a simulation of MPI job scheduling with the FCFS-EDS method was built to help users understand the scheduling process. With this simulation, users can vary the parameters of MPI job scheduling, see a visualization of the schedule from the logical point of view, and learn how an MPI job will eventually be executed.
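As a rough illustration of the first-come-first-served core of the method (the ejecting and reservation logic specific to FCFS-EDS is more involved and omitted here; job names and sizes are made up), a minimal FCFS placement of MPI jobs on a fixed pool of nodes might look like:

```python
import heapq

def fcfs_schedule(jobs, total_nodes):
    """Place jobs in strict arrival order. Each job is
    (name, nodes, duration); a job starts as soon as enough nodes are
    free, never before an earlier-arriving job has been placed."""
    free = total_nodes
    running = []     # min-heap of (finish_time, nodes_to_release)
    now = 0
    schedule = []    # (name, start, end)
    for name, nodes, duration in jobs:
        # advance virtual time until enough nodes have been released
        while free < nodes:
            finish, released = heapq.heappop(running)
            now = max(now, finish)
            free += released
        schedule.append((name, now, now + duration))
        free -= nodes
        heapq.heappush(running, (now + duration, nodes))
    return schedule

# Three jobs on a 6-node pool: j1 and j2 run together, j3 must wait.
print(fcfs_schedule([("j1", 4, 10), ("j2", 2, 5), ("j3", 4, 3)], 6))
```

The simulation described above essentially animates this timeline while letting the user vary the job parameters and watch the reservations shift.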
A survey on Bluetooth multi-hop networks
Bluetooth was first announced in 1998. Originally designed as a cable replacement connecting devices in a point-to-point fashion, its high market penetration aroused interest in its ad-hoc networking potential. This ad-hoc networking potential of Bluetooth has been advertised for years, but until recently no actual products were available and fewer than a handful of real Bluetooth multi-hop network deployments were reported. The turnaround was triggered by the release of the Bluetooth Low Energy Mesh Profile, which is unquestionably a great achievement but not well suited to all use cases of multi-hop networks. This paper surveys the tremendous work done on Bluetooth multi-hop networks during the last 20 years. All aspects are discussed with the demands of real-world Bluetooth multi-hop operation in mind. Relationships and side effects of different topics for a real-world implementation are explained. This unique focus distinguishes this survey from existing ones. Furthermore, to the best of the authors' knowledge, this is the first survey consolidating the work on Bluetooth multi-hop networks for classic Bluetooth technology as well as for Bluetooth Low Energy. Another distinguishing characteristic of this survey is a synopsis of real-world Bluetooth multi-hop network deployment efforts. In fact, there are only four reports of a successful establishment of a Bluetooth multi-hop network with more than 30 nodes, and only one of them was integrated into a real-world application, namely a photovoltaic power plant. © 2019 The Author
Analytical and Empirical Analysis of Countermeasures to Traffic Analysis Attacks
This paper studies countermeasures to traffic analysis attacks. A common strategy for such countermeasures is traffic padding. We consider systems where payload traffic may be padded to have either constant inter-arrival times or variable inter-arrival times for their packets. The adversary applies statistical recognition techniques to detect the payload traffic rates and may use statistical measures, such as sample mean, sample variance, or sample entropy, to perform such a detection. We evaluate quantitatively the ability of the adversary to make a correct detection. We derive closed-form formulas for the detection rate based on analytical models we establish. Extensive experiments were carried out to validate the system performance predicted by the analytical method. Based on the systematic evaluations, we develop design guidelines that allow a manager to properly configure a system in order to minimize the detection rate.
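As a simplified illustration of the adversary's statistical detection (the paper derives closed-form detection rates; the threshold and traffic parameters here are hypothetical stand-ins), a sample-variance test cleanly separates constant inter-arrival padding from bursty payload traffic:

```python
import random
import statistics

def detect_payload(interarrivals, threshold):
    """Classify a flow as carrying payload traffic if the sample
    variance of its packet inter-arrival times exceeds a threshold.
    Sample mean or sample entropy could be used the same way."""
    return statistics.variance(interarrivals) > threshold

random.seed(0)
# Constant-rate padding: every gap exactly 10 ms, so variance is zero.
padded = [0.010] * 200
# Unpadded payload: exponentially distributed gaps with a 10 ms mean.
payload = [random.expovariate(1 / 0.010) for _ in range(200)]

print(detect_payload(padded, 1e-8))   # padding hides the rate
print(detect_payload(payload, 1e-8))  # bursty gaps reveal the payload
```

Variable inter-arrival padding defeats this particular statistic, which is why the paper also considers higher-order measures such as sample entropy.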
Performance Optimization with an Integrated View of Compiler and Application Knowledge
Compiler optimization is a long-standing research field that enhances program performance with a set of rigorous code analyses and transformations. Traditional compiler optimization focuses on general programs or program structures without considering much high-level knowledge of application operations or data structures. In this thesis, we claim that an integrated view of the application and the compiler is helpful to further improve program performance. In particular, we study integrated optimization opportunities for three kinds of applications: irregular tree-based query processing systems such as the B+ tree, security enhancements such as buffer overflow protection, and tensor/matrix-based linear algebra computation. The performance of B+ tree query processing is important for many applications, such as file systems and databases. Latch-free B+ tree query processing is efficient since the queries are processed in batches without locks. To avoid long latency, the batch size cannot be very large. However, modern processors provide opportunities to process larger batches in parallel with acceptable latency. From studying real-world data, we find that there are many redundant and unnecessary queries, especially when the real-world data is highly skewed. We develop a query sequence transformation framework, Qtrans, to reduce the redundancies in queries by applying classic dataflow analysis to queries. To further confirm its effectiveness, we integrate Qtrans into an existing BSP-based B+ tree query processing system, PALM tree. The evaluations show that the throughput can be improved by up to 16X. Heap overflows are still the most common vulnerabilities in C/C++ programs. Common approaches incur high overhead since they check every memory access. By analyzing dozens of bugs, we find that all heap overflows are related to arrays, so we only need to check array-related memory accesses. We propose Prober to efficiently detect and prevent heap overflows.
Prober contains Prober-Static, which identifies array-related allocations, and Prober-Dynamic, which protects objects at runtime. In this thesis, our contributions lie on the Prober-Static side. The key challenge is to correctly identify the array-related allocations. We propose a hybrid method: some objects can be identified as array-related (or not) by static analysis; for the remaining ones, we instrument the basic allocation type size statically and then determine the real allocation size at runtime. The evaluations show that Prober-Static is effective. Tensor algebra is widely used in many applications, such as machine learning and data analytics. Tensors representing real-world data are usually large and sparse. There are many sparse tensor storage formats, and the kernels differ across formats. These differing kernels make performance optimization for sparse tensor algebra challenging. We propose a tensor algebra domain-specific language and a compiler, called SPACe, that automatically generates kernels for sparse tensor algebra computations. This compiler supports a wide range of sparse tensor formats. To further improve performance, we integrate data reordering into SPACe to improve data locality. The evaluations show that the code generated by SPACe outperforms state-of-the-art sparse tensor algebra compilers.
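A toy sketch of the kind of redundancy a dataflow view exposes in a query batch (Qtrans itself applies full compiler-style dataflow analysis; the operation names and rules here are illustrative only): a repeated GET with no intervening PUT returns the same value, and a PUT overwritten by a later PUT before any read is a dead store.

```python
def dedupe_batch(batch):
    """Drop redundant queries from a batch before B+ tree processing.
    Queries are (op, key) or (op, key, value) tuples."""
    keep = [True] * len(batch)
    read_keys = set()    # keys already read since the last PUT to them
    pending_put = {}     # key -> index of the latest PUT not yet read
    for i, q in enumerate(batch):
        op, key = q[0], q[1]
        if op == "GET":
            if key in read_keys:
                keep[i] = False              # duplicate read: same value
            read_keys.add(key)
            pending_put.pop(key, None)       # this PUT's value was observed
        elif op == "PUT":
            if key in pending_put:
                keep[pending_put[key]] = False  # dead store: never read
            pending_put[key] = i
            read_keys.discard(key)           # later GETs see the new value
    return [q for q, k in zip(batch, keep) if k]

batch = [("GET", 5), ("GET", 5), ("PUT", 5, "x"), ("PUT", 5, "y"), ("GET", 5)]
print(dedupe_batch(batch))  # the duplicate GET and dead PUT are removed
```

On highly skewed key distributions, where the abstract notes most redundancy arises, such collapsing shrinks the batch substantially before the tree is ever touched.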
Private and censorship-resistant communication over public networks
Society’s increasing reliance on digital communication networks is creating unprecedented opportunities for wholesale surveillance and censorship. This thesis investigates the use of public networks such as the Internet to build robust, private communication systems that can resist monitoring and attacks by powerful adversaries such as national governments.
We sketch the design of a censorship-resistant communication system based on peer-to-peer Internet overlays in which the participants only communicate directly with people they know and trust. This ‘friend-to-friend’ approach protects the participants’ privacy, but it also presents two significant challenges. The first is that, as with any peer-to-peer overlay, the users of the system must collectively provide the resources necessary for its operation; some users might prefer to use the system without contributing resources equal to those they consume, and if many users do so, the system may not be able to survive.
To address this challenge we present a new game-theoretic model of the problem of encouraging cooperation between selfish actors under conditions of scarcity, and develop a strategy for the game that provides rational incentives for cooperation under a wide range of conditions.
The second challenge is that the structure of a friend-to-friend overlay may reveal the users’ social relationships to an adversary monitoring the underlying network. To conceal their sensitive relationships from the adversary, the users must be able to communicate indirectly across the overlay in a way that resists monitoring and attacks by other participants.
We address this second challenge by developing two new routing protocols that robustly deliver messages across networks with unknown topologies, without revealing the identities of the communication endpoints to intermediate nodes or vice versa. The protocols make use of a novel unforgeable acknowledgement mechanism that proves that a message has been delivered without identifying the source or destination of the message or the path by which it was delivered. One of the routing protocols is shown to be robust to attacks by malicious participants, while the other provides rational incentives for selfish participants to cooperate in forwarding messages.
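One classical way to realize an unforgeable, endpoint-hiding acknowledgement, shown here purely as an illustrative sketch and not necessarily the thesis's actual construction, is a hash commitment whose preimage only the true destination can reveal:

```python
import hashlib
import os

def commit(secret: bytes) -> bytes:
    """Hash commitment to an acknowledgement secret."""
    return hashlib.sha256(secret).digest()

def verify_ack(tag: bytes, revealed: bytes) -> bool:
    """Any forwarding node can check that delivery occurred without
    learning who the endpoints are: producing the preimage of the tag
    requires knowing the secret, which only the destination holds."""
    return hashlib.sha256(revealed).digest() == tag

# The destination picks a random secret and the source learns only its
# hash (e.g. via a prior exchange); the message travels with the tag.
secret = os.urandom(32)
ack_tag = commit(secret)

# On delivery the destination releases the secret, which flows back
# along the forwarding path as the acknowledgement.
print(verify_ack(ack_tag, secret))          # a genuine ACK verifies
print(verify_ack(ack_tag, os.urandom(32)))  # a forged ACK does not
```

The key property is that the tag names neither the source, the destination, nor the path, yet no intermediate node can fabricate a valid acknowledgement on its own.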
Decentralized link analysis in peer-to-peer web search networks
Analyzing the authority or reputation of entities that are connected by a graph structure and ranking these entities is an important issue that arises in the Web, in Web 2.0 communities, and in other applications. The problem is typically addressed by computing the dominant eigenvector of a matrix that is suitably derived from the underlying graph, or by performing a full spectral decomposition of the matrix. Although such analyses could be performed by a centralized server, there are good reasons, such as scalability, privacy, and censorship, to run these computations in a decentralized manner across many peers. There exist a number of approaches for speeding up the analysis by partitioning the graph into disjoint fragments. However, such methods are not suitable for a peer-to-peer network, where overlap among the fragments might occur. In addition, peer-to-peer approaches need to consider network characteristics, such as peers being unaware of other peers' content, susceptibility to malicious attacks, and network dynamics (so-called churn). In this thesis we make the following major contributions. We present JXP, a decentralized algorithm for computing authority scores of entities distributed in a peer-to-peer (P2P) network that allows peers to have overlapping content and requires no a priori knowledge of other peers' content. We also show the benefits of JXP in the Minerva distributed Web search engine. We present an extension of JXP, coined TrustJXP, that contains a reputation model in order to deal with misbehaving peers. We present another extension of JXP that handles the dynamics of peer-to-peer networks, as well as an algorithm for estimating the current number of entities in the network. This thesis also presents novel methods for embedding JXP in peer-to-peer networks and applications.
We present an approach for creating links among peers, forming semantic overlay networks, where peers are free to decide which connections they create and which they want to avoid, based on various usefulness estimators. We show how peer-to-peer applications, like the JXP algorithm, can greatly benefit from these additional semantic relations.
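The dominant-eigenvector computation at the heart of such authority ranking can be sketched, in its centralized form, as plain power iteration over a column-stochastic link matrix (toy graph; JXP approximates this across peers holding overlapping graph fragments):

```python
def power_iteration(M, iters=100):
    """Dominant eigenvector of a column-stochastic matrix by repeated
    multiplication and renormalization; the fixed point is the
    stationary authority-score distribution."""
    n = len(M)
    v = [1.0 / n] * n                 # start from the uniform vector
    for _ in range(iters):
        w = [sum(M[i][j] * v[j] for j in range(n)) for i in range(n)]
        s = sum(w)
        v = [x / s for x in w]        # keep the scores summing to 1
    return v

# Toy link graph of three pages: column j spreads page j's authority
# over the pages it links to (0 -> 1 and 2, 1 -> 2, 2 -> 0).
M = [
    [0.0, 0.0, 1.0],
    [0.5, 0.0, 0.0],
    [0.5, 1.0, 0.0],
]
print(power_iteration(M))  # pages 0 and 2 earn the highest scores
```

Partitioning schemes speed this up by iterating within disjoint fragments; the abstract's point is that peer-to-peer fragments overlap and are built asynchronously, which is exactly the setting JXP is designed for.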