129 research outputs found

    ENHANCING THE PERFORMANCE AND SECURITY OF ANONYMOUS COMMUNICATION NETWORKS

    With the increasing importance of the Internet in our daily lives, the private information of millions of users is exposed to growing security risks. Users' data are collected for commercial purposes and sold by service providers to marketers, for political purposes and used by governments to track people, or even for personal purposes by hackers. Protecting online users' privacy has become a more pressing matter over the years, and anonymous communication networks were developed to serve this purpose. Tor is one of the most widely used anonymity networks online; it consists of thousands of routers run by volunteers. Tor preserves the anonymity of its users by relaying their traffic through a number of routers (called onion routers) forming a circuit. Tor was mainly developed as a low-latency network to support interactive applications such as web browsing and messaging. However, deficiencies in the original design of Tor's network degrade performance to the point that interactive applications cannot tolerate it.

    In this thesis, we address a number of the performance-limiting issues in the design of Tor's network. Several studies have proposed changes to the transport design to eliminate the effect of these problems and improve the performance of Tor's network. In our work, we propose "QuicTor," an improvement to the transport layer of Tor's network that uses Google's protocol QUIC instead of TCP. QUIC was mainly developed to eliminate the latency that TCP introduces through handshaking delays and the head-of-line blocking problem. We provide an empirical evaluation of our proposed design and compare it to two other proposed designs, IMUX and PCTCP. We show that QuicTor significantly enhances the performance of Tor's network.

    Furthermore, a considerable percentage of Tor traffic is consumed by bandwidth-hungry applications such as BitTorrent. This results in an unfair allocation of the available bandwidth and a significant degradation in the quality of service (QoS) delivered to users. We therefore present a QoS-aware deep reinforcement learning approach for Tor's circuit scheduling (QDRL). We propose a design that coalesces the two scheduling levels originally present in Tor and addresses them as a single resource-allocation problem, using the QoS requirements of different applications to set the weights of active circuits passing through a relay. We also propose a set of approaches to achieve the optimal trade-off between system fairness and efficiency: a reinforcement-learning-based scheduling approach (TRLS), a convex-optimization-based scheduling approach (CVX-OPT), and an average-rate-based proportionally fair heuristic (AR-PF). We compare these approaches with basic heuristics and with the scheduler implemented in Tor, and show that TRLS achieves the highest QoS-aware fairness level with resilient performance in an environment as dynamic as the Tor network.
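    To make the circuit-scheduling idea concrete, the following is a minimal Python sketch, not the thesis's implementation, of a QoS-weighted, average-rate-based proportionally fair scheduler in the spirit of AR-PF: each service decision picks the pending circuit with the best weight-to-average-rate ratio, and per-circuit average rates are tracked with an exponentially weighted moving average. The circuit names, weights, cell size, and smoothing factor are hypothetical values chosen for illustration.

    # Illustrative sketch (not the thesis's code): a QoS-weighted,
    # average-rate-based proportionally fair circuit scheduler.
    from dataclasses import dataclass

    CELL_BYTES = 512          # Tor-like fixed cell size (assumed)
    EWMA_ALPHA = 0.1          # smoothing factor for the average rate (assumed)

    @dataclass
    class Circuit:
        name: str
        qos_weight: float          # higher weight = more latency-sensitive class
        avg_rate: float = 1e-6     # EWMA of served bytes; avoids division by zero
        pending: int = 0           # queued cells

    def pick_next(circuits):
        """Serve the circuit with the best weight-to-average-rate ratio."""
        ready = [c for c in circuits if c.pending > 0]
        if not ready:
            return None
        return max(ready, key=lambda c: c.qos_weight / c.avg_rate)

    def serve(circuits):
        c = pick_next(circuits)
        if c is None:
            return
        c.pending -= 1
        for other in circuits:
            served = CELL_BYTES if other is c else 0
            other.avg_rate = (1 - EWMA_ALPHA) * other.avg_rate + EWMA_ALPHA * served

    # Example: an interactive (web) circuit competing with a bulk circuit.
    circuits = [Circuit("web", qos_weight=4.0, pending=10),
                Circuit("bulk", qos_weight=1.0, pending=100)]
    for _ in range(20):
        serve(circuits)
    print({c.name: round(c.avg_rate, 1) for c in circuits})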

    BOOM: Broadcast Optimizations for On-chip Meshes

    Future many-core chips will require an on-chip network that can support broadcasts and multicasts at good power-performance. A vanilla on-chip network would send multiple unicast packets for each broadcast packet, resulting in latency, throughput, and power overheads. Recent research in on-chip multicast support has proposed forking broadcast/multicast packets within the network at the router buffers, but these techniques are far from ideal, since they increase buffer occupancy, which lowers throughput, and packets incur delay and power penalties at each router. In this work, we analyze an ideal broadcast mesh; show the substantial gaps between state-of-the-art multicast NoCs and the ideal; and then propose BOOM, which comprises a WHIRL routing protocol that ideally load-balances broadcast traffic, an mXbar multicast crossbar circuit that enables multicast traversal at energy-delay similar to that of unicasts, and speculative bypassing of buffering for multicast flits. Together, they enable broadcast packets to approach the delay, energy, and throughput of the ideal fabric. Our simulations show BOOM realizing an average network latency that is 5% off ideal, attaining 96% of ideal throughput, with energy consumption that is 9% above ideal. Evaluations using synthetic traffic show BOOM achieving a latency reduction of 61%, a throughput improvement of 63%, and a buffer power reduction of 80% compared to a baseline broadcast. Simulations with PARSEC benchmarks show BOOM reducing average request and network latency by 40% and 15% respectively.
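    As a back-of-the-envelope illustration of why forking broadcasts inside the network pays off (not a figure from the paper), the sketch below counts link traversals for one broadcast from a corner of a k x k mesh: a vanilla NoC issues one XY-routed unicast per destination, while an in-network forking scheme needs only one traversal per link of a spanning tree.

    # Hypothetical model, not from the paper: link traversals needed to
    # broadcast one flit from node (0, 0) of a k x k mesh.
    def multi_unicast_traversals(k):
        # XY-routed unicast to every other node: hops = Manhattan distance.
        return sum(x + y for x in range(k) for y in range(k) if (x, y) != (0, 0))

    def forked_broadcast_traversals(k):
        # A spanning tree of the mesh reaches each of the other N - 1 nodes once.
        return k * k - 1

    for k in (4, 8, 16):
        mu, fb = multi_unicast_traversals(k), forked_broadcast_traversals(k)
        print(f"{k}x{k} mesh: {mu} unicast traversals vs {fb} forked ({mu / fb:.1f}x)")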

    Design and implementation of in-network coherence

    Thesis (S.M.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2013. Includes bibliographical references (p. 101-104). By Suvinay Subramanian.

    CMOS technology scaling has enabled increasing transistor density on chip. At the same time, multi-core processors that provide increased performance vis-à-vis power efficiency have become prevalent in a power-constrained environment. The shared memory model is a predominant paradigm in such systems, easing programmability and increasing portability. However, with memory being shared by an increasing number of cores, a scalable coherence mechanism is imperative for these systems. Snoopy coherence has been a favored coherence scheme owing to its high performance and simplicity. However, there are few viable proposals to extend snoopy coherence to unordered interconnects - specifically, the modular packet-switched interconnects that have emerged as a scalable solution to the communication challenges of the CMP era. This thesis proposes a distributed in-network global ordering scheme that enables snoopy coherence on unordered interconnects. The proposed scheme is realized on a two-dimensional mesh interconnection network, referred to as OMNI (Ordered Mesh Network Interconnect). OMNI is an enabling solution for the SCORPIO processor prototype developed at MIT - a 36-core chip multiprocessor supporting snoopy coherence, fabricated in a commercial 45 nm technology. OMNI is shown to be effective, reducing runtime by 36% in comparison to directory and Hammer coherence protocol implementations. The OMNI network achieves an operating frequency of 833 MHz post-layout, occupies 10% of the chip area, and consumes less than 100 mW of power.

    Multipath Routing on Anonymous Communication Systems: Enhancing Privacy and Performance

    We live in an era where mass surveillance and online tracking of civilians and organizations have reached alarming levels. This has resulted in more and more users relying on anonymous communication tools for their daily online activities. Nowadays, Tor is the most popular and widely deployed anonymization network, serving millions of daily users around the world. Tor promises to hide the identity of users (i.e., their IP addresses) and to prevent external agents from disclosing relationships between the communicating parties. However, the benefit of privacy protection comes at the cost of severe performance loss. This performance loss degrades the user experience to such an extent that many users do not use anonymization networks and forgo the privacy protection they offer. On the other hand, the popularity of Tor has captured the attention of attackers wishing to deanonymize its users. In response, this dissertation presents a set of multipath routing techniques, at both the transport and the circuit level, to improve the privacy and performance offered to Tor users. To this end, we first present a comprehensive taxonomy to identify the implications of integrating multipath routing into each design aspect of Tor. Then, we present a novel transport design that addresses the existing performance unfairness of Tor traffic.

    In Tor, traffic from multiple users is multiplexed in a single TCP connection between two relays. While this has positive effects on privacy, it negatively influences performance and leads to unfairness: TCP congestion control gives the single connection carrying all the multiplexed Tor traffic no more of the available bandwidth than it gives each individual TCP connection competing for the same resource. To counter this, we propose using multipath TCP (MPTCP) to allow for better resource utilization, which, in turn, increases the throughput of Tor traffic to a fairer extent. Our evaluation in real-world settings shows that using out-of-the-box MPTCP leads to a 15% performance gain. We analyze the privacy implications of MPTCP in Tor settings and discuss potential threats and mitigation strategies.

    Regarding privacy, a malicious Tor entry node can mount website fingerprinting (WFP) attacks to disclose the identities of Tor users by observing only the patterns of their data flows. In response, we propose splitting traffic over multiple entry nodes to limit the observable patterns an adversary has access to, as sketched below. We demonstrate that our sophisticated splitting strategy reduces the accuracy of all state-of-the-art WFP attacks from more than 98% to less than 16% without adding any artificial delays or dummy traffic. Additionally, we show that this defense, initially designed against WFP, can also be used to mitigate end-to-end correlation attacks.

    The contributions presented in this thesis are orthogonal to each other, and their synergy yields a system that is stronger in terms of both privacy and performance. This results in a more attractive anonymization network for new and existing users, which, in turn, increases the security of all users by enlarging the anonymity set.
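    The following is a minimal sketch of the general splitting idea, not the dissertation's actual strategy: consecutive cells of a stream are distributed over several entry circuits by weighted random choice, so no single entry guard observes the complete traffic pattern. The guard names, weights, and circuit count are hypothetical.

    # Minimal sketch of traffic splitting over multiple entry circuits.
    import random

    def split_cells(num_cells, circuit_weights, seed=None):
        """Return, per entry circuit, the list of cell indices it carries."""
        rng = random.Random(seed)
        circuits = list(circuit_weights)
        weights = list(circuit_weights.values())
        assignment = {c: [] for c in circuits}
        for cell_idx in range(num_cells):
            chosen = rng.choices(circuits, weights=weights, k=1)[0]
            assignment[chosen].append(cell_idx)
        return assignment

    # Example: three entry circuits with skewed (assumed) weights.
    parts = split_cells(20, {"guard-A": 0.5, "guard-B": 0.3, "guard-C": 0.2}, seed=1)
    for guard, cells in parts.items():
        print(guard, cells)   # each guard sees only a fragment of the pattern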

    Supporting Sequential Consistency through Ordered Network in Many-Core Systems

    University of Minnesota M.S.E.E. thesis. December 2017. Major: Computer Engineering. Advisor: Antonia Zhai. 1 computer file (PDF); vi, 48 pages.

    Recently, two trends have emerged in parallel computing. On one hand, emerging workloads exhibit significant data-level parallelism; on the other hand, modern processors are increasing in core count to satisfy the growing demand for processing power under stringent power and thermal constraints. Hence, multi-core and many-core systems have become ubiquitous. To facilitate software development on such processors, it is desirable to efficiently support an intuitive memory consistency model, such as sequential consistency. In this work, we demonstrate the feasibility of supporting the sequential memory consistency model on many-core systems. Our experiments show that in many-core systems where in-order cores with no private caches and shared memory modules are connected by a 2D-mesh network that supports circuit switching, we can efficiently support sequential consistency by ordering memory requests in the network. Memory requests are ordered by time-stamping each request and circulating a token among the memory modules. Furthermore, we extend the mechanism for ordering memory traffic in the network to speed up the performance of critical sections. We evaluated the proposed techniques on three different many-core systems containing 8, 20, and 32 cores respectively. Compared to conventional systems where sequential consistency is supported by serializing memory requests at the cores through fences, the proposed systems outperform the conventional systems by 4.95%, 5.74%, and 9.70% respectively.
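    A simplified software model of the ordering mechanism described above (the thesis realizes it in hardware) is sketched below: every memory request receives a timestamp at injection, and requests are released in global timestamp order, standing in for the circulating token by which the memory modules agree on a single order. The class and method names are invented for the example.

    # Simplified model, not the thesis's hardware: timestamp-ordered
    # memory requests distributed over several memory modules.
    import heapq
    from itertools import count

    class OrderedMemorySystem:
        def __init__(self, num_modules):
            self.queues = [[] for _ in range(num_modules)]  # per-module min-heaps
            self.clock = count()                            # global timestamp source

        def inject(self, core_id, module_id, op):
            ts = next(self.clock)                 # timestamp assigned at injection
            heapq.heappush(self.queues[module_id], (ts, core_id, op))
            return ts

        def drain_in_order(self):
            """Token pass: repeatedly retire the globally oldest pending request."""
            order = []
            while any(self.queues):
                # The token settles on the module holding the oldest timestamp.
                module = min((m for m in range(len(self.queues)) if self.queues[m]),
                             key=lambda m: self.queues[m][0][0])
                order.append(heapq.heappop(self.queues[module]))
            return order

    mem = OrderedMemorySystem(num_modules=2)
    mem.inject(core_id=0, module_id=1, op="ST x")
    mem.inject(core_id=1, module_id=0, op="LD x")
    mem.inject(core_id=0, module_id=0, op="LD y")
    print(mem.drain_in_order())   # requests retire in one global timestamp order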

    Revisiting Resource Utilization in The Internet: Architectural Considerations and Challenges

    The Internet has been a success story for many years. Recently, researchers have started to address new questions that challenge the effectiveness of the Internet architecture in the face of new demands, e.g., overwhelming traffic growth and latency optimizations. Various proposals, ranging from new application-level protocols to new network stacks, are emerging to help the Internet keep up with the demand. In this dissertation we look at a few different proposals that deal with improving speed and resource utilization in the Internet. We first discuss improving resource utilization in the current Internet through minor changes, such as adjusting various parameters in TCP. We then discuss a more radical form of resource utilization that combines the network and the available storage. Combining these two resources, which have traditionally been considered separate, could provide many new opportunities for speed improvement. We discuss relaxing the barrier between storage and the network in the context of Information Centric Networking (ICN), which is itself an alternative proposal to the current TCP/IP-style Internet. With the help of ICN, we propose different forms of in-network caching below the application layer. We argue that, although useful, these new models of utilizing network resources bring their own challenges. In particular, we discuss the resource management and privacy challenges introduced by ICN in general and by our proposed solutions in particular. The lack of end-host bindings and the existence of network-routable data names in different data chunks make congestion control, reliability, and privacy in ICN rather different from TCP/IP. We discuss some of these differences and propose solutions that help address each issue in our particular ICN-based mechanisms.
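    As a concrete illustration of in-network caching below the application layer, here is a minimal sketch, assuming nothing about the dissertation's concrete design: a per-router content store keyed by content name with LRU eviction, so a named chunk cached on the path can satisfy later requests without reaching the origin. The content names and capacity are hypothetical.

    # Minimal sketch of an ICN-style router content store with LRU eviction.
    from collections import OrderedDict

    class ContentStore:
        def __init__(self, capacity=4):
            self.capacity = capacity
            self.store = OrderedDict()          # content name -> data chunk

        def lookup(self, name):
            """Interest arrives: serve from cache if the named chunk is here."""
            if name in self.store:
                self.store.move_to_end(name)    # refresh LRU position
                return self.store[name]
            return None                          # miss: forward the request upstream

        def insert(self, name, chunk):
            """Data flows back: cache it so later requests are served locally."""
            self.store[name] = chunk
            self.store.move_to_end(name)
            if len(self.store) > self.capacity:
                self.store.popitem(last=False)   # evict least recently used

    cs = ContentStore(capacity=2)
    cs.insert("/video/seg1", b"...")
    cs.insert("/video/seg2", b"...")
    print(cs.lookup("/video/seg1") is not None)   # True: served in-network
    cs.insert("/video/seg3", b"...")              # evicts /video/seg2
    print(cs.lookup("/video/seg2"))               # None: must go upstream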

    Improving Tor using a TCP-over-DTLS Tunnel

    The Tor network gives anonymity to Internet users by relaying their traffic around the world through a variety of routers. This incurs latency, and this thesis first explores where this latency occurs. Experiments discount the latency induced by routing traffic and the computational latency, determining that there is a substantial component caused by delay in the communication path. We determine that congestion control is causing the delay. Tor multiplexes multiple streams of data over a single TCP connection. This is not a wise use of TCP, and it results in the unfair application of congestion control. We illustrate an example of this occurrence on a Tor node on the live network and also show how packet dropping and reordering cause interference between the multiplexed streams. Our solution is to use a TCP-over-DTLS (Datagram Transport Layer Security) transport between routers and to give each stream of data its own TCP connection. We give the design of our proposal and details of its implementation. Finally, we perform experiments on our implementation to show that our proposal has in fact resolved the multiplexing issues discovered in our system performance analysis. The future work outlines a number of steps towards optimizing and improving our work, along with some tangential ideas that were discovered during research. Additionally, the open-source software projects latency_proxy and libspe, which were designed for our purposes but programmed for universal applicability, are discussed.
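    A small worked example (with hypothetical numbers) of the fairness problem described above: TCP congestion control shares a bottleneck roughly per connection, so the single Tor connection multiplexing many circuits receives the same share as any other connection, and each circuit gets only a fraction of it; giving each circuit its own connection restores a per-circuit share.

    # Rough fairness arithmetic; bandwidth and counts are assumed values.
    def per_circuit_share(bottleneck_mbps, other_connections, tor_circuits,
                          per_circuit_connections):
        if per_circuit_connections:
            total_conns = other_connections + tor_circuits
            return bottleneck_mbps / total_conns            # one share per circuit
        total_conns = other_connections + 1                 # all circuits share one
        return bottleneck_mbps / total_conns / tor_circuits

    print(per_circuit_share(100, other_connections=9, tor_circuits=10,
                            per_circuit_connections=False))  # 1.0 Mbps per circuit
    print(per_circuit_share(100, other_connections=9, tor_circuits=10,
                            per_circuit_connections=True))   # ~5.3 Mbps per circuit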

    Cooperative high-performance computing with FPGAs - matrix multiply case-study

    In high-performance computing, there is great opportunity for systems that use FPGAs to handle communication while also performing computation on data in transit in an "altruistic" manner, that is, using resources for computation that might otherwise be used for communication, and in a way that improves overall system performance and efficiency. We provide a specific definition of Computing in the Network that captures this opportunity. We then outline some overall requirements and guidelines for cooperative computing that include this ability, and make suggestions for specific computing capabilities to be added to the networking hardware in a system. We then explore some algorithms running on a network so equipped for a few specific computing tasks: dense matrix multiplication, sparse matrix transposition, and sparse matrix multiplication. In the first instance we give limits on problem size and estimates of the performance that should be attainable with present-day FPGA hardware.
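    For the dense case, the following is a minimal sketch under assumed conditions, not the paper's FPGA design: each hypothetical in-network compute node holds one block-row of B and accumulates partial products for the blocks of A streaming past it, so the multiplication happens on data in transit. Matrix sizes and the block size are chosen only for the example.

    # Illustrative blocked matrix multiply distributed over hypothetical
    # in-network compute nodes (software stand-in for FPGA logic).
    import numpy as np

    def in_transit_matmul(A, B, block=2):
        n = A.shape[0]
        C = np.zeros((n, B.shape[1]))
        # One hypothetical network node per block-row of B.
        node_blocks = [B[k:k + block, :] for k in range(0, n, block)]
        for i in range(0, n, block):            # blocks of A stream through
            for node_idx, Bk in enumerate(node_blocks):
                k = node_idx * block
                # Node accumulates its partial product as the A block passes by.
                C[i:i + block, :] += A[i:i + block, k:k + block] @ Bk
        return C

    rng = np.random.default_rng(0)
    A, B = rng.random((4, 4)), rng.random((4, 4))
    assert np.allclose(in_transit_matmul(A, B), A @ B)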