Fast-Convergence Microsecond-Accurate Clock Discipline Algorithm for Hardware Implementation
Discrete microprocessor-based equipment is a typical synchronization
system on the market which implements the most critical
features of the synchronization protocols in hardware and the synchronization
algorithms in software. In this paper, a new clock discipline
algorithm for hardware implementation is presented, allowing for full
hardware implementation of synchronization systems. Measurements on
field-programmable gate array prototypes show a fast convergence time
(below 10 s) and a high accuracy (1 μs) for typical configuration
parameters.
Ministerio de Educación y Cultura HIPER TEC2007-61802/MI
A Software-based Low-Jitter Servo Clock for Inexpensive Phasor Measurement Units
This paper presents the design and the implementation of a servo-clock (SC)
for low-cost Phasor Measurement Units (PMUs). The SC relies on a classic
Proportional Integral (PI) controller, which has been properly tuned to
minimize the synchronization error due to the local oscillator triggering the
on-board timer. The SC has been implemented into a PMU prototype developed
within the OpenPMU project using a BeagleBone Black (BBB) board. The
distinctive feature of the proposed solution is its ability to track an input
Pulse-Per-Second (PPS) reference with good long-term stability and with no need
for specific on-board synchronization circuitry. Indeed, the SC implementation
relies only on one co-processor for real-time application and requires just an
input PPS signal that could be distributed from a single substation clock
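To make the PI-based servo idea in the abstract above concrete, the following is a minimal simulation sketch, not the authors' implementation. The gains `kp` and `ki`, the 50 ppm oscillator drift, and the names `PIServo` and `simulate` are all hypothetical choices made for illustration; the abstract does not state the tuning actually used.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PIServo:
    """Toy PI controller that steers a local clock toward a PPS reference."""
    kp: float = 0.7        # proportional gain (hypothetical value)
    ki: float = 0.3        # integral gain (hypothetical value)
    integral: float = 0.0  # accumulated integral term

    def update(self, offset_s: float) -> float:
        """Return a frequency correction (seconds per second) for a measured offset."""
        self.integral += self.ki * offset_s
        return self.kp * offset_s + self.integral


def simulate(drift_ppm: float = 50.0, steps: int = 60) -> List[float]:
    """Simulate one PPS edge per second and return the offset seen at each edge."""
    servo = PIServo()
    drift = drift_ppm * 1e-6   # uncorrected oscillator error, seconds per second
    offset = 0.0               # local time minus reference time, in seconds
    correction = 0.0
    history = []
    for _ in range(steps):
        # Between two PPS edges the local clock gains (drift - correction) seconds.
        offset += drift - correction
        history.append(offset)
        # At the PPS edge, measure the offset and compute the next correction.
        correction = servo.update(offset)
    return history
```

In this toy model the integral term ends up absorbing the constant drift, so the offset decays toward zero; a real servo would also have to cope with timestamp jitter and quantization of the timer.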
Fastpass: A Centralized “Zero-Queue” Datacenter Network
An ideal datacenter network should provide several properties, including low median and tail latency, high utilization (throughput), fair allocation of network resources between users or applications, deadline-aware scheduling, and congestion (loss) avoidance. Current datacenter networks inherit the principles that went into the design of the Internet, where packet transmission and path selection decisions are distributed among the endpoints and routers. Instead, we propose that each sender should delegate to a centralized arbiter the control of when each packet should be transmitted and what path it should follow. This paper describes Fastpass, a datacenter network architecture built using this principle. Fastpass incorporates two fast algorithms: the first determines the time at which each packet should be transmitted, while the second determines the path to use for that packet. In addition, Fastpass uses an efficient protocol between the endpoints and the arbiter and an arbiter replication strategy for fault-tolerant failover. We deployed and evaluated Fastpass in a portion of Facebook’s datacenter network. Our results show that Fastpass achieves high throughput comparable to current networks at a 240× reduction in queue lengths (4.35 Mbytes reducing to 18 Kbytes), achieves much fairer and more consistent flow throughputs than the baseline TCP (5200× reduction in the standard deviation of per-flow throughput with five concurrent connections), scalability from 1 to 8 cores in the arbiter implementation with the ability to schedule 2.21 Terabits/s of traffic in software on eight cores, and a 2.5× reduction in the number of TCP retransmissions in a latency-sensitive service at Facebook.
National Science Foundation (U.S.) (grant IIS-1065219)
Irwin Mark Jacobs and Joan Klein Jacobs Presidential Fellowship
Hertz Foundation (Fellowship
A Beaconless Asymmetric Energy-Efficient Time Synchronization Scheme for Resource-Constrained Multi-Hop Wireless Sensor Networks
The ever-increasing number of WSN deployments based on a large number of
battery-powered, low-cost sensor nodes, which are limited in their computing
and power resources, puts the focus of WSN time synchronization research on
three major aspects, i.e., accuracy, energy consumption and computational
complexity. In the literature, the latter two aspects have not received much
attention compared to the accuracy of WSN time synchronization. Especially in
multi-hop WSNs, intermediate gateway nodes are overloaded with tasks for not
only relaying messages but also a variety of computations for their offspring
nodes as well as themselves. Therefore, not only minimizing the energy
consumption but also lowering the computational complexity while maintaining
the synchronization accuracy is crucial to the design of time synchronization
schemes for resource-constrained sensor nodes. In this paper, focusing on the
three aspects of WSN time synchronization, we introduce a framework of reverse
asymmetric time synchronization for resource-constrained multi-hop WSNs and
propose a beaconless energy-efficient time synchronization scheme based on
reverse one-way message dissemination. Experimental results with a WSN testbed
based on TelosB motes running TinyOS demonstrate that the proposed scheme
reduces energy consumption by up to 95% compared to the flooding time
synchronization protocol while achieving microsecond-level synchronization
accuracy.
Comment: 12 pages, 16 figures
International Journal of Electronics & Communication Technology
Abstract: Neural networks are a new method of programming computers. They are exceptionally good at performing pattern recognition and other tasks that are very difficult to program using conventional techniques. Programs that employ neural nets are also capable of learning on their own and adapting to changing conditions. Neural nets may be the future of computing. A good way to understand them is with a puzzle that neural nets can be used to solve. Suppose that you are given 500 characters of code that you know to be C, C++, Java, or Python. Now, construct a program that identifies the code's language. One solution is to construct a neural net that learns to identify these languages. According to a simplified account, the human brain consists of about ten billion neurons, and a neuron is, on average, connected to several thousand other neurons. By way of these connections, neurons both send and receive varying quantities of energy. One very important feature of neurons is that they don't react immediately to the reception of energy. Instead, they sum their received energies, and they send their own quantities of energy to other neurons only when this sum has reached a certain critical threshold. The brain learns by adjusting the number and strength of these connections. The brain's network of neurons forms a massively parallel information processing system. This contrasts with conventional computers, in which a single processor executes a single series of instructions
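The sum-then-fire behaviour described in the abstract above can be sketched in a few lines. This is an illustrative toy only: the class name `ThresholdNeuron`, the threshold value, and the choice to reset the accumulated sum after firing are assumptions that do not come from the abstract.

```python
class ThresholdNeuron:
    """Toy neuron that accumulates incoming energy and fires at a threshold."""

    def __init__(self, threshold: float) -> None:
        self.threshold = threshold
        self.accumulated = 0.0

    def receive(self, energy: float) -> bool:
        """Add incoming energy; return True if the neuron fires on this input."""
        self.accumulated += energy
        if self.accumulated >= self.threshold:
            # Resetting after firing is an assumption made for this sketch.
            self.accumulated = 0.0
            return True
        return False


neuron = ThresholdNeuron(threshold=1.0)
fired = [neuron.receive(e) for e in (0.4, 0.4, 0.4, 0.1)]
```

Here the neuron stays silent for the first two inputs, fires on the third once the running sum passes the threshold, and then starts accumulating again.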
Datacenter Architectures for the Microservices Era
Modern internet services are shifting away from single-binary, monolithic services into numerous loosely-coupled microservices that interact via Remote Procedure Calls (RPCs), to improve programmability, reliability, manageability, and scalability of cloud services.
Computer system designers are faced with many new challenges with microservice-based architectures, as individual RPCs/tasks are only a few microseconds in most microservices. In this dissertation, I seek to address the most notable challenges that arise from the differences between modern microservice-based and classic monolithic cloud services, and design novel server architectures and runtime systems that enable efficient execution of µs-scale microservices on modern hardware.
In the first part of my dissertation, I seek to address the problem of Killer Microseconds, which refers to µs-scale “holes” in CPU schedules caused by stalls to access fast I/O devices or brief idle times between requests in high throughput µs-scale microservices. Whereas modern computing platforms can efficiently hide ns-scale and ms-scale stalls through micro-architectural techniques and OS context switching, they lack efficient support to hide the latency of µs-scale stalls. In chapter II, I propose Duplexity, a heterogeneous server architecture that employs aggressive multithreading to hide the latency of killer microseconds, without sacrificing the Quality-of-Service (QoS) of latency-sensitive microservices. Duplexity is able to achieve 1.9× higher core utilization and 2.7× lower iso-throughput 99th-percentile tail latency over an SMT-based server design, on average.
In chapters III-IV, I comprehensively investigate the problem of tail latency in the context of microservices and address multiple aspects of it. First, in chapter III, I characterize the tail latency behavior of microservices and provide general guidelines for optimizing computer systems from a queuing perspective to minimize tail latency. Queuing is a major contributor to end-to-end tail latency, wherein nominal tasks are enqueued behind rare, long ones, due to Head-of-Line (HoL) blocking. Next, in chapter IV, I introduce Q-Zilla,
a scheduling framework to tackle tail latency from a queuing perspective, and CoreZilla, a microarchitectural instantiation of the framework. Q-Zilla is composed of the ServerQueue Decoupled Size-Interval Task Assignment (SQD-SITA) scheduling algorithm and the Express-lane Simultaneous Multithreading (ESMT) microarchitecture, which together seek to address HoL blocking by providing an “express-lane” for short tasks, protecting them from queuing behind rare, long ones. By combining the ESMT microarchitecture and the SQD-SITA scheduling algorithm, CoreZilla improves tail latency over a conventional SMT core with 2, 4, and 8 contexts by 2.25×, 3.23×, and 4.38×, on average, respectively, and outperforms a theoretical 32-core scale-up organization by 12%, on average, with 8 contexts.
Finally, in chapters V-VI, I investigate the tail latency problem of microservices from a cluster, rather than server-level, perspective. Whereas Service Level Objectives (SLOs) define end-to-end latency targets for the entire service to ensure user satisfaction, with microservice-based applications, it is unclear how to scale individual microservices when end-to-end SLOs are violated or underutilized. I introduce Parslo as an analytical framework for partial SLO allocation in virtualized cloud microservices. Parslo takes a microservice
graph as an input and employs a Gradient Descent-based approach to allocate “partial SLOs” to different microservice nodes, enabling independent auto-scaling of individual microservices. Parslo achieves the optimal solution, minimizing the total cost for the entire service deployment, and is applicable to general microservice graphs.
PHD
Computer Science & Engineering
University of Michigan, Horace H. Rackham School of Graduate Studies
http://deepblue.lib.umich.edu/bitstream/2027.42/167978/1/miramir_1.pd
System-on-chip architecture for secure sub-microsecond synchronization systems
213 p. This thesis addresses the problems involved in the cyber protection of the Precision Time Protocol (PTP). PTP is one of the most sensitive communication protocols among those considered by standardization bodies for use in future Smart Grids. Its purpose is to distribute a time reference very precisely from a master device to the slave devices located within the same network. The protocol is highly vulnerable: introducing a time error of just one microsecond can cause serious problems in the protection functions of electrical equipment, or even halt its operation. To address this, a new System-on-Chip architecture based on reconfigurable devices is proposed, with the aim of integrating PTP and the well-known MACsec security standard for Ethernet networks. The flexibility offered by modern reconfigurable devices has been exploited to design an architecture in which hardware and software processing coexist. The experimental results support the feasibility of using MACsec to protect synchronization in industrial environments without degrading the protocol's accuracy.
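As background for why a one-microsecond error matters in PTP, the sketch below shows the standard delay request-response arithmetic a slave uses to estimate its offset from the master. It is not taken from the thesis: the function name `ptp_offset_and_delay` is hypothetical, timestamps are plain integers in nanoseconds, and the calculation assumes a symmetric network path.

```python
from typing import Tuple


def ptp_offset_and_delay(t1: int, t2: int, t3: int, t4: int) -> Tuple[float, float]:
    """Estimate slave clock offset and one-way path delay, in nanoseconds.

    t1: master clock time when Sync is sent
    t2: slave clock time when Sync is received
    t3: slave clock time when Delay_Req is sent
    t4: master clock time when Delay_Req is received
    Assumes the path delay is the same in both directions.
    """
    master_to_slave = t2 - t1   # path delay plus offset
    slave_to_master = t4 - t3   # path delay minus offset
    offset = (master_to_slave - slave_to_master) / 2
    delay = (master_to_slave + slave_to_master) / 2
    return offset, delay


# Example: slave clock runs 100 ns ahead of the master, one-way delay is 500 ns.
offset_ns, delay_ns = ptp_offset_and_delay(t1=1000, t2=1600, t3=2000, t4=2400)
```

Because the estimate relies on path symmetry, any asymmetry between the two directions (for example, added in one direction only) shifts the computed offset by half the asymmetry.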