'Paleontological Institute at The University of Kansas'
Abstract
Distributed computations are a very important aspect of modern computing, especially given the rise of distributed systems used for applications such as web search, massively multiplayer online games, financial trading, and cloud computing. When running these computations across several physical machines it becomes much more difficult to determine exactly what is occurring on each system at a specific point in time. This is due to each server having an independent clock, thus making event timestamps inherently inaccurate across machine boundaries. Another difficulty with evaluating distributed experiments is the coordination required to launch daemons, executables, and logging across all machines, followed by the necessary gathering of all related output data. The goal of this research is to overcome these obstacles and construct a single, global timeline of events from all servers. We employ high-resolution clock synchronization to bring all servers within microseconds as measured by a modified version of the Network Time Protocol implementation. Kernel and user-level events with wall-clock timestamps are then logged during basic network socket experiments. These data are then collected from each server and merged into a single dataset, sorted by timestamp, and plotted on a timeline. The entire experiment, from setup to teardown to data collection, is coordinated from a single server. The timeline visualizations provide a narrative of not only how packets flow between servers, but also how kernel interrupt handlers and other events shape an experiment's execution