4 research outputs found

    Interacting with Large Distributed Datasets using Sketch

    Get PDF
    We present Sketch, a distributed software infrastructure for building interactive tools that explore large datasets distributed across multiple machines. We have built three sophisticated applications using this framework: a billion-row spreadsheet, a distributed log browser, and a distributed-systems performance debugging tool. Sketch applications allow interactive and responsive exploration of complex distributed datasets, scaling gracefully to large system sizes. The conflicting constraints of large-scale data and the small timescales required by human interaction are difficult to satisfy simultaneously. Sketch hits a sweet spot in this trade-off by exploiting the observation that the precision of a data view is limited by the resolution of the user's screen, so the system can push data reduction operations to the data sources. The core Sketch abstraction provides a narrow programming interface; Sketch clients construct a distributed application by stacking modular components with identical interfaces, each providing a useful feature: network transparency, concurrency, fault tolerance, straggler avoidance, round-trip reduction, and distributed aggregation.
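    The abstract does not spell out the stacking interface, but the layering idea can be illustrated with a minimal sketch: every layer implements the same narrow interface, so features such as fault tolerance and distributed aggregation compose by wrapping one layer around another. The names below (DataSet, sketch, LocalData, RetryLayer, AggregationLayer) are invented for this illustration and are not the paper's actual API.

        # Hypothetical illustration of the layering idea in Sketch: each layer
        # implements the same narrow interface and wraps another layer.
        class DataSet:
            def sketch(self, summarize, combine):
                """Reduce the data to a small summary; combine merges partial summaries."""
                raise NotImplementedError

        class LocalData(DataSet):
            """Leaf layer: data held on one machine."""
            def __init__(self, rows):
                self.rows = rows
            def sketch(self, summarize, combine):
                return summarize(self.rows)

        class RetryLayer(DataSet):
            """Adds simple fault tolerance by retrying a failed lower layer."""
            def __init__(self, inner, attempts=3):
                self.inner, self.attempts = inner, attempts
            def sketch(self, summarize, combine):
                last = None
                for _ in range(self.attempts):
                    try:
                        return self.inner.sketch(summarize, combine)
                    except IOError as err:
                        last = err
                raise last

        class AggregationLayer(DataSet):
            """Combines summaries from several lower datasets (distributed aggregation)."""
            def __init__(self, parts):
                self.parts = parts
            def sketch(self, summarize, combine):
                results = [p.sketch(summarize, combine) for p in self.parts]
                total = results[0]
                for r in results[1:]:
                    total = combine(total, r)
                return total

        # A screen-resolution-bounded summary (a fixed-size histogram)
        # computed over two "machines", with retries layered on top.
        parts = [RetryLayer(LocalData(range(0, 1000))), RetryLayer(LocalData(range(1000, 2000)))]
        ds = AggregationLayer(parts)
        hist = ds.sketch(
            summarize=lambda rows: [sum(1 for r in rows if r % 4 == b) for b in range(4)],
            combine=lambda a, b: [x + y for x, y in zip(a, b)],
        )
        print(hist)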

    Análisis de los parámetros de operación del clúster de servidores en el centro de datos de la carrera de Ingeniería en Ciencias de la Computación en la Universidad Politécnica Salesiana Sede Quito Campus Sur

    Get PDF
    This technical project updates the performance figures of the processing infrastructure, in FLOPS, following the increase of resources in the servers, and computes the storage performance, in IOPS, of the Data Center of the Computer Science Engineering program at the Universidad Politécnica Salesiana. Benchmarking techniques were used to compare candidate programs and select those best suited to measuring processing and storage performance. Because the Data Center is virtualized, six virtual machines were deployed, one per server, running the selected programs. Tests ran for 15 days to determine each performance value. After capturing the values reported by the tools, specific formulas were applied to obtain the infrastructure results: the cluster reached 69.02 TFlops and the two independent servers 7.7627 TFlops, for a final result of 76.85 TFlops. For storage, the final result was 93,580 IOPS, of which 52,940 I/O IOPS are functional. These results show a notable increase in the processing infrastructure compared with the figures reported in 2020, which gave 5.60 TFlops for the cluster and 7 TFlops for the single non-virtualized server.
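    The abstract does not give the exact formulas or hardware parameters used; the short sketch below only illustrates the kind of calculation involved, using the standard theoretical peak-FLOPS formula and placeholder numbers that are assumptions, not the data center's real configuration or results.

        # Hedged sketch: theoretical peak performance and aggregation of
        # cluster plus standalone servers. All numeric inputs are placeholders.
        def peak_tflops(nodes, cores_per_node, clock_ghz, flops_per_cycle):
            # Rpeak = nodes * cores per node * clock (GHz) * FP ops per cycle
            return nodes * cores_per_node * clock_ghz * flops_per_cycle / 1000.0

        cluster = peak_tflops(nodes=4, cores_per_node=32, clock_ghz=2.4, flops_per_cycle=16)
        standalone = peak_tflops(nodes=2, cores_per_node=24, clock_ghz=2.1, flops_per_cycle=16)
        print(f"cluster ~{cluster:.2f} TFlops, standalone ~{standalone:.2f} TFlops, "
              f"total ~{cluster + standalone:.2f} TFlops")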

    Operating System Support for High-Performance Solid State Drives

    Get PDF

    Modular Data Storage with Anvil

    No full text
    Databases have achieved orders-of-magnitude performance improvements by changing the layout of stored data, for instance by arranging data in columns or compressing it before storage. These improvements have been implemented in monolithic new engines, however, making it difficult to experiment with feature combinations or extensions. We present Anvil, a modular and extensible toolkit for building database back ends. Anvil's storage modules, called dTables, have much finer granularity than prior work. For example, some dTables specialize in writing data, while others provide optimized read-only formats. This specialization makes both kinds of dTable simple to write and understand. Unifying dTables implement more comprehensive functionality by layering over other dTables, for instance building a read/write store from read-only tables and a writable journal, or building a general-purpose store from optimized special-purpose stores. The dTable design leads to a flexible system powerful enough to implement many database storage layouts. Our prototype implementation of Anvil performs up to 5.5 times faster than an existing B-tree-based database back end on conventional workloads, and can easily be customized for further gains on specific data and workloads.
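    The layering of a read/write store over read-only tables plus a writable journal can be pictured with a minimal sketch. The class and method names below are invented for illustration and are not Anvil's actual API (Anvil itself is a C++ toolkit); the sketch only shows the composition idea: reads consult the journal first, then fall back to the read-only base.

        # Hypothetical illustration of a unifying dTable that layers a small
        # writable journal over an optimized read-only table.
        class ReadOnlyDTable:
            """Optimized read-only format (here just a dict built once)."""
            def __init__(self, items):
                self._data = dict(items)
            def find(self, key):
                return self._data.get(key)

        class JournalDTable:
            """Small writable log of recent updates."""
            def __init__(self):
                self._log = {}
            def insert(self, key, value):
                self._log[key] = value
            def find(self, key):
                return self._log.get(key)

        class OverlayDTable:
            """Unifying dTable: writes go to the journal, reads check it first."""
            def __init__(self, base, journal):
                self.base, self.journal = base, journal
            def insert(self, key, value):
                self.journal.insert(key, value)
            def find(self, key):
                hit = self.journal.find(key)
                return hit if hit is not None else self.base.find(key)

        store = OverlayDTable(ReadOnlyDTable([("a", 1), ("b", 2)]), JournalDTable())
        store.insert("c", 3)                     # lands in the journal
        print(store.find("a"), store.find("c"))  # 1 3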