13 research outputs found

    Methodology and Application of HPC I/O Characterization with MPIProf and IOT

    Get PDF
    Combining the strengths of MPIProf and IOT, an efficient and systematic method is devised for I/O characterization at the per-job, per-rank, per-file and per-call levels of HPC programs running at the NASA Advanced Supercomputing Center. This method is applied in this paper to answer four I/O questions. A total of 13 MPI programs and 15 cases, ranging from 24 to 5968 ranks, are analyzed to establish the I/O landscape from the answers to these questions. Four of the 13 programs use MPI I/O, and the behavior of their collective writes depends on the specific MPI library implementation used. The SGI MPT library, the prevailing MPI library on our systems, was found to gather small writes from a large number of ranks and issue larger writes from a small subset of collective-buffering ranks. The number of collective-buffering ranks invoked by MPT depends on the Lustre stripe count and the number of nodes used for the run. Varying the stripe count was demonstrated to yield a double-digit speedup of one program's I/O. Another program, which has all ranks concurrently open private files and could therefore place a heavy load on the Lustre servers, was also identified. The ability to systematically characterize I/O for a large number of programs running on a supercomputer, to seek I/O optimization opportunities, and to identify programs that could cause high load and instability on the filesystems is important for pursuing exascale in a real production environment.
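    The paper's tooling is not reproduced in this listing. As a minimal, hedged sketch of the kind of tuning the abstract describes (Python with mpi4py, assuming a Lustre-backed path and ROMIO-style hints; hint names and their effect differ between MPI libraries such as SGI MPT), the snippet below requests a stripe count at file creation and performs a collective write:
```python
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

# Per-rank data block (1 MiB of float64), filled with the rank id.
block = np.full(131072, rank, dtype=np.float64)

# Illustrative MPI-IO hints: "striping_factor" asks for a Lustre stripe
# count when the file is created; "romio_cb_write" enables collective
# buffering. Whether these hints are honored depends on the MPI library.
info = MPI.Info.Create()
info.Set("striping_factor", "8")
info.Set("romio_cb_write", "enable")

fh = MPI.File.Open(comm, "output.dat",
                   MPI.MODE_CREATE | MPI.MODE_WRONLY, info)
fh.Write_at_all(rank * block.nbytes, block)   # collective write, one block per rank
fh.Close()
info.Free()
```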

    Architecting Data Centers for High Efficiency and Low Latency

    Full text link
    Modern data centers, housing remarkably powerful computational capacity, are built at massive scale and consume a huge amount of energy. The energy consumption of data centers has mushroomed from virtually nothing to about three percent of the global electricity supply in the last decade, and will continue to grow. Unfortunately, a significant fraction of this energy is wasted due to the inefficiency of current data center architectures, and one of the key reasons behind this inefficiency is the stringent response latency requirements of the user-facing services hosted in these data centers, such as web search and social networks. To deliver such low response latency, data center operators often have to overprovision resources to handle peaks in user load and unexpected load spikes, resulting in low efficiency. This dissertation investigates data center architecture designs that reconcile high system efficiency and low response latency. To increase efficiency, we propose techniques that understand both microarchitectural-level resource sharing and system-level resource usage dynamics to enable highly efficient co-location of latency-critical services and low-priority batch workloads. We investigate resource sharing on real-system simultaneous multithreading (SMT) processors to enable SMT co-locations by precisely predicting the performance interference. We then leverage historical resource usage patterns to further optimize the task scheduling algorithm and data placement policy to improve the efficiency of workload co-locations. Moreover, we introduce methodologies to better manage the response latency by automatically attributing the source of tail latency to low-level architectural and system configurations in both offline load-testing and online production environments. We design and develop a response latency evaluation framework with microsecond-level precision for data center applications, with which we construct statistical inference procedures to attribute the source of tail latency. Finally, we present an approach that proactively enacts carefully designed causal-inference micro-experiments to diagnose the root causes of response latency anomalies and automatically corrects them to reduce response latency. Ph.D., Computer Science & Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies. https://deepblue.lib.umich.edu/bitstream/2027.42/144144/1/yunqi_1.pd
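    The dissertation's measurement framework is not included in this listing. Purely as an illustrative sketch (not the actual framework), tail latency is commonly summarized by high quantiles of per-request latency samples collected at microsecond resolution; the latencies below are synthetic:
```python
import numpy as np

def tail_latency_us(samples_us, quantiles=(0.50, 0.95, 0.99)):
    """Summarize request latencies (in microseconds) by tail quantiles."""
    samples = np.asarray(samples_us, dtype=np.float64)
    return {q: float(np.quantile(samples, q)) for q in quantiles}

# Example: 100k synthetic request latencies with a heavy tail.
rng = np.random.default_rng(0)
latencies = rng.lognormal(mean=5.0, sigma=0.6, size=100_000)
print(tail_latency_us(latencies))  # median vs. p95/p99 exposes the tail
```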

    XSEDE: eXtreme Science and Engineering Discovery Environment Third Quarter 2012 Report

    Get PDF
    The Extreme Science and Engineering Discovery Environment (XSEDE) is the most advanced, powerful, and robust collection of integrated digital resources and services in the world. It is an integrated cyberinfrastructure ecosystem with singular interfaces for allocations, support, and other key services that researchers can use to interactively share computing resources, data, and expertise. This is a report of project activities and highlights from the third quarter of 2012. National Science Foundation, OCI-105357

    Solving large permutation flow-shop scheduling problems on GPU-accelerated supercomputers

    Get PDF
    Makespan minimization in permutation flow-shop scheduling is a well-known hard combinatorial optimization problem. Among the 120 standard benchmark instances proposed by E. Taillard in 1993, 23 have remained unsolved for almost three decades. In this paper, we present our attempts to solve these instances to optimality using parallel Branch-and-Bound tree search on the GPU-accelerated Jean Zay supercomputer. We report the exact solution of 11 previously unsolved problem instances and improved upper bounds for 8 instances. The solution of these problems requires both algorithmic improvements and leveraging the computing power of petascale high-performance computing platforms. The challenge lies in efficiently performing parallel depth-first traversal of a highly irregular, fine-grained search tree on distributed systems composed of hundreds of massively parallel accelerator devices and multi-core processors. We present and discuss the design and implementation of our permutation-based B&B and experimentally evaluate its parallel performance on up to 384 V100 GPUs (2 million CUDA cores) and 3840 CPU cores. The optimality proof for the largest solved instance requires about 64 CPU-years of computation: using 256 GPUs and over 4 million parallel search agents, the traversal of the search tree is completed in 13 hours, exploring 339 tera-nodes.
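    The authors' parallel Branch-and-Bound is not shown in this listing. As an illustrative sketch of the objective being minimized only, the following Python evaluates the makespan of a job permutation with the standard flow-shop recurrence and brute-forces a made-up 4-job, 3-machine instance (real solvers prune the permutation tree with lower bounds instead of enumerating it):
```python
import itertools

def makespan(perm, p):
    """Completion time of the last job on the last machine for a given
    job permutation; p[j][m] = processing time of job j on machine m."""
    m = len(p[0])
    completion = [0] * m
    for j in perm:
        completion[0] += p[j][0]
        for k in range(1, m):
            completion[k] = max(completion[k], completion[k - 1]) + p[j][k]
    return completion[-1]

# Tiny hypothetical instance, solved by exhaustive enumeration.
p = [[5, 2, 4], [3, 6, 2], [6, 1, 5], [4, 3, 3]]
best = min(itertools.permutations(range(4)), key=lambda s: makespan(s, p))
print(best, makespan(best, p))
```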

    Transformations and pathways of Southern Ocean waters into the South Atlantic Ocean

    Full text link
    The returning limb of the Atlantic Meridional Overturning Circulation (AMOC) is partly supplied by the cold, fresh waters that enter through the Drake Passage. Up to the 28.0 kg m⁻³ isoneutral, the mean water inflow through the Drake Passage into the Scotia Sea is 140.8 ± 7.4 Sv and the outflow through the Northern Passages is 115.9 ± 8.3 Sv. Below this isoneutral reference and down to 2000 m, an additional 23.4 Sv enters through the Drake Passage. The mean barotropic contribution always represents over half of the total transports, with substantial seasonal and moderate interannual variability in the water transports. The mean residence time of these waters in the Scotia Sea is about 6-8 months. Combining Argo float data with other observational measurements, we apply a climatological high-resolution inverse model over the Scotia Sea boundaries up to the 28.0 kg m⁻³ isoneutral. The ACC carries 136.7 ± 1.0 Sv into the Scotia Sea through the Drake Passage and 137.9 ± 1.0 Sv out through the northern boundary, with the difference accounted for by the contributions of the South Scotia Ridge and Philip Passages. Along their northward path, the ACC waters lose heat but gain equatorward freshwater transport. Within the Scotia Sea, the surface-modal and modal-intermediate waters experience production in all biogeochemical variables. Finally, regarding anthropogenic DIC, the Scotia Sea stores 0.123 Pg C yr⁻¹. The ROD method then compares actual drifter displacements with numerical trajectory predictions; the observed-minus-predicted differences in final positions correspond to diffusive motions not captured by the numerical models. The ROD method is applied in the western South Atlantic Ocean, yielding maximum diffusivities of 4630-4980 m² s⁻¹ in the upper 200 m of the water column and an inverse relationship between diffusivity and depth. The diffusivities near the surface are fairly constant with latitude, but the diffusion coefficients at 1000 m decrease considerably south of the Southern Boundary. With the horizontal diffusion coefficients obtained previously, we use a Lagrangian technique to determine the fraction of the upper-ocean transport that remains in the ACC flow as it crosses the South Atlantic Ocean and the fraction that contributes to the South Atlantic subtropical gyre. The mean results reveal that 94.8 Sv remain in the ACC, whereas a total of 15.1 Sv contribute directly to the AMOC. This AMOC transport takes a median of 14.3 years to reach the Brazil Current from the Drake Passage. Furthermore, 78.1% of the particles that recirculate in the subtropical gyre complete a single recirculation. The results confirm that the water masses entering the subtropical gyre through its eastern edge warm up substantially and lose density, being partly transformed into surface waters. Furthermore, the contributions at the eastern edge of the South Atlantic subtropical gyre from the warm-water and the cold-water routes are compared. We perform numerical simulations of Lagrangian trajectories to identify the multiple direct and indirect pathways of intermediate waters. The total cold-route contribution represents between 17.9 and 18.9%, substantially higher than the 7.1 to 12.3% warm-route contribution. Several individual pathways form both routes, but the direct path is the preferential pathway, followed by 83.6 to 87.2% of the water parcels.
    The direct cold route is the one that undergoes the greatest transformation of its water masses, and it is confirmed that it also feeds the Agulhas Current, with a contribution similar to that coming from the Indonesian Throughflow.
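    The thesis code is not part of this listing. As an illustrative sketch only of the single-particle relation that underlies such diffusivity estimates, K ≈ ⟨|Δr|²⟩ / (4Δt) in two dimensions, the snippet below uses hypothetical placeholder displacements (not thesis data):
```python
import numpy as np

# Estimate a lateral eddy diffusivity from the observed-minus-predicted
# displacements of drifters over fixed-length trajectory segments.
dt = 5 * 86400.0                      # 5-day segments [s] (assumed)
dx = np.array([12e3, -8e3, 15e3])     # zonal residual displacements [m]
dy = np.array([-5e3, 10e3, 7e3])      # meridional residual displacements [m]

K = np.mean(dx**2 + dy**2) / (4.0 * dt)   # single-particle 2-D estimate
print(f"estimated diffusivity: {K:.0f} m^2 s^-1")
```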

    Internet of Underwater Things and Big Marine Data Analytics -- A Comprehensive Survey

    Full text link
    The Internet of Underwater Things (IoUT) is an emerging communication ecosystem developed for connecting underwater objects in maritime and underwater environments. The IoUT technology is intricately linked with intelligent boats and ships, smart shores and oceans, automatic marine transportation, positioning and navigation, underwater exploration, disaster prediction and prevention, as well as intelligent monitoring and security. The IoUT has an influence at scales ranging from a small scientific observatory, to a mid-sized harbor, to global oceanic trade. The network architecture of the IoUT is intrinsically heterogeneous and should be sufficiently resilient to operate in harsh environments, which creates major challenges in terms of underwater communications while relying on limited energy resources. Additionally, the volume, velocity, and variety of data produced by sensors, hydrophones, and cameras in the IoUT are enormous, giving rise to the concept of Big Marine Data (BMD), which has its own processing challenges. Hence, conventional data processing techniques will falter, and bespoke Machine Learning (ML) solutions have to be employed for automatically learning the specific BMD behavior and features, facilitating knowledge extraction and decision support. The motivation of this paper is to comprehensively survey the IoUT, BMD, and their synthesis, and to explore the nexus of BMD with ML. We set out from underwater data collection and then discuss the family of IoUT data communication techniques with an emphasis on the state-of-the-art research challenges. We then review the suite of ML solutions suitable for BMD handling and analytics. We treat the subject deductively from an educational perspective, critically appraising the material surveyed. Comment: 54 pages, 11 figures, 19 tables, IEEE Communications Surveys & Tutorials, peer-reviewed academic journal

    Deep learning for internet of underwater things and ocean data analytics

    Get PDF
    The Internet of Underwater Things (IoUT) is an emerging technological ecosystem developed for connecting objects in maritime and underwater environments. IoUT technologies are empowered by a very large number of deployed sensors and actuators. In this thesis, multiple IoUT sensory data streams are augmented with machine intelligence for forecasting purposes.