Streaming Graph Challenge: Stochastic Block Partition
An important objective for analyzing real-world graphs is to achieve scalable
performance on large, streaming graphs. A challenging and relevant example is
the graph partition problem. As a combinatorial problem, graph partition is
NP-hard, but existing relaxation methods provide reasonable approximate
solutions that can be scaled for large graphs. Competitive benchmarks and
challenges have proven to be an effective means to advance state-of-the-art
performance and foster community collaboration. This paper describes a graph
partition challenge with a baseline partition algorithm of sub-quadratic
complexity. The algorithm employs rigorous Bayesian inferential methods based
on a statistical model that captures characteristics of real-world graphs.
This strong foundation enables the algorithm to address limitations of
well-known graph partition approaches such as modularity maximization. This
paper describes various aspects of the challenge including: (1) the data sets
and streaming graph generator, (2) the baseline partition algorithm with
pseudocode, (3) an argument for the correctness of parallelizing the Bayesian
inference, (4) different parallel computation strategies such as node-based
parallelism and matrix-based parallelism, (5) evaluation metrics for partition
correctness and computational requirements, (6) preliminary timing of a
Python-based demonstration code and the open source C++ code, and (7)
considerations for partitioning the graph in streaming fashion. Data sets and
source code for the algorithm, as well as metrics and detailed documentation, are available at GraphChallenge.org.
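As a rough illustration of the kind of inference the baseline algorithm performs, the following Python sketch scores a partition with a degree-corrected stochastic block model likelihood and improves it by greedy nodal moves. It is a deliberate simplification of the challenge's baseline (which uses MCMC proposals and agglomerative block merges); the function names and the toy graph are illustrative, not taken from the reference code.

```python
# Minimal sketch (not the challenge's reference code) of stochastic block
# partition: score a block assignment with a degree-corrected SBM
# log-likelihood and greedily move nodes between blocks to improve it.
import numpy as np

def block_edge_counts(A, z, B):
    # M[r, s] = number of edges from block r to block s under assignment z.
    M = np.zeros((B, B))
    for i, j in zip(*A.nonzero()):
        M[z[i], z[j]] += A[i, j]
    return M

def sbm_log_likelihood(M):
    # Sum of M_rs * log(M_rs / (d_r * d_s)); zero-count entries contribute
    # nothing (the nan they produce is dropped by nansum).
    d_out, d_in = M.sum(axis=1), M.sum(axis=0)
    with np.errstate(divide="ignore", invalid="ignore"):
        terms = M * np.log(M / np.outer(d_out, d_in))
    return np.nansum(terms)

def greedy_nodal_moves(A, z, B, sweeps=3):
    # Try every block for every node and keep the best move. The real
    # algorithm updates M incrementally and proposes moves via MCMC;
    # recomputing from scratch keeps this sketch short.
    z = z.copy()
    for _ in range(sweeps):
        for v in range(len(z)):
            scores = []
            for b in range(B):
                z[v] = b
                scores.append(sbm_log_likelihood(block_edge_counts(A, z, B)))
            z[v] = int(np.argmax(scores))
    return z

# Toy usage: recover two planted blocks in a small random directed graph.
rng = np.random.default_rng(0)
z_true = np.array([0] * 10 + [1] * 10)
probs = np.where(np.equal.outer(z_true, z_true), 0.5, 0.05)
A = (rng.random((20, 20)) < probs).astype(int)
z0 = rng.integers(0, 2, size=20)
print(greedy_nodal_moves(A, z0, B=2))
```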
Performance Measurements of Supercomputing and Cloud Storage Solutions
Increasing amounts of data from varied sources, particularly in the fields of
machine learning and graph analytics, are causing storage requirements to grow
rapidly. A variety of technologies exist for storing and sharing these data,
ranging from parallel file systems used by supercomputers to distributed block
storage systems found in clouds. Relatively few comparative measurements exist
to inform decisions about which storage systems are best suited for particular
tasks. This work provides these measurements for two of the most popular
storage technologies: Lustre and Amazon S3. Lustre is an open-source, high
performance, parallel file system used by many of the largest supercomputers in
the world. Amazon's Simple Storage Service, or S3, is part of the Amazon Web
Services offering, and offers a scalable, distributed option to store and
retrieve data from anywhere on the Internet. Parallel processing is essential
for achieving high performance on modern storage systems. The performance tests
used span the gamut of parallel I/O scenarios, ranging from single-client,
single-node Amazon S3 and Lustre performance to a large-scale, multi-client
test designed to demonstrate the capabilities of a modern storage appliance
under heavy load. These results show that, when parallel I/O is used correctly
(i.e., many simultaneous read or write processes), full network bandwidth
performance is achievable; measured rates ranged from 10 gigabits/s over a 10 GigE S3
connection to 0.35 terabits/s using Lustre on a 1200-port 10 GigE switch. These
results demonstrate that S3 is well-suited to sharing vast quantities of data
over the Internet, while Lustre is well-suited to processing large quantities
of data locally.
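The paper's central observation, that many simultaneous readers or writers are needed to approach full network bandwidth, can be sketched with the real boto3 S3 client. The bucket name, object keys, and worker count below are placeholder assumptions, not values from the paper.

```python
# Sketch of parallel S3 reads: many concurrent GETs are needed to approach
# full network bandwidth; a single stream will not saturate a 10 GigE link.
# Bucket and keys are placeholders.
import time
from concurrent.futures import ThreadPoolExecutor

import boto3

BUCKET = "example-bucket"                          # placeholder
KEYS = [f"data/part-{i:04d}" for i in range(64)]   # placeholder objects

s3 = boto3.client("s3")

def fetch(key):
    # Each worker issues an independent GET and drains the body.
    body = s3.get_object(Bucket=BUCKET, Key=key)["Body"].read()
    return len(body)

start = time.time()
with ThreadPoolExecutor(max_workers=32) as pool:   # many simultaneous readers
    total_bytes = sum(pool.map(fetch, KEYS))
elapsed = time.time() - start
print(f"{total_bytes * 8 / elapsed / 1e9:.2f} gigabits/s aggregate")
```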
GraphChallenge.org: Raising the Bar on Graph Analytic Performance
The rise of graph analytic systems has created a need for new ways to measure
and compare the capabilities of graph processing systems. The MIT/Amazon/IEEE
Graph Challenge has been developed to provide a well-defined community venue
for stimulating research and highlighting innovations in graph analysis
software, hardware, algorithms, and systems. GraphChallenge.org provides a wide
range of pre-parsed graph data sets, graph generators, mathematically defined
graph algorithms, example serial implementations in a variety of languages, and
specific metrics for measuring performance. Graph Challenge 2017 received 22
submissions by 111 authors from 36 organizations. The submissions highlighted
graph analytic innovations in hardware, software, algorithms, systems, and
visualization. These submissions produced many comparable performance
measurements that can be used for assessing the current state of the art of the
field. There were numerous submissions that implemented the triangle counting
challenge and resulted in over 350 distinct measurements. Analysis of these
submissions shows that their execution time is a strong function of the number
of edges in the graph, N_e, and is typically proportional to N_e^(4/3) for
large values of N_e. Combining the model fits of the submissions presents a
picture of the current state of the art of graph analysis, which is typically
10^8 edges processed per second for graphs with 10^8 edges. These results
are 30 times faster than serial implementations commonly used by many graph
analysts and underscore the importance of making these performance benefits
available to the broader community. Graph Challenge provides a clear picture of
current graph analysis systems and underscores the need for new innovations to
achieve high performance on very large graphs.
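The model fits mentioned above are straightforward to reproduce for one's own measurements: fit log execution time against log edge count and read off the exponent and the implied edges-per-second rate. A minimal sketch, with made-up sample measurements standing in for real timings:

```python
# Fit execution time t ~ c * N_e^alpha on a log-log scale and report the
# implied processing rate N_e / t. The sample measurements are made up.
import numpy as np

N_e = np.array([1e6, 1e7, 1e8, 1e9])    # edges per graph (illustrative)
t = np.array([0.02, 0.3, 4.0, 60.0])    # seconds (illustrative)

alpha, log_c = np.polyfit(np.log(N_e), np.log(t), 1)
t_at_1e8 = np.exp(log_c + alpha * np.log(1e8))
print(f"execution time ~ N_e^{alpha:.2f}")
print(f"rate at 1e8 edges: {1e8 / t_at_1e8:.2e} edges/s")
```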
Steppe-tundra composition and deglacial floristic turnover in interior Alaska revealed by sedimentary ancient DNA (sedaDNA)
When tracing vegetation dynamics over long timescales, obtaining enough floristic information to gain a detailed understanding of past communities and their transitions can be challenging. The first high-resolution sedimentary DNA (sedaDNA) metabarcoding record from lake sediments in Alaska—reported here—covers nearly 15,000 years of change. It shows in unprecedented detail the composition of late-Pleistocene “steppe-tundra” vegetation of ice-free Alaska, part of an intriguing late-Quaternary “no-analogue” biome, and it covers the subsequent changes that led to the development of modern spruce-dominated boreal forest. The site (Chisholm Lake) lies close to key archaeological sites, and the record throws new light on the landscape and resources available to early humans. Initially, vegetation was dominated by forbs found in modern tundra and/or subarctic steppe vegetation (e.g., Potentilla, Draba, Eritrichium, Anemone patens), and graminoids (e.g., Bromus pumpellianus, Festuca, Calamagrostis, Puccinellia), with Salix the only prominent woody taxon. Predominantly xeric, warm-to-cold habitats are indicated, and we explain the mixed ecological preferences of the fossil assemblages as a topo-mosaic strongly affected by insolation load. At ca. 14,500 cal yr BP (calendar years before C.E. 1950), about the same time as well-documented human arrivals and coincident with an increase in effective moisture, Betula expanded. Graminoids became less abundant, but many open-ground forb taxa persisted. This woody-herbaceous mosaic is compatible with the observed persistence of Pleistocene megafaunal species (animals weighing ≥44 kg)—important resources for early humans. The greatest taxonomic turnover, marking a transition to regional woodland and a further moisture increase, began ca. 11,000 cal yr BP, when Populus expanded along with new shrub taxa (e.g., Shepherdia, Elaeagnus, Rubus, Viburnum). Picea then expanded ca. 9,500 cal yr BP, along with shrub and forb taxa typical of evergreen boreal woodland (e.g., Spiraea, Cornus, Linnaea). We found no evidence for Picea in the late Pleistocene, however. Most taxa present today were established by ca. 5,000 cal yr BP, after almost complete taxonomic turnover since the start of the record (though Larix appeared only at ca. 1,500 cal yr BP). Prominent fluctuations in aquatic communities ca. 14,000–9,500 cal yr BP are probably related to lake-level fluctuations prior to the lake reaching its high, near-modern depth ca. 8,000 cal yr BP.
Feasibility study of small and micro wind turbines for residential use in New Zealand: an analysis of technical implementation, spatial planning processes and of economic viability of small and micro scale wind energy generation systems for residential use in New Zealand
Even though there might not seem to be any similarity between a holiday lodge on the edge of New Zealand’s Banks Peninsula, a satellite earth station on the unmanned Black Island in the middle of the Ross Ice Shelf in Antarctica, and an American stargazer on his property in the middle of the Arizona desert, they all have something in common. They, among many other people across the globe, use wind, a free resource, to generate eco-friendly electricity with small and micro scale wind turbines. Japan, the USA and the UK, for example, have already installed thousands of domestic wind turbines. In New Zealand, small and micro scale wind energy generation has not yet established itself among other distributed energy generation methods on a domestic scale, even though the conditions for wind energy generation are ideal in many places.
The aim of this study was to assess the potential of domestic wind turbines in New Zealand. It established an overview of small and micro scale wind energy generation planning and implementation processes to gain insight into the effectiveness, feasibility, and straightforwardness of the processes involved. To this end, the economic, technical, and planning aspects of domestic wind energy generation systems were analysed to investigate the benefits of small and micro scale wind energy generation.
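For the economic aspect, the following is a hedged sketch of the kind of first-order calculation such a feasibility analysis involves: annual energy yield from a rated power and an assumed capacity factor, then simple payback. Every figure is a placeholder assumption, not a result from the study.

```python
# First-order feasibility estimate for a domestic wind turbine.
# Every number below is an illustrative assumption, not a study result.
rated_power_kw = 2.5        # small turbine rating (assumed)
capacity_factor = 0.25      # site-dependent fraction of rated output (assumed)
electricity_price = 0.30    # NZ$ per kWh (assumed)
installed_cost = 15000.0    # NZ$ turbine plus installation (assumed)

annual_kwh = rated_power_kw * capacity_factor * 8760  # 8760 hours per year
annual_savings = annual_kwh * electricity_price
print(f"annual yield: {annual_kwh:.0f} kWh")
print(f"simple payback: {installed_cost / annual_savings:.1f} years")
```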
Using Transport Services instead of specific Transport Protocols
For most applications, the transport service providers to be used are determined during development of the application. This makes it difficult to take the application's communication requirements into account and to exploit specific features of the network technology. Specialized protocols that are more efficient and offer a qualitatively better service are typically not supported by most applications because they are not commonly available. In this paper we propose a concept for realizing protocol-independent transport services. Only a transport service is fixed during the development of the application, and an appropriate transport service provider is selected dynamically at run time. This makes it possible to exploit specialized protocols where available, while standard protocols can still be used where necessary. The main focus of this paper is how a transport service can make a new transport service provider available transparently to existing applications. A prototype is presented that maps TCP/IP-based applications to an ATM-specific transport service provider offering, like TCP/IP, both a reliable and an unreliable transport service.
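The core idea, fixing only the transport service at development time and binding a concrete provider at run time, can be sketched in Python. The provider registry and all names below are illustrative, and plain TCP stands in for the paper's ATM-specific provider.

```python
# Sketch of protocol-independent transport: the application requests a
# *service* ("reliable"); a concrete provider is chosen at run time.
# Provider names are illustrative; TCP stands in for specialized protocols.
import socket

class TcpProvider:
    name = "tcp"
    service = "reliable"
    def connect(self, host, port):
        # Returns a connected socket as the transport endpoint.
        return socket.create_connection((host, port))

PROVIDERS = [TcpProvider()]  # a specialized provider would be listed first

def open_transport(service, host, port):
    # Pick the first available provider offering the requested service;
    # fall through the list instead of hard-coding one protocol.
    for provider in PROVIDERS:
        if provider.service == service:
            try:
                return provider.connect(host, port)
            except OSError:
                continue  # provider unavailable here; try the next one
    raise RuntimeError(f"no provider for service {service!r}")

# Usage: the application names the service, never the protocol.
# conn = open_transport("reliable", "example.org", 80)
```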