Automated System Performance Testing at MongoDB
Distributed Systems Infrastructure (DSI) is MongoDB's framework for running
fully automated system performance tests in our Continuous Integration (CI)
environment. To run in CI, it needs to automate everything end-to-end:
provisioning and deploying multi-node clusters, executing tests, tuning the
system for repeatable results, and collecting and analyzing the results. Today
DSI is MongoDB's most used and most useful performance testing tool. It runs
almost 200 different benchmarks in daily CI, and we also use it for manual
performance investigations. Because DSI alerts the responsible engineer in a
timely fashion, all but one of the major regressions were fixed before the
4.2.0 release. We are also able to catch net new improvements; DSI caught 17
of these. We open sourced DSI in March 2020.
Comment: Author Preprint. Appearing in DBTest.io 202
Synapse: Synthetic Application Profiler and Emulator
We introduce Synapse, motivated by the need to estimate and emulate workload
execution characteristics on high-performance and distributed heterogeneous
resources. Synapse has a platform independent application profiler, and the
ability to emulate profiled workloads on a variety of heterogeneous resources.
Synapse is used as a proxy application (or "representative application") for
real workloads, with the added advantage that it can be tuned at arbitrary
levels of granularity in ways that are simply not possible using real
applications. Experiments show that automated profiling using Synapse
represents application characteristics with high fidelity. Emulation using
Synapse can reproduce the application behavior in the original runtime
environment, as well as reproduce those properties when used in different
runtime environments.
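The abstract does not give Synapse's actual emulation model. As a rough sketch under that caveat, a toy emulator might replay a coarse profile expressed as phases of synthetic compute and memory work; the field names and phase structure below are invented for illustration:

```python
def emulate(profile):
    """Replay a coarse workload profile as synthetic work.
    `profile` is a list of phases; each phase specifies a number of
    floating-point operations and a memory footprint in bytes.
    (A toy stand-in for Synapse's emulation, not its real model.)"""
    acc = 0.0
    for phase in profile:
        buf = bytearray(phase["mem_bytes"])   # hold the memory footprint
        acc = 0.0
        for i in range(phase["flops"]):       # synthetic floating-point work
            acc += i * 0.5
        del buf                               # release before the next phase
    return acc

# A two-phase profile: small compute with 1 MiB, then larger compute.
profile = [
    {"flops": 10_000, "mem_bytes": 1 << 20},
    {"flops": 50_000, "mem_bytes": 4 << 20},
]
emulate(profile)
```

The appeal of this style of proxy, as the abstract notes, is that the phase parameters can be dialed up or down at arbitrary granularity, which a real application does not allow.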
Prototyping Operational Autonomy for Space Traffic Management
Current state of the art in Space Traffic Management (STM) relies on a handful of providers for surveillance and collision prediction, and manual coordination between operators. Neither is scalable to support the expected 10x increase in spacecraft population in less than 10 years, nor does it support automated maneuver planning. We present a software prototype of an STM architecture based on open Application Programming Interfaces (APIs), drawing on previous work by NASA to develop an architecture for low-altitude Unmanned Aerial System Traffic Management. The STM architecture is designed to provide structure to the interactions between spacecraft operators, various regulatory bodies, and service suppliers, while maintaining flexibility of these interactions and the ability for new market participants to enter easily. Autonomy is an indispensable part of the proposed architecture in enabling efficient data sharing, coordination between STM participants and safe flight operations. Examples of autonomy within STM include syncing multiple non-authoritative catalogs of resident space objects, or determining which spacecraft maneuvers when preventing impending conjunctions between multiple spacecraft. The STM prototype is based on a modern micro-service architecture adhering to OpenAPI standards and deployed in industry-standard Docker containers, facilitating easy communication between different participants or services. The system architecture is designed to facilitate adding and replacing services with minimal disruption. We have implemented some example participant services (e.g. a space situational awareness provider/SSA, a conjunction assessment supplier/CAS, an automated maneuver advisor/AMA) within the prototype.
Different services, with creative algorithms folded into them, can fulfil similar functional roles within the STM architecture by flexibly connecting to it using pre-defined APIs and data models, thereby lowering the barrier to entry for new players in the STM marketplace. We demonstrate the STM prototype on a multiple conjunction scenario with multiple maneuverable spacecraft, where an example CAS and AMA can recommend optimal maneuvers to the spacecraft operators, based on a predefined reward function. Such tools can intelligently search the space of potential collision avoidance maneuvers with varying parameters like lead time and propellant usage, optimize a customized reward function, and be implemented as a scheduling service within the STM architecture. The case study shows that autonomous maneuver planning is possible using the API-based framework. As satellite populations and predicted conjunctions increase, an STM architecture can facilitate seamless information exchange related to collision prediction and mitigation among various service applications on different platforms and servers. The availability of such an STM network also opens up new research topics on satellite maneuver planning, scheduling and negotiation across disjoint entities.
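The paper only states that a predefined reward function over parameters like lead time and propellant usage is optimized; the function, weights, and candidate maneuvers below are invented to illustrate the shape such a maneuver-advisor step could take:

```python
def reward(p_collision_after, delta_v, lead_time_hr,
           w_risk=1000.0, w_fuel=10.0, w_lead=1.0):
    """Toy reward for a candidate avoidance maneuver: penalize residual
    collision probability and propellant use (delta-v in m/s), and
    reward longer lead time. Weights are illustrative only."""
    return (-w_risk * p_collision_after
            - w_fuel * delta_v
            + w_lead * lead_time_hr)

# Hypothetical candidates an AMA-like service might score.
candidates = [
    {"name": "early small burn", "p": 1e-6, "dv": 0.02, "lead": 48},
    {"name": "late large burn",  "p": 1e-6, "dv": 0.10, "lead": 6},
    {"name": "no maneuver",      "p": 1e-3, "dv": 0.0,  "lead": 0},
]
best = max(candidates, key=lambda c: reward(c["p"], c["dv"], c["lead"]))
print(best["name"])  # "early small burn"
```

In the architecture described, such a scorer would sit behind an OpenAPI endpoint so that any operator's planning service could call it with its own candidate set.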
Big Data and the Internet of Things
Advances in sensing and computing capabilities are making it possible to
embed increasing computing power in small devices. This has enabled the sensing
devices not just to passively capture data at very high resolution but also to
take sophisticated actions in response. Combined with advances in
communication, this is resulting in an ecosystem of highly interconnected
devices referred to as the Internet of Things (IoT). In parallel, advances in
machine learning have made it possible to build models on these ever-increasing
amounts of data. Consequently, devices all the way from heavy assets such as
aircraft engines to wearables such as health monitors can now not only generate
massive amounts of data but also draw on aggregate analytics to "improve" their
performance over time. Big data analytics has been
identified as a key enabler for the IoT. In this chapter, we discuss various
identified as a key enabler for the IoT. In this chapter, we discuss various
avenues of the IoT where big data analytics either is already making a
significant impact or is on the cusp of doing so. We also discuss social
implications and areas of concern.
Comment: 33 pages. Draft of upcoming book chapter in Japkowicz and Stefanowski
(eds.) Big Data Analysis: New algorithms for a new society, Springer Series
on Studies in Big Data, to appear
User Applications Driven by the Community Contribution Framework MPContribs in the Materials Project
This work discusses how the MPContribs framework in the Materials Project
(MP) allows user-contributed data to be shown and analyzed alongside the core
MP database. The Materials Project is a searchable database of electronic
structure properties of over 65,000 bulk solid materials that is accessible
through a web-based science-gateway. We describe the motivation for enabling
user contributions to the materials data and present the framework's features
and challenges in the context of two real applications. These use-cases
illustrate how scientific collaborations can build applications with their own
"user-contributed" data using MPContribs. The Nanoporous Materials Explorer
application provides a unique search interface to a novel dataset of hundreds
of thousands of materials, each with tables of user-contributed values related
to material adsorption and density at varying temperature and pressure. The
Unified Theoretical and Experimental x-ray Spectroscopy application discusses a
full workflow for the association, dissemination and combined analyses of
experimental data from the Advanced Light Source with MP's theoretical core
data, using MPContribs tools for data formatting, management and exploration.
The capabilities being developed for these collaborations are serving as the
model for how new materials data can be incorporated into the Materials Project
website with minimal staff overhead while giving powerful tools for data search
and display to the user community.
Comment: 12 pages, 5 figures, Proceedings of 10th Gateway Computing
Environments Workshop (2015), to be published in "Concurrency and Computation:
Practice and Experience"
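The abstract describes contributions as user data attached to a material and shown alongside the core database. The sketch below is schematic only: a plain document with an identifier plus free-form tables, merged with a core record on the shared identifier. It is NOT the actual MPContribs schema or client API; every field name and value is invented:

```python
# Hypothetical contribution record: a material identifier plus
# free-form contributed tables (made-up values throughout).
contribution = {
    "identifier": "mp-1234",           # hypothetical Materials Project ID
    "project": "nanoporous_explorer",  # hypothetical project slug
    "data": {
        "adsorption": [
            # (temperature K, pressure bar, uptake mol/kg), illustrative
            {"T": 298, "P": 1.0, "uptake": 2.1},
            {"T": 298, "P": 5.0, "uptake": 5.4},
        ],
        "density": 0.88,  # g/cm^3, illustrative only
    },
}

# Displaying contributed data "alongside the core MP database" then
# reduces to joining core and contributed records on the identifier.
core = {"mp-1234": {"band_gap": 1.1}}
merged = {**core[contribution["identifier"]], **contribution["data"]}
print(sorted(merged))  # ['adsorption', 'band_gap', 'density']
```

The point of the sketch is the join-on-identifier idea, which is what lets collaboration-specific tables appear next to core computed properties without schema changes to the main database.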
Mechanism for Change Detection in HTML Web Pages as XML Documents
Change detection of web pages is an important aspect of web monitoring. Automated web monitoring can be used for the collection of specific information, for example for detecting public announcements, news posts and changes of prices. If we store the HTML code of a page, we can compare the current and previous codes when we revisit the page, allowing us to find their changes. HTML code can be compared using ordinary text comparison, but this brings the risk of losing information about the structure of the page. HTML code is tree-like in structure, and this is a desirable property to preserve when finding changes. In this work we describe a mechanism that can be applied to collected HTML pages to find their changes by transforming HTML pages into XML documents and comparing the resulting XML trees.
We give a general list of the components needed for this task, describe our implementation which uses NutchWAX, NekoHTML, XMLUnit, Jena and MongoDB, and show the results of applying the program to a dataset. We analyse the results of measurements collected when running our program on 1.1 million HTML pages. To our knowledge this mechanism has not been tested in previous works. We show that the mechanism is usable on real-world data.
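The paper's implementation compares XML trees with XMLUnit in Java; as a minimal stdlib sketch of the same idea, the function below walks two element trees and collects tag, text, attribute, and child-count differences. It assumes the HTML has already been tidied into well-formed XML (the role NekoHTML plays in the described pipeline):

```python
import xml.etree.ElementTree as ET

def tree_diff(a, b, path=""):
    """Recursively compare two XML element trees and collect
    human-readable differences. A small stand-in for the
    XMLUnit-based comparison described in the paper."""
    here = f"{path}/{a.tag}"
    diffs = []
    if a.tag != b.tag:
        diffs.append(f"{here}: tag {a.tag!r} -> {b.tag!r}")
        return diffs
    if (a.text or "").strip() != (b.text or "").strip():
        diffs.append(f"{here}: text {a.text!r} -> {b.text!r}")
    if a.attrib != b.attrib:
        diffs.append(f"{here}: attrs {a.attrib} -> {b.attrib}")
    for ca, cb in zip(a, b):           # compare children pairwise
        diffs.extend(tree_diff(ca, cb, here))
    if len(a) != len(b):
        diffs.append(f"{here}: child count {len(a)} -> {len(b)}")
    return diffs

# Two snapshots of the same (tidied) page: only the price text changed.
old = ET.fromstring("<html><body><p>Price: 10</p></body></html>")
new = ET.fromstring("<html><body><p>Price: 12</p></body></html>")
print(tree_diff(old, new))
```

Unlike plain text diffing, the report localizes each change by its path in the tree (here `/html/body/p`), which is exactly the structural information the abstract argues should be preserved.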