2,288 research outputs found

    Automatically Leveraging MapReduce Frameworks for Data-Intensive Applications

    MapReduce is a popular programming paradigm for developing large-scale, data-intensive computations. Many frameworks that implement this paradigm have recently been developed. To leverage these frameworks, however, developers must become familiar with their APIs and rewrite existing code. Casper is a new tool that automatically translates sequential Java programs into the MapReduce paradigm. Casper identifies potential code fragments to rewrite and translates them in two steps: (1) Casper uses program synthesis to search for a program summary (i.e., a functional specification) of each code fragment. The summary is expressed in a high-level intermediate language resembling the MapReduce paradigm and verified to be semantically equivalent to the original using a theorem prover. (2) Casper generates executable code from the summary, using either the Hadoop, Spark, or Flink API. We evaluated Casper by automatically converting real-world, sequential Java benchmarks to MapReduce. The resulting benchmarks perform up to 48.2x faster than the originals. Comment: 12 pages, plus 4 additional pages of references and appendix
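
    To make the translation concrete, here is a minimal, hand-written sketch (not Casper's actual output; the class and variable names are hypothetical) of a sequential Java summation fragment next to an equivalent rewrite against the Spark Java API, the kind of target code Casper can emit:

    import java.util.Arrays;
    import java.util.List;

    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaSparkContext;

    public class SumExample {
        // A sequential fragment of the kind Casper identifies for translation.
        static int sequentialSum(List<Integer> data) {
            int sum = 0;
            for (int x : data) {
                sum += x; // an associative, commutative fold, hence MapReduce-friendly
            }
            return sum;
        }

        public static void main(String[] args) {
            List<Integer> data = Arrays.asList(1, 2, 3, 4, 5);
            SparkConf conf = new SparkConf().setAppName("SumExample").setMaster("local[*]");
            try (JavaSparkContext sc = new JavaSparkContext(conf)) {
                // A MapReduce-style equivalent using the Spark Java API:
                // distribute the data, then reduce with the same operator.
                int sum = sc.parallelize(data).reduce(Integer::sum);
                System.out.println(sum == sequentialSum(data)); // prints: true
            }
        }
    }

    The rewrite is only valid because the fold operator is associative and commutative, which is the kind of property Casper's verification step must establish before emitting such code.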

    New Directions in Cloud Programming

    Nearly twenty years after the launch of AWS, it remains difficult for most developers to harness the enormous potential of the cloud. In this paper we lay out an agenda for a new generation of cloud programming research aimed at bringing research ideas to programmers in an evolutionary fashion. Key to our approach is a separation of distributed programs into a PACT of four facets: Program semantics, Availability, Consistency, and Targets of optimization. We propose to migrate developers gradually to PACT programming by lifting familiar code into our more declarative level of abstraction. We then propose a multi-stage compiler that emits human-readable code at each stage, which can be hand-tuned by developers seeking more control. Our agenda raises numerous research challenges across multiple areas, including language design, query optimization, transactions, distributed consistency, compilers, and program synthesis.

    mlirSynth: Automatic, Retargetable Program Raising in Multi-Level IR using Program Synthesis

    MLIR is an emerging compiler infrastructure for modern hardware, but existing programs cannot take advantage of MLIR's high-performance compilation if they are described in lower-level general-purpose languages. Consequently, to avoid rewriting programs manually, there have been efforts to automatically raise lower-level dialects to higher-level ones within MLIR. However, current methods rely on manually defined raising rules, which limit their applicability and make them challenging to maintain as MLIR dialects evolve. We present mlirSynth, a novel approach that translates programs from lower-level MLIR dialects to high-level ones without manually defined rules. Instead, it uses the available dialect definitions to construct a program space and searches it effectively using type constraints and equivalences. We demonstrate its effectiveness by raising C programs to two distinct high-level MLIR dialects, which enables us to use existing high-level, dialect-specific compilation flows. On Polybench, we show greater coverage than previous approaches, resulting in geomean speedups of 2.5x (Intel) and 3.4x (AMD) over state-of-the-art compilation flows. mlirSynth also enables retargetability to domain-specific accelerators, resulting in a geomean speedup of 21.6x on a TPU.
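
    As a rough illustration of search over a space of candidate operations checked for equivalence (a deliberately tiny, depth-one analogue of what mlirSynth does over real MLIR dialect definitions; the operator table and the test-based equivalence check below are hypothetical stand-ins), consider:

    import java.util.List;
    import java.util.function.IntBinaryOperator;

    public class EnumerativeRaiser {
        // A typed candidate operation, standing in for a high-level dialect op.
        record Op(String name, IntBinaryOperator fn) {}

        public static void main(String[] args) {
            // "Dialect definition": the available high-level ops and their semantics.
            List<Op> ops = List.of(
                new Op("add", (a, b) -> a + b),
                new Op("mul", (a, b) -> a * b),
                new Op("max", Math::max));

            // Observed behaviour of the lower-level program on sample inputs;
            // mlirSynth additionally prunes the search space with type constraints.
            int[][] inputs = {{1, 2}, {3, 5}, {7, 7}};
            int[] expected = {2, 15, 49}; // the reference program computes a * b

            for (Op op : ops) {
                boolean equivalent = true;
                for (int i = 0; i < inputs.length; i++) {
                    if (op.fn().applyAsInt(inputs[i][0], inputs[i][1]) != expected[i]) {
                        equivalent = false;
                        break;
                    }
                }
                if (equivalent) {
                    System.out.println("raised to high-level op: " + op.name()); // mul
                }
            }
        }
    }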

    Big Data and the Internet of Things

    Advances in sensing and computing capabilities are making it possible to embed increasing computing power in small devices. This has enabled sensing devices not just to passively capture data at very high resolution but also to take sophisticated actions in response. Combined with advances in communication, this is resulting in an ecosystem of highly interconnected devices referred to as the Internet of Things (IoT). In conjunction, advances in machine learning have allowed building models on these ever-increasing amounts of data. Consequently, devices all the way from heavy assets such as aircraft engines to wearables such as health monitors can now not only generate massive amounts of data but also draw on aggregate analytics to "improve" their performance over time. Big data analytics has been identified as a key enabler for the IoT. In this chapter, we discuss various avenues of the IoT where big data analytics either is already making a significant impact or is on the cusp of doing so. We also discuss social implications and areas of concern. Comment: 33 pages. Draft of an upcoming book chapter in Japkowicz and Stefanowski (eds.), Big Data Analysis: New Algorithms for a New Society, Springer Series on Studies in Big Data, to appear

    Leveraging Automated Unit Tests for Unsupervised Code Translation

    With little to no parallel data available for programming languages, unsupervised methods are well-suited to source code translation. However, the majority of unsupervised machine translation approaches rely on back-translation, a method developed in the context of natural language translation that inherently involves training on noisy inputs. Unfortunately, source code is highly sensitive to small changes; a single wrong token can result in compilation failures or erroneous programs, unlike natural languages where small inaccuracies may not change the meaning of a sentence. To address this issue, we propose to leverage an automated unit-testing system to filter out invalid translations, thereby creating a fully tested parallel corpus. We found that fine-tuning an unsupervised model with this filtered data set significantly reduces the noise in the generated translations, comfortably outperforming the state of the art for all language pairs studied. In particular, for Java→Python and Python→C++ we outperform the best previous methods by more than 16% and 24% respectively, reducing the error rate by more than 35%.
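
    The filtering idea can be sketched in a few lines. Below, candidate translations are modeled as in-process Java functions and unit tests as input/output checks against the source function; in the actual system, generated unit tests are executed against translated programs in the target language, and all names here are hypothetical:

    import java.util.List;
    import java.util.function.IntUnaryOperator;
    import java.util.stream.Collectors;

    public class TestFilteredCorpus {
        // Keep only candidate translations that agree with the source function
        // on every unit-test input; survivors form the tested parallel corpus.
        static List<IntUnaryOperator> filterByTests(IntUnaryOperator source,
                List<IntUnaryOperator> candidates, int[] testInputs) {
            return candidates.stream()
                .filter(cand -> {
                    for (int x : testInputs) {
                        if (cand.applyAsInt(x) != source.applyAsInt(x)) {
                            return false; // failing translation: filtered out
                        }
                    }
                    return true;
                })
                .collect(Collectors.toList());
        }

        public static void main(String[] args) {
            IntUnaryOperator source = x -> x * x;   // original function
            List<IntUnaryOperator> candidates = List.of(
                x -> x * x,                         // faithful translation
                x -> x + x);                        // noisy back-translation artifact
            int[] tests = {0, 1, 2, 3};
            System.out.println(filterByTests(source, candidates, tests).size()); // 1
        }
    }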

    Software engineering perspectives on physiological computing

    Physiological computing is an interesting and promising concept for widening the communication channel between (human) users and computers, thereby increasing software systems' contextual awareness and rendering them smarter than they are today. Using physiological inputs in pervasive computing systems allows re-balancing the information asymmetry between the human user and the computer system: while pervasive computing systems are well able to flood the user with information and sensory input (such as sounds, lights, and visual animations), users only have a very narrow input channel to computing systems, most of the time restricted to keyboards, mice, touchscreens, accelerometers, and GPS receivers (e.g., through smartphone usage). Interestingly, this information asymmetry often forces the user to submit to the quirks of the computing system to achieve their goals - for example, users may have to provide information through a narrow, time-consuming input mode when the system could sense it implicitly from the human body. Physiological computing is a way to circumvent these limitations; however, systematic means for developing and moulding physiological computing applications into software are still lacking. This thesis proposes a methodological approach to the creation of physiological computing applications based on component-based software engineering. Components help impose a clear structure on software systems in general and can thus be used for physiological computing systems as well. As an additional benefit, components allow physiological computing systems to leverage reconfigurations as a means to control and adapt their own behaviour. This adaptation can be used to adjust the behaviour both to the human user and to the available computing environment in terms of resources and devices - an activity that is crucial for complex physiological computing systems. With the help of components and reconfigurations, it is possible to structure the functionality of physiological computing applications in a way that makes them manageable and extensible, allowing a stepwise and systematic extension of a system's intelligence. Using reconfigurations entails a larger issue, however: understanding and fully capturing the behaviour of a system under reconfiguration is challenging, as the system may change its structure in ways that are difficult to predict. Therefore, this thesis also introduces a means for the formal verification of reconfigurations based on assume-guarantee contracts. With the proposed assume-guarantee contract framework, it is possible to prove that a given system design (including component behaviours and reconfiguration specifications) satisfies real-time properties expressed as assume-guarantee contracts, using a variant of real-time linear temporal logic introduced in this thesis: metric interval temporal logic for reconfigurable systems. Finally, this thesis embeds both the practical approach to the realisation of physiological computing systems and the formal verification of reconfigurations into Scrum, a modern and agile software development methodology. The surrounding methodological approach is intended to provide a frame for the systematic development of physiological computing systems, from first psychological findings to a working software system with satisfactory functionality and software quality.
    By integrating practical and theoretical aspects of software engineering into a self-contained development methodology, this thesis proposes a roadmap and guidelines for the creation of new physiological computing applications.
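
    A minimal sketch of the component-and-reconfiguration idea (with hypothetical names; the thesis's actual framework includes formally specified ports, behaviours, and contracts) might look as follows in Java:

    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;

    public class ReconfigurationSketch {
        interface HeartRateSensor { // a component's provided port
            int bpm();
        }

        static class ChestStrapSensor implements HeartRateSensor {
            public int bpm() { return 72; } // placeholder reading
        }

        static class WristbandSensor implements HeartRateSensor {
            public int bpm() { return 75; } // placeholder reading
        }

        // Component bindings; a reconfiguration rebinds a slot at runtime.
        private final Map<String, HeartRateSensor> bindings = new ConcurrentHashMap<>();

        void reconfigure(String slot, HeartRateSensor replacement) {
            // Structural changes like this are what the thesis's
            // assume-guarantee contracts are meant to reason about.
            bindings.put(slot, replacement);
        }

        int currentBpm() {
            return bindings.get("hr").bpm();
        }

        public static void main(String[] args) {
            ReconfigurationSketch app = new ReconfigurationSketch();
            app.reconfigure("hr", new ChestStrapSensor());
            System.out.println(app.currentBpm()); // 72
            app.reconfigure("hr", new WristbandSensor()); // e.g. chest strap unplugged
            System.out.println(app.currentBpm()); // 75
        }
    }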