Tachyon: Reliable, Memory Speed Storage for Cluster Computing Frameworks

Author: Ananthanarayanan G.
Baker J.
Bent J.
Dean J.
Elnozahy E.
Gunda P. K.
Guo P. J.
Nightingale E. B.
Plank J. S.
Power R.
Weil S. A.
Yu Y.
Zaharia M.
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2014

Tachyon is a distributed file system enabling reliable data sharing at memory speed across cluster computing frameworks. While caching today improves read workloads, writes are either network or disk bound, as replication is used for fault-tolerance. Tachyon eliminates this bottleneck by pushing lineage, a well-known technique, into the storage layer. The key challenge in making a long-running lineage-based storage system is timely data recovery in case of failures. Tachyon addresses this issue by introducing a checkpointing algorithm that guarantees bounded recovery cost and resource allocation strategies for recomputation under commonly used resource schedulers. Our evaluation shows that Tachyon outperforms in-memory HDFS by 110x for writes. It also improves the end-to-end latency of a realistic workflow by 4x. Tachyon is open source and is deployed at multiple companies.

National Science Foundation (U.S.) (CISE Expeditions Award CCF-1139158); Lawrence Berkeley National Laboratory (Award 7076018); United States. Defense Advanced Research Projects Agency (XData Award FA8750-12-2-0331)
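The lineage idea in the abstract above can be illustrated with a toy sketch. This is not Tachyon's implementation or API: the class `LineageStore` and its methods are hypothetical names, source data is assumed to be durable, and the checkpointing and scheduler-aware recomputation described in the abstract are left out. The sketch only shows the core trade: instead of replicating a derived dataset, the store remembers how it was produced and recomputes it on demand after a loss.

```python
from typing import Callable, Dict, List, Tuple


class LineageStore:
    """Toy in-memory store that records how each derived dataset was built.

    Hypothetical illustration only; names and behavior are assumptions.
    """

    def __init__(self) -> None:
        self.data: Dict[str, list] = {}
        # Maps a derived dataset name to (function, names of its inputs).
        self.lineage: Dict[str, Tuple[Callable[..., list], List[str]]] = {}

    def put_source(self, name: str, values: list) -> None:
        # Source data has no lineage; it is assumed to be durable elsewhere.
        self.data[name] = list(values)

    def derive(self, name: str, fn: Callable[..., list], inputs: List[str]) -> None:
        # Record the lineage first, then materialize the result in memory.
        self.lineage[name] = (fn, list(inputs))
        self.data[name] = fn(*(self.get(i) for i in inputs))

    def evict(self, name: str) -> None:
        # Simulate losing the in-memory copy (for example, a node failure).
        self.data.pop(name, None)

    def get(self, name: str) -> list:
        if name not in self.data:
            if name not in self.lineage:
                raise KeyError(f"{name} is lost and has no recorded lineage")
            # Recompute from lineage, recursively recovering lost inputs.
            fn, inputs = self.lineage[name]
            self.data[name] = fn(*(self.get(i) for i in inputs))
        return self.data[name]


store = LineageStore()
store.put_source("raw", [1, 2, 3])
store.derive("doubled", lambda xs: [2 * x for x in xs], ["raw"])
store.derive("total", lambda xs: [sum(xs)], ["doubled"])

# Lose both derived datasets, then recover them through lineage.
store.evict("doubled")
store.evict("total")
recovered = store.get("total")
```

Recovery cost in this sketch grows with the length of the lineage chain, which is the problem the abstract says a checkpointing algorithm is meant to bound.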

CiteSeerX

Incremental elasticity for array databases

Author: Ang K. H.
de Witt S.
Ganesan P.
P.
Stonebraker M.
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/06/2014

Relational databases benefit significantly from elasticity, whereby they execute on a set of changing hardware resources provisioned to match their storage and processing requirements. Such flexibility is especially attractive for scientific databases because their users often have a no-overwrite storage model, in which they delete data only when their available space is exhausted. This results in a database that is regularly growing and expanding its hardware proportionally. Also, scientific databases frequently store their data as multidimensional arrays optimized for spatial querying. This brings about several novel challenges in clustered, skew-aware data placement on an elastic shared-nothing database. In this work, we design and implement elasticity for an array database. We address this challenge on two fronts: determining when to expand a database cluster and how to partition the data within it. In both steps we propose incremental approaches, affecting a minimum set of data and nodes, while maintaining high performance. We introduce an algorithm for gradually augmenting an array database's hardware using a closed-loop control system. After the cluster adds nodes, we optimize data placement for n-dimensional arrays. Many of our elastic partitioners incrementally reorganize an array, redistributing data only to new nodes. By combining these two tools, the scientific database efficiently and seamlessly manages its monotonically increasing hardware resources.

Intel Corporation (Science and Technology Center for Big Data)

CiteSeerX

Workflow Provenance: from Modeling to Reporting

Author: Ferdous Rayhan 1992-
Publication venue: 'University of Saskatchewan Library'
Publication date: 12/03/2019

Workflow provenance is a crucial part of a workflow system as it enables data lineage analysis, error tracking, workflow monitoring, usage pattern discovery, and so on. Integrating provenance into a workflow system, or modifying a workflow system to capture or analyze different provenance information, is burdensome and requires extensive development, because provenance mechanisms rely heavily on the modelling, architecture, and design of the workflow system. Various tools and technologies exist for logging events in a software system. Unfortunately, these logging tools and technologies are not designed for capturing and analyzing provenance information. Workflow provenance is not only about logging, but also about retrieving workflow-related information from logs. In this work, we propose a taxonomy of provenance questions and, guided by these questions, we created a workflow programming model, 'ProvMod', with a supporting run-time library to provide automated provenance and log analysis for any workflow system. The design and provenance mechanism of ProvMod is based on recommendations from prominent research and is easy to integrate into any workflow system. ProvMod offers Neo4j graph database support to manage semi-structured heterogeneous JSON logs. The log structure is adaptable to any NoSQL technology. For each provenance question in our taxonomy, ProvMod provides the answer with data visualization using Neo4j and the ELK Stack. Besides analyzing performance from various angles, we demonstrate the ease of integration by integrating ProvMod with Apache Taverna and evaluate ProvMod's usability by engaging users. Finally, we present two Software Engineering research cases (clone detection and architecture extraction) where our proposed model ProvMod and our taxonomy of provenance questions can be applied to discover meaningful insights.

eCommons@USASK

University of Saskatchewan Research Archive

Mobile computing algorithms and systems for user-aware optimization of enterprise applications

Author: Moravapalle Uma Parthavi
Publication venue: 'Georgia Institute of Technology'
Publication date: 29/05/2019

The adoption of mobile devices, particularly smartphones, has grown steadily over the last decade, also permeating the enterprise sector. Enterprises are investing heavily in mobilization, including smartphones and tablets, to improve employee productivity and perform business workflows. Enterprise mobility is expected to be more than a $250 billion market in 2019. Strategies to achieve mobilization include building native apps, using mobile enterprise application platforms (MEAPs), developing with a mobile backend as a service (mBaaS), relying on application virtualization, and employing application refactoring. Enterprises are not yet experiencing the many benefits of mobilization, even though there is great promise. Email and browsing are used heavily, but the practical adoption of enterprise mobility to deliver value beyond these applications is in its infancy and faces barriers. Enterprises deploy few business workflows (<5 percent). Barriers include the heavy task burden in executing workflows on mobile devices, the irrelevance of available mobile features, non-availability of necessary business functions, the high cost of network access, increased security risks associated with smartphones, and increased complexity of mobile application development. This dissertation identifies key barriers to user productivity on smartphones and investigates user-aware solutions that leverage redundancies in user behavior to reduce burden, focusing on the following mobility aspects:

(1) Workflow mobilization: For an employee to successfully perform workflows on a smartphone, a mobile app must be available, and the specific workflow must survive the defeaturization process necessary for mobilization. While typical mobilization strategies offer mobile access to a few heavily-used features, there is a long-tail problem for enterprise application mobilization, in that many application features are left unsupported or are too difficult to access. We propose a do-it-yourself (DIY) platform, Taskr, that allows users at all skill levels to mobilize workflows. Taskr uses remote computing with application refactoring to achieve code-less mobilization of enterprise web applications. It allows for flexible mobile delivery so that users can execute spot tasks through Twitter, email, or a native mobile app. Taskr prototypes from 15 enterprise applications reduce the number of user actions needed to perform workflows by 40 percent compared to the desktop.

(2) Content sharing (enterprise email): An enterprise employee spends an inordinate amount of time on email responding to queries and sharing information with co-workers. This problem is further aggravated on smartphones due to smaller screen real estate. We consider automated information suggestions to ease the burden of reply construction on smartphones. The premise is that a significant portion of the information content in a reply is likely present in prior emails. We first motivate this premise by analyzing both public and private email datasets. We then present Dejavu, a system that relies on inverse document frequency (IDF) and keyword matching to provide relevant suggestions for responses. Evaluation of Dejavu over email datasets shows a 22 percent reduction in the user's typing burden.

(3) Collaboration: Even though many business processes within enterprises require employees to work as a team and collaborate, few mobile apps allow two employees to work on an object from two separate devices simultaneously. We present Peek, a mobile-to-mobile remote computing protocol for collaboration that lets users remotely interact with an application in a responsive manner. Unlike traditional desktop remote computing protocols, Peek provides multi-touch support for ease of operation and a flexible frame compression scheme that accounts for the resource constraints of a smartphone. An Android prototype of Peek shows a 62 percent reduction in the time needed to perform touchscreen actions.

Ph.D.
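The abstract above says Dejavu relies on inverse document frequency (IDF) and keyword matching to suggest reply content from prior emails. The following is a minimal sketch of that general idea only, not Dejavu's actual algorithm: the function names `tokenize`, `idf_table`, and `rank_prior_emails` are hypothetical, the tokenizer is deliberately crude, and the sample emails are invented for illustration.

```python
import math
import re
from collections import Counter
from typing import Dict, List, Tuple


def tokenize(text: str) -> List[str]:
    # Lowercase and split into alphanumeric keywords (a deliberately simple rule).
    return re.findall(r"[a-z0-9]+", text.lower())


def idf_table(docs: List[str]) -> Dict[str, float]:
    # IDF(t) = log(N / df(t)); terms that appear in every email score 0.
    n = len(docs)
    df: Counter = Counter()
    for doc in docs:
        df.update(set(tokenize(doc)))
    return {term: math.log(n / count) for term, count in df.items()}


def rank_prior_emails(query: str, docs: List[str]) -> List[Tuple[float, int]]:
    """Score each prior email by the summed IDF of keywords shared with the query."""
    idf = idf_table(docs)
    query_terms = set(tokenize(query))
    scores = []
    for index, doc in enumerate(docs):
        shared = query_terms & set(tokenize(doc))
        scores.append((sum(idf.get(term, 0.0) for term in shared), index))
    # Highest score first; ties are broken by the original order.
    return sorted(scores, key=lambda pair: (-pair[0], pair[1]))


# Invented sample data, for illustration only.
prior_emails = [
    "The quarterly budget report is attached",
    "Lunch is at noon on Friday",
    "The budget meeting moved to Monday",
]
ranking = rank_prior_emails("Can you send the budget report?", prior_emails)
best_index = ranking[0][1]
```

Weighting matches by IDF means a rare shared keyword such as "report" counts for more than a common one such as "the", so the prior email most specific to the query ranks first.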