
    A Tale of Two Data-Intensive Paradigms: Applications, Abstractions, and Architectures

    Scientific problems that depend on processing large amounts of data require overcoming challenges in multiple areas: managing large-scale data distribution, co-placement and scheduling of data with compute resources, and storing and transferring large volumes of data. We analyze the ecosystems of the two prominent paradigms for data-intensive applications, hereafter referred to as the high-performance computing and the Apache-Hadoop paradigm. We propose a basis of common terminology and functional factors upon which to analyze the two paradigms. We discuss the concept of "Big Data Ogres" and their facets as a means of understanding and characterizing the most common application workloads found across the two paradigms. We then discuss the salient features of the two paradigms, and compare and contrast the two approaches. Specifically, we examine common implementations and approaches of these paradigms, shed light upon the reasons for their current "architecture", and discuss some typical workloads that utilize them. In spite of the significant software distinctions, we believe there is architectural similarity. We discuss the potential integration of different implementations, across the different levels and components. Our comparison progresses from a fully qualitative examination of the two paradigms to a semi-quantitative methodology. We use a simple and broadly used Ogre (K-means clustering) and characterize its performance on a range of representative platforms, covering several implementations from both paradigms. Our experiments provide insight into the relative strengths of the two paradigms. We propose that the set of Ogres will serve as a benchmark to evaluate the two paradigms along different dimensions.
    Comment: 8 pages, 2 figures
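
    Since K-means clustering serves as the paper's representative Ogre, the sketch below shows the kernel being benchmarked: Lloyd's algorithm, the standard K-means iteration. This is a minimal illustrative implementation in NumPy, not the benchmark code used in the paper.

```python
import numpy as np

def kmeans(points, k, iters=20, seed=0):
    """Minimal Lloyd's-algorithm K-means: illustrative sketch of the
    kernel used as the representative 'Ogre' benchmark."""
    rng = np.random.default_rng(seed)
    # Initialize centroids by sampling k distinct input points.
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(iters):
        # Assignment step: label each point with its nearest centroid.
        dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = points[labels == j]
            if len(members):
                centroids[j] = members.mean(axis=0)
    return centroids, labels

# Example: cluster 1,000 random 2-D points into 3 groups.
pts = np.random.default_rng(1).normal(size=(1000, 2))
centers, assignment = kmeans(pts, k=3)
```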

    An Experiment in Interoperable Cryptographic Protocol Implementation Using Automatic Code Generation

    Spi2Java is a tool that enables semi-automatic generation of cryptographic protocol implementations, starting from verified formal models. This paper shows how the latest version of spi2Java has been enhanced in order to enable interoperability of the generated implementations. The new features that have been added to spi2Java are reported here. A case study on the SSH transport layer protocol, along with some experiments and measurements on the generated code, is also provided. The case study shows, with concrete evidence, that reliable and interoperable implementations of standard security protocols can indeed be obtained by using a code generation tool like spi2Java.

    Formally based semi-automatic implementation of an open security protocol

    This paper presents an experiment in which an implementation of the client side of the SSH Transport Layer Protocol (SSH-TLP) was semi-automatically derived according to a model-driven development paradigm that leverages formal methods in order to obtain high correctness assurance. The approach used in the experiment starts with the formalization of the protocol at an abstract level. This model is then formally proved to fulfill the desired secrecy and authentication properties by using the ProVerif prover. Finally, a sound Java implementation is semi-automatically derived from the verified model using an enhanced version of the Spi2Java framework. The resulting implementation correctly interoperates with third-party servers, and its execution time is comparable with that of other manually developed Java SSH-TLP client implementations. This case study demonstrates that the adopted model-driven approach is viable even for a real security protocol, despite the complexity of the models needed in order to achieve an interoperable implementation.
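
    Both papers target the SSH Transport Layer Protocol, where interoperability is exercised from the very first message: per RFC 4253, the session opens with an identification-string exchange. The sketch below is a hand-written illustration of that initial step, not code generated by spi2Java, and the host and port are placeholders.

```python
import socket

# Minimal illustration of the first step of the SSH Transport Layer
# Protocol (RFC 4253): the identification-string exchange. This is a
# hand-written sketch, not spi2Java output.
HOST, PORT = "example.com", 22   # placeholder server

with socket.create_connection((HOST, PORT), timeout=5) as sock:
    # The server sends its identification string; lines end with CR LF.
    server_id = sock.makefile("rb").readline().rstrip(b"\r\n")
    print("server:", server_id.decode("ascii", "replace"))
    # The client replies with its own identification string, after which
    # both sides would negotiate algorithms via SSH_MSG_KEXINIT.
    sock.sendall(b"SSH-2.0-toy_client_0.1\r\n")
```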

    Semantic query languages for knowledge-based web services in a construction context

    Since the early 2000s, various frameworks have been set up to enable web-based collaboration in building projects. Unfortunately, none of these initiatives proved long-lived. Recently, however, the use of web technologies in the building industry has been gaining momentum again, thanks to some promising technologies for reaching a more interoperable BIM practice. Specifically, this relates to (1) Linked Data and Semantic Web technologies, and (2) cloud-based applications. In order to combine these into a network of interlinked applications and datastores, an agreed-upon mechanism for automatic communication and data retrieval needs to be used. Apart from the W3C standard SPARQL, which is often considered too high a threshold for developers to implement, there are some recent GraphQL-based solutions that simplify the querying process and its implementation into web services. In this paper, we review two recent open-source technologies based on GraphQL that enable querying Linked Data on the web: GraphQL-LD and HyperGraphQL.
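
    To make the contrast concrete: a GraphQL-LD query pairs a plain GraphQL query with a JSON-LD context that maps field names to RDF predicates, and the engine compiles it to SPARQL. The sketch below shows the idea in Python, running the roughly equivalent SPARQL with rdflib on a tiny in-memory graph; the mapping shown is an approximation for illustration, not the exact output of a GraphQL-LD engine.

```python
from rdflib import Graph

# GraphQL-LD side: a plain GraphQL query plus a JSON-LD context that
# maps the field name to an RDF predicate.
graphql_query = "{ label }"
jsonld_context = {"label": "http://www.w3.org/2000/01/rdf-schema#label"}

# Roughly equivalent SPARQL, executed here with rdflib.
sparql = """
SELECT ?label WHERE {
  ?s <http://www.w3.org/2000/01/rdf-schema#label> ?label .
}
"""

g = Graph()
g.parse(data="""
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
<http://example.org/wall1> rdfs:label "Load-bearing wall" .
""", format="turtle")

for row in g.query(sparql):
    print(row.label)   # -> "Load-bearing wall"
```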

    A look at cloud architecture interoperability through standards

    Enabling cloud infrastructures to evolve into a transparent platform while preserving integrity raises interoperability issues: how components are connected needs to be addressed. Interoperability requires standard data models and communication encoding technologies compatible with the existing Internet infrastructure. To reduce vendor lock-in, cloud computing must implement universal strategies regarding standards, interoperability and portability. Open standards are of critical importance and need to be embedded into interoperability solutions. Interoperability is determined at the data level as well as the service level, so the corresponding modelling standards and integration solutions should be analysed.

    Enhanced Search for Educational Resources - A Perspective and a Prototype from ccLearn

    Users of search tools who seek educational materials on the Internet are typically presented with either a web-scale search (e.g., Google or Yahoo) or a specialized, site-specific tool. The specialized search tools often rely upon custom data fields, such as user-entered ratings, to provide additional value. As currently designed, these systems are generally too labor-intensive to manage and scale up beyond a single site or set of resources. However, custom (or structured) data of some form is necessary if search outcomes for educational materials are to be improved. For example, design criteria and evaluative metrics are crucial attributes for educational resources, and these currently require human labeling and verification. Thus, one challenge is to design a search tool that capitalizes on available structured data (also called metadata) but is not crippled if the data are missing. This information should be amenable to repurposing by anyone, which means that it must be archived in a manner that can be discovered and leveraged easily. In this paper, we describe the extent to which DiscoverEd, a prototype developed by ccLearn, meets the design challenge of a scalable, enhanced search platform for educational resources. We then explore some of the key challenges regarding enhanced search for topic-specific Internet resources generally. We conclude by illustrating some possible future developments and third-party enhancements to the DiscoverEd prototype.
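
    The core design requirement, capitalize on metadata when present but degrade gracefully when it is missing, can be made concrete with a small scoring function. The sketch below is purely illustrative (the field names, weights, and scoring scheme are invented for the example) and is not DiscoverEd's actual ranking logic.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Resource:
    title: str
    text_relevance: float                         # base full-text match score, 0..1
    rating: Optional[float] = None                # optional curated rating, 0..5
    subjects: List[str] = field(default_factory=list)  # optional subject tags

def score(res: Resource, query_subject: str) -> float:
    """Boost results that carry structured data, but never require it:
    a resource with no metadata still ranks on text relevance alone."""
    s = res.text_relevance
    if res.rating is not None:          # use ratings only when present
        s += 0.2 * (res.rating / 5.0)
    if query_subject in res.subjects:   # use subject tags only when present
        s += 0.3
    return s

results = [
    Resource("Algebra OER", 0.70, rating=4.5, subjects=["math"]),
    Resource("Untagged lecture notes", 0.75),  # no metadata: still searchable
]
for r in sorted(results, key=lambda r: score(r, "math"), reverse=True):
    print(f"{score(r, 'math'):.2f}  {r.title}")
```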