20 research outputs found

    Performance Evaluation of LINQ to HPC and Hadoop for Big Data

    Get PDF
    There is currently considerable enthusiasm around the MapReduce paradigm, and the distributed computing paradigm for analysis of large volumes of data. The Apache Hadoop is the most popular open source implementation of MapReduce model and LINQ to HPC is Microsoft\u27s alternative to open source Hadoop. In this thesis, the performance of LINQ to HPC and Hadoop are compared using different benchmarks. To this end, we identified four benchmarks (Grep, Word Count, Read and Write) that we have run on LINQ to HPC as well as on Hadoop. For each benchmark, we measured each system’s performance metrics (Execution Time, Average CPU utilization and Average Memory utilization) for various degrees of parallelism on clusters of different sizes. Results revealed some interesting trade-offs. For example, LINQ to HPC performed better on three out of the four benchmarks (Grep, Read and Write), whereas Hadoop performed better on the Word Count benchmark. While more research that is extensive has focused on Hadoop, there are not many references to similar research on the LINQ to HPC platform, which is slowly evolving during the writing of this thesis

    Towards Providing Hadoop Storage and Computing as Services

    Get PDF

    Reinventing the Social Scientist and Humanist in the Era of Big Data

    Get PDF
    This book explores the big data evolution by interrogating the notion that big data is a disruptive innovation that appears to be challenging existing epistemologies in the humanities and social sciences. Exploring various (controversial) facets of big data such as ethics, data power, and data justice, the book attempts to clarify the trajectory of the epistemology of (big) data-driven science in the humanities and social sciences

    Operating policies for energy efficient large scale computing

    Get PDF
    PhD ThesisEnergy costs now dominate IT infrastructure total cost of ownership, with datacentre operators predicted to spend more on energy than hardware infrastructure in the next five years. With Western European datacentre power consumption estimated at 56 TWh/year in 2007 and projected to double by 2020, improvements in energy efficiency of IT operations is imperative. The issue is further compounded by social and political factors and strict environmental legislation governing organisations. One such example of large IT systems includes high-throughput cycle stealing distributed systems such as HTCondor and BOINC, which allow organisations to leverage spare capacity on existing infrastructure to undertake valuable computation. As a consequence of increased scrutiny of the energy impact of these systems, aggressive power management policies are often employed to reduce the energy impact of institutional clusters, but in doing so these policies severely restrict the computational resources available for high-throughput systems. These policies are often configured to quickly transition servers and end-user cluster machines into low power states after only short idle periods, further compounding the issue of reliability. In this thesis, we evaluate operating policies for energy efficiency in large-scale computing environments by means of trace-driven discrete event simulation, leveraging real-world workload traces collected within Newcastle University. The major contributions of this thesis are as follows: i) Evaluation of novel energy efficient management policies for a decentralised peer-to-peer (P2P) BitTorrent environment. ii) Introduce a novel simulation environment for the evaluation of energy efficiency of large scale high-throughput computing systems, and propose a generalisable model of energy consumption in high-throughput computing systems. iii iii) Proposal and evaluation of resource allocation strategies for energy consumption in high-throughput computing systems for a real workload. iv) Proposal and evaluation for a realworkload ofmechanisms to reduce wasted task execution within high-throughput computing systems to reduce energy consumption. v) Evaluation of the impact of fault tolerance mechanisms on energy consumption

    e-Skills: The International dimension and the Impact of Globalisation - Final Report 2014

    Get PDF
    In today’s increasingly knowledge-based economies, new information and communication technologies are a key engine for growth fuelled by the innovative ideas of highly - skilled workers. However, obtaining adequate quantities of employees with the necessary e-skills is a challenge. This is a growing international problem with many countries having an insufficient numbers of workers with the right e-Skills. For example: Australia: “Even though there’s 10,000 jobs a year created in IT, there are only 4500 students studying IT at university, and not all of them graduate” (Talevski and Osman, 2013). Brazil: “Brazil’s ICT sector requires about 78,000 [new] people by 2014. But, according to Brasscom, there are only 33,000 youths studying ICT related courses in the country” (Ammachchi, 2012). Canada: “It is widely acknowledged that it is becoming inc reasingly difficult to recruit for a variety of critical ICT occupations –from entry level to seasoned” (Ticoll and Nordicity, 2012). Europe: It is estimated that there will be an e-skills gap within Europe of up to 900,000 (main forecast scenario) ICT pr actitioners by 2020” (Empirica, 2014). Japan: It is reported that 80% of IT and user companies report an e-skills shortage (IPA, IT HR White Paper, 2013) United States: “Unlike the fiscal cliff where we are still peering over the edge, we careened over the “IT Skills Cliff” some years ago as our economy digitalized, mobilized and further “technologized”, and our IT skilled labour supply failed to keep up” (Miano, 2013)

    Cybersecurity issues in software architectures for innovative services

    Get PDF
    The recent advances in data center development have been at the basis of the widespread success of the cloud computing paradigm, which is at the basis of models for software based applications and services, which is the "Everything as a Service" (XaaS) model. According to the XaaS model, service of any kind are deployed on demand as cloud based applications, with a great degree of flexibility and a limited need for investments in dedicated hardware and or software components. This approach opens up a lot of opportunities, for instance providing access to complex and widely distributed applications, whose cost and complexity represented in the past a significant entry barrier, also to small or emerging businesses. Unfortunately, networking is now embedded in every service and application, raising several cybersecurity issues related to corruption and leakage of data, unauthorized access, etc. However, new service-oriented architectures are emerging in this context, the so-called services enabler architecture. The aim of these architectures is not only to expose and give the resources to these types of services, but it is also to validate them. The validation includes numerous aspects, from the legal to the infrastructural ones e.g., but above all the cybersecurity threats. A solid threat analysis of the aforementioned architecture is therefore necessary, and this is the main goal of this thesis. This work investigate the security threats of the emerging service enabler architectures, providing proof of concepts for these issues and the solutions too, based on several use-cases implemented in real world scenarios

    Unleashing Innovation and Entrepreneurship in Europe: People, Places and Policies. Report of a CEPS Task Force February 2017

    Get PDF
    This report sets out the elements for the design of a streamlined and future-proof policy on innovation and entrepreneurship in Europe. It is the result of a collective effort led by CEPS, which formed a Task Force on Innovation and Entrepreneurship in the EU, composed of authoritative scholars, industry experts, entrepreneurs, practitioners and representatives of EU and international institutions. The result of these deliberations is a set of policy recommendations aimed at improving the overall environment and approach for entrepreneurship and innovation in Europe and a new paradigmatic understanding of the role that innovation and entrepreneurship can and should play within the overall context of EU policy. These recommendations are based on a new, multi-dimensional approach to both innovation and entrepreneurship as social phenomena and to the policies that are meant to promote them

    DRIVE: A Distributed Economic Meta-Scheduler for the Federation of Grid and Cloud Systems

    No full text
    The computational landscape is littered with islands of disjoint resource providers including commercial Clouds, private Clouds, national Grids, institutional Grids, clusters, and data centers. These providers are independent and isolated due to a lack of communication and coordination, they are also often proprietary without standardised interfaces, protocols, or execution environments. The lack of standardisation and global transparency has the effect of binding consumers to individual providers. With the increasing ubiquity of computation providers there is an opportunity to create federated architectures that span both Grid and Cloud computing providers effectively creating a global computing infrastructure. In order to realise this vision, secure and scalable mechanisms to coordinate resource access are required. This thesis proposes a generic meta-scheduling architecture to facilitate federated resource allocation in which users can provision resources from a range of heterogeneous (service) providers. Efficient resource allocation is difficult in large scale distributed environments due to the inherent lack of centralised control. In a Grid model, local resource managers govern access to a pool of resources within a single administrative domain but have only a local view of the Grid and are unable to collaborate when allocating jobs. Meta-schedulers act at a higher level able to submit jobs to multiple resource managers, however they are most often deployed on a per-client basis and are therefore concerned with only their allocations, essentially competing against one another. In a federated environment the widespread adoption of utility computing models seen in commercial Cloud providers has re-motivated the need for economically aware meta-schedulers. Economies provide a way to represent the different goals and strategies that exist in a competitive distributed environment. The use of economic allocation principles effectively creates an open service market that provides efficient allocation and incentives for participation. The major contributions of this thesis are the architecture and prototype implementation of the DRIVE meta-scheduler. DRIVE is a Virtual Organisation (VO) based distributed economic metascheduler in which members of the VO collaboratively allocate services or resources. Providers joining the VO contribute obligation services to the VO. These contributed services are in effect membership “dues” and are used in the running of the VOs operations – for example allocation, advertising, and general management. DRIVE is independent from a particular class of provider (Service, Grid, or Cloud) or specific economic protocol. This independence enables allocation in federated environments composed of heterogeneous providers in vastly different scenarios. Protocol independence facilitates the use of arbitrary protocols based on specific requirements and infrastructural availability. For instance, within a single organisation where internal trust exists, users can achieve maximum allocation performance by choosing a simple economic protocol. In a global utility Grid no such trust exists. The same meta-scheduler architecture can be used with a secure protocol which ensures the allocation is carried out fairly in the absence of trust. DRIVE establishes contracts between participants as the result of allocation. A contract describes individual requirements and obligations of each party. A unique two stage contract negotiation protocol is used to minimise the effect of allocation latency. In addition due to the co-op nature of the architecture and the use of secure privacy preserving protocols, DRIVE can be deployed in a distributed environment without requiring large scale dedicated resources. This thesis presents several other contributions related to meta-scheduling and open service markets. To overcome the perceived performance limitations of economic systems four high utilisation strategies have been developed and evaluated. Each strategy is shown to improve occupancy, utilisation and profit using synthetic workloads based on a production Grid trace. The gRAVI service wrapping toolkit is presented to address the difficulty web enabling existing applications. The gRAVI toolkit has been extended for this thesis such that it creates economically aware (DRIVE-enabled) services that can be transparently traded in a DRIVE market without requiring developer input. The final contribution of this thesis is the definition and architecture of a Social Cloud – a dynamic Cloud computing infrastructure composed of virtualised resources contributed by members of a Social network. The Social Cloud prototype is based on DRIVE and highlights the ease in which dynamic DRIVE markets can be created and used in different domains
    corecore