Performance Evaluation of LINQ to HPC and Hadoop for Big Data
There is currently considerable enthusiasm around the MapReduce paradigm and the distributed computing paradigm for the analysis of large volumes of data. Apache Hadoop is the most popular open source implementation of the MapReduce model, and LINQ to HPC is Microsoft's alternative to open source Hadoop. In this thesis, the performance of LINQ to HPC and Hadoop is compared using different benchmarks.
To this end, we identified four benchmarks (Grep, Word Count, Read and Write) that we ran on LINQ to HPC as well as on Hadoop. For each benchmark, we measured each system's performance metrics (Execution Time, Average CPU utilization and Average Memory utilization) for various degrees of parallelism on clusters of different sizes. Results revealed some interesting trade-offs. For example, LINQ to HPC performed better on three out of the four benchmarks (Grep, Read and Write), whereas Hadoop performed better on the Word Count benchmark. While extensive research has focused on Hadoop, there are not many references to similar research on the LINQ to HPC platform, which is slowly evolving during the writing of this thesis.
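To make two of the benchmarks named above concrete, here is a minimal single-process sketch of Word Count and Grep written in a map/reduce style. It is an illustration only, not the benchmark code used in the thesis; the function names and the word-splitting rule are assumptions, and nothing here uses Hadoop or LINQ to HPC APIs.

```python
import re
from collections import defaultdict


def map_word_count(line):
    """Map step: emit a (word, 1) pair for every word in one input line."""
    # Assumption: a "word" is a run of letters or apostrophes, lower-cased.
    return [(word.lower(), 1) for word in re.findall(r"[A-Za-z']+", line)]


def reduce_counts(pairs):
    """Reduce step: sum the counts emitted for each distinct key."""
    totals = defaultdict(int)
    for key, count in pairs:
        totals[key] += count
    return dict(totals)


def word_count(lines):
    """Run the map step over every line, then reduce all emitted pairs."""
    pairs = [pair for line in lines for pair in map_word_count(line)]
    return reduce_counts(pairs)


def grep(lines, pattern):
    """Grep in miniature: keep only the lines that match the pattern."""
    regex = re.compile(pattern)
    return [line for line in lines if regex.search(line)]
```

In a real cluster run the map calls would be spread across nodes and the pairs shuffled by key before reduction; this sketch keeps everything in one process so the data flow is easy to follow.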
Reinventing the Social Scientist and Humanist in the Era of Big Data
This book explores the big data evolution by interrogating the notion that big data is a disruptive innovation that appears to be challenging existing epistemologies in the humanities and social sciences. Exploring various (controversial) facets of big data such as ethics, data power, and data justice, the book attempts to clarify the trajectory of the epistemology of (big) data-driven science in the humanities and social sciences
Operating policies for energy efficient large scale computing
PhD Thesis. Energy costs now dominate IT infrastructure total cost of ownership, with datacentre
operators predicted to spend more on energy than hardware infrastructure in the
next five years. With Western European datacentre power consumption estimated at
56 TWh/year in 2007 and projected to double by 2020, improvements in energy efficiency
of IT operations are imperative. The issue is further compounded by social and
political factors and strict environmental legislation governing organisations.
One example of such large IT systems is high-throughput cycle-stealing distributed
systems such as HTCondor and BOINC, which allow organisations to leverage
spare capacity on existing infrastructure to undertake valuable computation.
As a consequence of increased scrutiny of the energy impact of these systems, aggressive
power management policies are often employed to reduce the energy impact
of institutional clusters, but in doing so these policies severely restrict the computational
resources available for high-throughput systems. These policies are often configured
to quickly transition servers and end-user cluster machines into low power
states after only short idle periods, further compounding the issue of reliability.
In this thesis, we evaluate operating policies for energy efficiency in large-scale
computing environments by means of trace-driven discrete event simulation, leveraging
real-world workload traces collected within Newcastle University.
The major contributions of this thesis are as follows:
i) Evaluation of novel energy efficient management policies for a decentralised
peer-to-peer (P2P) BitTorrent environment.
ii) Introduction of a novel simulation environment for the evaluation of energy efficiency
of large scale high-throughput computing systems, and proposal of a generalisable
model of energy consumption in high-throughput computing systems.
iii) Proposal and evaluation of resource allocation strategies for energy consumption
in high-throughput computing systems for a real workload.
iv) Proposal and evaluation, for a real workload, of mechanisms to reduce wasted task
execution within high-throughput computing systems to reduce energy consumption.
v) Evaluation of the impact of fault tolerance mechanisms on energy consumption
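The trade-off described above, where aggressive idle timeouts save energy at the cost of available capacity, can be illustrated with a much-simplified trace replay. This is a sketch under stated assumptions, not the simulator from the thesis: the function name, the power figures (100 W active, 60 W idle, 5 W asleep) and the decision to ignore wake-up latency and wake-up energy are all hypothetical.

```python
def energy_for_trace(busy_intervals, horizon, idle_timeout,
                     p_active=100.0, p_idle=60.0, p_sleep=5.0):
    """Return the energy in watt-seconds used by one machine over [0, horizon].

    busy_intervals: sorted, non-overlapping (start, end) times in seconds.
    Policy: after idle_timeout seconds without work the machine sleeps
    until the next job arrives. Wake-up cost is ignored (assumption).
    """
    energy = 0.0
    clock = 0.0
    # A zero-length sentinel interval at the horizon closes the final idle gap.
    for start, end in list(busy_intervals) + [(horizon, horizon)]:
        gap = start - clock
        awake_idle = min(gap, idle_timeout)       # idle time before sleeping
        energy += awake_idle * p_idle             # awake but idle
        energy += (gap - awake_idle) * p_sleep    # remainder of the gap asleep
        energy += (end - start) * p_active        # executing work
        clock = end
    return energy
```

Replaying the same trace with different `idle_timeout` values gives a first-order comparison of policies; a fuller model would also charge for wake-up delays and for tasks lost when a machine sleeps.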
e-Skills: The International dimension and the Impact of Globalisation - Final Report 2014
In today's increasingly knowledge-based economies, new information and communication technologies are a key engine for growth fuelled by the innovative ideas of highly skilled workers. However, obtaining adequate quantities of employees with the necessary e-skills is a challenge. This is a growing international problem with many countries having an insufficient number of workers with the right e-skills.
For example:
Australia: "Even though there's 10,000 jobs a year created in IT, there are only 4500 students studying IT at university, and not all of them graduate" (Talevski and Osman, 2013).
Brazil: "Brazil's ICT sector requires about 78,000 [new] people by 2014. But, according to Brasscom, there are only 33,000 youths studying ICT related courses in the country" (Ammachchi, 2012).
Canada: "It is widely acknowledged that it is becoming increasingly difficult to recruit for a variety of critical ICT occupations - from entry level to seasoned" (Ticoll and Nordicity, 2012).
Europe: "It is estimated that there will be an e-skills gap within Europe of up to 900,000 (main forecast scenario) ICT practitioners by 2020" (Empirica, 2014).
Japan: It is reported that 80% of IT and user companies report an e-skills shortage (IPA, IT HR White Paper, 2013).
United States: "Unlike the fiscal cliff where we are still peering over the edge, we careened over the 'IT Skills Cliff' some years ago as our economy digitalized, mobilized and further 'technologized', and our IT skilled labour supply failed to keep up" (Miano, 2013).
Cybersecurity issues in software architectures for innovative services
Recent advances in data center development underpin the widespread success of the cloud computing paradigm, which in turn underlies the "Everything as a Service" (XaaS) model for software-based applications and services. According to the XaaS model, services of any kind are deployed on demand as cloud-based applications, with a great degree of flexibility and a limited need for investment in dedicated hardware and/or software components. This approach opens up many opportunities, for instance giving small or emerging businesses access to complex and widely distributed applications whose cost and complexity were, in the past, a significant entry barrier. Unfortunately, networking is now embedded in every service and application, raising several cybersecurity issues related to corruption and leakage of data, unauthorized access, etc. However, new service-oriented architectures are emerging in this context, the so-called service enabler architectures. The aim of these architectures is not only to expose and provide resources to these types of services, but also to validate them. The validation covers numerous aspects, from legal to infrastructural ones, but above all cybersecurity threats. A solid threat analysis of the aforementioned architectures is therefore necessary, and this is the main goal of this thesis. This work investigates the security threats of the emerging service enabler architectures, providing proofs of concept for these issues and for their solutions, based on several use cases implemented in real-world scenarios.
Unleashing Innovation and Entrepreneurship in Europe: People, Places and Policies. Report of a CEPS Task Force February 2017
This report sets out the elements for the design of a streamlined and future-proof policy on innovation and entrepreneurship in Europe. It is the result of a collective effort led by CEPS, which formed a Task Force on Innovation and Entrepreneurship in the EU, composed of authoritative scholars, industry experts, entrepreneurs, practitioners and representatives of EU and international institutions. The result of these deliberations is a set of policy recommendations aimed at improving the overall environment and approach for entrepreneurship and innovation in Europe and a new paradigmatic understanding of the role that innovation and entrepreneurship can and should play within the overall context of EU policy. These recommendations are based on a new, multi-dimensional approach to both innovation and entrepreneurship as social phenomena and to the policies that are meant to promote them
DRIVE: A Distributed Economic Meta-Scheduler for the Federation of Grid and Cloud Systems
The computational landscape is littered with islands of disjoint resource providers including
commercial Clouds, private Clouds, national Grids, institutional Grids, clusters, and data centers.
These providers are independent and isolated due to a lack of communication and coordination;
they are also often proprietary without standardised interfaces, protocols, or execution environments.
The lack of standardisation and global transparency has the effect of binding consumers
to individual providers. With the increasing ubiquity of computation providers there is an opportunity
to create federated architectures that span both Grid and Cloud computing providers,
effectively creating a global computing infrastructure. In order to realise this vision, secure and
scalable mechanisms to coordinate resource access are required. This thesis proposes a generic
meta-scheduling architecture to facilitate federated resource allocation in which users can provision
resources from a range of heterogeneous (service) providers.
Efficient resource allocation is difficult in large scale distributed environments due to the inherent
lack of centralised control. In a Grid model, local resource managers govern access to a
pool of resources within a single administrative domain but have only a local view of the Grid
and are unable to collaborate when allocating jobs. Meta-schedulers act at a higher level and are able to
submit jobs to multiple resource managers; however, they are most often deployed on a per-client
basis and are therefore concerned with only their allocations, essentially competing against one
another. In a federated environment the widespread adoption of utility computing models seen in
commercial Cloud providers has re-motivated the need for economically aware meta-schedulers.
Economies provide a way to represent the different goals and strategies that exist in a competitive
distributed environment. The use of economic allocation principles effectively creates an
open service market that provides efficient allocation and incentives for participation.
The major contributions of this thesis are the architecture and prototype implementation of the
DRIVE meta-scheduler. DRIVE is a Virtual Organisation (VO) based distributed economic meta-scheduler
in which members of the VO collaboratively allocate services or resources. Providers
joining the VO contribute obligation services to the VO. These contributed services are in effect
membership "dues" and are used in the running of the VO's operations, for example allocation,
advertising, and general management. DRIVE is independent from a particular class of provider
(Service, Grid, or Cloud) or specific economic protocol. This independence enables allocation in
federated environments composed of heterogeneous providers in vastly different scenarios. Protocol
independence facilitates the use of arbitrary protocols based on specific requirements and
infrastructural availability. For instance, within a single organisation where internal trust exists,
users can achieve maximum allocation performance by choosing a simple economic protocol.
In a global utility Grid no such trust exists. The same meta-scheduler architecture can be used
with a secure protocol which ensures the allocation is carried out fairly in the absence of trust.
DRIVE establishes contracts between participants as the result of allocation. A contract describes
individual requirements and obligations of each party. A unique two stage contract negotiation
protocol is used to minimise the effect of allocation latency. In addition, due to the co-op nature of
the architecture and the use of secure privacy-preserving protocols, DRIVE can be deployed in a
distributed environment without requiring large scale dedicated resources.
This thesis presents several other contributions related to meta-scheduling and open service
markets. To overcome the perceived performance limitations of economic systems, four high-utilisation
strategies have been developed and evaluated. Each strategy is shown to improve occupancy,
utilisation and profit using synthetic workloads based on a production Grid trace. The
gRAVI service wrapping toolkit is presented to address the difficulty of web-enabling existing applications.
The gRAVI toolkit has been extended for this thesis such that it creates economically
aware (DRIVE-enabled) services that can be transparently traded in a DRIVE market without requiring
developer input. The final contribution of this thesis is the definition and architecture of
a Social Cloud: a dynamic Cloud computing infrastructure composed of virtualised resources
contributed by members of a Social network. The Social Cloud prototype is based on DRIVE
and highlights the ease with which dynamic DRIVE markets can be created and used in different
domains
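The abstract above does not name the economic protocols DRIVE supports, so the following is only an illustration of what a "simple economic protocol" for allocation could look like: a sealed-bid reverse second-price auction, in which providers submit asking prices, the cheapest provider wins, and the winner is paid the second-lowest asking price. The function name and the provider names in the usage note are hypothetical and do not come from DRIVE.

```python
def reverse_vickrey(bids):
    """Pick a provider using a sealed-bid reverse second-price auction.

    bids: dict mapping provider name -> asking price.
    Returns (winner, price_paid): the lowest bidder wins and is paid the
    second-lowest asking price. Ties are broken by provider name.
    """
    if len(bids) < 2:
        raise ValueError("need at least two bids")
    # Sort by asking price, then by name, so the outcome is deterministic.
    ranked = sorted(bids.items(), key=lambda item: (item[1], item[0]))
    winner = ranked[0][0]
    price_paid = ranked[1][1]
    return winner, price_paid
```

For example, `reverse_vickrey({"grid_a": 5.0, "cloud_b": 3.0, "cluster_c": 4.0})` selects `cloud_b` at a price of 4.0. A protocol for an untrusted setting would additionally need to keep bids private and make the outcome verifiable, which this sketch does not attempt.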