
    Autonomous grid scheduling using probabilistic job runtime forecasting

    Computational Grids are evolving into a global, service-oriented architecture: a universal platform for delivering future computational services to a range of applications of varying complexity and resource requirements. The thesis focuses on developing a new scheduling model for general-purpose, utility clusters based on the concept of user-requested job completion deadlines. In such a system, a user would be able to request each job to finish by a certain deadline, and possibly at a certain monetary cost. Implementing deadline scheduling depends on the ability to predict the execution time of each queued job, and on an adaptive scheduling algorithm able to use those predictions to maximise deadline adherence. The thesis proposes novel solutions to these two problems and documents their implementation in a largely autonomous and self-managing way. The starting point of the work is an extensive analysis of a representative Grid workload, revealing consistent workflow patterns, usage cycles and correlations between the execution times of jobs and the properties commonly collected by the Grid middleware for accounting purposes. An automated approach is proposed to identify these dependencies and use them to partition the highly variable workload into subsets of more consistent and predictable behaviour. A range of time-series forecasting models, applied in this context for the first time, were used to model the job execution times as a function of their historical behaviour and associated properties. Based on the resulting predictions of job runtimes, a novel scheduling algorithm is able to estimate the latest job start time necessary to meet the requested deadline and sort the queue accordingly to minimise the amount of deadline overrun. The testing of the proposed approach was done using an actual job trace collected from a production Grid facility. The best performing execution time predictor (the auto-regressive moving average method), coupled with workload partitioning based on three simultaneous job properties, returned a median absolute percentage error centroid of only 4.75%. This level of prediction accuracy enabled the proposed deadline scheduling method to reduce the average deadline overrun time ten-fold compared to the benchmark batch scheduler. Overall, the thesis demonstrates that deadline scheduling of computational jobs on the Grid is achievable using statistical forecasting of job execution times based on historical information. The proposed approach is easily implementable, substantially self-managing and better matched to the human workflow, making it well suited for implementation in the utility Grids of the future.
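
To make the queue-ordering idea concrete, the following is a minimal sketch (not the thesis's implementation) of deadline-driven ordering: each queued job's latest permissible start time is its requested deadline minus its predicted runtime, and the queue is sorted by that value so the most deadline-critical jobs run first. The class and field names are illustrative only.

```python
from dataclasses import dataclass

@dataclass
class QueuedJob:
    name: str
    deadline: float           # requested completion time (seconds from now)
    predicted_runtime: float  # forecast execution time (seconds)

    @property
    def latest_start(self) -> float:
        # Latest moment the job can start and still meet its deadline.
        return self.deadline - self.predicted_runtime

def order_queue(jobs):
    """Sort jobs so the most deadline-critical (smallest latest-start) run first."""
    return sorted(jobs, key=lambda j: j.latest_start)

if __name__ == "__main__":
    queue = [
        QueuedJob("sim-A", deadline=3600, predicted_runtime=1800),
        QueuedJob("sim-B", deadline=1200, predicted_runtime=900),
        QueuedJob("sim-C", deadline=7200, predicted_runtime=300),
    ]
    for job in order_queue(queue):
        print(f"{job.name}: latest start in {job.latest_start:.0f}s")
```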

    Robustness-Driven Resilience Evaluation of Self-Adaptive Software Systems

    An increasingly important requirement for certain classes of software-intensive systems is the ability to self-adapt their structure and behavior at run-time when reacting to changes that may occur to the system, its environment, or its goals. A major challenge related to self-adaptive software systems is the ability to provide assurances of their resilience when facing changes. Since the components that act as controllers of a target system in these systems incorporate highly complex software, there is a need to analyze the impact that controller failures might have on the services delivered by the system. In this paper, we present a novel approach for evaluating the resilience of self-adaptive software systems by applying robustness testing techniques to the controller to uncover failures that can affect system resilience. The approach for evaluating resilience, which is based on probabilistic model checking, quantifies the probability of satisfaction of system properties when the target system is subject to controller failures. The feasibility of the proposed approach is evaluated in the context of an industrial middleware system used to monitor and manage highly populated networks of devices, which was implemented using the Rainbow framework for architecture-based self-adaptation.
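
The paper quantifies property satisfaction with probabilistic model checking; as a rough illustration of the same question ("with what probability does a system property still hold when the controller fails?"), the sketch below uses a toy Monte Carlo simulation with an assumed failure and violation model. All probabilities, parameters and names are hypothetical, not taken from the paper.

```python
import random

def run_episode(controller_failure_prob: float, steps: int = 100) -> bool:
    """Simulate one run of a self-adaptive system; return True if the
    (illustrative) service property held for the whole run."""
    degraded = False
    for _ in range(steps):
        if random.random() < controller_failure_prob:
            degraded = True                      # controller failure injected
        if degraded and random.random() < 0.05:  # degraded mode may violate property
            return False
    return True

def satisfaction_probability(failure_prob: float, episodes: int = 10_000) -> float:
    """Monte Carlo estimate of P[property satisfied] under a given failure rate."""
    ok = sum(run_episode(failure_prob) for _ in range(episodes))
    return ok / episodes

if __name__ == "__main__":
    for p in (0.0, 0.01, 0.05):
        print(f"controller failure prob {p:.2f}: "
              f"P[property] approx. {satisfaction_probability(p):.3f}")
```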

    A prescriptive analytics approach for energy efficiency in datacentres.

    Given the evolution of Cloud Computing in recent years, users and clients adopting Cloud Computing for both personal and business needs have increased at an unprecedented scale. This has naturally led to increased deployments and implementations of Cloud datacentres across the globe. As a consequence of this growing adoption of Cloud Computing, Cloud datacentres have become massive energy consumers and environmental polluters. Whilst the energy implications of Cloud datacentres are being addressed from various research perspectives, predicting the future trends and behaviours of workloads at the datacentres, and thereby reducing the active server resources, is one particular dimension of green computing gaining the interest of researchers and Cloud providers. However, this involves various practical and analytical challenges imposed by the increased dynamism of Cloud systems. The behavioural characteristics of Cloud workloads and users are still not perfectly clear, which limits the prediction accuracy achieved by existing research in this context. To this end, this thesis presents a comprehensive descriptive analytics of Cloud workload and user behaviours, uncovering the causes and energy-related implications of Cloud Computing. Furthermore, the characteristics of Cloud workloads and users, including latency levels, job heterogeneity, user dynamicity, straggling task behaviours, energy implications of stragglers, job execution and termination patterns and the inherent periodicity among Cloud workload and user behaviours, have been empirically presented. Driven by the descriptive analytics, a novel user behaviour forecasting framework has been developed, aimed at a three-fold forecast of user behaviour: the session duration of users, the anticipated number of submissions and the arrival trend of the incoming workloads. Furthermore, a novel resource optimisation framework has been proposed to provision the optimal level of resources for executing jobs with reduced server energy expenditure and fewer job terminations. This optimisation framework encompasses a resource estimation module to predict the anticipated resource consumption level for the arrived jobs and a classification module to classify tasks based on their resource intensiveness. Both of the proposed frameworks have been verified theoretically and tested experimentally based on Google Cloud trace logs. Experimental analysis demonstrates the effectiveness of the proposed frameworks in terms of the achieved reliability of the forecast results and in reducing the server energy expenditure spent on executing jobs at the datacentres.
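
As a loose illustration of the resource estimation and classification modules described above (not the thesis's actual models), the sketch below estimates a job's expected demand from the submitting user's recent history and labels it as resource-intensive or lightweight against an assumed threshold; the moving average and threshold value are stand-ins.

```python
from statistics import mean

def estimate_resource_demand(history, window=5):
    """Estimate a job's expected CPU demand from the user's recent jobs
    (a simple moving average stands in for the thesis's predictor)."""
    recent = history[-window:] if history else [0.0]
    return mean(recent)

def classify_task(estimated_cpu, threshold=0.5):
    """Label a task as resource-intensive or lightweight."""
    return "intensive" if estimated_cpu >= threshold else "lightweight"

if __name__ == "__main__":
    user_history = [0.31, 0.42, 0.55, 0.61, 0.58]  # normalised CPU usage of past jobs
    demand = estimate_resource_demand(user_history)
    print(f"estimated demand {demand:.2f} -> {classify_task(demand)}")
```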

    Architecture, techniques and models to enable Data Science in the Gaia Mission Archive

    Unpublished thesis of the Universidad Complutense de Madrid, Facultad de Informática, Departamento de Arquitectura de Computadores y Automática, defended on 26/05/2017. The massive amounts of data that the world produces every day pose new challenges to modern societies in terms of how to leverage their inherent value. Social networks, instant messaging, video, smart devices and scientific missions are just a few examples of the vast number of sources generating data every second. As the world becomes more and more digitalized, new needs arise for organizing, archiving, sharing, analyzing, visualizing and protecting the ever-increasing data sets, so that we can truly develop into a data-driven economy that reduces inefficiencies and increases sustainability, creating new business opportunities along the way. Traditional approaches for harnessing data are no longer suitable, as they lack the means for scaling to the larger volumes in a timely and cost-efficient manner. This has somewhat changed with the advent of Internet companies like Google and Facebook, which have devised new ways of tackling this issue. However, the variety and complexity of the value chains in the private sector, as well as the increasing demands and constraints under which the public one operates, call for ongoing research that can yield new strategies for dealing with data, facilitate the integration of providers and consumers of information, and guarantee a smooth and prompt transition when adopting these cutting-edge technological advances. This thesis aims at providing novel architectures and techniques that will help perform this transition towards Big Data in massive scientific archives. It highlights the common pitfalls that must be faced when embracing it and how to overcome them, especially when the data sets, their transformation pipelines and the tools used for the analysis are already present in the organizations. Furthermore, a new perspective for facilitating a smoother transition is laid out. It involves the usage of higher-level and use-case-specific frameworks and models, which will naturally bridge the gap between the technological and scientific domains. This alternative will effectively widen the possibilities of scientific archives and therefore will contribute to the reduction of the time to science. The research will be applied to the European Space Agency cornerstone mission Gaia, whose final data archive will represent a tremendous discovery potential. It will create the largest and most precise three-dimensional chart of our galaxy (the Milky Way), providing unprecedented position, parallax and proper motion measurements for about one billion stars. The successful exploitation of this data archive will depend to a large degree on the ability to offer the proper architecture, i.e. infrastructure and middleware, upon which scientists will be able to do exploration and modeling with this huge data set. In consequence, the approach taken needs to enable data fusion with other scientific archives, as this will produce the synergies leading to an increment in scientific outcome, both in volume and in quality. The set of novel techniques and frameworks presented in this work addresses these issues by contextualizing them with the data products that will be generated in the Gaia mission. All these considerations have led to the foundations of the architecture that will be leveraged by the Science Enabling Applications Work Package.
Last but not least, the effectiveness of the proposed solution will be demonstrated through the implementation of some ambitious statistical problems that will require significant computational capabilities, and which will use Gaia-like simulated data (the first Gaia data release has recently taken place, on September 14th, 2016). These ambitious problems will be referred to as the Grand Challenge, a somewhat grandiloquent name for the task of inferring, from a probabilistic point of view, a set of parameters for the Initial Mass Function (IMF) and Star Formation Rate (SFR) of a given set of stars (with a huge sample size), from noisy estimates of their masses and ages respectively. This will be achieved by using Hierarchical Bayesian Modeling (HBM). In principle, the HBM can incorporate stellar evolution models to infer the IMF and SFR directly, but in this first step presented in this thesis, we will start with a somewhat less ambitious goal: inferring the Present-Day Mass Function (PDMF) and Present-Day Age Distribution (PDAD). Moreover, the performance and scalability analyses carried out will also prove the suitability of the models for the large amounts of data that will be available in the Gaia data archive.
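
To illustrate the hierarchical Bayesian idea on a toy scale, the following sketch infers the population parameters of a Gaussian log-mass distribution from noisy per-star estimates using a random-walk Metropolis sampler, with the individual latent masses marginalised out analytically. The data, priors and model are deliberately simplified stand-ins for the thesis's PDMF inference and are not the actual Gaia pipeline.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated "Gaia-like" data: noisy log-mass estimates for a stellar sample.
true_mu, true_sigma, n_stars = 0.0, 0.4, 2000
true_logm = rng.normal(true_mu, true_sigma, n_stars)
obs_err = np.full(n_stars, 0.2)                 # known per-star measurement error
obs_logm = true_logm + rng.normal(0.0, obs_err)

def log_posterior(mu, log_sigma):
    """Hierarchical model with individual log-masses marginalised out:
    obs_i ~ Normal(mu, sqrt(sigma^2 + err_i^2)); flat priors on mu, log_sigma."""
    sigma = np.exp(log_sigma)
    var = sigma**2 + obs_err**2
    return -0.5 * np.sum((obs_logm - mu) ** 2 / var + np.log(var))

# Random-walk Metropolis over (mu, log_sigma).
samples, current = [], np.array([0.5, np.log(1.0)])
current_lp = log_posterior(*current)
for _ in range(20_000):
    proposal = current + rng.normal(0.0, 0.01, size=2)
    proposal_lp = log_posterior(*proposal)
    if np.log(rng.random()) < proposal_lp - current_lp:
        current, current_lp = proposal, proposal_lp
    samples.append(current.copy())

burned = np.array(samples[5_000:])              # discard burn-in
print(f"mu    approx. {burned[:, 0].mean():.3f} (true {true_mu})")
print(f"sigma approx. {np.exp(burned[:, 1]).mean():.3f} (true {true_sigma})")
```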

    Lipophilic and cationic gallium-68 complexes for the detection of mitochondrial dysfunction

    This thesis gives an account of the synthesis of macrocyclic and acyclic ligands for radiolabelling with gallium-68 to afford lipophilic and cationic radiotracers that report on mitochondrial dysfunction. These radiotracers are designed to be lipophilic enough to ensure successful permeation of lipid bilayer membranes, and to possess a cationic charge so that they are drawn into the mitochondria in a manner dependent on the mitochondrial membrane potential. The synthesis of novel macrocyclic chelators for gallium-68 labelling is described in chapter 2. The synthesis of three sets of macrocyclic compounds is discussed, beginning with a focus on triarylphosphonium functionalisation to afford lipophilic and cationic compounds. All three sets of compounds described in chapter 2 were successfully radiolabelled with generator-produced gallium-68. Chapter 3 describes this radiolabelling, starting with the optimisation of radiochemical yields through a series of experiments varying the reaction conditions. Initial screening of these radiolabelled compounds was performed through the measurement of their log D values. Radio-HPLC analysis was used to investigate the purity of the reaction mixture and assess the presence of speciation. Chapter 4 describes the candidate radiotracer's behaviour in a preclinical model. The cardiac uptake and retention of the candidate radiotracer were determined using the ex vivo Langendorff isolated perfused heart model in both healthy and depolarised mitochondria, with MIBI used as a baseline for this model. This work provides the foundation for additional, more thorough biological evaluation of such radiotracers, as well as informing the design and development of future lipophilic and cationic gallium-68 radiotracers. As well as the macrocyclic ligands discussed in chapter 2, several acyclic ligand families were also synthesised. These series of acyclic ligands are described in chapter 5, including a focus on bis(salicylaldimine) compounds and some phosphinic acid functionalised analogues, two series of tridentate ligands which have been shown to selectively form 2:1 complexes with metals in the +3 oxidation state, and a final series of bis(semicarbazone) chelates with a focus on triarylphosphonium functionalisation to yield lipophilic and cationic compounds. Chapter 6 summarises the conclusions of this thesis and ideas for future work. Chapter 7 summarises the experimental procedures for the synthetic chemistry and radiochemistry. Chapter 8 contains an appendix of NMR, MS and DFT data.
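
As a small aside on the log D screening mentioned above, the distribution coefficient is typically computed as the base-10 logarithm of the ratio of activity in the organic phase to that in the aqueous phase after partitioning; the snippet below shows that calculation with hypothetical counts (the specific values and phases are illustrative, not data from the thesis).

```python
import math

def log_d(activity_octanol: float, activity_aqueous: float) -> float:
    """Distribution coefficient from measured radioactivity in each phase
    (e.g. gamma counts of the octanol and buffer layers after partitioning)."""
    return math.log10(activity_octanol / activity_aqueous)

# Hypothetical counts from a shake-flask partition experiment.
print(f"log D = {log_d(5.2e4, 1.3e5):.2f}")  # negative value indicates a hydrophilic compound
```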

    Learning Analytics for E-Learning Content Recommendations

    E-Learning systems have caused a rapid increase in the amount of learning content available on the web. It has become a time-consuming and daunting task for e-learners to find the relevant content that they should study. Existing e-learning technology lacks the automated capability to guide students in prioritizing and engaging with the most vital course content. Students who are unable to find the most suitable resources for their studies and assignments may waste most of their time browsing and searching. Some of the "good students" can indirectly act as good guides to other students: average learners could follow the content adopted by good students in the process of learning. It is possible to capture the behaviour of "good students" and expose it as a form of automated guidance. For this to work, it is important to be able to predict which students are going to be successful at the end of the course based on their performance during the early part of the course. This work demonstrates the use of data mining techniques on e-Learning data to enable "good students" to indirectly guide "average students" to find the most relevant content in an e-Learning environment. Perera, A.; Tharsan, S. (2015). Learning Analytics for E-Learning Content Recommendations. In: 1st International Conference on Higher Education Advances (HEAd'15). Editorial Universitat Politècnica de València, 121-126. https://doi.org/10.4995/HEAd15.2015.448
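
The following sketch illustrates the kind of pipeline the abstract describes, under assumed synthetic data: a classifier trained on early-course performance flags likely "good students", and content items are then ranked by how heavily those students used them. The feature choices, thresholds and use of scikit-learn are illustrative assumptions, not the authors' implementation; in practice the classifier would be trained on a previous cohort and applied to the current one.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)

# Synthetic data: quiz scores for weeks 1-3 and final outcome (1 = "good student").
n_students, n_items = 200, 30
early_scores = rng.uniform(0, 100, size=(n_students, 3))
final_good = (early_scores.mean(axis=1) + rng.normal(0, 10, n_students) > 65).astype(int)

# Access log: how many times each student opened each content item.
access_counts = rng.poisson(2, size=(n_students, n_items))

# 1) Predict who is on track to be a "good student" from early performance.
model = LogisticRegression().fit(early_scores, final_good)
predicted_good = model.predict(early_scores).astype(bool)

# 2) Recommend the content items most used by the predicted good students.
item_popularity = access_counts[predicted_good].sum(axis=0)
top_items = np.argsort(item_popularity)[::-1][:5]
print("Recommended content items:", top_items.tolist())
```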