Estimating Example Difficulty using Variance of Gradients
In machine learning, a question of great interest is understanding what
examples are challenging for a model to classify. Identifying atypical examples
helps inform safe deployment of models, isolates examples that require further
human inspection, and provides interpretability into model behavior. In this
work, we propose Variance of Gradients (VOG) as a proxy metric for detecting
outliers in the data distribution. We provide quantitative and qualitative
support that VOG is a meaningful way to rank data by difficulty and to surface
a tractable subset of the most challenging examples for human-in-the-loop
auditing. Data points with high VOG scores are more difficult for the model to
classify and over-index on examples that require memorization.
Comment: Accepted to Workshop on Human Interpretability in Machine Learning (WHI), ICML, 202
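The core idea above can be sketched in a few lines: take gradient snapshots for one example across several training checkpoints, compute the per-element variance across checkpoints, and average. This is an illustrative sketch, not the paper's implementation; the flat-list gradient format and function name are assumptions.

```python
# Hedged sketch of a Variance-of-Gradients-style score: higher values
# indicate gradients that swing across checkpoints, flagging harder or
# more atypical examples.

def vog_score(grad_snapshots):
    """grad_snapshots: list of K gradient vectors (lists of floats),
    one per training checkpoint, all the same length."""
    k = len(grad_snapshots)
    n = len(grad_snapshots[0])
    score = 0.0
    for j in range(n):
        vals = [g[j] for g in grad_snapshots]
        mean = sum(vals) / k
        score += sum((v - mean) ** 2 for v in vals) / k  # per-element variance
    return score / n  # average over gradient elements

# An example whose gradients oscillate across checkpoints scores higher.
stable = [[0.1, 0.2], [0.1, 0.2], [0.1, 0.2]]
unstable = [[0.9, -0.8], [-0.7, 0.6], [0.5, -0.4]]
```

Ranking a dataset by this score then surfaces a tractable subset of the highest-scoring examples for human auditing.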
Distributed and Multiprocessor Scheduling
This chapter discusses CPU scheduling in parallel and distributed systems. CPU scheduling is part of a broader class of resource allocation problems, and is probably the most carefully studied such problem. The main motivation for multiprocessor scheduling is the desire for increased speed in the execution of a workload. Parts of the workload, called tasks, can be spread across several processors and thus be executed more quickly than on a single processor. In this chapter, we will examine techniques for providing this facility. The scheduling problem for multiprocessor systems can be generally stated as: "How can we execute a set of tasks T on a set of processors P subject to some set of optimizing criteria C?" The most common goal of scheduling is to minimize the expected runtime of a task set. Examples of other scheduling criteria include minimizing the cost, minimizing communication delay, giving priority to certain users' processes, or needs for specialized hardware devices. The scheduling policy for a multiprocessor system usually embodies a mixture of several of these criteria. Section 2 outlines general issues in multiprocessor scheduling and gives background material, including issues specific to either parallel or distributed scheduling. Section 3 describes the best practices from prior work in the area, including a broad survey of existing scheduling algorithms and mechanisms. Section 4 outlines research issues and gives a summary. Section 5 lists the terms defined in this chapter, while sections 6 and 7 give references to important research publications in the area.
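The makespan-minimisation goal stated above is often approached with greedy list scheduling: place each task on the currently least-loaded processor, taking longer tasks first (the LPT heuristic). A minimal sketch, not drawn from the chapter itself:

```python
# Greedy longest-processing-time (LPT) list scheduling: assign each task
# to the least-loaded processor, longest tasks first. A common heuristic
# for minimising makespan (the finish time of the busiest processor).
import heapq

def list_schedule(task_times, num_procs):
    """Return sorted per-processor finish times after placing every task."""
    heap = [(0.0, p) for p in range(num_procs)]  # (current load, processor id)
    heapq.heapify(heap)
    for t in sorted(task_times, reverse=True):   # LPT order
        load, p = heapq.heappop(heap)            # least-loaded processor
        heapq.heappush(heap, (load + t, p))
    return sorted(load for load, _ in heap)

loads = list_schedule([4.0, 3.0, 3.0, 2.0], 2)  # makespan = loads[-1]
```

On this toy instance the heuristic balances the two processors perfectly; in general LPT is only an approximation, which is one reason the chapter surveys a range of algorithms.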
Autonomous grid scheduling using probabilistic job runtime forecasting
Computational Grids are evolving into a global, service-oriented architecture –
a universal platform for delivering future computational services to a range of
applications of varying complexity and resource requirements. The thesis focuses
on developing a new scheduling model for general-purpose, utility clusters
based on the concept of user requested job completion deadlines. In such a
system, a user would be able to request each job to finish by a certain deadline,
and possibly to a certain monetary cost. Implementing deadline scheduling is
dependent on the ability to predict the execution time of each queued job, and
on an adaptive scheduling algorithm able to use those predictions to maximise
deadline adherence. The thesis proposes novel solutions to these two problems
and documents their implementation in a largely autonomous and self-managing
way.
The starting point of the work is an extensive analysis of a representative
Grid workload, revealing consistent workflow patterns, usage cycles and correlations between the execution times of jobs and their properties commonly collected
by the Grid middleware for accounting purposes. An automated approach is
proposed to identify these dependencies and use them to partition the highly
variable workload into subsets of more consistent and predictable behaviour.
A range of time-series forecasting models, applied in this context for the first
time, were used to model the job execution times as a function of their historical
behaviour and associated properties. Based on the resulting predictions of job
runtimes a novel scheduling algorithm is able to estimate the latest job start
time necessary to meet the requested deadline and sort the queue accordingly to
minimise the amount of deadline overrun.
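The queue-ordering rule described above can be sketched as follows: the latest feasible start time of a job is its deadline minus its predicted runtime, and sorting the queue by that value runs the most urgent jobs first. This is an assumption about the exact rule, for illustration only.

```python
# Sketch of deadline-driven queue ordering: jobs with the earliest
# "latest feasible start time" (deadline - predicted runtime) run first.

def order_queue(jobs):
    """jobs: list of (name, predicted_runtime, deadline) tuples.
    Returns job names ordered by latest feasible start time."""
    return [name for name, runtime, deadline in
            sorted(jobs, key=lambda j: j[2] - j[1])]

# Latest starts: a -> 15.0, b -> 3.0, c -> 2.0, so c is most urgent.
queue = order_queue([("a", 5.0, 20.0), ("b", 2.0, 5.0), ("c", 10.0, 12.0)])
```

A job whose latest feasible start time has already passed cannot meet its deadline no matter where it is placed, which is why the quality of the runtime predictions is central to the approach.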
The testing of the proposed approach was done using the actual job trace
collected from a production Grid facility. The best performing execution time
predictor (the auto-regressive moving average method) coupled to workload
partitioning based on three simultaneous job properties returned the median
absolute percentage error centroid of only 4.75%. This level of prediction
accuracy enabled the proposed deadline scheduling method to reduce the average deadline overrun time ten-fold compared to the benchmark batch scheduler.
Overall, the thesis demonstrates that deadline scheduling of computational
jobs on the Grid is achievable using statistical forecasting of job execution times
based on historical information. The proposed approach is easily implementable,
substantially self-managing and better matched to the human workflow, making
it well suited for implementation in the utility Grids of the future.
Applications of Machine Learning in Apple Crop Yield Prediction
This study proposes the application of machine learning techniques to predict yield in the apple industry. Crop yield prediction is important because it impacts resource and capacity planning. It is, however, challenging because yield is affected by multiple interrelated factors such as climate conditions and orchard management practices. Machine learning methods have the ability to model complex relationships between input and output features. This study considers the following machine learning methods for apple yield prediction: multiple linear regression, artificial neural networks, random forests and gradient boosting. The models are trained, optimised, and evaluated using both a random and chronological data split, and the out-of-sample results are compared to find the best-suited model. The methodology is based on a literature analysis that aims to provide a holistic view of the field of study by including research in the following domains: smart farming, machine learning, apple crop management and crop yield prediction. The models are built using apple production data and environmental factors, with the modelled yield measured in metric tonnes per hectare. The results show that the random forest model is the best-performing model overall, with a Root Mean Square Error (RMSE) of 21.52 and 14.14 using the chronological and random data splits respectively. The final machine learning model outperforms simple estimator models, showing that a data-driven approach using machine learning methods has the potential to benefit apple growers.
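The RMSE metric used above to compare models is worth making concrete. A minimal sketch with toy numbers (not the study's data; yields in tonnes per hectare are assumed):

```python
# Root Mean Square Error: the square root of the mean squared difference
# between actual and predicted yields. Lower is better.
import math

def rmse(actual, predicted):
    return math.sqrt(
        sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

actual_yields = [40.0, 55.0, 48.0]             # tonnes per hectare (toy data)
baseline = rmse(actual_yields, [47.6, 47.6, 47.6])  # constant-estimate baseline
model = rmse(actual_yields, [42.0, 53.0, 47.0])     # a hypothetical ML model
```

Because RMSE squares the errors before averaging, it penalises large misses more heavily than small ones, which suits planning problems where a badly wrong yield estimate is costlier than several slightly wrong ones.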
Attribute-Based, Usefully Secure Email
A secure system that cannot be used by real users to secure real-world processes is not really secure at all. While many believe that usability and security are diametrically opposed, a growing body of research from the field of Human-Computer Interaction and Security (HCISEC) refutes this assumption. All researchers in this field agree that focusing on aligning usability and security goals can enable the design of systems that will be more secure under actual usage. We bring to bear tools from the social sciences (economics, sociology, psychology, etc.) not only to help us better understand why deployed systems fail, but also to enable us to accurately characterize the problems that we must solve in order to build systems that will be secure in the real world. Trust, a critically important facet of any socio-technical secure system, is ripe for analysis using the tools provided for us by the social sciences. There are a variety of scopes in which issues of trust in secure systems can be studied. We have chosen to focus on how humans decide to trust new correspondents. Current secure email systems such as S/MIME and PGP/MIME are not expressive enough to capture the real ways that trust flows in these sorts of scenarios. To solve this problem, we begin by applying concepts from social science research to a variety of such cases from interesting application domains; primarily, crisis management in the North American power grid. We have examined transcripts of telephone calls made between grid management personnel during the August 2003 North American blackout and extracted several different classes of trust flows from these real-world scenarios. Combining this knowledge with some design patterns from HCISEC, we develop criteria for a system that will enable humans to apply these same methods of trust-building in the digital world.
We then present Attribute-Based, Usefully Secure Email (ABUSE) and not only show that it meets our criteria, but also provide empirical evidence that real users are helped by the system.
Integer Bilevel Linear Programming Problems: New Results and Applications
CTRL SHIFT
CTRL SHIFT makes a case for design under contemporary computation. The abstractions of reading, writing, metaphors, mythology, code, cryptography, interfaces, and other such symbolic languages are leveraged as tools for understanding. Alternative modes of knowledge become access points through which users can subvert the control structures of software. By challenging the singular expertise of programmers, the work presented within advocates for the examination of internalized beliefs, the redistribution of networked power, and the collective sabotage of computational authority.