Scheduling, Characterization and Prediction of HPC Workloads for Distributed Computing Environments
As High Performance Computing (HPC) has grown considerably and is expected to grow even more, effective resource management for distributed computing systems is more important than ever. As computational workloads grow in quantity, it becomes crucial to apply efficient resource management and workload scheduling so that resources are used efficiently while computational performance remains reasonably good. The problem of efficiently scheduling workloads on resources while meeting performance standards is hard, and the non-clairvoyance of job dimensions makes resource management even harder in real-world scenarios. Our research methodology investigates the scheduling problem for HPC and the challenges of deploying such scheduling in real-world scenarios using state-of-the-art machine learning and data science techniques. To this end, this Ph.D. dissertation makes the following core contributions: a) We perform a theoretical analysis of space-sharing, non-preemptive scheduling: we study this scheduling problem, propose scheduling algorithms with polynomial computation time, and prove constant upper bounds on the performance of these algorithms. b) We study the sensitivity of scheduling algorithms to the accuracy of runtime estimates and devise a meta-learning approach to estimate prediction accuracy for jobs newly submitted to the HPC system. c) We study the runtime prediction problem for HPC applications. For this purpose, we study the distribution of available public workloads and propose two different solutions that can predict multi-modal distributions: switching state-space models and Mixture Density Networks. d) We study the effectiveness of recent recurrent neural network models for CPU usage trace prediction, both for individual VM traces and for aggregate CPU usage traces.
In this dissertation, we explore solutions to improve the performance of scheduling workloads on distributed systems. We begin by looking at the problem from a theoretical perspective. Modeling the problem mathematically, we first propose a scheduling algorithm that finds a constant approximation of the optimal solution in polynomial time: we prove that the performance of the algorithm (average completion time) is a constant approximation of the performance of the optimal schedule. We next look at the problem in real-world scenarios. Considering High-Performance Computing (HPC) environments as the closest real-world equivalent of our mathematical model, we explore the problem of predicting application runtime. We propose an algorithm to handle the uncertainties that exist in the real world and demonstrate its effectiveness in terms of response time and resource utilization. After addressing the uncertainty problem, we focus on improving the accuracy of existing prediction approaches for HPC application runtime. We propose two solutions, one based on Kalman filters and one based on mixture density networks, and showcase their effectiveness against previous prediction approaches in terms of prediction accuracy and impact on scheduling performance. Finally, we focus on predicting resource usage for individual applications during their execution, exploring the application of recurrent neural networks to predicting resource usage of applications deployed on individual virtual machines. To validate our proposed models and solutions, we performed extensive trace-driven simulations and measured the effectiveness of our approaches.
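The space-sharing, non-preemptive setting both abstracts describe can be illustrated with a toy simulator. The following is a minimal sketch under assumed conditions (all jobs arrive at time zero, a simple first-fit policy), not the dissertation's actual algorithms; the function names and the policy are illustrative only. It reports the average completion time that the abstracts use as the performance metric.

```python
# Illustrative sketch: space-sharing, non-preemptive scheduling.
# Each job requests a number of nodes and runs uninterrupted once started;
# the scheduler starts any queued job as soon as enough nodes are free.
import heapq

def simulate(jobs, total_nodes):
    """jobs: list of (nodes_needed, runtime); returns average completion time."""
    free = total_nodes
    queue = list(enumerate(jobs))      # FIFO arrival order, all at t = 0
    events = []                        # min-heap of (finish_time, nodes_released)
    now, completions = 0.0, []
    while queue or events:
        # Start every queued job that fits right now (first fit, in order).
        still_waiting = []
        for jid, (need, runtime) in queue:
            if need <= free:
                free -= need
                heapq.heappush(events, (now + runtime, need))
                completions.append(now + runtime)
            else:
                still_waiting.append((jid, (need, runtime)))
        queue = still_waiting
        # Advance time to the next job completion and release its nodes.
        if events:
            now, released = heapq.heappop(events)
            free += released
    return sum(completions) / len(completions)

print(simulate([(2, 3.0), (2, 1.0), (4, 2.0)], 4))  # average completion time: 3.0
```

On this tiny instance the two 2-node jobs run in parallel and the 4-node job waits until the cluster drains, so completions are 3.0, 1.0, and 5.0, averaging 3.0. The dissertation's contribution is precisely that smarter orderings than this naive first fit achieve a provable constant-factor bound on this metric.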
Information needs along the journey chain: users' perspective about the bus system
Buses constitute the main public transport mode in most cities of the world. Accessible
Bus Systems are defined as systems that are easy to use. However accessible the
infrastructure may be, it is unlikely to provide access if people do not know about it.
Therefore it is essential to have comprehensive and accessible information systems
which describe the bus systems during all the stages of the journey.
There is a widespread understanding amongst researchers that Information Systems can
increase the efficiency of the system and that they should be oriented to meet bus users'
needs. However, existing information systems largely ignore the user's point of view, in
particular the requirements of disabled users. This thesis describes a methodology
developed to investigate the problem of using information during a journey by bus in
real conditions, taking into account the (un)familiarity of the area under study and the
individual's previous knowledge of the information system.
Two main aspects are identified: the "Required Environment Capability" (the
physical, social and psychological environment conditions) and the "Individual
Capability Provided" (the individual ability in physical, sensorial and cognitive terms)
to plan and execute a journey by bus in an unfamiliar environment. Because of the
multidisciplinary aspect of the theme this study uses approaches from different fields of
research to construct a methodology to understand individual information use. Based on
the principles of Single Case Analysis adapted by adding the concept of the Capabilities
Model (CM) (which explores interactions between individual and environment), the
combined SCA/CM approach was employed to construct the INFOChain experiment. A
set of information pieces was developed for the experiment, delivering
Accessibility-Issues (AI-type) information in order to help older people plan and
execute different bus journeys in two different cities: London/UK and Brasilia/BR.
General results have shown that although AI-type information is considered
important by older people, more than simple exposition is needed for them to
actually take advantage of the information and for it to help disabled users.
Inhibitory Control and Source Monitoring: A Developmental Investigation into Memory for Recently Witnessed Events
Research has demonstrated that younger children experience difficulty monitoring the source of information and, accordingly, have disproportionately more difficulty accurately recalling details of witnessed events. Within-age variability in memory performance, however, suggests that chronological age may be neither the only nor the best predictor of source monitoring ability. The present study examined whether inhibitory control (IC) better accounts for variations in the ability to monitor the source of retrieved information than chronological age does. Ninety-five children aged 4 to 10 years engaged in a source monitoring task designed to evaluate their ability to accurately identify what information they had witnessed the prior week. Participants further completed measures of IC and other cognitive tasks (receptive vocabulary, memory span, verbal fluency). Exploratory factor analyses revealed three distinct types of IC processes (distractor interference, resistance to proactive interference, prepotent inhibition), indicating that the IC measures administered did not all tap the same unified construct. Participants across ages and IC ability successfully identified witnessed events, and experienced difficulty rejecting the items they had previously confabulated. Multiple regression analyses further indicated that IC predicted substantial variance in the ability to reject events that were not witnessed or discussed, while age and the cognitive variables added only a small, non-statistically-significant amount of variance above this. IC further predicted variance in the ability to reject events that were not witnessed or discussed after controlling for age and the cognitive variables.
The current findings provide evidence suggesting that: 1) measures of IC should not be assumed to assess the same underlying processes; and 2) distractor interference and prepotent inhibition abilities specifically contribute to the ability to reject information that was not witnessed or discussed during source monitoring tasks. This provides further evidence that the development of IC is an important aspect of source monitoring ability in children.
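The hierarchical-regression logic the abstract describes (does age add explanatory variance beyond IC?) can be sketched with ordinary least squares on synthetic data. Everything below is an assumption for illustration: the generated data, the variable names `ic` and `age`, and the effect sizes have no relation to the study's actual measurements.

```python
# Illustrative sketch of hierarchical regression: fit a base model, add a
# predictor, and compare R^2. Synthetic data only.
import numpy as np

def r_squared(X, y):
    """In-sample R^2 of an OLS fit with intercept."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    return 1 - (resid @ resid) / ((y - y.mean()) @ (y - y.mean()))

rng = np.random.default_rng(0)
n = 95                                   # same sample size as the study
ic = rng.normal(size=n)                  # hypothetical inhibitory-control score
age = rng.normal(size=n)                 # hypothetical standardized age
# Simulated outcome: rejection accuracy driven mostly by IC, slightly by age.
y = 0.8 * ic + 0.1 * age + rng.normal(scale=0.5, size=n)

r2_ic = r_squared(ic[:, None], y)                     # step 1: IC alone
r2_full = r_squared(np.column_stack([ic, age]), y)    # step 2: IC + age
print(f"R2(IC) = {r2_ic:.2f}, delta-R2 for age = {r2_full - r2_ic:.2f}")
```

The quantity of interest is the delta-R² at step 2: in the study it was small and non-significant for age and the other cognitive variables, which is what licenses the conclusion that IC carries the predictive load.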
A Study of Scalability and Cost-effectiveness of Large-scale Scientific Applications over Heterogeneous Computing Environment
Recent advances in large-scale experimental facilities have ushered in an era of data-driven science. These large-scale datasets increase the opportunity to answer many fundamental questions in basic science. However, they pose new challenges to the scientific community in terms of optimal processing and transfer. Consequently, scientists are in dire need of robust high performance computing (HPC) solutions that can scale to terabytes of data.
In this thesis, I address the challenges in three major aspects of scientific big data processing as follows: 1) Developing scalable software and algorithms for data- and compute-intensive scientific applications. 2) Proposing new cluster architectures that these software tools need for good performance. 3) Transferring big scientific datasets among clusters situated at geographically disparate locations.
In the first part, I develop scalable algorithms to process huge amounts of scientific big data using the power of recent analytic tools such as Hadoop, Giraph, and NoSQL databases. At a broader level, these algorithms take advantage of locality-based computing that can scale with increasing amounts of data. The thesis mainly addresses the challenges involved in large-scale genome analysis applications such as genomic error correction and genome assembly, which have recently moved to the forefront of big data challenges.
In the second part of the thesis, I perform a systematic benchmark study using the above-mentioned algorithms on different distributed cyberinfrastructures to pinpoint the limitations of a traditional HPC cluster in processing big data. I then propose a solution to those limitations by balancing the I/O bandwidth of the solid state drive (SSD) with the computational speed of high-performance CPUs. A theoretical model is also proposed to help HPC system designers who are striving for system balance.
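The idea of balancing SSD I/O bandwidth against CPU speed can be sketched as a simple two-term bottleneck model. This is an illustrative stand-in, not the thesis's actual theoretical model; the function, its parameters, and all the numbers are made up for the example.

```python
# Illustrative bottleneck model: a node is I/O-bound or compute-bound
# depending on whether streaming the data from SSD or processing it
# dominates; the system is balanced when the two terms are equal.
def job_time(data_gb, ssd_gbps, ops_gflop, cpu_gflops):
    io_time = data_gb / ssd_gbps            # seconds to read input from SSD
    compute_time = ops_gflop / cpu_gflops   # seconds of pure computation
    bound = "I/O-bound" if io_time > compute_time else "compute-bound"
    return max(io_time, compute_time), bound

t, bound = job_time(data_gb=100, ssd_gbps=2.0, ops_gflop=500, cpu_gflops=50)
print(t, bound)  # prints: 50.0 I/O-bound
```

In this toy configuration the 10 s of compute is hidden behind 50 s of I/O, so adding faster CPUs would not help; a designer aiming for balance would instead raise SSD bandwidth until the two terms match.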
In the third part of the thesis, I develop a high-throughput architecture for transferring these big scientific datasets among geographically disparate clusters. The architecture leverages the power of Ethereum's blockchain technology and Swarm's peer-to-peer (P2P) storage technology to transfer the data in a secure, tamper-proof fashion. Instead of optimizing computation in a single cluster, my major motivation in this part is to foster translational research and data interoperability in collaboration with multiple institutions.
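The tamper-proof property of such a transfer rests on content addressing: each chunk of the dataset is identified by its cryptographic hash, so any modification in transit is detectable on receipt. The following is a minimal sketch of that idea only; the chunk size, helper names, and use of SHA-256 are assumptions for illustration, and real Swarm chunking and addressing differ.

```python
# Illustrative content-addressed transfer check: hash every chunk before
# sending, re-hash on receipt, and flag any mismatch as tampering.
import hashlib

CHUNK = 4096  # bytes; illustrative chunk size

def manifest(data: bytes):
    """Split data into chunks and compute a digest per chunk."""
    chunks = [data[i:i + CHUNK] for i in range(0, len(data), CHUNK)]
    return chunks, [hashlib.sha256(c).hexdigest() for c in chunks]

def verify(chunks, digests):
    """True only if every received chunk matches its advertised digest."""
    return all(hashlib.sha256(c).hexdigest() == d
               for c, d in zip(chunks, digests))

data = b"genome dataset " * 1000
chunks, digests = manifest(data)
assert verify(chunks, digests)                     # intact transfer
tampered = [chunks[0], b"corrupted"] + chunks[2:]  # second chunk altered
print(verify(tampered, digests))                   # prints: False
```

Anchoring the digest manifest on a blockchain, as the thesis describes, additionally prevents the sender from silently rewriting the manifest itself after the fact.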
Verbing and nouning in French: toward an ecologically valid approach to sentence processing
The present thesis uses event-related potentials (ERPs) to investigate the neurocognitive mechanisms underlying sentence comprehension. In particular, these two experiments seek to clarify the interplay between syntactic and semantic processes in native speakers and second language learners. Friederici's (2002, 2011) "syntax-first" model predicts that syntactic categories are analyzed at the earliest stages of speech perception, reflected by the ELAN (early left anterior negativity) reported for syntactic category violations. Further, syntactic category violations seem to prevent the appearance of N400s (linked to lexical-semantic processing), a phenomenon known as "semantic blocking" (Friederici et al., 1999). However, a review article by Steinhauer and Drury (2012) argued that most ELAN studies used flawed designs, where pre-target context differences may have caused ELAN-like artifacts as well as the absence of N400s.
The first study reevaluates syntax-first approaches to sentence processing by implementing a novel paradigm in French that included correct sentences, pure syntactic category violations, lexical-semantic anomalies, and combined anomalies. This balanced design systematically controlled for target word (noun vs. verb) and the context immediately preceding it. Group results from native speakers of Quebec French revealed an N400-P600 complex in response to all anomalous conditions, providing strong evidence against the syntax-first and semantic blocking hypotheses. Additive effects of syntactic category and lexical-semantic anomalies on the N400 may reflect a mismatch detection between a predicted word-stem and the actual target, in parallel with lexical-semantic retrieval. An interactive rather than additive effect on the P600 reveals that the same neurocognitive resources are recruited for syntactic and semantic integration. Analyses of individual data showed that participants did not rely on one single cognitive mechanism reflected by either the N400 or the P600 effect but on both, suggesting that the biphasic N400-P600 ERP wave can indeed be considered to be an index of phrase-structure violation processing in most individuals.
The second study investigates the underlying mechanisms of phrase-structure building in late second language learners of French. The convergence hypothesis (Green, 2003; Steinhauer, 2014) predicts that second language learners can achieve native-like online processing with sufficient proficiency. However, considering together different factors that relate to proficiency, exposure, and age of acquisition has proven challenging. This study further explores individual data modeling using a Random Forests approach. It revealed that daily usage and proficiency are the most reliable predictors in explaining the ERP responses, with N400 and P600 effects getting larger as these variables increased, partly confirming and extending the convergence hypothesis.
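A Random Forests analysis of this kind can be sketched with scikit-learn on synthetic data: regress a simulated ERP amplitude on learner variables and rank the predictors by impurity-based importance. The variable names, effect sizes, and the choice of `RandomForestRegressor` are assumptions for illustration; this is not the study's actual data or pipeline.

```python
# Illustrative Random Forests predictor ranking on synthetic learner data.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(42)
n = 200
daily_usage = rng.uniform(0, 100, n)   # hypothetical % daily exposure to French
proficiency = rng.uniform(0, 1, n)     # hypothetical proficiency score
age_of_acq = rng.uniform(5, 30, n)     # hypothetical age of acquisition
# Simulated N400 amplitude grows with usage and proficiency, echoing the
# direction of the reported effects; age of acquisition carries no signal here.
n400 = 0.05 * daily_usage + 3.0 * proficiency + rng.normal(0, 0.5, n)

X = np.column_stack([daily_usage, proficiency, age_of_acq])
rf = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, n400)
for name, imp in zip(["daily_usage", "proficiency", "age_of_acq"],
                     rf.feature_importances_):
    print(f"{name}: {imp:.2f}")
```

On data generated this way the importance ranking recovers the built-in structure: exposure and proficiency dominate while age of acquisition contributes little, mirroring the pattern the study reports for its real ERP data.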
This thesis demonstrates that the "syntax-first" model is not viable and should be replaced. A new account is suggested, based on predictive approaches, where semantic and syntactic information are first used in parallel to facilitate retrieval, and then controlled mechanisms are recruited to analyze sentences at the interface of syntax and semantics. Those mechanisms are mediated by inter-individual abilities reflected by language exposure and performance.