
    Learning Scheduling Algorithms for Data Processing Clusters

    Efficiently scheduling data processing jobs on distributed compute clusters requires complex algorithms. Current systems, however, use simple generalized heuristics and ignore workload characteristics, since developing and tuning a scheduling policy for each workload is infeasible. In this paper, we show that modern machine learning techniques can generate highly efficient policies automatically. Decima uses reinforcement learning (RL) and neural networks to learn workload-specific scheduling algorithms without any human instruction beyond a high-level objective such as minimizing average job completion time. Off-the-shelf RL techniques, however, cannot handle the complexity and scale of the scheduling problem. To build Decima, we had to develop new representations for jobs' dependency graphs, design scalable RL models, and invent RL training methods for dealing with continuous stochastic job arrivals. Our prototype integration with Spark on a 25-node cluster shows that Decima improves the average job completion time over hand-tuned scheduling heuristics by at least 21%, achieving up to 2x improvement during periods of high cluster load.
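The objective named in the abstract, average job completion time, can be illustrated with a toy calculation. The sketch below is not Decima and involves no learning; it only assumes a single executor, jobs that all arrive at time zero, and made-up durations, and it compares two simple orderings to show why the scheduling order matters for this metric.

```python
from itertools import accumulate


def average_jct(durations):
    """Average completion time when jobs run back to back in the given order.

    Assumption for illustration: one executor, all jobs arrive at t = 0.
    """
    completion_times = list(accumulate(durations))
    return sum(completion_times) / len(completion_times)


# Hypothetical job durations (arbitrary time units), not taken from the paper.
jobs = [8, 1, 3]

fifo = average_jct(jobs)          # run in arrival order
sjf = average_jct(sorted(jobs))   # run shortest job first

print(f"FIFO average JCT: {fifo:.2f}")
print(f"SJF  average JCT: {sjf:.2f}")
```

Even in this tiny case the ordering changes the average noticeably, which is the kind of gap a workload-specific policy is meant to exploit.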

    Distributed real-time operating system (DRTOS) modeling in SpecC

    System level design of an embedded computing system involves a multi-step process to refine the system from an abstract specification to an actual implementation by defining and modeling the system at various levels of abstraction. System level design supports evaluating and optimizing the system early in design exploration. Embedded computing systems may consist of multiple processing elements, memories, I/O devices, sensors, and actors. The selection of processing elements includes instruction-set processors and custom hardware units, such as application-specific integrated circuits (ASICs) and field-programmable gate arrays (FPGAs). Real-time operating systems (RTOS) have been used in embedded systems as an industry standard for years and can offer embedded systems characteristics such as concurrency and time constraints. Some of the existing system level design languages, such as SpecC, provide the capability to model an embedded system including an RTOS for a single processor. However, there is a need to develop a distributed RTOS modeling mechanism as part of the system level design methodology, because the number of processing elements in systems is increasing and embedded platforms now have multiple processors. A distributed RTOS (DRTOS) provides services such as multiprocessor task scheduling, interprocess communication, synchronization, and distributed mutual exclusion. In this thesis, we develop a DRTOS model as an extension of the existing SpecC single RTOS model to provide the basic functionality of a DRTOS implementation, and present the refinement methodology for using our DRTOS model during system level synthesis. The DRTOS model and refinement process are demonstrated in the SpecC SCE environment. The capabilities and limitations of the DRTOS modeling approach are presented.

    Comparative effectiveness of input-based instructions on L2 grammar knowledge : textual enhancement and processing instruction

    The full text has been opened to access in accordance with the "Law on Amending the Higher Education Law and Certain Laws and Decree Laws" (Yükseköğretim Kanunu İle Bazı Kanun Ve Kanun Hükmünde Kararnamelerde Değişiklik Yapılması Hakkında Kanun), published in the Official Gazette dated 06.03.2018 and numbered 30352, and the "Directive on the Collection, Organization, and Opening to Access of Graduate Theses in Electronic Form" dated 18.06.2018.
    [Translated from the Turkish abstract] This thesis investigated the effect of two input-based instruction methods, Textual Input Enhancement and Structured Input Activities, on the acquisition of the English simple present tense third person singular morpheme. It is a quasi-experimental study that initially comprised two experimental groups totaling 43 participants. Both groups took a pretest one week before instruction and then received two different treatments, involving Structured Input and Textual Input activities, for two class hours each. A posttest was administered to both groups one day after instruction. Finally, to determine whether acquisition was durable, a different posttest was administered four weeks later. The study aims to reveal how much the two input-based instruction methods contribute to comprehension and production of the target-language simple present tense singular morpheme among middle school students learning English as a foreign language, even though no production was elicited during instruction. The results show that both methods contributed at the comprehension level but did not have the same effect on production of the morpheme. Keywords: Input-based Instruction, Textual Input Enhancement Activities, Structured Input Activities, Teaching Foreign Languages to Children
    This quasi-experimental study investigated the effects of two types of input-based instruction, namely Textual Enhancement (TE) and Processing Instruction (PI), on the acquisition of the English Simple Present Tense third person singular form (-s). To this end, elementary level young learners (n = 43) learning English as a Foreign Language (EFL) were recruited for the study and randomly assigned to two experimental groups, TE and PI. Each group received its own specific instruction for two regular classroom hours: the TE group received textual enhancement; the PI group received processing instruction. The groups were assessed using a pretest, immediate posttest, and delayed posttest design. The assessment materials included one interpretation task (grammaticality judgment task) and two production tasks (form correction and written production tasks). All the instructional and assessment materials used in the study were piloted twice on a similar group of students prior to the main study to check the difficulty level of the instructional materials, the reliability of the tests, and the clarity of the instruction. The overall findings showed that both the TE and PI groups improved their performance on the interpretation-level task; however, they failed to improve their performance on the production-level tasks. Key words: Input-based Instructions, Focus-on-form, Textual Enhancement, Processing Instruction, Teaching English to Young Learners

    The "MIND" Scalable PIM Architecture

    MIND (Memory, Intelligence, and Network Device) is an advanced parallel computer architecture for high performance computing and scalable embedded processing. It is a Processor-in-Memory (PIM) architecture integrating both DRAM bit cells and CMOS logic devices on the same silicon die. MIND is multicore with multiple memory/processor nodes on each chip and supports global shared memory across systems of MIND components. MIND is distinguished from other PIM architectures in that it incorporates mechanisms for efficient support of a global parallel execution model based on the semantics of message-driven multithreaded split-transaction processing. MIND is designed to operate either in conjunction with other conventional microprocessors or in standalone arrays of like devices. It also incorporates mechanisms for fault tolerance, real-time execution, and active power management. This paper describes the major elements and operational methods of the MIND architecture.

    Parallelization of cycle-based logic simulation

    Verification of digital circuits by cycle-based simulation can be performed in parallel. The parallel implementation requires two phases: the compilation phase, which sets up the data needed to execute the simulation, and the simulation phase, which executes the parallel simulation of the circuit under consideration for a certain number of cycles. During the early phase of design, the compilation phase has to be repeated each time a bug is found. Thus, if the compilation phase takes too long, the advantages of the parallel approach may be lost. In this work we propose an effective version of the compilation phase and compute the corresponding execution time. We also analyze the percentage of execution time required by the different steps of the compilation phase for a set of benchmarks from the literature. Further, we implemented the simulation phase on a GPU architecture and computed the execution times for a set of benchmarks, obtaining values comparable with those reported in the literature. Finally, we implemented a sequential version of the cycle-based simulation with optimized execution time. We used the sequential execution times to compute the speedup of the parallel version for the considered set of benchmarks.
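As a rough illustration of what cycle-based simulation means, the sketch below evaluates a tiny synchronous circuit once per clock cycle. It is a sequential toy, not the authors' parallel or GPU implementation; the circuit (a 2-bit counter), the signal names, and the data layout are assumptions chosen for brevity. The list of gates in evaluation order stands in for the kind of data a compilation phase would prepare.

```python
def simulate(cycles):
    """Cycle-based simulation of a hypothetical 2-bit counter.

    Each cycle, every combinational gate is evaluated exactly once in
    levelized (topological) order, then all registers update together.
    """
    # Register outputs (current state), both reset to 0.
    state = {"q0": 0, "q1": 0}

    # "Compiled" netlist: (output signal, gate function, input signals).
    gates = [
        ("d0", lambda a: 1 - a, ("q0",)),          # NOT gate
        ("d1", lambda a, b: a ^ b, ("q0", "q1")),  # XOR gate
    ]
    # Each register latches the named signal at the end of the cycle.
    registers = {"q0": "d0", "q1": "d1"}

    trace = []
    for _ in range(cycles):
        values = dict(state)
        for out, fn, ins in gates:
            values[out] = fn(*(values[name] for name in ins))
        state = {reg: values[src] for reg, src in registers.items()}
        trace.append(state["q1"] * 2 + state["q0"])
    return trace


print(simulate(5))
```

Because no event queue or intra-cycle timing is modeled, the per-cycle work is a fixed pass over the gate list, which is the property that makes this style of simulation a natural candidate for parallel execution.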

    Hierarchical clustered register file organization for VLIW processors

    Technology projections indicate that wire delays will become one of the biggest constraints in future microprocessor designs. To avoid long wire delays and therefore long cycle times, processor cores must be partitioned into components so that most of the communication is done locally. In this paper, we propose a novel register file organization for VLIW cores that combines clustering with a hierarchical register file organization. Functional units are organized in clusters, each one with a local first level register file. The local register files are connected to a global second level register file, which provides access to memory. All intercluster communications are done through the second level register file. This paper also proposes MIRS-HC, a novel modulo scheduling technique that simultaneously performs instruction scheduling, cluster selection, insertion of communication operations, register allocation, and spill insertion for the proposed organization. The results show that although more cycles are required to execute applications, the execution time is reduced due to a shorter cycle time. In addition, the combination of clustering and hierarchy provides a larger design exploration space that trades off performance and technology requirements. Peer reviewed. Postprint (published version).

    Inherently workload-balanced clustered microarchitecture

    The performance of clustered microarchitectures relies on steering schemes that try to find the best trade-off between workload balance and inter-cluster communication penalties. In previously proposed clustered processors, reducing communication penalties and balancing the workload are opposite targets, since improving one usually implies a detriment in the other. In this paper we propose a new clustered microarchitecture that can minimize communication penalties without compromising workload balance. The key idea is to arrange the clusters in a ring topology in such a way that results of one cluster can be forwarded to the neighbor cluster with a very short latency. In this way, minimizing communication penalties is favored when the producer of a value and its consumer are placed in adjacent clusters, which also favors workload balance. The proposed microarchitecture is shown to outperform a state-of-the-art clustered processor. For instance, for an 8-cluster configuration and just one fully pipelined unidirectional bus, 15% speedup is achieved on average for FP programs. Peer reviewed. Postprint (published version).
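The claim that placing a consumer in the cluster adjacent to its producer also favors workload balance can be seen in a toy model. The sketch below is a deliberately simplified, hypothetical steering rule, not the scheme evaluated in the paper: it assumes a single dependence chain and always places each instruction in the next cluster of a unidirectional ring.

```python
from collections import Counter


def steer_chain(num_instructions, num_clusters, start=0):
    """Place each instruction of a dependence chain in the cluster
    that follows its producer's cluster on a unidirectional ring.

    Toy rule for illustration only; real steering logic would also
    weigh resource availability and multiple source operands.
    """
    placement = [start]
    for _ in range(num_instructions - 1):
        placement.append((placement[-1] + 1) % num_clusters)
    return placement


# Hypothetical example: a chain of 16 dependent instructions, 8 clusters.
placement = steer_chain(16, 8)
load = Counter(placement)

print(placement)
print(dict(load))
```

Under this rule every value travels exactly one hop to its consumer, and the chain wraps around the ring so that instructions spread evenly over the clusters instead of piling up in one.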