7 research outputs found

    Application of the Problem Based Learning (PBL) Model to Improve Student Learning Outcomes on the Topic of Solubility and Solubility Product in Class XI IPA at SMA Negeri 1 Kampar

    Get PDF
    This research aimed to improve student learning outcomes on the topic of solubility and solubility product in class XI science at SMAN 1 Kampar. It was an experimental study with a pretest-posttest design. The sample consisted of class XI science 2 as the experimental class and class XI science 3 as the control class. The experimental class was taught with the problem-based learning model, while the control class used the discussion method. Data were analyzed with a t-test, which showed t_count > t_table (1.6923 > 1.68). This means the problem-based learning model can improve student learning outcomes on solubility and solubility product in class XI science at SMAN 1 Kampar, with the improvement falling in the high category.
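The comparison above (t_count > t_table) is a one-tailed independent-samples t-test with pooled variance. A minimal sketch of that computation, using hypothetical gain scores (the paper's raw data are not given here):

```python
import math

def independent_t(sample_a, sample_b):
    """Pooled-variance independent-samples t statistic (equal variances assumed)."""
    na, nb = len(sample_a), len(sample_b)
    ma = sum(sample_a) / na
    mb = sum(sample_b) / nb
    va = sum((x - ma) ** 2 for x in sample_a) / (na - 1)
    vb = sum((x - mb) ** 2 for x in sample_b) / (nb - 1)
    pooled = ((na - 1) * va + (nb - 1) * vb) / (na + nb - 2)
    t = (ma - mb) / math.sqrt(pooled * (1 / na + 1 / nb))
    return t, na + nb - 2  # t statistic and degrees of freedom

# Hypothetical gain scores for an experimental (PBL) and a control class.
experimental = [78, 85, 80, 90, 76, 88]
control = [70, 75, 72, 80, 68, 74]
t, df = independent_t(experimental, control)
# Reject H0 (one-tailed) when t exceeds the critical value from a t table.
```

With real data one would compare t against the critical value for the appropriate degrees of freedom, exactly as the abstract does with 1.6923 > 1.68.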

    Scalable and Broad Hardware Acceleration through Practical Speculative Parallelism

    No full text
    With the slowing down of Moore's Law, silicon fabrication technology is not yielding the performance improvements it once did. Hardware accelerators, which tailor their architecture to a specific application or domain, have emerged as an attractive approach to improving performance. Unfortunately, current accelerators have been limited to domains such as deep learning, where parallelism is easy to exploit. Many applications do not have such easy-to-extract parallelism and have remained off-limits to accelerators. This thesis presents techniques to build accelerators for applications with speculative parallelism. These applications consist of atomic tasks, sometimes with order constraints, and need speculative execution to extract parallelism. In speculative execution, tasks are executed in parallel under the assumption that they are independent, and a runtime system monitors their execution to check whether they actually are. If a task produces a conflict during execution, i.e., if it may violate a data dependence, it is aborted and re-executed. This thesis proposes Chronos, a framework-based approach for building accelerators that use speculation to extract parallelism. Under Chronos, accelerator designers express the algorithm as a set of ordered tasks and then design processing elements (PEs) to execute each of these tasks. The framework provides reusable components for task management and speculative execution, saving most of the developer effort in creating accelerators for new applications. Prior general-purpose architectures have leveraged existing techniques, like cache-coherence protocols, for conflict detection, but implementing coherence would add complexity, latency, and significant on-chip storage requirements, making these techniques expensive on accelerators. To tackle this challenge, we first propose a new execution model, Spatially Located Ordered Tasks (SLOT), that uses order as the only synchronization mechanism and limits each task's accesses to a single read-write object.
We then use SLOT to implement the Chronos framework. This implementation avoids the need for cache coherence and makes speculative execution cheap and distributed, reducing overheads and improving performance by up to 2× over prior conflict detection techniques. While SLOT achieves excellent performance on many algorithms, it is sometimes desirable to allow a single task to access multiple objects. Thus, we extend Chronos to support the more general Swarm execution model, which allows this and is also easier to program. This Chronos-Swarm implementation improves performance when Swarm's features are needed, but hurts performance when they are not, as the Swarm execution model requires more expensive conflict checks on each memory access. To bridge this gap, we introduce a hybrid SLOT/Swarm execution model that combines the generality and ease of programming of Swarm with the performance of SLOT. We develop FPGA implementations of Chronos and use them to build accelerators for several challenging applications. When run on cloud FPGA instances, these accelerators outperform state-of-the-art software versions running on a higher-priced multicore instance by 3.5× to 15.3×. (Ph.D. thesis)
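The abort-and-re-execute discipline described above can be modeled in a few lines. This is a toy sequential simulation of ordered speculation with value-based validation, not the Chronos hardware or API; all names are illustrative:

```python
def speculate(fn, snapshot):
    """Run fn against snapshot, recording its read and write sets."""
    reads, writes = {}, {}
    def get(key):
        reads[key] = snapshot.get(key)
        return reads[key]
    def put(key, value):
        writes[key] = value
    fn(get, put)
    return reads, writes

def run_speculative(tasks, state):
    """tasks: list of (timestamp, fn) with distinct timestamps; fn(get, put)."""
    # Optimistic pass: every task executes against the initial state.
    spec = {ts: speculate(fn, state) for ts, fn in tasks}
    aborts = 0
    # Commit in timestamp order; a task whose read values have since
    # changed violated a data dependence and must be re-executed.
    for ts, fn in sorted(tasks, key=lambda t: t[0]):
        reads, writes = spec[ts]
        if any(state.get(k) != v for k, v in reads.items()):
            reads, writes = speculate(fn, state)  # abort and re-execute
            aborts += 1
        state.update(writes)  # commit this task's writes
    return aborts

state = {"x": 0}
inc = lambda get, put: put("x", (get("x") or 0) + 1)
aborts = run_speculative([(2, inc), (1, inc)], state)
# The later-ordered task read a stale value and is re-executed once:
# state["x"] == 2, aborts == 1
```

Real systems detect conflicts eagerly (e.g., via coherence or, in SLOT, by spatially mapping each object to one tile); the value check here just makes the abort condition concrete.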

    Optimizing throughput architectures for speculative parallelism

    No full text
    Thesis: S.M., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2017. Throughput-oriented architectures, like GPUs, use a large number of simple cores and rely on application-level parallelism, using multithreading to keep the cores busy. These architectures work well when parallelism is plentiful but poorly when it is not. It is therefore important to combine these techniques with other hardware support for parallelizing challenging applications. Recent work has shown that speculative parallelism is plentiful for a large class of applications that have traditionally been hard to parallelize. However, adding hardware support for speculative parallelism to a throughput-oriented system leads to a severe pathology: aborted work consumes scarce resources and hurts the throughput of useful work. This thesis develops a technique to optimize throughput-oriented architectures for speculative parallelism: tasks should be prioritized according to how speculative they are. This focuses resources on work that is more likely to commit, reducing aborts and using speculation resources more efficiently. We identify two on-chip resources where this prioritization is most likely to help: the core pipeline and the memory controller. First, this thesis presents speculation-aware multithreading (SAM), a simple policy that modifies a multithreaded processor pipeline to prioritize instructions from less speculative tasks. Second, we modify the on-chip memory controller to prioritize requests issued by tasks that are earlier in the conflict-resolution order. We evaluate SAM on systems with up to 64 SMT cores. With SAM, 8-threaded in-order cores outperform single-threaded cores by 2.41× on average, while a speculation-oblivious policy yields only a 1.91× speedup. SAM also reduces wasted work by 43%.
Unlike at the core, we find little performance benefit from prioritizing requests at the memory controller. The reason is that speculative execution works as a very effective prefetching mechanism, and most requests, even those from tasks that are ultimately aborted, end up being useful. (S.M. thesis by Weeraratna Patabendige Maleen Hasanka Abeydeera)
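The core of the SAM policy above is an issue rule: each cycle, pick the runnable thread whose task is earliest in the conflict-resolution order. A minimal sketch under illustrative assumptions (one instruction issued per cycle, one task per thread):

```python
def issue_schedule(threads, cycles):
    """Toy speculation-aware issue policy: each cycle, issue one instruction
    from the runnable thread holding the least speculative task, i.e. the
    task earliest in conflict-resolution (timestamp) order.
    threads: dict name -> [timestamp, instructions_left] (illustrative)."""
    issued = []
    for _ in range(cycles):
        ready = [(ts, name) for name, (ts, left) in threads.items() if left > 0]
        if not ready:
            break
        ts, name = min(ready)  # SAM: prioritize the earliest-ordered task
        threads[name][1] -= 1
        issued.append(name)
    return issued

# Two SMT threads: t1 holds the earlier-ordered (less speculative) task,
# so its instructions are issued first.
order = issue_schedule({"t0": [5, 2], "t1": [1, 2]}, cycles=4)
# order == ["t1", "t1", "t0", "t0"]
```

A speculation-oblivious policy (e.g., round-robin) would interleave the threads instead, spending issue slots on the more speculative task that is likelier to abort.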

    Chronos: Efficient Speculative Parallelism for Accelerators

    Get PDF

    Exploiting Locality in Graph Analytics through Hardware-Accelerated Traversal Scheduling

    No full text
    Graph processing is increasingly bottlenecked by main memory accesses. On-chip caches are of little help because the irregular structure of graphs causes seemingly random memory references. However, most real-world graphs offer significant potential locality; it is just hard to predict ahead of time. In practice, graphs have well-connected regions where relatively few vertices share edges with many common neighbors. If these vertices were processed together, graph processing would enjoy significant data reuse. Hence, a graph's traversal schedule largely determines its locality. This paper explores online traversal scheduling strategies that exploit the community structure of real-world graphs to improve locality. Software graph processing frameworks use simple, locality-oblivious scheduling because, on general-purpose cores, the benefits of locality-aware scheduling are outweighed by its overheads. Instead, software frameworks rely on offline preprocessing to improve locality. Unfortunately, preprocessing is so expensive that its costs often negate any benefits from improved locality. Recent graph processing accelerators have inherited this design. Our insight is that this misses an opportunity: hardware acceleration allows for more sophisticated, online locality-aware scheduling than can be realized in software, letting systems significantly improve locality without any preprocessing. To exploit this insight, we present bounded depth-first scheduling (BDFS), a simple online locality-aware scheduling strategy. BDFS restricts each core to explore one small, connected region of the graph at a time, improving locality on graphs with good community structure. We then present HATS, a hardware-accelerated traversal scheduler that adds just 0.4% area and 0.2% power over general-purpose cores. We evaluate BDFS and HATS on several algorithms using large real-world graphs. On a simulated 16-core system, BDFS reduces main memory accesses by up to 2.4× and by 30% on average.
However, BDFS is too expensive in software and degrades performance by 21% on average. HATS eliminates these overheads, allowing BDFS to improve performance by 83% on average (up to 3.1×) over a locality-oblivious software implementation and by 31% on average (up to 2.1×) over specialized prefetchers. (National Science Foundation, Grant CAREER-1452994)
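The scheduling idea is simple enough to sketch: traverse depth-first, but cap the exploration depth so each core stays inside one small connected region before moving on. This is an illustrative sketch of the strategy, not the paper's exact algorithm:

```python
def bdfs_order(adj, max_depth=2):
    """Bounded depth-first schedule sketch: visit vertices of one small
    connected region together via DFS limited to max_depth hops, so
    vertices sharing many neighbors are processed close in time.
    adj: dict vertex -> list of neighbors."""
    visited, order = set(), []
    for root in adj:  # fall back to the next unvisited vertex
        if root in visited:
            continue
        stack = [(root, 0)]
        while stack:
            v, depth = stack.pop()
            if v in visited:
                continue
            visited.add(v)
            order.append(v)
            if depth < max_depth:  # the bound that keeps regions small
                stack.extend((u, depth + 1) for u in adj[v] if u not in visited)
    return order

# Two loosely connected communities, {0, 1, 2} and {3, 4, 5}.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3],
       3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
order = bdfs_order(adj, max_depth=2)
```

The depth bound is what distinguishes BDFS from plain DFS: it prevents the traversal from wandering far from the current region, which is what preserves cache locality.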

    Fractal: An Execution Model for Fine-Grain Nested Speculative Parallelism

    No full text
    Most systems that support speculative parallelization, like hardware transactional memory (HTM), do not support nested parallelism. This sacrifices substantial parallelism and precludes composing parallel algorithms. The few HTMs that do support nested parallelism focus on parallelizing at the coarsest (shallowest) levels, incurring large overheads that squander most of their potential. We present FRACTAL, a new execution model that supports unordered and timestamp-ordered nested parallelism. FRACTAL lets programmers seamlessly compose speculative parallel algorithms, and lets the architecture exploit parallelism at all levels. FRACTAL can parallelize a broader range of applications than prior speculative execution models. We design a FRACTAL implementation that extends the Swarm architecture and focuses on parallelizing at the finest (deepest) levels. Our approach sidesteps the issues of nested parallel HTMs and uncovers abundant fine-grain parallelism. As a result, FRACTAL outperforms prior speculative architectures by up to 88× at 256 cores.
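One way to picture timestamp-ordered nested parallelism is with hierarchical timestamps: a child task's timestamp extends its parent's with a local index, so each nested domain is ordered independently but slots into the parent's position in the global order. This toy model is only an illustration of that ordering idea, not the FRACTAL architecture:

```python
def run_nested(tasks):
    """Toy model of hierarchical timestamp ordering: a task may spawn a
    nested ordered domain; children get the parent's timestamp tuple
    extended by a child index, so tuple comparison places them after the
    parent but before the parent's next sibling. Names are illustrative."""
    done, pending = [], list(tasks)  # pending: (timestamp_tuple, fn)
    while pending:
        pending.sort(key=lambda t: t[0])  # always run the earliest task
        ts, fn = pending.pop(0)
        done.append(ts)
        children = fn() or []  # a task may spawn a nested ordered domain
        pending.extend((ts + (i,), child) for i, child in enumerate(children))
    return done

leaf = lambda: None
parent = lambda: [leaf, leaf]  # spawns a nested domain of two subtasks
done = run_nested([((1,), parent), ((2,), leaf)])
# Nested subtasks run after their parent but before the next sibling:
# done == [(1,), (1, 0), (1, 1), (2,)]
```

Python's tuple comparison gives exactly the required order, since (1,) < (1, 0) < (1, 1) < (2,); a real implementation would of course run tasks speculatively in parallel rather than sorting a queue.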

    Optimizing ordered graph algorithms with GraphIt

    No full text
    Many graph problems can be solved using ordered parallel graph algorithms that achieve significant speedups over their unordered counterparts by reducing redundant work. This paper introduces a new priority-based extension to GraphIt, a domain-specific language for writing graph applications, to simplify writing high-performance parallel ordered graph algorithms. The extension enables vertices to be processed in a dynamic order while hiding low-level implementation details from the user. We extend the compiler with new program analyses, transformations, and code generation to produce fast implementations of ordered parallel graph algorithms. We also introduce bucket fusion, a new performance optimization that fuses together different rounds of ordered algorithms to reduce synchronization overhead, resulting in 1.2× to 3× speedups over the fastest existing ordered algorithm implementations on road networks with large diameters. With the extension, GraphIt achieves up to 3× speedup on six ordered graph algorithms over state-of-the-art frameworks and hand-optimized implementations (Julienne, Galois, and GAPBS) that support ordered algorithms.
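The canonical ordered graph algorithm of this kind is single-source shortest paths, where vertices are processed in dynamic order of tentative distance. A minimal sketch of that ordering (plain Dijkstra with a binary heap; the GraphIt extension generates bucketed priority queues instead, which is what bucket fusion optimizes):

```python
import heapq

def sssp_ordered(adj, src):
    """Priority-ordered SSSP: process vertices in dynamic order of tentative
    distance, reinserting a vertex whenever its priority improves.
    adj: dict vertex -> list of (neighbor, weight)."""
    dist = {src: 0}
    pq = [(0, src)]
    while pq:
        d, v = heapq.heappop(pq)
        if d > dist.get(v, float("inf")):
            continue  # stale entry: v was already settled at a better priority
        for u, w in adj.get(v, []):
            nd = d + w
            if nd < dist.get(u, float("inf")):
                dist[u] = nd  # dynamic priority update
                heapq.heappush(pq, (nd, u))
    return dist

adj = {0: [(1, 1), (2, 4)], 1: [(2, 1)], 2: []}
dist = sssp_ordered(adj, 0)
# dist == {0: 0, 1: 1, 2: 2}: vertex 2 is settled via the cheaper path 0->1->2
```

Processing in priority order is what avoids the redundant relaxations an unordered (Bellman-Ford-style) version would perform, which is the speedup source the abstract describes.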