46,333 research outputs found

    Speeding-up graph processing on shared-memory platforms by optimizing scheduling and compute

    No full text
    Graph processing workloads are widely used in domains such as computational biology, social network analysis, and financial analysis. As DRAM technology scales to higher densities, shared-memory platforms gain increasing importance in handling large graphs. We study two main categories of graph algorithms from an implementation perspective: topology-driven algorithms process all vertices of the graph at each iteration, while data-driven algorithms process only those vertices that make a substantial contribution to the output. Furthermore, the performance of a graph algorithm execution can be broken down into three components, namely pre-processing, compute, and scheduling. For data-driven algorithms, the work of each thread is driven by dependencies between vertex values that are known only at run time, so scheduling takes a significant portion of execution time. For topology-driven algorithms, however, the scheduling time is negligible since the work of each thread can be determined at compile time. In this dissertation, we present three techniques to address the performance bottlenecks of both data-driven and topology-driven algorithms. First, we present Snug, a chip-level architecture that mitigates the trade-off between synchronization and wasted work in data-driven algorithms. Second, we present V-Combiner, a software-only technique that mitigates the trade-off between performance and accuracy in topology-driven algorithms using novel vertex-merging and recovery mechanisms. Finally, we present KeepCompressed, a set of algorithms that speeds up compute for topology-driven algorithms using vertex clustering for dynamic graphs.
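
    A minimal Python sketch of the distinction described above, illustrative only and not the dissertation's Snug, V-Combiner, or KeepCompressed code: both functions compute single-source shortest paths, but the first touches every vertex each round (work fixed at compile time) while the second schedules only changed vertices through a run-time worklist.

        from collections import deque

        def sssp_topology_driven(adj, src):
            # Topology-driven: every round touches every vertex; per-thread work is known statically.
            dist = {v: float("inf") for v in adj}
            dist[src] = 0
            changed = True
            while changed:
                changed = False
                for u in adj:                      # all vertices, every round
                    for v, w in adj[u]:
                        if dist[u] + w < dist[v]:
                            dist[v] = dist[u] + w
                            changed = True
            return dist

        def sssp_data_driven(adj, src):
            # Data-driven: only vertices whose value changed are (re)scheduled via a worklist.
            dist = {v: float("inf") for v in adj}
            dist[src] = 0
            worklist = deque([src])
            while worklist:
                u = worklist.popleft()
                for v, w in adj[u]:
                    if dist[u] + w < dist[v]:
                        dist[v] = dist[u] + w
                        worklist.append(v)         # scheduling decided at run time
            return dist

        graph = {0: [(1, 1), (2, 4)], 1: [(2, 1)], 2: []}   # tiny hypothetical weighted graph
        assert sssp_topology_driven(graph, 0) == sssp_data_driven(graph, 0)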

    Key technologies for safe and autonomous drones

    Get PDF
    Drones/UAVs are able to perform air operations that are very difficult for manned aircraft to perform. In addition, drones' usage brings significant economic savings and environmental benefits, while reducing risks to human life. In this paper, we present key technologies that enable the development of drone systems. The technologies are identified based on the usages of drones (driven by COMP4DRONES project use cases) and are grouped into four categories: U-space capabilities, system functions, payloads, and tools. We also present the contributions of the COMP4DRONES project to improve existing technologies. These contributions aim to ease drones' customization and enable their safe operation. This project has received funding from the ECSEL Joint Undertaking (JU) under grant agreement No 826610. The JU receives support from the European Union's Horizon 2020 research and innovation programme and Spain, Austria, Belgium, Czech Republic, France, Italy, Latvia, Netherlands. The total project budget is 28,590,748.75 EUR (excluding ESIF partners), while the requested grant is 7,983,731.61 EUR to ECSEL JU, and 8,874,523.84 EUR of National and ESIF Funding. The project started on 1 October 2019.

    Reconfigurable and heterogeneous architectures for efficient computing

    No full text
    The saturation of single-thread performance, along with the advent of the power wall, has resulted in the need for efficient use of area and power budgets. With the end of Dennard scaling and the slowdown of Moore's law, scaling from one process node to another no longer delivers gains in performance or power for general-purpose computing. Thus, there is an increase in the adoption of specialized hardware, tuned to the requirements of the application or domain. These accelerators promise high performance and energy efficiency. However, with the increasing complexity and resource requirements of applications and algorithms, there is also a need for more flexibility in these accelerator platforms. Along with high performance and energy efficiency, they must be able to cope with changes at the application and algorithmic level. In the face of these challenges, this dissertation explores the use of reconfiguration to balance flexibility, performance, and energy efficiency. We begin by presenting three novel approaches that explore the use of reconfiguration in the three dominant computing devices -- CPUs, GPUs, and FPGAs. First, we consider general-purpose GPU (GPGPU) computing, highlight its inefficiencies, and identify opportunities to leverage reconfiguration to address them. Our solution is a novel reconfigurable GPU architecture that can adapt to application needs by dynamically allocating computational and memory resources among GPU cores (SMs). Second, we consider the limitations of dynamic partial reconfiguration (DPR) in modern FPGAs. We observe that while DPR is a potentially powerful technique, it is difficult to leverage. Thus, we propose an end-to-end methodology to leverage DPR in FPGAs. The approach scales from edge to cloud devices, and presents an overlay architecture together with an integer linear programming (ILP) based scheduler and mapper. We also demonstrate the ability to simultaneously map multiple applications to one FPGA, and explore different scheduling and sharing strategies. Third, we attempt to bridge the gap between the efficiency of reconfigurable computing and near-memory computing for general-purpose computing. Thus, we consider a modern multi-core CPU and propose a novel architecture that uses SRAM arrays in the last-level cache to create a reconfigurable computing fabric. Our approach is cheap, fast, energy-efficient, non-invasive, and flexible. Finally, this dissertation concludes by considering the lessons learned from exploiting reconfiguration on CPUs, GPUs, and FPGAs, and asks how a modern reconfigurable computing device should be designed. With the explosion of data, large computational workloads, and increasing demands for efficiency, we propose a new memory-centric reconfigurable architecture, capable of fast dynamic reconfiguration and of altering its compute-to-memory ratio and organization. We demonstrate significantly higher performance, density, and memory capacity than modern FPGAs.
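
    As a rough illustration of an ILP-based mapper of the kind mentioned above (the formulation, kernel demands, and slot capacities below are hypothetical, not the dissertation's), this sketch assigns kernels from several applications to DPR overlay slots with PuLP:

        from pulp import LpProblem, LpMinimize, LpVariable, lpSum, value

        kernels = {"appA_k0": 30, "appA_k1": 50, "appB_k0": 40}   # area demand (hypothetical units)
        slots = {"slot0": 60, "slot1": 60}                        # DPR slot capacities (hypothetical)

        prob = LpProblem("dpr_overlay_mapping", LpMinimize)
        x = LpVariable.dicts("map", [(k, s) for k in kernels for s in slots], cat="Binary")

        for k in kernels:                       # every kernel is mapped to exactly one slot
            prob += lpSum(x[(k, s)] for s in slots) == 1
        for k, demand in kernels.items():       # a kernel must fit the area of its slot
            for s, cap in slots.items():
                if demand > cap:
                    prob += x[(k, s)] == 0

        # Objective: minimize the peak per-slot load, a crude proxy for the time-multiplexing
        # (and hence reconfiguration) pressure on any single DPR slot.
        peak = LpVariable("peak_load", lowBound=0)
        for s in slots:
            prob += lpSum(kernels[k] * x[(k, s)] for k in kernels) <= peak
        prob += lpSum([peak])

        prob.solve()
        print([ks for ks, v in x.items() if value(v) > 0.5], "peak load:", value(peak))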

    Operating systems scheduling algorithms

    No full text
    This thesis deals with the implementation and comparison of operating systems scheduling algorithms. At the beginning of the theoretical part of the work, processes in operating systems and scheduling are described. In addition to the implemented algorithms, other operating systems scheduling algorithms are also described. At the end of the theoretical part, both general and specific scheduling goals for individual operating systems are listed. In the practical part of the work, the First-Come First-Served, Shortest Job Next, Round Robin and Priority Scheduling algorithms were implemented. The programming part of the work was created in the Google Colaboratory environment, which supports the Python programming language. The implemented algorithms were analyzed and compared based on average processing time and average waiting time. The program code for each of the mentioned algorithms is provided in the attachment.
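
    A minimal sketch in the spirit of the comparison above (burst times are hypothetical; this is not the thesis code from the attachment): non-preemptive First-Come First-Served and Shortest Job Next with all processes arriving at time 0, reporting the average waiting and average processing (turnaround) times used for comparison.

        def fcfs(bursts):
            # First-Come First-Served: serve in arrival order.
            waits, t = [], 0
            for b in bursts:
                waits.append(t)
                t += b
            return waits

        def sjn(bursts):
            # Non-preemptive Shortest Job Next, all jobs available at time 0.
            order = sorted(range(len(bursts)), key=lambda i: bursts[i])
            waits, t = [0] * len(bursts), 0
            for i in order:
                waits[i] = t
                t += bursts[i]
            return waits

        def report(name, bursts, waits):
            turnaround = [w + b for w, b in zip(waits, bursts)]   # processing (turnaround) time
            print(f"{name}: avg wait = {sum(waits) / len(waits):.2f}, "
                  f"avg turnaround = {sum(turnaround) / len(turnaround):.2f}")

        bursts = [6, 8, 7, 3]          # hypothetical CPU burst times
        report("FCFS", bursts, fcfs(bursts))
        report("SJN", bursts, sjn(bursts))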

    The Metaverse: Survey, Trends, Novel Pipeline Ecosystem & Future Directions

    Full text link
    The Metaverse offers a second world beyond reality, where boundaries are non-existent and possibilities are endless, through engagement and immersive experiences using virtual reality (VR) technology. Many disciplines can benefit from the advancement of the Metaverse when accurately developed, including the fields of technology, gaming, education, art, and culture. Nevertheless, developing the Metaverse environment to its full potential is an ambiguous task that needs proper guidance and directions. Existing surveys on the Metaverse focus only on a specific aspect and discipline of the Metaverse and lack a holistic view of the entire process. To this end, a more holistic, multi-disciplinary, in-depth, and academic and industry-oriented review is required to provide a thorough study of the Metaverse development pipeline. To address these issues, we present in this survey a novel multi-layered pipeline ecosystem composed of (1) the Metaverse computing, networking, communications and hardware infrastructure, (2) environment digitization, and (3) user interactions. For every layer, we discuss the components that detail the steps of its development. Also, for each of these components, we examine the impact of a set of enabling technologies and empowering domains (e.g., Artificial Intelligence, Security & Privacy, Blockchain, Business, Ethics, and Social) on its advancement. In addition, we explain the importance of these technologies to support decentralization, interoperability, user experiences, interactions, and monetization. Our presented study highlights the existing challenges for each component, followed by research directions and potential solutions. To the best of our knowledge, this survey is the most comprehensive and allows users, scholars, and entrepreneurs to get an in-depth understanding of the Metaverse ecosystem and to find their opportunities and potential for contribution.

    Joint Activity Detection, Channel Estimation, and Data Decoding for Grant-free Massive Random Access

    Full text link
    In the massive machine-type communication (mMTC) scenario, a large number of devices with sporadic traffic need to access the network on limited radio resources. While grant-free random access has emerged as a promising mechanism for massive access, its potential has not been fully unleashed. In particular, the common sparsity pattern in the received pilot and data signal has been ignored in most existing studies, and auxiliary information of channel decoding has not been utilized for user activity detection. This paper endeavors to develop advanced receivers in a holistic manner for joint activity detection, channel estimation, and data decoding. In particular, a turbo receiver based on the bilinear generalized approximate message passing (BiG-AMP) algorithm is developed. In this receiver, all the received symbols will be utilized to jointly estimate the channel state, user activity, and soft data symbols, which effectively exploits the common sparsity pattern. Meanwhile, the extrinsic information from the channel decoder will assist the joint channel estimation and data detection. To reduce the complexity, a low-cost side information-aided receiver is also proposed, where the channel decoder provides side information to update the estimates on whether a user is active or not. Simulation results show that the turbo receiver is able to reduce the activity detection, channel estimation, and data decoding errors effectively, while the side information-aided receiver notably outperforms the conventional method with a relatively low complexity.
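
    For orientation, a minimal numerical illustration of the grant-free access model underlying the receivers above; the dimensions and the simple correlation/energy detector are assumptions for illustration only and are not the paper's BiG-AMP turbo receiver.

        import numpy as np

        rng = np.random.default_rng(0)
        L, K, M = 32, 100, 8            # pilot length, potential users, BS antennas (hypothetical)
        p_active = 0.05                 # sporadic traffic: only a few users are active at a time

        A = (rng.standard_normal((L, K)) + 1j * rng.standard_normal((L, K))) / np.sqrt(2 * L)
        alpha = rng.random(K) < p_active                     # unknown activity pattern
        H = (rng.standard_normal((K, M)) + 1j * rng.standard_normal((K, M))) / np.sqrt(2)
        N = 0.05 * (rng.standard_normal((L, M)) + 1j * rng.standard_normal((L, M)))
        Y = A @ (alpha[:, None] * H) + N                     # received non-orthogonal pilot signal

        # Crude activity detector: per-user correlation energy, thresholded.
        # (Misses and false alarms are expected at this scale; this is only a baseline-style sketch.)
        stat = np.linalg.norm(A.conj().T @ Y, axis=1)
        detected = stat > 0.5 * stat.max()
        print("true actives:", np.flatnonzero(alpha))
        print("detected    :", np.flatnonzero(detected))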

    A Spatio-temporal Decomposition Method for the Coordinated Economic Dispatch of Integrated Transmission and Distribution Grids

    Full text link
    With numerous distributed energy resources (DERs) integrated into the distribution networks (DNs), coordinated economic dispatch (C-ED) is essential for integrated transmission and distribution grids. For large-scale power grids, centralized C-ED faces a high computational burden and information privacy issues. To tackle these issues, this paper proposes a spatio-temporal decomposition algorithm to solve the C-ED in a distributed and parallel manner. In the temporal dimension, the multi-period economic dispatch (ED) of the transmission grid (TG) is decomposed into several subproblems by introducing auxiliary variables and overlapping time intervals to deal with the temporal coupling constraints. Besides, an accelerated alternating direction method of multipliers (A-ADMM) based temporal decomposition algorithm with a warm-start strategy is developed to solve the ED subproblems of the TG in parallel. In the spatial dimension, a multi-parametric programming projection based spatial decomposition algorithm is developed to coordinate the ED problems of the TG and DNs in a distributed manner. To further improve the convergence performance of the spatial decomposition algorithm, an aggregate equivalence approach is used to determine the feasible range of the boundary variables of the TG and DNs. Moreover, we prove that the proposed spatio-temporal decomposition method obtains the optimal solution for bilevel convex optimization problems with continuously differentiable objectives and constraints. Numerical tests are conducted on three systems of different scales, demonstrating the high computational efficiency and scalability of the proposed spatio-temporal decomposition method.
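
    A minimal consensus-ADMM sketch of the coordination idea above (not the paper's A-ADMM or multi-parametric projection method; the quadratic costs and values are hypothetical): two areas agree on a shared boundary variable while each minimizes its own dispatch cost.

        import numpy as np

        a = np.array([10.0, 4.0])        # each area's locally preferred boundary injection (hypothetical)
        rho = 1.0                        # ADMM penalty parameter
        x = np.zeros(2)                  # local copies of the boundary variable
        u = np.zeros(2)                  # scaled dual variables
        z = 0.0                          # shared (consensus) boundary variable

        for _ in range(100):
            x = (2 * a + rho * (z - u)) / (2 + rho)   # closed-form local updates for cost (x - a_i)^2
            z = np.mean(x + u)                        # coordination step on the shared variable
            u = u + x - z                             # dual update

        print(z)   # approaches 7.0 = a.mean(), the minimizer of the summed quadratic costs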

    One Small Step for Generative AI, One Giant Leap for AGI: A Complete Survey on ChatGPT in AIGC Era

    Full text link
    OpenAI has recently released GPT-4 (a.k.a. ChatGPT Plus), which is demonstrated to be one small step for generative AI (GAI), but one giant leap for artificial general intelligence (AGI). Since its official release in November 2022, ChatGPT has quickly attracted numerous users with extensive media coverage. Such unprecedented attention has also motivated numerous researchers to investigate ChatGPT from various aspects. According to Google Scholar, there are more than 500 articles with ChatGPT in their titles or mentioning it in their abstracts. Considering this, a review is urgently needed, and our work fills this gap. Overall, this work is the first to survey ChatGPT with a comprehensive review of its underlying technology, applications, and challenges. Moreover, we present an outlook on how ChatGPT might evolve to realize general-purpose AIGC (a.k.a. AI-generated content), which will be a significant milestone for the development of AGI. Comment: A Survey on ChatGPT and GPT-4, 29 pages. Feedback is appreciated ([email protected]).

    A hybrid quantum algorithm to detect conical intersections

    Full text link
    Conical intersections are topologically protected crossings between the potential energy surfaces of a molecular Hamiltonian, known to play an important role in chemical processes such as photoisomerization and non-radiative relaxation. They are characterized by a non-zero Berry phase, which is a topological invariant defined on a closed path in atomic coordinate space, taking the value π when the path encircles the intersection manifold. In this work, we show that for real molecular Hamiltonians, the Berry phase can be obtained by tracing a local optimum of a variational ansatz along the chosen path and estimating the overlap between the initial and final state with a control-free Hadamard test. Moreover, by discretizing the path into N points, we can use N single Newton-Raphson steps to update our state non-variationally. Finally, since the Berry phase can only take two discrete values (0 or π), our procedure succeeds even for a cumulative error bounded by a constant; this allows us to bound the total sampling cost and to readily verify the success of the procedure. We demonstrate numerically the application of our algorithm on small toy models of the formaldimine molecule (H2C=NH). Comment: 15 + 10 pages, 4 figures.
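
    A small numerical illustration of the idea above, not the paper's variational/Newton-Raphson quantum procedure: for a real two-level Hamiltonian H = x*sigma_x + z*sigma_z, follow the ground state continuously around a closed path in (x, z) and read the Berry phase (0 or π) from the sign of the overlap between the initial and final states.

        import numpy as np

        sx = np.array([[0.0, 1.0], [1.0, 0.0]])
        sz = np.array([[1.0, 0.0], [0.0, -1.0]])

        def berry_phase(radius, center=(0.0, 0.0), steps=200):
            first = prev = None
            for t in np.linspace(0.0, 2 * np.pi, steps):
                x = center[0] + radius * np.cos(t)
                z = center[1] + radius * np.sin(t)
                _, vecs = np.linalg.eigh(x * sx + z * sz)
                v = vecs[:, 0]                       # real ground state, defined only up to a sign
                if prev is not None and np.dot(v, prev) < 0:
                    v = -v                           # enforce continuity along the discretized path
                if first is None:
                    first = v
                prev = v
            return 0.0 if np.dot(first, prev) > 0 else np.pi   # overlap +1 -> 0, -1 -> pi

        print(berry_phase(1.0))                      # loop encircling the degeneracy at (0, 0): pi
        print(berry_phase(1.0, center=(3.0, 0.0)))   # loop not encircling it: 0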

    Barren plateaus in quantum tensor network optimization

    Get PDF
    We analyze the barren plateau phenomenon in the variational optimization of quantum circuits inspired by matrix product states (qMPS), tree tensor networks (qTTN), and the multiscale entanglement renormalization ansatz (qMERA). We consider as the cost function the expectation value of a Hamiltonian that is a sum of local terms. For randomly chosen variational parameters we show that the variance of the cost function gradient decreases exponentially with the distance of a Hamiltonian term from the canonical centre in the quantum tensor network. Therefore, as a function of qubit count, for qMPS most gradient variances decrease exponentially and for qTTN as well as qMERA they decrease polynomially. We also show that the calculation of these gradients is exponentially more efficient on a classical computer than on a quantum computer.
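
    A minimal sketch of the diagnostic quantity discussed above, the variance of a cost-gradient component over random parameters, estimated with the parameter-shift rule; the layered RY+CZ circuit and the single local cost term (the Z expectation on qubit 0) are assumptions for illustration, not the qMPS/qTTN/qMERA circuits analyzed in the paper.

        import numpy as np

        def apply_ry(state, qubit, theta, n):
            # Single-qubit RY rotation on the given qubit of an n-qubit state vector.
            c, s = np.cos(theta / 2), np.sin(theta / 2)
            state = state.reshape([2] * n)
            a, b = np.take(state, 0, axis=qubit), np.take(state, 1, axis=qubit)
            return np.stack([c * a - s * b, s * a + c * b], axis=qubit).reshape(-1)

        def apply_cz(state, q0, q1, n):
            # Controlled-Z between two qubits: flip the sign of the |1,1> block.
            state = state.reshape([2] * n).copy()
            idx = [slice(None)] * n
            idx[q0], idx[q1] = 1, 1
            state[tuple(idx)] *= -1
            return state.reshape(-1)

        def cost(params, n, layers):
            # Expectation value of the single local term Z_0 after a layered RY + CZ circuit.
            state = np.zeros(2 ** n); state[0] = 1.0
            p = iter(params)
            for _ in range(layers):
                for q in range(n):
                    state = apply_ry(state, q, next(p), n)
                for q in range(n - 1):
                    state = apply_cz(state, q, q + 1, n)
            z0 = (np.abs(state.reshape([2] * n)) ** 2).sum(axis=tuple(range(1, n)))
            return z0[0] - z0[1]

        rng = np.random.default_rng(1)
        layers, samples = 4, 200
        for n in (2, 4, 6):
            grads = []
            for _ in range(samples):
                params = rng.uniform(0, 2 * np.pi, n * layers)
                shift = np.zeros_like(params); shift[0] = np.pi / 2
                # Parameter-shift rule for the first RY angle.
                grads.append(0.5 * (cost(params + shift, n, layers) - cost(params - shift, n, layers)))
            print(n, "qubits: estimated Var[dC/dtheta_0] =", np.var(grads))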