42 research outputs found

    Towards Optimal Graph Coloring Using Rydberg Atoms

    Get PDF
    Quantum mechanics is expected to revolutionize the computing landscape in the near future. Among the many candidate technologies for building universal quantum computers, Rydberg atom-based systems stand out for being capable of both performing quantum simulations and working as gate-based universal quantum computers, while operating at room temperature through an optical system. Moreover, they can potentially scale up to hundreds of quantum bits (qubits). In this work, we solve a Graph Coloring problem by iteratively computing the solutions of Maximal Independent Set (MIS) problems, exploiting the Rydberg blockade phenomenon. Experimental results using a simulation framework on the CINECA Marconi-100 supercomputer demonstrate the validity of the proposed approach.
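    A minimal classical sketch of the iterated-MIS coloring idea follows. In the paper each MIS instance would be encoded on Rydberg hardware via the blockade effect; the greedy MIS routine below is an illustrative classical stand-in, not the paper's method.
    ```python
    # Sketch: graph coloring by repeatedly extracting a maximal independent set (MIS)
    # and assigning each extracted set a fresh color. The greedy MIS is a classical
    # stand-in for the Rydberg-blockade-based MIS solver described in the abstract.

    def greedy_mis(adj, nodes):
        """Return a maximal independent set of the subgraph induced by `nodes`."""
        mis, blocked = set(), set()
        for v in sorted(nodes, key=lambda v: len(adj[v] & nodes)):  # low degree first
            if v not in blocked:
                mis.add(v)
                blocked |= adj[v] | {v}
        return mis

    def color_by_iterated_mis(adj):
        """adj: dict node -> set of neighbours. Returns dict node -> color index."""
        remaining = set(adj)
        coloring, color = {}, 0
        while remaining:
            independent = greedy_mis(adj, remaining)
            for v in independent:
                coloring[v] = color      # every MIS receives one fresh color
            remaining -= independent
            color += 1
        return coloring

    # Example: a 4-cycle needs 2 colors.
    adj = {0: {1, 3}, 1: {0, 2}, 2: {1, 3}, 3: {0, 2}}
    print(color_by_iterated_mis(adj))
    ```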

    Comparison of heuristic approaches to PCI planning for Quantum Computers

    Get PDF
    Quantum Computing (QC) provides the possibility to develop new approaches to tackle complex problems. Real-world applications, however, cannot yet be managed directly due to the limitations of present and near-future noisy intermediate-scale quantum (NISQ) computers. Decomposition into smaller, manageable subproblems is often needed to take advantage of QC, even when using hybrid (classical-quantum) solvers or solvers that already apply decomposition techniques. In this paper, heuristic decomposition algorithms are presented to solve the Physical Cell Identifier (PCI) problem in 4G cellular networks in a way suitable for QC. The PCI problem can be viewed as a map coloring problem with additional constraints and has been represented in a Quadratic Unconstrained Binary Optimization (QUBO) model, a form that, for instance, a quantum annealing machine could process. We propose two strategies with variable decomposition granularity. The first solves the problem recursively through bisection (a max-cut problem), using only one qubit to represent the status of each object, thus avoiding one-hot encoding and minimizing the qubit requirement. The second is a multi-step approach that finally solves sets of randomized, modified max-k-cut problems of customizable qubit size. We executed the algorithms on real cellular networks of one of the main Italian national telecom operators (TIM). The results show that all proposed QUBO approaches can be effectively applied to very large problems with performance similar to or better than the reference classical algorithm, paving the way for their use on NISQ computers.
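    As a rough illustration of the bisection strategy, the sketch below builds a max-cut QUBO matrix and solves a tiny instance by brute force. The edge weights and the exhaustive solver are illustrative assumptions; in practice the QUBO would be handed to a quantum annealer or another QUBO backend.
    ```python
    import numpy as np
    from itertools import product

    def maxcut_qubo(n, edges):
        """edges: iterable of (i, j, w). Minimising x^T Q x maximises the cut."""
        Q = np.zeros((n, n))
        for i, j, w in edges:
            Q[i, i] -= w
            Q[j, j] -= w
            Q[i, j] += w
            Q[j, i] += w
        return Q

    def brute_force_qubo(Q):
        """Exhaustive solver, only viable for tiny instances (stand-in for an annealer)."""
        n = Q.shape[0]
        best_x, best_e = None, float("inf")
        for bits in product((0, 1), repeat=n):
            x = np.array(bits)
            e = x @ Q @ x
            if e < best_e:
                best_x, best_e = x, e
        return best_x, best_e

    # Example: a triangle with unit weights; the best bisection cuts two of the three edges.
    Q = maxcut_qubo(3, [(0, 1, 1.0), (1, 2, 1.0), (0, 2, 1.0)])
    x, energy = brute_force_qubo(Q)
    print(x, -energy)   # partition vector and cut value (2.0)
    ```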

    Deep Learning for real-time neural decoding of grasp

    Full text link
    Neural decoding involves correlating signals acquired from the brain with variables in the physical world, such as limb movement or robot control in Brain-Machine Interfaces. In this context, this work starts from a specific pre-existing dataset of neural recordings from monkey motor cortex and presents a Deep Learning-based approach to the decoding of neural signals for grasp type classification. Specifically, we propose an approach that exploits LSTM networks to classify time series containing neural data (i.e., spike trains) into classes representing the object being grasped. The main goal of the presented approach is to improve over state-of-the-art decoding accuracy without relying on any prior neuroscience knowledge, leveraging only the capability of deep learning models to extract correlations from data. The paper presents the results achieved for the considered dataset and compares them with previous works on the same dataset, showing a significant improvement in classification accuracy, even when considering simulated real-time decoding.
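    A minimal sketch of an LSTM-based grasp classifier of the kind described above is given below. The channel count, hidden size, and number of grasp classes are illustrative assumptions; the actual architecture and preprocessing used in the paper may differ.
    ```python
    import torch
    import torch.nn as nn

    class GraspLSTM(nn.Module):
        """Classify binned spike-count sequences into grasp types (sizes are assumptions)."""
        def __init__(self, n_channels=96, hidden=128, n_classes=5):
            super().__init__()
            self.lstm = nn.LSTM(input_size=n_channels, hidden_size=hidden, batch_first=True)
            self.head = nn.Linear(hidden, n_classes)

        def forward(self, x):                 # x: (batch, time_bins, n_channels)
            _, (h_n, _) = self.lstm(x)        # h_n: (1, batch, hidden), last hidden state
            return self.head(h_n.squeeze(0))  # logits over grasp classes

    model = GraspLSTM()
    spikes = torch.randn(8, 100, 96)          # 8 trials, 100 time bins, 96 recording channels
    logits = model(spikes)
    print(logits.shape)                        # torch.Size([8, 5])
    ```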

    Towards a Scalable Software Defined Network-on-Chip for Next Generation Cloud

    Get PDF
    The rapid evolution of Cloud-based services and the growing interest in deep learning (DL)-based applications are putting increasing pressure on hyperscalers and general-purpose hardware designers to provide more efficient and scalable systems. Cloud-based infrastructures must consist of more energy-efficient components, and the evolution must take place from the core of the infrastructure (i.e., data centers (DCs)) to the edges (Edge computing) to adequately support new and future applications. Adaptability/elasticity is one of the features required to increase performance-to-power ratios. Hardware-based mechanisms have been proposed to support system reconfiguration, mostly at the level of the processing elements, while fewer studies have addressed scalable, modular interconnection sub-systems. In this paper, we propose a scalable Software Defined Network-on-Chip (SDNoC)-based architecture. By leveraging a modular design approach, our solution can easily be adapted to support devices ranging from low-power computing nodes placed at the edge of the Cloud to high-performance many-core processors in Cloud DCs. The proposed design merges the benefits of hierarchical network-on-chip (NoC) topologies (by fusing the ring and the 2D-mesh topologies) with those brought by dynamic reconfiguration (i.e., adaptation). The proposed interconnect allows for creating different types of virtualised topologies aimed at serving different communication requirements, thus providing better resource partitioning (virtual tiles) for concurrent tasks. To further allow the software layer to control and monitor the NoC subsystem, a few customised instructions supporting a data-driven program execution model (PXM) are added to the processing element's instruction set architecture (ISA); data-driven programming and execution models are generally well suited to supporting DL applications. We also introduce a mechanism that maps a high-level programming language embedding concurrent execution models onto the basic functionalities offered by our SDNoC, easing the programming of the proposed system. In the reported experiments, we compared our lightweight reconfigurable architecture with a conventional flattened 2D-mesh interconnection subsystem. Results show that our design provides a 9.5% increase in data traffic throughput and a 2.2× reduction in average packet latency compared with the flattened 2D-mesh topology connecting the same number of processing elements (PEs) (up to 1024 cores). Similarly, power and resource consumption (on FPGA devices) is also low, confirming the good scalability of the proposed architecture.
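    To give a flavour of the resource-partitioning idea, the toy sketch below splits a 2D mesh of PEs into rectangular "virtual tiles", one per concurrent task. The mesh size, tile shape, and mapping policy are illustrative assumptions only and do not reproduce the paper's SDNoC mechanisms.
    ```python
    # Toy sketch: assign each PE of a 2D mesh to a rectangular virtual tile.

    def virtual_tiles(mesh_w, mesh_h, tile_w, tile_h):
        """Return {tile_id: [(x, y), ...]} assigning each PE to one virtual tile."""
        tiles = {}
        for y in range(mesh_h):
            for x in range(mesh_w):
                tid = (x // tile_w) + (y // tile_h) * (mesh_w // tile_w)
                tiles.setdefault(tid, []).append((x, y))
        return tiles

    # A 32x32 mesh (1024 PEs) split into 8x8 virtual tiles -> 16 partitions of 64 PEs each.
    tiles = virtual_tiles(32, 32, 8, 8)
    print(len(tiles), len(tiles[0]))   # 16 64
    ```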

    Architectural Simulation in the Kilo-core Era

    No full text
    The continuous improvements offered by silicon technology make it possible to integrate an ever-increasing number of cores on a single chip. Following this trend, microprocessor architectures composed of thousands of cores (i.e., kilo-core architectures) are expected in the near future. In this context, simulation tools represent a crucial factor for designing architectures at such core counts. This paper proposes a framework, based on the COTSon simulator [2], that is able to scale towards heterogeneous kilo-core architectures. Compared with current state-of-the-art architectural simulators, the proposed framework provides full-system simulation, a well-balanced trade-off between simulation speed and accuracy, and support for power consumption estimation. Experimental results confirm the ability of the framework to scale up to at least 2000 cores.

    Dynamic power reduction in self-adaptive embedded systems through benchmark analysis

    No full text
    Discovering the most appropriate reconfiguration instants for improving performance and lowering power consumption is not a trivial problem. In this paper, we show the benefits, in terms of performance gain and power reduction, of dynamically adapting an embedded platform (e.g., cache size, clock frequency, and core issue-width) through a design space exploration campaign focused on a relevant case study. To this end, we analyze a set of benchmarks belonging to the embedded application domain with the aim of illustrating how the appropriate selection of reconfiguration instants can positively influence system performance and power consumption. Experimental results using the cjpeg benchmark show that power consumption can be reduced by an average of 22%. Our methodology can be used to create a set of run-time management policies for driving the adaptation process.
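    The sketch below shows the shape of a design space exploration loop over the knobs mentioned above (cache size, clock frequency, issue width). The candidate values and the placeholder cost model are illustrative assumptions; in the paper each point would be evaluated by simulating the benchmark (e.g., cjpeg) on the embedded platform.
    ```python
    from itertools import product

    CACHE_KB    = [16, 32, 64]
    FREQ_MHZ    = [200, 400, 600]
    ISSUE_WIDTH = [1, 2, 4]

    def evaluate(cache_kb, freq_mhz, issue_width):
        """Placeholder cost model: returns (exec_time_ms, power_mw) for one configuration."""
        exec_time = 1000.0 / (freq_mhz * issue_width) * (64 / cache_kb + 1)
        power = 0.05 * freq_mhz * issue_width + 0.2 * cache_kb
        return exec_time, power

    # Pick the configuration minimising an energy-like metric (time x power).
    best = min(product(CACHE_KB, FREQ_MHZ, ISSUE_WIDTH),
               key=lambda cfg: evaluate(*cfg)[0] * evaluate(*cfg)[1])
    print("best configuration (cache_kb, freq_mhz, issue_width):", best)
    ```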

    Enabling Massive Multi-Threading with Fast Hashing

    No full text
    The next generation of high-performance computers is expected to execute orders of magnitude more threads than today's systems. Improper management of such a huge number of threads can create resource contention, leading to overall degraded system performance. By leveraging more practical approaches to distributing threads across the available resources, execution models and manycore chips are expected to overcome the limitations of current systems. Here, we present DELTA, a Data-Enabled muLti-Threaded Architecture, in which a producer-consumer scheme is used to execute threads via a completely distributed thread management mechanism. We consider a manycore tiled-chip architecture where Network-on-Chip (NoC) routers are extended to support our execution model. The proposed extension is analysed, and simulation results confirm that DELTA can manage a large number of simultaneous threads while relying on a simple hardware structure.
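    A toy sketch of the distributed-distribution idea follows: newly created threads are spread over the tiles by hashing their identifiers, so no central manager is needed. The hash function and tile count are illustrative assumptions, not DELTA's actual mechanism.
    ```python
    import hashlib
    from collections import Counter

    N_TILES = 64

    def home_tile(thread_id: int) -> int:
        """Map a thread to the tile whose router/scheduler will own it."""
        digest = hashlib.blake2b(thread_id.to_bytes(8, "little"), digest_size=8).digest()
        return int.from_bytes(digest, "little") % N_TILES

    # Spawning 10,000 threads: hashing keeps the per-tile load roughly balanced.
    load = Counter(home_tile(t) for t in range(10_000))
    print(min(load.values()), max(load.values()))
    ```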