QAmplifyNet: Pushing the Boundaries of Supply Chain Backorder Prediction Using Interpretable Hybrid Quantum-Classical Neural Network
Supply chain management relies on accurate backorder prediction for
optimizing inventory control, reducing costs, and enhancing customer
satisfaction. However, traditional machine-learning models struggle with
large-scale datasets and complex relationships, which limits their usefulness
on real-world data. This research introduces a novel methodological framework for
supply chain backorder prediction, addressing the challenge of handling large
datasets. Our proposed model, QAmplifyNet, employs quantum-inspired techniques
within a quantum-classical neural network to predict backorders effectively on
short and imbalanced datasets. Experimental evaluations on a benchmark dataset
demonstrate QAmplifyNet's superiority over classical models, quantum ensembles,
quantum neural networks, and deep reinforcement learning. Its proficiency in
handling short, imbalanced datasets makes it an ideal solution for supply chain
management. To enhance model interpretability, we use Explainable Artificial
Intelligence techniques. Practical implications include improved inventory
control, reduced backorders, and enhanced operational efficiency. QAmplifyNet
seamlessly integrates into real-world supply chain management systems, enabling
proactive decision-making and efficient resource allocation. Future work
involves exploring additional quantum-inspired techniques, expanding the
dataset, and investigating other supply chain applications. This research
unlocks the potential of quantum computing in supply chain optimization and
paves the way for further exploration of quantum-inspired machine learning
models in supply chain management. Our framework and QAmplifyNet model offer a
breakthrough approach to supply chain backorder prediction, providing superior
performance and opening new avenues for leveraging quantum-inspired techniques
in supply chain management.
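The abstract describes the architecture only at a high level. As a minimal sketch of what a hybrid quantum-classical classifier of this kind can look like, the toy below simulates one layer of parameterized single-qubit rotations in closed form with NumPy and trains a classical sigmoid read-out on a small imbalanced dataset. The encoding, layer shape, and training loop are illustrative assumptions, not the authors' QAmplifyNet design.

```python
import numpy as np

rng = np.random.default_rng(0)

def angle_encode(x):
    # Squash each feature into a (0, pi) rotation angle (product encoding).
    return np.pi / (1.0 + np.exp(-x))

def quantum_layer(x, w):
    # <Z> expectation of each qubit after RY(encode(x_j) + w_j) acts on |0>,
    # simulated in closed form: <Z> = cos(angle).
    return np.cos(angle_encode(x) + w)

def predict(x, w, v, b):
    # Classical sigmoid read-out over the simulated expectation values.
    z = quantum_layer(x, w)
    return 1.0 / (1.0 + np.exp(-(z @ v + b)))

# Toy imbalanced dataset (~15% positives) standing in for backorder labels.
X = rng.normal(size=(200, 4))
y = (X[:, 0] + 0.5 * X[:, 1] > 1.2).astype(float)

w, v, b = rng.normal(size=4), rng.normal(size=4), 0.0
for _ in range(300):                                  # plain per-sample SGD
    for xi, yi in zip(X, y):
        z = quantum_layer(xi, w)
        p = 1.0 / (1.0 + np.exp(-(z @ v + b)))
        g = p - yi                                    # d(BCE loss)/d(logit)
        dw = g * v * (-np.sin(angle_encode(xi) + w))  # chain rule through cos
        v, b, w = v - 0.1 * g * z, b - 0.1 * g, w - 0.1 * dw

preds = np.array([predict(xi, w, v, b) for xi in X]) > 0.5
print("train accuracy:", (preds == y).mean())
```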
A Tutorial on Clique Problems in Communications and Signal Processing
Since its first use by Euler on the problem of the seven bridges of
Königsberg, graph theory has shown excellent abilities in solving and
unveiling the properties of multiple discrete optimization problems. The study
of the structure of some integer programs reveals equivalence with graph theory
problems making a large body of the literature readily available for solving
and characterizing the complexity of these problems. This tutorial presents a
framework for utilizing a particular graph theory problem, known as the clique
problem, for solving communications and signal processing problems. In
particular, the paper aims to illustrate the structural properties of integer
programs that can be formulated as clique problems through multiple examples in
communications and signal processing. To that end, the first part of the
tutorial provides various optimal and heuristic solutions for the maximum
clique, maximum weight clique, and k-clique problems. The tutorial further
illustrates the use of the clique formulation through numerous contemporary
examples in communications and signal processing, mainly in maximum access for
non-orthogonal multiple access networks, throughput maximization using index
and instantly decodable network coding, collision-free radio frequency
identification networks, and resource allocation in cloud-radio access
networks. Finally, the tutorial sheds light on the recent advances of such
applications, and provides technical insights on ways of dealing with mixed
discrete-continuous optimization problems.
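As a concrete instance of the exact methods such a tutorial surveys, the sketch below finds a maximum clique via Bron-Kerbosch enumeration with pivoting; the small example graph is invented, not taken from the paper.

```python
def bron_kerbosch(R, P, X, adj, best):
    # Classic Bron-Kerbosch enumeration of maximal cliques, with pivoting;
    # `best` keeps the largest clique seen, i.e. a maximum clique at the end.
    if not P and not X:
        if len(R) > len(best[0]):
            best[0] = set(R)
        return
    pivot = max(P | X, key=lambda u: len(adj[u] & P))  # pivot prunes branches
    for v in list(P - adj[pivot]):
        bron_kerbosch(R | {v}, P & adj[v], X & adj[v], adj, best)
        P.remove(v)
        X.add(v)

# Adjacency sets for a small graph whose maximum clique is {0, 1, 2, 3}.
adj = {0: {1, 2, 3}, 1: {0, 2, 3}, 2: {0, 1, 3, 4}, 3: {0, 1, 2}, 4: {2}}
best = [set()]
bron_kerbosch(set(), set(adj), set(), adj, best)
print("maximum clique:", best[0])   # -> {0, 1, 2, 3}
```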
OS Scheduling Algorithms for Memory-Intensive Workloads in Multi-socket Multi-core Servers
Major chip manufacturers have all introduced multicore microprocessors.
Multi-socket systems built from these processors are routinely used for running
various server applications. Depending on the application that is run on the
system, remote memory accesses can impact overall performance. This paper
presents a new operating system (OS) scheduling optimization to reduce the
impact of such remote memory accesses. By observing the pattern of local and
remote DRAM accesses for every thread in each scheduling quantum and applying
different algorithms, we come up with a new schedule of threads for the next
quantum. This new schedule potentially cuts down remote DRAM accesses for the
next scheduling quantum and improves overall performance. We present three such
new algorithms of varying complexity, followed by an algorithm that is an
adaptation of the Hungarian algorithm. We used three different synthetic
workloads to evaluate these algorithms. We also performed sensitivity analysis with respect
to varying DRAM latency. We show that these algorithms can cut down DRAM access
latency by up to 55%, depending on the algorithm used. The benefit gained from
each algorithm depends on its complexity: in general, the higher the
complexity, the higher the benefit. The Hungarian algorithm results in an
optimal solution. We find that two out of the four algorithms provide a good
trade-off between performance and complexity for the workloads we studied.
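To make the assignment step concrete: given per-quantum counts of each thread's DRAM accesses per socket, picking a placement that minimizes total remote accesses is a linear assignment problem, which the Hungarian algorithm solves optimally. The sketch below uses SciPy's solver; the access counts are invented, and it assumes one thread per socket for simplicity, which the paper's setting does not require.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment  # Hungarian-style solver

# accesses[t][s]: DRAM accesses thread t issued to socket s's memory during
# the last scheduling quantum (invented numbers for illustration).
accesses = np.array([[900,  50,  10,  40],
                     [100, 700, 150,  50],
                     [ 30,  60, 800, 110],
                     [200,  80,  90, 600]])

# Placing thread t on socket s makes every access outside s remote, so the
# cost matrix is total accesses minus the local ones.
remote = accesses.sum(axis=1, keepdims=True) - accesses
threads, sockets = linear_sum_assignment(remote)  # optimal assignment
for t, s in zip(threads, sockets):
    print(f"thread {t} -> socket {s} ({remote[t, s]} remote accesses)")
```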
Performance Models for Split-execution Computing Systems
Split-execution computing leverages the capabilities of multiple
computational models to solve problems, but splitting program execution across
different computational models incurs costs associated with the translation
between domains. We analyze the performance of a split-execution computing
system developed from conventional and quantum processing units (QPUs) by using
behavioral models that track resource usage. We focus on asymmetric processing
models built using conventional CPUs and a family of special-purpose QPUs that
employ quantum computing principles. Our performance models account for the
translation of a classical optimization problem into the physical
representation required by the quantum processor while also accounting for
hardware limitations and conventional processor speed and memory. We conclude
that the bottleneck in this split-execution computing system lies at the
quantum-classical interface and that the primary time cost is independent of
quantum processor behavior.
Comment: Presented at the 18th Workshop on Advances in Parallel and Distributed Computational Models (APDCM 2016) on 23 May 2016; 10 pages.
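A toy version of such a behavioral model is sketched below: per-instance time is split into classical translation into the QPU's physical representation, fixed overhead at the quantum-classical interface, and QPU sampling time. All constants are illustrative assumptions, not measured values; with these numbers the classical side dominates, mirroring the paper's conclusion.

```python
def split_execution_time(n_vars, n_samples,
                         translate_per_var=2e-3,   # s, CPU-side embedding work
                         interface_overhead=0.5,   # s, I/O and queueing
                         anneal_time=20e-6):       # s, per QPU sample
    # Classical translation is modeled as quadratic in problem size, the
    # interface as a fixed cost, and QPU time as linear in sample count.
    t_translate = translate_per_var * n_vars ** 2
    t_interface = interface_overhead
    t_qpu = n_samples * anneal_time
    return t_translate, t_interface, t_qpu

parts = split_execution_time(n_vars=512, n_samples=10_000)
total = sum(parts)
for name, t in zip(("translate", "interface", "qpu"), parts):
    print(f"{name:10s} {t:8.3f} s  ({100 * t / total:5.1f}%)")
# With these assumed constants, classical translation dominates total time,
# consistent with the paper's finding that the quantum-classical interface,
# not QPU behavior, is the bottleneck.
```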
Flight Gate Assignment with a Quantum Annealer
Optimal flight gate assignment is a highly relevant optimization problem from
airport management. Among others, an important goal is the minimization of the
total transit time of the passengers. The corresponding objective function is
quadratic in the binary decision variables encoding the flight-to-gate
assignment. Hence, it is a quadratic assignment problem being hard to solve in
general. In this work we investigate the solvability of this problem with a
D-Wave quantum annealer. These machines are optimizers for quadratic
unconstrained binary optimization (QUBO) problems. Therefore, the flight gate
assignment problem seems well suited for these machines. We use real
world data from a mid-sized German airport as well as simulation based data to
extract typical instances small enough to be amenable to the D-Wave machine. In
order to mitigate precision problems, we employ bin packing on the passenger
numbers to reduce the precision requirements of the extracted instances. We
find that, for the instances we investigated, the bin packing has little effect
on the solution quality. Hence, we were able to solve small problem instances
extracted from real data with the D-Wave 2000Q quantum annealer.
Comment: Updated figure
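To illustrate the QUBO form the annealer expects, the sketch below builds the matrix for a toy instance: binary variables one-hot encode the flight-to-gate assignment, transit times sit on the diagonal, and the one-gate-per-flight constraint enters as a quadratic penalty. Flights, gates, and transit times are invented values, not the paper's airport data.

```python
import itertools
import numpy as np

flights, gates = 3, 2
transit = np.array([[10.0, 30.0],   # transit[f, g]: total passenger transit
                    [25.0, 12.0],   # time if flight f parks at gate g
                    [18.0, 20.0]])
P = 100.0                           # penalty weight, larger than any transit

n = flights * gates
idx = lambda f, g: f * gates + g    # flatten (flight, gate) -> QUBO variable
Q = np.zeros((n, n))

# Objective: sum of transit times for the chosen assignments.
for f, g in itertools.product(range(flights), range(gates)):
    Q[idx(f, g), idx(f, g)] += transit[f, g]

# Constraint "each flight gets exactly one gate" as P * (sum_g x[f,g] - 1)^2,
# expanded (dropping the constant) into -P diagonal and +2P pairwise terms.
for f in range(flights):
    for g in range(gates):
        Q[idx(f, g), idx(f, g)] -= P
        for h in range(g + 1, gates):
            Q[idx(f, g), idx(f, h)] += 2 * P

# Brute-force the QUBO minimum to check the toy instance (2^6 candidates).
best = min(itertools.product([0, 1], repeat=n),
           key=lambda x: np.array(x) @ Q @ np.array(x))
print(np.array(best).reshape(flights, gates))   # each row one-hot at its gate
```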
Noise-Adaptive Compiler Mappings for Noisy Intermediate-Scale Quantum Computers
A massive gap exists between current quantum computing (QC) prototypes, and
the size and scale required for many proposed QC algorithms. Current QC
implementations are prone to noise and variability, which affect their
reliability, and yet with fewer than 80 quantum bits (qubits) in total, they
are too resource-constrained to implement error correction. The term Noisy
Intermediate-Scale Quantum (NISQ) refers to these current and near-term systems
of 1000 qubits or less. Given NISQ's severe resource constraints, low
reliability, and high variability in physical characteristics such as coherence
time or error rates, it is of pressing importance to map computations onto them
in ways that use resources efficiently and maximize the likelihood of
successful runs.
This paper proposes and evaluates backend compiler approaches to map and
optimize high-level QC programs to execute with high reliability on NISQ
systems with diverse hardware characteristics. Our techniques all start from an
LLVM intermediate representation of the quantum program (such as would be
generated from high-level QC languages like Scaffold) and generate QC
executables runnable on the IBM Q public QC machine. We then use this framework
to implement and evaluate several optimal and heuristic mapping methods. These
methods vary in how they account for the availability of dynamic machine
calibration data, the relative importance of various noise parameters, the
different possible routing strategies, and the relative importance of
compile-time scalability versus runtime success. Using real-system
measurements, we show that fine grained spatial and temporal variations in
hardware parameters can be exploited to obtain an average 2.9x (and up to
18x) improvement in program success rate over the industry-standard IBM
Qiskit compiler.
Comment: To appear in ASPLOS'19.
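One ingredient of such noise-adaptive mapping can be sketched directly: given per-coupler two-qubit gate error rates from calibration data, route an interaction between distant qubits along the most reliable path by minimizing the sum of -log(1 - error) over the edges used. The coupling graph and error rates below are illustrative assumptions, not a real machine's calibration data.

```python
import heapq
import math

# Two-qubit gate error rate per coupler (invented for illustration).
edges = {(0, 1): 0.02, (1, 2): 0.01, (2, 3): 0.05, (3, 4): 0.01, (1, 3): 0.08}

graph = {}
for (u, v), err in edges.items():
    w = -math.log(1.0 - err)             # additive "unreliability" weight
    graph.setdefault(u, []).append((v, w))
    graph.setdefault(v, []).append((u, w))

def most_reliable_path(src, dst):
    # Dijkstra over -log success probabilities: shortest distance here means
    # highest product of per-gate success rates.
    dist, prev, heap = {src: 0.0}, {}, [(0.0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == dst:
            break
        if d > dist.get(u, math.inf):
            continue
        for v, w in graph[u]:
            if d + w < dist.get(v, math.inf):
                dist[v], prev[v] = d + w, u
                heapq.heappush(heap, (d + w, v))
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    return path[::-1], math.exp(-dist[dst])

path, p = most_reliable_path(0, 4)
print(path, f"success ~{p:.3f}")   # [0, 1, 2, 3, 4]: the longer route through
                                   # low-error links beats the noisy (1,3) hop
```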
Towards Lattice Quantum Chromodynamics on FPGA devices
In this paper we describe a single-node, double precision Field Programmable
Gate Array (FPGA) implementation of the Conjugate Gradient algorithm in the
context of Lattice Quantum Chromodynamics. As a benchmark of our proposal we
invert numerically the Dirac-Wilson operator on a 4-dimensional grid on three
Xilinx hardware solutions: Zynq Ultrascale+ evaluation board, the Alveo U250
accelerator and the largest device available on the market, the VU13P device.
In our implementation we separate software/hardware parts in such a way that
the entire multiplication by the Dirac operator is performed in hardware, and
the rest of the algorithm runs on the host. We find that the FPGA
implementation can offer performance comparable with that obtained using
current CPUs or Intel's many-core Xeon Phi accelerators. A possible multi-node
FPGA-based system is discussed, and we argue that power-efficient High
Performance Computing (HPC) systems can be implemented using FPGA devices only.
Comment: 17 pages, 4 figures.
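The software/hardware split described here is easy to mirror in code: a host-side Conjugate Gradient loop that treats the operator application as a black-box callable, which is exactly the piece the paper offloads to the FPGA. The stand-in below uses a generic symmetric positive-definite matrix rather than the Dirac-Wilson operator.

```python
import numpy as np

def conjugate_gradient(apply_A, b, tol=1e-10, max_iter=1000):
    # Host-side CG loop; apply_A is the black-box operator application that
    # the paper's design offloads to the FPGA.
    x = np.zeros_like(b)
    r = b - apply_A(x)                  # initial residual
    p = r.copy()                        # initial search direction
    rr = r @ r
    for _ in range(max_iter):
        Ap = apply_A(p)                 # the hardware-accelerated step
        alpha = rr / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        rr_new = r @ r
        if np.sqrt(rr_new) < tol:
            break
        p = r + (rr_new / rr) * p
        rr = rr_new
    return x

# Stand-in SPD operator; CG needs Hermitian positive-definiteness, which is
# why LQCD codes typically invert the normal operator D^dagger D.
rng = np.random.default_rng(1)
M = rng.normal(size=(64, 64))
A = M @ M.T + 64 * np.eye(64)
b = rng.normal(size=64)
x = conjugate_gradient(lambda v: A @ v, b)
print("residual norm:", np.linalg.norm(A @ x - b))
```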