4,822 research outputs found
Resource Optimized Quantum Architectures for Surface Code Implementations of Magic-State Distillation
Quantum computers capable of solving classically intractable problems are
under construction, and intermediate-scale devices are approaching completion.
Current efforts to design large-scale devices require allocating immense
resources to error correction, with the majority dedicated to the production of
high-fidelity ancillary states known as magic-states. Leading techniques focus
on dedicating a large, contiguous region of the processor as a single
"magic-state distillation factory" responsible for meeting the magic-state
demands of applications. In this work we design and analyze a set of optimized
factory architectural layouts that divide a single factory into spatially
distributed factories located throughout the processor. We find that
distributed factory architectures minimize the space-time volume overhead
imposed by distillation. Additionally, we find that the number of distributed
components in each optimal configuration is sensitive to application
characteristics and underlying physical device error rates. More specifically,
we find that the rate at which T-gates are demanded by an application has a
significant impact on the optimal distillation architecture. We develop an
optimization procedure that discovers the optimal number of factory
distillation rounds and number of output magic states per factory, as well as
an overall system architecture that interacts with the factories. This yields
between a 10x and 20x resource reduction compared to commonly accepted single
factory designs. Performance is analyzed across representative application
classes such as quantum simulation and quantum chemistry.Comment: 16 pages, 14 figure
Physical Representation-based Predicate Optimization for a Visual Analytics Database
Querying the content of images, video, and other non-textual data sources
requires expensive content extraction methods. Modern extraction techniques are
based on deep convolutional neural networks (CNNs) and can classify objects
within images with astounding accuracy. Unfortunately, these methods are slow:
processing a single image can take about 10 milliseconds on modern GPU-based
hardware. As massive video libraries become ubiquitous, running a content-based
query over millions of video frames is prohibitive.
One promising approach to reduce the runtime cost of queries of visual
content is to use a hierarchical model, such as a cascade, where simple cases
are handled by an inexpensive classifier. Prior work has sought to design
cascades that optimize the computational cost of inference by, for example,
using smaller CNNs. However, we observe that there are critical factors besides
the inference time that dramatically impact the overall query time. Notably, by
treating the physical representation of the input image as part of our query
optimization---that is, by including image transforms, such as resolution
scaling or color-depth reduction, within the cascade---we can optimize data
handling costs and enable drastically more efficient classifier cascades.
In this paper, we propose Tahoma, which generates and evaluates many
potential classifier cascades that jointly optimize the CNN architecture and
input data representation. Our experiments on a subset of ImageNet show that
Tahoma's input transformations speed up cascades by up to 35 times. We also
find up to a 98x speedup over the ResNet50 classifier with no loss in accuracy,
and a 280x speedup if some accuracy is sacrificed.Comment: Camera-ready version of the paper submitted to ICDE 2019, In
Proceedings of the 35th IEEE International Conference on Data Engineering
(ICDE 2019
An Assurance Framework for Independent Co-assurance of Safety and Security
Integrated safety and security assurance for complex systems is difficult for
many technical and socio-technical reasons such as mismatched processes,
inadequate information, differing use of language and philosophies, etc.. Many
co-assurance techniques rely on disregarding some of these challenges in order
to present a unified methodology. Even with this simplification, no methodology
has been widely adopted primarily because this approach is unrealistic when met
with the complexity of real-world system development.
This paper presents an alternate approach by providing a Safety-Security
Assurance Framework (SSAF) based on a core set of assurance principles. This is
done so that safety and security can be co-assured independently, as opposed to
unified co-assurance which has been shown to have significant drawbacks. This
also allows for separate processes and expertise from practitioners in each
domain. With this structure, the focus is shifted from simplified unification
to integration through exchanging the correct information at the right time
using synchronisation activities
Validating a timing simulator for the NGMP multicore processor
Timing simulation is a key element in multicore systems design. It enables a fast and cost effective design space exploration, allowing to simulate new architectural improvements without requiring RTL abstraction levels. Timing simulation also allows software developers to perform early testing of the timing behavior of their software without the need of buying the actual physical board, which can be very expensive when the board uses non-COTS technology. In this paper we present the validation of a timing simulator for the NGMP multicore processor, which is a 4 core processor being developed to become the reference platform for future missions of the European Space Agency.The research leading to these results has received funding from the European Space Agency under contract NPI 4000102880 and the Ministry of Science and Technology of
Spain under contract TIN-2015-65316-P. Jaume Abella has been partially supported by the Ministry of Economy and Competitiveness under Ramon y Cajal postdoctoral fellowship
number RYC-2013-14717.Peer ReviewedPostprint (author's final draft
Multitask Learning Deep Neural Networks to Combine Revealed and Stated Preference Data
It is an enduring question how to combine revealed preference (RP) and stated
preference (SP) data to analyze travel behavior. This study presents a
framework of multitask learning deep neural networks (MTLDNNs) for this
question, and demonstrates that MTLDNNs are more generic than the traditional
nested logit (NL) method, due to its capacity of automatic feature learning and
soft constraints. About 1,500 MTLDNN models are designed and applied to the
survey data that was collected in Singapore and focused on the RP of four
current travel modes and the SP with autonomous vehicles (AV) as the one new
travel mode in addition to those in RP. We found that MTLDNNs consistently
outperform six benchmark models and particularly the classical NL models by
about 5% prediction accuracy in both RP and SP datasets. This performance
improvement can be mainly attributed to the soft constraints specific to
MTLDNNs, including its innovative architectural design and regularization
methods, but not much to the generic capacity of automatic feature learning
endowed by a standard feedforward DNN architecture. Besides prediction, MTLDNNs
are also interpretable. The empirical results show that AV is mainly the
substitute of driving and AV alternative-specific variables are more important
than the socio-economic variables in determining AV adoption. Overall, this
study introduces a new MTLDNN framework to combine RP and SP, and demonstrates
its theoretical flexibility and empirical power for prediction and
interpretation. Future studies can design new MTLDNN architectures to reflect
the speciality of RP and SP and extend this work to other behavioral analysis
Near-Memory Address Translation
Memory and logic integration on the same chip is becoming increasingly cost
effective, creating the opportunity to offload data-intensive functionality to
processing units placed inside memory chips. The introduction of memory-side
processing units (MPUs) into conventional systems faces virtual memory as the
first big showstopper: without efficient hardware support for address
translation MPUs have highly limited applicability. Unfortunately, conventional
translation mechanisms fall short of providing fast translations as
contemporary memories exceed the reach of TLBs, making expensive page walks
common.
In this paper, we are the first to show that the historically important
flexibility to map any virtual page to any page frame is unnecessary in today's
servers. We find that while limiting the associativity of the
virtual-to-physical mapping incurs no penalty, it can break the
translate-then-fetch serialization if combined with careful data placement in
the MPU's memory, allowing for translation and data fetch to proceed
independently and in parallel. We propose the Distributed Inverted Page Table
(DIPTA), a near-memory structure in which the smallest memory partition keeps
the translation information for its data share, ensuring that the translation
completes together with the data fetch. DIPTA completely eliminates the
performance overhead of translation, achieving speedups of up to 3.81x and
2.13x over conventional translation using 4KB and 1GB pages respectively.Comment: 15 pages, 9 figure
- …