518 research outputs found
BiSon-e: A Lightweight and High-Performance Accelerator for Narrow Integer Linear Algebra Computing on the Edge
Linear algebra computational kernels based on byte and sub-byte integer data formats are at the base of many classes of applications, ranging from Deep Learning to Pattern Matching. Porting the computation of these applications from cloud to edge and mobile devices would enable significant improvements in terms of security, safety, and energy efficiency. However, despite their low memory and energy demands, their intrinsically high computational intensity makes the execution of these workloads challenging on highly resource-constrained devices. In this paper, we present BiSon-e, a novel RISC-V based architecture that accelerates linear algebra kernels based on narrow integer computations on edge processors by performing Single Instruction Multiple Data (SIMD) operations on off-the-shelf scalar Functional Units (FUs). Our novel architecture is built upon the binary segmentation technique, which significantly reduces the memory footprint and the arithmetic intensity of linear algebra kernels requiring narrow data sizes. We integrate BiSon-e into a complete System-on-Chip (SoC) based on RISC-V, synthesized, placed, and routed in 65nm and 22nm technologies, introducing a negligible 0.07% area overhead with respect to the baseline architecture. Our experimental evaluation shows that, when computing the Convolution and Fully-Connected layers of the AlexNet and VGG-16 Convolutional Neural Networks (CNNs) with 8-, 4-, and 2-bit data, our solution gains up to 5.6×, 13.9× and 24× in execution time compared to the scalar implementation of a single RISC-V core, and improves the energy efficiency of string matching tasks by 5× when compared to a RISC-V-based Vector Processing Unit (VPU).
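The binary segmentation idea named in this abstract can be illustrated in software. The sketch below is not BiSon-e's hardware design; it is a minimal Python illustration, assuming unsigned elements, of how packing narrow integers into one wide word lets a single scalar multiplication produce a dot product. The function names `pack` and `dot_binary_segmentation` are hypothetical.

```python
def pack(values, width):
    """Pack non-negative integers into one big integer, `width` bits per field."""
    word = 0
    for i, v in enumerate(values):
        word |= v << (i * width)
    return word


def dot_binary_segmentation(a, b, bits):
    """Dot product of two vectors of unsigned `bits`-bit integers
    using a single wide multiplication (assumption: unsigned data only)."""
    a, b = list(a), list(b)
    n = len(a)
    if n == 0 or len(b) != n:
        raise ValueError("inputs must be non-empty and of equal length")
    limit = 1 << bits
    if any(not 0 <= x < limit for x in a + b):
        raise ValueError("elements must fit in `bits` unsigned bits")

    # Each field of the product holds a sum of at most n partial products,
    # so the field must be wide enough that no sum carries into its neighbour.
    max_coeff = n * (limit - 1) ** 2
    width = max(1, max_coeff.bit_length())

    packed_a = pack(a, width)
    packed_b = pack(reversed(b), width)  # reversed so matching indices align

    # Multiplying the packed words is a polynomial multiplication in 2**width;
    # the field at position n-1 is sum(a[i] * b[i]).
    product = packed_a * packed_b
    return (product >> ((n - 1) * width)) & ((1 << width) - 1)
```

Python integers are arbitrary precision, so this sketch places no limit on the vector length. A fixed-width functional unit would bound how many elements fit in one multiplication, which is why narrower data formats allow more elements per operation.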
Domain Walls and Metastable Vacua in Hot Orientifold Field Theories
We consider "Orientifold field theories", namely SU(N) gauge theories with
Dirac fermions in the two-index representation at high temperature. When N is
even these theories exhibit a spontaneously broken Z2 centre symmetry. We study
aspects of the domain wall that interpolates between the two vacua of the
theory. In particular we calculate its tension to two-loop order. We compare
its tension to that of the corresponding domain wall in an SU(N) gauge theory with
adjoint fermions and find an agreement at large-N, as expected from planar
equivalence between the two theories. Moreover, we provide a non-perturbative
proof for the coincidence of the tensions at large-N. We also discuss the
vacuum structure of the theory when the fermion is given a large mass and argue
that there exist N-2 metastable vacua. We calculate the lifetime of those vacua
in the thin wall approximation. Comment: 29 pages, 4 figures. v2: minor changes in the introduction section.
to appear in JHE
Discretized Bayesian pursuit – A new scheme for reinforcement learning
The success of Learning Automata (LA)-based estimator algorithms over the classical, Linear Reward-Inaction (L_RI)-like schemes can be explained by their ability to pursue the actions with the highest reward probability estimates. Without access to reward probability estimates, it makes sense for schemes like the L_RI to first make large exploring steps, and then to gradually turn exploration into exploitation by making progressively smaller learning steps. However, this behavior becomes counter-intuitive when pursuing actions based on their estimated reward probabilities. Learning should then ideally proceed in progressively larger steps, as the reward probability estimates become more accurate. This paper introduces a new estimator algorithm, the Discretized Bayesian Pursuit Algorithm (DBPA), that achieves this. The DBPA is implemented by linearly discretizing the action probability space of the Bayesian Pursuit Algorithm (BPA) [1]. The key innovation is that the linear discrete updating rules mitigate the counter-intuitive behavior of the corresponding linear continuous updating rules, by augmenting them with the reward probability estimates. Extensive experimental results show the superiority of DBPA over previous estimator algorithms. Indeed, the DBPA is probably the fastest reported LA to date.
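The abstract does not give the DBPA update rules, so the following is only a generic sketch of a discretized pursuit scheme, not the authors' algorithm. It assumes a Beta(1, 1) prior with the posterior mean as the reward estimate, and it uses the estimates only to choose which action to pursue. The names `discretized_pursuit_step`, `run`, and `resolution` are hypothetical.

```python
import random


def discretized_pursuit_step(p, best, delta):
    """Move probability mass toward `best` in fixed steps of size `delta`."""
    # Every action loses `delta` (clamped at zero) ...
    new_p = [max(x - delta, 0.0) for x in p]
    # ... and the pursued action receives whatever mass remains.
    new_p[best] = 1.0 - sum(x for i, x in enumerate(new_p) if i != best)
    return new_p


def run(reward_probs, resolution=50, steps=2000, seed=0):
    """Simulate the sketch on a stationary environment with Bernoulli rewards."""
    rng = random.Random(seed)
    r = len(reward_probs)
    delta = 1.0 / (r * resolution)  # smallest step in the probability space
    p = [1.0 / r] * r               # start from the uniform action distribution
    wins = [1] * r                  # Beta(1, 1) prior: one pseudo-reward ...
    losses = [1] * r                # ... and one pseudo-penalty per action
    for _ in range(steps):
        action = rng.choices(range(r), weights=p)[0]
        if rng.random() < reward_probs[action]:
            wins[action] += 1
        else:
            losses[action] += 1
        # Assumption: posterior mean as the reward probability estimate.
        estimates = [w / (w + l) for w, l in zip(wins, losses)]
        best = max(range(r), key=estimates.__getitem__)
        p = discretized_pursuit_step(p, best, delta)
    return p
```

Calling `run([0.2, 0.8])` returns the final action probability vector. With a fixed step size the vector can reach an exact corner of the probability simplex in a finite number of updates, which is the usual motivation for discretizing.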
Induced chromosome deletions cause hypersociability and other features of Williams-Beuren syndrome in mice
The neurodevelopmental disorder Williams-Beuren syndrome is caused by spontaneous ~1.5 Mb deletions comprising 25 genes on human chromosome 7q11.23. To functionally dissect the deletion and identify dosage-sensitive genes, we created two half-deletions of the conserved syntenic region on mouse chromosome 5G2. Proximal deletion (PD) mice lack Gtf2i to Limk1, distal deletion (DD) mice lack Limk1 to Fkbp6, and the double heterozygotes (D/P) model the complete human deletion. Gene transcript levels in brain are generally consistent with gene dosage. Increased sociability and acoustic startle response are associated with PD, and cognitive defects with DD. Both PD and D/P males are growth-retarded, while skulls are shortened and brains are smaller in DD and D/P. Lateral ventricle (LV) volumes are reduced, and neuronal cell density in the somatosensory cortex is increased, in PD and D/P. Motor skills are most impaired in D/P. Together, these partial deletion mice replicate crucial aspects of the human disorder and serve to identify genes and gene networks contributing to the neural substrates of complex behaviours and behavioural disorders.
What can we learn from the implementation of monetary and macroprudential policies: a systematic literature review
The emergence of macroprudential policies, implemented by central banks as a means of promoting financial stability, has raised many questions regarding the interaction between monetary and macroprudential policies. Given the limited number of studies available, this paper sheds light on this issue by providing a critical and systematic review of the literature. To this end, we divide the theoretical and empirical studies into two broad channels: borrowers, consisting of the cost of funds and the collateral constraint, and financial intermediaries, consisting of risk-taking and payment systems. In spite of the existing ambiguity surrounding coordination issues between monetary and macroprudential policies, it is argued that monetary policy alone is not sufficient to maintain macroeconomic and financial stability. Hence, macroprudential policies are needed to supplement monetary policy. Additionally, we find that the role of the exchange rate is critical in the implementation of monetary and macroprudential policies in emerging markets, whilst volatile capital flows pose another challenge. Insofar as the arrangement of monetary and macroprudential policies varies across countries, key theoretical and policy implications have been identified.
Potential conservation of circadian clock proteins in the phylum Nematoda as revealed by bioinformatic searches
Although several circadian rhythms have been described in C. elegans, its molecular clock remains elusive. In this work we employed a novel bioinformatic approach, applying probabilistic methodologies, to search for circadian clock proteins of several of the best studied circadian model organisms of different taxa (Mus musculus, Drosophila melanogaster, Neurospora crassa, Arabidopsis thaliana and Synechococcus elongatus) in the proteomes of C. elegans and other members of the phylum Nematoda. With this approach we found that the Nematoda contain proteins most related to the core and accessory proteins of the insect and mammalian clocks, which provide new insights into the nematode clock and the evolution of the circadian system.
Fil: Romanowski, Andrés. Consejo Nacional de Investigaciones Científicas y Técnicas. Oficina de Coordinación Administrativa Parque Centenario. Instituto de Investigaciones Bioquímicas de Buenos Aires. Fundación Instituto Leloir. Instituto de Investigaciones Bioquímicas de Buenos Aires; Argentina. Universidad Nacional de Quilmes. Departamento de Ciencia y Tecnología. Laboratorio de Cronobiología; Argentina
Fil: Garavaglia, Matías Javier. Universidad Nacional de Quilmes. Departamento de Ciencia y Tecnología. Laboratorio de Ing.genética y Biolog.molecular y Celular. Area Virus de Insectos; Argentina. Consejo Nacional de Investigaciones Científicas y Técnicas; Argentina
Fil: Goya, María Eugenia. Universidad Nacional de Quilmes. Departamento de Ciencia y Tecnología. Laboratorio de Cronobiología; Argentina. Consejo Nacional de Investigaciones Científicas y Técnicas; Argentina
Fil: Ghiringhelli, Pablo Daniel. Universidad Nacional de Quilmes. Departamento de Ciencia y Tecnología. Laboratorio de Ing.genética y Biolog.molecular y Celular. Area Virus de Insectos; Argentina. Consejo Nacional de Investigaciones Científicas y Técnicas; Argentina
Fil: Golombek, Diego Andres. Universidad Nacional de Quilmes. Departamento de Ciencia y Tecnología. Laboratorio de Cronobiología; Argentina. Consejo Nacional de Investigaciones Científicas y Técnicas; Argentin
Vitruvius+: An Area-Efficient RISC-V Decoupled Vector Coprocessor for High Performance Computing Applications
The maturity level of RISC-V and the availability of domain-specific instruction set extensions, like vector processing, make RISC-V a good candidate for supporting the integration of specialized hardware in processor cores for the High Performance Computing (HPC) application domain. In this article, we present Vitruvius+, the vector processing acceleration engine that represents the core of vector instruction execution in the HPC challenge that comes within the EuroHPC initiative. It implements the RISC-V vector extension (RVV) 0.7.1 and can be easily connected to a scalar core using the Open Vector Interface standard. Vitruvius+ natively supports long vectors: 256 double precision floating-point elements in a single vector register. It is composed of a set of identical vector pipelines (lanes), each containing a slice of the Vector Register File and functional units (one integer, one floating point). The vector instruction execution scheme is hybrid in-order/out-of-order and is supported by register renaming and arithmetic/memory instruction decoupling. On a stand-alone synthesis, Vitruvius+ reaches a maximum frequency of 1.4 GHz in typical conditions (TT/0.80V/25°C) using GlobalFoundries 22FDX FD-SOI. The silicon implementation has a total area of 1.3 mm² and maximum estimated power of ~920 mW for one instance of Vitruvius+ equipped with eight vector lanes.