926 research outputs found

    Self-reducing copper nanocrystals : how surface chemistry affects sintering

    Get PDF
    Copper nanocrystals (Cu NC) are actively investigated as substitutes for costly silver nanocrystals in conductive inks. However, to be of any use for printed conductors, oxidation of Cu NCs to non-conductive copper oxides must be avoided. Here, we analyze the interplay between the Cu NC surface termination, oxidation suppression and bulk copper formation through thermal annealing using 3 nm Cu NCs synthesized via thermal decomposition of copper formate in oleylamine (OLA). By adapting the method introduced by Sun et al,1 we obtain stable Cu NC dispersions that do not oxidize when stored under inert atmosphere, while showing a rapid conversion into copper oxide when exposed to air or deposited to form a thin NC film. Using solution 1H NMR spectroscopy, we demonstrate that as-synthesized Cu NCs are capped by OLA. OLA is tightly bound at NMR time scales, yet slowly desorbes during storage of the Cu dispersions, a process that is accelerated by oxygen exposure. Addition of carboxylic acids leads to the displacement of OLA from the Cu NCs and the formation of a denser ligand shell, probably consisting of dissociated carboxylic acids. We demonstrate that carboxylic acid ligands make Cu NCs more oxidation proof and facilitate the conversion of films of oxidized Cu NCs into a dense copper film. This offers the prospects of using colloidal Cu NCs as main constituent in conductive, nano-copper inks for applications in printed electronics

    A low-power, high-performance speech recognition accelerator

    Get PDF
    © 2019 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes,creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.Automatic Speech Recognition (ASR) is becoming increasingly ubiquitous, especially in the mobile segment. Fast and accurate ASR comes at high energy cost, not being affordable for the tiny power-budgeted mobile devices. Hardware acceleration reduces energy-consumption of ASR systems, while delivering high-performance. In this paper, we present an accelerator for largevocabulary, speaker-independent, continuous speech-recognition. It focuses on the Viterbi search algorithm representing the main bottleneck in an ASR system. The proposed design consists of innovative techniques to improve the memory subsystem, since memory is the main bottleneck for performance and power in these accelerators' design. It includes a prefetching scheme tailored to the needs of ASR systems that hides main memory latency for a large fraction of the memory accesses, negligibly impacting area. Additionally, we introduce a novel bandwidth-saving technique that removes off-chip memory accesses by 20 percent. Finally, we present a power saving technique that significantly reduces the leakage power of the accelerators scratchpad memories, providing between 8.5 and 29.2 percent reduction in entire power dissipation. Overall, the proposed design outperforms implementations running on the CPU by orders of magnitude, and achieves speedups between 1.7x and 5.9x for different speech decoders over a highly optimized CUDA implementation running on Geforce-GTX-980 GPU, while reducing the energy by 123-454x.Peer ReviewedPostprint (author's final draft

    Leveraging run-time feedback for efficient ASR acceleration

    Get PDF
    © 2019 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes,creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.In this work, we propose Locality-AWare-Scheme (LAWS) for an Automatic Speech Recognition (ASR) accelerator in order to significantly reduce its energy consumption and memory requirements, by leveraging the locality among consecutive segments of the speech signal. LAWS diminishes ASR's workload by up to 60% by removing most of the off-chip accesses during the ASR's decoding process. We furthermore improve LAWS's effectiveness by selectively adapting the amount of ASR's workload, based on run-time feedback. In particular, we exploit the fact that the confidence of the ASR system varies along the recognition process. When confidence is high, the ASR system can be more restrictive and reduce the amount of work. The end design provides a saving of 87% in memory requests, 2.3x reduction in energy consumption, and a speedup of 2.1x with respect to the state-of-the-art ASR accelerator.Peer ReviewedPostprint (author's final draft

    Absolutely Maximally Entangled states, combinatorial designs and multi-unitary matrices

    Get PDF
    Absolutely Maximally Entangled (AME) states are those multipartite quantum states that carry absolute maximum entanglement in all possible partitions. AME states are known to play a relevant role in multipartite teleportation, in quantum secret sharing and they provide the basis novel tensor networks related to holography. We present alternative constructions of AME states and show their link with combinatorial designs. We also analyze a key property of AME, namely their relation to tensors that can be understood as unitary transformations in every of its bi-partitions. We call this property multi-unitarity.Comment: 18 pages, 2 figures. Comments are very welcom

    Online advertising: analysis of privacy threats and protection approaches

    Get PDF
    Online advertising, the pillar of the “free” content on the Web, has revolutionized the marketing business in recent years by creating a myriad of new opportunities for advertisers to reach potential customers. The current advertising model builds upon an intricate infrastructure composed of a variety of intermediary entities and technologies whose main aim is to deliver personalized ads. For this purpose, a wealth of user data is collected, aggregated, processed and traded behind the scenes at an unprecedented rate. Despite the enormous value of online advertising, however, the intrusiveness and ubiquity of these practices prompt serious privacy concerns. This article surveys the online advertising infrastructure and its supporting technologies, and presents a thorough overview of the underlying privacy risks and the solutions that may mitigate them. We first analyze the threats and potential privacy attackers in this scenario of online advertising. In particular, we examine the main components of the advertising infrastructure in terms of tracking capabilities, data collection, aggregation level and privacy risk, and overview the tracking and data-sharing technologies employed by these components. Then, we conduct a comprehensive survey of the most relevant privacy mechanisms, and classify and compare them on the basis of their privacy guarantees and impact on the Web.Peer ReviewedPostprint (author's final draft

    Performance analysis and optimization of automatic speech recognition

    Get PDF
    © 2018 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes,creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.Fast and accurate Automatic Speech Recognition (ASR) is emerging as a key application for mobile devices. Delivering ASR on such devices is challenging due to the compute-intensive nature of the problem and the power constraints of embedded systems. In this paper, we provide a performance and energy characterization of Pocketsphinx, a popular toolset for ASR that targets mobile devices. We identify the computation of the Gaussian Mixture Model (GMM) as the main bottleneck, consuming more than 80 percent of the execution time. The CPI stack analysis shows that branches and main memory accesses are the main performance limiting factors for GMM computation. We propose several software-level optimizations driven by the power/performance analysis. Unlike previous proposals that trade accuracy for performance by reducing the number of Gaussians evaluated, we maintain accuracy and improve performance by effectively using the underlying CPU microarchitecture. First, we use a refactored implementation of the innermost loop of the GMM evaluation code to ameliorate the impact of branches. Second, we exploit the vector unit available on most modern CPUs to boost GMM computation, introducing a novel memory layout for storing the means and variances of the Gaussians in order to maximize the effectiveness of vectorization. Third, we compute the Gaussians for multiple frames in parallel, so means and variances can be fetched once in the on-chip caches and reused across multiple frames, significantly reducing memory bandwidth usage. We evaluate our optimizations using both hardware counters on real CPUs and simulations. Our experimental results show that the proposed optimizations provide 2.68x speedup over the baseline Pocketsphinx decoder on a high-end Intel Skylake CPU, while achieving 61 percent energy savings. On a modern ARM Cortex-A57 mobile processor our techniques improve performance by 1.85x, while providing 59 percent energy savings without any loss in the accuracy of the ASR system.Peer ReviewedPostprint (author's final draft

    An ultra low-power hardware accelerator for automatic speech recognition

    Get PDF
    Automatic Speech Recognition (ASR) is becoming increasingly ubiquitous, especially in the mobile segment. Fast and accurate ASR comes at a high energy cost which is not affordable for the tiny power budget of mobile devices. Hardware acceleration can reduce power consumption of ASR systems, while delivering high-performance. In this paper, we present an accelerator for large-vocabulary, speaker-independent, continuous speech recognition. It focuses on the Viterbi search algorithm, that represents the main bottleneck in an ASR system. The proposed design includes innovative techniques to improve the memory subsystem, since memory is identified as the main bottleneck for performance and power in the design of these accelerators. We propose a prefetching scheme tailored to the needs of an ASR system that hides main memory latency for a large fraction of the memory accesses with a negligible impact on area. In addition, we introduce a novel bandwidth saving technique that removes 20% of the off-chip memory accesses issued during the Viterbi search. The proposed design outperforms software implementations running on the CPU by orders of magnitude and achieves 1.7x speedup over a highly optimized CUDA implementation running on a high-end Geforce GTX 980 GPU, while reducing by two orders of magnitude (287x) the energy required to convert the speech into text.Peer ReviewedPostprint (author's final draft

    Neuron-level fuzzy memoization in RNNs

    Get PDF
    The final publication is available at ACM via http://dx.doi.org/10.1145/3352460.3358309Recurrent Neural Networks (RNNs) are a key technology for applications such as automatic speech recognition or machine translation. Unlike conventional feed-forward DNNs, RNNs remember past information to improve the accuracy of future predictions and, therefore, they are very effective for sequence processing problems. For each application run, each recurrent layer is executed many times for processing a potentially large sequence of inputs (words, images, audio frames, etc.). In this paper, we make the observation that the output of a neuron exhibits small changes in consecutive invocations. We exploit this property to build a neuron-level fuzzy memoization scheme, which dynamically caches the output of each neuron and reuses it whenever it is predicted that the current output will be similar to a previously computed result, avoiding in this way the output computations. The main challenge in this scheme is determining whether the new neuron's output for the current input in the sequence will be similar to a recently computed result. To this end, we extend the recurrent layer with a much simpler Bitwise Neural Network (BNN), and show that the BNN and RNN outputs are highly correlated: if two BNN outputs are very similar, the corresponding outputs in the original RNN layer are likely to exhibit negligible changes. The BNN provides a low-cost and effective mechanism for deciding when fuzzy memoization can be applied with a small impact on accuracy. We evaluate our memoization scheme on top of a state-of-the-art accelerator for RNNs, for a variety of different neural networks from multiple application domains. We show that our technique avoids more than 24.2% of computations, resulting in 18.5% energy savings and 1.35x speedup on average.Peer ReviewedPostprint (author's final draft

    La influencia de proporcionar los nombres de las cantidades en la resolución aritmética de problemas verbales

    Get PDF
    Cuando un profesor supervisa la resolución de un problema que lleva a cabo un estudiante, debe valorar el potencial de las ayudas que proporciona. Presentamos parte de una investigación que tenía como objetivo estudiar el papel de las ayudas expresadas en lenguaje natural en la resolución aritmética de problemas. En el experimento participaron 32 estudiantes de quinto curso de primaria (10-11 años). En concreto, hemos analizado cómo influye, en el proceso de resolución de un problema, el hecho de proporcionar un conjunto de nombres (o etiquetas) que hacen referencia a un número suficiente de cantidades para resolverlo. Los nombres usados eran del tipo “kilos de naranjas”, “precio de las naranjas que se han comprado”, etc. El análisis de los resultados nos ha permitido elaborar un catálogo de actuaciones en el que se reflejan los procesos de gestión y las dificultades de los estudiantes para integrar estas etiquetas en el proceso de resolución
    corecore