165 research outputs found

    Ultra low-power, high-performance accelerator for speech recognition

    Automatic Speech Recognition (ASR) is one of the most important applications of deep-learning deployment, especially in the mobile segment. Fast and accurate ASR comes at a high energy cost, requiring large memory storage and computational power that the tiny power budget of mobile devices cannot afford. Hardware acceleration can reduce the power consumption and memory pressure of ASR systems while delivering high performance. In this thesis, we present a customized accelerator for large-vocabulary, speaker-independent, continuous speech recognition. A state-of-the-art ASR system consists of two major components: acoustic scoring using a DNN and speech-graph decoding using Viterbi search. As the first step, we focus on the Viterbi search algorithm, which represents the main bottleneck in the ASR system. The accelerator includes innovative techniques to improve the memory subsystem, which is the main bottleneck for performance and power, such as a prefetching scheme and a novel bandwidth-saving technique tailored to the needs of ASR. Furthermore, as the speech graph is vast, taking more than 1 GB of memory, we propose to change its representation by partitioning it into several sub-graphs and performing an on-the-fly composition during the Viterbi run time. This approach, together with some simple yet efficient compression techniques, results in a 31x memory footprint reduction, providing 155x real-time speedup and orders-of-magnitude power and energy savings compared to CPUs and GPUs. In the next step, we propose a novel hardware-based ASR system that effectively integrates a DNN accelerator for the pruned/quantized models with the Viterbi accelerator. We show that, when either pruning or quantizing the DNN model used for acoustic scoring, ASR accuracy is maintained but the execution time of the ASR system increases by up to 33%. 
Although pruning and quantization improve the efficiency of the DNN, they result in a large increase in Viterbi search activity, since the output scores of the pruned model are less reliable. To avoid this increase in Viterbi search workload, our system loosely selects the N-best hypotheses at every time step, exploring only the N most likely paths. Our final solution efficiently combines the DNN and Viterbi accelerators with all their optimizations, delivering 222x real-time ASR with a small power budget of 1.26 W, a small memory footprint of 41 MB, and a peak memory bandwidth of 381 MB/s, making it amenable to low-power mobile platforms.
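    The N-best selection described in this abstract can be illustrated with a small software sketch. The code below is not the thesis's hardware design; it is a minimal token-passing Viterbi search in Python, where the function name `viterbi_nbest`, the graph encoding in `arcs`, and the per-frame score dictionaries are all hypothetical choices made for illustration. It shows only the general idea of keeping the N highest-scoring states after each frame.

```python
import math


def viterbi_nbest(frames, arcs, start, n_best):
    """Token-passing Viterbi search that keeps only n_best states per frame.

    frames: list of dicts mapping label -> acoustic log-score for that frame.
    arcs:   dict mapping state -> list of (next_state, label, log_weight).
    Returns (best_log_score, best_label_sequence).
    All names and the data layout are illustrative assumptions.
    """
    # Each active token: state -> (log-score, label history)
    active = {start: (0.0, [])}
    for scores in frames:
        expanded = {}
        for state, (score, history) in active.items():
            for nxt, label, weight in arcs.get(state, []):
                if label not in scores:
                    continue
                cand = score + weight + scores[label]
                # Viterbi recombination: keep the best path into each state
                if nxt not in expanded or cand > expanded[nxt][0]:
                    expanded[nxt] = (cand, history + [label])
        # N-best pruning: retain only the n_best highest-scoring states
        kept = sorted(expanded.items(), key=lambda kv: kv[1][0], reverse=True)
        active = dict(kept[:n_best])
    if not active:
        return -math.inf, []
    best_state = max(active, key=lambda s: active[s][0])
    return active[best_state]


# Toy two-state graph with two frames of acoustic scores
arcs = {
    0: [(0, "a", math.log(0.5)), (1, "b", math.log(0.5))],
    1: [(1, "b", math.log(0.5)), (0, "a", math.log(0.5))],
}
frames = [
    {"a": math.log(0.9), "b": math.log(0.1)},
    {"a": math.log(0.2), "b": math.log(0.8)},
]
score, labels = viterbi_nbest(frames, arcs, start=0, n_best=1)
```

    A smaller `n_best` bounds the work done per frame at the risk of discarding the path that would eventually have scored best, which is the trade-off the abstract alludes to when acoustic scores become less reliable.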

    An ultra low-power hardware accelerator for automatic speech recognition

    Automatic Speech Recognition (ASR) is becoming increasingly ubiquitous, especially in the mobile segment. Fast and accurate ASR comes at a high energy cost that the tiny power budget of mobile devices cannot afford. Hardware acceleration can reduce the power consumption of ASR systems while delivering high performance. In this paper, we present an accelerator for large-vocabulary, speaker-independent, continuous speech recognition. It focuses on the Viterbi search algorithm, which represents the main bottleneck in an ASR system. The proposed design includes innovative techniques to improve the memory subsystem, since memory is identified as the main bottleneck for performance and power in the design of these accelerators. We propose a prefetching scheme tailored to the needs of an ASR system that hides main memory latency for a large fraction of the memory accesses with a negligible impact on area. In addition, we introduce a novel bandwidth-saving technique that removes 20% of the off-chip memory accesses issued during the Viterbi search. The proposed design outperforms software implementations running on the CPU by orders of magnitude and achieves a 1.7x speedup over a highly optimized CUDA implementation running on a high-end GeForce GTX 980 GPU, while reducing the energy required to convert speech into text by two orders of magnitude (287x). Peer Reviewed. Postprint (author's final draft)

    Can galvanic skin conductance be used as an objective indicator of children's anxiety in the dental setting?

    Assessment of procedural distress is essential for assisting children during invasive dental treatments. This study aimed to determine the validity and reliability of galvanic skin response as a measure of dental anxiety in children. A total of 151 children, aged 5-7 years, participated in this study. Similar dental treatments were rendered to all subjects. At the beginning and end of the session, the modified child dental anxiety scale (MCDAS), the clinical anxiety rating scale (CARS), and galvanic skin response (GSR) were used to determine the children's anxiety. GSR was significantly correlated with both MCDAS (rs=0.62, p=0.02) and CARS (rs=0.44, p=0.032). The correlation between MCDAS and CARS was also significant (rs=0.9, p<0.001). Anxiety decreased during the session on both the GSR (rs=0.52, p=0.001) and MCDAS (rs=0.77, p=0.001) measures. CARS also showed a reduction between the initial and second assessment, but it was not statistically significant (rs=0.12, p=0.36). The findings suggest that GSR is a reliable and valid measure of children's dental anxiety in the clinical context. GSR may help to identify clinically anxious children before dental treatment so that appropriate interventions can be provided.

    The impact of maternal emotional intelligence and parenting style on child anxiety and behavior in the dental setting

    Objective. The present study investigated the correlations between maternal emotional intelligence (EQ), parenting style, child trait anxiety and child behavior in the dental setting. Study design. One hundred seventeen children, aged 4-6 years (mean 5.24 years), and their mothers participated in the study. The BarOn Emotional Quotient Inventory and Baumrind's parenting style questionnaire were used to quantify maternal emotional intelligence and parenting style. Children's anxiety and behavior were evaluated using the Spence Children's Anxiety Scale (SCAS) and the Frankl behavior scale. Results. A significant correlation was found between maternal EQ and child behavior (r=0.330; p<0.01), but not between parenting style and child behavior. There was no significant correlation between the mother's total EQ and the child's total anxiety; however, some subscales of EQ and anxiety showed significant correlations. There were significant correlations between authoritarian parenting style and separation anxiety (r=0.186; p<0.05) as well as between authoritative parenting style and the mother's EQ (r=0.286; p<0.01). There was no significant correlation between child anxiety and behavior (r = -0.81). Regression analysis revealed that maternal EQ is effective in predicting child behavior (B=0.340; p<0.01). Conclusion. This study provides preliminary evidence that the child's behavior in the dental setting is correlated with the mother's emotional intelligence. Emotionally intelligent mothers were found to have a predominantly authoritative parenting style.

    Physical Layer Techniques for OFDM-Based Cognitive Radios

    Cognitive radio has recently been proposed as a promising approach for efficient utilization of radio spectrum. However, there are several challenges to be addressed across all layers of a cognitive radio system design, from application to hardware implementation. From the physical-layer point of view, two key challenges are spectrum sensing and an appropriate signaling scheme for data transmission. The modulation techniques used in cognitive radio should not only be efficient and flexible but also must not cause harmful interference to the primary (licensed) users. Among the signaling schemes proposed for cognitive radio, orthogonal frequency division multiplexing (OFDM) has emerged as a promising one due to its robustness against multipath fading, high spectral efficiency, and capacity for dynamic spectrum use. However, OFDM suffers from high out-of-band radiation, which is due to the high sidelobes of its subcarriers. In this thesis, we consider spectral shaping in OFDM-based cognitive radio systems, with a focus on reducing the interference to primary users created by out-of-band radiation of the secondary users' OFDM signal. In the first part of this research, we study the trade-off between time-based and frequency-based methods proposed for sidelobe suppression in OFDM. To this end, two recently proposed techniques, active interference cancellation (AIC) and adaptive symbol transition (AST), are considered, and a new joint time-frequency scheme is developed for both single-antenna and multi-antenna systems. Furthermore, knowledge of the wireless channel is used in the setting of the proposed joint scheme to better minimize interference to the primary user. This scheme enables us to evaluate the trade-off between the degrees of freedom provided by each of the two aforementioned methods. 
In the second part of this research, a novel low-complexity technique for reducing the out-of-band radiation power of OFDM subcarriers is proposed for both single-antenna and multi-antenna systems. In the new technique, referred to as the phase adjustment technique, each OFDM symbol is rotated in the complex plane by an optimal phase such that the interference to primary users is minimized. It is shown that the phase adjustment technique neither reduces the system throughput nor increases the bit-error rate of the system. Moreover, the performance of the technique in interference reduction is evaluated analytically in some special cases and verified using numerical simulations. Because OFDM systems are highly sensitive to time and frequency synchronization errors, the performance of spectral shaping techniques in OFDM is significantly affected by timing jitter in practical systems. In the last part of this research, we investigate the impact of timing jitter on sidelobe suppression techniques. Taking AIC as the base method of sidelobe suppression, we first propose a mathematical model for the OFDM spectrum in the presence of timing jitter and evaluate the performance degradation of AIC due to timing jitter. Then, a precautionary scheme based on a minimax approach is proposed to make the technique robust against random timing jitter.

    ZeroQuant-HERO: Hardware-Enhanced Robust Optimized Post-Training Quantization Framework for W8A8 Transformers

    Quantization techniques are pivotal in reducing the memory and computational demands of deep neural network inference. Existing solutions, such as ZeroQuant, offer dynamic quantization for models like BERT and GPT but overlook crucial memory-bounded operators and the complexities of per-token quantization. Addressing these gaps, we present a novel, fully hardware-enhanced robust optimized post-training W8A8 quantization framework, ZeroQuant-HERO. This framework uniquely integrates both memory bandwidth and compute-intensive operators, aiming for optimal hardware performance. Additionally, it offers flexibility by allowing specific INT8 modules to switch to FP16/BF16 mode, enhancing accuracy. Comment: 8 pages, 2 figures
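    For readers unfamiliar with the term, W8A8 means that both weights and activations are stored as 8-bit integers. The sketch below is a generic illustration of that idea using symmetric per-tensor quantization in plain Python; it is not the ZeroQuant-HERO method, whose details are not given in this abstract, and the helper names `quantize_int8` and `int8_dot` are hypothetical.

```python
def quantize_int8(values):
    """Symmetric per-tensor quantization: returns (int8 codes, scale).

    Generic illustration only; the scale maps the largest magnitude to 127.
    """
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    codes = [max(-127, min(127, round(v / scale))) for v in values]
    return codes, scale


def int8_dot(weights, activations):
    """Dot product with 8-bit weights and 8-bit activations (W8A8)."""
    w_q, w_scale = quantize_int8(weights)
    a_q, a_scale = quantize_int8(activations)
    # Accumulate in integers, as an INT8 kernel would
    acc = sum(w * a for w, a in zip(w_q, a_q))
    # Rescale the integer accumulator back to real values
    return acc * w_scale * a_scale


weights = [0.5, -1.0, 0.25]
activations = [2.0, 1.0, -4.0]
approx = int8_dot(weights, activations)  # exact real-valued result is -1.0
```

    The quantized result differs slightly from the exact dot product because each value is rounded to one of 255 levels; falling back to FP16/BF16 for selected modules, as the abstract mentions, is one way to limit that error where it matters.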

    Zeolite-silver-zinc nanoparticles: biocompatibility and their effect on the compressive strength of mineral trioxide aggregate

    This study was carried out to evaluate the biocompatibility of zeolite-silver-zinc (Ze-Ag-Zn) nanoparticles and their effect on the compressive strength of Mineral Trioxide Aggregate (MTA). Biocompatibility was evaluated by an MTT assay on pulmonary adenocarcinoma cells with 0.05, 0.1, 0.25, 0.5, 1 and 5 mg/mL concentrations of Ze-Ag-Zn. For the compressive strength test, four groups containing 15 stainless-steel cylinders with an internal diameter of 4 mm and a height of 6 mm were prepared, and MTA (groups 1 and 2) or MTA + 2% Ze-Ag-Zn (groups 3 and 4) was placed in the cylinders. The compressive strength was evaluated using a universal testing machine 4 days after mixing in groups 1 and 3, and 21 days after mixing in groups 2 and 4. There was no significant difference in cytotoxicity between the different concentrations. The highest (52.22±18.92 MPa) and lowest (19.57±5.76 MPa) compressive strengths were observed in the MTA group after 21 days and in the MTA + 2% Ze-Ag-Zn group after 4 days, respectively. The effects of time and of 2% Ze-Ag-Zn on the compressive strength were significant (P<0.05). Mixing MTA with Ze-Ag-Zn significantly reduced the compressive strength, whereas the passage of time from day 4 to day 21 significantly increased it. Mixing MTA with 2% Ze-Ag-Zn had an adverse effect on the compressive strength of MTA, but this combination had no cytotoxic effects.

    Study of Identities' Role in International Crises; Case Study: Syrian Crisis

    Identity can be considered a link between constructivist theory and theoretical studies of international crises. From a constructivist point of view, identities are the basis of interests and roles, and actors determine their friends and enemies based on the identity they envision for themselves. Accordingly, the main purpose of the present article is to answer the question: what role does identity play in international crises? Using descriptive-analytical methods, we test the hypothesis that if actors with inconsistent identities are involved in a crisis, the crisis will have a high potential for intensification and expansion and will most likely lead to the use of violent methods of crisis management. The Syrian crisis is the best application of this hypothesis. Its beginning was strongly influenced by the identity crisis within the Syrian government, and for this reason two identity conflicts (Neo-Salafi-Alavi) and (Kurdish-Arabic) were highlighted in this crisis. Subsequently, two revolutionary and conservative axes led by Iran and Saudi Arabia, as well as a third actor, Erdogan, entered the crisis and used all their power, proxy forces and allies to eliminate their "other" identities in Syria.

    Effect of the TiO2 nanoparticles on the selected physical properties of mineral trioxide aggregate

    Efforts to improve the properties of Mineral Trioxide Aggregate (MTA) include the incorporation of nanoparticles such as titanium dioxide (TiO2). The aim of this study was to evaluate the effect of TiO2 nanoparticles on the setting time, working time, push-out bond strength and compressive strength of MTA. The physical properties were determined using the ISO 6786:2001 and 9917 specifications. Fifteen samples of each material (MTA or MTA with a 1% weight ratio of TiO2 nanoparticles) were prepared for each evaluated physical property. Data were analyzed using descriptive statistics and the t-test. Statistical significance was set at P<0.05. There was a significant effect of the material type (presence or absence of TiO2 nanoparticles) on the push-out bond strength, compressive strength, working time and setting time, with significantly higher values achieved in the group with TiO2 nanoparticles than in the group without them (P=0.01 for the setting time and compressive strength, P=0.03 for the working time and P=0.001 for the bond strength). Based on the findings of this in vitro study, incorporation of TiO2 nanoparticles at a weight ratio of 1% increased the setting time, working time, compressive strength and push-out bond strength of MTA.