39 research outputs found

    Voice inactivity ranking for enhancement of speech on microphone arrays

    Motivated by the problem of improving the performance of speech enhancement algorithms in non-stationary acoustic environments with low SNR, a framework is proposed for identifying signal frames of noisy speech that are unlikely to contain voice activity. Such voice-inactive frames can then be incorporated into an adaptation strategy to improve the performance of existing speech enhancement algorithms. This adaptive approach is applicable to single-channel as well as multi-channel algorithms for noisy speech. In both cases, the adaptive versions of the enhancement algorithms are observed to improve SNR levels by 20 dB, as indicated by PESQ and WER criteria. In advanced speech enhancement algorithms, it is often of interest to identify some regions of the signal that have a high likelihood of being noise only, i.e., with no speech present. This is in contrast to advanced speech recognition, speaker recognition, and pitch tracking algorithms, in which we are interested in identifying all regions that have a high likelihood of containing speech, as well as regions that have a high likelihood of not containing speech. In other terms, this would mean minimizing the false positive and false negative rates, respectively. In the context of speech enhancement, the identification of some speech-absent regions prompts the minimization of false positives while setting an acceptable tolerance on false negatives, as determined by the performance of the enhancement algorithm. Typically, Voice Activity Detectors (VADs) are used for identifying speech-absent regions for the application of speech enhancement. In recent years, a myriad of Deep Neural Network (DNN) based approaches have been proposed to improve the performance of VADs at low SNR levels by training on combinations of speech and noise. Training on such an exhaustive dataset is combinatorially explosive.
For this dissertation, we propose a voice inactivity ranking framework, where the identification of voice-inactive frames is performed using a machine learning (ML) approach that only uses clean speech utterances for training and is robust to high levels of noise. In the proposed framework, input frames of noisy speech are ranked by ‘voice inactivity score’ to acquire definitely speech-inactive (DSI) frame sequences. These DSI regions serve as a noise estimate and are adaptively used by the underlying speech enhancement algorithm to enhance speech from a speech mixture. The proposed voice inactivity ranking framework was used to perform speech enhancement in single-channel and multi-channel systems. In the context of microphone arrays, the proposed framework was used to determine parameters for spatial filtering using adaptive beamformers. We achieved an average Word Error Rate (WER) improvement of 50% at SNR levels below 0 dB compared to the noisy signal, which is 7 ± 2.5% more than the framework where a state-of-the-art VAD decision was used for spatial filtering. For monaural signals, we propose a multi-frame multiband spectral-subtraction (MF-MBSS) speech enhancement system utilizing the voice inactivity framework to compute and update the noise statistics on overlapping frequency bands. In non-stationary acoustic environments, the proposed MF-MBSS achieved not only an average PESQ improvement of 16%, with a maximum improvement of 56%, when compared to state-of-the-art spectral subtraction, but also a 5 ± 1.5% improvement in the WER of the spatially filtered output signal.
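The ranking idea in this abstract can be illustrated with a minimal sketch. It is not the dissertation's method: the abstract describes a learned score trained on clean speech, whereas the sketch below swaps in negative frame energy as a stand-in score. All names (`split_into_frames`, `inactivity_score`, `rank_inactive_frames`, `estimate_noise_power`), the `keep_fraction` parameter, and the synthetic test signal are hypothetical.

```python
import math
import random


def split_into_frames(samples, frame_len):
    """Split samples into non-overlapping frames; a short tail is dropped."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, frame_len)]


def frame_energy(frame):
    """Mean squared amplitude of one frame."""
    return sum(x * x for x in frame) / len(frame)


def inactivity_score(frame):
    """Stand-in score (assumption): quieter frames rank as more voice-inactive.
    The source uses a learned model here, not frame energy."""
    return -frame_energy(frame)


def rank_inactive_frames(frames, keep_fraction=0.3):
    """Rank frames by inactivity score and return the indices of the top share."""
    n_keep = max(1, int(len(frames) * keep_fraction))
    ranked = sorted(range(len(frames)),
                    key=lambda i: inactivity_score(frames[i]),
                    reverse=True)
    return sorted(ranked[:n_keep])


def estimate_noise_power(frames, inactive_idx):
    """Average energy over the selected frames, used as a noise estimate."""
    return sum(frame_energy(frames[i]) for i in inactive_idx) / len(inactive_idx)


# Synthetic signal: Gaussian noise throughout, plus a tone in the second half
# standing in for voice activity.
rng = random.Random(0)
frame_len = 100
signal = []
for n in range(10 * frame_len):
    noise = rng.gauss(0.0, 0.1)
    tone = math.sin(2 * math.pi * 5 * n / frame_len) if n >= 5 * frame_len else 0.0
    signal.append(noise + tone)

frames = split_into_frames(signal, frame_len)
inactive = rank_inactive_frames(frames, keep_fraction=0.3)
noise_power = estimate_noise_power(frames, inactive)
print(inactive, round(noise_power, 4))
```

Keeping only the top-ranked share of frames, rather than classifying every frame, mirrors the abstract's emphasis on finding some speech-absent regions with few false positives. An enhancement stage such as spectral subtraction would then consume `noise_power` (or a per-band version of it).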

    Inverse designing surface curvatures by deep learning

    Smooth and curved microstructural topologies found in nature - from soap films to trabecular bone - have inspired several mimetic design spaces for architected metamaterials and bio-scaffolds. However, the design approaches so far have been ad hoc, raising the challenge: how to systematically and efficiently inverse design such artificial microstructures with targeted topological features? Here, we explore surface curvature as a design modality and present a deep learning framework to produce topologies with as-desired curvature profiles. The inverse design framework can generalize to diverse topological features such as tubular, membranous, and particulate features. Moreover, we demonstrate successful generalization beyond both the design and data space by inverse designing topologies that mimic the curvature profile of trabecular bone, spinodoid topologies, and periodic nodal surfaces for application in bio-scaffolds and implants. Lastly, we bridge curvature and mechanics by showing how topological curvature can be designed to promote mechanically beneficial stretching-dominated deformation over bending-dominated deformation. Comment: 23 pages, 12 figures

    Automated exploration of prebiotic chemical reaction space: progress and perspectives

    Prebiotic chemistry often involves the study of complex systems of chemical reactions that form large networks with a large number of diverse species. Such complex systems may have given rise to emergent phenomena that ultimately led to the origin of life on Earth. The environmental conditions and processes involved in this emergence may not be fully recapitulable, making it difficult for experimentalists to study prebiotic systems in laboratory simulations. Computational chemistry offers efficient ways to study such chemical systems and identify the ones most likely to display complex properties associated with life. Here, we review tools and techniques for modelling prebiotic chemical reaction networks and outline possible ways to identify self-replicating features that are central to many origin-of-life models.

    Occurrence of neurocysticercosis in patients presenting with seizure and its serological evaluation

    Background: The aim was to diagnose neurocysticercosis (NCC) among patients admitted with seizures to the Pediatric Department of TMMRC and to correlate the serological and radiological findings. Methods: A total of 100 patients presenting with recent-onset seizures were recruited from the Pediatric Department of a local major tertiary care teaching hospital during the period 2016-2017. Brain imaging was performed in all cases. Serological assessment was done using an ELISA kit. Diagnosis of neurocysticercosis was made using Del Brutto’s criteria. Results: The recruited patients presented with generalized, simple partial, and focal seizures (68%, 21% and 11%, respectively). NCC was diagnosed in 37 of 100 (37.0%) seizure cases based on imaging characteristics. In 13% of cases, MRI showed calcified NCC/granuloma suggestive of NCC. In 24% of cases, a scolex suggestive of NCC was seen. A total of 15 (15%) cases were seropositive for Taenia. Using Del Brutto’s criteria, 23% of cases were diagnosed as probable NCC and 15% as definite neurocysticercosis. A total of 62% of cases were confirmed as not having neurocysticercosis. Conclusions: The findings of the present study show that neurocysticercosis is still a major diagnosis among children presenting with seizures in this endemic area. Neuroimaging was a more useful tool than serology for the diagnosis and characterisation of NCC. The study highlights the need to create awareness regarding maintenance of hygiene and cleanliness.

    Technology Landscape for Epidemiological Prediction and Diagnosis of COVID-19

    The COVID-19 outbreak began in the Chinese city of Wuhan and eventually affected almost every nation around the globe. From China, the disease started spreading to the rest of the world. After China, Italy became the next epicentre of the virus and witnessed a very high death toll. Soon nations like the USA were severely hit by the SARS-CoV-2 virus. The World Health Organisation, on 11th March 2020, declared COVID-19 a pandemic. To combat the epidemic, nations from every corner of the world have instituted various policies such as physical distancing, isolation of the infected population and research on a potential vaccine for SARS-CoV-2. To identify the impact on the pandemic spread of the various policies implemented by the affected countries, a myriad of AI-based models have been presented to analyse and predict the epidemiological trends of COVID-19. In this work, the authors present a detailed study of different artificial intelligence frameworks applied for predictive analysis of COVID-19 patient records. The forecasting models acquire information from records to detect the spread of the pandemic, thus enabling immediate action to reduce the spread of the virus. This paper addresses the research issues and corresponding solutions associated with the prediction and detection of infectious diseases like COVID-19. It further focuses on the study of vaccinations to cope with the pandemic. Finally, the research challenges in terms of data availability, reliability, the accuracy of the existing prediction models and other open issues are discussed to outline the future course of this study.

    Reinforced Self-Training (ReST) for Language Modeling

    Reinforcement learning from human feedback (RLHF) can improve the quality of large language model (LLM) outputs by aligning them with human preferences. We propose a simple algorithm for aligning LLMs with human preferences inspired by growing batch reinforcement learning (RL), which we call Reinforced Self-Training (ReST). Given an initial LLM policy, ReST produces a dataset by generating samples from the policy, which are then used to improve the LLM policy using offline RL algorithms. ReST is more efficient than typical online RLHF methods because the training dataset is produced offline, which allows data reuse. While ReST is a general approach applicable to all generative learning settings, we focus on its application to machine translation. Our results show that ReST can substantially improve translation quality, as measured by automated metrics and human evaluation on machine translation benchmarks, in a compute- and sample-efficient manner. Comment: 23 pages, 16 figures
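The generate-then-improve loop described in this abstract can be sketched on a toy problem. This is an illustration under stated assumptions, not the paper's implementation: the policy is a small categorical distribution rather than an LLM, the offline RL step is replaced by refitting on reward-filtered samples, and the `REWARDS` table, the `grow` and `improve` helpers, the smoothing constant, and the rising thresholds are all hypothetical.

```python
import random
from collections import Counter

# Hypothetical reward for each candidate output (a stand-in for a learned
# reward model or an automated translation metric).
REWARDS = {"fluent and faithful": 0.9, "fluent but loose": 0.5, "garbled": 0.1}


def grow(policy, n_samples, rng):
    """Generate a dataset by sampling outputs from the current policy."""
    outputs = list(policy)
    weights = [policy[o] for o in outputs]
    return rng.choices(outputs, weights=weights, k=n_samples)


def improve(policy, dataset, threshold, smoothing=1.0):
    """Refit the policy on samples whose reward meets the threshold.
    Assumption: smoothed counts stand in for an offline RL update, so the
    previous policy only supplies the set of possible outputs."""
    kept = Counter(s for s in dataset if REWARDS[s] >= threshold)
    total = sum(kept.values()) + smoothing * len(policy)
    return {o: (kept[o] + smoothing) / total for o in policy}


rng = random.Random(0)
initial_policy = {o: 1 / len(REWARDS) for o in REWARDS}
policy = dict(initial_policy)

# The dataset is produced once, offline, and reused by every improve step.
dataset = grow(policy, 300, rng)
for threshold in (0.4, 0.8):
    policy = improve(policy, dataset, threshold)

best = max(REWARDS, key=REWARDS.get)
print(best, round(policy[best], 3))
```

Reusing one sampled dataset across several improvement passes is the property the abstract credits for the efficiency gain over online RLHF, where fresh samples are needed for each update.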

    Reproducing Whisper-Style Training Using an Open-Source Toolkit and Publicly Available Data

    Pre-training speech models on large volumes of data has achieved remarkable success. OpenAI Whisper is a multilingual multitask model trained on 680k hours of supervised speech data. It generalizes well to various speech recognition and translation benchmarks even in a zero-shot setup. However, the full pipeline for developing such models (from data collection to training) is not publicly accessible, which makes it difficult for researchers to further improve its performance and address training-related issues such as efficiency, robustness, fairness, and bias. This work presents an Open Whisper-style Speech Model (OWSM), which reproduces Whisper-style training using an open-source toolkit and publicly available data. OWSM even supports more translation directions and can be more efficient to train. We will publicly release all scripts used for data preparation, training, inference, and scoring as well as pre-trained models and training logs to promote open science. Comment: Accepted at ASRU 202

    RoSETZ: Roman Survey of the Earth Transit Zone -- a SETI-optimized survey for habitable-zone exoplanets

    In this White Paper for Nancy Grace Roman Space Telescope (Roman) science, we propose the Roman Survey of the Earth Transit Zone (RoSETZ), a transit search for rocky planets within the habitable zones (HZs) of stars located within the Earth Transit Zone (ETZ). The ETZ holds special interest in the search for extra-terrestrial intelligence (SETI) - observers on planets within the ETZ can see Earth as a transiting planet. RoSETZ would augment the Roman Galactic Bulge Time Domain Survey (GBTDS) as an additional field located ∼5 degrees away from other GBTDS fields. Our simulations show that RoSETZ alone can find from 120 to 630 Earth-sized HZ planets around K- and M-type hosts, with the range reflecting different survey design assumptions. These yields are 5-20 times the number currently known. Such a sample will transform our knowledge of “Eta-Earth” (η⊕) -- the occurrence of Earth-sized HZ planets -- and would be the first catalogue of exoplanets selected in a manner optimized according to the Mutual Detectability targeted-SETI strategy. If it can be accommodated alongside the existing GBTDS design, we favour a RoSETZ-Max design that is observed for the duration of the GBTDS. If not, we show that a slimmed-down RoSETZ-Lite design, occupying two GBTDS seasons, would not significantly impact overall GBTDS exoplanet yields, even if time allocated to it had to come from time allocations to other fields. We argue that the angular separation of RoSETZ from other GBTDS fields permits self-calibration of systematic uncertainties that would otherwise hamper exoplanet demographic modelling of both microlensing and transit datasets. Other science possible with RoSETZ data includes studies of small solar system bodies and high-resolution 3D extinction mapping. Comment: 20 pages. Submission to the NASA Roman Core Community Surveys White Paper Call