65 research outputs found

    Audio source separation into the wild

    Get PDF
    International audienceThis review chapter is dedicated to multichannel audio source separation in real-life environment. We explore some of the major achievements in the field and discuss some of the remaining challenges. We will explore several important practical scenarios, e.g. moving sources and/or microphones, varying number of sources and sensors, high reverberation levels, spatially diffuse sources, and synchronization problems. Several applications such as smart assistants, cellular phones, hearing aids and robots, will be discussed. Our perspectives on the future of the field will be given as concluding remarks of this chapter

    ベイズ法によるマイクロフォンアレイ処理

    Get PDF
    京都大学0048新制・課程博士博士(情報学)甲第18412号情博第527号新制||情||93(附属図書館)31270京都大学大学院情報学研究科知能情報学専攻(主査)教授 奥乃 博, 教授 河原 達也, 准教授 CUTURI CAMETO Marco, 講師 吉井 和佳学位規則第4条第1項該当Doctor of InformaticsKyoto UniversityDFA

    Acceleration Techniques for Sparse Recovery Based Plane-wave Decomposition of a Sound Field

    Get PDF
    Plane-wave decomposition by sparse recovery is a reliable and accurate technique for plane-wave decomposition which can be used for source localization, beamforming, etc. In this work, we introduce techniques to accelerate the plane-wave decomposition by sparse recovery. The method consists of two main algorithms which are spherical Fourier transformation (SFT) and sparse recovery. Comparing the two algorithms, the sparse recovery is the most computationally intensive. We implement the SFT on an FPGA and the sparse recovery on a multithreaded computing platform. Then the multithreaded computing platform could be fully utilized for the sparse recovery. On the other hand, implementing the SFT on an FPGA helps to flexibly integrate the microphones and improve the portability of the microphone array. For implementing the SFT on an FPGA, we develop a scalable FPGA design model that enables the quick design of the SFT architecture on FPGAs. The model considers the number of microphones, the number of SFT channels and the cost of the FPGA and provides the design of a resource optimized and cost-effective FPGA architecture as the output. Then we investigate the performance of the sparse recovery algorithm executed on various multithreaded computing platforms (i.e., chip-multiprocessor, multiprocessor, GPU, manycore). Finally, we investigate the influence of modifying the dictionary size on the computational performance and the accuracy of the sparse recovery algorithms. We introduce novel sparse-recovery techniques which use non-uniform dictionaries to improve the performance of the sparse recovery on a parallel architecture

    Single channel overlapped-speech detection and separation of spontaneous conversations

    Get PDF
    PhD ThesisIn the thesis, spontaneous conversation containing both speech mixture and speech dialogue is considered. The speech mixture refers to speakers speaking simultaneously (i.e. the overlapped-speech). The speech dialogue refers to only one speaker is actively speaking and the other is silent. That Input conversation is firstly processed by the overlapped-speech detection. Two output signals are then segregated into dialogue and mixture formats. The dialogue is processed by speaker diarization. Its outputs are the individual speech of each speaker. The mixture is processed by speech separation. Its outputs are independent separated speech signals of the speaker. When the separation input contains only the mixture, blind speech separation approach is used. When the separation is assisted by the outputs of the speaker diarization, it is informed speech separation. The research presents novel: overlapped-speech detection algorithm, and two speech separation algorithms. The proposed overlapped-speech detection is an algorithm to estimate the switching instants of the input. Optimization loop is adapted to adopt the best capsulated audio features and to avoid the worst. The optimization depends on principles of the pattern recognition, and k-means clustering. For of 300 simulated conversations, averages of: False-Alarm Error is 1.9%, Missed-Speech Error is 0.4%, and Overlap-Speaker Error is 1%. Approximately, these errors equal the errors of best recent reliable speaker diarization corpuses. The proposed blind speech separation algorithm consists of four sequential techniques: filter-bank analysis, Non-negative Matrix Factorization (NMF), speaker clustering and filter-bank synthesis. Instead of the required speaker segmentation, effective standard framing is contributed. Average obtained objective tests (SAR, SDR and SIR) of 51 simulated conversations are: 5.06dB, 4.87dB and 12.47dB respectively. For the proposed informed speech separation algorithm, outputs of the speaker diarization are a generated-database. The database associated the speech separation by creating virtual targeted-speech and mixture. The contributed virtual signals are trained to facilitate the separation by homogenising them with the NMF-matrix elements of the real mixture. Contributed masking optimized the resulting speech. Average obtained SAR, SDR and SIR of 341 simulated conversations are 9.55dB, 1.12dB, and 2.97dB respectively. Per the objective tests of the two speech separation algorithms, they are in the mid-range of the well-known NMF-based audio and speech separation methods

    Ultra-high-speed imaging of bubbles interacting with cells and tissue

    Get PDF
    Ultrasound contrast microbubbles are exploited in molecular imaging, where bubbles are directed to target cells and where their high-scattering cross section to ultrasound allows for the detection of pathologies at a molecular level. In therapeutic applications vibrating bubbles close to cells may alter the permeability of cell membranes, and these systems are therefore highly interesting for drug and gene delivery applications using ultrasound. In a more extreme regime bubbles are driven through shock waves to sonoporate or kill cells through intense stresses or jets following inertial bubble collapse. Here, we elucidate some of the underlying mechanisms using the 25-Mfps camera Brandaris128, resolving the bubble dynamics and its interactions with cells. We quantify acoustic microstreaming around oscillating bubbles close to rigid walls and evaluate the shear stresses on nonadherent cells. In a study on the fluid dynamical interaction of cavitation bubbles with adherent cells, we find that the nonspherical collapse of bubbles is responsible for cell detachment. We also visualized the dynamics of vibrating microbubbles in contact with endothelial cells followed by fluorescent imaging of the transport of propidium iodide, used as a membrane integrity probe, into these cells showing a direct correlation between cell deformation and cell membrane permeability

    Perceptual Organization

    Get PDF
    Perceiving the world of real objects seems so easy that it is difficult to grasp just how complicated it is. Not only do we need to construct the objects quickly, the objects keep changing even though we think of them as having a consistent, independent existence (Feldman, 2003). Yet, we usually get it right, there are few failures. We can perceive a tree in a blinding snowstorm, a deer bounding across a tree line, dodge a snowball, catch a baseball, detect the crack of a branch breaking in a strong windstorm amidst the rustling of trees, predict the sounds of a dripping faucet, or track a street musician strolling down the road
    corecore