10 research outputs found

    딥러닝 기반 음향 이상 강도 추정 [Deep learning-based estimation of acoustic anomaly severity]

    Thesis (Master's) -- Seoul National University Graduate School: College of Engineering, Department of Mechanical and Aerospace Engineering, 2019. 2. 윤병동.

    This research proposes a deep learning-based method for estimating the intermediate fault severity of acoustic data using a model trained only on normal and severe-fault labels. First, two types of synthesized acoustic faults with five parameters were designed to simulate a gradually increasing fault. A pretrained CNN model was then applied to spectrogram images built from the data. The results show that normal and severe faults can be classified with high accuracy. However, intermediate faults could not be distinguished, even with the fine-tuned model of highest accuracy. To overcome this limitation, latent space features were extracted using the model. After dimensionality reduction, the feature values changed gradually as the severity of the fault increased. This suggests that data with intermediate-level faults can be mapped to a position somewhere between the normal and severe-fault clusters. The method was also tested on real measured data, including non-acoustic vibration data, and gave meaningful results there as well. It is anticipated that the proposed method can be applied not only to acoustic signals but to any signal whose fault characteristic changes gradually in the time-frequency domain as the fault propagates.

    Table of Contents
      Abstract
      List of Tables
      List of Figures
      Chapter 1. Introduction
        1.1 Motivation
        1.2 Scope of the Research
        1.3 Thesis Layout
      Chapter 2. Research Background
        2.1 Types of Acoustic Faults
        2.2 Spectrogram
        2.3 CNN Models
          2.3.1 VGG-16 and VGG-19
          2.3.2 ResNet-50
          2.3.3 InceptionV3
          2.3.4 Xception
        2.4 Transfer Learning
        2.5 Latent Space
          2.5.1 Latent Space Visualization
      Chapter 3. Proposed Estimation Method
        3.1 Simulating Acoustic Fault
          3.1.1 Modulation Fault
          3.1.2 Impulsive Fault
        3.2 Spectrogram Parameters
        3.3 Transfer Learning and Fine-tuning
        3.4 Latent Space Visualization
      Chapter 4. Experiment Result
        4.1 Synthesized Data
          4.1.1 Transfer Learning Result
          4.1.2 Prediction Result
          4.1.3 Latent Space Visualization Result
        4.2 Case Western Reserve University Bearing Dataset
          4.2.1 Latent Space Visualization Result
        4.3 Unbalanced Fan Data
          4.3.1 Latent Space Visualization Result
      Chapter 5. Conclusion and Future Work
        5.1 Conclusion
        5.2 Contribution
        5.3 Future Work
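
The latent-space idea in the abstract above (intermediate faults landing between the normal and severe clusters) can be illustrated with a small sketch. This is not the thesis's procedure, which relies on visualizing dimension-reduced CNN features. It is a simplified stand-in that assumes the features are already reduced to a few dimensions and scores a sample by projecting it onto the line joining the two cluster centroids. The function names and the numbers are hypothetical.

```python
from statistics import fmean


def centroid(vectors):
    """Mean vector of a list of equal-length feature vectors."""
    return [fmean(column) for column in zip(*vectors)]


def severity_score(feature, normal_centroid, severe_centroid):
    """Project a feature onto the normal-to-severe axis.

    The score is about 0.0 at the normal centroid and about 1.0 at the
    severe-fault centroid. This is an assumed scoring rule, not the thesis's.
    """
    axis = [s - n for s, n in zip(severe_centroid, normal_centroid)]
    offset = [f - n for f, n in zip(feature, normal_centroid)]
    norm_sq = sum(a * a for a in axis)
    if norm_sq == 0.0:
        raise ValueError("normal and severe centroids coincide")
    return sum(o * a for o, a in zip(offset, axis)) / norm_sq


# Hypothetical 2-D features after dimensionality reduction (made-up numbers).
normal_features = [[0.0, 0.0], [0.2, -0.2], [-0.2, 0.2]]
severe_features = [[4.0, 2.0], [4.2, 1.8], [3.8, 2.2]]
normal_c = centroid(normal_features)
severe_c = centroid(severe_features)

# A sample lying halfway between the two clusters.
intermediate_score = severity_score([2.0, 1.0], normal_c, severe_c)
```

A sample halfway between the clusters scores close to 0.5, which is the kind of graded output a binary normal/severe classifier cannot give directly.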

    Active Learning for Auditory Hierarchy

    Much audio content today is rendered as a static stereo mix: fundamentally a fixed single entity. Object-based audio envisages the delivery of sound content using a collection of individual sound ‘objects’ controlled by accompanying metadata. This allows audio to be delivered dynamically, providing enhanced audio for consumers. One example of such treatment is applying varying levels of data compression to sound objects, thereby reducing the volume of data to be transmitted in limited-bandwidth situations. This application motivates the ability to accurately classify objects in terms of their ‘hierarchy’: that is, whether an object is a foreground sound, which should be reproduced at full quality if possible, or a background sound, which can be heavily compressed without degrading the listening experience. Lack of suitably labelled data is an acknowledged problem in the domain. Active Learning is a method that can greatly reduce the manual effort required to label a large corpus by identifying the instances that are most effective for training a model to high accuracy. This paper compares a number of Active Learning methods to investigate which is most effective in the context of a hierarchical labelling task on an audio dataset. Results show that the number of manual labels required can be reduced to 1.7% of the total dataset while still retaining high prediction accuracy.
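
The abstract above does not say which Active Learning methods were compared. As an illustration only, the sketch below shows uncertainty sampling, one common query strategy: the instances sent for manual labelling are those whose predicted foreground probability is closest to 0.5. The function name and the probabilities are hypothetical.

```python
def select_queries(probabilities, k):
    """Return indices of the k unlabelled items the model is least sure about.

    `probabilities` holds the predicted probability that each item is a
    foreground sound; uncertainty is measured as closeness to 0.5.
    """
    if k < 0:
        raise ValueError("k must be non-negative")
    ranked = sorted(range(len(probabilities)),
                    key=lambda i: abs(probabilities[i] - 0.5))
    return ranked[:k]


# Hypothetical model outputs for five unlabelled sound objects.
foreground_probs = [0.95, 0.52, 0.10, 0.45, 0.70]
to_label = select_queries(foreground_probs, k=2)
```

In a full loop the selected items would be labelled by a person, added to the training set, and the model retrained before the next round of selection.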

    Learning spectro-temporal representations of complex sounds with parameterized neural networks

    Deep Learning models have become potential candidates for auditory neuroscience research, thanks to their recent successes on a variety of auditory tasks. Yet these models often lack the interpretability needed to fully understand the exact computations they perform. Here, we propose a parametrized neural network layer that computes specific spectro-temporal modulations based on Gabor kernels (Learnable STRFs) and that is fully interpretable. We evaluated the predictive capabilities of this layer on Speech Activity Detection, Speaker Verification, Urban Sound Classification and Zebra Finch Call Type Classification. We found that models based on Learnable STRFs are on par with the different toplines for all tasks, and obtain the best performance for Speech Activity Detection. As this layer is fully interpretable, we used quantitative measures to describe the distribution of the learned spectro-temporal modulations. The filters adapted to each task and focused mostly on low temporal and spectral modulations. The analyses show that the filters learned on human speech have spectro-temporal parameters similar to those measured directly in the human auditory cortex. Finally, we observed that the tasks were organized in a meaningful way: the human vocalization tasks were close to each other, and the bird vocalization task was far from both the human vocalization and urban sound tasks.
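
To make the Gabor-kernel idea concrete, here is a minimal sketch of the real part of a 2-D Gabor kernel over a time-frequency grid: a Gaussian envelope multiplied by a cosine carrier. The paper's exact parameterization is not given in the abstract, so the argument names, the units (cycles per time frame and per frequency bin) and the use of the real part only are assumptions. In a learnable layer the four shape parameters would be trained rather than fixed.

```python
import math


def gabor_kernel(n_time, n_freq, sigma_t, sigma_f, temporal_mod, spectral_mod):
    """Real part of a 2-D Gabor kernel on an n_time x n_freq grid centred at 0.

    sigma_t, sigma_f: widths of the Gaussian envelope (in frames / bins).
    temporal_mod, spectral_mod: carrier modulation (cycles per frame / bin).
    """
    kernel = []
    for i in range(n_time):
        t = i - n_time // 2
        row = []
        for j in range(n_freq):
            f = j - n_freq // 2
            envelope = math.exp(-(t * t) / (2 * sigma_t ** 2)
                                - (f * f) / (2 * sigma_f ** 2))
            carrier = math.cos(2 * math.pi * (temporal_mod * t + spectral_mod * f))
            row.append(envelope * carrier)
        kernel.append(row)
    return kernel


# Hypothetical parameter values chosen only for illustration.
kernel = gabor_kernel(n_time=9, n_freq=9, sigma_t=2.0, sigma_f=2.0,
                      temporal_mod=0.1, spectral_mod=0.05)
```

Because each filter is described by a handful of named parameters, the learned values can be read off and compared across tasks, which is the sense in which such a layer is interpretable.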

    Voice Activity Detection Using Deep Neural Network

    13301 甲第4842号, Doctor of Engineering, Kanazawa University. Doctoral dissertation summary (Abstract). Published in: Eurasip Journal on Audio, Speech and Music Processing 2018(1), pp. 1-15, 2018. Springer International Publishing. Co-authors: Suci Dwijayanti, Kei Yamamori, Masato Miyosh

    Application of artificial intelligence in factory maintenance

    This work applies artificial intelligence (AI), neural networks and machine learning to the maintenance of a machine that is a key production resource in industry. The machine in question must not be interrupted for more than 30 minutes during one shift. Because its job is to feed fresh air into a blast furnace, it must run continuously for the entire furnace operating campaign, which can last up to 12 months. A review of the situation before AI was introduced established that stoppages were mainly caused by damage to the rolling bearings, which are essential to driving the fan turbines. Further research led to the startling conclusion that bearings had shorter service lives when they were more heavily lubricated than when they were not lubricated at all. Based on these observations, it was decided to create a program that collects sensor data and, from this data, builds an AI that decides when and how much to lubricate the bearings. The advantages of the system stem from algorithms that significantly improve the efficiency of the maintenance software, which significantly reduces machine downtime and increases its timeliness, availability and efficiency. A reinforcement learning approach ("learning with incentives") was applied. The program receives data from the sensors (pressure, temperature, vibration and ultrasound), then acts on the machine via an actuator. The machine returns feedback to the program via the sensors, and the program corrects its settings depending on the results (good or bad). The goal is for the program to learn during operation to achieve as high a percentage of good results as possible. Because of the complexity of the machine, the limit values in the program are bounded so that the program cannot damage the machine while learning. The research results are presented using statistical methods. Specifically, the paper deals with the application of a convolutional neural network (CNN). The data measured by the sensors are sent to a database on a server. The program groups these data and sorts them by result, good or bad. The data are then used to train the network and create an optimal algorithm whose timely actions should extend the service life of the rolling bearings on the machine, which is a key resource for the factory's entire production. Based on what it has learned, the AI can generate reports from which the procurement and replacement of critical components can be planned. With this solution, the service life of the rolling bearings was increased by 20%, while emergency outages of the plant were reduced to zero. The advantage of the solution is reflected in high timeliness, availability and reliability, since there have been no emergency outages since it was implemented.
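
The sense-act-feedback loop with bounded limit values described in the abstract above can be sketched as follows. The abstract gives no implementation details, so this is only an assumed toy version: a table of candidate lubrication doses, each clamped to a safe range, whose estimated value is nudged toward the observed reward (good or bad). The class name, the dose values, the units and the update rule are all hypothetical.

```python
def clamp(value, low, high):
    """Keep a commanded value inside the safe limits."""
    return max(low, min(high, value))


class LubricationAgent:
    """Toy value-tracking agent over a few candidate lubrication doses."""

    def __init__(self, doses, low, high, step=0.5):
        # Every candidate action is clamped, so learning cannot exceed limits.
        self.doses = [clamp(d, low, high) for d in doses]
        self.values = [0.0] * len(self.doses)
        self.step = step

    def update(self, index, reward):
        """Move the value estimate of one dose toward the observed reward."""
        self.values[index] += self.step * (reward - self.values[index])

    def best_dose(self):
        """Dose with the highest estimated value so far."""
        best = max(range(len(self.values)), key=self.values.__getitem__)
        return self.doses[best]


# Hypothetical doses (arbitrary units); 50.0 exceeds the assumed limit of 10.0.
agent = LubricationAgent(doses=[0.0, 5.0, 50.0], low=0.0, high=10.0)
agent.update(1, reward=1.0)    # sensors reported a good result
agent.update(2, reward=-1.0)   # sensors reported a bad result
chosen = agent.best_dose()
```

Clamping the candidate actions up front is one simple way to guarantee that exploration never commands a value outside the safe range.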

    Clasificación automática de sonidos utilizando aprendizaje máquina [Automatic sound classification using machine learning]

    Machine learning has been used intensively for sound recognition in recent years. Some sounds are easily distinguishable, like a laugh, but others can be very similar to each other, like a blender and a chainsaw. Furthermore, the inherent variability of these recordings makes the problem quite difficult to solve with classical processing techniques, but it is an appropriate challenge for the high levels of abstraction that machine learning techniques can achieve. In this work, two convolutional neural network (CNN) models are presented to solve an environmental sound classification problem with seven different categories. The audio excerpts used are those provided by the UrbanSound8K database. Both models reach up to 90% accuracy in classifying these sounds. Universidad de Sevilla. Grado en Ingeniería de las Tecnologías de Telecomunicació

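    深層学習に基づく音源情報推定のための確率論的目的関数の研究 [A study of probabilistic objective functions for deep learning-based sound source information estimation]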

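    This research concerns the estimation of "sound source information", meaning information related to sound such as the source signal and the type and state of a sound source, from acoustic signals observed with microphones. As subjects of sound source information estimation, it focuses on "source enhancement", which estimates a source signal from an observed signal in which the source signal and noise are superimposed, and "anomalous sound detection", which estimates the type and state of environmental sounds contained in the observed signal in order to predict or detect surrounding danger. If source enhancement can take latent sound source information such as the type and state of a source into account, the voice of a particular player or the sound of a ball being kicked can be estimated in a soccer stadium filled with loud cheering, offering users a way of viewing content as if they had slipped into the stadium. If anomalous sound detection is realized, it becomes possible to estimate from a machine's operating sound whether it is operating normally or abnormally (its state), which makes manufacturing and maintenance work more efficient.

    Approaches based on statistical machine learning have been studied as methods for estimating sound source information, and in recent years applying deep learning has greatly improved estimation accuracy. In deep learning-based sound source information estimation, a neural network is used as a nonlinear mapping function from the observed signal to the desired sound source information. The network is obtained so as to maximize or minimize the value of an "objective function" that evaluates the estimation accuracy. Most deep learning uses deterministic objective functions such as the squared error or the cross-entropy.

    In sound source information estimation, designing the objective function is equivalent to defining the properties of the desired sound source information and the estimation accuracy. For some sound source information, these properties or the estimation accuracy cannot be defined with a deterministic objective function, or such a definition is not appropriate. For example, a deterministic objective function cannot be adopted for estimating a source signal that maximizes human subjective sound quality ratings, or for estimating the state of a sound source for which anomalous sounds (label data) cannot be collected. To solve this problem, not only the network structure but also the objective function used to train the neural network must be made more sophisticated.

    This research studies objective functions for deep learning-based sound source information estimation in order to estimate sound source information for which the objective function cannot be designed as a deterministic function. It approaches the problem from the idea of defining the properties of the desired sound source information and the estimation accuracy as probability distributions or sets of values that the inputs and outputs should take, depending on the characteristics of the information to be estimated and the problem to be solved, and of describing the statistical properties that the neural network's inputs and outputs should satisfy as the objective function.

    Chapter 3 proposes a method for enhancing source signals for which sufficient label data does not exist, such as the sounds of sports play. To train a neural network with a small amount of training data, acoustic features designed or selected in advance must be extracted from the observed signal, and source enhancement must be performed with a small neural network. Chapter 3 examines a method for selecting acoustic features suitable for enhancing the desired source, based on mutual information maximization. To apply "kernel dimension reduction", which computes mutual information accurately, to acoustic feature selection with high-dimensional feature candidates, a differentiable objective function based on sparse regularization is derived, and an acoustic feature selection method is proposed that can select suitable features from a large number of candidates by a gradient method. Quantitative evaluation showed improved SDR compared with conventional acoustic feature selection methods, and subjective evaluation showed that selecting acoustic features with the proposed method improved the clarity of the source signal compared with the conventional method. This result makes it possible to estimate source signals that were previously considered difficult to estimate because sufficient training data cannot be obtained, as well as source signals that had not previously been targets of estimation and whose suitable acoustic features are unknown.

    Chapter 4 proposes, in order to improve the subjective quality of the output of source enhancement, a method for enhancing source signals for which label data cannot be uniquely determined and for which defining the estimation accuracy with an objective function such as the squared error is not appropriate. Conventional deep learning-based source enhancement has used quantities such as the amplitude spectrum of the source signal as label data and trained the network to minimize the squared error between its output and the label data. As a result, the output sound was distorted and its subjective quality degraded. Chapter 4 therefore proposes an objective function that, instead of requiring label data, maximizes a sound quality score (perceptual score) that correlates highly with subjective ratings. Quantitative evaluation confirmed that the proposed objective function allows a neural network to be trained to maximize the perceptual score. Subjective evaluation showed that the proposed method achieves source enhancement with higher subjective quality than source enhancement using the conventional objective function based on squared error minimization. As a result, "higher-order" evaluation measures such as perceptual scores and human ratings, which previously could not be used for training source enhancement, can now be used as objective functions, widening the range of applications of neural network-based source enhancement.

    Chapter 5 aims to realize "anomalous sound detection", which detects machine failure by detecting sounds that do not normally occur (anomalous sounds), such as the abnormal rotation sound of a motor or the knocking sound of a bearing, and judging whether the machine's operating state is normal or anomalous. The difficulty of this problem is that, because machines fail very rarely, anomalous operating sounds (label data) cannot be collected, so the cross-entropy, the usual objective function for classification neural networks, cannot be used. Chapter 5 therefore treats anomalous sound detection as a hypothesis test by defining an anomalous sound as a sound that differs statistically from the probability distribution that normal sounds follow, and derives the "Neyman-Pearson index" from the Neyman-Pearson lemma, the optimality criterion for hypothesis testing, as an objective function for optimizing the anomalous sound detector. Quantitative evaluation showed an improved harmonic mean compared with conventional methods, indicating that the proposed method detects anomalous sounds more stably than conventional methods. Experiments in real environments showed that it can detect sudden anomalous sounds from 3D printers and air blower pumps, as well as sustained anomalous sounds caused by, for example, scratches on bearings. This result makes it possible to stably solve state classification problems for which anomalous sound data cannot be gathered, and it can be applied to various kinds of sound source information estimation where negative example data is hard to collect, such as sound source information estimation for security, including gunshot detection and unknown speaker detection. 電気通信大学 (The University of Electro-Communications) 201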
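
The hypothesis-testing view of anomalous sound detection in the abstract above can be illustrated with a much simpler stand-in than the thesis's training objective. The sketch below does not derive or optimize any Neyman-Pearson index. It only shows the related step of fixing a false positive rate on normal data and picking the detection threshold accordingly, given some anomaly score per sound clip. The function names and the scores are hypothetical.

```python
import math


def threshold_for_false_positive_rate(normal_scores, max_fpr):
    """Pick a threshold so that at most max_fpr of normal scores exceed it.

    Assumes higher scores mean "more anomalous" and that the scores come
    from normal recordings only.
    """
    if not normal_scores:
        raise ValueError("need at least one normal score")
    if not 0.0 <= max_fpr < 1.0:
        raise ValueError("max_fpr must be in [0, 1)")
    ranked = sorted(normal_scores, reverse=True)
    allowed = min(math.floor(max_fpr * len(ranked)), len(ranked) - 1)
    return ranked[allowed]


def is_anomalous(score, threshold):
    """Flag a clip whose anomaly score is strictly above the threshold."""
    return score > threshold


# Hypothetical anomaly scores of ten normal clips.
normal_scores = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
threshold = threshold_for_false_positive_rate(normal_scores, max_fpr=0.2)
false_alarms = sum(is_anomalous(s, threshold) for s in normal_scores)
```

Only normal recordings are needed to set the threshold, which matches the setting described above where anomalous sounds cannot be collected in advance.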