4 research outputs found

    Automatic classification of speaker characteristics


    Automatic estimation of one's age with his/her speech based upon acoustic modeling techniques of speakers


    Age Classification based on Machine Learning

    ํ•™์œ„๋…ผ๋ฌธ (์„์‚ฌ)-- ์„œ์šธ๋Œ€ํ•™๊ต ๋Œ€ํ•™์› : ์ธ๋ฌธ๋Œ€ํ•™ ์–ธ์–ดํ•™๊ณผ, 2018. 8. ์ •๋ฏผํ™”.๋ณธ ์—ฐ๊ตฌ๋Š” ๋Œ€๊ฒ€์ฐฐ์ฒญ์—์„œ ์ˆ˜์ง‘ํ•œ ํ•œ๊ตญ์ธ ๋Œ€๊ทœ๋ชจ ์Œ์„ฑ ์ฝ”ํผ์Šค๋ฅผ ์‚ฌ์šฉํ•˜์—ฌ ๊ธฐ๊ณ„ ํ•™์Šต ๋ชจ๋ธ์„ ํ†ตํ•ด ์—ฐ๋ น์„ ๋ถ„๋ฅ˜ํ•˜๋Š” ๊ฒƒ์„ ๋ชฉ์ ์œผ๋กœ ํ•œ๋‹ค. ํ•™์Šต ๋ชจ๋ธ์€ 20๋Œ€, 30~40๋Œ€, 50๋Œ€ ์ด์ƒ์œผ๋กœ 3๋ถ„๋ฅ˜๋ฅผ ํ•œ๋‹ค. ์‹คํ—˜์„ ์œ„ํ•ด ๋ฌต์Œ(silence)์„ ๊ธฐ์ค€์œผ๋กœ ์Œ์„ฑ ์ฝ”ํผ์Šค๋ฅผ 378,684๊ฐœ์˜ ๋ฐ์ดํ„ฐ๋กœ ๋ถ„์ ˆํ•˜์˜€์œผ๋ฉฐ, ๋ฐœํ™” ์œ ํ˜•๊ณผ ์„ฑ๋ณ„๋กœ ๋ฐ์ดํ„ฐ๋ฅผ ๊ตฌ๋ถ„ํ•˜์˜€๋‹ค. ์Œ์„ฑ์œผ๋กœ๋ถ€ํ„ฐ Mel Frequency Cepstral Coefficients(MFCCs), fundamental frequency(F0), i-vector, jitter, shimmer, ๋ฐœํ™”์†๋„๋ฅผ ์ถ”์ถœํ•˜์—ฌ ๊ธฐ๊ณ„ ํ•™์Šต ๋ชจ๋ธ์ธ Long Short Term Memory(LSTM) ๋ชจ๋ธ์„ ํ†ตํ•ด ์—ฐ๋ น์„ ๋ถ„๋ฅ˜ํ•˜์˜€๋‹ค. ๋˜ํ•œ, feature selection ์•Œ๊ณ ๋ฆฌ์ฆ˜์„ ํ†ตํ•ด ๊ฐ ์Œ์„ฑ ํŠน์ง•์˜ ์˜ํ–ฅ์„ ํ™•์ธํ•˜์—ฌ ํŠน์ง•๋งˆ๋‹ค ๊ฐ€์ค‘์น˜๋ฅผ ๋‹ฌ๋ฆฌํ•œ ์‹คํ—˜๋„ ์ง„ํ–‰ํ•˜์˜€๋‹ค. ์‹คํ—˜์—์„œ๋Š” ์Œ์„ฑ ํŠน์ง•๋ณ„ ์„ฑ๋Šฅ๊ณผ ์Œ์„ฑ ํŠน์ง•์˜ ์กฐํ•ฉ์˜ ์„ฑ๋Šฅ์œผ๋กœ ๋‚˜๋ˆ„์–ด ํ—˜ํ•˜์˜€๋‹ค. ๊ทธ ๊ฒฐ๊ณผ, ๊ฐœ๋ณ„ ์Œ์„ฑ ํŠน์ง•์˜ ๊ฒฝ์šฐ MFCC๋กœ ํ•™์Šตํ•˜์˜€์„ ๋•Œ 76.01%๋กœ ๊ฐ€์žฅ ๋†’์•˜์œผ๋ฉฐ, ์Œ์„ฑ ํŠน์ง•์˜ ์กฐํ•ฉ์˜ ๊ฒฝ์šฐ ๋ชจ๋“  ์Œ์„ฑ ํŠน์ง•์„ ํ•™์Šตํ•˜์˜€์„ ๋•Œ 80.01%๋กœ ๊ฐ€์žฅ ๋†’์•˜๋‹ค. ๋˜ํ•œ, Recursive Feature Elimination (RFE)๋‚˜ Extra Tree Classifier (ETC)์™€ ๊ฐ™์€ feature selection ์•Œ๊ณ ๋ฆฌ์ฆ˜์„ ์ ์šฉํ•˜์˜€์„ ๋•Œ๋Š” 80.87%๋กœ ๋ณธ ์—ฐ๊ตฌ์—์„œ ๊ฐ€์žฅ ๋†’์€ ์„ฑ๋Šฅ์„ ๋ณด์˜€๋‹ค.1. ์„œ๋ก  1 2. ์„ ํ–‰ ์—ฐ๊ตฌ 3 3. ์—ฐ๊ตฌ ๋ฐฉ๋ฒ• 8 3.1 ์Œ์„ฑ ์ฝ”ํผ์Šค 8 3.1.1 ์Œ์„ฑ ์ฝ”ํผ์Šค์˜ ๊ตฌ์„ฑ 8 3.1.2 ๋ฐ์ดํ„ฐ ๋ถ„์ ˆ 12 3.2 ์‹คํ—˜ ๋ชจ๋ธ 15 3.3 ์Œ์„ฑ ํŠน์ง• ์ถ”์ถœ 17 3.3.1 Mel Frequency Cepstral Coefficients (MFCCs) 17 3.3.2 i-vector 21 3.3.3 Fundamental frequency (F0) 24 3.3.4 Jitter 25 3.3.5 Shimmer 27 3.3.6 ๋ฐœํ™”์†๋„ 29 4. ์‹คํ—˜ 31 4.1 ์‹คํ—˜ ์„ค๊ณ„ 31 4.2 ์‹คํ—˜ ๊ฒฐ๊ณผ 33 4.2.1 ์Œ์„ฑ ํŠน์ง•๋ณ„ ์„ฑ๋Šฅ 34 4.2.2 ์กฐํ•ฉ ์„ฑ๋Šฅ 35 4.2.3 feature selection ์ ์šฉ ํ›„ ์„ฑ๋Šฅ 36 4.3 ํ† ์˜ 39 5. 
๊ฒฐ๋ก  43 ์ฐธ๊ณ  ๋ฌธํ—Œ 44 ๋ถ€๋ก 48 ์Œ์„ฑ ํŠน์ง•๋ณ„ ํ˜ผ๋™ ํ–‰๋ ฌ 48 ์Œ์„ฑ ์ฝ”ํผ์Šค ๋ฐœํ™” ์œ ํ˜• 52 Abstract 59Maste
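A minimal sketch of the feature-selection step this abstract describes, using scikit-learn's RFE and Extra Trees to rank acoustic features. The feature names, synthetic data, and dimensions here are illustrative assumptions, not the thesis corpus or its trained models:

```python
# Sketch: ranking acoustic features with RFE and an Extra Trees classifier.
# All data is synthetic; f0_mean is made informative so selection has signal.
import numpy as np
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
FEATURES = ["mfcc_mean", "f0_mean", "jitter", "shimmer", "speech_rate"]

# Synthetic stand-in: 300 utterances x 5 pooled acoustic features, 3 age classes.
X = rng.normal(size=(300, len(FEATURES)))
y = rng.integers(0, 3, size=300)
X[:, 1] += y  # shift f0_mean by class so it carries age information

# Recursive Feature Elimination keeps the top-k features for a base classifier.
rfe = RFE(LogisticRegression(max_iter=1000), n_features_to_select=3).fit(X, y)
selected = [f for f, keep in zip(FEATURES, rfe.support_) if keep]

# Extra Trees yields impurity-based importances, usable as per-feature weights.
etc = ExtraTreesClassifier(n_estimators=100, random_state=0).fit(X, y)
ranked = sorted(zip(FEATURES, etc.feature_importances_), key=lambda t: -t[1])

print(selected)       # includes "f0_mean"
print(ranked[0][0])   # "f0_mean" ranks highest by importance
```

In the thesis the selected features and importances would instead come from the LSTM experiments over MFCC, F0, i-vector, jitter, shimmer, and speech-rate inputs.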

    A Framework For Enhancing Speaker Age And Gender Classification By Using A New Feature Set And Deep Neural Network Architectures

    Speaker age and gender classification is one of the most challenging problems in speech processing. With developing technologies, identifying a speaker's age and gender has become a necessity for speaker verification and identification systems, with applications such as identifying suspects in criminal cases, improving human-machine interaction, and adapting music for people waiting in a queue. Although many studies have focused on feature extraction and classifier design, classification accuracies are still not satisfactory. The key issue in identifying a speaker's age and gender is to generate robust features and to design an in-depth classifier. Age and gender information is concealed in a speaker's speech, which is affected by many factors such as background noise, speech content, and phonetic divergences. In this work, different methods are proposed to enhance speaker age and gender classification using deep neural networks (DNNs) as feature extractors and classifiers. First, a model for generating new features from a DNN is proposed. The proposed method uses the Hidden Markov Model Toolkit (HTK) to find tied-state triphones for all utterances, which are used as labels for the DNN's output layer. The DNN, with a bottleneck layer, is first trained in an unsupervised manner to initialize the weights between layers, then trained and tuned in a supervised manner to generate transformed mel-frequency cepstral coefficients (T-MFCCs). Second, a shared-class-labels method is introduced among misclassified classes to regularize the weights in the DNN. Third, DNN-based speaker models using the SDC feature set are proposed. The speaker-aware model can capture the characteristics of speaker age and gender more effectively than a model that represents a group of speakers.
    In addition, an AGender-Tune system is proposed to classify speaker age and gender by jointly fine-tuning two DNN models: the first pre-trained to classify speaker age, the second pre-trained to classify speaker gender. Moreover, the new T-MFCC feature set is used as the input to a fusion of two systems, a DNN-based class model and a DNN-based speaker model. Utilizing the T-MFCCs as input and fusing the final score with that of the DNN-based class model improved the classification accuracies. Finally, the DNN-based speaker models are embedded into the AGender-Tune system to exploit the advantages of each method for better speaker age and gender classification. Experimental results on a public, challenging database showed the effectiveness of the proposed methods and achieved the state of the art on this database.
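The bottleneck idea behind the T-MFCCs can be sketched as a forward pass that stops at a narrow middle layer, whose activations become the transformed features. Everything below (layer sizes, random weights, ReLU activations) is an illustrative assumption, not the paper's trained network:

```python
# Sketch: extracting bottleneck features from a triphone-label DNN.
# The network maps MFCC frames -> hidden -> narrow bottleneck -> triphone
# states; after training, only the path up to the bottleneck is kept.
import numpy as np

rng = np.random.default_rng(1)

# Illustrative sizes: 39-dim MFCC+deltas in, 13-dim bottleneck,
# 500 tied-state triphone targets out.
D_IN, D_HID, D_BOTTLENECK, D_OUT = 39, 128, 13, 500

# Random weights stand in for the trained DNN parameters.
W1 = rng.normal(scale=0.1, size=(D_IN, D_HID))
W2 = rng.normal(scale=0.1, size=(D_HID, D_BOTTLENECK))
W3 = rng.normal(scale=0.1, size=(D_BOTTLENECK, D_OUT))  # used only in training

def relu(x):
    return np.maximum(0.0, x)

def bottleneck_features(mfcc_frames):
    """Forward pass truncated at the bottleneck layer; its activations
    play the role of the transformed features (the T-MFCC analogue)."""
    return relu(relu(mfcc_frames @ W1) @ W2)

frames = rng.normal(size=(100, D_IN))  # 100 frames of 39-dim MFCCs
t_mfcc = bottleneck_features(frames)
print(t_mfcc.shape)  # (100, 13): each frame compressed to 13 dimensions
```

In the paper these compressed features then feed the downstream age/gender classifiers and the fusion with the DNN-based speaker models.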