17 research outputs found

    Hierarchical Representation and Estimation of Prosody using Continuous Wavelet Transform

    Prominences and boundaries are the essential constituents of prosodic structure in speech. They provide a means to chunk the speech stream into linguistically relevant units by assigning them relative saliences and demarcating them within utterance structures. Prominences and boundaries have been widely used both in basic research on prosody and in text-to-speech synthesis. However, there are no representation schemes that would provide for both estimating and modelling them in a unified fashion. Here we present an unsupervised unified account for estimating and representing prosodic prominences and boundaries using a scale-space analysis based on continuous wavelet transform. The methods are evaluated and compared to earlier work using the Boston University Radio News corpus. The results show that the proposed method is comparable with the best published supervised annotation methods. Peer reviewed.

    Computer-based Blind Diagnostic System for Classification of Healthy and Disordered Voices

    A large population around the world suffers from voice-related complications. Computer-based voice disorder detection systems can play a substantial role in the early detection of voice disorders by providing complementary information to early-career otolaryngologists and general practitioners. However, various studies have concluded that the recording environment of voice samples affects disorder detection. This influence is a major obstacle to developing such systems when a local voice disorder database is not available. In addition, the number of samples is sometimes insufficient for training the system. To overcome these issues, a blind detection system for voice disorders is designed and implemented in this study. The proposed system can detect voice disorders without any prior knowledge of them. It relies only on healthy voice samples, which can be recorded locally in the desired environment. The two major tasks in the proposed system are the generation of a reference model for healthy subjects and the definition of decision criteria to detect voice disorders. These tasks are implemented with two different types of speech features, and the unsupervised reference model is created using the DBSCAN and k-means algorithms. The overall performance of the system is 74.9% in terms of the geometric mean of sensitivity and specificity. The results are encouraging and better than the performance of the Multidimensional Voice Program (MDVP) parameters, which are widely used for disorder assessment by otolaryngologists in clinics.
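
The performance figure in the abstract above is a geometric mean of sensitivity and specificity. As a minimal sketch of how such a score is computed from a confusion matrix (the function name and the counts below are hypothetical and are not taken from the study):

```python
import math


def geometric_mean_score(tp: int, fn: int, tn: int, fp: int) -> float:
    """Return sqrt(sensitivity * specificity) for a binary detector.

    tp/fn count disordered voices that were detected/missed,
    tn/fp count healthy voices that were accepted/flagged.
    """
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes need at least one sample")
    sensitivity = tp / (tp + fn)  # share of disordered voices detected
    specificity = tn / (tn + fp)  # share of healthy voices accepted
    return math.sqrt(sensitivity * specificity)


# Hypothetical counts for illustration only.
score = geometric_mean_score(tp=80, fn=20, tn=70, fp=30)
```

Unlike plain accuracy, this score drops to zero if either class is never recognised, which is useful when the healthy and disordered sample counts are unbalanced.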

    Comparative Analysis of Mathematical Models of Speech Information (ПОРІВНЯЛЬНИЙ АНАЛІЗ МАТЕМАТИЧНИХ МОДЕЛЕЙ МОВНОЇ ІНФОРМАЦІЇ)

    The worsening cyber security situation around Ukraine requires a radical revision of current approaches to securing the state's information and telecommunication systems. The outpacing development of cyberattack tools and technologies creates the need for new non-trivial (asymmetric) yet practical ideas for protecting information regardless of how it is represented. Recently, speech information circulating in IP networks has become a target of cyberattacks by unscrupulous competitors, foreign government institutions, and other interested individuals. One of the most effective protective measures for speech information is known to be cryptographic protection. Well-known international and national cryptographic protocols provide sufficient cryptographic strength, yet the number of cyber threats to speech information is not decreasing; on the contrary, it grows in proportion to the value of that information. Raising the level of security of speech information circulating in IP networks therefore remains a relevant problem. One of the first steps toward creating new cryptographic means of protecting speech information is an analysis of the relevant mathematical models. To establish the advantages and disadvantages of known mathematical models of speech information and to choose, at equal accuracy, the one that accounts for the individual features of the speech source and is acceptably realizable for a given system of parameters, the article presents an analysis of two classes of models: dynamic and stochastic. The main dynamic models of speech information, which form the first class, are shown to be wavelet models, pulse-modulated and wave models, linear prediction models, and harmonic mathematical models. In addition to these well-known first-class models, the article analyzes a new type: Fredholm models of speech information. The second class includes two of the most common types, namely acoustic-phonetic models and speech traffic models. For each model studied, the developers are identified, the underlying mathematical apparatus is specified, and the model is formalized. Using an introduced qualitative scale and the combined advantages and disadvantages of the analyzed models, the article assesses how far the results achieve its stated goal. The analysis thus covers the most common classes of mathematical models of speech information and makes it possible to choose the one that will serve as the basis for developing new cryptographic means of protection.

    MP3 audio steganography technique using extended least significant bit

    Audio steganography is the process of concealing secret messages in an audio file. Its goal is to avoid drawing suspicion to the transmission of the secret message. Prior studies have indicated that the main properties of a steganography technique are imperceptibility, robustness and capacity. MP3 is a popular audio format that offers different compression rates, and performing steganography in MP3 files after compression is the most desirable approach. To date, little research has addressed embedding messages after compression. An audio steganographic technique that uses the Standard Least Significant Bits (SLSB) of the audio stream to embed the secret message has gained popularity over the years. Unfortunately, the technique has weaknesses in imperceptibility, security and capacity. This research offers an extended Least Significant Bit (XLSB) technique to overcome these weaknesses. The secret message is scrambled before embedding. Scrambling is done in two steps: the secret message (speech) is partitioned into blocks, and the blocks are then permuted to obscure the contents of the message. To make it harder for attackers to retrieve the secret message, the message is not embedded in every byte of the audio file. Instead, the position of the first embedded bit is chosen randomly and the remaining bits are embedded only in even value of bytes of the audio file. To extract the secret message, the permutation code book is used to reorder the message blocks into their original form. Md5sum and SHA-256 are used to verify whether the secret message was altered during transmission. Experimental results measured by peak signal-to-noise ratio, bit error rate, Pearson correlation and chi-square show that XLSB performs better than SLSB. Moreover, XLSB can embed a maximum of 750 KB into an MP3 file with a 30 dB average result. This research contributes to the information security community by providing a more secure steganography technique that provides message confidentiality and integrity.
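
The steps described in the abstract above (block permutation, a random starting position, sparse LSB embedding, and a hash check) can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' XLSB implementation: it works on a generic byte buffer rather than real MP3 frames, all function names are hypothetical, and "even value of bytes" is read here as every second byte after the start position so that the extractor can find the same bytes again.

```python
import hashlib
import random


def scramble(message: bytes, block_size: int, key: int):
    """Partition the message into blocks and permute them; return data and code book."""
    blocks = [message[i:i + block_size] for i in range(0, len(message), block_size)]
    order = list(range(len(blocks)))
    random.Random(key).shuffle(order)  # the code book is the shuffled block order
    return b"".join(blocks[i] for i in order), order


def unscramble(scrambled: bytes, block_size: int, order) -> bytes:
    """Use the code book to put the blocks back in their original order."""
    n = len(order)
    last_len = len(scrambled) - (n - 1) * block_size if n else 0
    restored = [b""] * n
    offset = 0
    for src in order:
        size = last_len if src == n - 1 else block_size  # only the last block may be short
        restored[src] = scrambled[offset:offset + size]
        offset += size
    return b"".join(restored)


def embed(cover: bytes, payload: bytes, start: int) -> bytes:
    """Write payload bits into the LSB of every second cover byte from `start` (assumed reading)."""
    bits = [(byte >> shift) & 1 for byte in payload for shift in range(7, -1, -1)]
    positions = range(start, len(cover), 2)
    if len(bits) > len(positions):
        raise ValueError("cover is too small for this payload")
    stego = bytearray(cover)
    for pos, bit in zip(positions, bits):
        stego[pos] = (stego[pos] & 0xFE) | bit
    return bytes(stego)


def extract(stego: bytes, n_bytes: int, start: int) -> bytes:
    """Read n_bytes of payload back from the same positions used by embed()."""
    out = bytearray()
    for i in range(n_bytes):
        value = 0
        for j in range(8):
            value = (value << 1) | (stego[start + 2 * (8 * i + j)] & 1)
        out.append(value)
    return bytes(out)


# Illustrative data: a random buffer stands in for the audio bytes.
secret = b"meet at the old bridge at nine"
cover = bytes(random.Random(1).randrange(256) for _ in range(1024))
scrambled, order = scramble(secret, block_size=4, key=42)
start = random.Random(7).randrange(16)  # random first embedding position
stego = embed(cover, scrambled, start)
recovered = unscramble(extract(stego, len(scrambled), start), 4, order)
# Integrity check with SHA-256, as a receiver might do with a transmitted digest.
intact = hashlib.sha256(recovered).digest() == hashlib.sha256(secret).digest()
```

In this sketch the receiver needs the block size, the code book, the start position and the payload length, which together play the role of a shared secret.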

    Improved steganalysis technique based on least significant bit using artificial neural network for MP3 files

    MP3 is one of the most widely used digital audio formats, providing a high compression ratio with reliable quality. This widespread use has made MP3 audio files excellent covers for carrying hidden information in audio steganography on the Internet. Emerging interest in uncovering such hidden information has opened up a field of research called steganalysis, which deals with detecting hidden messages in a specific medium. Unfortunately, detection accuracy in steganalysis is affected by the bit rate, the sampling rate of the data type, the compression rate, the file track size and standard, and the benchmark dataset of MP3 files. This thesis therefore proposed an effective technique for steganalysis of MP3 audio files by deriving a combination of features from MP3 file properties. Several trials were run to select relevant features of MP3 files, such as total harmonic distortion, power spectral density, and peak signal-to-noise ratio (PSNR), for investigating the correlation between different channels of MP3 signals. The least significant bit (LSB) technique was used to detect embedded secret files in stego-objects. This involved reading the stego-objects, statistically evaluating possible locations of secret messages, and classifying these locations as having either a high or a low tendency to contain secret messages. A three-layer feed-forward neural network, trained with the traingdx function and with an activation function for each layer, was also used. The network vector contains information about all features and is used to create a network for the given learning process. Finally, the ANN was evaluated on test data and its results were compared with those of previous techniques. An accuracy of 97.92% was recorded when detecting MP3 files compressed at 96 kbps. These experimental results showed that the proposed approach was effective in detecting embedded information in MP3 files. It demonstrated a significant improvement in detection accuracy at low embedding rates compared with previous work.
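
PSNR is listed above as one of the candidate features. A minimal sketch of the standard PSNR formula, 10·log10(peak² / MSE), applied to two equally long sample sequences, is shown below. The function name, the 16-bit peak value of 32767 and the sample values are assumptions made for illustration; the thesis may compute the feature differently.

```python
import math


def psnr(reference, test, peak: float = 32767.0) -> float:
    """Peak signal-to-noise ratio in dB between two equal-length sample sequences."""
    if len(reference) != len(test) or len(reference) == 0:
        raise ValueError("sequences must be non-empty and of equal length")
    # Mean squared error between the reference and the test samples.
    mse = sum((r - t) ** 2 for r, t in zip(reference, test)) / len(reference)
    if mse == 0:
        return math.inf  # identical signals
    return 10.0 * math.log10(peak ** 2 / mse)


# Hypothetical 16-bit samples: the second sequence differs in two LSBs.
original = [0, 1000, -1000, 2000]
modified = [1, 1000, -1001, 2000]
value = psnr(original, modified)
```

A higher value means the modified signal is closer to the reference, which is why LSB-level changes tend to produce very high PSNR on uncompressed samples.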

    List of Engineering Science Ebooks Published by Springer in 2018

    This article lists the titles of engineering ebooks published by Springer in 2018 that are held by Unand.