
    MFCC-GAN Codec: A New AI-based Audio Coding

    In this paper, we propose AI-based audio coding using MFCC features in an adversarial setting. We combine a conventional encoder with an adversarially trained decoder to better reconstruct the original waveform. Because GANs perform implicit density estimation, such models are less prone to overfitting. We compare our work with five well-known codecs, namely AAC, AC3, Opus, Vorbis, and Speex, operating at bitrates from 2 kbps to 128 kbps. MFCCGAN_36k achieved state-of-the-art SNR, outperforming AC3_128k, AAC_112k, Vorbis_48k, Opus_48k, and Speex_48k despite its lower bitrate. MFCCGAN_13k also achieved a high SNR of 27, equal to that of AC3_128k and AAC_112k, at a significantly lower bitrate (13 kbps). MFCCGAN_36k achieved higher NISQA-MOS results than AAC_48k while having a 20% lower bitrate. Furthermore, MFCCGAN_13k obtained NISQA-MOS = 3.9, much higher than AAC_24k, AAC_32k, AC3_32k, and AAC_48k. For future work, we suggest adopting loss functions that optimize intelligibility and perceptual metrics in the MFCCGAN structure to improve quality and intelligibility simultaneously. Comment: Accepted in ABU Technical Review Journal 2023.
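The abstract ranks codecs by SNR without giving the formula. Below is a minimal sketch of the common waveform SNR. It assumes the reference and decoded signals are time-aligned arrays of equal length and that the reported figures are in decibels. The paper's exact variant (for example, segmental SNR) is not stated, and the function name `snr_db` is hypothetical.

```python
import numpy as np


def snr_db(reference: np.ndarray, decoded: np.ndarray) -> float:
    """Waveform SNR in dB: 10 * log10(signal energy / error energy).

    Assumes (hypothetically) that both signals are already time-aligned
    and have the same length; no delay compensation is done here.
    """
    ref = np.asarray(reference, dtype=float)
    err = ref - np.asarray(decoded, dtype=float)  # reconstruction error
    return 10.0 * np.log10(np.sum(ref ** 2) / np.sum(err ** 2))
```

Under the decibel assumption, SNR = 27 means the error energy is roughly 1/500 of the signal energy, since 10^2.7 ≈ 501.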

    Road monitoring utilizing cooperative HD Maps maintenance and Linked Data: a case study of road construction monitoring

    In the context of intelligent traffic systems, the latest developments focus on elevating the infrastructure's capability through intelligent solutions enabled by the fourth industrial revolution (Industry 4.0). Digitalization elevates the capability of sub-systems in the urban environment and accelerates business-as-usual processes. In this context, practitioners strive to integrate the information streams from sub-systems into a unified, accurate, accessible, and coherent view of information from diverse sources, enabling better decision-making and operational efficiency. However, these information streams use vendor-specific formats, which makes integrating them burdensome. In this project, three groups of entities are considered. First, in the automotive domain, vehicles rely on high-resolution maps and semantics of the environment; to this end, a growing sector, including HD Map providers, emphasizes smart infrastructure through sensor-based collection and analysis of on-site occurrences on the roads, providing this information to end users and vehicles. Second, there are entities vital to the operation of intelligent traffic infrastructure that share information about traffic actors and incidents, such as National Access Points (NAPs). Finally, there are entities, such as road operators, local authorities, and municipalities, whose day-to-day operation requires collecting information and making site visits to observe road incidents and feed them back into their business processes. The developed prototype integrates data from information sources that report roadwork and construction occurrences, mitigating interoperability issues by using graph databases, ontologies, and Linked Data.
    This design project addresses a niche application of data integration and interoperable infrastructure in the context of smart cities, enabling the infrastructure to utilise the efforts of operational actors in their processes.

    A Novel Approach for Object Based Audio Broadcasting

    Object Based Audio (OBA) provides a new kind of audio experience that lets audiences personalize and customize their listening and choose what they hear and how they hear it. OBA can be applied to different platforms such as broadcasting, streaming, and cinema sound. This paper presents a novel approach for creating object-based audio on the production side: Sample-by-Sample Object Based Audio (SSOBA) embedding. SSOBA places audio object samples in a way that allows audiences to easily individualize their chosen audio sources according to their interests and needs. SSOBA is an extra service, not an alternative, so it remains compatible with legacy audio players. Its biggest advantage is that it requires no special additional hardware in the broadcasting chain, so it is easy to implement and to equip legacy players and decoders with enhanced capabilities. The input audio objects, the number of output channels, and the sampling rate are three important factors that affect SSOBA performance and determine whether it is lossless or lossy. SSOBA uses interpolation on the decoder side to compensate for eliminated samples. Both subjective and objective experiments are carried out to evaluate the output at each step. MUSHRA subjective experiments conducted after the encoding step show good-quality performance of SSOBA with up to five objects. SNR measurements and objective experiments, performed after decoding and interpolation, show that the audio objects are successfully recovered and separated. Experimental results indicate that a minimum sampling rate of 96 kHz is needed to encode up to five objects in stereo mode while achieving good subjective and objective results simultaneously. Comment: Accepted in ABU Technical Review Journal 2020.
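The abstract does not spell out the exact sample layout. The sketch below assumes a hypothetical round-robin layout: output sample n of one channel of the stereo pair carries a sample of object n mod K, and the decoder fills each object's eliminated samples by linear interpolation. The layout and the names `embed` and `extract` are illustrative assumptions, not the SSOBA specification.

```python
import numpy as np


def embed(objects: np.ndarray) -> np.ndarray:
    """Hypothetical sample-by-sample embedding of K mono objects into one channel.

    objects has shape (K, N). Output sample n is taken from object n % K,
    so each object keeps only every K-th sample; the rest are eliminated.
    """
    k, n = objects.shape
    idx = np.arange(n)
    return objects[idx % k, idx]


def extract(mixed: np.ndarray, k: int, obj: int) -> np.ndarray:
    """Recover object `obj` by keeping its samples and linearly interpolating the gaps.

    Samples outside the first/last kept position are held at the edge value
    (np.interp's default behaviour).
    """
    n = mixed.shape[0]
    kept = np.arange(obj, n, k)  # positions that carry this object's samples
    return np.interp(np.arange(n), kept, mixed[kept])
```

In this sketch each of the K objects is effectively sampled at fs/K. A higher sampling rate places the eliminated samples closer together, so interpolation recovers them more accurately. That is one plausible reading of why the abstract links the number of objects to the required sampling rate.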

    An overview of text-to-speech systems and media applications

    Producing synthetic, human-like voices is an emerging feature of modern interactive media systems. Text-To-Speech (TTS) systems try to generate synthetic yet authentic voices from text input. In addition, well-known and familiar dubbing, announcing, and narrating voices, which are valuable assets of any media organization, can be preserved indefinitely by using TTS and Voice Conversion (VC) algorithms. The emergence of deep learning approaches has made TTS systems more accurate and accessible. To better understand TTS systems, this paper investigates their key components, including text analysis, acoustic modelling, and vocoding. The paper then details important state-of-the-art TTS systems based on deep learning. Finally, recently released systems are compared in terms of backbone architecture, type of input and conversion, vocoder used, and subjective assessment (MOS). According to this comparison, Tacotron 2, Transformer TTS, WaveNet, and FastSpeech 1 are among the most successful TTS systems released to date. The discussion section offers suggestions for developing a TTS system suited to the intended application. Comment: Accepted in ABU Technical Review Journal 2023.

    MFCCGAN: A Novel MFCC-Based Speech Synthesizer Using Adversarial Learning

    In this paper, we introduce MFCCGAN, a novel speech synthesizer based on adversarial learning that takes MFCCs as input and generates raw speech waveforms. Benefiting from the capabilities of GAN models, it produces speech with higher intelligibility than WORLD, a rule-based MFCC-based speech synthesizer. We evaluated the model using a popular intrusive objective speech intelligibility measure (STOI) and a quality measure (NISQA score). Experimental results show that our proposed system outperforms Librosa MFCC inversion (an increase of about 26% to 53% in STOI and 16% to 78% in NISQA score) and achieves a rise of about 10% in intelligibility and about 4% in naturalness compared with the conventional rule-based vocoder WORLD used in the CycleGAN-VC family. Moreover, WORLD requires additional data such as F0. Finally, using a perceptual loss based on STOI in the discriminators could improve quality further. WebMUSHRA-based subjective tests also demonstrate the quality of the proposed approach. Comment: ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

    Knowledge and Attitudes towards AIDS in Mashhad (Northeast of Iran) Female Sex Workers

    Background: Knowledge and attitudes are important in controlling individuals' behaviors, and female sex workers, because of their multiple sexual relationships, are at high risk of HIV and are also an important factor in the transmission of AIDS in the community. This study evaluates female sex workers' knowledge of and attitudes toward AIDS. Methods: This was a cross-sectional descriptive study. The statistical population was female sex workers supported by the Welfare Organization in Mashhad (the capital of Razavi Khorasan province in northeast Iran) in 2011. Sampling was by census, so all 61 female sex workers completed questionnaires on knowledge of and attitudes toward AIDS. Results: Regarding knowledge of the transmission, incubation period, and prevention of AIDS, 64% of subjects were at a low level. Knowledge about methods of AIDS transmission was also low in 57% of female sex workers. The subjects' attitudes toward AIDS prevention, their willingness to cooperate in implementing preventive programs, and their feeling of risk were negative for 18% and neutral for 61% of them. Conclusion: Knowledge about the transmission, incubation period, and prevention of AIDS was low. In addition, attitudes toward AIDS prevention, cooperation in prevention programs, and risk perception were mostly neutral or negative, which can be regarded as a warning sign.

    Rheological characterization of nanostructured material based on Polystyrene-b-poly(ethylene-butylene)-b-polystyrene (SEBS) block copolymer: Effect of block copolymer composition and nanoparticle geometry

    Block copolymer (BCP) nanocomposite systems are of broad interest; however, reports on the role of nanoparticles in microphase separation behavior are rare. The goal of the present study is to investigate the preparation of composite nanostructured materials containing Multi-Walled Carbon Nanotubes (MWCNTs) or graphene nanoplates. BCP nanocomposites based on the linear triblock copolymer Polystyrene-b-poly(ethylene-butylene)-b-polystyrene (SEBS), with different morphological structures, were prepared by melt mixing. The results of temperature sweep experiments showed that both MWCNTs and graphene nanosheets increase the microphase separation temperature and accelerate its kinetics, owing to the confinement of BCP segments; graphene nanosheets provide a more severely confined geometry for polystyrene segments than MWCNTs do. Additionally, DMTA results indicated that incorporating nanoparticles promotes BCP microphase separation. Transient flow measurements followed by time sweep tests suggested the existence of a distinctive 3D network microstructure caused by nanoparticle/domain interactions.

    Impact of Highlighting Techniques on the Retention of Unfamiliar Words in L2 Classrooms

    Since L2 learners cannot learn the mass of words potentially available to them, it would be more useful to teach them specific strategies for dealing with unfamiliar words. One possible solution is to provide learners with useful guidance that helps them tackle the problem efficiently. However, the question is how much noticing in the input results in the acquisition and retrieval of information about unfamiliar words. Accordingly, the present study sought to investigate the extent to which different types of highlighting techniques can improve the retention of unfamiliar words by L2 learners. To this end, from the population of first-grade students at a boys' and a girls' high school, a sample of 240 pre-intermediate students (120 girls and 120 boys) was randomly selected based on their scores on an OPT. The target words for the experiment were selected through a pre-test of vocabulary items unknown to the participants. Using these words, a reading text of three passages, each containing 30 words unknown to the participants, was given to the participants. The new, unfamiliar words were highlighted for the experimental groups (in color for the first group, underlined for the second, and in italics for the third). No highlighting techniques were used for the control groups. Persian translations of the new words were also provided at the end of the passage. To measure the acquisition of the unfamiliar target words, three types of tests, namely recall, recognition, and comprehension tests, were administered twice: once immediately after reading the passages and again two weeks later. The findings revealed that retention of the words was significantly higher in the experimental groups than in the control groups.
    More specifically, retention of the words was significantly higher in the underlining group than with the other highlighting techniques, and it was also meaningfully higher in both the immediate and delayed tests for the experimental groups than for the control groups. Notably, the participants performed better on the recognition test than on the recall test, and the results showed no interaction between retention of unfamiliar words and gender.

    Effect of Seed Priming on Some Characteristic of Seedling and Seed Vigor of Tomato (Lycopersicon esculentum)

    To investigate the effect of seed priming on some seedling characteristics and seed vigor of tomato cultivar ZD 610, an experiment was conducted based on a completely randomized design with four replications in the greenhouse of the Applied Agricultural Science Education Center of Shahrood (Iran) during 2010. The measured traits included seedling fresh weight, root length, shoot length, mean germination time (MGT), and final germination percentage (FGP). Treatments included distilled water (dH2O), sodium chloride (NaCl, 2%), salicylic acid (SA, 60 ppm), acetylsalicylic acid (ASA, 60 ppm), ascorbic acid (AsA, 60 ppm), PEG-6000, and potassium nitrate (KNO3, 5%), applied in darkness for 48 hours. The results indicated that the effect of the priming treatments was significant. KNO3 outperformed the other priming treatments in all traits. The lowest seedling fresh weight was observed in the control, followed by H2O and NaCl. The minimum root and shoot lengths were observed in the NaCl treatment. This experiment identifies KNO3 as the superior treatment.
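The abstract reports mean germination time (MGT) and final germination percentage (FGP) but does not state the formulas. Below is a minimal sketch using a common textbook definition: MGT = Σ(nᵢ·dᵢ)/Σnᵢ, where nᵢ seeds newly germinate on day dᵢ, and FGP = 100 × germinated/sown. These may differ from the exact formulas used in the study. The function names and the daily counts in the example are hypothetical, for illustration only.

```python
from typing import Mapping


def mean_germination_time(daily_counts: Mapping[int, int]) -> float:
    """MGT = sum(n_i * d_i) / sum(n_i), with n_i seeds newly germinated on day d_i."""
    total = sum(daily_counts.values())
    return sum(day * n for day, n in daily_counts.items()) / total


def final_germination_percentage(germinated: int, sown: int) -> float:
    """FGP = 100 * (seeds germinated by the end of the test) / (seeds sown)."""
    return 100.0 * germinated / sown


# Made-up counts: 10 seeds on day 2, 20 on day 3, 10 on day 4, out of 50 sown.
print(mean_germination_time({2: 10, 3: 20, 4: 10}))  # → 3.0
print(final_germination_percentage(40, 50))          # → 80.0
```

A lower MGT means seeds germinate faster on average, which is why MGT is commonly read alongside FGP as an indicator of seed vigor.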