
    LEARNING MEDIA OF SIGNAL AUDIO FILTER FOR AUDIO ENGINEERING SUBJECT

    This research aims to determine the design, performance, and feasibility of the "Learning Media of Signal Audio Filter for Audio Engineering Subject" as a learning medium for audio engineering subjects in the Audio Video Engineering department at SMKN 3 Yogyakarta. The study follows a Research and Development approach; its object is the learning media together with an accompanying learning module. The development steps were 1) analysis, 2) design, 3) implementation, 4) testing, 5) validation, and 6) trial usage. Data were collected through 1) performance testing and observation and 2) questionnaires. Media validation involved two learning-media experts and two learning-materials experts, and usage trials were conducted with 33 students. The results show that the media performs as intended as a learning medium for audio filters. The AFG circuit produces sine, sawtooth, and square output waveforms at frequencies between 10 Hz and 30 kHz. The frequency counter circuit can count frequencies between 10 Hz and 25 kHz and read amplitudes in the range 0.3 Vp-p to 10 Vp-p. Each filter circuit board works well across the 20 Hz to 20 kHz range. Content validation by the learning-materials experts yielded a validity of 81.77% (very decent category), construct validation by the learning-media experts yielded 87.5% (very decent), and the usage test by students at SMK N 3 Yogyakarta yielded 78.5% (very decent).
    Keywords: media, learning, filters, audio signal
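    As a software illustration of the test signals described above, the sketch below generates the three waveform types the AFG circuit produces (sine, sawtooth, square). The sample rate and test frequency are assumed values within the stated 10 Hz to 30 kHz range, not figures from the study.

```python
# Sketch: generating the three test waveforms (sine, sawtooth, square) in software,
# analogous to what the AFG circuit produces in hardware.
import numpy as np
from scipy import signal

fs = 48_000                      # sample rate in Hz (assumed)
f = 1_000                        # test frequency, inside the 10 Hz - 30 kHz range (assumed)
t = np.arange(0, 0.01, 1 / fs)   # 10 ms of samples

sine = np.sin(2 * np.pi * f * t)
sawtooth = signal.sawtooth(2 * np.pi * f * t)
square = signal.square(2 * np.pi * f * t)

print(sine.shape, sawtooth.shape, square.shape)
```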

    Learning weakly supervised multimodal phoneme embeddings

    Recent works have explored deep architectures for learning multimodal speech representations (e.g. audio and images, articulation and audio) in a supervised way. Here we investigate the role of combining different speech modalities, i.e. audio and visual information representing lip movements, in a weakly supervised way using Siamese networks and lexical same-different side information. In particular, we ask whether one modality can benefit from the other to provide a richer representation for phone recognition in a weakly supervised setting. We introduce mono-task and multi-task methods for merging speech and visual modalities for phone recognition. The mono-task learning consists of applying a Siamese network on the concatenation of the two modalities, while the multi-task learning receives several different combinations of modalities at training time. We show that multi-task learning enhances discriminability for visual and multimodal inputs while minimally impacting auditory inputs. Furthermore, we present a qualitative analysis of the obtained phone embeddings, and show that cross-modal visual input can improve the discriminability of phonological features which are visually discernible (rounding, open/close, labial place of articulation), resulting in representations that are closer to abstract linguistic features than those based on audio only.
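    A minimal PyTorch sketch of the mono-task setup described above: a shared encoder applied to the concatenation of audio and lip features, trained with same/different side information. All dimensions, the margin, and the cosine-distance loss are illustrative assumptions, not the authors' exact architecture.

```python
# Sketch of a Siamese encoder over concatenated audio + visual (lip) features,
# trained with lexical same/different pairs. Sizes and loss details are assumed.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SiameseEncoder(nn.Module):
    def __init__(self, audio_dim=40, visual_dim=20, embed_dim=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(audio_dim + visual_dim, 256), nn.ReLU(),
            nn.Linear(256, embed_dim),
        )

    def forward(self, audio, visual):
        # Concatenate the two modalities before encoding (mono-task variant).
        return self.net(torch.cat([audio, visual], dim=-1))

def same_different_loss(emb_a, emb_b, same, margin=0.5):
    # Pull embeddings of "same" word pairs together, push "different" pairs apart.
    d = 1 - F.cosine_similarity(emb_a, emb_b)
    return torch.mean(same * d + (1 - same) * F.relu(margin - d))

enc = SiameseEncoder()
a1, v1 = torch.randn(8, 40), torch.randn(8, 20)   # one side of each pair
a2, v2 = torch.randn(8, 40), torch.randn(8, 20)   # the other side
same = torch.randint(0, 2, (8,)).float()          # same/different labels
loss = same_different_loss(enc(a1, v1), enc(a2, v2), same)
loss.backward()
```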

    THE INFLUENCE OF FAMILY, PEERS, TEACHING METHODS OF TEACHER, AND THE USE OF INTERNET ON STUDENT’S INTEREST IN LEARNING ELECTRONIC FOR AUDIO VIDEO STUDENTS AT SMK N 3 YOGYAKARTA

    This study aims to determine 1) the influence of family, peers, teachers' teaching methods, and Internet use on students' interest in learning electronics among audio video students at SMK N 3 Yogyakarta, and 2) which of these factors is the most influential. The subjects are 99 students of the audio video competency programme at SMK N 3 Yogyakarta. The variables are family environment (X1), peer environment (X2), teachers' teaching method (X3), Internet use (X4), and students' interest in learning electronics (Y). Data were collected with questionnaires and analysed using multiple regression with four predictors. The results show that family, peers, teachers' teaching methods, and Internet use have a positive and significant impact on students' interest in learning electronics, as indicated by the calculated R (0.616) exceeding the table value (0.195). The relative contribution of each variable is 66.3% for Internet use, 12.27% for family environment, 11% for teachers' teaching methods, and 10.5% for peer environment. Thus, the most influential factor on students' interest in learning electronics at SMK N 3 Yogyakarta is Internet use.
    Keywords: family, peers, teaching methods, internet, interests in learning electronics
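    The analysis step named above (multiple regression with four predictors) can be illustrated in a few lines. The sketch below uses synthetic data and the statsmodels library purely to show the procedure, not to reproduce the study's numbers.

```python
# Sketch: multiple regression with four predictors (X1 family, X2 peers,
# X3 teaching method, X4 internet use) against interest in learning electronics (Y).
# The data is synthetic and only illustrates the method.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 99                                            # sample size reported in the study
X = rng.normal(size=(n, 4))                       # X1..X4 questionnaire scores (synthetic)
y = X @ np.array([0.2, 0.1, 0.15, 0.5]) + rng.normal(scale=0.5, size=n)

model = sm.OLS(y, sm.add_constant(X)).fit()
print(model.rsquared)                             # R-squared of the fit
print(model.params)                               # intercept and per-predictor coefficients
```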

    Weakly Labelled AudioSet Tagging with Attention Neural Networks

    Audio tagging is the task of predicting the presence or absence of sound classes within an audio clip. Previous work in audio tagging focused on relatively small datasets limited to recognising a small number of sound classes. We investigate audio tagging on AudioSet, which is a dataset consisting of over 2 million audio clips and 527 classes. AudioSet is weakly labelled, in that only the presence or absence of sound classes is known for each clip, while the onset and offset times are unknown. To address the weakly-labelled audio tagging problem, we propose attention neural networks as a way to attend to the most salient parts of an audio clip. We bridge the connection between attention neural networks and multiple instance learning (MIL) methods, and propose decision-level and feature-level attention neural networks for audio tagging. We investigate attention neural networks modelled with different functions, depths and widths. Experiments on AudioSet show that the feature-level attention neural network achieves a state-of-the-art mean average precision (mAP) of 0.369, outperforming the best multiple instance learning (MIL) method of 0.317 and Google's deep neural network baseline of 0.314. In addition, we discover that the audio tagging performance on AudioSet embedding features has a weak correlation with the number of training samples and the quality of labels of each sound class.
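    The decision-level attention idea can be sketched as follows: a classifier produces class probabilities for each segment of a clip, and learned attention weights aggregate them into a clip-level prediction. Layer sizes below are illustrative; only the number of AudioSet classes (527) comes from the abstract.

```python
# Sketch of decision-level attention pooling for weakly labelled audio tagging.
import torch
import torch.nn as nn

class DecisionLevelAttention(nn.Module):
    def __init__(self, embed_dim=128, num_classes=527):
        super().__init__()
        self.classifier = nn.Linear(embed_dim, num_classes)  # per-segment class scores
        self.attention = nn.Linear(embed_dim, num_classes)   # per-segment attention scores

    def forward(self, x):
        # x: (batch, segments, embed_dim) segment-level embeddings of a clip
        cla = torch.sigmoid(self.classifier(x))              # segment-level predictions
        att = torch.softmax(self.attention(x), dim=1)        # normalised over segments
        return (att * cla).sum(dim=1)                        # clip-level prediction

model = DecisionLevelAttention()
clip = torch.randn(4, 10, 128)    # 4 clips, 10 segments each (illustrative sizes)
print(model(clip).shape)          # -> torch.Size([4, 527])
```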

    Deep Cross-Modal Correlation Learning for Audio and Lyrics in Music Retrieval

    Deep cross-modal learning has successfully demonstrated excellent performance in cross-modal multimedia retrieval, with the aim of learning joint representations between different data modalities. Unfortunately, little research focuses on cross-modal correlation learning where the temporal structures of different data modalities, such as audio and lyrics, should be taken into account. Motivated by the inherently temporal structure of music, we learn the deep sequential correlation between audio and lyrics. In this work, we propose a deep cross-modal correlation learning architecture involving two-branch deep neural networks for the audio modality and the text modality (lyrics). Data in different modalities are converted to the same canonical space, where inter-modal canonical correlation analysis is utilised as an objective function to calculate the similarity of temporal structures. This is the first study that uses deep architectures for learning the temporal correlation between audio and lyrics. A pre-trained Doc2Vec model followed by fully-connected layers is used to represent lyrics. Two significant contributions are made in the audio branch, as follows: i) We propose an end-to-end network to learn cross-modal correlation between audio and lyrics, where feature extraction and correlation learning are performed simultaneously and a joint representation is learned by considering temporal structures. ii) For feature extraction, we further represent an audio signal by a short sequence of local summaries (VGG16 features) and apply a recurrent neural network to compute a compact feature that better learns the temporal structures of music audio. Experimental results, using audio to retrieve lyrics or using lyrics to retrieve audio, verify the effectiveness of the proposed deep correlation learning architectures in cross-modal music retrieval.
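    A rough sketch of the two-branch idea described above: an audio branch running a recurrent network over a short sequence of frame-level features, and a lyrics branch mapping a document vector through fully connected layers into the same space. A simple cosine objective stands in for the paper's canonical correlation analysis; all dimensions are assumptions.

```python
# Sketch of a two-branch audio/lyrics network projecting both modalities into a
# shared space. A cosine loss replaces the CCA objective for brevity.
import torch
import torch.nn as nn
import torch.nn.functional as F

class AudioBranch(nn.Module):
    def __init__(self, feat_dim=512, hidden=128, out_dim=64):
        super().__init__()
        self.rnn = nn.GRU(feat_dim, hidden, batch_first=True)
        self.proj = nn.Linear(hidden, out_dim)

    def forward(self, x):                 # x: (batch, frames, feat_dim)
        _, h = self.rnn(x)                # final hidden state summarises the sequence
        return self.proj(h.squeeze(0))

class LyricsBranch(nn.Module):
    def __init__(self, doc_dim=300, out_dim=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(doc_dim, 128), nn.ReLU(),
                                 nn.Linear(128, out_dim))

    def forward(self, d):                 # d: (batch, doc_dim) document vectors
        return self.net(d)

audio_branch, lyrics_branch = AudioBranch(), LyricsBranch()
audio = torch.randn(8, 30, 512)           # e.g. 30 frame-level summaries per track (assumed)
lyrics = torch.randn(8, 300)              # e.g. Doc2Vec-style document vectors (assumed)
za, zl = audio_branch(audio), lyrics_branch(lyrics)
loss = 1 - F.cosine_similarity(za, zl).mean()   # pull paired audio/lyrics embeddings together
loss.backward()
```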

    Practical Hidden Voice Attacks against Speech and Speaker Recognition Systems

    Voice Processing Systems (VPSes), now widely deployed, have been made significantly more accurate through the application of recent advances in machine learning. However, adversarial machine learning has similarly advanced and has been used to demonstrate that VPSes are vulnerable to the injection of hidden commands - audio obscured by noise that is correctly recognized by a VPS but not by human beings. Such attacks, though, are often highly dependent on white-box knowledge of a specific machine learning model and limited to specific microphones and speakers, making their use across different acoustic hardware platforms (and thus their practicality) limited. In this paper, we break these dependencies and make hidden command attacks more practical through model-agnostic (black-box) attacks, which exploit knowledge of the signal processing algorithms commonly used by VPSes to generate the data fed into machine learning systems. Specifically, we exploit the fact that multiple source audio samples have similar feature vectors when transformed by acoustic feature extraction algorithms (e.g., FFTs). We develop four classes of perturbations that create unintelligible audio and test them against 12 machine learning models, including 7 proprietary models (e.g., Google Speech API, Bing Speech API, IBM Speech API, Azure Speaker API, etc.), and demonstrate successful attacks against all targets. Moreover, we successfully use our maliciously generated audio samples in multiple hardware configurations, demonstrating effectiveness across both models and real systems. In so doing, we demonstrate that domain-specific knowledge of audio signal processing represents a practical means of generating successful hidden voice command attacks.
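    The key signal-processing observation, that different waveforms can map to (nearly) identical feature vectors after an FFT-based front end, can be illustrated by randomising the phase of a signal's spectrum while keeping its magnitudes: the magnitude spectrum a typical acoustic feature extractor consumes is unchanged, but the time-domain audio is not. This is only an illustration of the idea, not the paper's attack implementation.

```python
# Sketch: randomise the phases of a frame's spectrum while preserving magnitudes.
# The resulting waveform differs from the original, yet an FFT-magnitude front end
# sees the same values.
import numpy as np

rng = np.random.default_rng(0)
fs = 16_000
t = np.arange(0, 0.025, 1 / fs)                       # one 25 ms analysis frame
frame = np.sin(2 * np.pi * 300 * t) + 0.5 * np.sin(2 * np.pi * 1200 * t)

spectrum = np.fft.rfft(frame)
phase = rng.uniform(0, 2 * np.pi, spectrum.shape)
phase[0] = 0.0        # keep DC bin real so the inverse FFT round-trips cleanly
phase[-1] = 0.0       # same for the Nyquist bin
perturbed = np.fft.irfft(np.abs(spectrum) * np.exp(1j * phase), n=len(frame))

# Magnitude spectra match, time-domain waveforms do not.
print(np.allclose(np.abs(np.fft.rfft(perturbed)), np.abs(spectrum)))   # True
print(np.allclose(perturbed, frame))                                   # False
```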