LEARNING MEDIA OF SIGNAL AUDIO FILTER FOR AUDIO ENGINEERING SUBJECT
This research aims to determine the design, performance, and suitability level of "Learning Media of Signal Audio Filter for Audio Engineering Subject" as a learning medium for the audio engineering subject in the Audio Video Engineering department at SMK N 3 Yogyakarta.
This research follows a Research and Development (R&D) approach. The object of the research is the "Learning Media of Signal Audio Filter for Audio Engineering Subject", equipped with a learning module. The development steps consist of (1) analysis, (2) design, (3) implementation, (4) testing, (5) validation, and (6) a usage trial. Data were collected through (1) performance testing and observation and (2) a research questionnaire. Media validation involved two learning-media experts and two learning-materials experts, and the usage trial was conducted with 33 students.
The results show that the "Learning Media of Signal Audio Filter for Audio Engineering Subject" performs as intended as a learning medium for audio filters. Test results show that the AFG circuit produces output signals in three waveforms (sine, sawtooth, and square) at frequencies between 10 Hz and 30 kHz. The frequency counter circuit counts frequencies between 10 Hz and 25 kHz, and amplitudes can be read in a range between 0.3 Vp-p and 10 Vp-p. Each filter circuit board works well in the frequency range between 20 Hz and 20 kHz. Content validation by the learning-materials experts yielded a validity level of 81.77% (very decent category), construct validation by the learning-media experts yielded 87.5% (very decent category), and the usage test by students at SMK N 3 Yogyakarta yielded 78.5% (very decent category).
Keywords: media, learning, filters, audio signal
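As a hedged illustration of the AFG circuit's behaviour described above, the following sketch generates the same three waveforms digitally with NumPy/SciPy. The sample rate, duration, and function names are illustrative assumptions, not part of the hardware design:

```python
import numpy as np
from scipy import signal

def afg_waveforms(freq_hz, duration_s=0.01, sample_rate=96_000):
    """Generate the three AFG output waveforms at a given frequency.

    The 10 Hz-30 kHz range reported for the AFG circuit stays below
    the Nyquist limit (48 kHz) at this sample rate.
    """
    t = np.arange(0, duration_s, 1.0 / sample_rate)
    phase = 2 * np.pi * freq_hz * t
    return {
        "sine": np.sin(phase),
        "sawtooth": signal.sawtooth(phase),
        "square": signal.square(phase),
    }

waves = afg_waveforms(1_000)  # a 1 kHz test tone inside the reported range
```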
Learning weakly supervised multimodal phoneme embeddings
Recent works have explored deep architectures for learning multimodal speech
representation (e.g. audio and images, articulation and audio) in a supervised
way. Here we investigate the role of combining different speech modalities,
i.e. audio and visual information representing lip movements, in a weakly
supervised way using Siamese networks and lexical same-different side
information. In particular, we ask whether one modality can benefit from the
other to provide a richer representation for phone recognition in a weakly
supervised setting. We introduce mono-task and multi-task methods for merging
speech and visual modalities for phone recognition. The mono-task learning
consists of applying a Siamese network to the concatenation of the two
modalities, while the multi-task learning receives several different
combinations of modalities at train time. We show that multi-task learning
enhances discriminability for visual and multimodal inputs while minimally
impacting auditory inputs. Furthermore, we present a qualitative analysis of
the obtained phone embeddings, and show that cross-modal visual input can
improve the discriminability of phonological features which are visually
discernable (rounding, open/close, labial place of articulation), resulting in
representations that are closer to abstract linguistic features than those
based on audio alone.
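As a hedged sketch of the mono-task setup described above, the following PyTorch-style code applies one shared encoder to the concatenation of the two modalities and trains it with a same-different contrastive loss. The feature dimensions, layer sizes, and margin are illustrative assumptions, not the authors' configuration:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SiameseEncoder(nn.Module):
    """Shared embedding network applied to both items of a same-different pair."""
    def __init__(self, audio_dim=40, visual_dim=20, embed_dim=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(audio_dim + visual_dim, 256), nn.ReLU(),
            nn.Linear(256, embed_dim),
        )

    def forward(self, audio, visual):
        # Mono-task: concatenate the modalities before encoding.
        return self.net(torch.cat([audio, visual], dim=-1))

def same_different_loss(emb_a, emb_b, is_same, margin=0.5):
    """Pull 'same' pairs together and push 'different' pairs apart in cosine space."""
    cos = F.cosine_similarity(emb_a, emb_b)
    return torch.where(is_same, 1.0 - cos, F.relu(cos - margin)).mean()
```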
THE INFLUENCE OF FAMILY, PEERS, TEACHING METHODS OF TEACHER, AND THE USE OF INTERNET ON STUDENT’S INTEREST IN LEARNING ELECTRONIC FOR AUDIO VIDEO STUDENTS AT SMK N 3 YOGYAKARTA
This study aims to determine (1) the influence of family, peers, teachers'
teaching methods, and internet use on students' interest in learning
electronics among audio video students at SMK N 3 Yogyakarta, and (2) which
of family, peers, teachers' teaching methods, and internet use is the most
influential factor on students' interest in learning electronics among audio
video students at SMK N 3 Yogyakarta.
The subjects are 99 students of the audio video skills competency at SMK N 3
Yogyakarta. The variables are the family environment (X1), the peer environment
(X2), teachers' teaching methods (X3), internet use (X4), and students'
interest in learning electronics (Y). Data were collected by questionnaire and
analysed using multiple regression with four predictors.
The results show that family, peers, teachers' teaching methods, and internet
use have a positive and significant impact on students' interest in learning
electronics among audio video students at SMK N 3 Yogyakarta, as indicated by
the calculated R (R_hitung = 0.616) exceeding the table value (R_tabel =
0.195). The relative contributions of the variables are internet use at 66.3%,
family environment at 12.27%, teachers' teaching methods at 11%, and peer
environment at 10.5%. Thus, the factor most influential on students' interest
in learning electronics among audio video students at SMK N 3 Yogyakarta is
internet use.
Keywords: family, peers, teaching methods, internet, interests in learning
electronics
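As a hedged sketch of the four-predictor multiple regression used in this study, the following code fits an ordinary least squares model with statsmodels. The data below are random placeholders standing in for the questionnaire scores, which are not reproduced here:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 99  # matches the study's sample of 99 students

# Placeholder scores for X1 family, X2 peers, X3 teaching methods, X4 internet use.
X = rng.normal(size=(n, 4))
y = X @ np.array([0.2, 0.15, 0.15, 0.5]) + rng.normal(scale=0.5, size=n)

model = sm.OLS(y, sm.add_constant(X)).fit()
print(np.sqrt(model.rsquared))  # multiple R, the analogue of the reported R_hitung
print(model.params)             # intercept and per-predictor coefficients
```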
Weakly Labelled AudioSet Tagging with Attention Neural Networks
Audio tagging is the task of predicting the presence or absence of sound
classes within an audio clip. Previous work in audio tagging focused on
relatively small datasets limited to recognising a small number of sound
classes. We investigate audio tagging on AudioSet, which is a dataset
consisting of over 2 million audio clips and 527 classes. AudioSet is weakly
labelled, in that only the presence or absence of sound classes is known for
each clip, while the onset and offset times are unknown. To address the
weakly-labelled audio tagging problem, we propose attention neural networks as
a way to attend to the most salient parts of an audio clip. We bridge the
connection between attention neural networks and multiple instance learning
(MIL) methods, and propose decision-level and feature-level attention neural
networks for audio tagging. We investigate attention neural networks modeled by
different functions, depths and widths. Experiments on AudioSet show that the
feature-level attention neural network achieves a state-of-the-art mean average
precision (mAP) of 0.369, outperforming the best multiple instance learning
(MIL) method of 0.317 and Google's deep neural network baseline of 0.314. In
addition, we discover that the audio tagging performance on AudioSet embedding
features has a weak correlation with the number of training samples and the
quality of labels of each sound class.
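As a hedged sketch of the decision-level attention pooling described above, the following PyTorch-style module weights per-segment class probabilities by learned attention and sums them into a clip-level prediction. The feature dimension and exact parameterization are illustrative assumptions; only the 527-class output matches AudioSet:

```python
import torch
import torch.nn as nn

class DecisionLevelAttention(nn.Module):
    """MIL pooling: per-segment decisions weighted by learned per-class attention."""
    def __init__(self, feat_dim=128, n_classes=527):  # 527 classes as in AudioSet
        super().__init__()
        self.classifier = nn.Linear(feat_dim, n_classes)  # segment-level decisions
        self.attention = nn.Linear(feat_dim, n_classes)   # segment-level salience

    def forward(self, x):                                  # x: (batch, segments, feat_dim)
        probs = torch.sigmoid(self.classifier(x))          # per-segment class probabilities
        weights = torch.softmax(self.attention(x), dim=1)  # normalized over segments
        return (weights * probs).sum(dim=1)                # clip-level prediction per class
```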
Deep Cross-Modal Correlation Learning for Audio and Lyrics in Music Retrieval
Deep cross-modal learning has successfully demonstrated excellent performance in cross-modal multimedia retrieval, with the aim of learning joint representations between different data modalities. Unfortunately, little research focuses on cross-modal correlation learning where temporal structures of different data modalities, such as audio and lyrics, should be taken into account. Motivated by the inherently temporal structure of music, we aim to learn the deep sequential correlation between audio and lyrics. In this work, we propose a deep cross-modal correlation learning architecture involving two-branch deep neural networks for the audio modality and the text modality (lyrics). Data in different modalities are converted to the same canonical space, where inter-modal canonical correlation analysis is utilized as an objective function to calculate the similarity of temporal structures. This is the first study that uses deep architectures for learning the temporal correlation between audio and lyrics. A pre-trained Doc2Vec model followed by fully-connected layers is used to represent lyrics. Two significant contributions are made in the audio branch: i) we propose an end-to-end network to learn the cross-modal correlation between audio and lyrics, where feature extraction and correlation learning are performed simultaneously and a joint representation is learned by considering temporal structures; ii) for feature extraction, we represent an audio signal by a short sequence of local summaries (VGG16 features) and apply a recurrent neural network to compute a compact feature that better captures the temporal structure of music audio. Experimental results, using audio to retrieve lyrics and lyrics to retrieve audio, verify the effectiveness of the proposed deep correlation learning architectures in cross-modal music retrieval.
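As a hedged sketch of the two-branch architecture described above, the following PyTorch-style code pairs a recurrent audio branch (over a short sequence of local summaries such as VGG16 features) with a fully-connected lyrics branch (over a pre-trained Doc2Vec vector), both projected into a shared space. Cosine similarity stands in here for the paper's inter-modal CCA objective, and all dimensions are illustrative assumptions:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AudioBranch(nn.Module):
    """GRU over a short sequence of local audio summaries (e.g. VGG16 features)."""
    def __init__(self, feat_dim=512, shared_dim=128):
        super().__init__()
        self.rnn = nn.GRU(feat_dim, shared_dim, batch_first=True)

    def forward(self, x):          # x: (batch, time, feat_dim)
        _, h = self.rnn(x)
        return h[-1]               # compact feature summarizing temporal structure

class LyricsBranch(nn.Module):
    """Fully-connected layers on top of a pre-trained Doc2Vec lyrics vector."""
    def __init__(self, doc_dim=300, shared_dim=128):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(doc_dim, 256), nn.ReLU(),
                                 nn.Linear(256, shared_dim))

    def forward(self, x):          # x: (batch, doc_dim)
        return self.net(x)

def similarity(audio_emb, lyric_emb):
    # Stand-in for the inter-modal CCA objective used in the paper.
    return F.cosine_similarity(audio_emb, lyric_emb)
```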
Practical Hidden Voice Attacks against Speech and Speaker Recognition Systems
Voice Processing Systems (VPSes), now widely deployed, have been made
significantly more accurate through the application of recent advances in
machine learning. However, adversarial machine learning has similarly advanced
and has been used to demonstrate that VPSes are vulnerable to the injection of
hidden commands - audio obscured by noise that is correctly recognized by a VPS
but not by human beings. Such attacks, though, are often highly dependent on
white-box knowledge of a specific machine learning model and limited to
specific microphones and speakers, making their use across different acoustic
hardware platforms (and thus their practicality) limited. In this paper, we
break these dependencies and make hidden command attacks more practical through
model-agnostic (black-box) attacks, which exploit knowledge of the signal
processing algorithms commonly used by VPSes to generate the data fed into
machine learning systems. Specifically, we exploit the fact that multiple
source audio samples have similar feature vectors when transformed by acoustic
feature extraction algorithms (e.g., FFTs). We develop four classes of
perturbations that create unintelligible audio and test them against 12 machine
learning models, including 7 proprietary models (e.g., Google Speech API, Bing
Speech API, IBM Speech API, Azure Speaker API), and demonstrate successful
attacks against all targets. Moreover, we successfully use our maliciously
generated audio samples in multiple hardware configurations, demonstrating
effectiveness across both models and real systems. In so doing, we demonstrate
that domain-specific knowledge of audio signal processing represents a
practical means of generating successful hidden voice command attacks.
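As a hedged illustration of the core observation exploited above, the following sketch phase-randomizes a signal so that it sounds very different while its magnitude spectrum, the quantity FFT-based feature extractors retain, is unchanged. This is a generic demonstration, not one of the paper's four perturbation classes:

```python
import numpy as np

rng = np.random.default_rng(1)
sr = 16_000
t = np.arange(sr) / sr
clean = np.sin(2 * np.pi * 440 * t)  # a simple stand-in source signal

# Randomize the phase while keeping every magnitude bin intact.
spectrum = np.fft.rfft(clean)
phases = rng.uniform(0, 2 * np.pi, size=spectrum.shape)
phases[0] = phases[-1] = 0.0  # keep DC and Nyquist bins real-valued
perturbed = np.fft.irfft(np.abs(spectrum) * np.exp(1j * phases), n=len(clean))

# An FFT-magnitude front end sees (near-)identical features...
print(np.allclose(np.abs(np.fft.rfft(perturbed)), np.abs(spectrum)))
# ...even though the waveform itself differs audibly.
print(np.max(np.abs(perturbed - clean)))
```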