5 research outputs found

    A dataset for Audio-Visual Sound Event Detection in Movies

    Audio event detection is a widely studied audio processing task, with applications ranging from self-driving cars to healthcare. In-the-wild datasets such as Audioset have propelled research in this field. However, many efforts typically involve manual annotation and verification, which is expensive to perform at scale. Movies depict various real-life and fictional scenarios, which makes them a rich resource for mining a wide range of audio events. In this work, we present a dataset of audio events called Subtitle-Aligned Movie Sounds (SAM-S). We use publicly available closed-caption transcripts to automatically mine over 110K audio events from 430 movies. We identify three dimensions to categorize audio events (sound, source, and quality) and present the steps involved in producing a final taxonomy of 245 sounds. We discuss the choices involved in generating the taxonomy, and also highlight the human-centered nature of sounds in our dataset. We establish a baseline performance for audio-only sound classification of 34.76% mean average precision and show that incorporating visual information can further improve the performance by about 5%. Data and code are made available for research at https://github.com/usc-sail/mica-subtitle-aligned-movie-sound

    Relationship between spoken Indian languages by clustering of long distance bigram features of speech

    In this paper, we propose a novel method of identifying relationships between languages. Our analysis deals with four major Indian languages, as well as Sanskrit and English. We use long distance bigram Mel Frequency Cepstral Coefficient features and different linkage measures to test the similarities between the clusters formed. Phylogenetic trees are constructed to provide a visual understanding of these similarities. The results match existing knowledge about language families. For all types of linkage measures, the closest language to Hindi is Marathi, and the closest to Tamil is Telugu. Since K-medoids clustering gives the expected language relationships, it is used to learn dictionaries in order to see whether they are useful in language identification as well. We report the results of one-vs-one classification and find that accuracy improves in the case of English when the recovered weights are multiplied by the joint probability of the cluster associated with that medoid.

    Cleavage of Co-C bond in allyl cobaloximes with arenesulphenyl chloride

    pp. 830-834. Arenesulphenyl chlorides (ArSCl; Ar = Ph, C6Cl5, 2,4-(NO2)2C6H3) are reacted with allylcobaloximes, RCo(dmgH)2Py (R = allyl), under thermal and photochemical conditions to obtain the corresponding sulphides as the major organic products. -Pinenyl cobaloxime forms the ring-opened product as well. The homolytic as well as heterolytic cleavage of the Co-C bond is considered.

    Homolytic displacements at carbon in organocobaloximes: Reactions of organocobaloximes with free radical precursor with two radical centres

    pp. 986-988. The reactions of MeC6H4-SO2-SPh with organocobaloximes, RCo(dmgH)2Py (R = alkyl, benzyl, butenyl and allyl), under visible light photolysis show that the alkyl, butenyl and benzyl cobaloximes form the corresponding sulphides, whereas the allyl cobaloximes form the organic sulphones.

    Abstract
