20 research outputs found

    Interpreting intermediate feature representations of raw-waveform deep CNNs by sonification

    Most recent work on the interpretability of raw-waveform deep neural networks (DNNs) for audio processing focuses on spectral and frequency-response information, is often limited to visual and signal-theoretic means of interpretation, and addresses only the first layer. This work proposes sonification, a method to interpret intermediate feature representations of sound event recognition (SER) 1D-convolutional neural networks (1D-CNNs) trained on raw waveforms by mapping these representations back into the discrete-time input signal domain, highlighting the substructures in the input that maximally activate a feature map as intelligible acoustic events. Sonification is used to compare supervised and contrastive self-supervised feature representations, showing that the latter learn more acoustically discernible representations, especially in the deeper layers. A metric to quantify the acoustic similarity between the interpretations and their corresponding inputs is proposed, and a layer-by-layer analysis of the trained feature representations using this metric supports these observations.
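    The abstract does not spell out the exact inversion procedure, so the following is only a minimal, hypothetical sketch of the general idea of mapping one feature map's activation back into the waveform domain, here via input-gradient saliency; the model class Tiny1DCNN and the function sonify_feature_map are illustrative names, not the authors' code.

```python
# Hypothetical sketch: weight the input waveform by the gradient of one
# feature map's activation, highlighting the substructures that drive it.
# This is NOT the paper's exact sonification procedure.
import torch
import torch.nn as nn

class Tiny1DCNN(nn.Module):
    """Toy raw-waveform CNN standing in for an SER model (illustrative only)."""
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv1d(1, 16, kernel_size=64, stride=4)
        self.conv2 = nn.Conv1d(16, 32, kernel_size=32, stride=4)

    def forward(self, x):
        h1 = torch.relu(self.conv1(x))
        h2 = torch.relu(self.conv2(h1))
        return h1, h2

def sonify_feature_map(model, waveform, layer=2, fmap_idx=0):
    """Return an input-domain (audible) interpretation of one feature map."""
    waveform = waveform.clone().requires_grad_(True)
    h1, h2 = model(waveform)
    activation = (h1 if layer == 1 else h2)[:, fmap_idx].sum()
    activation.backward()
    saliency = waveform.grad.abs()
    saliency = saliency / (saliency.max() + 1e-8)   # normalise to [0, 1]
    return (waveform * saliency).detach()           # same shape as the input

model = Tiny1DCNN()
x = torch.randn(1, 1, 16000)           # 1 s of 16 kHz "audio"
interp = sonify_feature_map(model, x)  # waveform-domain interpretation
```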

    Analysing the consumer preference of fluid milk in province no. 2 of Nepal

    Information is an asset for any industry. Some information, such as consumer preference, is hidden deep in the mind of the consumer and is difficult to access. Studies have shown that consumer preferences can be measured effectively, and such research can provide a deeper understanding of the choices consumers make when selecting one offer over another. Milk is a major component of the diet for people around the globe, and the demand for milk and other dairy products is generally income elastic. The marketing of fluid milk differs from that of other consumer goods, and the demand for milk and milk products depends considerably on consumption patterns, food habits, geographical region, urbanization and lifestyle. This study analysed the consumer preference for fluid milk in Province no. 2 of Nepal. Rautahat and Saptari districts of Province no. 2 were selected for the study. A total sample of 180 households was selected, but data from only 159 households were used in the analysis. Consumer preference was analysed using tabular and percentage analysis, and Garrett's ranking technique was adopted to analyse the reasons for preference of fluid milk by household consumers. The study showed that almost all households, irrespective of income and other socio-economic factors, preferred fluid milk. Nutritive value was the most important reason for preferring fluid milk; the other reasons were taste, quality, availability, price and satisfaction. The consumption of fluid milk was found to depend on several socio-economic factors such as education, income and gender. These differences in consumption behaviour provide important inferences for the marketing and promotion strategies of dairy and food products; different promotion strategies based on different consumption determinants are probably necessary for effective marketing in a specific area.
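    Garrett's ranking technique converts each respondent's ranks into percent positions, maps them to scores via the Garrett-Woodworth table, and averages the scores per reason. The sketch below illustrates the mechanics on synthetic data; the values in GARRETT_SCORES are placeholders, not the published table.

```python
# Illustrative sketch of Garrett's ranking technique for the reasons behind
# fluid-milk preference. GARRETT_SCORES are placeholder values, not the
# published Garrett-Woodworth table, and the respondent data are synthetic.
import numpy as np

REASONS = ["nutritive value", "taste", "quality", "availability", "price", "satisfaction"]

def percent_positions(n_factors):
    """Garrett's percent position for ranks 1..N: 100 * (R - 0.5) / N."""
    ranks = np.arange(1, n_factors + 1)
    return 100.0 * (ranks - 0.5) / n_factors

# Each percent position would be looked up in the Garrett-Woodworth table to
# obtain a score; the values below are rough placeholders for a 6-item ranking.
GARRETT_SCORES = np.array([79, 65, 55, 45, 35, 21], dtype=float)

def garrett_mean_scores(rank_matrix):
    """rank_matrix[i, j] = rank (1..6) given by respondent i to reason j.
    Returns the mean Garrett score per reason; higher means more preferred."""
    scores = GARRETT_SCORES[rank_matrix - 1]   # map each rank to its score
    return scores.mean(axis=0)

# Synthetic example: 5 respondents each ranking the 6 reasons.
rng = np.random.default_rng(0)
ranks = np.array([rng.permutation(6) + 1 for _ in range(5)])
print("percent positions:", percent_positions(len(REASONS)))
mean_scores = garrett_mean_scores(ranks)
for j in np.argsort(-mean_scores):
    print(f"{REASONS[j]:>15}: {mean_scores[j]:.1f}")
```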

    Masked Autoencoders with Multi-Window Local-Global Attention Are Better Audio Learners

    In this work, we propose a Multi-Window Masked Autoencoder (MW-MAE) fitted with a novel Multi-Window Multi-Head Attention (MW-MHA) module that facilitates the modelling of local-global interactions in every decoder transformer block through attention heads with several distinct local and global windows. Empirical results on ten downstream audio tasks show that MW-MAEs consistently outperform standard MAEs in overall performance and learn better general-purpose audio representations, along with demonstrating considerably better scaling characteristics. Investigating attention distances and entropies reveals that MW-MAE encoders learn heads with broader local and global attention. Analyzing attention head feature representations through Projection Weighted Canonical Correlation Analysis (PWCCA) shows that attention heads with the same window sizes across the decoder layers of the MW-MAE learn correlated feature representations, enabling each block to independently capture local and global information and leading to a decoupled decoder feature hierarchy. Code for feature extraction and downstream experiments, along with pre-trained models, will be released publicly.
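    A minimal sketch of the attention-distance and attention-entropy diagnostics mentioned above, computed from a single head's softmax attention matrix; the definitions follow common usage and the paper's exact normalisation may differ.

```python
# Hedged sketch of per-head attention entropy and attention distance,
# computed from a (T, T) softmax attention matrix whose rows sum to 1.
import torch

def attention_entropy(attn):
    """Mean Shannon entropy of the attention distribution over query positions."""
    return -(attn * (attn + 1e-12).log()).sum(dim=-1).mean()

def attention_distance(attn):
    """Mean absolute query-key offset, weighted by attention probability."""
    T = attn.shape[-1]
    idx = torch.arange(T, dtype=attn.dtype)
    dist = (idx[None, :] - idx[:, None]).abs()   # (T, T) positional offsets
    return (attn * dist).sum(dim=-1).mean()

attn = torch.softmax(torch.randn(128, 128), dim=-1)   # toy attention map
print(attention_entropy(attn).item(), attention_distance(attn).item())
```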

    Masked Autoencoders with Multi-Window Attention Are Better Audio Learners

    Several recent works have adapted Masked Autoencoders (MAEs) for learning general-purpose audio representations. However, they do not address two key aspects of modelling multi-domain audio data: (i) real-world audio tasks consist of a combination of local and global contexts, and (ii) real-world audio signals are complex compositions of several acoustic elements with different time-frequency characteristics. To address these concerns, this work proposes a Multi-Window Masked Autoencoder (MW-MAE) fitted with a novel Multi-Window Multi-Head Attention module that can capture information at multiple local and global contexts in every decoder transformer block through attention heads with several distinct local and global windows. Empirical results on ten downstream audio tasks show that MW-MAEs consistently outperform standard MAEs in overall performance and learn better general-purpose audio representations, as well as demonstrate considerably better scaling characteristics. Exploratory analyses of the learned representations reveal that MW-MAE encoders learn attention heads with more distinct entropies compared to those learned by MAEs, while attention heads across the different transformer blocks in MW-MAE decoders learn correlated feature representations, enabling each block to independently capture local and global information and leading to a decoupled feature hierarchy. Code for feature extraction and downstream experiments along with pre-trained weights can be found at https://github.com/10997NeurIPS23/10997_mwmae
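    The following is an illustrative sketch of per-head local and global attention masks in the spirit of the Multi-Window Multi-Head Attention module described above; the authors' actual implementation is in the linked repository, and the window sizes here are arbitrary examples.

```python
# Illustrative sketch: each attention head attends within its own local window
# (banded mask) or globally, so one block mixes several context sizes.
import torch

def multi_window_masks(seq_len, window_sizes):
    """One boolean mask per head: True = attention allowed.
    A window size of None means an unrestricted (global) head."""
    idx = torch.arange(seq_len)
    offsets = (idx[None, :] - idx[:, None]).abs()
    masks = []
    for w in window_sizes:
        if w is None:
            masks.append(torch.ones(seq_len, seq_len, dtype=torch.bool))
        else:
            masks.append(offsets <= w // 2)    # banded, local window
    return torch.stack(masks)                  # (num_heads, T, T)

def masked_attention(q, k, v, masks):
    """q, k, v: (num_heads, T, d). Scaled dot-product attention with
    disallowed positions set to -inf before the softmax."""
    scores = q @ k.transpose(-2, -1) / (q.shape[-1] ** 0.5)
    scores = scores.masked_fill(~masks, float("-inf"))
    return torch.softmax(scores, dim=-1) @ v

T, d = 64, 32
window_sizes = [8, 16, 32, None]               # three local heads + one global
masks = multi_window_masks(T, window_sizes)
q = k = v = torch.randn(len(window_sizes), T, d)
out = masked_attention(q, k, v, masks)          # (4, T, d)
```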

    Federated learning enables big data for rare cancer boundary detection.

    Although machine learning (ML) has shown promise across disciplines, out-of-sample generalizability remains a concern. This is currently addressed by sharing multi-site data, but such centralization is challenging or infeasible to scale due to various limitations. Federated ML (FL) provides an alternative paradigm for accurate and generalizable ML by sharing only numerical model updates. Here we present the largest FL study to date, involving data from 71 sites across 6 continents, to generate an automatic tumor boundary detector for the rare disease of glioblastoma, reporting the largest such dataset in the literature (n = 6,314). We demonstrate a 33% delineation improvement for the surgically targetable tumor, and 23% for the complete tumor extent, over a publicly trained model. We anticipate our study to: 1) enable more healthcare studies informed by large and diverse data, ensuring meaningful results for rare diseases and underrepresented populations; 2) facilitate further analyses for glioblastoma by releasing our consensus model; and 3) demonstrate the effectiveness of FL at such scale and task complexity as a paradigm shift for multi-site collaborations, alleviating the need for data sharing.
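    A minimal sketch of the "share numerical model updates, not data" idea using FedAvg-style, sample-size-weighted averaging of site parameters; this illustrates the paradigm only and is not the study's actual aggregation pipeline.

```python
# Minimal FedAvg-style aggregation sketch: each site trains locally and sends
# only model parameters plus its case count; the server averages them.
import numpy as np

def federated_average(site_weights, site_counts):
    """site_weights: list of {param_name: ndarray}, one dict per site.
    site_counts: number of local training cases per site.
    Returns the sample-size-weighted average of the parameters."""
    total = float(sum(site_counts))
    averaged = {}
    for name in site_weights[0]:
        averaged[name] = sum(
            w[name] * (n / total) for w, n in zip(site_weights, site_counts)
        )
    return averaged

# Toy example: three sites with different amounts of local data.
sites = [{"conv.w": np.full((3, 3), v)} for v in (1.0, 2.0, 3.0)]
counts = [100, 300, 600]
global_model = federated_average(sites, counts)
print(global_model["conv.w"][0, 0])   # 2.5 = (1*100 + 2*300 + 3*600) / 1000
```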

    Author Correction: Federated learning enables big data for rare cancer boundary detection.

    Nature Communications 14 (2023). DOI: 10.1038/s41467-023-36188-7