20 research outputs found
Interpreting intermediate feature representations of raw-waveform deep CNNs by sonification
Most recent work on the interpretability of raw-waveform deep neural networks (DNNs) for audio processing focuses on spectral and frequency-response information, often limited to visual and signal-theoretic means of interpretation and applied only to the first layer. This work proposes sonification, a method to interpret intermediate feature representations of sound event recognition (SER) 1D convolutional neural networks (1D-CNNs) trained on raw waveforms, by mapping these representations back into the discrete-time input signal domain and highlighting substructures in the input that maximally activate a feature map as intelligible acoustic events. Sonification is used to compare supervised and contrastive self-supervised feature representations, showing that the latter learn more acoustically discernible representations, especially in the deeper layers. A metric to quantify acoustic similarity between the interpretations and their corresponding inputs is proposed, and a layer-by-layer analysis of the trained feature representations using this metric supports these observations.
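Mapping an intermediate feature map back into the discrete-time input domain hinges on the receptive field of the stacked 1D convolutions. The sketch below illustrates that core idea only; it is not the paper's actual sonification procedure, and the max-activation heuristic plus the `kernel_sizes`/`strides` interface are illustrative assumptions.

```python
import numpy as np

def receptive_field_span(t, kernel_sizes, strides):
    # Map an output time index t at the deepest layer back to the
    # inclusive input-sample interval it depends on, walking the
    # conv stack from the deepest layer towards the input.
    start, end = t, t
    for k, s in zip(reversed(kernel_sizes), reversed(strides)):
        start = start * s
        end = end * s + (k - 1)
    return start, end

def max_activating_segment(waveform, feature_map, kernel_sizes, strides):
    # feature_map: 1D activations of one channel at the chosen layer.
    # Return the input substructure that drives its strongest response.
    t = int(np.argmax(feature_map))
    start, end = receptive_field_span(t, kernel_sizes, strides)
    return waveform[start:min(end + 1, len(waveform))]
```

The returned segment can be played back (or saved as audio) to inspect, as an acoustic event, which part of the input a feature map responds to.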
Analysing the consumer preference of fluid milk in province no. 2 of Nepal
Information is an asset for any industry. Some information, such as consumer preference, is hidden deep in the mind of the consumer and is difficult to access. Studies have shown that consumer preferences can be measured effectively, and such research may provide a deeper understanding of the choices consumers make when deciding between one offer and another. Milk is a major component of the diet for people around the globe, and the demand for milk and other dairy products is generally income elastic. The marketing of fluid milk differs from that of other consumer goods: demand for milk and milk products depends considerably on consumption patterns, food habits, geographical region, urbanization and lifestyle. This study analysed consumer preference for fluid milk in Province no. 2 of Nepal. Rautahat and Saptari districts from Province no. 2 were selected, with a total sample of 180 households, of which data from 159 households were usable. Consumer preference was analysed using tabular and percentage analysis, and Garrett's ranking technique was adopted to analyse the reasons households prefer fluid milk. The study showed that almost all households, irrespective of income and other socio-economic factors, preferred fluid milk. Nutritive value was the most important reason for this preference, followed by taste, quality, availability, price and satisfaction. Consumption of fluid milk was found to depend on several socio-economic factors such as education, income and gender. These differences in consumption behaviour provide important inferences for the marketing and promotion strategies of dairy and food products: different promotion strategies based on different consumption determinants may be necessary for effective marketing in a specific area.
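Garrett's ranking technique converts each respondent's rank for a reason into a percent position, 100 × (R − 0.5) / N, looks up the corresponding Garrett score, and ranks reasons by their mean score. The sketch below is a minimal illustration with hypothetical data; the normal-quantile mapping (scale factor 15.7) is an assumption standing in for Garrett and Woodworth's published conversion table, which actual studies use.

```python
from statistics import NormalDist

def garrett_percent_position(rank, n_factors):
    # Percent position for a rank R out of N factors: 100 * (R - 0.5) / N
    return 100.0 * (rank - 0.5) / n_factors

def garrett_score(percent):
    # Approximate the Garrett table with a standard-normal quantile
    # mapping: 50% -> score 50, extreme positions -> near 0 or 100.
    # The 15.7 scale is an assumption, not the published table.
    z = NormalDist().inv_cdf(percent / 100.0)
    return 50.0 - 15.7 * z

# Hypothetical responses: each row is one household ranking 3 reasons
# (1 = most important).
rankings = [
    {"nutritive value": 1, "taste": 2, "price": 3},
    {"nutritive value": 1, "taste": 3, "price": 2},
    {"taste": 1, "nutritive value": 2, "price": 3},
]
n = 3  # number of reasons ranked
scores = {}
for resp in rankings:
    for reason, rank in resp.items():
        pp = garrett_percent_position(rank, n)
        scores.setdefault(reason, []).append(garrett_score(pp))
mean_scores = {r: sum(v) / len(v) for r, v in scores.items()}
ranked = sorted(mean_scores, key=mean_scores.get, reverse=True)
print(ranked)  # reasons ordered by mean Garrett score
```

With the toy data above, "nutritive value" comes out on top, mirroring the study's headline finding.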
Masked Autoencoders with Multi-Window Local-Global Attention Are Better Audio Learners
In this work, we propose a Multi-Window Masked Autoencoder (MW-MAE) fitted with a novel Multi-Window Multi-Head Attention (MW-MHA) module that facilitates the modelling of local-global interactions in every decoder transformer block through attention heads with several distinct local and global windows. Empirical results on ten downstream audio tasks show that MW-MAEs consistently outperform standard MAEs in overall performance and learn better general-purpose audio representations, while also demonstrating considerably better scaling characteristics. Investigating attention distances and entropies reveals that MW-MAE encoders learn heads with broader local and global attention. Analysing attention-head feature representations through Projection Weighted Canonical Correlation Analysis (PWCCA) shows that attention heads with the same window sizes across the decoder layers of the MW-MAE learn correlated feature representations, enabling each block to independently capture local and global information and leading to a decoupled decoder feature hierarchy. Code for feature extraction and downstream experiments, along with pre-trained models, will be released publicly.
Masked Autoencoders with Multi-Window Attention Are Better Audio Learners
Several recent works have adapted Masked Autoencoders (MAEs) for learning general-purpose audio representations. However, they do not address two key aspects of modelling multi-domain audio data: (i) real-world audio tasks consist of a combination of local and global contexts, and (ii) real-world audio signals are complex compositions of several acoustic elements with different time-frequency characteristics. To address these concerns, this work proposes a Multi-Window Masked Autoencoder (MW-MAE) fitted with a novel Multi-Window Multi-Head Attention module that can capture information at multiple local and global contexts in every decoder transformer block through attention heads with several distinct local and global windows. Empirical results on ten downstream audio tasks show that MW-MAEs consistently outperform standard MAEs in overall performance and learn better general-purpose audio representations, as well as demonstrate considerably better scaling characteristics. Exploratory analyses of the learned representations reveal that MW-MAE encoders learn attention heads with more distinct entropies compared to those learned by MAEs, while attention heads across the different transformer blocks in MW-MAE decoders learn correlated feature representations, enabling each block to independently capture local and global information and leading to a decoupled feature hierarchy. Code for feature extraction and downstream experiments along with pre-trained weights can be found at https://github.com/10997NeurIPS23/10997_mwmae
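The core mechanism both abstracts describe is multi-head attention in which each head is restricted to its own local window (or left global). A minimal NumPy sketch of that idea follows; the masking scheme and function names are illustrative assumptions, not the repository's implementation.

```python
import numpy as np

def local_attention_mask(seq_len, window):
    # window=None -> global head (full attention); otherwise token i
    # may only attend to tokens j with |i - j| <= window // 2.
    if window is None:
        return np.ones((seq_len, seq_len), dtype=bool)
    idx = np.arange(seq_len)
    return np.abs(idx[:, None] - idx[None, :]) <= window // 2

def multi_window_attention(q, k, v, window_sizes):
    # q, k, v: (heads, seq, dim); one window size per head,
    # e.g. [3, 7, None] mixes two local heads with one global head.
    heads, seq, dim = q.shape
    out = np.empty_like(v)
    for h, w in enumerate(window_sizes):
        scores = q[h] @ k[h].T / np.sqrt(dim)
        scores = np.where(local_attention_mask(seq, w), scores, -np.inf)
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights /= weights.sum(axis=-1, keepdims=True)
        out[h] = weights @ v[h]
    return out
```

Assigning a different window size per head is what lets a single decoder block combine fine-grained local context with global context in one pass.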
Federated learning enables big data for rare cancer boundary detection.
Although machine learning (ML) has shown promise across disciplines, out-of-sample generalizability is concerning. This is currently addressed by sharing multi-site data, but such centralization is challenging or infeasible to scale due to various limitations. Federated ML (FL) provides an alternative paradigm for accurate and generalizable ML by sharing only numerical model updates. Here we present the largest FL study to date, involving data from 71 sites across 6 continents, to generate an automatic tumor boundary detector for the rare disease of glioblastoma, reporting the largest such dataset in the literature (n = 6,314). We demonstrate a 33% delineation improvement for the surgically targetable tumor, and 23% for the complete tumor extent, over a publicly trained model. We anticipate our study to: 1) enable more healthcare studies informed by large and diverse data, ensuring meaningful results for rare diseases and underrepresented populations, 2) facilitate further analyses for glioblastoma by releasing our consensus model, and 3) demonstrate the effectiveness of FL at such scale and task complexity as a paradigm shift for multi-site collaborations, alleviating the need for data sharing.
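"Sharing only numerical model updates" typically means each site trains locally and a server aggregates the resulting weights, for example by a dataset-size-weighted average in the style of FedAvg. The sketch below shows that aggregation step only; the study's actual aggregation strategy may differ, and the dict-of-lists weight format is an illustrative assumption.

```python
def federated_average(site_weights, site_sizes):
    # site_weights: one dict {param_name: list_of_floats} per site.
    # site_sizes: number of local training samples per site, used to
    # weight each site's contribution (FedAvg-style aggregation).
    total = sum(site_sizes)
    aggregated = {}
    for name in site_weights[0]:
        n_params = len(site_weights[0][name])
        aggregated[name] = [
            sum(w[name][i] * n for w, n in zip(site_weights, site_sizes)) / total
            for i in range(n_params)
        ]
    return aggregated
```

Only these aggregated numbers travel between sites; the raw patient data never leaves its institution, which is the privacy argument the abstract makes.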
Author Correction: Federated learning enables big data for rare cancer boundary detection.
DOI: 10.1038/s41467-023-36188-7, Nature Communications, vol. 14