16,979 research outputs found
Audio-visual multi-modality driven hybrid feature learning model for crowd analysis and classification
The rapid emergence of advanced software systems, low-cost hardware and decentralized cloud computing technologies has broadened the horizon for vision-based surveillance, monitoring and control. However, complex and inferior feature learning over visual artefacts or video streams, especially under extreme conditions, confines the majority of existing vision-based crowd analysis and classification systems. Retrieving event-sensitive or crowd-type-sensitive spatio-temporal features for different crowd types under extreme conditions is a highly complex task; consequently, it yields lower accuracy and hence low reliability, which limits existing methods for real-time crowd analysis. Despite numerous efforts in vision-based approaches, the lack of acoustic cues often creates ambiguity in crowd classification. On the other hand, the strategic amalgamation of audio-visual features can enable accurate and reliable crowd analysis and classification. Motivated by this, in this research a novel audio-visual multi-modality driven hybrid feature learning model is developed for crowd analysis and classification. In this work, a hybrid feature extraction model was applied to extract deep spatio-temporal features using Gray-Level Co-occurrence Matrix (GLCM) features and the AlexNet transfer learning model. After extracting the GLCM features and AlexNet deep features, horizontal concatenation was performed to fuse the two feature sets. Similarly, for acoustic feature extraction, the audio samples (from the input video) were processed with static (fixed-size) sampling, pre-emphasis, block framing and Hann windowing, followed by extraction of acoustic features such as GTCC, GTCC-Delta, GTCC-Delta-Delta, MFCC, Spectral Entropy, Spectral Flux, Spectral Slope and Harmonics-to-Noise Ratio (HNR).
Finally, the extracted audio-visual features were fused to yield a composite multi-modal feature set, which was processed for classification using a random forest ensemble classifier. The multi-class classification yields a crowd-classification accuracy of 98.26%, precision of 98.89%, sensitivity of 94.82%, specificity of 95.57%, and F-measure of 98.84%. The robustness of the proposed multi-modality-based crowd analysis model confirms its suitability for real-world crowd detection and classification tasks.
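The audio front end sketched above (fixed-size sampling, pre-emphasis, block framing, Hann windowing) and the horizontal-concatenation fusion can be illustrated with NumPy; the frame length, hop size, pre-emphasis coefficient, and feature dimensions below are common defaults chosen for illustration, not values from the paper:

```python
import numpy as np

def preemphasize(x, alpha=0.97):
    # y[n] = x[n] - alpha * x[n-1]: boosts high frequencies before framing
    return np.append(x[0], x[1:] - alpha * x[:-1])

def frame_and_window(x, frame_len=400, hop=160):
    # Split the signal into overlapping blocks and taper each with a Hann window
    n_frames = 1 + (len(x) - frame_len) // hop
    window = np.hanning(frame_len)
    return np.stack([x[i * hop : i * hop + frame_len]
                     for i in range(n_frames)]) * window

signal = np.random.default_rng(0).standard_normal(16000)  # 1 s of audio at 16 kHz
frames = frame_and_window(preemphasize(signal))
print(frames.shape)  # (98, 400)

# Horizontal concatenation, as used to fuse the visual (GLCM + AlexNet) and
# acoustic feature sets into one composite multi-modal vector
visual_feats, acoustic_feats = np.ones(100), np.ones(40)  # placeholder features
fused = np.concatenate([visual_feats, acoustic_feats])
print(fused.shape)  # (140,)
```

Each windowed frame would then feed the GTCC/MFCC-style spectral feature extractors listed in the abstract.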
Predicting extreme events in a data-driven model of turbulent shear flow using an atlas of charts
Dynamical systems with extreme events are difficult to capture with
data-driven modeling, due to the relative scarcity of data within extreme
events compared to the typical dynamics of the system, and the strong
dependence of the long-time occurrence of extreme events on short-time
conditions. A recently developed technique [Floryan, D. & Graham, M. D.
Data-driven discovery of intrinsic dynamics. Nat Mach Intell,
1113-1120 (2022)], here denoted CANDyMan, overcomes these difficulties
by decomposing the time series into separate charts based on data similarity,
learning dynamical models on each chart via individual time-mapping neural
networks, then stitching the charts together to create a single atlas to yield
a global dynamical model. We apply CANDyMan to a nine-dimensional model of
turbulent shear flow between infinite parallel free-slip walls under a
sinusoidal body force [Moehlis, J., Faisst, H. & Eckhardt, B. A low-dimensional
model for turbulent shear flows. New J Phys, 56 (2004)], which
undergoes extreme events in the form of intermittent quasi-laminarization and
long-time full laminarization. We demonstrate that the CANDyMan method allows
the trained dynamical models to more accurately forecast the evolution of the
model coefficients, reducing the error in the predictions as the model evolves
forward in time. The technique exhibits more accurate predictions of extreme
events, capturing the frequency of quasi-laminarization events and predicting
the time until full laminarization more accurately than a single neural
network. Comment: 9 pages, 7 figures
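The decompose/learn/stitch workflow of CANDyMan can be mimicked on a toy two-dimensional series; the median-split chart assignment and linear least-squares time maps below are simplified stand-ins for the method's data-similarity clustering and per-chart neural networks:

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.cumsum(rng.standard_normal((500, 2)), axis=0)  # toy trajectory

# 1) Decompose the series into two charts by data similarity (median split here)
labels = (X[:, 0] > np.median(X[:, 0])).astype(int)
centroids = np.stack([X[labels == k].mean(axis=0) for k in (0, 1)])

# 2) Learn a separate one-step time map on each chart (linear least squares
#    standing in for the per-chart time-mapping neural networks)
maps = {}
for k in (0, 1):
    idx = np.where(labels[:-1] == k)[0]
    maps[k], *_ = np.linalg.lstsq(X[idx], X[idx + 1], rcond=None)

# 3) Stitch the charts into an atlas: at each step, apply the map of the chart
#    whose centroid is nearest to the current state
def step(x):
    k = int(np.argmin(((x - centroids) ** 2).sum(axis=1)))
    return x @ maps[k]

x = X[-1].copy()
forecast = [x := step(x) for _ in range(5)]
print(len(forecast))  # 5
```

The real method applies the same three-stage idea to the nine model coefficients, with neural networks replacing the linear maps.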
Depth Functions for Partial Orders with a Descriptive Analysis of Machine Learning Algorithms
We propose a framework for descriptively analyzing sets of partial orders
based on the concept of depth functions. Despite intensive studies of depth
functions in linear and metric spaces, there is very little discussion on depth
functions for non-standard data types such as partial orders. We introduce an
adaptation of the well-known simplicial depth to the set of all partial orders,
the union-free generic (ufg) depth. Moreover, we utilize our ufg depth for a
comparison of machine learning algorithms based on multidimensional performance
measures. Concretely, we analyze the distribution of different classifier
performances over a sample of standard benchmark data sets. Our results
promisingly demonstrate that our approach differs substantially from existing
benchmarking approaches and, therefore, adds a new perspective to the vivid
debate on the comparison of classifiers. Comment: Accepted to ISIPTA 2023; Forthcoming in: Proceedings of Machine
Learning Research
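For intuition on the simplicial depth that the ufg depth adapts, here is a minimal one-dimensional version on real numbers (the paper's contribution is its non-trivial extension to partial orders, which this sketch does not attempt): the depth of a point is the fraction of sample simplices, i.e. intervals, that contain it.

```python
from itertools import combinations

def simplicial_depth_1d(x, sample):
    # In R, a simplex spanned by two sample points is just an interval;
    # depth(x) = fraction of sample pairs whose interval contains x
    pairs = list(combinations(sample, 2))
    covered = sum(1 for a, b in pairs if min(a, b) <= x <= max(a, b))
    return covered / len(pairs)

print(simplicial_depth_1d(0.0, [-2, -1, 1, 3]))  # 4 of 6 intervals contain 0
```

Points deep inside the sample get high depth, outliers get low depth; the ufg depth carries this centrality idea over to sets of partial orders.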
Distilling Large Vision-Language Model with Out-of-Distribution Generalizability
Large vision-language models have achieved outstanding performance, but their
size and computational requirements make their deployment on
resource-constrained devices and time-sensitive tasks impractical. Model
distillation, the process of creating smaller, faster models that maintain the
performance of larger models, is a promising direction towards the solution.
This paper investigates the distillation of visual representations in large
teacher vision-language models into lightweight student models using a small-
or mid-scale dataset. Notably, this study focuses on open-vocabulary
out-of-distribution (OOD) generalization, a challenging problem that has been
overlooked in previous model distillation literature. We propose two principles
from vision and language modality perspectives to enhance the student's OOD
generalization: (1) better imitating the teacher's visual representation space,
and carefully promoting better coherence in vision-language alignment with the
teacher; (2) enriching the teacher's language representations with
informative and fine-grained semantic attributes to effectively distinguish
between different labels. We propose several metrics and conduct extensive
experiments to investigate these techniques. The results demonstrate
significant improvements in zero-shot and few-shot student performance on
open-vocabulary out-of-distribution classification, highlighting the
effectiveness of our proposed approaches. Our code will be released at
https://github.com/xuanlinli17/large_vlm_distillation_oo
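Principle (1), imitating the teacher's visual representation space, is typically realised with an alignment loss between student and frozen teacher features; the cosine-distance loss below is one common choice, shown for illustration rather than as the paper's exact objective:

```python
import numpy as np

def imitation_loss(student_feats, teacher_feats):
    # Cosine-distance alignment: L2-normalise both feature sets and penalise
    # 1 - cos(student, teacher) for each sample
    s = student_feats / np.linalg.norm(student_feats, axis=1, keepdims=True)
    t = teacher_feats / np.linalg.norm(teacher_feats, axis=1, keepdims=True)
    return float(np.mean(1.0 - np.sum(s * t, axis=1)))

rng = np.random.default_rng(1)
teacher = rng.standard_normal((8, 512))                  # frozen teacher features
student = teacher + 0.1 * rng.standard_normal((8, 512))  # nearly aligned student
loss = imitation_loss(student, teacher)
print(0.0 < loss < 0.05)  # True: small residual misalignment
```

A lightweight student would minimise this loss on its projected features while also matching the teacher's vision-language alignment.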
Semantic-aware Transmission for Robust Point Cloud Classification
As three-dimensional (3D) data acquisition devices become increasingly
prevalent, the demand for 3D point cloud transmission is growing. In this
study, we introduce a semantic-aware communication system for robust point
cloud classification that capitalizes on the advantages of pre-trained
Point-BERT models. Our proposed method comprises four main components: the
semantic encoder, channel encoder, channel decoder, and semantic decoder. By
employing a two-stage training strategy, our system facilitates efficient and
adaptable learning tailored to the specific classification tasks. The results
show that the proposed system achieves classification accuracy of over 89%
when the SNR is higher than 10 dB and still maintains accuracy above 66.6% even at
an SNR of 4 dB. Compared to the existing method, our approach performs 0.8% to
48% better across different SNR values, demonstrating robustness to channel
noise. Our system also achieves a balance between accuracy and speed, being
computationally efficient while maintaining high classification performance
under noisy channel conditions. This adaptable and resilient approach holds
considerable promise for a wide array of 3D scene understanding applications,
effectively addressing the challenges posed by channel noise. Comment: submitted to globecom 202
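The encoder-channel-decoder pipeline can be caricatured end to end in NumPy; the scaling "codec" below is an illustrative stand-in for the learned channel encoder/decoder, with only the standard AWGN-at-given-SNR channel model assumed:

```python
import numpy as np

def awgn(x, snr_db, rng):
    # Additive white Gaussian noise channel at the requested SNR (in dB)
    signal_power = np.mean(x ** 2)
    noise_power = signal_power / (10 ** (snr_db / 10))
    return x + rng.standard_normal(x.shape) * np.sqrt(noise_power)

rng = np.random.default_rng(0)
semantic_feats = rng.standard_normal(256)  # stand-in for Point-BERT features
channel_code = semantic_feats * 0.5        # toy channel encoder (a scaling)
received = awgn(channel_code, snr_db=10, rng=rng)
decoded = received / 0.5                   # toy channel decoder
mse = float(np.mean((decoded - semantic_feats) ** 2))
print(mse < 0.2)  # True at 10 dB; lower SNR degrades the recovered features
```

In the actual system, trained channel encoder/decoder networks replace the scaling, and a semantic decoder classifies the recovered features.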
Convolutional neural network based on photoplethysmography signals for sleep apnea syndrome detection
Introduction: The current method of monitoring sleep disorders is complex, time-consuming, and uncomfortable, although it can provide scientific guidance to ensure worldwide sleep quality. This study aims to seek a comfortable and convenient method for identifying sleep apnea syndrome. Methods: In this work, a one-dimensional convolutional neural network model was established. To classify this condition, the model was trained with the photoplethysmographic (PPG) signals of 20 healthy people and 39 sleep apnea syndrome (SAS) patients, and the influence of noise on the model was tested with anti-interference experiments. Results and Discussion: The results showed that the accuracy of the model for SAS classification exceeds 90%, and it has some anti-interference ability. This paper provides an SAS detection method based on PPG signals, which is helpful for portable wearable detection.
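A minimal NumPy rendering of the model's core operation, a one-dimensional convolution followed by ReLU on a PPG-like signal; the kernel, stride, and synthetic waveform are hand-picked for illustration and bear no relation to the trained network:

```python
import numpy as np

def conv1d(signal, kernel, stride=1):
    # Valid-mode 1-D convolution, the core operation of a 1-D CNN layer
    klen = len(kernel)
    out_len = (len(signal) - klen) // stride + 1
    return np.array([np.dot(signal[i * stride : i * stride + klen], kernel)
                     for i in range(out_len)])

ppg = np.sin(np.linspace(0, 20 * np.pi, 1000))  # synthetic PPG-like waveform
kernel = np.ones(5) / 5                          # one hand-set smoothing filter
feature_map = np.maximum(conv1d(ppg, kernel, stride=2), 0.0)  # conv + ReLU
print(feature_map.shape)  # (498,)
```

Stacking such layers, with learned kernels and a final classifier head, yields the kind of 1-D CNN the study trains on raw PPG segments.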
FedForgery: Generalized Face Forgery Detection with Residual Federated Learning
With the continuous development of deep learning in the field of image
generation models, a large number of vivid forged faces have been generated and
spread on the Internet. These high-authenticity artifacts could grow into a
threat to societal security. Existing face forgery detection methods directly
utilize publicly shared or centralized data for training but ignore
the personal privacy and security issues that arise when personal data cannot be
shared centrally in real-world scenarios. Additionally, different
distributions caused by diverse artifact types would further bring adverse
influences on the forgery detection task. To address these problems, this
paper proposes a novel generalized residual Federated learning framework for face Forgery
detection (FedForgery). The designed variational autoencoder aims to learn
robust discriminative residual feature maps to detect forgery faces (with
diverse or even unknown artifact types). Furthermore, a general federated
learning strategy is introduced to construct a distributed detection model
trained collaboratively across multiple local decentralized devices, which can
further boost representation generalization. Experiments conducted on
publicly available face forgery detection datasets prove the superior
performance of the proposed FedForgery. The designed novel generalized face
forgery detection protocols and source code will be publicly available. Comment: The
code is available at https://github.com/GANG370/FedForgery. The paper has been
accepted by the IEEE Transactions on Information Forensics & Security.
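The collaborative-training loop rests on a standard federated-averaging aggregation step; the sketch below shows that step with toy weight dictionaries, where the small perturbations stand in for real local training rounds on private data:

```python
import numpy as np

def fed_avg(client_weights):
    # Server aggregation step: average locally trained weights so raw face
    # images never leave the client devices
    return {name: np.mean([w[name] for w in client_weights], axis=0)
            for name in client_weights[0]}

rng = np.random.default_rng(0)
global_w = {"conv": rng.standard_normal((3, 3)), "fc": rng.standard_normal(4)}
# Each client slightly perturbs the global model, standing in for one local
# training round on its private forgery data
clients = [{k: v + 0.01 * rng.standard_normal(v.shape)
            for k, v in global_w.items()} for _ in range(5)]
new_global = fed_avg(clients)
print(sorted(new_global))  # ['conv', 'fc']
```

FedForgery layers its residual-feature variational autoencoder on top of this kind of aggregation to cope with diverse artifact distributions across clients.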
TabR: Unlocking the Power of Retrieval-Augmented Tabular Deep Learning
Deep learning (DL) models for tabular data problems are receiving
increasing attention, while algorithms based on gradient-boosted
decision trees (GBDT) remain a strong go-to solution. Following the recent
trends in other domains, such as natural language processing and computer
vision, several retrieval-augmented tabular DL models have been recently
proposed. For a given target object, a retrieval-based model retrieves other
relevant objects, such as the nearest neighbors, from the available (training)
data and uses their features or even labels to make a better prediction.
However, we show that the existing retrieval-based tabular DL solutions provide
only minor, if any, benefits over the properly tuned simple retrieval-free
baselines. Thus, it remains unclear whether the retrieval-based approach is a
worthy direction for tabular DL.
In this work, we give a strong positive answer to this question. We start by
incrementally augmenting a simple feed-forward architecture with an
attention-like retrieval component similar to those of many (tabular)
retrieval-based models. Then, we highlight several details of the attention
mechanism that turn out to have a massive impact on the performance on tabular
data problems, but that were not explored in prior work. As a result, we design
TabR -- a simple retrieval-based tabular DL model which, on a set of public
benchmarks, demonstrates the best average performance among tabular DL models,
becomes the new state-of-the-art on several datasets, and even outperforms GBDT
models on the recently proposed ``GBDT-friendly'' benchmark (see the first
figure). Comment: Code: https://github.com/yandex-research/tabular-dl-tab
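The retrieval component described above reduces to a few lines: score candidate training rows, softmax the scores attention-style, and blend the retrieved labels into the prediction. The squared-distance scoring and temperature below are illustrative choices, not TabR's exact similarity module:

```python
import numpy as np

def retrieval_predict(x, X_train, y_train, k=5, temp=0.05):
    # Attention-like retrieval: score the k nearest training rows by negative
    # squared distance, softmax the scores, and blend the neighbours' labels
    d2 = ((X_train - x) ** 2).sum(axis=1)
    nearest = np.argsort(d2)[:k]
    logits = -d2[nearest] / temp
    attn = np.exp(logits - logits.max())
    attn /= attn.sum()
    return float(attn @ y_train[nearest])

rng = np.random.default_rng(0)
X_train = rng.standard_normal((200, 4))
y_train = 2.0 * X_train[:, 0]               # toy regression target
pred = retrieval_predict(X_train[0], X_train, y_train)
print(abs(pred - y_train[0]) < 0.5)  # True: the query's own row dominates
```

TabR embeds this kind of component inside a feed-forward network, with learned representations replacing the raw features used here.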
Statistical Estimation for Covariance Structures with Tail Estimates using Nodewise Quantile Predictive Regression Models
This paper considers the specification of covariance structures with tail
estimates. We focus on two aspects: (i) the estimation of the VaR-CoVaR risk
matrix in the case of a larger number of time series observations than assets in
a portfolio, using quantile predictive regression models without assuming the
presence of nonstationary regressors; and (ii) the construction of a novel
variable selection algorithm, so-called, Feature Ordering by Centrality
Exclusion (FOCE), which is based on an assumption-lean regression framework,
has no tuning parameters and is proved to be consistent under general sparsity
assumptions. We illustrate the usefulness of our proposed methodology with
numerical studies of real and simulated datasets when modelling systemic risk
in a network.
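As background on the quantile machinery behind the VaR-CoVaR matrix, the tau-th quantile is the minimiser of the check (pinball) loss; a toy unconditional version with simulated rather than real returns:

```python
import numpy as np

def pinball_loss(y, q_hat, tau):
    # Check (pinball) loss; its minimiser over q_hat is the tau-th quantile of y
    e = y - q_hat
    return float(np.mean(np.maximum(tau * e, (tau - 1) * e)))

rng = np.random.default_rng(0)
returns = rng.standard_normal(10_000)          # simulated daily returns
q05 = np.quantile(returns, 0.05)               # empirical 5% quantile
var_5 = -q05                                   # 5% Value-at-Risk
print(1.4 < var_5 < 1.9)  # True: close to the N(0,1) value of about 1.645

# The empirical quantile scores better under the pinball loss than other points
print(pinball_loss(returns, q05, 0.05) < pinball_loss(returns, 0.0, 0.05))  # True
```

Quantile predictive regression conditions this quantile on lagged predictors, which is what allows the paper to populate the pairwise VaR-CoVaR risk matrix.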