
    Audio-visual multi-modality driven hybrid feature learning model for crowd analysis and classification

    The rapid emergence of advanced software systems, low-cost hardware, and decentralized cloud computing technologies has broadened the horizon for vision-based surveillance, monitoring, and control. However, complex and inferior feature learning over visual artefacts or video streams, especially under extreme conditions, confines the majority of at-hand vision-based crowd analysis and classification systems. Retrieving event-sensitive or crowd-type-sensitive spatio-temporal features for different crowd types under extreme conditions is a highly complex task. Consequently, it results in lower accuracy and hence lower reliability, which confines existing methods for real-time crowd analysis. Despite numerous efforts in vision-based approaches, the lack of acoustic cues often creates ambiguity in crowd classification. On the other hand, the strategic amalgamation of audio-visual features can enable accurate and reliable crowd analysis and classification. Motivated by this, in this research a novel audio-visual multi-modality-driven hybrid feature learning model is developed for crowd analysis and classification. In this work, a hybrid feature extraction model was applied to extract deep spatio-temporal features using Gray-Level Co-occurrence Matrix (GLCM) features and the AlexNet transfer learning model. After extracting the different GLCM features and AlexNet deep features, horizontal concatenation was performed to fuse the different feature sets. Similarly, for acoustic feature extraction, the audio samples (from the input video) were processed with static (fixed-size) sampling, pre-emphasis, block framing, and Hann windowing, followed by the extraction of acoustic features such as GTCC, GTCC-Delta, GTCC-Delta-Delta, MFCC, Spectral Entropy, Spectral Flux, Spectral Slope, and Harmonics-to-Noise Ratio (HNR).
Finally, the extracted audio-visual features were fused to yield a composite multi-modal feature set, which was processed for classification using a random forest ensemble classifier. The multi-class classification yields a crowd-classification accuracy of 98.26%, precision of 98.89%, sensitivity of 94.82%, specificity of 95.57%, and an F-measure of 98.84%. The robustness of the proposed multi-modality-based crowd analysis model confirms its suitability for real-world crowd detection and classification tasks.
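The fusion step described above is plain horizontal concatenation of the visual and acoustic feature sets. A minimal NumPy sketch, where the clip count and feature dimensions are illustrative assumptions rather than the paper's actual sizes:

```python
import numpy as np

# Hypothetical per-clip feature matrices (dimensions are assumptions)
glcm_feats = np.random.rand(8, 20)       # 8 clips x 20 GLCM texture features
alexnet_feats = np.random.rand(8, 4096)  # 8 clips x AlexNet deep features
audio_feats = np.random.rand(8, 42)      # 8 clips x GTCC/MFCC/spectral features

# Horizontal concatenation: first fuse the two visual sets, then append audio
visual = np.hstack([glcm_feats, alexnet_feats])
multimodal = np.hstack([visual, audio_feats])
print(multimodal.shape)  # (8, 4158)
```

Each fused row would then be one training example for an ensemble classifier such as a random forest.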

    Predicting extreme events in a data-driven model of turbulent shear flow using an atlas of charts

    Dynamical systems with extreme events are difficult to capture with data-driven modeling, due to the relative scarcity of data within extreme events compared to the typical dynamics of the system, and the strong dependence of the long-time occurrence of extreme events on short-time conditions. A recently developed technique [Floryan, D. & Graham, M. D. Data-driven discovery of intrinsic dynamics. Nat Mach Intell 4, 1113-1120 (2022)], here denoted as Charts and Atlases for Nonlinear Data-Driven Dynamics on Manifolds, or CANDyMan, overcomes these difficulties by decomposing the time series into separate charts based on data similarity, learning dynamical models on each chart via individual time-mapping neural networks, and then stitching the charts together into a single atlas to yield a global dynamical model. We apply CANDyMan to a nine-dimensional model of turbulent shear flow between infinite parallel free-slip walls under a sinusoidal body force [Moehlis, J., Faisst, H. & Eckhardt, B. A low-dimensional model for turbulent shear flows. New J Phys 6, 56 (2004)], which undergoes extreme events in the form of intermittent quasi-laminarization and long-time full laminarization. We demonstrate that the CANDyMan method allows the trained dynamical models to more accurately forecast the evolution of the model coefficients, reducing the error in the predictions as the model evolves forward in time. The technique exhibits more accurate predictions of extreme events, capturing the frequency of quasi-laminarization events and predicting the time until full laminarization more accurately than a single neural network. Comment: 9 pages, 7 figures
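The chart-atlas idea can be sketched in a toy form: cluster snapshots into charts, fit a one-step model per chart, then stitch them by switching models based on the current state. The sketch below substitutes per-chart linear maps for the paper's time-mapping neural networks, and a 2-D toy trajectory for the nine-mode shear-flow coefficients; everything here is an assumption for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy 2-D trajectory standing in for the nine shear-flow mode coefficients
t = np.linspace(0, 4 * np.pi, 400)
X = np.c_[np.cos(t), np.sin(2 * t)]

# 1) Charting: assign each snapshot to its nearest of k centroids (data similarity)
k = 3
centroids = X[rng.choice(len(X), size=k, replace=False)]
labels = np.argmin(((X[:, None, :] - centroids) ** 2).sum(axis=2), axis=1)

# 2) Per-chart model: fit a linear one-step map x_{n+1} ~ A x_n + b
#    (the real method trains a neural network per chart)
charts = {}
for c in range(k):
    idx = np.where(labels[:-1] == c)[0]
    design = np.c_[X[idx], np.ones(len(idx))]
    coef, *_ = np.linalg.lstsq(design, X[idx + 1], rcond=None)
    charts[c] = coef

# 3) Atlas: step forward, switching to whichever chart the state falls in
x = X[0].copy()
for _ in range(20):
    c = int(np.argmin(((x - centroids) ** 2).sum(axis=1)))
    x = np.r_[x, 1.0] @ charts[c]
print(x.shape)  # (2,)
```

Switching models at chart boundaries is what lets a locally simple model family capture globally complicated dynamics.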

    Depth Functions for Partial Orders with a Descriptive Analysis of Machine Learning Algorithms

    We propose a framework for descriptively analyzing sets of partial orders based on the concept of depth functions. Despite intensive study of depth functions in linear and metric spaces, there is very little discussion of depth functions for non-standard data types such as partial orders. We introduce an adaptation of the well-known simplicial depth to the set of all partial orders, the union-free generic (ufg) depth. Moreover, we utilize our ufg depth for a comparison of machine learning algorithms based on multidimensional performance measures. Concretely, we analyze the distribution of different classifier performances over a sample of standard benchmark data sets. Our results promisingly demonstrate that our approach differs substantially from existing benchmarking approaches and, therefore, adds a new perspective to the vivid debate on the comparison of classifiers. Comment: Accepted to ISIPTA 2023; forthcoming in: Proceedings of Machine Learning Research
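For context, the classical simplicial depth that the ufg depth adapts measures how central a point is as the fraction of sample simplices (triangles, in 2-D) containing it. A small sketch of that Euclidean version (the partial-order adaptation itself is not reproduced here):

```python
import numpy as np
from itertools import combinations

def simplicial_depth(point, sample):
    """Classical 2-D simplicial depth: the fraction of triangles formed
    by sample points that contain `point`."""
    def side(a, b, p):
        # Sign of the cross product: which side of segment a->b is p on?
        return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])
    tris = list(combinations(range(len(sample)), 3))
    inside = 0
    for i, j, k in tris:
        a, b, c = sample[i], sample[j], sample[k]
        s1, s2, s3 = side(a, b, point), side(b, c, point), side(c, a, point)
        if (s1 >= 0 and s2 >= 0 and s3 >= 0) or (s1 <= 0 and s2 <= 0 and s3 <= 0):
            inside += 1
    return inside / len(tris)

rng = np.random.default_rng(1)
pts = rng.normal(size=(12, 2))
center_depth = simplicial_depth(pts.mean(axis=0), pts)
outlier_depth = simplicial_depth(np.array([5.0, 5.0]), pts)
print(center_depth > outlier_depth)  # central points lie inside more triangles
```

The ufg depth replaces triangles with union-free generic subsets of partial orders, but the interpretation (central objects attain higher depth) carries over.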

    Distilling Large Vision-Language Model with Out-of-Distribution Generalizability

    Large vision-language models have achieved outstanding performance, but their size and computational requirements make their deployment on resource-constrained devices and time-sensitive tasks impractical. Model distillation, the process of creating smaller, faster models that maintain the performance of larger models, is a promising direction towards a solution. This paper investigates the distillation of visual representations in large teacher vision-language models into lightweight student models using a small- or mid-scale dataset. Notably, this study focuses on open-vocabulary out-of-distribution (OOD) generalization, a challenging problem that has been overlooked in the previous model distillation literature. We propose two principles, from the vision and language modality perspectives, to enhance the student's OOD generalization: (1) better imitating the teacher's visual representation space and carefully promoting coherence in vision-language alignment with the teacher; (2) enriching the teacher's language representations with informative and fine-grained semantic attributes to effectively distinguish between different labels. We propose several metrics and conduct extensive experiments to investigate these techniques. The results demonstrate significant improvements in zero-shot and few-shot student performance on open-vocabulary out-of-distribution classification, highlighting the effectiveness of our proposed approaches. Our code will be released at https://github.com/xuanlinli17/large_vlm_distillation_oo
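Principle (1), imitating the teacher's visual representation space, is commonly realized as a feature-matching loss between projected student features and teacher features. A hedged NumPy sketch (the projection, normalization, and toy sizes are assumptions, not the paper's exact loss):

```python
import numpy as np

def imitation_loss(student_feats, teacher_feats, proj):
    """Mean squared distance between L2-normalized projected student features
    and L2-normalized teacher features (a stand-in for an imitation loss)."""
    aligned = student_feats @ proj  # learnable projection into teacher space
    a = aligned / np.linalg.norm(aligned, axis=1, keepdims=True)
    t = teacher_feats / np.linalg.norm(teacher_feats, axis=1, keepdims=True)
    return float(np.mean(np.sum((a - t) ** 2, axis=1)))

rng = np.random.default_rng(0)
student = rng.normal(size=(4, 64))    # lightweight student features (toy sizes)
teacher = rng.normal(size=(4, 128))   # large teacher features
proj = rng.normal(size=(64, 128))     # projection matrix to be learned
loss = imitation_loss(student, teacher, proj)
print(loss > 0.0)  # untrained random projection: features not yet aligned
```

Training would minimize this loss so the student's projected space converges toward the teacher's, preserving the vision-language alignment the teacher learned.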

    Semantic-aware Transmission for Robust Point Cloud Classification

    As three-dimensional (3D) data acquisition devices become increasingly prevalent, the demand for 3D point cloud transmission is growing. In this study, we introduce a semantic-aware communication system for robust point cloud classification that capitalizes on the advantages of pre-trained Point-BERT models. Our proposed method comprises four main components: the semantic encoder, channel encoder, channel decoder, and semantic decoder. By employing a two-stage training strategy, our system facilitates efficient and adaptable learning tailored to the specific classification task. The results show that the proposed system achieves a classification accuracy of over 89% when the SNR is higher than 10 dB and still maintains an accuracy above 66.6% even at an SNR of 4 dB. Compared to the existing method, our approach performs 0.8% to 48% better across different SNR values, demonstrating robustness to channel noise. Our system also achieves a balance between accuracy and speed, being computationally efficient while maintaining high classification performance under noisy channel conditions. This adaptable and resilient approach holds considerable promise for a wide array of 3D scene understanding applications, effectively addressing the challenges posed by channel noise. Comment: submitted to globecom 202
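The four-component pipeline can be sketched end to end with toy stand-ins: a semantic encoder maps the point cloud to a compact vector, channel codecs map it through a noisy channel, and a semantic decoder outputs a class label. The real system uses a pre-trained Point-BERT codec and learned channel codecs; everything below is an illustrative assumption except the AWGN channel model.

```python
import numpy as np

rng = np.random.default_rng(42)

def awgn(symbols, snr_db):
    """Additive white Gaussian noise channel at a target SNR in dB."""
    sig_power = np.mean(symbols ** 2)
    noise_power = sig_power / (10 ** (snr_db / 10))
    return symbols + rng.normal(scale=np.sqrt(noise_power), size=symbols.shape)

# Toy stand-ins for the four components
semantic_enc = lambda cloud: cloud.mean(axis=0)  # point cloud -> semantic vector
channel_enc = lambda z: 2.0 * z                  # map to channel symbols
channel_dec = lambda y: y / 2.0                  # invert the channel code
semantic_dec = lambda z: int(np.argmax(z))       # semantic vector -> class label

cloud = rng.normal(size=(1024, 3))               # a toy 3-D point cloud
symbols = channel_enc(semantic_enc(cloud))
label = semantic_dec(channel_dec(awgn(symbols, snr_db=10)))
print(label)
```

Evaluating the same pipeline at decreasing `snr_db` values is how the accuracy-vs-SNR curves reported above would be produced.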

    Convolutional neural network based on photoplethysmography signals for sleep apnea syndrome detection

    Introduction: The current method of monitoring sleep disorders is complex, time-consuming, and uncomfortable, although it can provide scientific guidance to help ensure sleep quality worldwide. This study aims to seek a comfortable and convenient method for identifying sleep apnea syndrome. Methods: In this work, a one-dimensional convolutional neural network model was established. To classify this condition, the model was trained on the photoplethysmographic (PPG) signals of 20 healthy people and 39 sleep apnea syndrome (SAS) patients, and the influence of noise on the model was tested in anti-interference experiments. Results and Discussion: The results showed that the accuracy of the model for SAS classification exceeds 90%, and it has some anti-interference ability. This paper provides an SAS detection method based on PPG signals, which is helpful for portable wearable detection.
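The building blocks of such a one-dimensional CNN (convolution, ReLU, pooling, sigmoid head) can be shown in a few lines of NumPy. The window length, filter width, and pooling width below are assumptions, and the filter is random rather than trained:

```python
import numpy as np

def conv1d(signal, kernel):
    """Valid-mode 1-D convolution, the core operation of a 1-D CNN."""
    n = len(signal) - len(kernel) + 1
    return np.array([signal[i:i + len(kernel)] @ kernel for i in range(n)])

rng = np.random.default_rng(0)
ppg = rng.normal(size=256)    # a toy PPG window; 256 samples is an assumption
kernel = rng.normal(size=8)   # one convolutional filter (random, not trained)

feat = np.maximum(conv1d(ppg, kernel), 0.0)                     # ReLU
pooled = feat[: len(feat) // 4 * 4].reshape(-1, 4).max(axis=1)  # max-pool, width 4
score = 1.0 / (1.0 + np.exp(-pooled.mean()))                    # sigmoid: P(SAS)
print(0.0 < score < 1.0)
```

A trained model stacks several such convolution/pooling layers and learns the kernels from labeled PPG segments; the anti-interference experiments amount to evaluating the trained model on noise-corrupted inputs.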

    FedForgery: Generalized Face Forgery Detection with Residual Federated Learning

    With the continuous development of deep learning in the field of image generation models, a large number of vivid forged faces have been generated and spread on the Internet. These high-authenticity artifacts could grow into a threat to societal security. Existing face forgery detection methods directly utilize publicly shared or centralized data for training, but ignore the personal privacy and security issues that arise when personal data cannot be centrally shared in real-world scenarios. Additionally, the different distributions caused by diverse artifact types further degrade the forgery detection task. To solve these problems, this paper proposes a novel generalized residual Federated learning for face Forgery detection (FedForgery). The designed variational autoencoder aims to learn robust discriminative residual feature maps to detect forged faces (with diverse or even unknown artifact types). Furthermore, a general federated learning strategy is introduced to construct a distributed detection model trained collaboratively across multiple local decentralized devices, which further boosts representation generalization. Experiments conducted on publicly available face forgery detection datasets prove the superior performance of the proposed FedForgery. The designed novel generalized face forgery detection protocols and source code will be publicly available. Comment: The code is available at https://github.com/GANG370/FedForgery. The paper has been accepted in the IEEE Transactions on Information Forensics & Security
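The federated aggregation step underlying such a strategy is typically federated averaging: each client trains on its private data and only model weights are shared, weighted by local dataset size. A minimal sketch (toy two-parameter "models"; the paper's exact aggregation rule may differ):

```python
import numpy as np

def fed_avg(local_weights, sizes):
    """Aggregate client model weights proportionally to local dataset sizes."""
    sizes = np.asarray(sizes, dtype=float)
    w = sizes / sizes.sum()
    return sum(wi * lw for wi, lw in zip(w, local_weights))

# Three decentralized clients with private forgery data (weights are toy 2-vectors)
clients = [np.array([1.0, 0.0]), np.array([0.0, 1.0]), np.array([1.0, 1.0])]
global_w = fed_avg(clients, sizes=[10, 10, 20])
print(global_w)  # [0.75 0.75]
```

Because only weights leave each device, the raw face images never need to be centrally shared, which is the privacy property motivating the design.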

    TabR: Unlocking the Power of Retrieval-Augmented Tabular Deep Learning

    Deep learning (DL) models for tabular data problems are receiving increasing attention, while algorithms based on gradient-boosted decision trees (GBDT) remain a strong go-to solution. Following recent trends in other domains, such as natural language processing and computer vision, several retrieval-augmented tabular DL models have recently been proposed. For a given target object, a retrieval-based model retrieves other relevant objects, such as its nearest neighbors, from the available (training) data and uses their features or even labels to make a better prediction. However, we show that the existing retrieval-based tabular DL solutions provide only minor, if any, benefits over properly tuned, simple retrieval-free baselines. Thus, it remains unclear whether the retrieval-based approach is a worthy direction for tabular DL. In this work, we give a strong positive answer to this question. We start by incrementally augmenting a simple feed-forward architecture with an attention-like retrieval component similar to those of many (tabular) retrieval-based models. Then, we highlight several details of the attention mechanism that turn out to have a massive impact on performance on tabular data problems, but that were not explored in prior work. As a result, we design TabR -- a simple retrieval-based tabular DL model which, on a set of public benchmarks, demonstrates the best average performance among tabular DL models, becomes the new state of the art on several datasets, and even outperforms GBDT models on the recently proposed "GBDT-friendly" benchmark (see the first figure). Comment: Code: https://github.com/yandex-research/tabular-dl-tab
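The "attention-like retrieval component" can be sketched as attention over the k nearest training rows: similarities become softmax weights, which mix the neighbors' values (features or labels) into the prediction. The sketch below uses raw feature space and negative squared distances as logits; TabR itself operates on learned encodings, so treat this as an assumption-laden illustration.

```python
import numpy as np

def retrieval_layer(query, keys, values, k=3):
    """Attention-like retrieval: softmax over similarities to the k nearest
    training rows, producing a weighted mix of their values."""
    d = ((keys - query) ** 2).sum(axis=1)  # squared L2 distance to each row
    idx = np.argsort(d)[:k]                # the k nearest candidates
    logits = -d[idx]                       # closer -> larger logit
    w = np.exp(logits - logits.max())
    w /= w.sum()
    return w @ values[idx]                 # weighted label/value mix

rng = np.random.default_rng(0)
train_x = rng.normal(size=(100, 5))        # toy training objects
train_y = rng.normal(size=(100, 1))        # their targets, used as "values"
pred = retrieval_layer(train_x[0], train_x, train_y, k=5)
print(pred.shape)  # (1,)
```

In the full model this output is added to the feed-forward representation of the target object before the prediction head, so the network can exploit both parametric and retrieved information.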

    Statistical Estimation for Covariance Structures with Tail Estimates using Nodewise Quantile Predictive Regression Models

    This paper considers the specification of covariance structures with tail estimates. We focus on two aspects: (i) the estimation of the VaR-CoVaR risk matrix in the case of a larger number of time-series observations than assets in a portfolio, using quantile predictive regression models without assuming the presence of nonstationary regressors; and (ii) the construction of a novel variable selection algorithm, the so-called Feature Ordering by Centrality Exclusion (FOCE), which is based on an assumption-lean regression framework, has no tuning parameters, and is proved to be consistent under general sparsity assumptions. We illustrate the usefulness of our proposed methodology with numerical studies of real and simulated datasets when modelling systemic risk in a network.
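Quantile predictive regressions of the kind used for VaR-type tail estimates minimize the check (pinball) loss, whose minimizer is the target quantile. A small sketch demonstrating that property on toy data (FOCE itself is not reproduced here; the data and grid are assumptions):

```python
import numpy as np

def pinball_loss(y, q_hat, tau):
    """Check (pinball) loss; its minimizer over q_hat is the tau-quantile of y."""
    u = y - q_hat
    return float(np.mean(np.maximum(tau * u, (tau - 1.0) * u)))

rng = np.random.default_rng(0)
y = rng.exponential(size=1000)  # toy right-skewed losses

# A grid search over candidate estimates recovers the empirical 95% quantile
grid = np.linspace(0.0, 5.0, 201)
best = grid[np.argmin([pinball_loss(y, g, 0.95) for g in grid])]
print(abs(best - np.quantile(y, 0.95)) < 0.1)
```

Replacing the constant candidate with a linear function of predictors turns this into the quantile predictive regression used to populate the risk matrix entries.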