4,475 research outputs found

    Noisy-ArcMix: Additive Noisy Angular Margin Loss Combined With Mixup Anomalous Sound Detection

    Unsupervised anomalous sound detection (ASD) aims to identify anomalous sounds by learning the features of normal operational sounds and sensing their deviations. Recent approaches have focused on self-supervised tasks that classify normal data, and advanced models have shown that it is important to secure representation space for anomalous data through representation learning that yields compact intra-class and well-separated inter-class distributions. However, we show that conventional approaches often fail to ensure sufficient intra-class compactness and exhibit angular disparity between samples and their corresponding centers. In this paper, we propose a training technique aimed at ensuring intra-class compactness and increasing the angle gap between normal and abnormal samples. Furthermore, we present an architecture that extracts features for important temporal regions, enabling the model to learn which time frames should be emphasized or suppressed. Experimental results demonstrate that the proposed method achieves the best performance, giving 0.90%, 0.83%, and 2.16% improvements in terms of AUC, pAUC, and mAUC, respectively, compared to the state-of-the-art method on the DCASE 2020 Challenge Task 2 dataset. Comment: Submitted to ICASSP 202

    DeFT-AN: Dense Frequency-Time Attentive Network for Multichannel Speech Enhancement

    In this study, we propose a dense frequency-time attentive network (DeFT-AN) for multichannel speech enhancement. DeFT-AN is a mask estimation network that predicts a complex spectral masking pattern for suppressing the noise and reverberation embedded in the short-time Fourier transform (STFT) of an input signal. The proposed mask estimation network incorporates three different types of blocks for aggregating information in the spatial, spectral, and temporal dimensions. It utilizes a spectral transformer with a modified feed-forward network and a temporal conformer with sequential dilated convolutions. The use of dense blocks and transformers dedicated to the three different characteristics of audio signals enables more comprehensive enhancement in noisy and reverberant environments. The performance advantage of DeFT-AN over state-of-the-art multichannel models is demonstrated on two popular noisy and reverberant datasets in terms of various metrics for speech quality and intelligibility. Comment: 5 pages, 2 figures, 3 tables

    RGI-Net: 3D Room Geometry Inference from Room Impulse Responses in the Absence of First-order Echoes

    Room geometry is important prior information for implementing realistic 3D audio rendering. For this reason, various room geometry inference (RGI) methods have been developed by utilizing the time of arrival (TOA) or time difference of arrival (TDOA) information in room impulse responses. However, conventional RGI techniques impose several assumptions, such as convex room shapes, a number of walls known a priori, and the visibility of first-order reflections. In this work, we introduce a deep neural network (DNN), RGI-Net, which can estimate room geometries without the aforementioned assumptions. RGI-Net learns and exploits complex relationships between high-order reflections in room impulse responses (RIRs) and, thus, can estimate room shapes even when the shape is non-convex or first-order reflections are missing in the RIRs. The network takes RIRs measured from a compact audio device equipped with a circular microphone array and a single loudspeaker, which greatly improves its practical applicability. RGI-Net includes an evaluation network that separately evaluates the presence probability of walls, so geometry inference is possible without prior knowledge of the number of walls. Comment: 5 pages, 3 figures, 3 tables

    Statistical Analysis of the Metropolitan Seoul Subway System: Network Structure and Passenger Flows

    The Metropolitan Seoul Subway system, consisting of 380 stations, provides the major transportation mode in the metropolitan Seoul area. Focusing on the network structure, we analyze statistical properties and topological consequences of the subway system. We further study the passenger flows on the system and find that the flow weight distribution exhibits power-law behavior. In addition, the degree distribution of the spanning tree of the flows also follows a power law. Comment: 10 pages, 4 figures

    Sleepless in Seoul: 'The Ant and the Metrohopper'

    One of Aesop's (La Fontaine's) famous fables, "The Ant and the Grasshopper," is widely known for the moral lesson it draws by comparing the hard-working ant with the party-loving grasshopper. Here we present a slightly different version of this fable, namely "The Ant and the Metrohopper," which describes human mobility patterns in modern urban life. Numerous real transportation networks and trajectory data have been studied in order to understand mobility patterns. We study trajectories of commuters on the public transportation of Metropolitan Seoul, Korea. Smart cards (Integrated Circuit Cards; ICCs) are used in the public transportation system, which allows collection of transit transaction data, including departure and arrival stations and times. This empirical analysis reveals human mobility patterns, which are relevant to traffic forecasting and transportation optimization, as well as urban planning. Comment: to appear in Journal of the Korean Physical Society