Generative self-supervised learning for seismic event classification

Abstract

Deep learning has been widely applied to seismic signal classification, predominantly through supervised learning that relies on large labeled datasets. However, labeling large volumes of seismic data is time-consuming for domain experts and prone to human error, so labeled seismic datasets are scarce. To address this limited availability of labeled data, we propose a novel approach to seismic event classification based on self-supervised learning. First, a generative self-supervised learning model, specifically an auto-encoder, is designed to extract informative features from the Short-Time Fourier Transform (STFT) of seismic recordings. These features are then classified into four categories: earthquakes, micro-earthquakes, rockfalls, and anthropogenic noise. Classification is performed using (a) unsupervised K-means clustering on unlabeled data and (b) semi-supervised approaches in which only 5% to 33.3% of the data are labeled. When trained with 20% of the data labeled, the proposed semi-supervised method achieves high performance on the publicly available Résif dataset, with recall of 0.90 for earthquakes, 0.65 for micro-earthquakes, 0.91 for rockfalls, and 0.84 for noise signals. Additionally, we introduce a novel method to improve labeling efficiency that uses Self-Organizing Maps (SOMs) to cluster the features of large datasets into multiple nodes. Our results demonstrate that experts can label a small number of nodes more effectively and confidently than they can label every event in a large dataset, reducing the experts’ workload to just 4.6% of the original effort. Our study further shows that this approach offers an excellent trade-off between expert labeling effort and classification accuracy, making it a highly effective solution for seismic event labeling.
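The abstract does not spell out how node-level labels become event labels. The following is a minimal sketch, assuming the auto-encoder features are already available as fixed-length vectors (synthetic Gaussian clusters stand in for them here). It trains a small Self-Organizing Map in NumPy, maps each event to its best-matching node, lets a simulated expert label each occupied node once, and propagates that label to every event in the node. The grid size, learning-rate and neighborhood schedules, the simulated expert, and the helper names `train_som`, `assign_nodes`, and `propagate_node_labels` are hypothetical choices, not the authors' configuration.

```python
import numpy as np


def train_som(features, rows=4, cols=4, epochs=10, lr0=0.5, sigma0=None, seed=0):
    """Online SOM training with a linearly decaying learning rate and Gaussian neighborhood."""
    rng = np.random.default_rng(seed)
    n, _ = features.shape
    n_nodes = rows * cols
    sigma0 = max(rows, cols) / 2.0 if sigma0 is None else sigma0
    # (row, col) position of every node on the 2-D map
    grid = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    # Initialize node weights from randomly chosen events
    weights = features[rng.choice(n, size=n_nodes, replace=n < n_nodes)].astype(float)
    total_steps = epochs * n
    step = 0
    for _ in range(epochs):
        for i in rng.permutation(n):
            frac = step / total_steps
            lr = lr0 * (1.0 - frac)
            sigma = sigma0 * (1.0 - frac) + 1e-3
            x = features[i]
            # Best-matching unit: node whose weight vector is closest to x
            bmu = np.argmin(np.sum((weights - x) ** 2, axis=1))
            # Gaussian neighborhood measured on the map grid, not in feature space
            d2 = np.sum((grid - grid[bmu]) ** 2, axis=1)
            h = np.exp(-d2 / (2.0 * sigma ** 2))
            weights += lr * h[:, None] * (x - weights)
            step += 1
    return weights


def assign_nodes(features, weights):
    """Index of the best-matching node for every event."""
    d2 = ((features[:, None, :] - weights[None, :, :]) ** 2).sum(axis=2)
    return np.argmin(d2, axis=1)


def propagate_node_labels(node_of_event, node_labels):
    """Give every event the label assigned to its node (-1 if the node is unlabeled)."""
    return np.array([node_labels.get(int(k), -1) for k in node_of_event])


# Synthetic stand-in for auto-encoder features: 4 classes x 100 events, 8-D
rng = np.random.default_rng(1)
centers = rng.normal(scale=5.0, size=(4, 8))
true = np.repeat(np.arange(4), 100)
feats = centers[true] + rng.normal(size=(400, 8))

W = train_som(feats, rows=4, cols=4, epochs=10)
nodes = assign_nodes(feats, W)
# Simulated expert (hypothetical oracle): one label per occupied node, chosen as
# the majority class of the events that fall into that node
node_labels = {int(k): int(np.bincount(true[nodes == k]).argmax()) for k in np.unique(nodes)}
pred = propagate_node_labels(nodes, node_labels)
effort = len(node_labels) / len(feats)  # labeling decisions relative to per-event labeling
acc = float((pred == true).mean())
print(f"nodes labeled: {len(node_labels)}, effort: {effort:.3f}, accuracy: {acc:.3f}")
```

Because the expert makes one decision per occupied node rather than one per event, labeling effort scales with the map size (at most 16 decisions for 400 events in this toy setup). The 4.6% figure reported above comes from the authors' own data and map configuration, not from this example.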
To evaluate the generalization capability of the proposed self-supervised learning model, we tested it on two unseen seismic datasets: the globally distributed Stanford Earthquake Dataset and the regionally focused Pacific Northwest Curated Seismic Dataset. On the Stanford Earthquake Dataset, the pre-trained model effectively extracted discriminative earthquake and noise features, achieving high clustering accuracy. The Pacific Northwest Curated Seismic Dataset poses a harder generalization test, with heterogeneous and previously unseen event types such as explosions and thunder. Despite this diversity, the pre-trained model still preserved meaningful feature separability and captured inter-class relationships among acoustically similar events. Overall, these findings highlight the model’s ability to generalize across both global and regional seismic datasets, underscoring its potential for wide deployment in seismological monitoring and event characterization without extensive retraining.
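The abstract reports clustering accuracy without defining it. One common definition for unsupervised clustering, assumed here rather than taken from the paper, matches cluster ids to class ids one-to-one so as to maximize agreement (the Hungarian algorithm, via SciPy's `linear_sum_assignment`) and then counts the correctly matched events. The function name `clustering_accuracy` and the toy labels are hypothetical.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def clustering_accuracy(y_true, y_pred):
    """Accuracy after optimally matching cluster ids to class ids (one-to-one)."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    classes = np.unique(y_true)
    clusters = np.unique(y_pred)
    # contingency[i, j] = number of events in cluster i whose true class is j
    contingency = np.zeros((clusters.size, classes.size), dtype=int)
    for i, c in enumerate(clusters):
        for j, k in enumerate(classes):
            contingency[i, j] = np.sum((y_pred == c) & (y_true == k))
    # linear_sum_assignment minimizes cost, so negate counts to maximize matches
    rows, cols = linear_sum_assignment(-contingency)
    return contingency[rows, cols].sum() / y_true.size


# Cluster ids are arbitrary: a relabeled but otherwise perfect clustering scores 1.0
acc_perfect = clustering_accuracy([0, 0, 1, 1, 2, 2], [1, 1, 0, 0, 2, 2])
# One event of class 1 lands in the cluster matched to class 2: 5 of 6 correct
acc_partial = clustering_accuracy([0, 0, 1, 1, 2, 2], [1, 1, 0, 2, 2, 2])
```

Negating the contingency counts turns SciPy's minimum-cost assignment into a maximum-agreement matching. If there are more clusters than classes, the surplus clusters stay unmatched and their events count as errors.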

University of Strathclyde Institutional Repository

Last time updated on 01/12/2025