9 research outputs found

    Improved Distributed Minimum Variance Distortionless Response (MVDR) Beamforming Method Based on a Local Average Consensus Algorithm for Bird Audio Enhancement in Wireless Acoustic Sensor Networks

    No full text
    Wireless acoustic sensor networks (WASNs) are now commonly used for wild bird monitoring. To better support automatic bird identification during monitoring, enhancing the recorded bird audio is essential. Distributed beamforming is currently the most suitable method for bird audio enhancement in WASNs, but it still has several drawbacks, such as a large noise residue and a slow convergence rate. To overcome these shortcomings, this paper proposes an improved distributed minimum variance distortionless response (IDMVDR) beamforming method for bird audio enhancement in WASNs. The method first introduces an average Metropolis-weight local average consensus algorithm to increase the consensus convergence rate, then proposes a continuous spectrum update algorithm that estimates the noise power spectral density (PSD) to improve noise reduction, and finally applies an MVDR beamformer to enhance the bird audio. Four different WASN topologies were considered, and bird audio enhancement was performed on each to validate the effectiveness of the proposed method. Compared with two classical methods, the proposed method achieves better segmental signal-to-noise ratio (SegSNR), mean square error (MSE), and perceptual evaluation of speech quality (PESQ) scores, as well as a faster consensus rate. This means the proposed method delivers better audio quality and convergence, and is therefore suitable for WASNs with dynamic topologies.
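    The consensus averaging underlying the method above can be sketched as follows. This is a generic Metropolis-weight average-consensus illustration, not the paper's modified variant; the ring topology and the local measurement values are invented for the example. Each node repeatedly replaces its value with a weighted average of its neighbors' values, so every node converges to the network-wide mean without any central coordinator:

```python
import numpy as np

def metropolis_weights(adj):
    """Metropolis weight matrix for an undirected graph given its
    0/1 adjacency matrix: w_ij = 1/(1 + max(deg_i, deg_j)) on edges,
    with the diagonal chosen so each row sums to 1."""
    n = adj.shape[0]
    deg = adj.sum(axis=1)
    W = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            if adj[i, j] and i != j:
                W[i, j] = 1.0 / (1.0 + max(deg[i], deg[j]))
        W[i, i] = 1.0 - W[i].sum()
    return W

# Ring of 4 sensor nodes, each holding a local scalar estimate
# (e.g., a local noise-power reading); values are hypothetical.
adj = np.array([[0, 1, 0, 1],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [1, 0, 1, 0]])
x = np.array([1.0, 4.0, 2.0, 5.0])
W = metropolis_weights(adj)
for _ in range(100):      # repeated local averaging rounds
    x = W @ x
# All nodes converge to the network-wide average (3.0 here).
```

    Because each row of W sums to one and the graph is connected, iterating x ← Wx preserves the average and drives all entries toward it; the paper's contribution is accelerating this convergence.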

    An Image Object Detection Model Based on Mixed Attention Mechanism Optimized YOLOv5

    No full text
    Object detection in complex environments is one of the more difficult problems in computer vision, drawing on key technologies such as pattern recognition, artificial intelligence, and digital image processing. Because the environment can be complex, changeable, and highly variable, and targets are easily confused with the background or affected by factors such as insufficient light, partial occlusion, and background interference, detecting multiple targets is extremely difficult and algorithm robustness is low. How to make full use of the rich spatial information and deep texture information in an image to accurately identify a target's type and location is an urgent problem. Deep neural networks provide an effective way to extract and fully utilize image features. To address these problems, this paper proposes an object detection model based on a mixed attention mechanism that optimizes YOLOv5 (MAO-YOLOv5). The proposed method fuses local and global features in an image to enrich the expressive ability of the feature map and to detect objects with large size differences more effectively. An attention mechanism is then applied to the feature map to weight each channel, enhancing key features, removing redundant ones, and improving the network's ability to distinguish target objects from the background. The results show that the proposed model achieves higher precision and a faster running speed and performs better in object-detection tasks.
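    The per-channel weighting step described above can be illustrated with a squeeze-and-excitation-style channel attention sketch. This is a generic illustration of channel attention, not the paper's mixed attention module; the feature-map size, reduction ratio, and weight matrices are invented for the example:

```python
import numpy as np

def channel_attention(fmap, w1, w2):
    """SE-style channel attention on a (C, H, W) feature map:
    squeeze (global average pool per channel), excite (two-layer
    bottleneck with ReLU and sigmoid), then rescale each channel."""
    z = fmap.mean(axis=(1, 2))                 # squeeze -> (C,)
    s = np.maximum(w1 @ z, 0.0)                # ReLU bottleneck
    gate = 1.0 / (1.0 + np.exp(-(w2 @ s)))     # sigmoid gate in (0, 1)
    return fmap * gate[:, None, None]          # reweight channels

rng = np.random.default_rng(0)
fmap = rng.standard_normal((8, 4, 4))          # C=8 channels, 4x4 spatial
w1 = rng.standard_normal((2, 8)) * 0.1         # bottleneck: 8 -> 2
w2 = rng.standard_normal((8, 2)) * 0.1         # expand back: 2 -> 8
out = channel_attention(fmap, w1, w2)
```

    Channels whose gate is near 1 are kept ("key features"), while channels gated toward 0 are suppressed ("redundant features"), which is the effect the abstract describes.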

    Bird Species Identification Using Spectrogram Based on Multi-Channel Fusion of DCNNs

    No full text
    Deep convolutional neural networks (DCNNs) have achieved breakthrough performance on bird species identification using spectrograms of bird vocalizations. To address the class imbalance of the bird vocalization dataset, a single-feature identification model (SFIM) with residual blocks and a modified, weighted cross-entropy loss was proposed. To further improve identification accuracy, two multi-channel fusion methods were built from three SFIMs: one fuses the outputs of the feature extraction parts of the three SFIMs (feature fusion mode), and the other fuses the outputs of their classifiers (result fusion mode). The SFIMs were trained on three kinds of spectrograms, computed with the short-time Fourier transform, the mel-frequency cepstral transform, and the chirplet transform, respectively. To cope with the huge number of trainable model parameters, transfer learning was used in the multi-channel models. Using our own vocalization dataset as a sample set, the result fusion mode model outperforms the other proposed models, with a best mean average precision (MAP) of 0.914. Comparing three spectrogram durations (100 ms, 300 ms, and 500 ms), the results reveal that 300 ms is best for our dataset; the duration should be chosen based on the duration distribution of bird syllables. On the BirdCLEF2019 training dataset, the highest classification mean average precision (cmAP) reached 0.135, indicating that the proposed model has a certain generalization ability.
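    The result fusion mode described above amounts to averaging the class-probability outputs of the three per-spectrogram classifiers. A minimal sketch, with invented logits standing in for the three SFIM channels (STFT, mel-cepstral, chirplet) over three hypothetical bird classes:

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax over the last axis."""
    e = np.exp(logits - logits.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

# Hypothetical classifier logits for one audio clip from the three
# channels; real SFIMs would produce these from their spectrograms.
logits = {
    "stft":     np.array([2.0, 0.5, 0.1]),
    "mel":      np.array([1.5, 1.0, 0.2]),
    "chirplet": np.array([2.2, 0.3, 0.4]),
}

# Result fusion mode: average the three classifiers' probabilities.
fused = np.mean([softmax(v) for v in logits.values()], axis=0)
pred = int(np.argmax(fused))
```

    Feature fusion mode would instead concatenate the three feature-extractor outputs before a single shared classifier; the abstract reports that averaging at the classifier stage worked best on the authors' dataset.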

    Cross-corpus open set bird species recognition by vocalization

    No full text
    In the wild, vocalizations of the same bird species may differ across populations (so-called dialects). In addition, the number of species is unknown in advance. These two facts make vocalization-based bird species recognition a challenging task. This study treats it as a cross-corpus open set recognition (OSR) scenario. We propose Instance Frequency Normalization (IFN) to remove instance-specific differences across corpora. Furthermore, an x-vector feature extraction model integrating a Time Delay Neural Network (TDNN) and Long Short-Term Memory (LSTM) is designed to better capture sequence information. Finally, threshold-based Probabilistic Linear Discriminant Analysis (PLDA) is introduced to discriminate the extracted x-vector features and discover unknown classes. Compared with the best results of existing methods, the average accuracies (ACCs) in both single-corpus and cross-corpus experiments are improved, implying that our method provides a potential solution and improves performance for cross-corpus bird species recognition by vocalization under open-set conditions.
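    One plausible reading of per-instance frequency normalization is to standardize each frequency bin of a spectrogram over time, per recording, so that corpus- or channel-specific spectral offsets are removed before feature extraction. This is a hedged sketch of that idea, not the paper's exact IFN definition; the spectrogram shape and statistics are invented:

```python
import numpy as np

def instance_freq_norm(spec, eps=1e-8):
    """Standardize each frequency bin of a (freq, time) spectrogram
    across its time frames, independently for each instance. This
    removes per-recording spectral shifts (e.g., channel coloration)
    while preserving the temporal pattern within each bin."""
    mu = spec.mean(axis=1, keepdims=True)   # per-bin mean over frames
    sd = spec.std(axis=1, keepdims=True)    # per-bin std over frames
    return (spec - mu) / (sd + eps)

rng = np.random.default_rng(1)
# Hypothetical log-spectrogram: 64 frequency bins x 100 frames,
# with an arbitrary instance-specific offset and scale baked in.
spec = rng.standard_normal((64, 100)) * 3.0 + 5.0
normed = instance_freq_norm(spec)
# Every frequency bin now has near-zero mean and unit variance.
```

    Because the statistics are computed per instance rather than over the whole training corpus, the same transform applies unchanged to recordings from an unseen corpus, which is what makes it useful in the cross-corpus setting.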

    Retrieval of Live Fuel Moisture Content Based on Multi-Source Remote Sensing Data and Ensemble Deep Learning Model

    No full text
    Live fuel moisture content (LFMC) is an important index for evaluating wildfire risk and fire spread rate. To further improve retrieval accuracy, two ensemble models combining deep learning models are proposed: a stacking ensemble based on LSTM, TCN, and LSTM-TCN models, and an AdaBoost ensemble based on the LSTM-TCN model. Measured LFMC data; MODIS, Landsat-8, and Sentinel-1 remote sensing data; and auxiliary data such as canopy height and land cover for the forest-fire-prone areas of the western United States were selected for the study, and the retrieval results of different models with different groups of remote sensing data were compared. The results show that multi-source data integrate the advantages of different types of remote sensing data, yielding higher LFMC retrieval accuracy than single-source remote sensing data. The ensemble models better capture the nonlinear relationship between LFMC and remote sensing data, and the stacking ensemble using all of the MODIS, Landsat-8, and Sentinel-1 data achieved the best LFMC retrieval results, with R² = 0.85, RMSE = 18.88, and ubRMSE = 17.99. The proposed stacking ensemble model is therefore more suitable for LFMC retrieval than the existing method.
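    The stacking idea above can be sketched in a few lines: base models produce predictions, and a meta-learner is fit on those predictions to combine them. This is a generic illustration with a least-squares meta-learner and synthetic data standing in for the LSTM, TCN, and LSTM-TCN outputs; it is not the paper's implementation:

```python
import numpy as np

rng = np.random.default_rng(2)
# Synthetic "true" LFMC values (%) and three noisy base-model
# predictions with different error levels (invented for the sketch).
y = rng.uniform(50.0, 150.0, size=200)
base_preds = np.stack(
    [y + rng.normal(0.0, s, 200) for s in (10.0, 15.0, 12.0)],
    axis=1)                                   # shape (200, 3)

# Stacking: fit a linear meta-learner (with bias) on base predictions.
X = np.column_stack([base_preds, np.ones(len(y))])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
stacked = X @ coef

def rmse(pred):
    return np.sqrt(np.mean((pred - y) ** 2))

# In-sample, the least-squares combination is at least as accurate
# as any single base model, since each base model is one candidate
# combination the meta-learner can choose.
```

    In practice the meta-learner is trained on out-of-fold base predictions to avoid leakage; the paper's stacking ensemble follows the same combine-the-learners principle with deep models as the base learners.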