Detailed Information
Classification of Complicated Urban Forest Acoustic Scenes with Deep Learning Models (indexed in SCI-EXPANDED and EI) · Cited by: 8
Document type: Journal article
English title: Classification of Complicated Urban Forest Acoustic Scenes with Deep Learning Models
Authors: Zhang, Chengyun [1]; Zhan, Haisong [1]; Hao, Zezhou [2]; Gao, Xinghui [1]
First author: Zhang, Chengyun
Corresponding authors: Gao, XH [1]; Hao, ZZ [2]
Affiliations: [1] Guangzhou Univ, Sch Elect & Commun Engn, Guangzhou 510006, Peoples R China; [2] Chinese Acad Forestry, Res Inst Trop Forestry, Guangzhou 510520, Peoples R China
Year: 2023
Volume: 14
Issue: 2
Journal: FORESTS
Indexed in: EI (accession no. 20230913655682); Scopus (accession no. 2-s2.0-85149037165); SCI-EXPANDED (accession no. WOS:000938612100001)
Funding: This research was funded by the National Natural Science Foundation of China (32171520), the Research Project of the Education Bureau of Guangzhou (No. 202032882), and the National Natural Science Foundation of China (32201338).
Language: English
Keywords: acoustic monitoring; acoustic scenes; deep learning; urban forest; urban sound
Abstract: Passive acoustic monitoring (PAM) can compensate for the spatial and temporal limitations of traditional survey methods, enabling all-weather, wide-scale assessment and prediction of environmental dynamics. Assessing the impact of human activities on biodiversity by analyzing the characteristics of acoustic scenes in the environment is a research frontier in urban forestry. However, as monitoring data accumulate, the choice of deep learning model and its parameter settings greatly affect both the quality and the efficiency of acoustic scene classification. This study compared and evaluated the performance of different deep learning models for acoustic scene classification using sound recordings from urban forests in Guangzhou. Seven categories of acoustic scenes were classified: human sound, insect sound, bird sound, bird-human sound, insect-human sound, bird-insect sound, and silence. A dataset covering the seven scenes was constructed, with 1000 samples per scene. Several sets of comparison experiments evaluated how much training data and how many training epochs the models require, and the models achieved satisfactory accuracy with 600 training samples per category and 100 training epochs. To evaluate how well different models generalize to new data, a small test dataset was constructed, and multiple trained models were used to make predictions on it. All experimental results showed that the DenseNet_BC_34 model performed best among the compared models, with an overall accuracy of 93.81% across the seven acoustic scenes on the validation dataset. This study provides practical experience for applying deep learning techniques to urban sound monitoring and offers new perspectives and technical support for further exploring the relationship between human activities and biodiversity.
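The record does not include the paper's code or pipeline details. As an illustration of the standard audio front-end that CNN classifiers such as DenseNet variants typically consume in acoustic scene classification, the following NumPy-only sketch converts a waveform into a log-mel spectrogram. All function names and parameter values here (n_fft=1024, hop=512, 64 mel bands) are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def hz_to_mel(f):
    """Convert frequency in Hz to the mel scale."""
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    """Convert mel-scale values back to Hz."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_mels, n_fft, sr):
    """Triangular filters spaced evenly on the mel scale, shape (n_mels, n_fft//2 + 1)."""
    mels = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mels) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(n_mels):
        left, center, right = bins[i], bins[i + 1], bins[i + 2]
        for k in range(left, center):          # rising slope
            fb[i, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):         # falling slope
            fb[i, k] = (right - k) / max(right - center, 1)
    return fb

def log_mel_spectrogram(y, sr=22050, n_fft=1024, hop=512, n_mels=64):
    """Frame the signal, window, FFT, take power, project onto mel bands, log-compress."""
    n_frames = 1 + (len(y) - n_fft) // hop
    window = np.hanning(n_fft)
    frames = np.stack([y[i * hop: i * hop + n_fft] * window for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, n=n_fft)) ** 2
    mel_energy = power @ mel_filterbank(n_mels, n_fft, sr).T
    return np.log(mel_energy + 1e-10)   # small floor avoids log(0)
```

In practice the resulting (frames × mel-bands) matrix is treated as a single-channel image and fed to the CNN; the model's final layer would have seven outputs, one per acoustic scene category described in the abstract.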