Abstract:
To address the complex habitat, low visibility, and heavy image noise encountered when detecting abalone in marine pastures, this study proposes YOLOv11-AMSTAR, an underwater abalone recognition method based on an improved You Only Look Once version 11 (YOLOv11) model. The model is optimized in three ways. First, a new enhanced feature extraction module (C3Star) is built from Cross Stage Partial with kernel size 2 (C3K2) and StarNet. Its star operation enhances high-dimensional feature representation and mines hidden higher-order correlations while preserving the original feature information, improving the model's nonlinear representation and feature discrimination. Second, the Adaptive Downsampling (ADown) module is introduced, which rearranges the dimensions of the input feature maps and adjusts their granularity so that the deeper layers of the network capture spatial features more effectively. Third, a Self-Attention and Convolution mix (ACmix) module is added to the neck network to fuse semantic information from different levels, strengthening feature extraction and integration and reducing interference from cluttered backgrounds. Experimental results show that, compared with the original model, YOLOv11-AMSTAR improves mAP@0.5, recall, precision, and mAP@0.5:0.95 by 5.21%, 2.06%, 2.66%, and 1.79%, respectively. These results indicate that YOLOv11-AMSTAR significantly enhances feature extraction for abalone in harsh underwater conditions such as low contrast and blur, and significantly improves detection precision.
This study provides an efficient and reliable technical solution for the automated, accurate harvesting of underwater organisms. Its composite improvement strategy for low-quality images and camouflaged targets also offers a useful academic reference and practical value for other target detection problems in similarly complex scenes.
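
The star operation mentioned above can be illustrated with a minimal sketch. It assumes the usual StarNet formulation, in which two linear branches of the same input are multiplied element-wise; the helper names (`linear`, `star_operation`) and the toy weights are hypothetical and are not taken from the YOLOv11-AMSTAR implementation.

```python
def linear(x, weights, bias):
    """Apply y = W x + b to a single feature vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def star_operation(x, w1, b1, w2, b2):
    """Element-wise product of two linear branches: (W1 x + b1) * (W2 x + b2)."""
    branch_a = linear(x, w1, b1)
    branch_b = linear(x, w2, b2)
    return [a * b for a, b in zip(branch_a, branch_b)]


# Toy values chosen purely for illustration (assumed, not from the paper).
x = [1.0, 2.0]
w1 = [[1.0, 0.0], [0.0, 1.0]]   # identity branch
b1 = [0.0, 0.0]
w2 = [[1.0, 1.0], [1.0, -1.0]]  # mixing branch
b2 = [0.0, 1.0]

result = star_operation(x, w1, b1, w2, b2)
print(result)
```

With these weights the first output expands to x1 * (x1 + x2) = x1^2 + x1*x2, so a single multiplication already produces second-order terms in the input features. This is the sense in which the star operation can expose higher-order correlations without adding extra layers.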