CVPR 2025 | MobileMamba: a new breakthrough in lightweight Mamba networks, balancing multiple receptive fields, efficient inference, and high accuracy

1. Overview at a glance

MobileMamba is a lightweight, multi-receptive-field visual Mamba network. Through a three-stage network design and the MRFFI (Multi-Receptive Field Feature Interaction) module, it raises inference speed while reaching higher accuracy than existing lightweight CNN, ViT, and Mamba architectures.

2. Core Issues

Current lightweight vision models are mainly based on CNNs and Transformers:

• A CNN's local receptive field limits its ability to model global context.

• A Transformer has a global receptive field, but its computational complexity is O(N²) in the number of tokens N, which becomes expensive at high resolution.

• Existing lightweight Mamba models have low FLOPs but slow inference.
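To make the O(N²) point concrete, the sketch below counts tokens at a few input resolutions and compares the number of pairwise attention interactions with the number of steps in a linear scan. The patch size of 16 and the helper names are assumptions chosen for illustration; they are not taken from the paper.

```python
def token_count(height: int, width: int, patch: int = 16) -> int:
    """Number of tokens when an image is cut into patch x patch tiles.

    The patch size of 16 is an illustrative assumption.
    """
    return (height // patch) * (width // patch)


def attention_pairs(n_tokens: int) -> int:
    """Self-attention compares every token with every token: N * N pairs."""
    return n_tokens * n_tokens


def scan_steps(n_tokens: int) -> int:
    """A linear state-space scan visits each token once: N steps."""
    return n_tokens


for side in (224, 512, 1024):
    n = token_count(side, side)
    print(f"{side}x{side}: N={n}, "
          f"attention pairs={attention_pairs(n)}, scan steps={scan_steps(n)}")
```

Going from 224×224 to 1024×1024 multiplies N by about 21, but multiplies the attention pair count by about 437. This gap is why linear-complexity sequence models are attractive for high-resolution inputs.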

MobileMamba aims to:

• Optimize Mamba's inference speed, improving throughput while keeping FLOPs low.

• Strengthen interaction across multi-scale receptive fields, capturing long- and short-range features as well as high-frequency detail.

• Adapt to high-resolution inputs and improve performance on classification, object detection, and semantic segmentation.

3. Technical highlights

(1) Three-stage network design

• After weighing four-stage against three-stage networks, the authors choose a three-stage architecture, which gives higher accuracy at the same throughput, or higher throughput at the same accuracy.

(2) MRFFI (Multi-Receptive Field Feature Interaction) module

• WTE-Mamba (Long-range Wavelet Transform-Enhanced Mamba): combines global modeling with extraction of high-frequency edge information.

• MK-DeConv (Multi-Kernel Depthwise Convolution): extracts information at different scales and strengthens the local receptive field.

• Eliminate Redundant Identity: reduces channel redundancy and improves computational efficiency.
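One plausible reading of the three components above, which this summary does not spell out, is that the module splits the feature channels into three groups: one for the global branch, one for the local branch, and a remainder that passes through untouched so no computation is spent on it. The sketch below illustrates only that routing. The ratios, the function names, and the stand-in branch functions are all hypothetical.

```python
from typing import Callable, List, Tuple


def split_channels(channels: int, global_ratio: float,
                   local_ratio: float) -> Tuple[int, int, int]:
    """Split a channel count into (global, local, pass-through) group sizes.

    The ratios are hypothetical hyperparameters, not values from the paper.
    """
    if global_ratio < 0 or local_ratio < 0 or global_ratio + local_ratio > 1:
        raise ValueError("ratios must be non-negative and sum to at most 1")
    n_global = int(channels * global_ratio)
    n_local = int(channels * local_ratio)
    return n_global, n_local, channels - n_global - n_local


def route_channels(x: List, global_fn: Callable, local_fn: Callable,
                   global_ratio: float, local_ratio: float) -> List:
    """Apply each branch to its own channel group and concatenate the results.

    `x` is a list of per-channel feature maps; the last group is returned
    unchanged, so it costs nothing to compute.
    """
    g, l, _ = split_channels(len(x), global_ratio, local_ratio)
    return ([global_fn(c) for c in x[:g]]
            + [local_fn(c) for c in x[g:g + l]]
            + list(x[g + l:]))


# Stand-in branches: real ones would be a Mamba block and depthwise convolutions.
out = route_channels([1, 2, 3, 4], lambda c: c * 10, lambda c: -c, 0.5, 0.25)
print(out)
```

With four channels and ratios 0.5 and 0.25, two channels go through the global stand-in, one through the local stand-in, and one is passed through.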

(3) Training & Testing Strategy Optimization

• Knowledge distillation lets the lightweight model learn more than it would from labels alone.

• Extended training epochs raise the accuracy ceiling further.

• Normalization layer fusion accelerates inference at test time.
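As an illustration of the first item, here is a minimal sketch of a classic soft-label distillation loss: temperature-scaled KL divergence between teacher and student, blended with the usual cross-entropy. This is a generic formulation; the summary does not say which distillation loss, teacher, temperature, or weighting MobileMamba uses, so `temperature` and `alpha` below are assumptions.

```python
import math
from typing import List


def softmax(logits: List[float], temperature: float = 1.0) -> List[float]:
    """Numerically stable softmax with an optional temperature."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits: List[float], teacher_logits: List[float],
                      label: int, temperature: float = 2.0,
                      alpha: float = 0.5) -> float:
    """Blend hard-label cross-entropy with soft-label KL(teacher || student).

    `temperature` and `alpha` are illustrative defaults, not the paper's values.
    """
    # Hard-label term: cross-entropy against the ground-truth class.
    ce = -math.log(softmax(student_logits)[label])
    # Soft-label term: KL divergence between softened distributions.
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = sum(t * math.log(t / s) for t, s in zip(p_teacher, p_student) if t > 0)
    # The T^2 factor keeps the soft-term gradients on a comparable scale.
    return (1 - alpha) * ce + alpha * temperature ** 2 * kl
```

When the student reproduces the teacher's logits exactly, the KL term vanishes and only the cross-entropy term remains.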

4. Methodological framework

MobileMamba optimizes inference and feature extraction through the following core steps:

(1) Multi-receptive field feature interaction (MRFFI)

• WTE-Mamba extracts long-range information, and a wavelet transform enhances the high-frequency features.

• MK-DeConv applies convolution kernels of different sizes to local information, improving multi-scale perception.

• Eliminating redundant identity mappings reduces computational cost and improves inference speed.
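To show why a wavelet transform helps with high-frequency detail, the sketch below applies one level of a 2D Haar transform to a tiny image. Flat regions produce zeros in the difference bands, while an edge produces a non-zero response. The choice of the Haar wavelet is an assumption made for simplicity; the summary does not name the wavelet used in WTE-Mamba.

```python
from typing import List, Tuple

Band = List[List[float]]


def haar_dwt2(image: List[List[float]]) -> Tuple[Band, Band, Band, Band]:
    """One level of an orthonormal 2D Haar transform on non-overlapping 2x2 blocks.

    Returns (approximation, horizontal difference, vertical difference,
    diagonal difference), each half the size of the input.
    """
    height, width = len(image), len(image[0])
    if height % 2 or width % 2:
        raise ValueError("image sides must be even")
    approx, horiz, vert, diag = [], [], [], []
    for i in range(0, height, 2):
        rows = ([], [], [], [])
        for j in range(0, width, 2):
            a, b = image[i][j], image[i][j + 1]
            c, d = image[i + 1][j], image[i + 1][j + 1]
            rows[0].append((a + b + c + d) / 2)  # low-frequency average
            rows[1].append((a - b + c - d) / 2)  # left minus right: vertical edges
            rows[2].append((a + b - c - d) / 2)  # top minus bottom: horizontal edges
            rows[3].append((a - b - c + d) / 2)  # diagonal detail
        for band, row in zip((approx, horiz, vert, diag), rows):
            band.append(row)
    return approx, horiz, vert, diag


# A vertical edge between columns 0 and 1, i.e. inside the first 2x2 block.
edge_image = [[0, 1, 1, 1] for _ in range(4)]
approx, horiz, vert, diag = haar_dwt2(edge_image)
print(horiz)  # non-zero only where the edge is
```

Because this single-level transform works on non-overlapping blocks, the example places the edge inside a block; an edge that falls exactly on a block boundary would show up in the approximation band instead.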

(2) Lightweight Mamba structure

• A three-stage design reduces computation and improves throughput.

• Multi-directional scanning is combined with low-rank state-space mapping to improve computational efficiency.
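The idea of multi-directional scanning can be sketched as follows: a 2D feature map is flattened into several 1D sequences, each read in a different order, and a linear recurrence is run over each one. This is a deliberately simplified illustration. The four scan orders and the fixed scalar coefficients `a`, `b`, `c` are assumptions; a real Mamba layer uses input-dependent, multi-dimensional state parameters, and this summary does not describe MobileMamba's exact scan pattern.

```python
from typing import Dict, List


def scan_orders(grid: List[List[float]]) -> Dict[str, List[float]]:
    """Flatten a 2D grid into four 1D sequences (illustrative scan directions)."""
    rows = [v for row in grid for v in row]
    cols = [grid[i][j] for j in range(len(grid[0])) for i in range(len(grid))]
    return {
        "row_forward": rows,
        "row_backward": rows[::-1],
        "col_forward": cols,
        "col_backward": cols[::-1],
    }


def linear_scan(seq: List[float], a: float = 0.5, b: float = 1.0,
                c: float = 1.0) -> List[float]:
    """Scalar state-space recurrence: h_t = a*h_{t-1} + b*x_t, y_t = c*h_t.

    Fixed scalars are a simplification of a selective state-space model.
    """
    h, outputs = 0.0, []
    for x in seq:
        h = a * h + b * x
        outputs.append(c * h)
    return outputs


grid = [[1, 2], [3, 4]]
for name, seq in scan_orders(grid).items():
    print(name, seq, linear_scan(seq))
```

Each output position summarizes everything scanned before it in that direction, so combining several directions gives every position context from all sides of the image at a cost linear in the number of tokens.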

(3) Optimizing training and inference

• Knowledge distillation: the small model learns from a stronger teacher model to improve its performance.

• Extended training: experiments show that training had not fully converged at 300 epochs, and extending it to 1000 epochs improves accuracy.

• Normalization layer fusion: removes redundant computation and improves efficiency during inference.
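Normalization layer fusion is commonly done by folding a batch-normalization layer into the linear or convolutional layer that precedes it, which is exact at inference time because the normalization statistics are then fixed. The sketch below shows this standard folding for a small dense layer (a 1×1 convolution at a single pixel behaves the same way). It is assumed, not confirmed by this summary, that MobileMamba's fusion follows this pattern; the function names are hypothetical.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def linear(x: List[float], weight: Matrix, bias: List[float]) -> List[float]:
    """Dense layer: one output per weight row."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weight, bias)]


def batchnorm_eval(y: List[float], gamma: List[float], beta: List[float],
                   mean: List[float], var: List[float],
                   eps: float = 1e-5) -> List[float]:
    """Batch normalization at inference time, using stored running statistics."""
    return [g * (v - m) / math.sqrt(s + eps) + b
            for v, g, b, m, s in zip(y, gamma, beta, mean, var)]


def fuse_linear_bn(weight: Matrix, bias: List[float], gamma: List[float],
                   beta: List[float], mean: List[float], var: List[float],
                   eps: float = 1e-5) -> Tuple[Matrix, List[float]]:
    """Fold BN into the preceding layer: W' = W * s, b' = (b - mean) * s + beta."""
    fused_w, fused_b = [], []
    for row, b, g, bt, m, s in zip(weight, bias, gamma, beta, mean, var):
        scale = g / math.sqrt(s + eps)
        fused_w.append([w * scale for w in row])
        fused_b.append((b - m) * scale + bt)
    return fused_w, fused_b


W, b = [[1.0, -2.0], [0.5, 3.0]], [0.1, -0.2]
gamma, beta, mean, var = [1.5, 0.8], [0.0, 0.3], [0.2, -0.1], [4.0, 0.25]
x = [2.0, -1.0]
two_step = batchnorm_eval(linear(x, W, b), gamma, beta, mean, var)
fused_W, fused_b = fuse_linear_bn(W, b, gamma, beta, mean, var)
one_step = linear(x, fused_W, fused_b)
```

`one_step` matches `two_step` up to floating-point rounding, but needs a single layer evaluation instead of two.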

5. Quick Overview of Experimental Results

MobileMamba performs strongly across multiple benchmarks:

✅ ImageNet-1K classification

• MobileMamba-B4 reaches 83.6% Top-1 accuracy, +1.8% over EfficientVMamba, with 3.5× faster inference.

✅ Object Detection (COCO)

• Mask R-CNN: compared with EMO, mAP improves by +1.3 and throughput by +57%.

• RetinaNet: compared with EfficientVMamba, mAP improves by +2.1 and inference is 4.3× faster.

✅ Semantic Segmentation (ADE20K)

• Semantic FPN: mIoU improves by +1.1 over EdgeViT, with only 20% of the FLOPs.

• PSPNet: mIoU improves by +0.4 over MobileViTv2, with only 11% of the FLOPs.

6. Practical value and application

• Edge-device vision: suited to resource-constrained settings such as smartphones, embedded devices, and the Internet of Things (IoT).

• Autonomous driving and surveillance: provides efficient visual computing at high resolution, suited to object detection and segmentation tasks.

• Medical image analysis: uses multiple receptive fields to extract key features from medical images and improve diagnostic efficiency.

7. Open Questions

Is MobileMamba's multi-receptive-field feature interaction strategy applicable to other tasks, such as video understanding or 3D vision?

How can MobileMamba be optimized further for faster CPU and mobile inference?

Can LoRA or other parameter-efficient fine-tuning methods be combined with MobileMamba to improve its adaptability to specific tasks?
