2026-05-28

BiRefNet vs U2Net vs MODNet: Comparing AI Matting Models

An in-depth technical comparison of the three leading AI matting models powering modern background removal tools, with benchmark data and practical recommendations.

If you have ever used an AI-powered background remover, you have likely benefited from one of three leading deep learning architectures: BiRefNet, U2Net, or MODNet. Each takes a fundamentally different approach to the problem of image matting, and each has distinct strengths and weaknesses. In this article, we compare these models across accuracy, speed, memory usage, and real-world performance.

What Is Image Matting?

Image matting is the task of accurately estimating the foreground opacity for every pixel in an image. Unlike binary segmentation, which produces a hard 0-or-1 mask, matting produces a continuous alpha matte where values between 0 and 1 represent partial transparency. This is essential for realistic background replacement, blur effects, and compositing.
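To make the difference concrete, here is a minimal sketch of the standard compositing equation, composite = alpha * foreground + (1 - alpha) * background, applied once with a continuous matte and once with a thresholded binary mask. The toy pixel values and the function name `composite` are illustrative assumptions, not part of any of the models discussed here.

```python
import numpy as np

def composite(foreground, background, alpha):
    """Blend foreground over background using a per-pixel alpha matte.

    foreground, background: float arrays of shape (H, W, 3) with values in [0, 1]
    alpha: float array of shape (H, W) with values in [0, 1]
    """
    a = alpha[..., None]  # add a channel axis so alpha broadcasts over RGB
    return a * foreground + (1.0 - a) * background

# Toy 1x3 image: opaque pixel, half-transparent edge pixel, background pixel
fg = np.ones((1, 3, 3))   # white foreground
bg = np.zeros((1, 3, 3))  # black background
alpha = np.array([[1.0, 0.5, 0.0]])

soft = composite(fg, bg, alpha)                          # continuous alpha matte
hard = composite(fg, bg, (alpha >= 0.5).astype(float))   # hard 0-or-1 mask

print(soft[0, :, 0])  # edge pixel is a 50/50 blend
print(hard[0, :, 0])  # edge pixel snaps fully to foreground
```

With the continuous matte the edge pixel blends the two layers, which is what keeps hair and soft edges looking natural; with the binary mask the same pixel becomes fully foreground and produces a hard outline.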


Model Architecture Overview

MODNet (2020)

MODNet (Matting Objective Decomposition Network) was designed for real-time portrait matting from a single RGB image, without auxiliary inputs such as a trimap. Its key innovation is decomposing the matting task into three sub-objectives, each handled by its own branch:

  1. Semantic branch: Predicts the coarse foreground region
  2. Detail branch: Refines edges and fine structures
  3. Fusion branch: Combines both outputs into the final matte

This decomposition allows MODNet to run at 30+ FPS on consumer GPUs.
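The data flow between the three branches can be pictured with a toy NumPy sketch. This is only an illustration under strong assumptions: the arrays stand in for branch outputs, the transition band is given rather than predicted, and the hard switch in `fuse` replaces what is a learned fusion branch in the real network.

```python
import numpy as np

def upsample_nearest(x, factor):
    """Nearest-neighbour upsampling of a 2-D array by an integer factor."""
    return np.repeat(np.repeat(x, factor, axis=0), factor, axis=1)

def fuse(semantic_lowres, detail, transition, factor):
    """Toy stand-in for the fusion step (hypothetical, not MODNet's learned fusion).

    Keeps the upsampled coarse prediction in confident regions and uses the
    detail prediction inside the transition band around the boundary.
    """
    coarse = upsample_nearest(semantic_lowres, factor)
    return np.where(transition, detail, coarse)

# Hypothetical branch outputs for a 2x4 image
semantic_lowres = np.array([[1.0, 0.0]])                  # coarse 1x2 foreground map
detail = np.tile([0.9, 0.7, 0.3, 0.1], (2, 1))            # full-resolution edge estimate
transition = np.tile([False, True, True, False], (2, 1))  # band around the boundary

alpha = fuse(semantic_lowres, detail, transition, factor=2)
print(alpha)
```

The coarse map decides which side of the boundary a pixel is on, while the detail values fill in the soft transition between the two.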

U2Net (2020)

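U2Net (U-squared Net) uses a nested U-Net architecture where each stage of the encoder-decoder is itself a small U-Net-like structure. These stages are called RSU (Residual U-block) blocks: each one runs its input features through an inner U-shaped encoder-decoder and adds the result back to the input through a residual connection. This nested design allows the network to capture both fine-grained details and broad contextual information simultaneously.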

BiRefNet (2023)

BiRefNet (Bilateral Reference Network) is the newest architecture. It introduces bilateral reference learning, where the decoder is guided by two complementary references: an inward reference built from patches of the original high-resolution image, and an outward reference based on gradient maps that directs attention to detail-rich regions. This pairing helps the network recover thin structures and sharp object boundaries at high resolution.

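| Feature | MODNet | U2Net | BiRefNet |
|---|---|---|---|
| Year released | 2020 | 2020 | 2023 |
| Parameters | 6.5M | 44.0M | 25.3M |
| Inference speed | 33 FPS | 8 FPS | 22 FPS |
| GPU memory | 1.2 GB | 4.8 GB | 2.9 GB |
| Trimap-free | Yes | Yes | Yes |
| Pretrained weights | Portrait only | General | General + Portrait |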

Benchmark Performance

| Metric | MODNet | U2Net | BiRefNet |
|---|---|---|---|
| SAD | 42.1 | 38.8 | 35.2 |
| MSE (×100) | 1.30 | 0.92 | 0.71 |
| Grad | 18.3 | 15.2 | 12.7 |
| Conn | 24.8 | 21.4 | 18.9 |
| Hair IoU | 0.78 | 0.84 | 0.89 |
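For readers unfamiliar with the first two metrics, the sketch below shows how SAD (sum of absolute differences) and MSE (mean squared error) are commonly computed between a predicted matte and a ground-truth matte; lower is better for both. Dividing SAD by 1000 is a common reporting convention, and evaluating over the whole image rather than only an unknown region is an assumption made here; the exact protocol behind the numbers in the table is not specified.

```python
import numpy as np

def sad(pred, gt):
    """Sum of absolute differences, scaled to thousands (a common convention)."""
    return float(np.abs(pred - gt).sum() / 1000.0)

def mse(pred, gt):
    """Mean squared error over all pixels."""
    return float(np.mean((pred - gt) ** 2))

# Toy 100x100 ground truth: left half foreground, right half background
gt = np.zeros((100, 100))
gt[:, :50] = 1.0

# Toy prediction: a 10-pixel-wide band next to the edge is off by 0.5
pred = gt.copy()
pred[:, 50:60] = 0.5

print(sad(pred, gt))  # 1000 pixels * 0.5 error / 1000
print(mse(pred, gt))  # 1000 pixels * 0.25 squared error / 10000 pixels
```

Grad and Conn follow the same pattern but compare gradient magnitudes and connectivity of the mattes, which makes them more sensitive to edge quality and stray fragments.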

When to Use Each Model

Choose MODNet when you need real-time processing (30+ FPS), are working with portrait photos, or have limited GPU memory.

Choose U2Net when you need high accuracy on complex, non-portrait subjects such as animals or products.

Choose BiRefNet when you want the best overall accuracy, are processing high-resolution images, or are working with transparent or semi-transparent objects.
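If you prefer to encode this kind of decision in code, the following sketch filters the models by hard constraints and then picks the one with the lowest SAD. The figures are copied from the tables in this article; the `ModelSpec` class, the `pick_model` function, and the "lowest SAD wins" rule are hypothetical simplifications for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ModelSpec:
    name: str
    fps: float
    gpu_memory_gb: float
    portrait_only: bool
    sad: float  # lower is better

# Values taken from the comparison tables in this article
MODELS = [
    ModelSpec("MODNet", 33, 1.2, True, 42.1),
    ModelSpec("U2Net", 8, 4.8, False, 38.8),
    ModelSpec("BiRefNet", 22, 2.9, False, 35.2),
]

def pick_model(min_fps=0.0, max_memory_gb=float("inf"), portrait=True):
    """Return the lowest-SAD model that meets the constraints, or None."""
    candidates = [
        m for m in MODELS
        if m.fps >= min_fps
        and m.gpu_memory_gb <= max_memory_gb
        and (portrait or not m.portrait_only)
    ]
    return min(candidates, key=lambda m: m.sad, default=None)

print(pick_model(min_fps=30).name)                       # real-time portrait use
print(pick_model(portrait=False).name)                   # general subjects
print(pick_model(max_memory_gb=2.0, portrait=False))     # no model fits
```

In practice you would likely weigh more factors than a single metric, but the constraint-then-rank shape is a reasonable starting point.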

Integration with QuickBG

Our background remover uses all three models in a cascade. The system first tries MODNet for speed. If the confidence score is below a threshold, it falls back to BiRefNet. U2Net is used as the final refinement stage for complex edges.
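The control flow of such a cascade can be sketched as follows. This is not QuickBG's actual code: the function names, the 0.8 threshold, and the assumption that the refinement stage always runs are placeholders, and the stub models exist only so the example runs without any weights.

```python
import numpy as np

def cascade_matte(image, fast_model, accurate_model, refiner, threshold=0.8):
    """Run the fast model first and fall back to the accurate one on low confidence.

    fast_model(image) -> (matte, confidence); accurate_model(image) -> matte;
    refiner(image, matte) -> matte. All three callables and the threshold
    value are hypothetical placeholders.
    Returns the refined matte and a flag telling whether the fallback ran.
    """
    matte, confidence = fast_model(image)
    used_fallback = confidence < threshold
    if used_fallback:
        matte = accurate_model(image)
    return refiner(image, matte), used_fallback

# Stub models standing in for MODNet, BiRefNet, and the U2Net refinement stage
def stub_fast(image):
    return np.full(image.shape[:2], 0.5), 0.4  # deliberately low confidence

def stub_accurate(image):
    return np.ones(image.shape[:2])

def stub_refiner(image, matte):
    return np.clip(matte, 0.0, 1.0)

image = np.zeros((4, 4, 3))
matte, used_fallback = cascade_matte(image, stub_fast, stub_accurate, stub_refiner)
print(used_fallback, matte.shape)
```

Passing the models in as callables keeps the routing logic independent of any particular framework, so each stage can be swapped or tested in isolation.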

Other tools like crop, resize, adjust, sharpness, and converter also leverage these models.

Visit the FAQ for more technical details or the about page to learn about our approach.