<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Pymatting on ICE-ICE-BEAR-BLOG</title><link>https://ice-ice-bear.github.io/tags/pymatting/</link><description>Recent content in Pymatting on ICE-ICE-BEAR-BLOG</description><generator>Hugo -- gohugo.io</generator><language>en</language><lastBuildDate>Mon, 13 Apr 2026 00:00:00 +0900</lastBuildDate><atom:link href="https://ice-ice-bear.github.io/tags/pymatting/index.xml" rel="self" type="application/rss+xml"/><item><title>The Background Removal Library Landscape — BiRefNet, ViTMatte, MatAnyone, and Friends</title><link>https://ice-ice-bear.github.io/posts/2026-04-13-matting-libraries/</link><pubDate>Mon, 13 Apr 2026 00:00:00 +0900</pubDate><guid>https://ice-ice-bear.github.io/posts/2026-04-13-matting-libraries/</guid><description>&lt;h2 id="overview"&gt;Overview
&lt;/h2&gt;&lt;p&gt;Building &lt;a class="link" href="https://github.com/ice-ice-bear/popcon-matting-bench" target="_blank" rel="noopener"
 &gt;popcon-matting-bench&lt;/a&gt; forced a survey of every credible open-source matting library. The space breaks into three eras: trimap-driven methods (pymatting&amp;rsquo;s classical solvers, the deep FBA model), trimap-free deep models (BiRefNet, ViTMatte), and the new generation of temporally stable video matting (MatAnyone). This post maps the landscape and notes which model wins for which job.&lt;/p&gt;
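&lt;p&gt;That three-way split can be written down as a small dispatch rule. The pipeline names below are placeholder labels for this post, not identifiers from any of the libraries discussed:&lt;/p&gt;

```python
def pick_pipeline(has_trimap, is_video, toon_style=False):
    """Route a background-removal job to a model family.

    The returned strings are hypothetical labels for this post,
    not real APIs from the libraries covered here.
    """
    if has_trimap:
        return "classical"  # pymatting solvers or FBA refinement
    if is_video:
        return "matanyone"  # temporally stable video matting
    if toon_style:
        return "birefnet-toonout"  # toon-finetuned BiRefNet fork
    return "birefnet"  # general high-resolution segmentation
```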
&lt;h2 id="todays-exploration-map"&gt;Today&amp;rsquo;s Exploration Map
&lt;/h2&gt;&lt;pre class="mermaid" style="visibility:hidden"&gt;graph TD
 A[Background Removal Need] --&gt; B{Trimap available?}
 B --&gt;|Yes| C[Classical: pymatting / FBA]
 B --&gt;|No| D{Image or Video?}
 D --&gt;|Image| E[BiRefNet / ViTMatte]
 D --&gt;|Video| F[MatAnyone]
 E --&gt; G[Toon-style?]
 G --&gt;|Yes| H[MatteoKartoon BiRefNet fork]
 G --&gt;|No| I[ZhengPeng7 BiRefNet]&lt;/pre&gt;&lt;h2 id="birefnet--high-resolution-dichotomous-segmentation"&gt;BiRefNet — High-Resolution Dichotomous Segmentation
&lt;/h2&gt;&lt;p&gt;&lt;a class="link" href="https://github.com/ZhengPeng7/BiRefNet" target="_blank" rel="noopener"
 &gt;ZhengPeng7/BiRefNet&lt;/a&gt; (CAAI AIR 2024) is the model nearly every recent background-removal demo, including &lt;a class="link" href="https://www.birefnet.top/" target="_blank" rel="noopener"
 &gt;birefnet.top&lt;/a&gt;, is built on. It targets &lt;em&gt;dichotomous image segmentation&lt;/em&gt; — high-resolution binary foreground/background masks — via a bilateral reference design: each decoder stage receives &lt;em&gt;inward references&lt;/em&gt; (full-resolution patches of the source image) and is supervised with &lt;em&gt;outward references&lt;/em&gt; (gradient maps), which is what preserves fine edge detail at high resolution.&lt;/p&gt;
&lt;p&gt;Two things make BiRefNet stand out:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Resolution.&lt;/strong&gt; Most segmentation models top out at 1024×1024; BiRefNet has weights for 2048×2048 and the architecture handles arbitrary aspect ratios well. For e-commerce or asset extraction, this is decisive.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Generalization.&lt;/strong&gt; The default &lt;code&gt;general&lt;/code&gt; checkpoint handles humans, products, animals, and abstract shapes. Specialized variants (&lt;code&gt;portrait&lt;/code&gt;, &lt;code&gt;matting&lt;/code&gt;, &lt;code&gt;dis5k_general&lt;/code&gt;) are available on Hugging Face if you need accuracy on a specific domain.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;&lt;a class="link" href="https://github.com/MatteoKartoon/BiRefNet" target="_blank" rel="noopener"
 &gt;MatteoKartoon/BiRefNet&lt;/a&gt; is a fork called &lt;strong&gt;ToonOut&lt;/strong&gt; that fine-tunes BiRefNet on toon/sticker datasets — relevant for any product generating animated emoji or cartoon assets. The fork mostly changes the training data and the evaluation harness; the core model is unchanged.&lt;/p&gt;
&lt;h2 id="vitmatte--vit-backbone-trimap-input"&gt;ViTMatte — ViT Backbone, Trimap Input
&lt;/h2&gt;&lt;p&gt;&lt;a class="link" href="https://github.com/hustvl/ViTMatte" target="_blank" rel="noopener"
 &gt;hustvl/ViTMatte&lt;/a&gt; (Information Fusion vol.103, March 2024) takes a different bet: a Vision Transformer backbone with explicit trimap input. The trimap (foreground / background / unknown regions) is a hard requirement, which makes ViTMatte less plug-and-play than BiRefNet but &lt;strong&gt;significantly more accurate on hair, fur, and translucent edges&lt;/strong&gt; when you can supply one. The pipeline pattern is: BiRefNet produces an initial mask → erode/dilate to a trimap → ViTMatte refines the alpha at sub-pixel quality.&lt;/p&gt;
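&lt;p&gt;The erode/dilate step in that pipeline is simple enough to sketch. A minimal version, assuming a binary mask (e.g., thresholded BiRefNet output) and scipy for the morphology; the band width is a free parameter you would tune per domain:&lt;/p&gt;

```python
import numpy as np
from scipy.ndimage import binary_dilation, binary_erosion

def make_trimap(mask, band=10):
    """Convert a binary foreground mask into a trimap.

    0 = definite background, 128 = unknown band, 255 = definite
    foreground. The unknown band is where ViTMatte (or a classical
    solver) estimates fractional alpha.
    """
    sure_fg = binary_erosion(mask, iterations=band)
    maybe_fg = binary_dilation(mask, iterations=band)
    trimap = np.zeros(mask.shape, dtype=np.uint8)
    trimap[maybe_fg] = 128   # anything near the boundary is unknown
    trimap[sure_fg] = 255    # deep interior stays foreground
    return trimap
```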
&lt;h2 id="matanyone--stable-video-matting-cvpr-2025"&gt;MatAnyone — Stable Video Matting (CVPR 2025)
&lt;/h2&gt;&lt;p&gt;&lt;a class="link" href="https://github.com/pq-yang/MatAnyone" target="_blank" rel="noopener"
 &gt;pq-yang/MatAnyone&lt;/a&gt; targets the hardest matting problem: &lt;strong&gt;temporal stability&lt;/strong&gt;. Frame-by-frame matting on video produces flicker — the alpha mask jitters by a pixel or two between frames, which the human eye picks up immediately. MatAnyone introduces memory-augmented region propagation: the model carries a memory bank of past frames&amp;rsquo; high-confidence regions and uses them to constrain the current frame&amp;rsquo;s mask. The result is video matting that doesn&amp;rsquo;t shimmer.&lt;/p&gt;
&lt;p&gt;This matters for popcon&amp;rsquo;s animated-emoji pipeline: extracting a clean alpha across 30 frames requires either MatAnyone or a hand-rolled temporal smoother on top of BiRefNet.&lt;/p&gt;
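&lt;p&gt;For context, the &amp;ldquo;hand-rolled temporal smoother&amp;rdquo; option is essentially an exponential moving average over per-frame alphas, a crude content-blind stand-in for MatAnyone&amp;rsquo;s learned memory. A sketch with hypothetical names:&lt;/p&gt;

```python
import numpy as np

def smooth_alphas(alphas, momentum=0.8):
    """Damp frame-to-frame flicker in a sequence of alpha mattes.

    alphas: list of float arrays in [0, 1], one per frame.
    Returns a same-length list of EMA-smoothed mattes. A real
    temporal model (MatAnyone) conditions on image content and
    can track fast motion; an EMA blurs it.
    """
    state = np.asarray(alphas[0], dtype=np.float64)
    smoothed = []
    for alpha in alphas:
        state = momentum * state + (1.0 - momentum) * np.asarray(alpha)
        smoothed.append(state.copy())
    return smoothed
```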
&lt;h2 id="pymatting-and-fba--the-classical-baselines"&gt;pymatting and FBA — The Classical Baselines
&lt;/h2&gt;&lt;p&gt;&lt;a class="link" href="https://github.com/pymatting/pymatting" target="_blank" rel="noopener"
 &gt;pymatting/pymatting&lt;/a&gt; (1.9k stars, MIT) implements every classical alpha matting method worth knowing — Closed-Form, KNN, Large Kernel, Random Walk, Shared Sampling — plus Fast Multi-Level Foreground Estimation. It requires a trimap but runs entirely on CPU (with optional CuPy/PyOpenCL acceleration for foreground estimation). It also provides the optional alpha-matting refinement step in &lt;a class="link" href="https://github.com/danielgatis/rembg" target="_blank" rel="noopener"
 &gt;Rembg&lt;/a&gt;, the most widely deployed open-source background removal tool.&lt;/p&gt;
&lt;p&gt;&lt;a class="link" href="https://github.com/MarcoForte/FBA_Matting" target="_blank" rel="noopener"
 &gt;MarcoForte/FBA_Matting&lt;/a&gt; is the official &amp;ldquo;F, B, Alpha&amp;rdquo; matting paper repo — predicts foreground color, background color, and alpha jointly, which gives much cleaner composites when the foreground and background colors differ subtly.&lt;/p&gt;
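&lt;p&gt;Joint F/B/alpha prediction pays off at composite time, where the standard over operator blends the &lt;em&gt;estimated&lt;/em&gt; foreground color rather than the raw pixels. A numpy sketch of that final blend, assuming all inputs are floats in [0, 1]:&lt;/p&gt;

```python
import numpy as np

def composite_over(foreground, background, alpha):
    """Alpha-blend an estimated foreground onto a new background.

    foreground, background: HxWx3 float arrays in [0, 1].
    alpha: HxW float array in [0, 1]. Using an estimated
    foreground (as FBA predicts) avoids the old background color
    bleeding through at soft edges.
    """
    a = alpha[..., None]  # broadcast alpha across the color channels
    return a * foreground + (1.0 - a) * background
```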
&lt;p&gt;The classical methods aren&amp;rsquo;t obsolete. For high-throughput batch processing where a trimap is available (e.g., chroma-key footage, scanned documents), they&amp;rsquo;re often &lt;strong&gt;10–100× faster&lt;/strong&gt; than deep models with comparable quality.&lt;/p&gt;
&lt;h2 id="architecture-pattern-for-popcon-matting-bench"&gt;Architecture Pattern for popcon-matting-bench
&lt;/h2&gt;&lt;pre class="mermaid" style="visibility:hidden"&gt;graph LR
 A[Input Image] --&gt; B[BiRefNet &amp;lt;br/&amp;gt; coarse mask]
 B --&gt; C[Trimap generation &amp;lt;br/&amp;gt; erode/dilate]
 C --&gt; D[ViTMatte &amp;lt;br/&amp;gt; or pymatting]
 D --&gt; E[FBA Foreground &amp;lt;br/&amp;gt; estimation]
 E --&gt; F[Composite output]&lt;/pre&gt;&lt;p&gt;The benchmark repo&amp;rsquo;s job is to score each model on standard datasets (DIS-5K, AIM-500, RealWorldPortrait-636) and produce a comparison harness. Key metrics: SAD, MSE, Grad (gradient error), and Conn (connectivity error) for alpha quality; mIoU for binary segmentation; latency per 1024×1024 image on a single A100.&lt;/p&gt;
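&lt;p&gt;SAD and MSE are the two alpha metrics you can implement in a few lines; Grad and Conn need gradient and connectivity computations and are best taken from an existing evaluation toolkit. A sketch, assuming predicted and ground-truth alphas as floats in [0, 1] (SAD is conventionally reported divided by 1000):&lt;/p&gt;

```python
import numpy as np

def sad(pred, gt):
    """Sum of Absolute Differences between alpha mattes, / 1000."""
    return np.abs(pred - gt).sum() / 1000.0

def mse(pred, gt):
    """Mean squared error over the alpha matte."""
    return np.mean((pred - gt) ** 2)
```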
&lt;h2 id="insights"&gt;Insights
&lt;/h2&gt;&lt;p&gt;The matting space has bifurcated cleanly: &lt;strong&gt;BiRefNet owns high-resolution segmentation, ViTMatte owns trimap-refined alpha, MatAnyone owns video, and pymatting/FBA own the classical CPU path.&lt;/strong&gt; There&amp;rsquo;s no single model that wins everywhere — production pipelines almost always cascade two or three. The interesting business question is no longer &lt;em&gt;which model&lt;/em&gt; but &lt;em&gt;what trimap workflow you want&lt;/em&gt;: zero-shot (BiRefNet alone) trades quality for ergonomics, while two-stage (BiRefNet → ViTMatte) trades latency for hair-grade accuracy. ToonOut shows the path forward for verticalized matting — the base model is good enough that fine-tuning on niche datasets is a low-risk play.&lt;/p&gt;
&lt;h2 id="quick-links"&gt;Quick Links
&lt;/h2&gt;&lt;ul&gt;
&lt;li&gt;&lt;a class="link" href="https://github.com/ZhengPeng7/BiRefNet" target="_blank" rel="noopener"
 &gt;ZhengPeng7/BiRefNet&lt;/a&gt; — base model, CAAI AIR'24&lt;/li&gt;
&lt;li&gt;&lt;a class="link" href="https://github.com/MatteoKartoon/BiRefNet" target="_blank" rel="noopener"
 &gt;MatteoKartoon/BiRefNet (ToonOut)&lt;/a&gt; — toon-finetuned fork&lt;/li&gt;
&lt;li&gt;&lt;a class="link" href="https://github.com/hustvl/ViTMatte" target="_blank" rel="noopener"
 &gt;hustvl/ViTMatte&lt;/a&gt; — trimap-based ViT matting&lt;/li&gt;
&lt;li&gt;&lt;a class="link" href="https://github.com/pq-yang/MatAnyone" target="_blank" rel="noopener"
 &gt;pq-yang/MatAnyone&lt;/a&gt; — stable video matting (CVPR'25)&lt;/li&gt;
&lt;li&gt;&lt;a class="link" href="https://github.com/pymatting/pymatting" target="_blank" rel="noopener"
 &gt;pymatting/pymatting&lt;/a&gt; — classical algorithms&lt;/li&gt;
&lt;li&gt;&lt;a class="link" href="https://github.com/MarcoForte/FBA_Matting" target="_blank" rel="noopener"
 &gt;MarcoForte/FBA_Matting&lt;/a&gt; — F, B, Alpha joint estimation&lt;/li&gt;
&lt;li&gt;&lt;a class="link" href="https://www.birefnet.top/" target="_blank" rel="noopener"
 &gt;birefnet.top demo&lt;/a&gt; — online inference&lt;/li&gt;
&lt;li&gt;&lt;a class="link" href="https://github.com/ice-ice-bear/popcon-matting-bench" target="_blank" rel="noopener"
 &gt;ice-ice-bear/popcon-matting-bench&lt;/a&gt; — the benchmark&lt;/li&gt;
&lt;/ul&gt;</description></item></channel></rss>