<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Toonout on ICE-ICE-BEAR-BLOG</title><link>https://ice-ice-bear.github.io/tags/toonout/</link><description>Recent content in Toonout on ICE-ICE-BEAR-BLOG</description><generator>Hugo -- gohugo.io</generator><language>en</language><lastBuildDate>Fri, 17 Apr 2026 00:00:00 +0900</lastBuildDate><atom:link href="https://ice-ice-bear.github.io/tags/toonout/index.xml" rel="self" type="application/rss+xml"/><item><title>ToonOut — A BiRefNet Fork That Finally Gets Anime Hair Right</title><link>https://ice-ice-bear.github.io/posts/2026-04-17-toonout/</link><pubDate>Fri, 17 Apr 2026 00:00:00 +0900</pubDate><guid>https://ice-ice-bear.github.io/posts/2026-04-17-toonout/</guid><description>&lt;h2 id="overview"&gt;Overview
&lt;/h2&gt;&lt;p&gt;&lt;a class="link" href="https://github.com/MatteoKartoon/BiRefNet" target="_blank" rel="noopener"
 &gt;MatteoKartoon/BiRefNet&lt;/a&gt; — branded &lt;strong&gt;ToonOut&lt;/strong&gt; — is a fork of the popular &lt;a class="link" href="https://github.com/ZhengPeng7/BiRefNet" target="_blank" rel="noopener"
 &gt;BiRefNet&lt;/a&gt; high-resolution segmentation model, fine-tuned specifically for anime-style characters. It ships with the weights, the 1,228-image training dataset, a paper on &lt;a class="link" href="https://arxiv.org/abs/2509.06839" target="_blank" rel="noopener"
 &gt;arXiv:2509.06839&lt;/a&gt;, and a small but organized codebase. At the time of writing the repo has 92 stars; the code and weights are MIT-licensed, and the dataset is CC-BY 4.0. The numbers they publish are striking: pixel accuracy jumps from &lt;strong&gt;95.3% to 99.5%&lt;/strong&gt; on their test set after domain fine-tuning.&lt;/p&gt;
&lt;pre class="mermaid" style="visibility:hidden"&gt;graph TD
 A["BiRefNet (base model)"] --&gt; B["Fine-tune on 1,228 anime images"]
 B --&gt; C["ToonOut weights (joelseytre/toonout)"]
 D["Toonout dataset (CC-BY 4.0)"] --&gt; B
 C --&gt; E["Improved hair/transparency handling"]
 E --&gt; F["Pixel accuracy 99.5 percent"]&lt;/pre&gt;&lt;h2 id="why-a-fork-instead-of-a-plug-in"&gt;Why a Fork Instead of a Plug-in
&lt;/h2&gt;&lt;p&gt;General-purpose background removers — U²-Net, rembg, even vanilla BiRefNet — are trained on photographic imagery. Anime characters break three assumptions those models quietly make:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Hair has hard edges.&lt;/strong&gt; In photographs, hair resolves into wispy, low-contrast strands; anime hair is a solid silhouette with occasional internal holes. Photo-trained models tend to either bleed the background into hair gaps or erase sharp spikes.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Transparency is stylistic, not optical.&lt;/strong&gt; Semi-transparent magic effects, glass ornaments, and veils are drawn as 50% alpha without the soft light falloff you&amp;rsquo;d see in a photo. Models trained on photographic transparency hallucinate gradients that aren&amp;rsquo;t there.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Line work is part of the subject.&lt;/strong&gt; Thin black outlines framing a character are signal, not noise. Photo-trained segmenters sometimes trim them as &amp;ldquo;edge artifacts.&amp;rdquo;&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;ToonOut addresses all three by fine-tuning with a dataset that explicitly annotates these cases. The paper reports the model &amp;ldquo;shows marked improvements in background removal accuracy for anime-style images&amp;rdquo; — and the 4.2 percentage point jump in pixel accuracy on their held-out test set is the measurable part of that claim.&lt;/p&gt;
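&lt;p&gt;To put that jump in perspective, here is a quick back-of-the-envelope check in plain Python, using only the two accuracy figures from the paper; the framing as an error-reduction ratio is mine, not the authors&amp;rsquo;:&lt;/p&gt;

```python
# Back-of-the-envelope check on the published numbers (95.3% -> 99.5%):
# a 4.2-point accuracy gain is roughly a 9x reduction in mispredicted pixels.
base_acc, tuned_acc = 0.953, 0.995   # pixel accuracy from the paper
base_err = 1.0 - base_acc            # ~4.7% of pixels wrong before fine-tuning
tuned_err = 1.0 - tuned_acc          # ~0.5% of pixels wrong after
reduction = base_err / tuned_err
print(f"error shrinks {reduction:.1f}x")  # error shrinks 9.4x
```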
&lt;h2 id="the-engineering-polish-matters"&gt;The Engineering Polish Matters
&lt;/h2&gt;&lt;p&gt;Reading the repo structure, this is not a drive-by research release. It&amp;rsquo;s clearly been rebuilt for reuse:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;&lt;code&gt;train_finetuning.sh&lt;/code&gt;&lt;/strong&gt; — adjusted settings, explicitly switching the data type to &lt;strong&gt;bfloat16&lt;/strong&gt; to avoid NaN gradient explosions during fine-tuning. Anyone who has tried to fine-tune BiRefNet at fp16 knows exactly what pain this avoids.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;&lt;code&gt;evaluations.py&lt;/code&gt;&lt;/strong&gt; — a clean rewrite of the original &lt;code&gt;eval_existingOnes.py&lt;/code&gt; with corrected settings. The original BiRefNet eval script is notoriously fiddly; having a trustworthy evaluator is half the battle.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Organized folder layout&lt;/strong&gt; — code is split into &lt;code&gt;birefnet/&lt;/code&gt; (library), &lt;code&gt;scripts/&lt;/code&gt; (Python entry points), and &lt;code&gt;bash_scripts/&lt;/code&gt; (shell wrappers for each script). Five scripts cover the full lifecycle: split, train, test, evaluate, visualize. Three utilities handle baseline prediction, alpha mask extraction, and Photoroom API comparison.&lt;/li&gt;
&lt;/ul&gt;
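&lt;p&gt;The bfloat16 switch deserves a second look. A minimal numeric sketch (pure Python, no torch) using the two formats&amp;rsquo; published range limits shows why fp16 fine-tuning blows up where bf16 does not; the gradient value is made up for illustration:&lt;/p&gt;

```python
# Why train_finetuning.sh moves to bfloat16: float16's largest finite
# value is 65504, so a large gradient spike overflows to inf and turns the
# loss into NaN; bfloat16 keeps float32's 8-bit exponent, so the same
# value stays representable (at the cost of mantissa precision).
FP16_MAX = 65504.0      # largest finite float16
BF16_MAX = 3.3895e38    # largest finite bfloat16 (float32-sized exponent)

def overflows(value: float, max_finite: float) -> bool:
    """True if `value` would round to infinity in the given format."""
    return abs(value) > max_finite

grad_spike = 1.2e5  # hypothetical un-clipped gradient during fine-tuning
print(overflows(grad_spike, FP16_MAX))  # True  -> inf -> NaN loss in fp16
print(overflows(grad_spike, BF16_MAX))  # False -> representable in bf16
```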
&lt;p&gt;The hardware disclaimer is refreshingly honest: &amp;ldquo;this repo was used on an environment with 2x GeForce RTX 4090 instances with 24GB VRAM.&amp;rdquo; Translation: if you fine-tune on smaller cards, you will need to tune your batch sizes. The authors didn&amp;rsquo;t bury this in a footnote.&lt;/p&gt;
&lt;h2 id="dataset-transparency"&gt;Dataset Transparency
&lt;/h2&gt;&lt;p&gt;1,228 anime images split into &lt;code&gt;train&lt;/code&gt; / &lt;code&gt;val&lt;/code&gt; / &lt;code&gt;test&lt;/code&gt;, each split further organized by generation folder (suggesting the dataset was built iteratively — emotions, outfits, actions across multiple annotation rounds). Each image exists in three views:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;im/&lt;/code&gt; — raw RGB&lt;/li&gt;
&lt;li&gt;&lt;code&gt;gt/&lt;/code&gt; — ground-truth alpha mask&lt;/li&gt;
&lt;li&gt;&lt;code&gt;an/&lt;/code&gt; — RGBA with transparency composited in&lt;/li&gt;
&lt;/ul&gt;
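&lt;p&gt;A hypothetical loader for that layout might pair the three views by shared file name. The &lt;code&gt;im/&lt;/code&gt;, &lt;code&gt;gt/&lt;/code&gt;, and &lt;code&gt;an/&lt;/code&gt; folder names come from the repo; the helper, the &lt;code&gt;.png&lt;/code&gt; glob, and the sample file name below are illustrative assumptions:&lt;/p&gt;

```python
# Sketch: pair im/ (RGB), gt/ (alpha mask), an/ (composited RGBA) by stem.
from pathlib import Path
import tempfile

def collect_triplets(split_dir: Path):
    """Yield (rgb, alpha_mask, rgba) path triples that share a file name."""
    for rgb in sorted((split_dir / "im").glob("*.png")):
        gt = split_dir / "gt" / rgb.name
        an = split_dir / "an" / rgb.name
        if gt.exists() and an.exists():
            yield rgb, gt, an

# Build a toy "train" split on disk to demonstrate the pairing.
root = Path(tempfile.mkdtemp()) / "train"
for sub in ("im", "gt", "an"):
    (root / sub).mkdir(parents=True)
    (root / sub / "char_001.png").touch()

triplets = list(collect_triplets(root))
print(len(triplets))  # 1
```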
&lt;p&gt;The CC-BY 4.0 license means you can use the dataset commercially as long as you credit the authors. That&amp;rsquo;s rare for anime-related datasets, which often end up in legally ambiguous territory — either non-commercial, or &amp;ldquo;please don&amp;rsquo;t sue us&amp;rdquo; silent about provenance.&lt;/p&gt;
&lt;h2 id="where-this-fits-in-the-pipeline"&gt;Where This Fits in the Pipeline
&lt;/h2&gt;&lt;p&gt;For anyone running a production background removal stack (as I am on &lt;a class="link" href="https://ice-ice-bear.github.io/posts/2026-04-15-popcon-dev7/" &gt;popcon&lt;/a&gt; and &lt;a class="link" href="https://ice-ice-bear.github.io/posts/2026-04-15-hybrid-search-dev14/" &gt;hybrid-image-search-demo&lt;/a&gt;), ToonOut is a drop-in replacement for the BiRefNet model file:&lt;/p&gt;
&lt;pre class="mermaid" style="visibility:hidden"&gt;graph LR
 A[Input anime image] --&gt; B["BiRefNet arch (unchanged)"]
 B --&gt; C["Load: ToonOut weights"]
 C --&gt; D[Alpha mask output]
 D --&gt; E["Composite to RGBA"]&lt;/pre&gt;&lt;p&gt;The inference path doesn&amp;rsquo;t change — same architecture, same input/output spec. You swap the checkpoint and get better hair/transparency on anime subjects. The catch: performance on photographic subjects will likely regress, because the fine-tune is domain-specialized. If your pipeline handles both realistic and stylized inputs, you&amp;rsquo;d need a classifier upstream or two separate model endpoints.&lt;/p&gt;
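&lt;p&gt;That upstream routing can be sketched as a one-line dispatch; the classifier probability, threshold, and helper name below are assumptions rather than anything the repo ships, and the checkpoint identifiers follow the ToonOut Hugging Face weights and the original BiRefNet author&amp;rsquo;s namespace:&lt;/p&gt;

```python
# Sketch of two-endpoint routing: same BiRefNet architecture either way,
# only the checkpoint differs. p_anime would come from an upstream
# domain classifier (hypothetical here).
TOONOUT_CKPT = "joelseytre/toonout"    # anime-specialized weights
BIREFNET_CKPT = "ZhengPeng7/BiRefNet"  # general-purpose weights

def pick_checkpoint(p_anime: float, threshold: float = 0.5) -> str:
    """Route each image to the checkpoint matching its domain."""
    return TOONOUT_CKPT if p_anime >= threshold else BIREFNET_CKPT

print(pick_checkpoint(0.92))  # joelseytre/toonout
print(pick_checkpoint(0.10))  # ZhengPeng7/BiRefNet
```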
&lt;h2 id="quick-links"&gt;Quick Links
&lt;/h2&gt;&lt;ul&gt;
&lt;li&gt;&lt;a class="link" href="https://github.com/MatteoKartoon/BiRefNet" target="_blank" rel="noopener"
 &gt;MatteoKartoon/BiRefNet GitHub&lt;/a&gt; — fork with weights, dataset, paper&lt;/li&gt;
&lt;li&gt;&lt;a class="link" href="https://arxiv.org/abs/2509.06839" target="_blank" rel="noopener"
 &gt;arXiv:2509.06839&lt;/a&gt; — the paper&lt;/li&gt;
&lt;li&gt;&lt;a class="link" href="https://huggingface.co/joelseytre/toonout" target="_blank" rel="noopener"
 &gt;joelseytre/toonout on Hugging Face&lt;/a&gt; — ready-to-use weights&lt;/li&gt;
&lt;li&gt;&lt;a class="link" href="https://github.com/ZhengPeng7/BiRefNet" target="_blank" rel="noopener"
 &gt;Original BiRefNet&lt;/a&gt; — for comparison&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="insights"&gt;Insights
&lt;/h2&gt;&lt;p&gt;ToonOut is a strong case study in domain fine-tuning economics. 1,228 images is a tiny dataset by modern standards — and yet the pixel accuracy gap it closes (4.2 points on what was already a 95%+ baseline) is exactly the kind of last-mile improvement that matters in production. The interesting pattern is that open-source segmentation models are now being specialized the way fashion or medical classifiers have been for years: take a strong general backbone, curate a domain-specific dataset, fine-tune, release both. When the cost of a good general model is low enough, the competitive surface moves to data curation and domain specialization. That&amp;rsquo;s also why releasing the dataset alongside the weights matters more than releasing either alone — the next fork can add 500 more images, retrain, and move the numbers again.&lt;/p&gt;</description></item></channel></rss>