<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Wan2.6 on ICE-ICE-BEAR-BLOG</title><link>https://ice-ice-bear.github.io/tags/wan2.6/</link><description>Recent content in Wan2.6 on ICE-ICE-BEAR-BLOG</description><generator>Hugo -- gohugo.io</generator><language>en</language><lastBuildDate>Fri, 10 Apr 2026 00:00:00 +0900</lastBuildDate><atom:link href="https://ice-ice-bear.github.io/tags/wan2.6/index.xml" rel="self" type="application/rss+xml"/><item><title>PopCon Dev Log #5 — SAM2 Interactive Refinement and Video Model Upgrade</title><link>https://ice-ice-bear.github.io/posts/2026-04-10-popcon-dev5/</link><pubDate>Fri, 10 Apr 2026 00:00:00 +0900</pubDate><guid>https://ice-ice-bear.github.io/posts/2026-04-10-popcon-dev5/</guid><description>&lt;img src="https://ice-ice-bear.github.io/" alt="Featured image of post PopCon Dev Log #5 — SAM2 Interactive Refinement and Video Model Upgrade" /&gt;&lt;blockquote&gt;
 &lt;p&gt;&lt;a class="link" href="https://ice-ice-bear.github.io/en/posts/2026-04-08-popcon-dev4/" &gt;Previous post: PopCon Dev Log #4&lt;/a&gt;&lt;/p&gt;

 &lt;/blockquote&gt;
&lt;h2 id="overview"&gt;Overview
&lt;/h2&gt;&lt;p&gt;Dev Log #4 covered SAM 2.1 interactive segmentation and cost optimization. This post picks up from there: building out the &lt;strong&gt;refine page from scratch&lt;/strong&gt; and completing a &lt;strong&gt;hybrid pipeline&lt;/strong&gt; that runs rembg in batch first, then lets SAM2 handle the touch-ups. On the video side, I upgraded to wan2.6-i2v-flash at 720P and rewrote the motion prompts around physical body mechanics.&lt;/p&gt;
&lt;h2 id="1-hybrid-background-removal-pipeline"&gt;1. Hybrid Background Removal Pipeline
&lt;/h2&gt;&lt;h3 id="the-problem-rembg-alone-isnt-enough"&gt;The problem: rembg alone isn&amp;rsquo;t enough
&lt;/h3&gt;&lt;p&gt;rembg is fast and great for batch processing, but with complex-boundary images like emoji frames, it often leaves background residue or clips off parts of the character. SAM2 is precise, but clicking through every single frame one by one takes forever.&lt;/p&gt;
&lt;h3 id="the-solution-rembg-batch--sam2-touch-up"&gt;The solution: rembg batch → SAM2 touch-up
&lt;/h3&gt;&lt;p&gt;The answer was to combine both tools&amp;rsquo; strengths into a hybrid approach.&lt;/p&gt;
&lt;pre class="mermaid" style="visibility:hidden"&gt;flowchart LR
 A["Upload emoji frames"] --&gt; B["rembg batch&amp;lt;br/&amp;gt;(runs automatically)"]
 B --&gt; C{"Review result"}
 C --&gt;|"Looks clean"| D["Done"]
 C --&gt;|"Residue / clipping"| E["SAM2 refine&amp;lt;br/&amp;gt;erase / restore clicks"]
 E --&gt; F["Apply &amp;amp;&amp;lt;br/&amp;gt;update frame"]
 F --&gt; C&lt;/pre&gt;&lt;p&gt;The key idea is that when the user navigates to &lt;code&gt;/refine&lt;/code&gt;, &lt;strong&gt;rembg runs automatically&lt;/strong&gt; in the background while a loading screen is displayed. Once it finishes, the user lands directly in the refine canvas and only needs to touch up the frames that need it.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-typescript" data-lang="typescript"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="c1"&gt;// refine/page.tsx — trigger auto-rembg on page load
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="nx"&gt;useEffect&lt;/span&gt;&lt;span class="p"&gt;(()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;frames&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;length&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nx"&gt;rembgComplete&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="nx"&gt;runRembgOnAllFrames&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;frames&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nx"&gt;then&lt;/span&gt;&lt;span class="p"&gt;(()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="nx"&gt;setRembgComplete&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="p"&gt;});&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="p"&gt;}&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="p"&gt;},&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;frames&lt;/span&gt;&lt;span class="p"&gt;]);&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h3 id="sam2-erase--restore-refinement"&gt;SAM2 erase / restore refinement
&lt;/h3&gt;&lt;p&gt;SAM2 on top of the rembg result operates in two modes:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Erase&lt;/strong&gt;: click a leftover background region — it gets masked out&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Restore&lt;/strong&gt;: click a part of the character that rembg accidentally removed — it gets restored from the original&lt;/li&gt;
&lt;/ul&gt;
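&lt;p&gt;As a minimal sketch, the two modes reduce to a single alpha-channel update. Here &lt;code&gt;apply_refinement&lt;/code&gt; is a hypothetical helper, not the actual PopCon code, assuming SAM2 returns a boolean mask for the clicked region:&lt;/p&gt;

```python
import numpy as np

def apply_refinement(alpha: np.ndarray, mask: np.ndarray, mode: str) -> np.ndarray:
    """Apply a SAM2 mask to the current alpha channel.

    alpha: (H, W) uint8 alpha from the rembg result
    mask:  (H, W) bool mask returned by SAM2 for the clicked region
    mode:  "erase" makes the masked region transparent;
           "restore" makes it fully opaque again.
    """
    out = alpha.copy()
    if mode == "erase":
        out[mask] = 0      # leftover background removed
    elif mode == "restore":
        out[mask] = 255    # clipped character area brought back
    else:
        raise ValueError(f"unknown mode: {mode}")
    return out
```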
&lt;p&gt;&lt;code&gt;RembgRefineCanvas.tsx&lt;/code&gt; collects canvas click coordinates and sends a list of points to the backend SAM2 endpoint. One tricky part: &lt;strong&gt;multi-point input must be wrapped as a single object&lt;/strong&gt;. If the SAM2 API interprets each point as a separate object, you get one mask per point instead of a single unified mask.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-python" data-lang="python"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="c1"&gt;# backend/main.py — wrap multi-point as single object&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="n"&gt;input_points&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[[&lt;/span&gt;&lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;#34;x&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;#34;y&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;]]&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;points&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="n"&gt;input_labels&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;#34;label&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;points&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="c1"&gt;# 1=foreground, 0=background&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="c1"&gt;# Pass as single object to get one unified mask&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="n"&gt;masks&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;scores&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;predictor&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;predict&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="n"&gt;point_coords&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;array&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="n"&gt;input_points&lt;/span&gt;&lt;span class="p"&gt;]),&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="n"&gt;point_labels&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;array&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="n"&gt;input_labels&lt;/span&gt;&lt;span class="p"&gt;]),&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="n"&gt;multimask_output&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="kc"&gt;False&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h2 id="2-dedicated-refinement-canvas-for-character-images"&gt;2. Dedicated Refinement Canvas for Character Images
&lt;/h2&gt;&lt;p&gt;Beyond emoji frames, the &lt;strong&gt;character source image&lt;/strong&gt; also needs SAM2 refinement. If the character upload has a messy background, it propagates problems through the entire downstream pipeline.&lt;/p&gt;
&lt;p&gt;I built &lt;code&gt;CharacterRefineCanvas.tsx&lt;/code&gt; as a standalone component called from &lt;code&gt;CharacterUpload.tsx&lt;/code&gt;. The erase/restore logic is identical to the emoji side, but the UI focuses on a single image without frame navigation.&lt;/p&gt;
&lt;h2 id="3-refine-ux-polish"&gt;3. Refine UX Polish
&lt;/h2&gt;&lt;p&gt;The pipeline was working, but using it in practice exposed a pile of UX issues. A significant portion of the 24 commits in this sprint went into fixing them.&lt;/p&gt;
&lt;h3 id="side-by-side-original-reference"&gt;Side-by-side original reference
&lt;/h3&gt;&lt;p&gt;When refining, you constantly need to check &amp;ldquo;is this part background or character?&amp;rdquo; — which means you need the original visible. I added a side-by-side layout with the original next to the refine canvas, and &lt;strong&gt;synchronized crosshairs&lt;/strong&gt; so the mouse position shows up on both views simultaneously.&lt;/p&gt;
&lt;h3 id="frame-navigation"&gt;Frame navigation
&lt;/h3&gt;&lt;p&gt;Emoji sets typically have dozens of frames. I added arrow-key navigation between frames and a clickable thumbnail strip at the bottom. Per-frame SAM2 segmentation state also resets automatically on frame switch.&lt;/p&gt;
&lt;h3 id="toolbar-consolidation"&gt;Toolbar consolidation
&lt;/h3&gt;&lt;p&gt;The initial version had buttons scattered everywhere. I moved undo / reset / apply into a compact toolbar above the canvas and collapsed everything into a single row. A tabbed UI lets you toggle between viewing the rembg result and entering SAM2 refine mode.&lt;/p&gt;
&lt;h3 id="small-bug-fixes"&gt;Small bug fixes
&lt;/h3&gt;&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Click dots lingering after Apply&lt;/strong&gt; — fixed by clearing the dots array in the apply event handler&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Unnecessary CORS mode on same-origin images&lt;/strong&gt; — setting &lt;code&gt;crossOrigin&lt;/code&gt; on a canvas &lt;code&gt;Image&lt;/code&gt; object switches the request to CORS mode even for same-origin URLs; removed the unneeded attribute&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="4-video-generation-upgrade"&gt;4. Video Generation Upgrade
&lt;/h2&gt;&lt;h3 id="wan26-i2v-flash-at-720p"&gt;wan2.6-i2v-flash at 720P
&lt;/h3&gt;&lt;p&gt;I upgraded the video generation model to &lt;strong&gt;wan2.6-i2v-flash&lt;/strong&gt; and bumped the resolution to 720P. The model parameter update also required a field name change in the API call.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-python" data-lang="python"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="c1"&gt;# Renamed prompt_extend → extend_prompt, added negative_prompt&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;video&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;generate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;&amp;#34;wan2.6-i2v-flash&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="n"&gt;image&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;image_url&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;motion_prompt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="n"&gt;extend_prompt&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="kc"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="c1"&gt;# was: prompt_extend&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="n"&gt;negative_prompt&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;&amp;#34;blurry, low quality, distorted&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="n"&gt;resolution&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;&amp;#34;720P&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h3 id="motion-prompt-rewrite"&gt;Motion prompt rewrite
&lt;/h3&gt;&lt;p&gt;The old motion presets were simple instructions like &amp;ldquo;wave hand&amp;rdquo; or &amp;ldquo;nod head&amp;rdquo;. I rewrote them as &lt;strong&gt;detailed physical body mechanics descriptions&lt;/strong&gt;. For example:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Before: &lt;code&gt;&amp;quot;wave hand&amp;quot;&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;After: &lt;code&gt;&amp;quot;character raises right arm from resting position, forearm rotates at elbow joint, hand pivots at wrist with fingers spread, smooth pendulum motion&amp;quot;&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
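&lt;p&gt;A minimal sketch of how such presets can be organized (the &lt;code&gt;MOTION_PRESETS&lt;/code&gt; table and its &amp;ldquo;nod&amp;rdquo; entry are illustrative assumptions, not the actual PopCon prompts):&lt;/p&gt;

```python
# Hypothetical preset table: the real prompts live in the PopCon backend,
# and these entries only illustrate the rewrite from simple instructions
# to physical body-mechanics descriptions.
MOTION_PRESETS = {
    "wave": (
        "character raises right arm from resting position, forearm rotates "
        "at elbow joint, hand pivots at wrist with fingers spread, "
        "smooth pendulum motion"
    ),
    "nod": (
        "head tilts forward from the neck, chin dips toward the chest, "
        "then returns upright in a gentle repeated arc"
    ),
}

def motion_prompt(preset: str) -> str:
    # Unknown keys pass through unchanged so custom prompts still work.
    return MOTION_PRESETS.get(preset, preset)
```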
&lt;h3 id="background-prompt-correction"&gt;Background prompt correction
&lt;/h3&gt;&lt;p&gt;Effects (particles, light, etc.) were sticking to the character during video generation. I added an explicit instruction to keep effects separate from the character and forced a &lt;strong&gt;solid white background&lt;/strong&gt; in the prompt.&lt;/p&gt;
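&lt;p&gt;A sketch of how that constraint can be appended to every motion prompt; the exact wording here is an assumption:&lt;/p&gt;

```python
# Illustrative wording: the exact constraint text used in PopCon may differ.
BACKGROUND_SUFFIX = (
    " Particle and light effects stay separate from the character and never "
    "attach to its body. Solid white background throughout."
)

def build_video_prompt(motion: str) -> str:
    """Append the effects/background constraint to a motion prompt."""
    return motion.strip() + BACKGROUND_SUFFIX
```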
&lt;h2 id="5-matting-model-benchmark"&gt;5. Matting Model Benchmark
&lt;/h2&gt;&lt;h3 id="why-a-separate-benchmark-was-needed"&gt;Why a separate benchmark was needed
&lt;/h3&gt;&lt;p&gt;I kept saying rembg works &amp;ldquo;most of the time,&amp;rdquo; but had no quantitative evidence. To systematically compare which matting model works best for the specific domain of LINE animated emoji frames, I created a dedicated repository: &lt;strong&gt;popcon-matting-bench&lt;/strong&gt;.&lt;/p&gt;
&lt;h3 id="test-conditions"&gt;Test conditions
&lt;/h3&gt;&lt;p&gt;Seven models/configurations were compared:&lt;/p&gt;
&lt;table&gt;
 &lt;thead&gt;
 &lt;tr&gt;
 &lt;th&gt;Model&lt;/th&gt;
 &lt;th&gt;Description&lt;/th&gt;
 &lt;/tr&gt;
 &lt;/thead&gt;
 &lt;tbody&gt;
 &lt;tr&gt;
 &lt;td&gt;rembg&lt;/td&gt;
 &lt;td&gt;Default U2-Net based (baseline)&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;rembg_enhanced&lt;/td&gt;
 &lt;td&gt;rembg with enhanced post-processing&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;MODNet ONNX&lt;/td&gt;
 &lt;td&gt;Lightweight 25MB portrait matting&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;ViTMatte_5&lt;/td&gt;
 &lt;td&gt;trimap width 5px&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;ViTMatte_10&lt;/td&gt;
 &lt;td&gt;trimap width 10px&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;ViTMatte_20&lt;/td&gt;
 &lt;td&gt;trimap width 20px&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;RVM&lt;/td&gt;
 &lt;td&gt;Robust Video Matting (designed for real human video)&lt;/td&gt;
 &lt;/tr&gt;
 &lt;/tbody&gt;
&lt;/table&gt;
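&lt;p&gt;A bench harness over these configurations might be organized like this; the structure and names are assumptions for illustration, not the actual popcon-matting-bench code:&lt;/p&gt;

```python
# Sketch of a bench loop over matting configurations. Each model is a
# callable mapping an RGB frame to an alpha matte, and each metric is a
# callable scoring that matte against the frame.
def run_bench(models, frames, metrics):
    """Return {model_name: {metric_name: mean score over frames}}."""
    report = {}
    for model_name, matte in models.items():
        scores = {name: [] for name in metrics}
        for rgb in frames:
            alpha = matte(rgb)  # model inference on one frame
            for metric_name, metric in metrics.items():
                scores[metric_name].append(metric(alpha, rgb))
        report[model_name] = {
            name: sum(vals) / len(vals) for name, vals in scores.items()
        }
    return report
```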
&lt;h3 id="metrics"&gt;Metrics
&lt;/h3&gt;&lt;p&gt;Besides a per-frame clean rate, two quantitative metrics were used:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Halo Score&lt;/strong&gt;: White fringe intensity at the alpha edge when composited on a black background. Lower is better.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Coverage Ratio&lt;/strong&gt;: Foreground area relative to the rembg baseline. 1.0 means identical to baseline.&lt;/li&gt;
&lt;/ul&gt;
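&lt;p&gt;The Coverage Ratio side can be sketched analogously to the Halo Score code below; the foreground threshold of 128 is an assumption, not necessarily what popcon-matting-bench uses:&lt;/p&gt;

```python
import numpy as np

def compute_coverage_ratio(alpha: np.ndarray, baseline_alpha: np.ndarray) -> float:
    """Foreground area relative to the rembg baseline (1.0 = identical)."""
    # Count pixels that are clearly foreground in each matte.
    baseline_area = int((baseline_alpha > 128).sum())
    if baseline_area == 0:
        return 0.0
    return float((alpha > 128).sum()) / baseline_area
```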
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-python" data-lang="python"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="c1"&gt;# Halo Score calculation — measures white fringe at alpha boundary&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;compute_halo_score&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;alpha&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ndarray&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;rgb&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ndarray&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;float&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="s2"&gt;&amp;#34;&amp;#34;&amp;#34;Measure brightness leakage at alpha edge on black composite.&amp;#34;&amp;#34;&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="c1"&gt;# Extract alpha boundary (pixels where 0 &amp;lt; alpha &amp;lt; 255)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="n"&gt;edge_mask&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;alpha&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;alpha&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="mi"&gt;245&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;edge_mask&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="mf"&gt;0.0&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="c1"&gt;# Composite on black background&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="n"&gt;composite&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;rgb&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;alpha&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="o"&gt;...&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kc"&gt;None&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="mf"&gt;255.0&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;astype&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;uint8&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="c1"&gt;# Average brightness at edge region&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="n"&gt;edge_brightness&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;composite&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;edge_mask&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;mean&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="mf"&gt;255.0&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nb"&gt;float&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;edge_brightness&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h3 id="results-cartoon-bear-character-24-frames"&gt;Results: cartoon bear character (24 frames)
&lt;/h3&gt;&lt;table&gt;
 &lt;thead&gt;
 &lt;tr&gt;
 &lt;th&gt;Model&lt;/th&gt;
 &lt;th&gt;Clean Rate&lt;/th&gt;
 &lt;th&gt;Halo Score&lt;/th&gt;
 &lt;th&gt;Coverage Ratio&lt;/th&gt;
 &lt;th&gt;Notes&lt;/th&gt;
 &lt;/tr&gt;
 &lt;/thead&gt;
 &lt;tbody&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;strong&gt;rembg&lt;/strong&gt;&lt;/td&gt;
 &lt;td&gt;&lt;strong&gt;100%&lt;/strong&gt;&lt;/td&gt;
 &lt;td&gt;&lt;strong&gt;0.000&lt;/strong&gt;&lt;/td&gt;
 &lt;td&gt;&lt;strong&gt;1.000&lt;/strong&gt;&lt;/td&gt;
 &lt;td&gt;Best for high-contrast cartoon&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;rembg_enhanced&lt;/td&gt;
 &lt;td&gt;100%&lt;/td&gt;
 &lt;td&gt;0.000&lt;/td&gt;
 &lt;td&gt;1.000&lt;/td&gt;
 &lt;td&gt;Identical to rembg&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;ViTMatte_20&lt;/td&gt;
 &lt;td&gt;100%&lt;/td&gt;
 &lt;td&gt;0.031&lt;/td&gt;
 &lt;td&gt;1.016&lt;/td&gt;
 &lt;td&gt;Best detail preservation (motion lines, effects)&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;ViTMatte_10&lt;/td&gt;
 &lt;td&gt;100%&lt;/td&gt;
 &lt;td&gt;0.024&lt;/td&gt;
 &lt;td&gt;1.008&lt;/td&gt;
 &lt;td&gt;Stable&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;ViTMatte_5&lt;/td&gt;
 &lt;td&gt;100%&lt;/td&gt;
 &lt;td&gt;0.018&lt;/td&gt;
 &lt;td&gt;1.002&lt;/td&gt;
 &lt;td&gt;Conservative&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;MODNet&lt;/td&gt;
 &lt;td&gt;96%&lt;/td&gt;
 &lt;td&gt;0.045&lt;/td&gt;
 &lt;td&gt;0.860&lt;/td&gt;
 &lt;td&gt;Loses 14% of foreground (portrait-trained)&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;RVM&lt;/td&gt;
 &lt;td&gt;42%&lt;/td&gt;
 &lt;td&gt;0.089&lt;/td&gt;
 &lt;td&gt;0.630&lt;/td&gt;
 &lt;td&gt;Destroys cartoon content (real video-trained)&lt;/td&gt;
 &lt;/tr&gt;
 &lt;/tbody&gt;
&lt;/table&gt;
&lt;h3 id="conclusions"&gt;Conclusions
&lt;/h3&gt;&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;rembg&lt;/strong&gt;: For thick-outline, high-contrast cartoon characters — zero halo, 100% coverage. No additional model needed.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;ViTMatte_20&lt;/strong&gt;: For frames with thin lines, pastel tones, or motion blur, it preserves 1.6% more detail than rembg. Suitable for complex emoji.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;MODNet / RVM&lt;/strong&gt;: Optimized for portraits and real-world video respectively — unsuitable for cartoon emoji. MODNet loses 14% of foreground; RVM loses 37%.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This benchmark drove the hybrid pipeline design decision — simple characters are handled fine by rembg auto-processing, and only complex frames need SAM2 touch-up.&lt;/p&gt;
&lt;h2 id="6-other-improvements"&gt;6. Other Improvements
&lt;/h2&gt;&lt;h3 id="custom-prompt-editor"&gt;Custom prompt editor
&lt;/h3&gt;&lt;p&gt;Users can now edit motion prompts directly in a text editor. Editor state persists across page navigation.&lt;/p&gt;
&lt;h3 id="download-buttons"&gt;Download buttons
&lt;/h3&gt;&lt;p&gt;Added per-frame and per-video download buttons so refined frames and generated videos can be saved individually.&lt;/p&gt;
&lt;h2 id="summary"&gt;Summary
&lt;/h2&gt;&lt;p&gt;The theme of this sprint was &lt;strong&gt;balancing automation with manual touch-up&lt;/strong&gt;.&lt;/p&gt;
&lt;table&gt;
 &lt;thead&gt;
 &lt;tr&gt;
 &lt;th&gt;Area&lt;/th&gt;
 &lt;th&gt;What changed&lt;/th&gt;
 &lt;/tr&gt;
 &lt;/thead&gt;
 &lt;tbody&gt;
 &lt;tr&gt;
 &lt;td&gt;Background removal&lt;/td&gt;
 &lt;td&gt;rembg auto-batch → SAM2 manual touch-up hybrid&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;Matting benchmark&lt;/td&gt;
 &lt;td&gt;6 models compared — rembg best for high-contrast cartoon, ViTMatte_20 best for detail preservation&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;Refine UX&lt;/td&gt;
 &lt;td&gt;Side-by-side reference, keyboard navigation, tabbed UI&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;Character refinement&lt;/td&gt;
 &lt;td&gt;Dedicated SAM2 canvas separated out&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;Video generation&lt;/td&gt;
 &lt;td&gt;wan2.6-i2v-flash 720P, body mechanics prompts&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;Convenience&lt;/td&gt;
 &lt;td&gt;Custom prompts, download buttons, state persistence&lt;/td&gt;
 &lt;/tr&gt;
 &lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;With rembg handling ~90% of the work and SAM2 catching the remaining 10%, the time spent removing backgrounds across dozens of emoji frames dropped to less than half of what it was before. The next post will cover deploying these finished assets to actual sticker and emoji platforms.&lt;/p&gt;</description></item></channel></rss>