<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Dashscope on ICE-ICE-BEAR-BLOG</title><link>https://ice-ice-bear.github.io/tags/dashscope/</link><description>Recent content in Dashscope on ICE-ICE-BEAR-BLOG</description><generator>Hugo -- gohugo.io</generator><language>en</language><lastBuildDate>Wed, 08 Apr 2026 00:00:00 +0900</lastBuildDate><atom:link href="https://ice-ice-bear.github.io/tags/dashscope/index.xml" rel="self" type="application/rss+xml"/><item><title>PopCon Dev Log #4 — SAM 2.1 Interactive Segmentation and Cost Optimization</title><link>https://ice-ice-bear.github.io/posts/2026-04-08-popcon-dev4/</link><pubDate>Wed, 08 Apr 2026 00:00:00 +0900</pubDate><guid>https://ice-ice-bear.github.io/posts/2026-04-08-popcon-dev4/</guid><description>&lt;p&gt;&lt;a class="link" href="https://ice-ice-bear.github.io/en/posts/2026-04-07-popcon-dev3/" &gt;Previous post: PopCon Dev Log #3&lt;/a&gt;&lt;/p&gt;
&lt;h2 id="overview"&gt;Overview
&lt;/h2&gt;&lt;p&gt;This is the fourth entry in the PopCon dev log series. Two major changes happened this round. First, VEO 3&amp;rsquo;s cost was unsustainable, so I switched the video generation model to Alibaba&amp;rsquo;s DashScope Wan 2.2. Second, rembg&amp;rsquo;s background removal quality wasn&amp;rsquo;t cutting it, so I built an interactive segmentation workflow using Meta&amp;rsquo;s SAM 2.1 — users click on the foreground object and SAM generates a precise mask.&lt;/p&gt;
&lt;h2 id="video-generation-model-swap-veo-3--dashscope-wan-22"&gt;Video Generation Model Swap: VEO 3 → DashScope Wan 2.2
&lt;/h2&gt;&lt;h3 id="the-cost-problem"&gt;The Cost Problem
&lt;/h3&gt;&lt;p&gt;VEO 3 produced good results, but the cost added up fast. PopCon needs to generate multiple action videos per emoji character, so per-generation cost matters a lot.&lt;/p&gt;
&lt;p&gt;I evaluated several alternatives:&lt;/p&gt;
&lt;table&gt;
 &lt;thead&gt;
 &lt;tr&gt;
 &lt;th&gt;Option&lt;/th&gt;
 &lt;th&gt;Pros&lt;/th&gt;
 &lt;th&gt;Cons&lt;/th&gt;
 &lt;/tr&gt;
 &lt;/thead&gt;
 &lt;tbody&gt;
 &lt;tr&gt;
 &lt;td&gt;fal.ai Wan 2.1&lt;/td&gt;
 &lt;td&gt;Simple API&lt;/td&gt;
 &lt;td&gt;Mediocre quality-to-price ratio&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;RunPod GPU&lt;/td&gt;
 &lt;td&gt;Full control&lt;/td&gt;
 &lt;td&gt;Infrastructure overhead&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;strong&gt;Alibaba DashScope Wan 2.2&lt;/strong&gt;&lt;/td&gt;
 &lt;td&gt;&lt;strong&gt;Lowest cost, decent quality&lt;/strong&gt;&lt;/td&gt;
 &lt;td&gt;China-based API&lt;/td&gt;
 &lt;/tr&gt;
 &lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;DashScope Wan 2.2 won on quality-to-price ratio.&lt;/p&gt;
&lt;h3 id="related-improvements"&gt;Related Improvements
&lt;/h3&gt;&lt;p&gt;Alongside the model swap, several other changes went in:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Frontend action selection&lt;/strong&gt;: Users can now pick which actions to generate instead of getting all of them&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Backbone generation removed&lt;/strong&gt;: No longer needed with Wan 2.2&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;End pose generation removed&lt;/strong&gt;: Eliminated an unnecessary processing step&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Inter-action throttles removed&lt;/strong&gt;: No more artificial delays between action generations&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="character-generation-improvements"&gt;Character Generation Improvements
&lt;/h2&gt;&lt;h3 id="full-body-enforcement"&gt;Full-Body Enforcement
&lt;/h3&gt;&lt;p&gt;AI character generation sometimes produced only upper-body results. This caused inconsistent lower bodies across different actions. I updated the prompts to enforce full-body generation every time.&lt;/p&gt;
&lt;h3 id="reference-image-support"&gt;Reference Image Support
&lt;/h3&gt;&lt;p&gt;Users can now upload a reference image when generating characters. This is useful for creating variations of existing characters or matching a particular style.&lt;/p&gt;
&lt;h3 id="other-improvements"&gt;Other Improvements
&lt;/h3&gt;&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Broader image format support&lt;/strong&gt;: WebP, GIF, BMP, and TIFF uploads now accepted&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Background removal for uploads&lt;/strong&gt;: Uploaded character images can optionally have their background removed&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Media preview modal&lt;/strong&gt;: Click an emoji card to see it at full size&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Asset download links&lt;/strong&gt;: Direct download for generated assets&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="performance-optimization"&gt;Performance Optimization
&lt;/h2&gt;&lt;pre class="mermaid" style="visibility:hidden"&gt;flowchart LR
 subgraph Before["Sequential"]
 A1["Pose 1"] --&gt; A2["Pose 2"] --&gt; A3["Pose 3"]
 end
 subgraph After["Parallel"]
 B1["Pose 1"]
 B2["Pose 2"]
 B3["Pose 3"]
 end
 Before --&gt;|"sequential → parallel"| After&lt;/pre&gt;&lt;p&gt;Pose generation was changed from sequential to parallel. Startup delay and inter-action throttles were removed. End pose generation was eliminated entirely. The perceived speed improvement is significant.&lt;/p&gt;
&lt;h2 id="sam-21-interactive-background-removal"&gt;SAM 2.1 Interactive Background Removal
&lt;/h2&gt;&lt;h3 id="why-rembg-wasnt-enough"&gt;Why rembg Wasn&amp;rsquo;t Enough
&lt;/h3&gt;&lt;p&gt;In the &lt;a class="link" href="https://ice-ice-bear.github.io/en/posts/2026-04-07-popcon-dev3/" &gt;previous post&lt;/a&gt;, I implemented background removal with rembg. The quality issues were hard to ignore:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Inaccurate foreground boundaries on complex backgrounds&lt;/li&gt;
&lt;li&gt;Parts of the character getting clipped, or background artifacts remaining&lt;/li&gt;
&lt;li&gt;Fundamental limitation of fully automated approaches — the model can&amp;rsquo;t always tell what&amp;rsquo;s foreground&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="why-sam-21"&gt;Why SAM 2.1
&lt;/h3&gt;&lt;p&gt;Meta&amp;rsquo;s SAM 2.1 (Segment Anything Model 2) segments images based on user-provided point prompts. Key advantages:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Interactive&lt;/strong&gt;: Users indicate foreground/background directly, improving accuracy&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Runs on M1 Mac&lt;/strong&gt;: I initially considered cloud GPU options like RunPod, but confirmed SAM 2.1 runs well on M1 Mac via PyTorch&amp;rsquo;s MPS backend&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Easy integration&lt;/strong&gt;: Available through the &lt;code&gt;ultralytics&lt;/code&gt; package&lt;/li&gt;
&lt;/ul&gt;
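&lt;p&gt;Through &lt;code&gt;ultralytics&lt;/code&gt;, point-prompted segmentation follows the pattern in its SAM docs: positive labels mark foreground clicks, label 0 marks background clicks. A hedged sketch — &lt;code&gt;clicks_to_prompts&lt;/code&gt; and &lt;code&gt;segment&lt;/code&gt; are hypothetical helper names, and the weights filename is an assumption:&lt;/p&gt;

```python
def clicks_to_prompts(clicks):
    # Convert frontend click dicts into the points/labels lists that the
    # ultralytics SAM predictor expects: label 1 is foreground, 0 is background.
    points = [[c["x"], c["y"]] for c in clicks]
    labels = [1 if c["foreground"] else 0 for c in clicks]
    return points, labels

def segment(image_path, clicks, weights="sam2.1_b.pt"):
    # Lazy import so this module still loads without ultralytics installed.
    from ultralytics import SAM
    model = SAM(weights)  # downloads the checkpoint on first use
    points, labels = clicks_to_prompts(clicks)
    # On Apple silicon, PyTorch routes inference through the MPS backend.
    return model(image_path, points=points, labels=labels)
```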
&lt;h3 id="architecture"&gt;Architecture
&lt;/h3&gt;&lt;pre class="mermaid" style="visibility:hidden"&gt;flowchart TB
 subgraph Frontend["Next.js /refine Page"]
 F1["Load frame image"]
 F2["SegmentCanvas component&amp;lt;br/&amp;gt;click to place points"]
 F3["Mask preview"]
 F4["Apply mask"]
 end
 subgraph Backend["FastAPI SAM2 Endpoints"]
 B1["GET /raw-frame&amp;lt;br/&amp;gt;serve original frames"]
 B2["POST /sam/predict&amp;lt;br/&amp;gt;points → mask prediction"]
 B3["POST /sam/apply&amp;lt;br/&amp;gt;apply mask → RGBA result"]
 end
 subgraph Model["SAMSegmenter Class"]
 M1["predict: point-based mask generation"]
 M2["apply_mask: mask → RGBA conversion"]
 M3["predict_and_apply_all&amp;lt;br/&amp;gt;batch process all frames"]
 end
 F1 --&gt; B1
 F2 --&gt; B2
 B2 --&gt; M1
 F4 --&gt; B3
 B3 --&gt; M2&lt;/pre&gt;&lt;h3 id="workflow-changes"&gt;Workflow Changes
&lt;/h3&gt;&lt;p&gt;Previously, the pipeline was fully automatic: video generation → frame extraction → background removal. With SAM, there&amp;rsquo;s now a user interaction step in the middle:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Video generation → frame extraction (worker stage 3 completes here)&lt;/li&gt;
&lt;li&gt;Status changes to &lt;code&gt;awaiting_refinement&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;User visits &lt;code&gt;/refine&lt;/code&gt; page and clicks to remove backgrounds&lt;/li&gt;
&lt;li&gt;Final asset generation after refinement&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;I added the &lt;code&gt;awaiting_refinement&lt;/code&gt; status so the frontend can show a &amp;ldquo;waiting for background removal&amp;rdquo; state and display a Refine Backgrounds link. The ProgressTracker treats this status as generation-complete.&lt;/p&gt;
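&lt;p&gt;The status handling can be sketched as a small enum plus a completeness check. Only &lt;code&gt;awaiting_refinement&lt;/code&gt; is from the post; the other status names are hypothetical placeholders:&lt;/p&gt;

```python
from enum import Enum

class JobStatus(str, Enum):
    # Statuses through worker stage 3 (names are illustrative).
    GENERATING_VIDEO = "generating_video"
    EXTRACTING_FRAMES = "extracting_frames"
    # Worker stage 3 now ends here; the user takes over on the /refine page.
    AWAITING_REFINEMENT = "awaiting_refinement"
    # Final asset generation resumes after interactive refinement.
    GENERATING_ASSETS = "generating_assets"
    COMPLETE = "complete"

# The ProgressTracker treats awaiting_refinement as generation-complete.
GENERATION_DONE = {
    JobStatus.AWAITING_REFINEMENT,
    JobStatus.GENERATING_ASSETS,
    JobStatus.COMPLETE,
}

def is_generation_complete(status):
    return status in GENERATION_DONE
```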
&lt;h3 id="implementation-details"&gt;Implementation Details
&lt;/h3&gt;&lt;p&gt;&lt;strong&gt;Backend — SAMSegmenter class&lt;/strong&gt;:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;predict&lt;/code&gt;: Takes click points, returns predicted masks&lt;/li&gt;
&lt;li&gt;&lt;code&gt;apply_mask&lt;/code&gt;: Applies a predicted mask to the original image, producing an RGBA result&lt;/li&gt;
&lt;li&gt;&lt;code&gt;predict_and_apply_all&lt;/code&gt;: Batch processes all frames&lt;/li&gt;
&lt;/ul&gt;
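&lt;p&gt;Conceptually, &lt;code&gt;apply_mask&lt;/code&gt; turns a binary mask into the alpha channel of an RGBA output. A minimal numpy sketch of that step, not the actual implementation:&lt;/p&gt;

```python
import numpy as np

def apply_mask(image_rgb, mask):
    # image_rgb: HxWx3 uint8 array; mask: HxW boolean array (True = keep).
    h, w, _ = image_rgb.shape
    rgba = np.zeros((h, w, 4), dtype=np.uint8)
    rgba[..., :3] = image_rgb
    # Alpha channel: fully opaque where the mask marks foreground,
    # fully transparent elsewhere, so the background simply disappears.
    rgba[..., 3] = np.where(mask, 255, 0).astype(np.uint8)
    return rgba
```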
&lt;p&gt;&lt;strong&gt;Backend — API endpoints&lt;/strong&gt;:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;GET /raw-frame&lt;/code&gt;: Serves original frame images&lt;/li&gt;
&lt;li&gt;&lt;code&gt;POST /sam/predict&lt;/code&gt;: Point-based mask prediction, returns RGBA mask&lt;/li&gt;
&lt;li&gt;&lt;code&gt;POST /sam/apply&lt;/code&gt;: Applies mask to frame&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Frontend — SegmentCanvas component&lt;/strong&gt;:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Renders frame image on a canvas&lt;/li&gt;
&lt;li&gt;Captures click events to collect point coordinates&lt;/li&gt;
&lt;li&gt;Calls SAM API for mask preview&lt;/li&gt;
&lt;li&gt;Calls apply API on confirmation&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="commit-log"&gt;Commit Log
&lt;/h2&gt;&lt;table&gt;
 &lt;thead&gt;
 &lt;tr&gt;
 &lt;th&gt;Message&lt;/th&gt;
 &lt;th&gt;Changes&lt;/th&gt;
 &lt;/tr&gt;
 &lt;/thead&gt;
 &lt;tbody&gt;
 &lt;tr&gt;
 &lt;td&gt;feat: replace VEO 3 with DashScope Wan 2.2 and remove backbone generation&lt;/td&gt;
 &lt;td&gt;Swap video generation model, remove backbone step&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;feat: pass selected action names from frontend to backend&lt;/td&gt;
 &lt;td&gt;Frontend action selection&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;fix: clear character preview when switching between upload and generate modes&lt;/td&gt;
 &lt;td&gt;Reset preview on mode switch&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;feat: add optional reference image support for AI character generation&lt;/td&gt;
 &lt;td&gt;Reference image upload&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;feat: support WebP, GIF, BMP, and TIFF image uploads&lt;/td&gt;
 &lt;td&gt;Broader format support&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;feat: add background removal option for uploaded character images&lt;/td&gt;
 &lt;td&gt;Background removal for uploads&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;perf: remove end pose generation and inter-action throttles&lt;/td&gt;
 &lt;td&gt;Remove unnecessary steps and delays&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;feat: enforce full-body character generation and add asset download links&lt;/td&gt;
 &lt;td&gt;Full-body enforcement, download links&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;fix: add media preview modal with close button to emoji cards&lt;/td&gt;
 &lt;td&gt;Media preview modal&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;perf: parallelize pose generation and eliminate startup delay&lt;/td&gt;
 &lt;td&gt;Parallel pose generation&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;docs: add SAM2 interactive background removal design spec&lt;/td&gt;
 &lt;td&gt;SAM2 design document&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;docs: add SAM2 interactive background removal implementation plan&lt;/td&gt;
 &lt;td&gt;SAM2 implementation plan&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;feat: add ultralytics SAM 2.1 dependency and sam_model config&lt;/td&gt;
 &lt;td&gt;Add SAM 2.1 dependency&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;feat: add awaiting_refinement status to models&lt;/td&gt;
 &lt;td&gt;New awaiting_refinement status&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;refactor: simplify process_video to extract-only (no bg removal)&lt;/td&gt;
 &lt;td&gt;Simplify video processing&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;refactor: worker stage 3 extracts frames only, ends at awaiting_refinement&lt;/td&gt;
 &lt;td&gt;Worker stage 3 stops at extraction&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;feat: add SAMSegmenter class with predict, apply_mask, predict_and_apply_all&lt;/td&gt;
 &lt;td&gt;Core SAMSegmenter implementation&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;feat: add SAM2 endpoints and raw frame serving to FastAPI&lt;/td&gt;
 &lt;td&gt;SAM2 API endpoints&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;feat: add SAM embed/predict/apply API functions&lt;/td&gt;
 &lt;td&gt;Frontend SAM API functions&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;feat: add SegmentCanvas click-to-segment component&lt;/td&gt;
 &lt;td&gt;Click-to-segment canvas component&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;feat: add /refine page for interactive SAM2 background removal&lt;/td&gt;
 &lt;td&gt;/refine page implementation&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;feat: add Refine Backgrounds link and awaiting_refinement status display&lt;/td&gt;
 &lt;td&gt;Refine link and status display&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;feat: treat awaiting_refinement as generation-complete in ProgressTracker&lt;/td&gt;
 &lt;td&gt;ProgressTracker status handling&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;fix: address code review findings&lt;/td&gt;
 &lt;td&gt;Code review fixes&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;merge: integrate main refactors with SAM2 interactive bg removal&lt;/td&gt;
 &lt;td&gt;Merge main refactors&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;merge: integrate main branch changes with SAM2 implementation&lt;/td&gt;
 &lt;td&gt;Merge main changes&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;fix: return RGBA mask from SAM predict endpoint&lt;/td&gt;
 &lt;td&gt;Fix SAM predict RGBA mask&lt;/td&gt;
 &lt;/tr&gt;
 &lt;/tbody&gt;
&lt;/table&gt;
&lt;h2 id="next-steps"&gt;Next Steps
&lt;/h2&gt;&lt;ul&gt;
&lt;li&gt;Improve UX for applying segmentation results across all frames at once&lt;/li&gt;
&lt;li&gt;Connect the final APNG/GIF asset generation pipeline&lt;/li&gt;
&lt;li&gt;Optimize SAM model loading for deployment environments&lt;/li&gt;
&lt;/ul&gt;
&lt;hr&gt;
&lt;p&gt;&lt;em&gt;This is the fourth post in the PopCon series. More to come.&lt;/em&gt;&lt;/p&gt;</description></item></channel></rss>