<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Ai Training on ICE-ICE-BEAR-BLOG</title><link>https://ice-ice-bear.github.io/tags/ai-training/</link><description>Recent content in Ai Training on ICE-ICE-BEAR-BLOG</description><generator>Hugo -- gohugo.io</generator><language>en</language><lastBuildDate>Thu, 16 Apr 2026 00:00:00 +0900</lastBuildDate><atom:link href="https://ice-ice-bear.github.io/tags/ai-training/index.xml" rel="self" type="application/rss+xml"/><item><title>Fuzzy Canary — A Clever Anti-AI Scraping Trap Using Hidden NSFW Links</title><link>https://ice-ice-bear.github.io/posts/2026-04-16-fuzzy-canary/</link><pubDate>Thu, 16 Apr 2026 00:00:00 +0900</pubDate><guid>https://ice-ice-bear.github.io/posts/2026-04-16-fuzzy-canary/</guid><description>&lt;img src="https://ice-ice-bear.github.io/" alt="Featured image of post Fuzzy Canary — A Clever Anti-AI Scraping Trap Using Hidden NSFW Links" /&gt;&lt;h2 id="overview"&gt;Overview
&lt;/h2&gt;&lt;p&gt;&lt;a class="link" href="https://github.com/vivienhenz24/fuzzy-canary" target="_blank" rel="noopener"
 &gt;vivienhenz24/fuzzy-canary&lt;/a&gt; (268 stars) is a TypeScript npm package that takes an unconventional approach to the AI scraping arms race. Instead of trying to block scrapers technically, it plants invisible links to pornographic websites in your HTML. The idea is that when an AI training pipeline crawls the page, its content safety filter detects the NSFW links and flags the entire page for exclusion from training data.&lt;/p&gt;
&lt;h2 id="how-it-works"&gt;How It Works
&lt;/h2&gt;&lt;pre class="mermaid" style="visibility:hidden"&gt;flowchart LR
 A["Scraper visits page"] --&gt; B["Finds hidden&amp;lt;br/&amp;gt;NSFW links"]
 B --&gt; C["Content safety&amp;lt;br/&amp;gt;filter triggered"]
 C --&gt; D["Page excluded&amp;lt;br/&amp;gt;from training"]&lt;/pre&gt;&lt;p&gt;The mechanism is straightforward: AI training pipelines typically run content safety filters over scraped data. If such a filter finds NSFW links on a page, it is likely to flag the entire page as unsafe and exclude it from the training dataset. Fuzzy Canary exploits this by embedding links that are hidden from human visitors but still present in the markup that scrapers parse.&lt;/p&gt;
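&lt;p&gt;To make the filtering step concrete, here is a minimal sketch of the kind of link-based check a training pipeline might apply. It is an illustration only, not Fuzzy Canary&amp;rsquo;s code or any vendor&amp;rsquo;s actual filter: the blocklist entries and the function names (&lt;code&gt;extractLinkHosts&lt;/code&gt;, &lt;code&gt;isBlockedHost&lt;/code&gt;, &lt;code&gt;shouldExcludePage&lt;/code&gt;) are hypothetical, and a crude regular expression stands in for a real HTML parser.&lt;/p&gt;

```typescript
// Hypothetical blocklist; a real pipeline would use a much larger, curated list.
const BLOCKED_HOSTS: string[] = ["nsfw.example", "adult.example"];

// Collect the host of every absolute http(s) href in the raw HTML.
// The check reads markup only, so CSS-hidden links are found like any other.
function extractLinkHosts(html: string): string[] {
  const hosts: string[] = [];
  const hrefPattern = /href\s*=\s*["']https?:\/\/([^\/"'?#:]+)/gi;
  let match = hrefPattern.exec(html);
  while (match !== null) {
    hosts.push(match[1].toLowerCase());
    match = hrefPattern.exec(html);
  }
  return hosts;
}

// True when the host is a blocked domain or a subdomain of one.
function isBlockedHost(host: string): boolean {
  return BLOCKED_HOSTS.some((blocked) => {
    const suffix = "." + blocked;
    return host === blocked || host.slice(-suffix.length) === suffix;
  });
}

// Page-level decision: a single blocked link is enough to drop the whole page.
function shouldExcludePage(html: string): boolean {
  return extractLinkHosts(html).some(isBlockedHost);
}
```

&lt;p&gt;Under these assumptions, &lt;code&gt;shouldExcludePage&lt;/code&gt; returns &lt;code&gt;true&lt;/code&gt; for any page whose markup links to a listed host, whether or not the link is visible, because the check never looks at styling. A filter that scores rendered text instead of raw links would not necessarily behave the same way.&lt;/p&gt;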
&lt;h2 id="usage"&gt;Usage
&lt;/h2&gt;&lt;p&gt;Installation is simple:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;npm i @fuzzycanary/core
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;There are two modes of operation:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Server-side (recommended)&lt;/strong&gt;: Use the React component &lt;code&gt;&amp;lt;Canary /&amp;gt;&lt;/code&gt; in your root layout. The links are injected at render time.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Client-side&lt;/strong&gt;: Auto-init script that injects links after page load.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The server-side approach is recommended because client-side injection may not be picked up by scrapers that don&amp;rsquo;t execute JavaScript.&lt;/p&gt;
&lt;h2 id="caveats"&gt;Caveats
&lt;/h2&gt;&lt;p&gt;The main trade-off is SEO impact. The hidden links are injected for &lt;strong&gt;all visitors&lt;/strong&gt;, including legitimate search engine crawlers like Googlebot. While the links are invisible to users, search engines may still index them and potentially penalize the page. This is a real consideration for production sites that depend on search traffic.&lt;/p&gt;
&lt;h2 id="takeaway"&gt;Takeaway
&lt;/h2&gt;&lt;p&gt;Fuzzy Canary is a clever &amp;ldquo;poor-man&amp;rsquo;s solution&amp;rdquo; that turns AI companies&amp;rsquo; own safety mechanisms against them. It won&amp;rsquo;t stop determined scrapers with custom pipelines, but it raises the cost of scraping for those using standard training infrastructure. It is a creative entry in the ongoing arms race between content creators and AI training data collection.&lt;/p&gt;</description></item></channel></rss>