<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Mimir on ICE-ICE-BEAR-BLOG</title><link>https://ice-ice-bear.github.io/tags/mimir/</link><description>Recent content in Mimir on ICE-ICE-BEAR-BLOG</description><generator>Hugo -- gohugo.io</generator><language>en</language><lastBuildDate>Tue, 07 Apr 2026 00:00:00 +0900</lastBuildDate><atom:link href="https://ice-ice-bear.github.io/tags/mimir/index.xml" rel="self" type="application/rss+xml"/><item><title>Hybrid Image Search Dev Log #10 — OTel Metrics Dashboard and Pipeline Performance Optimization</title><link>https://ice-ice-bear.github.io/posts/2026-04-07-hybrid-search-dev10/</link><pubDate>Tue, 07 Apr 2026 00:00:00 +0900</pubDate><guid>https://ice-ice-bear.github.io/posts/2026-04-07-hybrid-search-dev10/</guid><description>&lt;p&gt;In &lt;a class="link" href="https://ice-ice-bear.github.io/en/posts/2026-04-06-hybrid-search-dev9/" &gt;the previous post: hybrid-image-search dev log #9&lt;/a&gt;, we integrated OpenTelemetry tracing with Grafana Cloud Tempo. This time, we added metrics collection to build resource usage dashboards and optimized the performance bottlenecks we discovered through trace analysis.&lt;/p&gt;
&lt;h2 id="commit-log-for-this-session"&gt;Commit Log for This Session
&lt;/h2&gt;&lt;table&gt;
 &lt;thead&gt;
 &lt;tr&gt;
 &lt;th style="text-align: center"&gt;Order&lt;/th&gt;
 &lt;th style="text-align: center"&gt;Type&lt;/th&gt;
 &lt;th&gt;Description&lt;/th&gt;
 &lt;/tr&gt;
 &lt;/thead&gt;
 &lt;tbody&gt;
 &lt;tr&gt;
 &lt;td style="text-align: center"&gt;1&lt;/td&gt;
 &lt;td style="text-align: center"&gt;feat&lt;/td&gt;
 &lt;td&gt;Add OTel metrics export for pipeline resource dashboards&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td style="text-align: center"&gt;2&lt;/td&gt;
 &lt;td style="text-align: center"&gt;docs&lt;/td&gt;
 &lt;td&gt;Add observability section to README and fix dashboard metric names&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td style="text-align: center"&gt;3&lt;/td&gt;
 &lt;td style="text-align: center"&gt;perf&lt;/td&gt;
 &lt;td&gt;Reduce CPU/RAM spikes in generation pipeline&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td style="text-align: center"&gt;4&lt;/td&gt;
 &lt;td style="text-align: center"&gt;perf&lt;/td&gt;
 &lt;td&gt;Move S3 and Pylette ops to thread executor&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td style="text-align: center"&gt;5&lt;/td&gt;
 &lt;td style="text-align: center"&gt;perf&lt;/td&gt;
 &lt;td&gt;Add 2-minute timeout to Gemini API calls&lt;/td&gt;
 &lt;/tr&gt;
 &lt;/tbody&gt;
&lt;/table&gt;
&lt;h2 id="background-traces-without-metrics"&gt;Background: Traces Without Metrics
&lt;/h2&gt;&lt;p&gt;After setting up OTel tracing in #9, we could see individual request spans in Grafana Cloud Tempo. But one piece was missing — &lt;strong&gt;resource usage&lt;/strong&gt;. Traces show &amp;ldquo;how long each function took&amp;rdquo; but not &amp;ldquo;how much CPU/RAM spiked at that moment.&amp;rdquo;&lt;/p&gt;
&lt;p&gt;Running the image generation pipeline on a t3.medium (2 vCPUs, 4GB RAM) felt sluggish, but we had no data on exactly where resources were being consumed.&lt;/p&gt;
&lt;h2 id="step-1-adding-otel-metrics-export"&gt;Step 1: Adding OTel Metrics Export
&lt;/h2&gt;&lt;h3 id="observability-pipeline-architecture"&gt;Observability Pipeline Architecture
&lt;/h3&gt;&lt;pre class="mermaid" style="visibility:hidden"&gt;flowchart LR
 App["FastAPI App &amp;lt;br/&amp;gt; hybrid-image-search"]
 OTelSDK["OTel SDK &amp;lt;br/&amp;gt; Traces + Metrics"]
 Tempo["Grafana Cloud &amp;lt;br/&amp;gt; Tempo"]
 Mimir["Grafana Cloud &amp;lt;br/&amp;gt; Mimir"]
 Dashboard["Grafana &amp;lt;br/&amp;gt; Dashboard"]

 App --&gt;|instrument| OTelSDK
 OTelSDK --&gt;|OTLP gRPC| Tempo
 OTelSDK --&gt;|OTLP gRPC| Mimir
 Tempo --&gt;|trace query| Dashboard
 Mimir --&gt;|PromQL| Dashboard&lt;/pre&gt;&lt;p&gt;Previously, we were only sending traces to Tempo. We added a &lt;strong&gt;metrics exporter&lt;/strong&gt; to send CPU utilization, memory usage, and per-stage pipeline duration to Grafana Cloud Mimir (a Prometheus-compatible long-term storage backend).&lt;/p&gt;
&lt;p&gt;Grafana Mimir extends Prometheus&amp;rsquo;s TSDB into a distributed architecture. Grafana Cloud provides it as a managed service, so you just configure the OTLP endpoint and start querying with PromQL.&lt;/p&gt;
&lt;h3 id="pipeline-resource-usage-dashboard"&gt;Pipeline Resource Usage Dashboard
&lt;/h3&gt;&lt;p&gt;Key panels from the dashboard:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;CPU Usage (%)&lt;/strong&gt; — momentary spikes to 80-90% during pipeline execution&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Memory Usage (MB)&lt;/strong&gt; — sharp RAM increases during Pylette color extraction&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Pipeline Stage Duration&lt;/strong&gt; — time per stage (Gemini call, S3 upload, color extraction)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The problem became clear: a single image generation was nearly saturating the CPU.&lt;/p&gt;
&lt;h2 id="step-2-identifying-performance-bottlenecks"&gt;Step 2: Identifying Performance Bottlenecks
&lt;/h2&gt;&lt;p&gt;Correlating Grafana Tempo traces with the new resource dashboard revealed a pattern:&lt;/p&gt;
&lt;pre class="mermaid" style="visibility:hidden"&gt;flowchart TD
 subgraph Before["Before — Sequential Execution"]
 direction TB
 G1["Gemini API Call &amp;lt;br/&amp;gt; Image Generation"]
 S1["S3 Upload &amp;lt;br/&amp;gt; Sync Blocking"]
 P1["Pylette Color Extraction &amp;lt;br/&amp;gt; CPU Intensive"]
 G1 --&gt; S1 --&gt; P1
 end

 subgraph After["After — Async Separation"]
 direction TB
 G2["Gemini API Call &amp;lt;br/&amp;gt; Image Generation"]
 S2["S3 Upload &amp;lt;br/&amp;gt; thread executor"]
 P2["Pylette Color Extraction &amp;lt;br/&amp;gt; thread executor"]
 G2 --&gt; S2
 G2 --&gt; P2
 end

 Before -.-&gt;|optimization| After&lt;/pre&gt;&lt;h3 id="three-bottlenecks-found"&gt;Three Bottlenecks Found
&lt;/h3&gt;&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;S3 upload was synchronously blocking&lt;/strong&gt; — &lt;code&gt;boto3&lt;/code&gt;&amp;rsquo;s &lt;code&gt;upload_fileobj&lt;/code&gt; was blocking the entire async event loop. Other requests stalled in turn.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Pylette color extraction was CPU-intensive&lt;/strong&gt; — extracting dominant colors from images consumed significant CPU, also running synchronously on the main thread.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;No timeout on Gemini API calls&lt;/strong&gt; — intermittently, responses would never arrive, leaving requests in an infinite wait state.&lt;/li&gt;
&lt;/ol&gt;
&lt;h2 id="step-3-applying-optimizations"&gt;Step 3: Applying Optimizations
&lt;/h2&gt;&lt;h3 id="moving-s3-and-pylette-to-thread-executor"&gt;Moving S3 and Pylette to Thread Executor
&lt;/h3&gt;&lt;p&gt;Since FastAPI is &lt;code&gt;asyncio&lt;/code&gt;-based, CPU-intensive or blocking I/O tasks should run in a separate thread via &lt;code&gt;asyncio.to_thread()&lt;/code&gt; or &lt;code&gt;loop.run_in_executor()&lt;/code&gt;.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-python" data-lang="python"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="c1"&gt;# Before: blocking the event loop&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="n"&gt;s3_client&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;upload_fileobj&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;buffer&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;bucket&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="n"&gt;colors&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;extract_colors&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;image_path&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;color_count&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="c1"&gt;# After: offloaded to thread executor&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;asyncio&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;to_thread&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;s3_client&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;upload_fileobj&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;buffer&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;bucket&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="n"&gt;colors&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;asyncio&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;to_thread&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;extract_colors&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;image_path&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;color_count&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;This way, the event loop can handle other requests while S3 uploads or color extraction are in progress.&lt;/p&gt;
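&lt;p&gt;Following the &amp;ldquo;After&amp;rdquo; diagram, once the generated image is in hand, the S3 upload and the color extraction no longer depend on each other, so both thread-executor calls can run concurrently with &lt;code&gt;asyncio.gather&lt;/code&gt;. A minimal, self-contained sketch — &lt;code&gt;upload_to_s3&lt;/code&gt; and &lt;code&gt;extract_colors&lt;/code&gt; here are hypothetical stand-ins for the real boto3 and Pylette calls:&lt;/p&gt;

```python
import asyncio
import time

def upload_to_s3(data: bytes) -> str:
    """Stand-in for the blocking boto3 upload_fileobj call."""
    time.sleep(0.1)  # simulate network I/O
    return "s3://bucket/key"

def extract_colors(data: bytes) -> list:
    """Stand-in for the CPU-heavy Pylette color extraction."""
    time.sleep(0.1)  # simulate CPU work
    return [(255, 0, 0), (0, 255, 0)]

async def postprocess(image: bytes):
    # Both blocking steps go to worker threads and run concurrently,
    # leaving the event loop free to serve other requests.
    url, colors = await asyncio.gather(
        asyncio.to_thread(upload_to_s3, image),
        asyncio.to_thread(extract_colors, image),
    )
    return url, colors

url, colors = asyncio.run(postprocess(b"fake-image-bytes"))
```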
&lt;h3 id="2-minute-timeout-for-gemini-api"&gt;2-Minute Timeout for Gemini API
&lt;/h3&gt;&lt;p&gt;We added a 120-second timeout to Gemini API calls using &lt;code&gt;asyncio.wait_for&lt;/code&gt;. We also checked rate limits and costs on Google AI Studio — when a connection hangs with no response, wasted server resources are a bigger concern than billing.&lt;/p&gt;
&lt;p&gt;We searched for &amp;ldquo;gemini_semaphore&amp;rdquo; patterns too, but concurrency control via semaphore was already in place. The issue wasn&amp;rsquo;t concurrency — it was &lt;strong&gt;indefinite waiting&lt;/strong&gt; on individual calls.&lt;/p&gt;
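&lt;p&gt;The timeout itself is plain &lt;code&gt;asyncio.wait_for&lt;/code&gt;: when the deadline passes, the pending call is cancelled and &lt;code&gt;asyncio.TimeoutError&lt;/code&gt; is raised, which the caller can turn into an error response instead of hanging indefinitely. A sketch with a stand-in coroutine — &lt;code&gt;call_gemini&lt;/code&gt; is a hypothetical placeholder, not the project&amp;rsquo;s actual client code:&lt;/p&gt;

```python
import asyncio

async def call_gemini(prompt: str) -> str:
    """Stand-in for the real Gemini call; simulates a hung connection."""
    await asyncio.sleep(3600)
    return "image-bytes"

async def generate(prompt: str, timeout_s: float = 120) -> str:
    try:
        # wait_for cancels the pending call once the deadline passes,
        # so the request fails fast instead of waiting forever.
        return await asyncio.wait_for(call_gemini(prompt), timeout=timeout_s)
    except asyncio.TimeoutError:
        return "timeout"

# Shortened deadline for the demo; production uses the full 120 seconds.
result = asyncio.run(generate("a red bicycle", timeout_s=0.1))
```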
&lt;h2 id="results"&gt;Results
&lt;/h2&gt;&lt;p&gt;Improvements confirmed on the dashboard:&lt;/p&gt;
&lt;table&gt;
 &lt;thead&gt;
 &lt;tr&gt;
 &lt;th&gt;Metric&lt;/th&gt;
 &lt;th&gt;Before&lt;/th&gt;
 &lt;th&gt;After&lt;/th&gt;
 &lt;/tr&gt;
 &lt;/thead&gt;
 &lt;tbody&gt;
 &lt;tr&gt;
 &lt;td&gt;Peak CPU during pipeline&lt;/td&gt;
 &lt;td&gt;~90%&lt;/td&gt;
 &lt;td&gt;~50%&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;Event loop blocking time&lt;/td&gt;
 &lt;td&gt;Entire S3 upload duration&lt;/td&gt;
 &lt;td&gt;Near zero&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;Max wait for unresponsive requests&lt;/td&gt;
 &lt;td&gt;Unlimited&lt;/td&gt;
 &lt;td&gt;120 seconds&lt;/td&gt;
 &lt;/tr&gt;
 &lt;/tbody&gt;
&lt;/table&gt;
&lt;h2 id="additional-research-locust-load-testing"&gt;Additional Research: Locust Load Testing
&lt;/h2&gt;&lt;p&gt;We also explored Locust load testing tutorials. Current optimizations target single-request performance, but as concurrent users increase, we need to precisely measure the limits of a t3.medium instance. The plan is to run Locust load tests in the next session and establish a scaling strategy.&lt;/p&gt;
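&lt;p&gt;As a preview of that next session, a Locust scenario can be as small as one &lt;code&gt;HttpUser&lt;/code&gt; class. The &lt;code&gt;/generate&lt;/code&gt; path and payload below are placeholders for whatever the pipeline&amp;rsquo;s real endpoint turns out to be:&lt;/p&gt;

```python
from locust import HttpUser, task, between

class PipelineUser(HttpUser):
    # Each simulated user waits 1-5 seconds between requests,
    # roughly mimicking interactive traffic.
    wait_time = between(1, 5)

    @task
    def generate_image(self):
        # Placeholder endpoint/payload for the generation pipeline.
        self.client.post("/generate", json={"prompt": "a red bicycle"})
```

&lt;p&gt;Saved as &lt;code&gt;locustfile.py&lt;/code&gt;, this runs with &lt;code&gt;locust -f locustfile.py --host http://localhost:8000&lt;/code&gt; and ramps users up from the web UI while we watch the same Grafana dashboards.&lt;/p&gt;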
&lt;h2 id="summary"&gt;Summary
&lt;/h2&gt;&lt;table&gt;
 &lt;thead&gt;
 &lt;tr&gt;
 &lt;th&gt;Topic&lt;/th&gt;
 &lt;th&gt;Summary&lt;/th&gt;
 &lt;/tr&gt;
 &lt;/thead&gt;
 &lt;tbody&gt;
 &lt;tr&gt;
 &lt;td&gt;OTel Metrics&lt;/td&gt;
 &lt;td&gt;CPU/RAM/pipeline metrics sent to Grafana Cloud Mimir&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;Resource Dashboard&lt;/td&gt;
 &lt;td&gt;Pipeline Resource Usage dashboard for bottleneck visualization&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;Performance Optimization&lt;/td&gt;
 &lt;td&gt;S3, Pylette moved to thread executor to unblock event loop&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;Gemini Timeout&lt;/td&gt;
 &lt;td&gt;2-minute timeout to prevent indefinite waits&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;Next Steps&lt;/td&gt;
 &lt;td&gt;Locust load testing, scaling strategy&lt;/td&gt;
 &lt;/tr&gt;
 &lt;/tbody&gt;
&lt;/table&gt;</description></item></channel></rss>