<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Spot Instances on ICE-ICE-BEAR-BLOG</title><link>https://ice-ice-bear.github.io/tags/spot-instances/</link><description>Recent content in Spot Instances on ICE-ICE-BEAR-BLOG</description><generator>Hugo -- gohugo.io</generator><language>en</language><lastBuildDate>Wed, 22 Apr 2026 00:00:00 +0900</lastBuildDate><atom:link href="https://ice-ice-bear.github.io/tags/spot-instances/index.xml" rel="self" type="application/rss+xml"/><item><title>RunPod Spot vs On-Demand — When the 50% Discount Is Worth the Interruption</title><link>https://ice-ice-bear.github.io/posts/2026-04-22-runpod-spot-vs-ondemand/</link><pubDate>Wed, 22 Apr 2026 00:00:00 +0900</pubDate><guid>https://ice-ice-bear.github.io/posts/2026-04-22-runpod-spot-vs-ondemand/</guid><description>&lt;img src="https://ice-ice-bear.github.io/" alt="Featured image of post RunPod Spot vs On-Demand — When the 50% Discount Is Worth the Interruption" /&gt;&lt;h2 id="overview"&gt;Overview
&lt;/h2&gt;&lt;p&gt;&lt;a class="link" href="https://www.runpod.io/blog/spot-vs-on-demand-instances-runpod" target="_blank" rel="noopener"
 &gt;RunPod&amp;rsquo;s blog post &amp;ldquo;Spot vs. On-Demand Instances&amp;rdquo;&lt;/a&gt; is short, but it&amp;rsquo;s exactly the right framing for a choice most people make badly. Spot instances cost about half what on-demand costs for the same GPU, but they can be interrupted without notice. Whether that&amp;rsquo;s a win or a disaster depends entirely on a single property of your workload: &lt;strong&gt;can it checkpoint and resume?&lt;/strong&gt;&lt;/p&gt;
&lt;pre class="mermaid" style="visibility:hidden"&gt;graph TD
 W["GPU workload"] --&gt; Q1{"Can it checkpoint&lt;br/&gt;and resume?"}
 Q1 --&gt;|"yes"| Q2{"Is time-to-finish&lt;br/&gt;critical?"}
 Q1 --&gt;|"no"| OD["On-Demand&lt;br/&gt;always"]
 Q2 --&gt;|"yes"| OD
 Q2 --&gt;|"no"| Spot["Spot&lt;br/&gt;~50% cheaper"]
 Spot --&gt; Note1["Workloads that fit:&lt;br/&gt;- training runs&lt;br/&gt;- batch inference&lt;br/&gt;- fine-tuning with checkpoints"]
 OD --&gt; Note2["Workloads that need OD:&lt;br/&gt;- interactive notebooks&lt;br/&gt;- user-facing inference&lt;br/&gt;- jobs with tight SLAs"]&lt;/pre&gt;&lt;h2 id="the-pricing-reality"&gt;The Pricing Reality
&lt;/h2&gt;&lt;p&gt;RunPod&amp;rsquo;s example from the post: an A6000 at &lt;strong&gt;$0.232/gpu/hour on spot&lt;/strong&gt; versus &lt;strong&gt;$0.491/gpu/hour on-demand&lt;/strong&gt;. The discount is consistent at roughly 50% across most SKUs — RTX 4090, A100, H100 — though the exact delta fluctuates with availability. The math is clean: a 24-hour training run at $0.491 costs $11.78 on-demand; on spot, $5.57. Over a month of heavy training, this is the difference between $353 and $167.&lt;/p&gt;
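&lt;p&gt;For a quick sanity check on any SKU, the arithmetic is a one-liner; a minimal sketch, using the A6000 rates quoted above (substitute your own pricing):&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-python"&gt;# Spot vs. on-demand cost comparison. Rates are the A6000 figures
# from the post; swap in your SKU's pricing.
SPOT_RATE = 0.232       # $/GPU/hour
ON_DEMAND_RATE = 0.491  # $/GPU/hour

def run_cost(hours, rate, gpus=1):
    """Total cost of a run at a given hourly rate."""
    return hours * rate * gpus

for hours in (24, 720):  # one day, one month of heavy training
    od = run_cost(hours, ON_DEMAND_RATE)
    spot = run_cost(hours, SPOT_RATE)
    print(f"{hours}h: on-demand ${od:.2f} vs spot ${spot:.2f} "
          f"({100 * (1 - spot / od):.0f}% saved)")
&lt;/code&gt;&lt;/pre&gt;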
&lt;p&gt;The pricing is attractive enough that the question isn&amp;rsquo;t &lt;em&gt;whether&lt;/em&gt; to use spot — it&amp;rsquo;s &lt;em&gt;which&lt;/em&gt; workloads can tolerate interruption.&lt;/p&gt;
&lt;h2 id="the-interruption-contract"&gt;The Interruption Contract
&lt;/h2&gt;&lt;p&gt;The key line from the post: &lt;em&gt;&amp;ldquo;Spot instances can be interrupted without notice, while on-demand instances are non-interruptible.&amp;rdquo;&lt;/em&gt; Compared to AWS EC2 Spot, RunPod Spot is &lt;strong&gt;harsher&lt;/strong&gt;: AWS gives you a two-minute warning before termination, while RunPod may give you none. In practice, this means:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;You cannot rely on graceful shutdown handlers&lt;/strong&gt; to save state. The instance can disappear between two lines of code.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Persistent volume storage is the contract.&lt;/strong&gt; Whatever is in the pod&amp;rsquo;s ephemeral disk at the moment of interruption is gone; whatever is on the attached volume survives.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Checkpoint frequency is a cost/reliability knob.&lt;/strong&gt; Checkpoint every minute and you waste compute writing checkpoints; checkpoint every hour and a preemption at minute 55 costs you 55 minutes. (A crash-safe write pattern is sketched after this list.)&lt;/li&gt;
&lt;/ul&gt;
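&lt;p&gt;Because there is no warning, the checkpoint write itself must be crash-safe. A minimal sketch of the pattern: write to a temp file on the volume, then atomically rename. It assumes the persistent volume is mounted at &lt;code&gt;/workspace&lt;/code&gt; (RunPod&amp;rsquo;s usual mount point; adjust for your pod); &lt;code&gt;model&lt;/code&gt;, &lt;code&gt;optimizer&lt;/code&gt;, and &lt;code&gt;loader&lt;/code&gt; are placeholders:&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-python"&gt;import os
import torch

VOLUME = "/workspace"  # assumed volume mount; the ephemeral disk does NOT survive

def save_checkpoint_atomic(state, name="ckpt.pt"):
    """Write so a preemption mid-write never corrupts the previous good
    checkpoint: temp file on the same filesystem, then rename
    (os.replace is atomic on POSIX)."""
    final = os.path.join(VOLUME, name)
    tmp = final + ".tmp"
    torch.save(state, tmp)
    os.replace(tmp, final)  # old checkpoint survives a crash at any point

# In the training loop, the cadence is the knob described above:
# every `save_every` steps, snapshot to the volume.
def train_with_checkpoints(model, optimizer, loader, save_every=500):
    for step, batch in enumerate(loader):
        ...  # forward / backward / optimizer.step() elided
        if step % save_every == 0 and step != 0:
            save_checkpoint_atomic({"model": model.state_dict(),
                                    "optimizer": optimizer.state_dict(),
                                    "step": step})
&lt;/code&gt;&lt;/pre&gt;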
&lt;h2 id="workloads-that-are-a-good-fit"&gt;Workloads That Are a Good Fit
&lt;/h2&gt;&lt;p&gt;Per the post, augmented with production experience:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Training runs with automatic checkpointing.&lt;/strong&gt; Anything that uses PyTorch Lightning&amp;rsquo;s &lt;code&gt;ModelCheckpoint&lt;/code&gt;, Hugging Face&amp;rsquo;s &lt;code&gt;Trainer(save_steps=...)&lt;/code&gt;, or a custom checkpoint-every-N-steps loop. If the training loop can resume from the last checkpoint without losing more than a minute or two, spot is almost always correct.&lt;/p&gt;
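&lt;p&gt;With the Hugging Face &lt;code&gt;Trainer&lt;/code&gt;, for example, the two relevant knobs are &lt;code&gt;save_steps&lt;/code&gt; on the way down and &lt;code&gt;resume_from_checkpoint&lt;/code&gt; on the way back up. A sketch, with illustrative values and &lt;code&gt;model&lt;/code&gt;/&lt;code&gt;train_dataset&lt;/code&gt; as placeholders:&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-python"&gt;from transformers import Trainer, TrainingArguments

# Point output_dir at the persistent volume so checkpoints survive preemption.
args = TrainingArguments(
    output_dir="/workspace/ckpts",  # assumed volume mount
    save_steps=500,                 # checkpoint cadence
    save_total_limit=2,             # bound disk usage on the volume
)
trainer = Trainer(model=model, args=args, train_dataset=train_dataset)

# On a fresh pod after a preemption, this picks up from the newest
# checkpoint in output_dir instead of starting over.
trainer.train(resume_from_checkpoint=True)
&lt;/code&gt;&lt;/pre&gt;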
&lt;p&gt;&lt;strong&gt;Batch inference over a large dataset.&lt;/strong&gt; Checkpoint progress by persisting the list of completed items to the attached volume. On preemption, a new pod reads the list and picks up where the old one left off. The classic embarrassingly-parallel batch job.&lt;/p&gt;
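&lt;p&gt;One concrete way to do that: an append-only &amp;ldquo;done&amp;rdquo; log on the volume. In this sketch, &lt;code&gt;items&lt;/code&gt; is a list of string IDs and &lt;code&gt;process&lt;/code&gt; is your inference call; both are hypothetical:&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-python"&gt;import os

DONE_PATH = "/workspace/done.txt"  # persistent volume survives the pod

def load_done():
    """IDs finished by this or any previous pod."""
    if not os.path.exists(DONE_PATH):
        return set()
    with open(DONE_PATH) as f:
        return set(line.strip() for line in f)

def run_batch(items, process):
    done = load_done()
    with open(DONE_PATH, "a") as f:
        for item_id in items:
            if item_id in done:
                continue  # a previous pod already handled it
            process(item_id)
            f.write(item_id + "\n")
            f.flush()  # keep the log durable between items
&lt;/code&gt;&lt;/pre&gt;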
&lt;p&gt;&lt;strong&gt;Fine-tuning with snapshotted optimizer state.&lt;/strong&gt; LoRA fine-tunes on a 7B model generally take hours and naturally produce intermediate checkpoints. Spot preempts → relaunch → resume from last checkpoint. The total wall time increases, but the cost is cut in half.&lt;/p&gt;
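&lt;p&gt;The resume side in plain PyTorch looks like this. The snapshot has to include the optimizer state, not just the weights, or the resumed run effectively restarts its Adam moments and learning-rate schedule; paths and names are placeholders:&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-python"&gt;import os
import torch

CKPT = "/workspace/lora_ckpt.pt"  # assumed location on the volume

def save_snapshot(model, optimizer, step):
    torch.save({"model": model.state_dict(),
                "optimizer": optimizer.state_dict(),
                "step": step}, CKPT)

def load_snapshot(model, optimizer):
    """Returns the step to resume from (0 if no snapshot exists yet)."""
    if not os.path.exists(CKPT):
        return 0
    snap = torch.load(CKPT, map_location="cpu")
    model.load_state_dict(snap["model"])
    optimizer.load_state_dict(snap["optimizer"])
    return snap["step"] + 1
&lt;/code&gt;&lt;/pre&gt;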
&lt;h2 id="workloads-that-need-on-demand"&gt;Workloads That Need On-Demand
&lt;/h2&gt;&lt;p&gt;&lt;strong&gt;Interactive Jupyter notebooks.&lt;/strong&gt; Nobody wants to lose their mid-experiment state. The post captures this: &lt;em&gt;&amp;ldquo;No one wants to be interrupted in the middle of their flow if you&amp;rsquo;re experimenting in a Jupyter notebook.&amp;rdquo;&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;User-facing inference.&lt;/strong&gt; If a real user is waiting for a response, you can&amp;rsquo;t preempt the worker mid-request. PopCon&amp;rsquo;s GPU worker is exactly this shape — a user clicks &amp;ldquo;generate&amp;rdquo; and expects a response within seconds.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Jobs with tight SLAs.&lt;/strong&gt; If missing a 4-hour deadline has a business cost, spot&amp;rsquo;s unpredictable wall-clock time is a risk. The dollar savings don&amp;rsquo;t cover the deadline risk.&lt;/p&gt;
&lt;h2 id="a-hidden-third-option-serverless"&gt;A Hidden Third Option: Serverless
&lt;/h2&gt;&lt;p&gt;The post doesn&amp;rsquo;t cover it, but RunPod &lt;strong&gt;Serverless&lt;/strong&gt; is a meaningful third category. Serverless handles the pool management for you: workers are kept warm, sit idle until a request arrives, and you pay per second of actual execution. It&amp;rsquo;s neither spot nor on-demand in the traditional sense, but it solves the same problem spot solves (don&amp;rsquo;t pay for idle GPU) with a different mechanism (a managed pool plus per-request billing).&lt;/p&gt;
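&lt;p&gt;For a sense of the shape: a Serverless endpoint is just a handler function behind RunPod&amp;rsquo;s queue. A minimal sketch with the &lt;code&gt;runpod&lt;/code&gt; Python SDK; the handler body is an illustrative stand-in for a real model call:&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-python"&gt;import runpod

def handler(job):
    # job["input"] carries the request payload sent to the endpoint.
    prompt = job["input"].get("prompt", "")
    # ...run inference here; an echo stands in for the model call
    return {"output": f"generated for: {prompt}"}

# Workers scale to zero between requests; billing is per second of execution.
runpod.serverless.start({"handler": handler})
&lt;/code&gt;&lt;/pre&gt;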
&lt;p&gt;When to choose which:&lt;/p&gt;
&lt;table&gt;
 &lt;thead&gt;
 &lt;tr&gt;
 &lt;th&gt;Workload&lt;/th&gt;
 &lt;th&gt;Best fit&lt;/th&gt;
 &lt;th&gt;Reason&lt;/th&gt;
 &lt;/tr&gt;
 &lt;/thead&gt;
 &lt;tbody&gt;
 &lt;tr&gt;
 &lt;td&gt;Interactive notebook&lt;/td&gt;
 &lt;td&gt;On-demand Pod&lt;/td&gt;
 &lt;td&gt;Can&amp;rsquo;t tolerate interruption&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;User-facing inference (low QPS)&lt;/td&gt;
 &lt;td&gt;Serverless&lt;/td&gt;
 &lt;td&gt;Scale-to-zero, no cold start penalty for warm endpoints&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;User-facing inference (high QPS)&lt;/td&gt;
 &lt;td&gt;On-demand Pod&lt;/td&gt;
 &lt;td&gt;Consistent latency, predictable cost at scale&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;Training run (checkpointed)&lt;/td&gt;
 &lt;td&gt;Spot&lt;/td&gt;
 &lt;td&gt;~50% cost savings, interruption is recoverable&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;Batch inference&lt;/td&gt;
 &lt;td&gt;Spot&lt;/td&gt;
 &lt;td&gt;Embarrassingly parallel, easy to checkpoint&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;Fine-tuning&lt;/td&gt;
 &lt;td&gt;Spot&lt;/td&gt;
 &lt;td&gt;Checkpoints are natural in the workflow&lt;/td&gt;
 &lt;/tr&gt;
 &lt;/tbody&gt;
&lt;/table&gt;
&lt;h2 id="the-practical-rule"&gt;The Practical Rule
&lt;/h2&gt;&lt;p&gt;The post&amp;rsquo;s framing in one sentence: &lt;em&gt;&amp;ldquo;use spot instances when things are well automated, or when the workload just isn&amp;rsquo;t that important and you can take a gamble. Use on-demand instances if you need the guarantee that your work won&amp;rsquo;t be stopped.&amp;rdquo;&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;This is correct but leaves out the practical engineering rule: &lt;strong&gt;you get spot-grade savings only if you&amp;rsquo;ve already built checkpoint/resume.&lt;/strong&gt; If you haven&amp;rsquo;t, the effective cost of spot is on-demand plus your time to rebuild the experiment when a preemption destroys it. Factor your own hourly rate into the savings calculation.&lt;/p&gt;
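&lt;p&gt;Back-of-the-envelope, with the preemption count, rework time, and your hourly rate as explicit (and entirely made-up) inputs:&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-python"&gt;def effective_spot_cost(run_hours, spot_rate, on_demand_rate,
                        preemptions, rework_hours_each, your_hourly_rate):
    """Effective spot cost once human rework is priced in."""
    gpu_cost = run_hours * spot_rate
    rework_cost = preemptions * rework_hours_each * your_hourly_rate
    return gpu_cost + rework_cost, run_hours * on_demand_rate

# Without checkpoint/resume, a single preemption that costs two hours of
# rebuilding swamps the GPU savings on a 24h A6000 run:
spot_total, od_total = effective_spot_cost(
    run_hours=24, spot_rate=0.232, on_demand_rate=0.491,
    preemptions=1, rework_hours_each=2, your_hourly_rate=100)
print(f"spot effective ${spot_total:.2f} vs on-demand ${od_total:.2f}")
&lt;/code&gt;&lt;/pre&gt;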
&lt;h2 id="insights"&gt;Insights
&lt;/h2&gt;&lt;p&gt;The spot/on-demand/serverless triangle is the right way to think about GPU cloud costs today. Too many teams default to on-demand for everything and then complain about GPU bills. The failure mode on the other side — defaulting to spot without checkpointing — is equally bad. The decisive question is always: &lt;strong&gt;what happens if this instance dies in the next 60 seconds?&lt;/strong&gt; If the answer is &amp;ldquo;we resume from last checkpoint,&amp;rdquo; go spot. If the answer is &amp;ldquo;we lose an experiment / a user sees an error,&amp;rdquo; go on-demand or Serverless. Build the checkpoint layer once — it pays for itself in the first training run where spot halves your bill.&lt;/p&gt;</description></item></channel></rss>