A Framework for Running Pricing Experiments

By Swiftools Team · Published April 28, 2025 · 8 min read


Pricing is the lever in SaaS with the highest leverage and the least experimentation. A 10% improvement in price usually beats a 10% improvement in conversion rate, retention, or acquisition cost - and yet most teams set their prices once on gut feel and don't touch them for years.

Part of the reason is that pricing experiments are scary in a way that landing-page A/B tests aren't. Charging different customers different prices feels unfair. Public price changes feel irreversible. Telling existing customers their plan is going up feels like the start of a churn event. All of these fears are real, and all of them have known ways around them. This post is the practical framework.

The mistake almost every team makes

Treating pricing as a single decision instead of a series of experiments.

A pricing page is, structurally, five interrelated decisions: the price points themselves, the packaging (what's in each tier), the gates (what forces an upgrade), the billing period structure, and the discount/trial mechanics. Most teams change all five at once when they "redo pricing," then have no idea which change moved the metrics.

The fix is to separate them. Each variable gets its own experiment. Each experiment has a single hypothesis. You change one thing at a time, measure, then move on.

Pre-flight: what should never change

Two non-negotiables before any pricing experiment:

Existing customers don't get worse pricing without warning and a long timeline. If you're raising prices, existing customers are grandfathered in for at least 12 months, or you give them 90 days' notice and an opt-out. Surprise increases on existing customers produce churn spikes that take quarters to recover from.

You don't run different prices on an audience that can compare them with a fast Google search. If your audience is small and connected (developer tools, niche B2B), they will compare notes within a week and discover the test. This doesn't kill testing, but it means you should test by cohort (e.g. all sign-ups after May 1 see the new price) rather than by user-level random assignment.

Cohort tests vs A/B tests, and when each works

Cohort tests: change the price on a specific date. Everyone who signs up before sees the old price; everyone after sees the new. Compare conversion, ARPU, retention across the two cohorts.

Pros: no fairness concerns, no risk of cross-pollination, simple to implement in any billing system.

Cons: confounded with everything else that changed between the two periods (marketing campaigns, seasonality, product changes). Needs a long observation window before you trust the result.
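A cohort test needs very little logic: the price is a pure function of the sign-up date. A minimal sketch, assuming a hypothetical cutover date and hypothetical price points (none of these values come from a real billing setup):

```python
from datetime import date

# Hypothetical values for illustration only.
CUTOVER = date(2025, 5, 1)   # assumed first day the new price applies
OLD_PRICE_CENTS = 4900
NEW_PRICE_CENTS = 5900


def cohort_for(signup: date) -> str:
    """Label a sign-up by which side of the cutover date it falls on."""
    return "new_price" if signup >= CUTOVER else "old_price"


def price_for(signup: date) -> int:
    """Return the price in cents; it depends only on the sign-up date."""
    return NEW_PRICE_CENTS if cohort_for(signup) == "new_price" else OLD_PRICE_CENTS


print(cohort_for(date(2025, 4, 30)), price_for(date(2025, 4, 30)))
print(cohort_for(date(2025, 5, 2)), price_for(date(2025, 5, 2)))
```

Storing the cohort label on the account at sign-up makes the later comparison of conversion, ARPU, and retention a simple group-by.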

A/B tests: at sign-up, randomly assign each user to see price A or price B. Compare the same metrics.

Pros: controls for confounding variables, faster to a statistically significant result.

Cons: feels unfair if it leaks, harder to implement (your billing logic now needs per-user prices), and existing tooling for pricing A/B tests is bad.
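One common way to implement the assignment is to hash the user ID rather than call a random number generator, so the same user always lands in the same variant on every visit. This is a sketch of that swapped-in technique, not a description of any particular tool; the experiment name and the 50/50 split are assumptions:

```python
import hashlib


def assign_variant(user_id: str, experiment: str = "headline-price-test") -> str:
    """Deterministically map a user to price variant "A" or "B".

    Hashing the experiment name together with the user ID keeps the
    assignment stable across sessions and independent across experiments.
    """
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100  # bucket in 0..99
    return "A" if bucket < 50 else "B"


for uid in ("user-1", "user-2", "user-3"):
    print(uid, assign_variant(uid))
```

The billing system still has to honor the assigned price at checkout, which is the harder part the cons above refer to.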

For most small teams, do cohort tests, accept the slower signal, and be conservative about which differences you treat as real.
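Being conservative can be made concrete with a two-proportion z-test on conversion. The sketch below uses only the Python standard library, and the sign-up counts are made up for illustration. Note that for a cohort test this only addresses sampling noise; it does nothing about seasonality or campaign confounds.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(conv_a: int, n_a: int, conv_b: int, n_b: int) -> tuple[float, float]:
    """Return (z, two-sided p-value) for the difference in conversion rates."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical counts: 120 of 3,000 visitors converted at the old price,
# 96 of 3,000 at the new price.
z, p = two_proportion_z_test(120, 3000, 96, 3000)
print(f"z = {z:.2f}, p = {p:.3f}")
```

With these invented numbers the drop from 4.0% to 3.2% conversion gives a p-value a little under 0.10, which is a reminder that differences that look large on a dashboard can still sit within noise at small-team traffic levels.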

The hierarchy of what's worth testing first

Approximately in order of leverage:

  1. The headline price point. Try a 25% higher anchor on your main tier. This is the single highest-leverage test almost every SaaS could run. Most are underpriced.
  2. The trial length and trial mechanics. 14 vs 30 days. Credit card required vs not. Free tier vs free trial. Each of these changes conversion materially.
  3. The packaging - what's gated by tier. Moving one feature from the lower tier to the higher tier can shift the upgrade rate dramatically without changing any price.
  4. Annual vs monthly default. Defaulting to annual (with a clear monthly option) typically lifts cash-collected per customer by 15-30% with no change in headline price.
  5. The number of tiers. Three is the magic number for B2B SaaS - the "decoy" effect in the middle tier reliably drives upgrades. Two-tier pricing leaves money on the table; four-plus creates decision paralysis.

Gates vs softcaps

The mechanics of how you enforce a tier limit matter more than people think.

A gate stops the user cold. "You've used your 50 included submissions this month. Upgrade to continue." This is high-conversion but high-friction; it can produce frustration and churn.

A softcap charges overage instead. "You've used 50 included submissions; further submissions are $0.10 each, billed monthly." This is low-friction but low-conversion - users keep using and never upgrade.

A nudged softcap warns at 80%, shows the upgrade prompt repeatedly, but lets work continue. This is often the sweet spot - the user doesn't feel punished, but they do feel the friction enough to consider the upgrade.
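The three enforcement styles can be expressed as one policy function, which also makes it easy to hold enforcement constant while testing something else. This is a sketch under assumptions: the names are hypothetical, and it assumes the nudged softcap charges no overage, which the description above leaves open.

```python
from dataclasses import dataclass
from enum import Enum


class Enforcement(Enum):
    GATE = "gate"
    SOFTCAP = "softcap"
    NUDGED_SOFTCAP = "nudged_softcap"


@dataclass(frozen=True)
class Decision:
    allowed: bool              # may the user make this submission?
    overage_cents: int         # extra charge for this submission
    show_upgrade_prompt: bool  # should the UI show the upgrade prompt?


def check_submission(used: int, limit: int, mode: Enforcement,
                     overage_cents: int = 10, warn_at: float = 0.8) -> Decision:
    """Decide what happens when a user with `used` submissions attempts one more."""
    over_limit = used >= limit
    if mode is Enforcement.GATE:
        # Hard stop at the limit, with the upgrade prompt.
        return Decision(allowed=not over_limit, overage_cents=0, show_upgrade_prompt=over_limit)
    if mode is Enforcement.SOFTCAP:
        # Never block; bill each submission past the limit.
        return Decision(True, overage_cents if over_limit else 0, False)
    # Nudged softcap: never block, prompt from 80% of the limit onward.
    # Assumption: no overage charge in this mode.
    return Decision(True, 0, used >= warn_at * limit)


print(check_submission(50, 50, Enforcement.GATE))
print(check_submission(50, 50, Enforcement.SOFTCAP))
print(check_submission(40, 50, Enforcement.NUDGED_SOFTCAP))
```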

For a price-sensitivity test, the gate mechanism is more important to standardize than the price itself. Two tiers at the same price point with different enforcement will produce different upgrade rates, and you might attribute the difference to the wrong cause.

Discount and coupon mechanics

The cleanest pricing experiments are often disguised as discount experiments. The published price doesn't change; the offered discount does. Stripe makes this trivial via coupons applied at checkout. The test:

  • Variant A: 25% off first 3 months at signup.
  • Variant B: 50% off first 1 month at signup.
  • Variant C: No discount, lower headline price by 15%.

All three lower the effective price the user pays in year one, though not by the same amount (on a monthly plan held for 12 months, roughly 6%, 4%, and 15% respectively), and they convert and retain differently. Variants with longer discount periods tend to convert better but retain worse (the cliff at month 3 produces churn). The clean version (lower headline price) produces lower top-line conversion but better LTV.
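It is worth working out the effective year-one price of each variant before the test, so you know how much of any difference comes from the mechanics and how much from the size of the discount. A sketch, assuming a hypothetical $100 monthly list price and a customer who stays all 12 months:

```python
def year_one_revenue(list_price: float, discount: float = 0.0,
                     discount_months: int = 0, months: int = 12) -> float:
    """Revenue from one customer on a monthly plan over `months` months."""
    discounted = min(discount_months, months)
    return list_price * ((1 - discount) * discounted + (months - discounted))


LIST = 100.0  # hypothetical monthly list price
variants = {
    "A: 25% off 3 months": year_one_revenue(LIST, 0.25, 3),
    "B: 50% off 1 month": year_one_revenue(LIST, 0.50, 1),
    "C: 15% lower price": year_one_revenue(LIST * 0.85),
}
for name, revenue in variants.items():
    print(f"{name}: ${revenue:,.2f} ({1 - revenue / (12 * LIST):.1%} below list)")
```

Under these assumptions the variants come out at about 6%, 4%, and 15% below full list price for the year, so the discounts are not equal in size and the comparison measures both mechanics and magnitude.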

When to interview instead of test

Before running any pricing experiment, run pricing interviews with 10-15 prospects (per our interview guide). The questions:

  • "What did you pay for the last tool you adopted in this category?" (anchors them to actual past behavior)
  • "At what price would this be expensive but you'd still buy?" (Van Westendorp's price-sensitivity meter has variations on this)
  • "At what price would it be cheap enough that you'd question the quality?"
  • "Walk me through the approval process to get this expensed at your company."

The last question, in B2B contexts, is usually the most important. If your price needs manager approval at one number and CFO approval at a higher number, those are the real cliff points - not anything you'd discover from a quantitative test.

Patrick Campbell's team at Price Intelligently (now ProfitWell) has written extensively on quantitative pricing research; the methodology there is more robust than the quick interview version, but for teams under $5M ARR the interview approach is usually sufficient and dramatically cheaper.

How long to wait before you trust a pricing test

Conversion rate signal can show up within a week if your traffic is high enough; retention signal takes 3-6 months. The mistake is calling a pricing test based on conversion alone.

A 20% price increase that drops conversion 15% looks neutral by day 7 (slightly more revenue per visitor) but might be a win by day 180 (the smaller cohort retains better because price-insensitive customers churn less). Or the opposite - the higher price keeps the high-value customers out and you bleed slowly.
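The day-7 arithmetic is just price times conversion rate. A quick check, with a hypothetical baseline price and conversion rate (only the +20% and -15% changes come from the example above):

```python
def revenue_per_visitor(price: float, conversion_rate: float) -> float:
    """Expected first-payment revenue per visitor."""
    return price * conversion_rate


# Hypothetical baseline: $100 price, 4% conversion.
baseline = revenue_per_visitor(price=100.0, conversion_rate=0.040)
variant = revenue_per_visitor(price=100.0 * 1.20, conversion_rate=0.040 * 0.85)

lift = variant / baseline - 1
print(f"RPV lift: {lift:.1%}")
```

The lift is 1.20 × 0.85 - 1, or about 2%, whatever the baseline values are. That is small enough that retention over the following months decides the outcome.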

The discipline: don't make a final pricing decision based on a test that hasn't run for at least one full retention cycle. For SMB SaaS that's typically 90 days; for prosumer/B2C, 30 days is usually enough.

The measurement plan template

Every pricing experiment should have these defined upfront:

  • Hypothesis: one sentence, with a directional prediction.
  • Primary metric: revenue per visitor (RPV), measured over the test window.
  • Guardrails: activation rate, retention at 30/60/90 days, support ticket volume.
  • Decision threshold: "we'll adopt the new price if RPV is at least 10% higher with retention not worse than 5pp lower."
  • Test duration: minimum 30 days for the conversion read, 90 days for the retention call.
  • Rollback plan: if guardrails fail, exactly how do we revert.

Most teams skip writing this down, and then six weeks later argue about whether the test "worked." Writing it down before the test runs eliminates the post-hoc rationalization problem.
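One way to make the plan hard to argue with later is to write it as data plus a decision function before the test starts. A sketch only: the class and field names are hypothetical, the defaults mirror the example thresholds in the template, and guardrails other than retention are left out for brevity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MeasurementPlan:
    hypothesis: str
    min_rpv_lift: float = 0.10          # adopt only if RPV is at least 10% higher
    max_retention_drop_pp: float = 5.0  # and retention is at most 5pp lower
    min_days_conversion: int = 30
    min_days_retention: int = 90


def decide(plan: MeasurementPlan, days_run: int,
           rpv_control: float, rpv_variant: float,
           retention_control_pct: float, retention_variant_pct: float) -> str:
    """Apply the pre-registered thresholds; return 'keep running', 'adopt', or 'revert'."""
    if days_run < plan.min_days_retention:
        return "keep running"  # too early for the retention call
    rpv_lift = rpv_variant / rpv_control - 1
    retention_drop = retention_control_pct - retention_variant_pct
    if rpv_lift >= plan.min_rpv_lift and retention_drop <= plan.max_retention_drop_pp:
        return "adopt"
    return "revert"


plan = MeasurementPlan(hypothesis="A 25% higher headline price raises RPV.")
print(decide(plan, 45, 4.0, 4.6, 80.0, 78.0))
print(decide(plan, 90, 4.0, 4.6, 80.0, 78.0))
```

The values passed to decide here are invented. What matters is that the thresholds are fixed in the plan before any results exist.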

The boring conclusion

Pricing experiments are not, structurally, different from any other product experiment. Define a hypothesis. Pick one variable. Build a clean test. Define what success looks like before you start. Wait long enough to see retention signal. Decide. Move on.

The reason teams treat pricing as special is that the social and emotional weight feels bigger - charging customers more feels like a moral act in a way that changing a button color doesn't. It isn't. The customers who'll pay more are the ones who get more value; charging them less is leaving money on the table, which means underinvesting in the product they like. The framework above is the tool to find out where the right number is.

Sources & Further Reading