BeInfi Blog
5 Pricing Models for AI Agents That Actually Work
Back to blog
Pricing AI Agents Strategy Monetization

5 Pricing Models for AI Agents That Actually Work

I
Infi Pulse Team
| | 6 min read

Setting the right price for an AI agent is one of the most underestimated challenges developers face. Charge too much and nobody uses it. Charge too little and you’re paying to work.

Let’s analyze 5 models that real companies are using successfully.

First Things First: Understand Your Costs

Before setting any price, calculate your cost per request:

Cost per request =
  Token cost (input + output) +
  Infra cost (hosting/request) +
  Context cost (embeddings, RAG) +
  Error margin (10-20%)

Example for a chatbot with GPT-4o:
- Input: ~800 tokens * $0.0025/1K = $0.002
- Output: ~400 tokens * $0.01/1K = $0.004
- Infra: ~$0.001/request
- Context (RAG): ~$0.001/request
= $0.008 per request

With this base cost, let’s get to the models.

1. Per-Token Pricing

How it works: Charges directly for the amount of tokens consumed.

Who uses it: OpenAI, Anthropic, Google AI

Example:
- Input tokens: $0.005 / 1K tokens
- Output tokens: $0.015 / 1K tokens

User consuming 100K input + 30K output/month:
= (100 * $0.005) + (30 * $0.015) = $0.50 + $0.45 = $0.95/month

Pros:

  • Perfect alignment between usage and cost
  • Transparent: user knows exactly what they’re paying for
  • Scales naturally with usage

Cons:

  • Hard to predict cost (for the user)
  • Can discourage usage (fear of high bill)
  • Requires precise token tracking

Ideal for: APIs, developer-focused platforms, high volume.

Implementation with Infinitum:

import { Pulse } from '@beinfi/pulse-sdk'
import { pulseMiddleware } from '@beinfi/pulse-sdk/ai'

const pulse = new Pulse(process.env.PULSE_API_KEY!)

// Create separate meters for input and output
const middleware = pulseMiddleware({
  pulse,
  customerId: user.id,
  meters: {
    input: 'input_tokens',   // $0.005/1K
    output: 'output_tokens', // $0.015/1K
  },
})

2. Per-Request Pricing

How it works: Charges a fixed amount per call, regardless of size.

Who uses it: Many AI wrappers, vertical chatbots

Example:
- $0.05 per message
- or $0.10 per document analysis
- or $1.00 per report generation

User making 200 requests/month:
= 200 * $0.05 = $10/month

Pros:

  • Simple to understand
  • Predictable for the user
  • Easy to implement

Cons:

  • Short and long requests cost the same (unfair)
  • Can have negative margin on heavy requests
  • Less granular

Ideal for: Consumer products, simple chatbots, discrete actions.

Implementation with Infinitum:

// Tracking per request (fixed value per call)
const pulse = new Pulse(process.env.PULSE_API_KEY!)

// After each request:
await pulse.metering.track({
  meterId: 'requests',
  customerId: user.id,
  value: 1,
})

3. Tiered Pricing (Plans with Limits)

How it works: Fixed plans with usage limits. Overage charged separately.

Who uses it: ChatGPT Plus, Jasper, Copy.ai

Example:
+------------+----------+---------------+------------+
| Plan       | Price    | Limit         | Overage    |
+------------+----------+---------------+------------+
| Free       | $0       | 50 msgs/day   | Blocked    |
| Pro        | $20/mo   | 2,000 msgs    | $0.02/msg  |
| Business   | $99/mo   | 15,000 msgs   | $0.01/msg  |
| Enterprise | Custom   | Unlimited     | N/A        |
+------------+----------+---------------+------------+

Pros:

  • Familiar: users understand plans
  • Predictable revenue (fixed base + upside)
  • Natural upsell (free -> pro -> business)
  • Easy to communicate on a landing page

Cons:

  • Complexity in billing logic
  • Users get frustrated when hitting limits
  • Need to define the “right” limits (trial and error)

Ideal for: B2C products, SaaS with plan-based growth, freemium apps.

4. Credit-Based Pricing

How it works: User buys credits upfront and spends them as they use.

Who uses it: Midjourney, RunwayML, many generative AI apps

Example:
- Starter Pack: 500 credits for $10
- Pro Pack: 2,500 credits for $40 (20% discount)
- Business Pack: 10,000 credits for $120 (40% discount)

Costs per action:
- Simple message: 1 credit
- Document analysis: 5 credits
- Image generation: 10 credits
- Full report: 25 credits

Pros:

  • Positive cash flow (user pays before using)
  • Natural gamification (remaining credits)
  • Flexible: different actions cost different credits
  • No surprise bills

Cons:

  • Can feel manipulative (obfuscating real price)
  • Expired credits = frustration
  • More complex to implement than per-token

Ideal for: Visual/creative products, mobile apps, consumer market.

5. Hybrid Pricing (Base + Usage)

How it works: Fixed monthly subscription + overage charges.

Who uses it: Vercel, AWS, most cloud providers

Example:
- Base: $29/month (includes 1M tokens)
- Overage input: $0.003/1K tokens
- Overage output: $0.008/1K tokens

Average user (1.5M tokens/month):
= $29 + (500K * $0.005/1K) = $29 + $2.50 = $31.50/month

Heavy user (10M tokens/month):
= $29 + (9M * $0.005/1K) = $29 + $45 = $74/month

Pros:

  • Predictable base revenue
  • Unlimited upside with heavy users
  • Fair: who uses more, pays more
  • Better LTV than pure usage-based

Cons:

  • More complex to communicate
  • Need to define “included usage” right
  • Invoice can vary (uncomfortable for some)

Ideal for: B2B SaaS, developer platforms, products with variable usage.

Which Model to Choose?

Quick flowchart:

Is your product an API?
  -> Yes -> Per-Token Pricing
  -> No  |

Are your users developers?
  -> Yes -> Hybrid Pricing (base + usage)
  -> No  |

Does your product have discrete actions (generate image, analyze doc)?
  -> Yes -> Credit-Based
  -> No  |

Do you want maximum simplicity?
  -> Yes -> Tiered Pricing with plans
  -> No  -> Per-Request Pricing

Price Benchmarks by Vertical

+------------------------+------------------+-------------+
| Vertical               | Common Model     | Avg ARPU    |
+------------------------+------------------+-------------+
| AI Chatbot (B2C)       | Tiered/Credits   | $15-30/mo   |
| AI Coding Assistant    | Hybrid           | $20-50/mo   |
| AI Writing Tool        | Credits          | $25-60/mo   |
| AI Data Analysis       | Per-request      | $50-200/mo  |
| AI API Platform        | Per-token        | $100-1K/mo  |
| AI Customer Support    | Hybrid           | $200-500/mo |
+------------------------+------------------+-------------+

Final Tips

  1. Start simple: Per-token or per-request. You can always add complexity later.
  2. Generous free tier: Let people try before they pay.
  3. Transparency: Show exactly how the price is calculated.
  4. Monitor margin: Review prices monthly vs actual provider costs.
  5. A/B test: Test different models with small cohorts before scaling.

The right pricing model can be the difference between a side project and a $1M ARR business.