Your WordPress AI connector dies on the 4th click: the 30,000 token limit nobody warned you about
You pasted an Anthropic key into the new WordPress 7.0 Connectors screen. You turned on three AI capabilities. You ran a bulk re-tag job on eight posts.
The first three posts re-tagged fine.
The fourth post hung. The fifth threw a generic “AI request failed” toast. The sixth wiped the half-finished edit you had open in another tab.
You just hit the 30,000-input-tokens-per-minute cliff. WordPress did not warn you. The plugin did not warn you. Anthropic’s dashboard will show you the spike about 90 seconds later, after the damage.
The problem: 30,000 tokens per minute is smaller than it sounds
Anthropic’s default rate limit (Tier 1, what every new API key starts on) is 30,000 input tokens per minute on Claude Sonnet. 50,000 on Claude Haiku. You stay at Tier 1 until your organization has deposited $400 cumulatively. New keys, side-project keys, and most freelancer keys live here forever.
To put 30,000 input tokens in human scale:
- A 1,500-word blog post is roughly 2,000 tokens.
- A system prompt for a moderately-instructed plugin is 2,000 to 4,000 tokens.
- A single Claude call that reads one medium-length post, plus its tool definitions, plus the system prompt, plus conversation history, lands between 8,000 and 12,000 tokens.
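That math means three Claude Sonnet calls at the high end (3 × 12,000 = 36,000 tokens) put you over the cap inside a single 60-second window, and a fourth call does it even at the low end (4 × 8,000 = 32,000). Not three sessions. Three calls. A multi-turn editor session is three calls in 20 seconds.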
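A quick back-of-the-envelope version of that arithmetic, using only the estimates quoted above. The constant and function names are ours, picked for illustration; the token figures are rough ranges, not measurements.

```python
# Figures come from the estimates in the text; treat them as rough ranges.
ITPM_LIMIT = 30_000   # Tier 1 input tokens per minute on Sonnet
CALL_LOW = 8_000      # lighter call: post + tools + system prompt + history
CALL_HIGH = 12_000    # heavier call with the same ingredients


def calls_that_fit(limit: int, tokens_per_call: int) -> int:
    """Whole calls that fit inside one minute of uncached input budget."""
    return limit // tokens_per_call


best_case = calls_that_fit(ITPM_LIMIT, CALL_LOW)    # lighter calls
worst_case = calls_that_fit(ITPM_LIMIT, CALL_HIGH)  # heavier calls

print(f"Lighter calls: {best_case} fit, call {best_case + 1} gets the 429")
print(f"Heavier calls: {worst_case} fit, call {worst_case + 1} gets the 429")
```

Either way the next call inside the same minute is the one that fails, which is why the failure shows up on click three or four.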
The wire response when you cross it is a 429 with a retry-after header. The WordPress 7.0 AI Client surfaces that header. Most plugins built against the AI Client do not honor it yet.
What hitting the cliff actually looks like
Picture a real Tuesday afternoon.
You opened wp-admin to fix a typo. While you’re there, your SEO plugin’s nightly cron fires a “summarize and re-tag” job across eight posts. Each one ships about 3,000 tokens of post body and a 2,000-token system prompt. No prompt caching configured. That batch alone is 40,000 input tokens inside 45 seconds.
Post one through three: fine.
Post four: 429. The cron logs it as “provider error” and moves on.
Then your form plugin’s auto-reply capability fires on a new contact submission. Same connector, same Anthropic organization key. 429. The visitor sees a generic “we’ll get back to you” message; the AI-personalized reply they were supposed to receive never sends.
Then you click “Generate description” on a product. 429. This one you see. The button just spins.
You retry. 429. You retry again. 429.
The Anthropic Console will tell you what happened in about 90 seconds. By then you have explained it to a client.
The cliff is not abusive workloads. The cliff is three normal plugins sharing one key.
Why this lands in your lap, not the plugin author’s
The WordPress 7.0 Connectors API is a real improvement. Site owners paste one key at Settings → Connectors. Plugins declare capabilities. Core handles the wiring. Anthropic, OpenAI, and Google all slot in as default providers.
What it does not do today:
- No built-in token-budget throttle. Soft warnings only. Nothing in core stops a request after a 429.
- No prompt-cache convention. Plugins construct prompts however they want. Most will not think about cache breakpoint placement.
- Plugin-to-plugin contention. Every capability backed by the same connector pulls from the same Anthropic organization. Your SEO plugin and your form plugin share the 30,000.
- Default model is Sonnet. The lower of the two ceilings. Plugins asking for “smart” capabilities land on the tighter cap by default.
So when the cliff hits, WordPress logs “provider error.” The plugin author blames the provider. The provider hands you a retry-after header you never saw. The bill for the bad UX lands on you.
Why trust this writeup
We hit this exact wall shipping Riff, the in-admin AI editor inside DesignSetGo Apps Pro. Iteration 4 of a single editor session, 429, session dead, user staring at a broken screen.
The last week of commits has been about getting back under the cap and staying there: bounded tool results, message-history compaction, four-breakpoint cache placement, partial-work snapshots so a 429 stops becoming a total loss. The notes below are field-tested, not theoretical. Receipts in the next section.
DesignSetGo Apps does not hold your Anthropic key. Your site’s WordPress 7.0 Connectors screen does. We read your connector’s quota, surface it inside the app authoring flow, and refuse to ship configurations that we can predict will 429 on day one. That’s the angle we have on this problem. The fixes below work whether you use DesignSetGo Apps or not.
The five things that keep you off the cliff
1. Cap every tool result before it hits message history
The model’s perspective is “I called a tool, I got content, I can use it.” The wire’s perspective is “that content rides along on every future call in this session.”
In Riff, an unranged read_file on a 30,000-token bundle ate the cap on iteration 2. We now hard-truncate any unranged read to 150 lines or 6,144 bytes, whichever hits first, and tell the model to re-read with start_line / end_line if it needs more. Explicit ranges bypass the cap, because the model declared what it actually needs.
For WP AI Client plugin authors: same rule applies to get_post_content, search_posts, read_media_caption_batch, anything you let a tool fetch. Cap the per-call size. Put the cap in the tool description so the model can plan around it.
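Here is a minimal sketch of that cap in Python. It is not Riff's actual code: the function name and the wording of the truncation notice are assumptions, and only the two limits (150 lines, 6,144 bytes) come from the text above.

```python
MAX_LINES = 150     # line cap for unranged reads, from the text above
MAX_BYTES = 6_144   # byte cap for unranged reads, from the text above


def cap_unranged_read(text: str,
                      max_lines: int = MAX_LINES,
                      max_bytes: int = MAX_BYTES) -> tuple[str, bool]:
    """Return (body, truncated). Keeps whole lines until either cap is hit."""
    lines = text.splitlines(keepends=True)
    kept: list[str] = []
    used = 0
    for line in lines[:max_lines]:
        size = len(line.encode("utf-8"))
        if used + size > max_bytes:
            break  # byte cap reached before the line cap
        kept.append(line)
        used += size
    truncated = len(kept) < len(lines)
    body = "".join(kept)
    if truncated:
        # Hypothetical notice; the point is to tell the model how to get more.
        body += (f"\n[truncated after {len(kept)} lines / {used} bytes; "
                 "re-read with start_line / end_line for more]")
    return body, truncated
```

One design choice worth noting: this version never splits a line, so a single line longer than the byte cap returns only the notice. Whether that is acceptable depends on what your tool reads.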
2. Compact stale tool results out of history
A 6,000-token read from iteration 1 does not need to ride along on iteration 5. The model used it once. Replaying it is pure cost.
Riff walks the message history once per iteration. Any read_file result followed by an assistant turn gets elided: metadata (path, bytes, line count) stays, content gets replaced with “call read_file again if you need this.” Write-shaped tools (which return diff summaries) stay intact.
This pattern stopped our per-iteration input from growing linearly with N.
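A sketch of the compaction pass, under stated assumptions: the message shape (role, name, path, content, meta keys) is invented for illustration and is neither Riff's internal format nor any provider's wire format. The rule is the one described above: a read_file result that an assistant turn has already consumed keeps its metadata and loses its content.

```python
from typing import Any

ELIDED_NOTE = "[content elided; call read_file again if you need this]"


def compact_history(messages: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Return a copy with stale read_file results replaced by metadata stubs."""
    # Anything before the most recent assistant turn has already been used.
    last_assistant = max(
        (i for i, m in enumerate(messages) if m["role"] == "assistant"),
        default=-1,
    )
    compacted = []
    for i, msg in enumerate(messages):
        stale_read = (
            msg["role"] == "tool"
            and msg.get("name") == "read_file"
            and i < last_assistant
            and msg["content"] != ELIDED_NOTE  # already compacted earlier
        )
        if stale_read:
            content = msg["content"]
            msg = {
                **msg,
                "content": ELIDED_NOTE,
                "meta": {
                    "path": msg.get("path"),
                    "bytes": len(content.encode("utf-8")),
                    "lines": len(content.splitlines()),
                },
            }
        compacted.append(msg)  # write-shaped tool results pass through intact
    return compacted
```

Running it once per iteration keeps at most the newest read in full, so history grows with the size of the stubs rather than with the size of the files.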
3. Place your four cache breakpoints like you mean it
Anthropic allows 4 ephemeral cache_control breakpoints per request. Use all 4:
system prompt            ← breakpoint 1 (largest static block)
tools[last]              ← breakpoint 2 (whole tool array up to here)
messages[0] (user)       ← breakpoint 3 (brand context, prior_files snapshot, edit workflow)
messages[last] (assist)  ← breakpoint 4 (conversation history through previous turn)
Cached tokens do not count toward ITPM on most models (Claude Haiku 3.5 is the exception, Haiku 4.5 is fine), and they bill at roughly 10% of base input rate. With an 18,000-token prefix cached at breakpoint 1, your effective per-turn budget is whatever changes after that breakpoint, typically a few hundred to a few thousand tokens. The 30,000 ceiling stops being scary.
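For anyone assembling the request body by hand, the placement can be sketched like this. It builds a plain dict in the shape of Anthropic's Messages API (system, tools, messages, with cache_control of type ephemeral on a block) and sends nothing. The helper names are ours, and we assume messages[0] is the user turn carrying the static context.

```python
EPHEMERAL = {"type": "ephemeral"}


def _mark_last_block(message: dict) -> dict:
    """Copy one message and put cache_control on its final content block."""
    content = message["content"]
    if isinstance(content, str):  # normalize shorthand string content
        content = [{"type": "text", "text": content}]
    blocks = [dict(block) for block in content]
    blocks[-1]["cache_control"] = EPHEMERAL
    return {**message, "content": blocks}


def place_breakpoints(system_text: str, tools: list, messages: list) -> dict:
    """Return request fields with the four breakpoints described above."""
    tools = [dict(tool) for tool in tools]
    if tools:
        tools[-1]["cache_control"] = EPHEMERAL            # breakpoint 2
    messages = list(messages)
    if messages:
        messages[0] = _mark_last_block(messages[0])       # breakpoint 3
    for i in range(len(messages) - 1, 0, -1):
        if messages[i]["role"] == "assistant":
            messages[i] = _mark_last_block(messages[i])   # breakpoint 4
            break
    system = [{"type": "text", "text": system_text,
               "cache_control": EPHEMERAL}]               # breakpoint 1
    return {"system": system, "tools": tools, "messages": messages}
```

On the first turn of a session there is no assistant message yet, so only three breakpoints are placed; the fourth appears from the second turn onward.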
Two ways to lose this:
- Breakpoint instability. One byte that changes inside the cached prefix invalidates the cache for every concurrent session sharing it. Pin a baseline fixture in tests. Refuse to merge prompt changes without re-pinning.
- Per-model cache scoping. Haiku’s cache does not warm Sonnet’s. If you route some calls to Haiku for cheapness, you pay two cold prefixes. Commit to one model per workflow, or budget for the warm-up.
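One way to pin a baseline, sketched under assumptions: hash the static prefix and compare it in a test against a value checked into the repo. The function name is hypothetical. The hash covers our own JSON serialization of the prefix, not the bytes the provider actually caches, so read it as a tripwire for accidental prompt edits rather than a guarantee of a cache hit.

```python
import hashlib
import json


def prefix_fingerprint(system_text: str, tools: list) -> str:
    """SHA-256 over the static prefix (system prompt plus tool definitions)."""
    # Keys are deliberately left unsorted so a reordering also changes the hash.
    serialized = json.dumps(
        {"system": system_text, "tools": tools},
        separators=(",", ":"),
        ensure_ascii=False,
    )
    return hashlib.sha256(serialized.encode("utf-8")).hexdigest()


# In a real test suite the baseline would be read from a fixture file.
tools = [{"name": "read_file", "description": "Read up to 150 lines."}]
baseline = prefix_fingerprint("You are a site editor.", tools)

# A timestamp interpolated into the prompt changes the fingerprint.
drifted = prefix_fingerprint("You are a site editor. Now: 2025-01-01", tools)
print("prefix stable:", drifted == baseline)
```

Re-pinning then becomes an explicit step in review: the fixture changes in the same commit as the prompt, or the test fails.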
WP AI Client plugin authors: the AI Client abstraction may not expose cache_control directly today. Check your provider plugin. If you cannot place breakpoints, keep prompts short enough that 30,000 uncached is livable. Long prompts without caching are the unsafe combination.
4. Send less context on every turn
Riff’s “previous app” block used to inline every file under 1KB. On a CSS-heavy app, that was 5,000 to 8,000 tokens before the model decided which file to look at.
We rewrote it tree-only. Only the manifest is inlined (it carries the description, identifiers, routes). Every other file becomes path + size + line count. The model pulls content via read_file on the files it actually wants. One-time cost for the file being edited, instead of an every-turn cost for the whole bundle.
The general rule: before you send a token, ask whether the model needs it this turn, or just might need it eventually. “Might need eventually” content belongs behind a tool call.
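The tree-only idea fits in a few lines. This is an illustrative sketch, not Riff's implementation: the manifest filename and the listing format are assumptions.

```python
def tree_only_context(files: dict[str, str],
                      manifest_path: str = "manifest.json") -> str:
    """Inline the manifest; list every other file as path + size + lines."""
    parts = []
    for path in sorted(files):
        content = files[path]
        if path == manifest_path:
            # The manifest is the one file the model always needs in full.
            parts.append(f"--- {path} (inlined) ---\n{content}")
        else:
            size = len(content.encode("utf-8"))
            lines = len(content.splitlines())
            parts.append(f"{path}  {size} bytes  {lines} lines")
    return "\n".join(parts)


bundle = {
    "manifest.json": '{"name": "demo"}',
    "style.css": "a{}\nb{}\n",
    "app.js": "let x;\n",
}
print(tree_only_context(bundle))
```

The listing costs a handful of tokens per file no matter how large the file is, which is what turns an every-turn cost into a one-time read.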
5. Snapshot partial work so a 429 is not a total loss
This one is not about reducing tokens. It is about surviving the failure when you trip the cap anyway.
Riff snapshots working files as a partial_envelope after every tool dispatch. If a run errors anywhere (validate cap, iteration cap, provider error, rate_limited), we ship whatever the model last produced. The user gets a “partial” banner, the work is preserved, the next turn includes context about what stopped the previous one.
If a 429 blanks an entire edit session and the user starts over, you created two problems: the rate-limit incident, and the trust hit. The trust hit is the bigger one.
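A stripped-down sketch of the snapshot loop. The exception class, the envelope keys, and the step signature are all hypothetical; only the idea (keep the last good state after every dispatch, ship it when the run dies) comes from the description above.

```python
class RateLimited(Exception):
    """Stand-in for whatever your client raises on a 429 (hypothetical)."""


def run_with_snapshots(steps, files: dict) -> dict:
    """Run tool steps in order, keeping the last good state as a snapshot."""
    snapshot = dict(files)
    stopped_by = None
    for step in steps:
        try:
            # Each step works on a copy, so a failure cannot corrupt the snapshot.
            snapshot = step(dict(snapshot))
        except RateLimited:
            stopped_by = "rate_limited"
            break
        except Exception:
            # Simplification: every other failure is reported as provider_error.
            stopped_by = "provider_error"
            break
    return {
        "status": "partial" if stopped_by else "complete",
        "stopped_by": stopped_by,  # feed this into the next turn's context
        "files": snapshot,
    }
```

The caller decides what "partial" means to the user. The only job here is making sure there is always something to show.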
The short checklist for site owners
If you just added an Anthropic key to your WordPress 7.0 Connectors screen:
- Check what tier the key is in. New keys are Tier 1: 30,000 ITPM on Sonnet. You stay there until $400 in cumulative deposits.
- Set your own spend cap in the Anthropic Console, below your tier ceiling. Catch surprise spend at the provider, not the plugin.
- Do not share an API key across unrelated sites. They share the budget and the rate limit.
- Watch the Anthropic Console Usage chart for the first week. It plots hourly max uncached input tokens per minute. If you are routinely brushing the limit, upgrade tiers or constrain the plugins doing the spending.
- Do not turn on every AI capability by default. The Connectors screen makes it tempting. The 30,000 cap does not care about your good intentions.
The short checklist for plugin authors on WP AI Client
- Cap any tool result that returns user content. Make the cap a constant the model can see in the tool description.
- Honor retry-after on 429. The WP AI Client response surface includes rate-limit headers. Do not swallow them. Do not retry in a tight loop.
- Prefer Claude Haiku for routing, classification, short summaries. 50,000 ITPM at Tier 1 vs Sonnet’s 30,000, and significantly cheaper per token.
- Keep prompts stable. Identical prefix bytes are what make prompt caching work. A timestamp or user ID interpolated into the top of your system prompt defeats the cache for every site using your plugin.
- Do not fan out from a single hook. save_post triggering four plugins’ AI calls in parallel is the classic Tier 1 killer. Debounce.
- Surface the limit clearly. “Anthropic Tier 1 rate limit (30,000 input tokens / minute) reached. Retry available in 18 seconds.” beats “AI request failed.”
- Document the spend tier you assume. If your batch capability needs Tier 2, say so on the settings screen.
- Snapshot partial progress. Long-running jobs checkpoint between AI calls so a 429 loses one item, not the whole batch.
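The retry-after item is the easiest to get wrong, so here is a language-neutral sketch of it in Python. It is not WP AI Client code. The send callable returning (status, headers, body), the default wait, and the ceiling are assumptions for illustration, and we assume the header carries a number of seconds.

```python
import time


def retry_after_seconds(headers: dict, default: float = 5.0,
                        ceiling: float = 60.0) -> float:
    """Read retry-after as seconds; fall back to a default, never exceed a ceiling."""
    lowered = {key.lower(): value for key, value in headers.items()}
    try:
        wait = float(lowered.get("retry-after"))
    except (TypeError, ValueError):
        wait = default  # header missing or not a plain number
    return min(max(wait, 0.0), ceiling)


def call_with_retry(send, max_attempts: int = 3, sleep=time.sleep):
    """Call send() until it stops returning 429 or the attempts run out."""
    body = None
    for attempt in range(1, max_attempts + 1):
        status, headers, body = send()
        if status != 429:
            return status, body
        if attempt < max_attempts:
            sleep(retry_after_seconds(headers))  # wait as told, no tight loop
    return 429, body  # give up and let the caller surface a clear message
```

The value returned by retry_after_seconds is also what you would show the user in a "retry available in N seconds" message.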
What we wish were in core
Two things would change the shape of this problem:
- Hard budget enforcement, not soft warnings. A plugin that has burned its per-hour ITPM allocation should be unable to make the call, not warned and allowed through.
- A shared per-organization token bucket that every plugin on a connector draws from, with fair-share scheduling. The only way to stop “plugin A starves plugin B” without every plugin author reimplementing rate-limiting.
Neither exists in core today. The next-best thing is the discipline above: cache aggressively, bound your tool results, honor retry, route to Claude Haiku where you can, plan for partial.
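To make the wish concrete, this is roughly what a shared bucket with hard enforcement looks like. It is a single-process sketch with invented names: nothing like it ships in WordPress core, and it leaves out fair-share scheduling and cross-request storage entirely.

```python
import time


class TokenBucket:
    """One input-token budget that every plugin on a connector draws from."""

    def __init__(self, capacity: int, clock=time.monotonic):
        self.capacity = capacity
        self.refill_per_second = capacity / 60  # full refill over one minute
        self.clock = clock
        self.tokens = float(capacity)
        self.updated = clock()

    def _refill(self) -> None:
        now = self.clock()
        gained = (now - self.updated) * self.refill_per_second
        self.tokens = min(self.capacity, self.tokens + gained)
        self.updated = now

    def try_acquire(self, tokens: int) -> float:
        """Return 0.0 if the call may proceed, else the seconds to wait."""
        self._refill()
        if tokens <= self.tokens:
            self.tokens -= tokens
            return 0.0
        # Hard enforcement: refuse the call instead of warning and allowing it.
        return (tokens - self.tokens) / self.refill_per_second
```

With a 30,000-token bucket, two 12,000-token calls go through and the third is told to wait about 12 seconds instead of being sent and bounced with a 429.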
See the cliff before it sees you
DesignSetGo Apps reads your WordPress 7.0 Connector’s tier and quota, then refuses to ship app configurations we can predict will 429 on day one. You see “this ai.prompt loop will burn 38,000 input tokens per minute on a heavy day, your connector is capped at 30,000” before you publish, not after a client calls.
The 30,000-token cliff is not going to move. You can keep building like it is not there, and find out the same way we did, on iteration 4 of a session you cannot afford to lose. Or you can ship with the rails on.
See how DesignSetGo Apps surfaces your connector’s quota