
“Gateways are all you need.” – Karan Sampath, Anthropic
❓ What You’ll Learn
- How does one OpenAI-shaped endpoint let you swap providers without a rewrite?
- How does a string-swap test separate gateway products from raw infra?
- Could native orchestration routing demote gateways from product to plumbing?
- Why is the signal sitting in gateway logs while production drifts?
- How does fast model refresh eat moats built on repetition and cache?
- How can a thin compliance layer turn an open router into a regulated wedge?
- How could labs pressure neutral routers that consolidate the long tail?
- Why does model fragmentation reward a neutral routing layer over proprietary SDK?
- If tokens go near-passthrough, where do margin and diligence actually shift?
- Which 3 squeezes hit independent gateways at once?
- Which looming compliance clock turns vertical pitches into real procurement?
- Why does one landmark security rollup hint at who buys gateways next?
💎 Why It Matters
OpenRouter pushes over 20 trillion tokens per week and no provider holds more than 23% of the volume.
The model layer just got a switching layer.
🔍 Problem
AI shipped 100+ frontier-grade models in twelve months.
Each model demands a new SDK, auth scheme, rate limit and contract review.
The cost of moving exceeded the cost of being slightly wrong.
💡 Solution
AI Gateway companies (OpenRouter, Cloudflare, Vercel and Portkey) route every major model through one API endpoint.
🏁 Players
Open Aggregators
- OpenRouter • The reference open aggregator with 300+ models behind one OpenAI-compatible endpoint.
Hyperscaler Gateways
- Cloudflare AI Gateway • Edge-distributed gateway with logs, caching, rate limiting and analytics plus unified billing and edge caching extensions in 2026.
- Vercel AI Gateway • One endpoint to hundreds of models with zero markup on tokens, $5/mo free credit per team and tight integration with the AI SDK.
Enterprise Security Gateways
- Portkey • Enterprise AI Gateway and control plane acquired by Palo Alto Networks in April 2026, now folded into Prisma AIRS as the AI Gateway layer with broad model and MCP tool coverage.
- Kong AI Gateway • Extension of Kong’s API management platform into AI traffic with throughput-oriented positioning versus other open proxies.
Generative Media Gateways
- Fal.ai • Reference generative-media gateway with image, video, audio and 3D models behind one fast endpoint.
🔮 Predictions
- Token markup will collapse to 5% or less across every major AI Gateway.
- Margin moves to caching, observability, evals, governance and integration depth.
- Vercel AI Gateway already prices tokens at zero markup with a $5/mo free tier.
- Once one well-funded player commits to passthrough, peers follow because customers won’t easily defend a 20-30% premium over bring-your-own keys.
- AI Gateways will become a default feature inside every major cloud platform.
- Cloudflare and Vercel already ship AI Gateway as a built-in feature.
- AWS Bedrock and Azure AI Foundry are converging on the same shape.
- The competitive cost of not shipping a gateway rises every month as customers route inference through whichever cloud already speaks the protocol.
- At least one frontier lab will ship a first-party multi-provider AI Gateway.
- Karan Sampath’s Anthropic talk reads like a roadmap.
- Frontier labs see gateways as both a threat (commoditization) and opportunity (the developer relationship).
- The lab that ships a gateway routing to its competitors earns trust to be the default routing layer when its own model is the right answer.
☁️ Opportunities
- Sell AI Gateway migration sprints. Offer gateway migration in 30 days for a fixed price.
- Route mapping, observability backfill, billing reconciliation, security review and regression test on production traffic.
- First two engagements come from your own network. Build a repeatable migration playbook from those and turn it into a productized service. The Respan migration guide is one publicly visible example of how the work gets scoped.
- Ship a cross-gateway cost dashboard for ops and finance teams.
- Help to learn which gateway and which model you are over-spending on and by how much.
- The data already exists in customers’ own logs. The gateways have no incentive to flag cheaper alternatives.
- Read gateway logs. Compare models using OpenRouter rankings: “You spent $X on GPT-5 last month. The same prompts on Claude Opus 4.7 would have cost 0.6X with comparable scores on your eval set.”
- Run continuous evals across gateway fleets scoped for inference procurement teams.
- Show how a model swap saved 30% on cost, yet quality dropped just by 12%.
- Every gateway logs requests. Almost none run continuous quality regression tests on the production stream.
- Plug a CLI or SDK into any gateway’s request stream, define golden datasets and LLM-as-judge rubrics and run them on a sample of production traffic. Adjacent products likeBraintrust andLangSmith ship the eval primitives but stay tied to a single framework or SDK.
🏔️ Risks
- Frontier Defection • A frontier lab ships its own multi-provider gateway and the third-party gateway value prop collapses overnight for the developer market.
- Hyperscaler Squeeze • AWS, Azure, GCP, Cloudflare and Vercel ship gateways native to their clouds, leaving independents fighting on geographic reach and integration depth they won’t easily match.
- Margin Collapse • Vercel’s zero-markup pricing forces the rest of the category to passthrough; the historical parallel is Apigee getting squeezed before its $625M Google sale.
🔑 Key Lessons
- Categories form when supply fragments faster than buyers can adapt. Below 25% single-vendor share is the trigger zone. The same forcing function that made API Gateways viable in 2015 just produced AI Gateways in 2026.
- The string-swap test draws the category boundary. If the user changes one parameter to switch vendors, the product is an AI Gateway. If the user has to write a Dockerfile, the product is AI Infrastructure. The two markets compete on different surfaces.
🔥 Hot Takes
- Cache hit rate is the new MRR. The gateways winning on cache today are extracting margin the rest won’t easily catch.
- OpenRouter’s biggest threat is OpenAI shipping its own multi-provider gateway. Frontier labs will route to competitors when their own model isn’t the right answer because that earns the developer trust to be the routing layer when it is.
😠 Haters
“Isn’t this just another wrapper with no real moat?“
The wrapper is the easy part. The moat is the cache, the policy library, the audit logs, the compliance certifications and the integration depth into customer workloads.Vercel’s zero-markup pricing tells you the pure-aggregation layer has no moat. The companies that survive are the ones building the feature layer above it.
“Centralizing prompts through a gateway turns every vendor into the same subprocessor story. One breach or bad retention policy and my customer data is exfiltrated at scale.”
That risk is real and it’s why enterprises buy gateways with private deployment, region pinning, field-level redaction and immutable audit trails. The honest tradeoff is fewer bespoke SDK integrations vs a smaller set of chokepoints that you can monitor.
“This is a single point of failure on someone else’s models. One outage and the whole thing breaks.”
A direct OpenAI integration breaks during an OpenAI outage. A gateway with automatic failover routes to Anthropic, Google or Tencent’s Hy3 Preview during the same outage. The gateway is the resilience layer.
“You’re only hearing from the survivors. 3 other gateways shut down in 2025.”
The category-defining acquisition of PANW-Portkey, the 400% year-over-year token growth on the leading aggregator and the hyperscaler entries (Cloudflare, Vercel, AWS and Azure) all happened in 2026. The graveyard of 2025 is real. The category in 2026 looks materially different.
🔗 Links
- 100 Trillion Token Study • Real usage on which models get volume and how much traffic flows through OpenRouter.
- Router and Load Balancing • How the open-source proxy spreads load, retries after failures and routes across deployments when you run it yourself.
- Billions of Logs and AI Gateway at Scale • Why proxying model calls creates the logs teams lean on to debug issues and watch spend.
📈 Want the full picture?
Why will revenue per token routed be the metric VCs ask first by 2028?
How does Palo Alto announcing intent to acquire Portkey on April 30, 2026 tee up Trends Pro’s forecast of ≥3 acquisitions involving banks, telcos or defense primes?
Which $5M+ ARR vertical gateway thesis hardens once EU AI Act high-risk deadlines hit August 2, 2026?
Why could LangGraph / MCP / Mastra / CrewAI native routing demote gateways from headline product to plumbing?
How does an 18–36 month trust-and-certification lag quietly favor hyperscalers in every RFP scorecard?
What does one-click SOC 2, HIPAA, EU AI Act and ISO 42001 reporting from gateway logs look like when the team is literally two people?
Why might a Shopify App Store–shaped plugin aisle beat launching another bare OpenAI-compatible router?
Why will prompt cache hit rate replace latency as the headline benchmark once buyers compare upstream bypass rates, not milliseconds?
Trends Pro has the answers. Plus 17 players, 6 predictions, 6 opportunities and 10 links.
Get Weekly Reports
Join 54,000+ founders and investors
📈 Unlock Pro Reports, 1:1 Intros and Masterminds
Become a Trends Pro Member and join 1,200+ founders enjoying…
🧠 Founder Mastermind Groups • To share goals, progress and solve problems together, each group is made up of 6 members who meet for 1 hour each Monday.
📖 160+ Trends Pro Reports • To make sense of new markets, ideas and business models, check out our research reports.
💬 1:1 Founder Intros • Make new friends, share lessons and find ways to help each other. Keep life interesting by meeting new founders each week.
💲 100k+ Startup Discounts • Get access to $100k+ in startup discounts on AWS, Twilio, Webflow, ClickUp and more.
Brought to you by the team behind HeadsUp














