
Claude Opus 4.7 Released: What Web Developers Need to Know

Anthropic's Claude Opus 4.7 lands April 16, 2026. New benchmarks, xhigh effort, 2,576px vision, and what it means for coding, agencies, and web development workflows.

Cody New

TheBomb® Editorial

[Hero image: Abstract 3D illustration of a glowing neural network and code streams representing Claude Opus 4.7]

Anthropic shipped Claude Opus 4.7 on April 16, 2026, and the headline number is hard to ignore: 70% on CursorBench, up from 58% for Opus 4.6. That is not a polish release. That is a model that resolves roughly three times more production engineering tasks on Rakuten-SWE-Bench than its predecessor, clears three Terminal-Bench 2.0 tasks no prior Claude could touch, and jumps from 54.5% to 98.5% on XBOW’s visual-acuity benchmark.

At TheBomb®, we have run Opus 4.6 across our dev workflow since late 2025 — planning, refactors, code review, the whole chain. So when Anthropic drops a new Opus tier the same day we are shipping three client builds, we pay attention. This is our read on what actually changed, what it costs, and where Claude Opus 4.7 earns its keep for web developers and agencies in 2026.

Pricing held steady at $5/M input and $25/M output tokens, matching Opus 4.6. The new API identifier is claude-opus-4-7, and it is live on Claude apps, the Anthropic API, Amazon Bedrock, Google Cloud Vertex AI, and Microsoft Foundry. Anthropic’s release notes cover the full announcement.
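If you call the model directly, the switch is a one-line model-string change. A minimal sketch of the request payload follows; only the claude-opus-4-7 identifier comes from the release notes, while the max_tokens value and the prompt are placeholders:

```python
# Build a standard Messages API request body for the new model.
# Only the model identifier is taken from the Opus 4.7 release;
# everything else here is placeholder content.
def build_request(prompt: str, max_tokens: int = 1024) -> dict:
    return {
        "model": "claude-opus-4-7",  # new Opus 4.7 identifier
        "max_tokens": max_tokens,
        "messages": [{"role": "user", "content": prompt}],
    }

req = build_request("Refactor formatInvoice() to reduce complexity.")
print(req["model"])  # claude-opus-4-7
```

Because the endpoint, pricing, and payload shape are unchanged from Opus 4.6, most teams can migrate by updating the model string and then retuning prompts (more on that below).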


What’s New in Claude Opus 4.7

The upgrade is broad rather than single-axis. Six things are meaningfully different:

  • Coding performance: double-digit lifts on internal and external benchmarks, with the largest gains on long-horizon engineering work.
  • Agent autonomy: first Claude to pass implicit-need tests on the Notion Agent eval; Devin reports coherent multi-hour sessions.
  • Vision: input images up to 2,576 px on the long edge (~3.75 MP), roughly 3× the previous cap.
  • New xhigh effort level: slotted between high and max, now the default for Claude Code plans.
  • /ultrareview slash command: deep code review inside Claude Code, with three free runs for Pro and Max users.
  • Task budgets (public beta): hard spend caps for agentic sessions so a runaway loop cannot nuke your month.

The tokenizer also changed — more on that below, because it matters for your bill.


Coding Benchmarks: The Numbers That Matter

Benchmarks are noisy, but the pattern here is consistent: Claude Opus 4.7 is better at writing, reviewing, and debugging real code under agentic conditions.

Benchmark                                        Opus 4.6    Opus 4.7         Source
CursorBench                                      58%         70%              Anthropic
Rakuten-SWE-Bench (production tasks resolved)    baseline    ~3× more         Anthropic
Terminal-Bench 2.0 (previously unsolved tasks)   0           3 cleared        Anthropic
Internal 93-task coding benchmark                baseline    +13% resolution  Anthropic
CodeRabbit code-review recall                    baseline    +10%             Anthropic
XBOW visual acuity                               54.5%       98.5%            Anthropic

That CodeRabbit recall lift deserves its own sentence. Code review is where most AI tools quietly under-deliver — they flag the obvious and miss the subtle. A 10% recall jump on an established review benchmark is exactly the kind of improvement that shows up as fewer regressions in production, which is the only metric that actually pays rent.

Vercel described Claude Opus 4.7 as “phenomenal on one-shot coding tasks,” noting outputs that are more correct and more complete. Cursor called it “a meaningful jump in capabilities” with improved autonomy. Replit reported the same output quality at lower cost thanks to better bug detection. These are not marketing quotes from strangers — these are the teams whose products live or die on model quality.


Agent & Long-Horizon Autonomy: Can It Actually Finish the Job?

Short answer: closer than ever.

Opus 4.7 hit 0.813 on the General Finance eval of the Finance Agent benchmark, up from 0.767. On the research-agent benchmark it tied the top score across all six modules at 0.715. On Databricks’ OfficeQA Pro it made 21% fewer document-reasoning errors, and on Harvey’s BigLaw Bench it posted 90.9% substantive accuracy — the kind of result that gets legal teams to stop white-knuckling every output.

The most interesting eval is the Notion Agent implicit-need test. Opus 4.7 is the first Claude model to pass it, and it posted a 14% improvement over 4.6 on multi-step Notion workflows. “Implicit need” means the model figured out what the user actually wanted, not just what they typed. For agent builders, that is the line between “useful demo” and “ship it.”

Devin reports that Opus 4.7 works coherently for hours and pushes through hard problems instead of bailing. Hex — the data-notebook company — says the model correctly reports missing data instead of fabricating a plausible answer. That second one is underrated. Hallucinated confidence is the single biggest reason teams pull AI out of production workflows.

“Catches logical faults during the planning phase and accelerates execution.” — Intuit engineering team, on Opus 4.7


Vision Improvements: 3.75 Megapixels and Why It Matters

Opus 4.7 accepts images up to 2,576 pixels on the long edge — roughly 3.75 megapixels, a 3×+ resolution lift over Opus 4.6. Paired with the 98.5% XBOW visual-acuity score (up from 54.5%), this is the first time we have trusted a frontier model to read a full-resolution design mock without losing the details.

For agencies, the use cases are immediate:

  • Drop a Figma export or a client’s PDF brand guide straight into the context, legible text and all.
  • Feed a whiteboard photo from a discovery session and get structured notes back.
  • Compare two design variations side-by-side and get specific pixel-level feedback instead of vibes.
  • Audit a competitor’s hero section visually instead of scraping and reconstructing the DOM.
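Before feeding full-resolution mocks into the context, it is worth clamping them to the new cap so nothing gets silently downscaled server-side. A minimal sketch of the dimension math, using the 2,576 px long-edge limit from the release (the function name and workflow are ours):

```python
MAX_LONG_EDGE = 2576  # Opus 4.7 long-edge cap from the release notes

def fit_to_cap(width: int, height: int) -> tuple[int, int]:
    """Scale dimensions down (never up) so the long edge is at most
    2,576 px, preserving aspect ratio."""
    long_edge = max(width, height)
    if long_edge <= MAX_LONG_EDGE:
        return width, height
    scale = MAX_LONG_EDGE / long_edge
    return round(width * scale), round(height * scale)

# A 5152x2896 desktop mock halves cleanly to the cap:
print(fit_to_cap(5152, 2896))  # (2576, 1448)
```

Resizing client-side keeps you in control of what detail survives — text in a brand guide stays legible instead of being resampled by whatever the API does on its end.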

At TheBomb® we have spent 12+ years building websites for Okanagan businesses, and the single most tedious part of the design handoff has always been translating pixel-perfect mocks into implementation tickets. A model that actually reads the mock changes the economics of that work. See how we put this into practice on our web design projects.


What Is the xhigh Effort Level, and When Should You Use It?

Anthropic slotted a new effort tier — xhigh — between high and max. On all Claude Code plans, xhigh is now the default, and Auto mode has been extended to Max users.

Here is the practical hierarchy:

  • low / medium: quick edits, scaffolding, formatting, doc polish.
  • high: feature work with clear scope, routine refactors, test writing.
  • xhigh (new default): multi-file refactors, tricky bug hunts, architectural decisions, the “I’ll be right back after standup” tasks.
  • max: research mode, novel problems, anything where you want the model to explore alternatives.
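If you route tasks to effort tiers programmatically, the hierarchy above is easy to encode. A sketch — the task categories and the pick_effort helper are our own convention, not an Anthropic API; only the tier names come from the release:

```python
# Map task categories to effort tiers. The categories and this helper
# are our own convention; only the tier names are Anthropic's.
EFFORT_BY_TASK = {
    "format": "low",
    "scaffold": "medium",
    "feature": "high",
    "refactor": "xhigh",   # new default tier on Claude Code plans
    "bug_hunt": "xhigh",
    "research": "max",
}

def pick_effort(task_type: str) -> str:
    # Anything unclassified falls back to the new xhigh default.
    return EFFORT_BY_TASK.get(task_type, "xhigh")

print(pick_effort("refactor"))  # xhigh
```

The useful discipline here is less the lookup table than the habit: deciding up front how much effort a task deserves, instead of running everything at max and paying for exploration you did not need.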

The /ultrareview slash command is the other one to actually use. It runs a thorough code-review pass inside Claude Code — Pro and Max users get three free runs per billing cycle. We run it before any merge to main on client projects. Once you have seen it catch a subtle off-by-one in a date picker that two humans missed, you will not ship without it.


What About the Tokenizer Change?

Here is the asterisk Anthropic wants you to see. Opus 4.7 ships with an updated tokenizer, and the same English input can map to 1.0× to 1.35× more tokens than it did on 4.6. Anthropic reports net-positive efficiency gains in their internal coding evals because the model also resolves tasks in fewer total turns, but your first month’s bill might spike anyway.
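To see what that 1.0–1.35× range means in dollars, here is a back-of-envelope estimator using the published $5/M input and $25/M output prices; the monthly token volumes in the example are placeholders:

```python
INPUT_PER_M = 5.00    # USD per million input tokens (published price)
OUTPUT_PER_M = 25.00  # USD per million output tokens (published price)

def monthly_cost(input_tokens_46: float, output_tokens: float,
                 multiplier: float = 1.0) -> float:
    """Estimate spend when the same text maps to `multiplier` times
    as many input tokens as it did on Opus 4.6 (1.0 to 1.35)."""
    in_cost = input_tokens_46 * multiplier / 1e6 * INPUT_PER_M
    out_cost = output_tokens / 1e6 * OUTPUT_PER_M
    return round(in_cost + out_cost, 2)

# Placeholder volumes: 50M input / 5M output tokens a month.
print(monthly_cost(50e6, 5e6, 1.0))   # 375.0  (best case)
print(monthly_cost(50e6, 5e6, 1.35))  # 462.5  (worst case)
```

Note the asymmetry: the tokenizer change only inflates the input side, and output tokens already cost five times as much, so trimming verbose responses usually saves more than trimming context.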

Mitigations that actually work:

  1. Enable task budgets (public beta). Hard cap per session. Non-negotiable for autonomous agents.
  2. Be ruthless about context. Every irrelevant file you paste in costs more than it used to.
  3. Prompt for conciseness. Tell the model “respond in under 300 words unless asked for detail.” It listens.
  4. Watch your output length. Output tokens cost 5× input. A verbose model is an expensive model.

For our development work, the net has been positive — tasks that took two Opus 4.6 turns often close in one 4.7 turn — but the spike is real on the first week if you do not tune for it.


Migration: What to Change in Your Prompts

Anthropic explicitly warns that Opus 4.7 follows instructions more strictly. If your 4.6 prompt worked because the model ignored a bad constraint, 4.7 will not save you. Retune.

Three concrete examples from our own playbooks:

Before (4.6): “Write a React component for a pricing table. Make it look good.”

After (4.7): “Write a React component for a pricing table using our existing Tailwind tokens in src/styles/tokens.css. Match the visual density of PricingCard.tsx. Do not invent new design tokens. Return only the component file.”

Before (4.6): “Refactor this function to be cleaner.”

After (4.7): “Refactor formatInvoice() to reduce cyclomatic complexity below 8. Preserve all existing return shapes. Add JSDoc. Do not change the function signature or touch other files.”

Before (4.6): “Review this PR.”

After (4.7): “Review this PR for: (1) Canadian spelling in user-facing copy, (2) missing aria-label on icon-only buttons, (3) any network call without an AbortController. Ignore styling.”

The stricter you are, the better the output. Vague prompts that coasted on 4.6’s willingness to fill gaps will underwhelm on 4.7.


What Opus 4.7 Means for Web Development and Agencies

For agencies shipping production sites, three things change this week:

  1. Velocity on greenfield work. One-shot coding quality is high enough that scaffolding a new marketing page — content, schema, responsive layout, accessibility — is a single-prompt affair on our web design projects.
  2. Review quality. /ultrareview plus the CodeRabbit recall lift means fewer regressions reach staging. Our QA time on recent Astro builds dropped noticeably.
  3. SEO and content reasoning. Opus 4.7 is better at reading Google’s helpful-content guidance and applying it consistently across a site. If you are tightening your SEO strategy, this is a real upgrade.

It also shifts what you should expect from a web partner. Performance budgets, accessibility, and Core Web Vitals compliance are not heroic efforts anymore — they are table stakes. If your current agency is still quoting weeks for changes a modern Claude Code workflow closes in an afternoon, that is information.


What Claude Opus 4.7 Is NOT

Honest limits, because the marketing will not give you these:

  • It is still less broadly capable than Anthropic’s Mythos Preview on some cross-domain reasoning tasks. Opus 4.7 is the coding and agent specialist in the lineup.
  • It can still hallucinate on genuinely novel edge cases, especially when asked to invent API shapes that do not exist in your context.
  • Instruction literalism cuts both ways. Overconstrain the prompt and you get brittle output. Leave gaps and you get variance. The sweet spot is narrower than on 4.6.
  • Cyber capabilities were intentionally reduced via differential training compared to Mythos, with access for vetted researchers gated through Anthropic’s Cyber Verification Program. This is a feature, but if you are doing offensive security research, plan accordingly.

Safety profile is similar to 4.6 overall, with improved honesty, better prompt-injection resistance, and measurably lower deception and sycophancy. Anthropic’s model documentation has the full breakdown.


Ready to Build With Opus 4.7?

If you are a Vernon or Okanagan business wondering what any of this means for your site, the short version: websites built with a modern AI-assisted pipeline ship faster, review better, and regress less. That is not a sales pitch — it is what the benchmarks say, and it matches what we have watched happen in our own workflow.

Explore our development services, browse recent portfolio work, or read more from Cody New on where the web is heading. When you are ready to talk, get in touch.


Key Takeaways

  • Claude Opus 4.7 launched April 16, 2026 with 70% on CursorBench, 3× more Rakuten-SWE-Bench tasks resolved, and 98.5% on XBOW visual acuity.
  • The new xhigh effort level is the default on Claude Code and hits the sweet spot for multi-file refactors and non-trivial bug work.
  • The updated tokenizer can map the same input to 1.0–1.35× more tokens — use task budgets and tighter prompts to control spend.
  • Stricter instruction adherence means your 4.6 prompts will need retuning; specificity beats verbosity.
  • For agencies, Opus 4.7 features like /ultrareview and 3.75 MP vision change the economics of design handoff, code review, and shipping velocity.

Reading Time

11 Minutes

Category

Technology