AI coding agents, compared like a developer would.

botspot.dev tracks the tools developers actually evaluate in 2026: Cursor, Copilot, Windsurf, Claude Code, Codex, Devin, OpenHands, MCP, and the workflows that survive real code review.


# what we're watching
→ Copilot's June usage-based billing and who should switch to flat-rate editor tools
→ Cursor vs Copilot vs Windsurf now that pricing and workflow fit matter more than novelty
→ Claude Code and Codex splitting into reliable CLI execution layers, not editor replacements
→ Devin, OpenHands, and the actual review overhead behind autonomous AI engineering
→ Open-source coding agents like Aider, Cline, and Hermes Agent as budget-control plays
→ Agent benchmarks that measure accepted outcomes instead of polished demos
→ MCP and A2A becoming the protocol boundary most teams should separate on purpose

What We Cover

The bots, models, and workflows actually shaping how people build with AI.

⌨

Coding Agents

Cursor, Windsurf, GitHub Copilot, Claude Code, Codex, and the tools developers actually compare before they change how a team ships code.

💬

Conversational AI

Claude, ChatGPT, Gemini, Llama, and the model layer that still determines how much review work the tooling creates.

⚙

Autonomous Pipelines

Devin, OpenHands, LangGraph, CrewAI, and the orchestration patterns that either compress delivery or multiply supervision cost.

🔓

Open Source & Local

Hermes, Mistral, Llama, and the open-model options that matter when cost control, sovereignty, or offline workflows are the real requirement.

What's Happening

Quick takes on the coding tools, workflow shifts, and protocol changes developers are actually evaluating this month

Start here · Stack map

The AI coding-agent constellation: the useful 2026 map is editor, CLI, autonomy, and protocol layer

If your shortlist already includes Cursor, Copilot, Windsurf, Claude Code, Codex, Devin, OpenHands, and MCP, stop looking for one winner. This guide maps where each layer actually fits and where the cleanup debt starts.

Updated July 2026

This week · CLI Agents

OpenCode vs Codex CLI: 161K GitHub stars vs GPT-5.6 Sol — two different bets on who owns the model

OpenCode (model-agnostic Go binary from SST) versus Codex CLI (GPT-5.6 Sol default, OpenAI-managed). Not the same product with different names — two different operator philosophies for terminal coding agents.

July 2026

This week · Copilot

Copilot SKILL.md vs Claude Code's 500+ skills — the workflow configuration gap is real

Copilot's SKILL.md gives you GitHub-native skill governance in plain Markdown. Claude Code ships 500+ public skills with granular tool access and multi-agent delegation. Which setup fits your team depends on where your workflow actually lives.

July 2026

This week · Claude Code

Claude Code v2.1.200: the idle subagent fix and Sonnet 5 session changes that actually matter

The July release fixes the most trust-breaking multi-agent bug — subagents silently vanishing from the panel mid-run — and cleans up Sonnet 5 session tracing. If you are running parallel background agents, deploy this promptly.

July 2026

This week · Open Source

Hermes Agent hit #1 on OpenRouter with 271B tokens — here is what that actually signals

NousResearch's Hermes Agent topped OpenRouter's global rankings. That is not marketing noise — it is a data point about where open-weight coding agent adoption is going, and why teams are routing serious workloads through it.

July 2026

This week · BYOK

Continue.dev after the Cursor acquisition: what teams that chose BYOK control should do next

Continue was the cleanest answer for developers who wanted model portability inside VS Code without buying into a managed editor. Now the real question is how much independence survives under Cursor ownership.

July 2026

This week · Claude Code

Claude Managed Agents now run on cron — public beta lets agents execute on a schedule without human prompting

Anthropic's July public beta adds scheduled execution to Managed Agents: set a cron timer, grant CLI tool access and authenticated service credentials, and the agent runs unattended. Real use cases and the security questions you need to answer first.

July 2026

Policy · GitHub Copilot

GitHub started using your Copilot code in April 2026 — here is how to opt out and what the policy actually covers

Since April 24, GitHub uses Copilot inputs, outputs, and code snippets for model training unless you or your org admin disables it. The steps to opt out, what data is in scope, and why enterprise teams should check their org policy setting now.

Updated July 2026

This week · GitHub Copilot

Copilot added GPT-5.6 model lanes and better parallel session ergonomics — here is how to route work without blowing up review time

GitHub's July changelog is not just a model update: it changes workflow design. We break down where multi-model routing improves delivery and where unmanaged parallel sessions create expensive cleanup.

July 2026

This week · Industry

Anthropic's 2026 Agentic Coding Report maps eight trends — how many match your actual experience?

The report is sharp on multi-agent coordination and SDLC transformation. It is quieter on review overhead, the real cost of supervision, and the gap between Terminal-Bench performance and production reliability on legacy codebases.

July 2026

This week · Ecosystem

The AI coding-agent constellation is now a stack design problem

Fresh signals from Anthropic, OpenAI Codex, and Cursor all point in the same direction: teams are standardizing by workflow layer, not betting on one universal coding agent.

July 2026

This week · Claude Code

Claude Code's background agent view is a supervision model upgrade, not just a UX feature

Parallel background agents, unattended execution, and jump-in supervision. The workflow changes are real — and so are the failure modes when task scope is loose. This is what terminal-agent maturity looks like in practice.

June 2026

This week · Codex

GPT-5.6 Sol is now the default Codex model — what changed on July 9 and what to check before you scale

OpenAI flipped Codex to GPT-5.6 Sol on July 9. If you have acceptance tests calibrated on GPT-5.4 outputs, check for drift before scaling. Token throughput is higher — so are the spending implications if budgets are not set explicitly.

July 2026

This week · Cursor

Cursor 1.7 adds Agent Hooks and Team Rules — this is what IDE-level agent governance looks like

Cursor 1.7 ships Hooks (beta) for custom agent control scripts, Team Rules for org-wide BugBot policies, and Agent Autocomplete. These are not features for solo devs — they are the tools platform teams need before standardizing on Cursor at scale.

July 2026

This week · Hermes

Hermes from Nous Research is earning serious attention in BYOK coding stacks

Hermes 3 has become the default open-weight recommendation for developers who want capable instruction following without frontier API pricing. It pairs well with Cline, OpenCode, and Continue.dev.

June 2026

This week · Ornith

Ornith coding models are the coding-specific fine-tuned family worth tracking

Small team, clear focus: Ornith optimizes for agentic context maintenance and tool-use consistency rather than general benchmark scores. Belongs in any serious open-weight evaluation matrix.

June 2026

This week · Open Models

Open-source coding models are now a real shortlist item for developer BYOK stacks

GLM-5.1, Kimi K2.6, and Mistral Large 3 changed the benchmark conversation from hobbyist curiosity to practical routing and cost control for teams that own their model layer.

July 2026

This week · CLI Agents

Claude Code vs Codex CLI is now a workflow design problem, not a model popularity contest

Latest research and changelog signals reinforce the split-stack reality: pair IDE speed with terminal execution, then measure intervention rate before scaling autonomous runs.

July 2026

Start With the Comparison Hub

If you are actively choosing tools, start with the pages that frame the real tradeoffs instead of the marketing categories.

Editors

Cursor vs Copilot vs Windsurf

The editor decision is no longer just about code completion quality. It is about context handling, billing predictability, and how much repair work lands on reviewers.

Read the editor comparison →

CLI agents

Claude Code vs Codex CLI

Both tools are good enough now that workflow shape matters more than brand loyalty. We map where each one belongs in a production engineering loop.

Read the CLI comparison →

Stack design

The AI coding-agent constellation

Use this when the real question is how Cursor, Copilot, Claude Code, Codex, Devin, OpenHands, Hermes, and MCP fit together without multiplying review debt.

Read the stack map →

Autonomous

Devin vs OpenHands

Autonomous coding is where pricing headlines and benchmark claims drift furthest from real implementation cost. Start here before buying the “AI software engineer” pitch.

Read the autonomous comparison →

Need the full map? Open the comparison hub for routing across editors, CLI agents, autonomous workers, open-source stacks, and pricing guides.

Choose the layer that is actually failing first

The useful AI coding agent decision is usually about the workflow constraint: editor fit, terminal execution, monorepo context, security review, or budget control.

Everyday editor loop

Cursor vs Copilot vs Windsurf

Best when the argument is really about AI-native editing versus GitHub-native workflow versus flat-rate pricing inside the IDE.

Route the editor decision →

Terminal execution

Claude Code, Codex, Aider, and Cline

Use this lane when the real requirement is bounded multi-file work with tests, shell commands, and rollback discipline.

Compare the CLI agents →

Large codebases

Context windows are not enough for monorepos

See how Cursor, Copilot, Claude Code, Cline, and Aider behave once the repository is big enough that hidden conventions matter more than raw context size.

See the large-codebase guide →

Security and review

Generated code still fails on boring security work

When the pain is weak auth checks, hallucinated APIs, or thin tests, the right page is the one about review defaults, not vendor marketing.

Read the security guide →

Budget control

Copilot metering changed the shortlist math

Use the cost pages when premium-request billing, flat-rate editor seats, and open-source operator overhead are all part of the real decision.

See the cost breakdowns →

Named-tool map

Need the whole constellation in one place?

Browse the full tool directory when your shortlist already includes Cursor, Copilot, Windsurf, Claude Code, Codex, Devin, OpenHands, MCP, or LangGraph.

Open the tools directory →

Browse the named tools, not just the categories

When you already know the products on your shortlist, the faster route is a tool directory that points you to the right comparison or deep-dive without forcing a generic “AI agent” detour.

Editors / IDEs

Cursor, Copilot, Windsurf, JetBrains AI, Zed, Replit

Use the directory when the real question is which editor surface fits your repo, review culture, and budget model.

CLI agents

Claude Code, Codex CLI, Aider, Continue, Cline

We route terminal-first workflows separately because bounded execution, visible approvals, and test discipline matter more than chat polish.

Autonomous / stack layer

Devin, OpenHands, MCP, A2A, LangGraph, CrewAI, AutoGen

The directory keeps autonomous products, protocols, and orchestration frameworks in view so you can judge where extra coordination really pays off.

Open the tools directory for named-tool routing across editor agents, CLI agents, autonomous workers, protocols, and frameworks.

Developer Priority Brief (July 2026)

The fastest way to stay current this week: act on deadlines, protocol boundaries, and cost controls.

Immediate

Treat provider deprecations as migration drills

If your workflows still depend on aging Claude model lines, run cutover rehearsals now with rollback criteria and explicit ownership.

Use the migration playbook →

Architecture

Split MCP context plumbing from A2A delegation

Teams that keep these lanes separate are getting cleaner traces, easier debugging, and fewer orchestration surprises in production.

See the protocol stack model →

Ops

Track cost per accepted outcome, not token headlines

Model pricing chatter is noisy; what matters is intervention rate, rework, and total cycle-time on accepted results.

Apply the ROI framework →
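
A minimal sketch of the metric described above, assuming you log one record per agent run. The `AgentRun` fields, the reviewer hourly rate, and the sample numbers are all hypothetical illustrations, not data from any tool covered here.

```python
from dataclasses import dataclass


@dataclass
class AgentRun:
    """One agent task attempt. Field names are hypothetical."""
    token_cost_usd: float   # model/API spend for the run
    review_minutes: float   # human time spent reviewing or repairing output
    interventions: int      # times a human had to step in mid-run
    accepted: bool          # whether the result was accepted (e.g. merged)


def cost_per_accepted_outcome(runs, reviewer_rate_usd_per_hour):
    """Total spend (tokens plus review time) divided by accepted results."""
    accepted = sum(1 for r in runs if r.accepted)
    if accepted == 0:
        return None  # nothing was accepted, so the ratio is undefined
    total = sum(
        r.token_cost_usd + r.review_minutes / 60 * reviewer_rate_usd_per_hour
        for r in runs
    )
    return total / accepted


def intervention_rate(runs):
    """Share of runs that needed at least one human intervention."""
    if not runs:
        return 0.0
    return sum(1 for r in runs if r.interventions > 0) / len(runs)


# Made-up sample: rejected runs still count toward total cost.
runs = [
    AgentRun(token_cost_usd=2.0, review_minutes=30, interventions=0, accepted=True),
    AgentRun(token_cost_usd=1.0, review_minutes=60, interventions=2, accepted=False),
    AgentRun(token_cost_usd=3.0, review_minutes=30, interventions=1, accepted=True),
]
print(cost_per_accepted_outcome(runs, reviewer_rate_usd_per_hour=60))
print(intervention_rate(runs))
```

In this sample the token spend is small next to the review time, which is the point of the metric: a cheaper model that doubles rework can still cost more per accepted result.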

Multi-Agent Systems: Progress or Hype?

Diving into the developer debates about orchestration frameworks, coordination costs, and what actually works in production

Frameworks like LangGraph, CrewAI, and the OpenAI Agents SDK are now established enough that teams can compare them on real work instead of conference-demo energy. That shift is healthy. Developers are no longer asking whether multi-agent systems are possible; they are asking when extra planning, memory, and handoffs are worth the latency and debugging cost.

The emerging pattern is disciplined pragmatism. The best implementations pair orchestration with strong traceability, explicit contracts between roles, and fallback paths to simpler single-agent loops. The weakest ones still rely on role-play and hope.
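
One way to picture that pattern is the sketch below, which is framework-agnostic and does not use the LangGraph or CrewAI APIs. `RoleResult`, `run_with_fallback`, and the two stand-in agents are hypothetical names invented for illustration. Every role returns the same explicit result shape, each step is recorded in a trace, and a failed orchestrated run falls back to a single-agent loop.

```python
from dataclasses import dataclass


@dataclass
class RoleResult:
    """Explicit contract every role returns (hypothetical shape)."""
    ok: bool
    output: str


def run_with_fallback(task, orchestrated, single_agent):
    """Try the multi-agent path first, then fall back to one agent.

    Returns (result, trace) so every decision is visible afterwards.
    """
    trace = []
    try:
        result = orchestrated(task)
        trace.append(f"orchestrated ok={result.ok}")
        if result.ok:
            return result, trace
    except Exception as exc:
        # Record the failure instead of hiding it, then fall through.
        trace.append(f"orchestrated raised {type(exc).__name__}")
    result = single_agent(task)
    trace.append(f"single_agent ok={result.ok}")
    return result, trace


# Stand-in agents for the example only.
def flaky_orchestrator(task):
    raise RuntimeError("handoff between roles failed")


def simple_agent(task):
    return RoleResult(ok=True, output=f"done: {task}")


result, trace = run_with_fallback("rename module", flaky_orchestrator, simple_agent)
print(result.output)
print(trace)
```

Returning the trace with the result keeps the fallback decision auditable, which matters more than the orchestration style itself when debugging production runs.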

Want the honest version? Start with the pushback, then compare it to the benchmark debate and the ROI stories coming out of coding agents. That gives you a much clearer picture of what agentic systems can actually sustain.

Read the pushback →

Bot Spotlight

Editorial notes on the models, tools, and agent systems setting the pace.

Anthropic · Claude Code

Claude Code is the terminal agent developers trust on messy, real repositories

The June 2026 Claude Code story is reliability at scale: background agents, parallel execution, and controlled diffs that keep developers in charge of what goes into the repo. The supervision model is the product, not just a safety footnote.

Background agent view lets you delegate and jump in only when needed
Stale-session fixes and safer edits matter because reliability decides daily adoption
300k-token context is unlocking real multi-file and repo-audit workflows

OpenAI · Codex CLI

Codex is maturing into a serious headless execution agent, not just a code model

The Codex pivot in 2026 is from "model that writes code" to "agent that executes tasks in CI." Bedrock routing, headless pipeline support, and weekly changelog updates signal that OpenAI is competing on workflow fit, not just benchmark position.

Headless CI execution is now practical for teams with Bedrock or AWS infra
The agent surface is where the product is, not the raw model
Teams that route task types differently are getting the best economics

SST · OpenCode

OpenCode is the open-source terminal agent that takes model choice seriously

Built by the SST team for infrastructure-heavy codebases, OpenCode is a Go-based terminal coding agent that routes to any model backend. It is the clearest answer to "what if we want agent behavior without the vendor lock-in?"

Supports Anthropic, OpenAI, Bedrock, and local Ollama models in one setup
Explicit approval workflow keeps consequential actions visible and auditable
Best fit for cost-sensitive teams and security-conscious infrastructure environments

Nous Research · Hermes

Hermes is the open-weight coding model that earns its place in serious BYOK stacks

Nous Research's Hermes 3 has become the standard recommendation for teams running BYOK agent setups and wanting open-weight model quality without frontier pricing. The instruction-following consistency it delivers is what makes it useful in production, not just benchmarks.

Competitive with GPT-3.5 class models on coding tasks at significantly lower cost
Pairs naturally with Cline, OpenCode, and Continue.dev for model-portable workflows
Fine-tuning on internal codebases is practical and produces real specialization gains

Why botspot.dev?

The bot space moves too fast to follow casually. We keep a developer-first editorial record of what is changing, what is breaking, and which AI tools are actually earning trust in day-to-day work.

That means faster weekly context, stronger topic pages, and less recycled product copy. If you care about how agents behave in real environments, you're in the right spot.

More about us →

Fresh Reads

Replit AI Agent: when cloud IDE development actually delivers

Zed AI: the fast, open-source editor worth knowing about in 2026

Claude Code background agents: the agent view workflow explained

Codex headless in CI: the June 2026 execution guide

Windsurf in 2026: the flat-rate editor teams should stop skipping

Cline in 2026: the open-source coding agent for teams that want control

Open-source coding agents: where Aider, Cline, Continue, Hermes, and OpenHands really fit

Devin vs OpenHands: autonomous AI engineering costs compared

Amazon Q Developer vs JetBrains AI: enterprise tools the Twitter debate misses

Cursor vs Copilot vs Windsurf: the editor agent tradeoffs that actually matter

Claude Code vs Codex CLI after June updates: where each belongs

GitHub Copilot's usage-based billing: what it actually costs

Copilot Agent Mode + MCP in VS Code: useful upgrade or governance headache?

Managed-agent migration playbook: Claude, Copilot, and Gemini
