Use Abrantes with AI

How to generate experiments (A/B tests) with Abrantes using AI tools and models. Recommendations based on practical tests.

Important disclaimer

This workflow is experimental. Do not rely only on AI to create experiments.

  • Always review generated code before using it.
  • Always test variants and tracking before launch.
  • Treat model output as draft code, not as production-ready code.

Recommended workflow

  1. Prefer a coding-focused AI tool that can read files in a local clone of the Abrantes repo. Some online tools worked in testing, but many did not.
  2. Provide your target URL and experiment hypothesis in the prompt.
  3. Use a skill, or attach the SKILL.md file to the prompt (or reference the Abrantes docs URL if uploads are not supported).
  4. Run and validate the output manually before launch.
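Steps 2 and 3 above can be combined into a single prompt. A minimal sketch, where the URL, hypothesis, and button label are placeholders, not real values:

```text
Read SKILL.md in this repo. Create an Abrantes experiment for
https://example.com/pricing with this hypothesis: moving the
"Start free trial" button above the fold increases signups.
Generate the variant code and the tracking events.
```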

Tools that worked well

chat.z.ai + GLM-5 agent mode

The best online result in these tests. Worked well with the uploaded skill file and prompt.

Open chat.z.ai

Kimi + K2.5 agent

Good online results when prompted with the Abrantes URL.

Open Kimi agent

OpenCode + MiniMax M2.5 free

The desktop workflow worked well: the model scanned the repo context and generated correct code.

Download OpenCode

Gemini CLI + Gemini 3 Flash preview

Good output once permissions were granted for the required actions.

Download Gemini CLI

Codex + GPT 5.3 codex medium

Worked reliably in these tests. Allow curl so the model can access the docs and target pages.

Download Codex

Claude Code + Sonnet 4.6 (Medium)

Worked well using the /experiment skill workflow.

Download Claude Code

Antigravity + Gemini 3.1 Pro (high)

Worked after being directed to use the experiment skill; the initial result was hallucinated.

Download Antigravity

Mistral Vibe + devstral-2

Worked well on the free "experiment" plan.

Download Mistral Vibe

Tools that failed in tests

These results were noted on 2026-02-28 and may change as tools improve.

  • GitHub + Claude Haiku 4.5 (online): could not access the tested page.
  • Gemini online: hallucinated the implementation.
  • Mistral Le Chat: hallucinated the implementation.
  • Grok 4.20 (beta): many selector and implementation mistakes.
  • Qwen 3.5 plus reasoning: hallucinations and struggled with add2dataLayer.
  • DeepSeek: generated hallucinations and ignored the real page context.
  • Google NotebookLM: ignored uploaded SKILL.md guidance.
  • ChatGPT Mac app (free plan): could not access tested URLs directly in this workflow.

Final recommendations

  • Evaluate the tool + model pair, not the model alone.
  • Desktop coding tools can outperform online tools because they can inspect local repo files.
  • If a tool cannot access the target page URL, it cannot generate a working experiment.
  • Before launch, validate selectors, assignments, rendering, persistence, and tracking events.
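The persistence and tracking checks above can be automated. A minimal sketch, assuming a Google Tag Manager-style dataLayer; the key "abrantes_variant" and event "experiment_view" are hypothetical placeholders, not real Abrantes identifiers:

```typescript
// Hypothetical pre-launch checks. The storage key and event name below
// are placeholders; substitute whatever your generated experiment uses.

type DataLayerEvent = { event: string; [key: string]: unknown };

// Check that the variant assignment was persisted (e.g. a localStorage-like store).
function hasPersistedAssignment(storage: Map<string, string>, key: string): boolean {
  const value = storage.get(key);
  return value !== undefined && value.length > 0;
}

// Check that a tracking event with the expected name reached the dataLayer.
function hasTrackingEvent(dataLayer: DataLayerEvent[], eventName: string): boolean {
  return dataLayer.some((e) => e.event === eventName);
}

// Simulated validation run against mocked browser state:
const storage = new Map([["abrantes_variant", "B"]]);
const dataLayer: DataLayerEvent[] = [{ event: "experiment_view", variant: "B" }];

console.log(hasPersistedAssignment(storage, "abrantes_variant")); // true
console.log(hasTrackingEvent(dataLayer, "experiment_view"));      // true
```

Run the same checks in the real page console before launch, replacing the mocks with window.localStorage and window.dataLayer.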
Go to experiment implementation guide →