AI QA Research Practice
Independent practice run by Carlos García. I build evaluation workflows, observe real human-AI interactions, and publish everything openly on GitHub.
Current Focus
Inspect, Promptfoo, custom harnesses.
Claude API + Playwright, end to end.
Observed interaction patterns from real AI-assisted working sessions.
File and line, not vibes.
Public Artifacts
Claude API + Playwright pipeline. A user story goes in; a test plan, Playwright specs, and a bug report come out.
Field observations on AI-human behavioral patterns from real working sessions. Documented openly as they happen.
Controlled experiments on LLM behavior. Exp 001: skill activation reliability (10 runs). Dialogue Dynamics Eval A: a Listener vs. Advisor persona comparison across 15 openers, scored both mechanically and by an LLM-as-judge, with findings documented.
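As a rough illustration of what a repeated-run reliability check like Exp 001 might compute, here is a minimal sketch. It is not the published harness: the names RunResult, activation_rate, and summarize are hypothetical, and the sample data is made up for the example rather than taken from the experiment.

```python
from dataclasses import dataclass


@dataclass
class RunResult:
    """One trial of the same prompt (hypothetical record shape)."""
    run_id: int
    activated: bool  # did the expected skill trigger on this run?


def activation_rate(results: list[RunResult]) -> float:
    """Fraction of runs in which the skill activated."""
    if not results:
        raise ValueError("no runs recorded")
    return sum(r.activated for r in results) / len(results)


def summarize(results: list[RunResult]) -> dict:
    """Collect the rate plus the ids of failed runs for later inspection."""
    failed = [r.run_id for r in results if not r.activated]
    return {
        "runs": len(results),
        "activated": len(results) - len(failed),
        "rate": activation_rate(results),
        "failed_run_ids": failed,
    }


# Illustrative data only: 10 runs, with runs 4 and 9 failing to activate.
sample = [RunResult(i, i not in (4, 9)) for i in range(1, 11)]
report = summarize(sample)
```

Keeping the failed run ids next to the rate makes it possible to go back to the specific transcripts instead of reporting only an aggregate number.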
About
Holteck is an independent AI quality observatory.
The focus is real-world behavior: how AI systems perform in production, what patterns emerge in human-AI interaction, and where better evaluation tooling still needs to be built.
Observations, experiments, and tooling are published openly on GitHub, including the failures.
Based in Monterrey, Mexico.