AI QA Research Practice

Testing AI systems the way 20 years of QA taught me to test software.

Independent practice run by Carlos García in Monterrey. I build evaluation workflows, observe real human-AI interactions, and publish everything openly on GitHub.

Current Focus

01

Reproducible LLM evaluation

Inspect, Promptfoo, custom harnesses.

02

AI-assisted testing pipelines

Claude API + Playwright, end to end.

03

Human-AI interaction quality

Observed interaction patterns from real AI-assisted working sessions.

04

Code-level audit methodology

File and line, not vibes.

Public Artifacts


qa-ai-workflow ACTIVE

Claude API + Playwright pipeline. User story in, test plan, Playwright specs, and bug report out.

View on GitHub → Updated May 2026

ai-human-observatory ACTIVE

Field observations of human-AI behavioral patterns from real working sessions. 20 years of QA practice applied to AI systems.

View on GitHub → Updated May 2026

ai-eval-toolkit IN PROGRESS

Runnable evaluations derived from observed patterns.

View on GitHub → Updated May 2026

About

20 years in QA, now focused on testing, evaluating, and validating AI systems in production. Holteck is where I run the experiments, build the tools, and publish what works; everything goes on GitHub. Based in Monterrey, Mexico, and remote only. Interested in AI product QA, LLM evaluation, and RLHF/quality problems.

20 Years in QA
7 Observations
3 Public Repos
MX Monterrey