AI QA Research Practice

Studying how AI systems behave when no one has specified what they should do.

Independent practice run by Carlos García. I build evaluation workflows, observe real human-AI interactions, and publish everything openly on GitHub.

Current Focus

01

Reproducible LLM evaluation

Inspect, Promptfoo, custom harnesses.

02

AI-assisted testing pipelines

Claude API + Playwright, end to end.

03

Human-AI interaction quality

Observed interaction patterns from real AI-assisted working sessions.

04

Code-level audit methodology

File and line, not vibes.

Public Artifacts


qa-ai-workflow ACTIVE

Claude API + Playwright pipeline. A user story goes in; a test plan, Playwright specs, and a bug report come out.

View on GitHub →
Updated May 2026

ai-human-observatory ACTIVE

Field observations on AI-human behavioral patterns from real working sessions. Documented openly as they happen.

View on GitHub →
Updated May 2026

ai-eval-toolkit IN PROGRESS

Runnable evaluations derived from observed patterns. Work in progress.

View on GitHub →
Updated May 2026

About

Holteck is an independent AI quality observatory.

The focus is real-world behavior: how AI systems perform in production, what patterns emerge in human-AI interaction, and where better evaluation tooling still needs to be built.

Observations, experiments, and tooling are published openly on GitHub, including the failures.

Based in Monterrey, Mexico.

8 Observations
3 Public Repos
Monterrey, MX