Ideal Safety Spec | Adi's Digital Garden

Every AI app deployed in healthcare should have a -

simulation testing harness that works as unit tests
algo-judges that judge real-world usage at scale on all known failure modes
human-domain-experts that judge fraction of real-world usage and benchmark algo-judges and real-world performance of you AI tool
LLM-DAs to judge real-world usage intents and edge cases
observability and monitoring platform that informs you of operational failures - like high latency timeouts, tool call failures etc.
ability to run evals before any critical system configuration update and measure against versions