realitycheck-ai: LLM/AI Response Assertions

Status: Planned for a future release. This module does not ship with v1.0.
Neither Truth nor AssertJ offers anything for testing non-deterministic LLM output. Teams building LLM-powered features typically fall back on ad-hoc checks such as assertTrue(output.contains("expected")), which rely on fragile exact matches. realitycheck-ai aims to be the first assertion module purpose-built for AI-powered Java applications; existing evaluation tools target other ecosystems (DeepEval and Ragas are Python libraries, Promptfoo is a Node.js tool).
<dependency>
    <groupId>io.github.imetaxas</groupId>
    <artifactId>realitycheck-ai</artifactId>
    <version><!-- upcoming --></version>
    <scope>test</scope>
</dependency>
import static io.github.imetaxas.realitycheck.ai.AiReality.*;

assertThatResponse(llmOutput)
    .containsAnyOf("Paris", "paris", "PARIS")
    .doesNotContainAnyOf("Berlin", "London", "Madrid")
    .mentionsAll("capital", "France");

assertThatResponse(llmOutput)
    .hasLengthBetween(50, 500)
    .hasTokenCountBetween(10, 100) // pluggable tokenizer
    .hasSentenceCountBetween(2, 5);
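
How the token and sentence counts would be computed is not specified here. The following is a minimal sketch, assuming a whitespace tokenizer (matching the description of SimpleTokenizer) and the JDK's BreakIterator for sentence boundaries. The class LengthChecks and all of its members are hypothetical names chosen for illustration, not the module's API.

```java
import java.text.BreakIterator;
import java.util.Locale;
import java.util.function.ToIntFunction;

/** Hypothetical helper for illustration; not part of any released realitycheck API. */
final class LengthChecks {

    /** Whitespace tokenizer: counts runs of non-whitespace characters. */
    static final ToIntFunction<String> WHITESPACE_TOKENIZER = text -> {
        String trimmed = text.trim();
        return trimmed.isEmpty() ? 0 : trimmed.split("\\s+").length;
    };

    /** Counts sentences using the JDK's sentence BreakIterator (English rules assumed). */
    static int sentenceCount(String text) {
        String trimmed = text.trim();
        if (trimmed.isEmpty()) {
            return 0;
        }
        BreakIterator it = BreakIterator.getSentenceInstance(Locale.ENGLISH);
        it.setText(trimmed);
        int count = 0;
        it.first();
        // Each boundary after the first marks the end of one sentence.
        while (it.next() != BreakIterator.DONE) {
            count++;
        }
        return count;
    }

    /** Throws an AssertionError with a descriptive message when the value is out of range. */
    static void requireBetween(String what, int actual, int min, int max) {
        if (actual < min || actual > max) {
            throw new AssertionError(
                "expected " + what + " between " + min + " and " + max + " but was: " + actual);
        }
    }

    public static void main(String[] args) {
        String response = "Paris is the capital of France. It sits on the Seine.";
        requireBetween("token count", WHITESPACE_TOKENIZER.applyAsInt(response), 5, 100);
        requireBetween("sentence count", sentenceCount(response), 1, 5);
        System.out.println("length checks passed");
    }
}
```

A whitespace tokenizer will generally undercount relative to a model's subword tokenizer, which is presumably why the tokenizer is meant to be pluggable.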
assertThatResponse(llmOutput)
    .isSemanticallyCloseTo("The capital of France is Paris", 0.85);
// Threshold-based, with clear failure messages:
// "expected semantic similarity >= 0.85 but was: 0.71"
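
One common way to score semantic closeness is cosine similarity between embedding vectors. The sketch below assumes that approach; the source does not say which metric the module will use. The class SemanticSimilarity and its methods are hypothetical, and the vectors are hand-written stand-ins for what an embedding provider would return.

```java
import java.util.Locale;

/** Hypothetical helper for illustration; the real module may use a different metric. */
final class SemanticSimilarity {

    /** Cosine similarity of two equal-length, non-zero vectors. */
    static double cosine(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException(
                "vector lengths differ: " + a.length + " vs " + b.length);
        }
        double dot = 0.0;
        double normA = 0.0;
        double normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        if (normA == 0.0 || normB == 0.0) {
            throw new IllegalArgumentException("zero vector has no direction");
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    /** Fails with the message format quoted above when similarity is below the threshold. */
    static void requireCloseTo(double[] actual, double[] expected, double threshold) {
        double similarity = cosine(actual, expected);
        if (similarity < threshold) {
            throw new AssertionError(String.format(Locale.ROOT,
                "expected semantic similarity >= %.2f but was: %.2f", threshold, similarity));
        }
    }

    public static void main(String[] args) {
        // Stand-in embeddings; a real test would obtain these from an embedding provider.
        double[] expected = {0.9, 0.1, 0.4};
        double[] actual = {0.8, 0.2, 0.5};
        requireCloseTo(actual, expected, 0.85);
        System.out.println("similarity check passed");
    }
}
```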
assertThatResponse(llmOutput)
    .isValidJson()
    .hasJsonField("answer")
    .hasJsonField("confidence");

assertThatResponse(llmOutput)
    .matchesSchema(jsonSchema); // JSON Schema validation

assertThatResponse(llmOutput)
    .isValidMarkdown()
    .containsCodeBlock()
    .containsCodeBlockWithLanguage("java")
    .hasHeadingCount(3)
    .hasNumberedList();

assertThatResponse(llmOutput)
    .matchesFormat("1. ${any}\n2. ${any}\n3. ${any}");
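
The template syntax in matchesFormat could be implemented by translating the template into a regular expression. The sketch below assumes that ${any} stands for one or more characters on a single line and that everything else is matched literally; both are guesses about the intended semantics. FormatTemplate is a hypothetical name.

```java
import java.util.regex.Pattern;

/** Hypothetical helper for illustration; the placeholder semantics are assumed. */
final class FormatTemplate {

    private static final String PLACEHOLDER = "${any}";

    /** Translates a template into a regex: literals are quoted, ${any} becomes ".+?". */
    static Pattern compile(String template) {
        StringBuilder regex = new StringBuilder();
        int from = 0;
        int at;
        while ((at = template.indexOf(PLACEHOLDER, from)) >= 0) {
            regex.append(Pattern.quote(template.substring(from, at)));
            // "." does not match line terminators by default, so a placeholder stays on one line.
            regex.append(".+?");
            from = at + PLACEHOLDER.length();
        }
        regex.append(Pattern.quote(template.substring(from)));
        return Pattern.compile(regex.toString());
    }

    /** True when the whole response matches the template. */
    static boolean matches(String response, String template) {
        return compile(template).matcher(response).matches();
    }

    public static void main(String[] args) {
        String template = "1. ${any}\n2. ${any}\n3. ${any}";
        System.out.println(matches("1. Paris\n2. Lyon\n3. Nice", template));
        System.out.println(matches("1. Paris\n3. Nice", template));
    }
}
```

Quoting the literal segments keeps characters such as the dot after each list number from being interpreted as regex metacharacters.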
assertThatResponse(llmOutput)
    .doesNotContainPII() // regex-based PII detection
    .doesNotContainProfanity()
    .isNotApology(); // "I'm sorry, I can't..."
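
Regex-based PII detection and apology detection can be sketched with a handful of patterns. The patterns below are illustrative assumptions only (an email address, a US Social Security number, and a few refusal openers); they are neither exhaustive nor the patterns the module will ship. SafetyPatterns is a hypothetical name.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;
import java.util.regex.Pattern;

/** Hypothetical helper for illustration; the pattern set is deliberately small. */
final class SafetyPatterns {

    private static final Map<String, Pattern> PII = new LinkedHashMap<>();
    static {
        PII.put("email", Pattern.compile("[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\\.[A-Za-z]{2,}"));
        PII.put("US SSN", Pattern.compile("\\b\\d{3}-\\d{2}-\\d{4}\\b"));
    }

    // Matches responses that open with a typical refusal; accepts straight or curly apostrophes.
    private static final Pattern APOLOGY = Pattern.compile(
        "^\\s*(?:i['\u2019]m sorry|i am sorry|i apologi[sz]e|i can(?:not|['\u2019]t))\\b",
        Pattern.CASE_INSENSITIVE);

    /** Returns the kind of the first PII pattern found, or empty when none match. */
    static Optional<String> firstPiiKind(String text) {
        for (Map.Entry<String, Pattern> entry : PII.entrySet()) {
            if (entry.getValue().matcher(text).find()) {
                return Optional.of(entry.getKey());
            }
        }
        return Optional.empty();
    }

    /** True when the response starts with an apology or refusal phrase. */
    static boolean isApology(String text) {
        return APOLOGY.matcher(text).find();
    }

    public static void main(String[] args) {
        System.out.println(firstPiiKind("Reach me at jane.doe@example.com."));
        System.out.println(isApology("I'm sorry, I can't help with that."));
    }
}
```

Reporting which pattern matched, rather than a bare boolean, makes a failure message more actionable.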
assertThatResponse(llmOutput)
    .onlyMentionsEntitiesFrom(allowedEntities)
    .doesNotInventDates()
    .doesNotContainUrl();

assertThatPrompt(prompt, llmClient)
    .withTrials(5)
    .allResponsesContain("Paris")
    .allResponsesSemanticallyCloseTo("Paris is the capital", 0.80)
    .responseVarianceBelow(0.15); // low divergence across runs
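
The multi-trial check can be sketched as "call the client N times, then assert over the collected responses". How variance will be measured is not stated; the sketch below assumes divergence is 1 minus the mean pairwise Jaccard similarity of the responses' word sets, which is only one possible definition. TrialRunner, its methods, and the Function-based client are hypothetical stand-ins.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.function.Function;

/** Hypothetical helper for illustration; the divergence metric is an assumption. */
final class TrialRunner {

    /** Calls the client the given number of times with the same prompt. */
    static List<String> collect(String prompt, Function<String, String> client, int trials) {
        List<String> responses = new ArrayList<>();
        for (int i = 0; i < trials; i++) {
            responses.add(client.apply(prompt));
        }
        return responses;
    }

    static boolean allContain(List<String> responses, String expected) {
        return responses.stream().allMatch(r -> r.contains(expected));
    }

    /** Lower-cased set of ASCII word tokens in the text. */
    private static Set<String> words(String text) {
        Set<String> out = new HashSet<>();
        for (String w : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (!w.isEmpty()) {
                out.add(w);
            }
        }
        return out;
    }

    /** Jaccard similarity of the two word sets: |intersection| / |union|. */
    static double jaccard(String a, String b) {
        Set<String> wa = words(a);
        Set<String> wb = words(b);
        if (wa.isEmpty() && wb.isEmpty()) {
            return 1.0;
        }
        Set<String> intersection = new HashSet<>(wa);
        intersection.retainAll(wb);
        Set<String> union = new HashSet<>(wa);
        union.addAll(wb);
        return (double) intersection.size() / union.size();
    }

    /** 1 minus the mean pairwise Jaccard similarity; 0.0 means identical word sets. */
    static double divergence(List<String> responses) {
        int pairs = 0;
        double total = 0.0;
        for (int i = 0; i < responses.size(); i++) {
            for (int j = i + 1; j < responses.size(); j++) {
                total += jaccard(responses.get(i), responses.get(j));
                pairs++;
            }
        }
        return pairs == 0 ? 0.0 : 1.0 - total / pairs;
    }

    public static void main(String[] args) {
        // Fake client that alternates between two phrasings of the same answer.
        String[] canned = {"Paris is the capital of France.", "The capital of France is Paris."};
        int[] calls = {0};
        Function<String, String> fakeClient = prompt -> canned[calls[0]++ % canned.length];

        List<String> responses = collect("What is the capital of France?", fakeClient, 5);
        if (!allContain(responses, "Paris")) {
            throw new AssertionError("expected all responses to contain: Paris");
        }
        double d = divergence(responses);
        if (d >= 0.15) {
            throw new AssertionError("expected divergence below 0.15 but was: " + d);
        }
        System.out.println("all trials consistent");
    }
}
```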
assertThatResponse(llmOutput)
    .judgedBy(judgeClient)
    .meetsRubric("Is factually accurate about French geography")
    .scoresAbove(0.8);

assertThatResponse(llmOutput)
    .extractJson() // -> JsonCheck
    .hasField("answer")
    .fieldEquals("answer", "Paris");

assertThatResponse(llmOutput)
    .extractCodeBlock("java") // -> StringCheck
    .contains("public class");
realitycheck-ai (new module)
├── AiReality.java                    // Entry point: assertThatResponse(), assertThatPrompt()
├── ResponseCheck.java                // Core: length, content, format, safety
├── SemanticCheck.java                // Embedding-based similarity
├── DeterminismCheck.java             // Multi-trial consistency
├── JudgeCheck.java                   // LLM-as-judge evaluation
├── SchemaCheck.java                  // JSON Schema conformance
├── FormatCheck.java                  // Markdown/structure validation
├── SafetyCheck.java                  // PII, profanity, hallucination guards
├── spi/
│   ├── EmbeddingProvider.java        // SPI: pluggable embedding model
│   ├── TokenizerProvider.java        // SPI: pluggable tokenizer
│   └── JudgeProvider.java            // SPI: pluggable LLM judge
└── providers/
    ├── OpenAiEmbeddingProvider.java  // Optional: OpenAI embeddings
    └── SimpleTokenizer.java          // Whitespace tokenizer (zero deps)
ResponseCheck (length, content, format, safety) has no AI dependencies. Semantic and judge features go through the SPI, so users bring their own provider.

.extractJson() returns a JsonCheck and .extractCodeBlock() returns a StringCheck, so a chain can continue with the assertions of the extracted type.
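
The type transition can be illustrated with two small fluent classes: assertion methods return this, while an extraction method returns a different check type wrapping the extracted value. This is a sketch under assumptions, not the planned implementation. The nested ResponseCheck and StringCheck classes borrow the names used above, but their bodies, the regex used to find a fenced block, and the outer class TypeTransitionSketch are all hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch of fluent type transitions; not the module's real classes. */
final class TypeTransitionSketch {

    /** Assertions on an extracted string; each method returns this for chaining. */
    static final class StringCheck {
        private final String actual;

        StringCheck(String actual) {
            this.actual = actual;
        }

        StringCheck contains(String expected) {
            if (!actual.contains(expected)) {
                throw new AssertionError(
                    "expected to contain: \"" + expected + "\" but was: \"" + actual + "\"");
            }
            return this;
        }
    }

    /** Assertions on the raw response, plus an extraction step that changes the check type. */
    static final class ResponseCheck {
        // Three backticks, written as unicode escapes to keep this listing fence-safe.
        static final String FENCE = "\u0060\u0060\u0060";

        private final String response;

        ResponseCheck(String response) {
            this.response = response;
        }

        ResponseCheck contains(String expected) {
            if (!response.contains(expected)) {
                throw new AssertionError("expected response to contain: \"" + expected + "\"");
            }
            return this;
        }

        /** Type transition: returns a StringCheck over the first fenced block in that language. */
        StringCheck extractCodeBlock(String language) {
            Pattern block = Pattern.compile(
                Pattern.quote(FENCE + language) + "\\R(.*?)" + Pattern.quote(FENCE),
                Pattern.DOTALL);
            Matcher m = block.matcher(response);
            if (!m.find()) {
                throw new AssertionError("expected a " + language + " code block but found none");
            }
            return new StringCheck(m.group(1));
        }
    }

    static ResponseCheck assertThatResponse(String response) {
        return new ResponseCheck(response);
    }

    public static void main(String[] args) {
        String fence = ResponseCheck.FENCE;
        String llmOutput = "Here is the class:\n" + fence + "java\n"
            + "public class Hello {}\n" + fence + "\nDone.";
        assertThatResponse(llmOutput)
            .contains("Here is")          // still a ResponseCheck
            .extractCodeBlock("java")     // now a StringCheck
            .contains("public class");
        System.out.println("chained checks passed");
    }
}
```

Because the extraction returns a new check type, the compiler only offers assertions that make sense for the extracted value.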