Show HN: Alignmenter – Measure brand voice and consistency across model versions

I built a framework for measuring persona alignment in conversational AI systems.

*Problem:* When you ship an AI copilot, you need it to maintain a consistent brand voice across model versions. But "sounds right" is subjective. How do you make it measurable?

*Approach:* Alignmenter scores three dimensions:

1. *Authenticity*: style similarity (embeddings) + trait patterns (logistic regression) + lexicon compliance + optional LLM judge
2. *Safety*: keyword rules + offline classifier (distilroberta) + optional LLM judge
3. *Stability*: cosine variance across response distributions

The interesting part is calibration: you can train persona-specific models on labeled data. Grid search over component weights, estimate normalization bounds, and optimize for ROC-AUC. (Minimal sketches of the authenticity blend, the stability metric, and the calibration search follow at the end of this post.)

*Validation:* We published a full case study using Wendy's Twitter voice:

- Dataset: 235 turns, 64 on-brand / 72 off-brand (balanced)
- Baseline (uncalibrated): 0.733 ROC-AUC
- Calibrated: 1.0 ROC-AUC, 1.0 F1
- Learned weights: style > traits > lexicon (0.5/0.4/0.1)

Full methodology: https://ift.tt/b2azFnl

The case study includes a full walkthrough so you can reproduce the results yourself.

*Practical use:*

    pip install alignmenter[safety]
    alignmenter run --model openai:gpt-4o --dataset my_data.jsonl

It's Apache 2.0, works offline, and is designed for CI/CD integration.

GitHub: https://ift.tt/3zdVxOC

Interested in feedback on the calibration methodology and whether this problem resonates with others.
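
To make the *Authenticity* blend concrete, here is a minimal sketch of a weighted composite, assuming each component (style, traits, lexicon) is already normalized to [0, 1]. Function names and the normalization step are illustrative, not Alignmenter's actual API:

    # Illustrative sketch of an authenticity composite -- not Alignmenter's real API.
    # Assumes trait and lexicon scores are already normalized to [0, 1].
    import numpy as np

    def style_similarity(response_vec, persona_vec):
        """Cosine similarity between a response embedding and a persona
        centroid, mapped from [-1, 1] into [0, 1]."""
        cos = np.dot(response_vec, persona_vec) / (
            np.linalg.norm(response_vec) * np.linalg.norm(persona_vec))
        return (cos + 1) / 2

    def authenticity(style, traits, lexicon, weights=(0.5, 0.4, 0.1)):
        """Weighted blend of the three components; weights sum to 1."""
        w_style, w_traits, w_lex = weights
        return w_style * style + w_traits * traits + w_lex * lexicon

An optional LLM-judge score could slot in the same way, as a fourth weighted component.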

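For *Stability*, one plausible reading of "cosine variance across response distributions": embed several sampled responses to the same prompts, take each embedding's cosine similarity to the batch centroid, and report the variance. This is my interpretation for illustration; the project's exact formula may differ:

    # Sketch: stability as variance of cosine similarity to the batch centroid.
    # One plausible interpretation -- not necessarily the project's exact formula.
    import numpy as np

    def stability(embeddings):
        """embeddings: (n_responses, dim) array. Lower variance = steadier voice."""
        unit = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
        centroid = unit.mean(axis=0)
        centroid /= np.linalg.norm(centroid)
        sims = unit @ centroid        # cosine similarity of each response to centroid
        return float(np.var(sims))    # spread of the response distribution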

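And the calibration step described above (grid search over component weights, maximizing ROC-AUC on labeled turns) could look roughly like this, where `scores` has one row per turn with (style, traits, lexicon) columns and `labels` marks on-brand turns as 1. Details are assumptions, not the shipped implementation:

    # Sketch: calibrating component weights by grid search on ROC-AUC.
    # Illustrative only -- Alignmenter's calibration may differ in detail.
    import itertools
    import numpy as np
    from sklearn.metrics import roc_auc_score

    def calibrate(scores, labels, step=0.1):
        """scores: (n_turns, 3) component matrix; labels: 1 on-brand, 0 off-brand."""
        best_auc, best_w = -1.0, None
        grid = np.arange(0.0, 1.0 + step, step)
        for w_style, w_traits in itertools.product(grid, grid):
            w_lex = 1.0 - w_style - w_traits
            if w_lex < -1e-9:
                continue              # stay on the simplex: weights >= 0, sum to 1
            w = np.array([w_style, w_traits, max(w_lex, 0.0)])
            auc = roc_auc_score(labels, scores @ w)
            if auc > best_auc:
                best_auc, best_w = auc, tuple(w)
        return best_w, best_auc

A search like this is presumably what surfaced the 0.5/0.4/0.1 split in the case study; estimating normalization bounds would happen before this step.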