CN-NewsTTS Bench: a target-level automatic benchmark for raw-input Chinese news TTS pronunciation

By Shijun Luo

ArXiv · AI/CL/LG · Jun 23, 2026

CN-NewsTTS Bench evaluates how Chinese news TTS products pronounce raw, unedited text, and releases public targets, an automatic scorer, and baseline product results.

Categories: Research

Excerpt

Chinese news text contains dense written forms such as scores, hyphenated model names, ranges, unit symbols, percentages, English abbreviations, and mixed Chinese-Latin-digit names. These forms are frequent in real listening workflows, and a text-to-speech (TTS) system can read them aloud in a way that changes the meaning even though the written string itself is untouched. We introduce CN-NewsTTS Bench v0.1, an open target-level benchmark for evaluating whether Chinese news TTS products pronounce such targets correctly from raw text, without user-side rules, LLM rewriting, SSML hints, or manual edits. The release contains a 200-record development set, an 800-record public test set, 992 public auto-evaluable targets, fixed transcripts from a three-ASR ensemble, an automatic target scorer, and initial results for seven product TTS systems. We additionally report ASR-route diagnostics, ASR-subset ablations, category-level results, confidence intervals, and provider configuration metadata. The best system reaches 0.879 strict accuracy, while several systems remain below 0.60.
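
To make the idea of target-level scoring concrete, here is a minimal Python sketch. It is not the benchmark's released scorer: the `Target` fields, the substring-matching rule, the `min_votes` threshold, and the reading of "strict" as "all three ASR transcripts contain the expected spoken form" are all assumptions made for illustration, as is the choice of a Wilson interval for the confidence interval. The example records are invented.

```python
from dataclasses import dataclass
from math import sqrt


@dataclass
class Target:
    """One pronunciation target (hypothetical schema, not the release format)."""
    record_id: str
    written: str          # form as it appears in the raw news text
    expected_spoken: str  # spoken form the target should be read as


def target_is_correct(target, transcripts, min_votes):
    # Assumed rule: count ASR transcripts that contain the expected spoken form.
    votes = sum(1 for text in transcripts if target.expected_spoken in text)
    return votes >= min_votes


def strict_accuracy(targets, transcripts_by_record, min_votes=3):
    # Assumption: "strict" means every transcript in a three-ASR ensemble agrees.
    if not targets:
        raise ValueError("no targets to score")
    correct = sum(
        target_is_correct(t, transcripts_by_record[t.record_id], min_votes)
        for t in targets
    )
    return correct / len(targets)


def wilson_interval(correct, total, z=1.96):
    # Wilson score interval for a binomial proportion (about 95% when z = 1.96).
    p = correct / total
    denom = 1 + z * z / total
    center = (p + z * z / (2 * total)) / denom
    half = z * sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    return center - half, center + half


# Invented example data: two targets, three ASR transcripts per record.
targets = [
    Target("r1", "3-1", "三比一"),
    Target("r2", "5%", "百分之五"),
]
transcripts = {
    "r1": ["球队三比一获胜", "球队三比一获胜", "球队三比一获胜"],
    "r2": ["增长百分之五", "增长百分之五", "增长百分五"],
}

print(strict_accuracy(targets, transcripts))               # 0.5
print(strict_accuracy(targets, transcripts, min_votes=2))  # 1.0
print(wilson_interval(1, 2))
```

Lowering `min_votes` to 2 turns the rule into a majority vote, which is one way to see how sensitive a score is to the ASR route used for verification.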

Read at source: https://arxiv.org/abs/2606.24714v1