What you see on this site
For every news article in Heimdallr's corpus we publish: the original headline, the link to the source publication, Heimdallr's own short summary, and the signals we computed from the text — topwords, zero-shot theme classification, named entities, machine sentiment, and the country and language of the source.
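As a rough illustration, the published fields listed above could be modeled as a single record. This is only a sketch: the class name `DispatchCard`, the field names, and the choice of one theme per article are assumptions made for the example, not Heimdallr's actual schema.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class DispatchCard:
    """Hypothetical shape of one published article record."""
    headline: str                     # original headline
    source_url: str                   # link to the source publication
    summary: str                      # Heimdallr's own short summary
    topwords: tuple[str, ...] = ()    # extracted topwords
    theme: str = ""                   # zero-shot theme (assumed single label)
    entities: tuple[str, ...] = ()    # named entities
    sentiment: str = ""               # machine sentiment label
    country: str = ""                 # country of the source
    language: str = ""                # language of the source


# Placeholder values only; not real corpus data.
card = DispatchCard(
    headline="Example headline",
    source_url="https://example.com/article",
    summary="A short summary written for the example.",
    topwords=("port", "strike"),
    language="en",
)
record = asdict(card)
```

The record deliberately has no field for article body text: only the headline, link, summary, and derived signals are carried.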
What you don't see
We do not republish the body text of vendor-licensed news articles. Long-form descriptions, translations, and BERT-derived intermediate representations stay inside our pipeline. Every dispatch card links out to the original source for the full read.
How the signals are derived
Articles are ingested from licensed news vendors, cleaned, language-detected, and translated to English where needed. Each article then passes through four stages: a custom, stopword-aware topword extractor; a zero-shot classifier that classifies each piece against seven themes; a named-entity recognition model; and a machine-sentiment classifier. Country attribution combines source location, dateline, and entity geocoding.
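To make the idea of a stopword-aware topword extractor concrete, here is a minimal sketch based on plain word frequency. It is not Heimdallr's extractor, whose method is not described here; the function name `top_words`, the tiny `STOPWORDS` list, and the minimum token length are all assumptions chosen for illustration.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list; a production list
# would be far larger and likely tuned to the corpus.
STOPWORDS = frozenset({
    "the", "a", "an", "and", "of", "to", "in", "on", "for",
    "is", "are", "was", "with", "by", "at", "as",
})


def top_words(text: str, k: int = 5, stopwords=STOPWORDS) -> list[str]:
    """Return the k most frequent non-stopword tokens in text."""
    # Lowercase, then keep alphabetic tokens (with an optional apostrophe part).
    tokens = re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())
    # Drop stopwords and very short tokens before counting.
    counts = Counter(t for t in tokens if t not in stopwords and len(t) > 2)
    return [word for word, _ in counts.most_common(k)]


sample = (
    "The port strike closed the port; the strike enters a second week "
    "as port talks stall."
)
print(top_words(sample, k=2))  # ['port', 'strike']
```

Frequency counting is the simplest possible choice here; a weighting scheme such as TF-IDF across the corpus would be a natural alternative when common words dominate individual articles.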
What a "dispatch" is
The Observatory is published as numbered dispatches — fixed snapshots of the corpus at a moment in time. This is Volume I · 2026-05, Dispatch №001.