Tom Kempton | ML Research

arXiv 2026

Log-Likelihood, Simpson's Paradox, and the Detection of Machine-Generated Text

T. Kempton, Viktor Drobnyi, Maeve Madigan, and Stuart Burrell. arXiv preprint, 2026.

We collect evidence from prior work showing that likelihood-based detectors of machine-generated text in fact target the overconfidence of instruction-tuned models. We show that this overconfidence is not uniform across hidden space, and that the way most detectors aggregate token-level scores significantly weakens the signal. We introduce a local calibration step before averaging, which dramatically improves the performance of state-of-the-art likelihood-based detectors.

ICLR 2026

DMAP: A Distribution Map for Text

T. Kempton, Julia Rozanova, Parameswaran Kamalaruban, Maeve Madigan, Karolina Wresilo, Yoann Launay, David Sutton, and Stuart Burrell.

Introduces DMAP, a statistically rigorous way to visualise where a text sits in the next-token probability distribution of a language model.

AISTATS 2025

TempTest: Local Normalization Distortion and the Detection of Machine-generated Text

T. Kempton, S. Burrell, and C. Cheverall. Proceedings of the 28th International Conference on Artificial Intelligence and Statistics, 2025.

Introduces a detector for machine-generated text based on deficiencies in the way language models perform top-k or temperature sampling.

Findings of ACL: EMNLP 2025

Local Normalization Distortion and the Thermodynamic Formalism of Decoding Strategies for Large Language Models

T. Kempton and S. Burrell. Findings of the Association for Computational Linguistics: EMNLP 2025.

Develops a theoretical description of top-k, nucleus, and temperature-based decoding in the language of equilibrium states, and analyzes how local normalization affects the quality and diversity of generated text.

AAAI 2026 Workshop

Emergent Bias and Fairness in Multi-Agent Decision Systems

Maeve Madigan, Parameswaran Kamalaruban, Glenn Moynihan, T. Kempton, David Sutton, and Stuart Burrell. Accepted at the Workshop on Agentic AI in Financial Services at the AAAI 2026 conference in Singapore.

Studies fairness evaluation in multi-agent predictive systems and shows how emergent bias can arise from collective system behavior rather than any single component.

Submitted

Fairness-Aware Test-Time Prompt Tuning

Yoann Launay, Parameswaran Kamalaruban, T. Kempton, Stuart Burrell, and David Sutton. Submitted.