A Compositional Data Analysis Framework for Diagnosing LLM Reasoning over Time Series Anomalies

Elif Beyza Akyıldız${}^{a}$, Mehmet Ali Erkan${}^{a}$, and Ceylan Yozgatlıgil${}^{a}$

${}^{a}$ Middle East Technical University, Department of Statistics

Large language models (LLMs) are increasingly applied to structured time series reasoning tasks, yet how they allocate attention across sensor channels, and whether that allocation reflects prediction quality, remains poorly understood. Because attention distributions are compositional, applying standard multivariate analysis directly to them ignores the simplex constraint and can lead to misleading correlations. We propose a method that first applies a centered log-ratio (CLR) transformation to the attention distributions and then uses principal component analysis (PCA) to map them into a space that is easier to interpret. Within this framework, biplot visualizations jointly represent attention compositions and sensor contributions, enabling simultaneous interpretation of model behavior and variable influence. This provides an intuitive geometric view of how attention allocation aligns with anomaly detection performance. We evaluated a range of instruction-tuned LLMs on the RATS-40K benchmark [1] across multiple sensor domains and anomaly types, treating each model’s attention output as a compositional vector. Our method provides a simple way to assess how reliable LLMs are in time series reasoning. It requires only attention outputs, generalizes across sensor domains and model architectures, and offers a new geometric lens for explainable AI in multivariate settings.
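The transformation-then-projection step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `clr` and `clr_pca`, the small `eps` offset used to avoid taking the logarithm of zero, and the toy attention matrix are all assumptions made for illustration. Each row is taken to be one attention composition over four hypothetical sensor channels; the code applies the centered log-ratio transformation and then computes PCA scores and loadings through a singular value decomposition, which are the two ingredients a biplot draws.

```python
import numpy as np

def clr(compositions, eps=1e-9):
    """Centered log-ratio transform of row-wise compositions.

    eps is an assumed small offset that guards against log(0).
    """
    x = np.asarray(compositions, dtype=float) + eps
    x = x / x.sum(axis=1, keepdims=True)      # re-close each row to sum to 1
    log_x = np.log(x)
    # Subtracting the row mean of the logs equals dividing by the geometric mean.
    return log_x - log_x.mean(axis=1, keepdims=True)

def clr_pca(compositions, n_components=2):
    """PCA on CLR-transformed compositions via SVD.

    Returns sample scores, variable loadings, and explained variance ratios.
    """
    z = clr(compositions)
    z_centered = z - z.mean(axis=0, keepdims=True)
    u, s, vt = np.linalg.svd(z_centered, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]   # points in a biplot
    loadings = vt[:n_components].T                    # rays (one per channel)
    explained = (s ** 2) / np.sum(s ** 2)
    return scores, loadings, explained[:n_components]

# Hypothetical attention shares over four sensor channels (rows sum to 1).
attention = np.array([
    [0.50, 0.30, 0.15, 0.05],
    [0.40, 0.35, 0.20, 0.05],
    [0.10, 0.20, 0.30, 0.40],
    [0.25, 0.25, 0.25, 0.25],
    [0.05, 0.15, 0.35, 0.45],
])

scores, loadings, explained = clr_pca(attention)
print("scores shape:", scores.shape)
print("loadings shape:", loadings.shape)
print("explained variance ratio:", explained)
```

In a biplot built from these outputs, each row of `scores` would be plotted as a point (one attention composition) and each row of `loadings` as a ray (one sensor channel), so that the direction of a ray indicates which compositions place relatively more attention on that channel.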

Keywords: Time Series Reasoning, Anomaly Detection, Biplot Visualization.

References

[1] Y. Yang, Z. Liu, L. Song, K. Ying, Z. Wang, T. Bamford, S. Vyetrenko, J. Bian, and Q. Wen (2025). TIME-RA: Towards Time Series Reasoning for Anomaly Diagnosis with LLM Feedback. arXiv preprint, arXiv:2507.15066.