Publications on WCET-Aware Compilation

[183644]
Title: Analysis of Shared Cache Interference in Multi-Core Systems using Event-Arrival Curves. In Proceedings of the 31st International Conference on Real-Time Networks and Systems (RTNS)
Written by: Thilo Fischer and Heiko Falk
in: June (2023).
on pages: 23-33
Address: Dortmund / Germany
DOI: 10.1145/3575757.3593643
how published: 23-85 FF23a RTNS

Note: tfischer, hfalk, ESD, WCC

Abstract: Caches are used to bridge the gap between main memory and the significantly faster processor cores. In multi-core architectures, the last-level cache is often shared between cores. However, sharing a cache causes inter-core interference to emerge. Concurrently running tasks will experience additional cache misses as the competing tasks issue interfering accesses and trigger the eviction of data contained in the shared cache. Thus, to compute a task’s worst-case execution time (WCET), a safe bound on the effects of inter-core cache interference has to be determined. In this paper, we propose a novel analysis approach for shared caches using the least recently used (LRU) replacement policy. The presented analysis leverages timing information to produce tight bounds on the worst-case interference. We describe how inter-core cache interference may be expressed as a function of time using event-arrival curves. Thus, by determining the maximal duration between subsequent accesses to a cache block, it is possible to bound the inter-core interference. This enables us to classify accesses as cache hits or potential misses. We implemented the analysis in a WCET analyzer and evaluated its performance for multi-core systems containing 2, 4, and 8 cores using shared caches from 4 KB to 32 KB. The analysis achieves significant improvements compared to a standard interference analysis with WCET reductions of up to 60%. The average WCET reduction is 9% for dual-core, 15% for quad-core, and 11% for octa-core systems. The analysis runtime overhead ranges from a factor of 4× to 7× compared to the baseline analysis.
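The classification idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a simple periodic-with-jitter arrival curve (events counted in a half-open time window), and the names `arrival_curve`, `classify_access`, `local_age`, and `max_reuse_time` are hypothetical. An access is treated as a guaranteed hit only if the block's LRU age, after adding the worst-case number of interfering accesses that can arrive between two accesses to it, stays below the cache associativity.

```python
def arrival_curve(period, jitter=0):
    """Build eta(dt): an upper bound on the number of events in any
    half-open time window of length dt (assumed periodic-with-jitter model)."""
    def eta(dt):
        if dt <= 0:
            return 0
        # Integer ceiling of (dt + jitter) / period, avoiding float rounding.
        return -(-(dt + jitter) // period)
    return eta


def classify_access(local_age, max_reuse_time, interferer_curves, associativity):
    """Classify one access to a block in a shared LRU cache set.

    local_age:         distinct conflicting blocks the own core accessed since
                       the previous access to this block (hypothetical input)
    max_reuse_time:    upper bound on the time between the two accesses
    interferer_curves: one arrival curve per competing core, bounding that
                       core's accesses to the same cache set
    """
    # Pessimistically assume every interfering access ages the block by one.
    interference = sum(eta(max_reuse_time) for eta in interferer_curves)
    worst_case_age = local_age + interference
    return "hit" if worst_case_age < associativity else "potential miss"


if __name__ == "__main__":
    core_b = arrival_curve(period=100)
    # One interferer, 250 time units between accesses, 4-way cache.
    print(classify_access(0, 250, [core_b], associativity=4))
    # Two identical interferers exceed the associativity.
    print(classify_access(0, 250, [core_b, core_b], associativity=4))
```

The sketch shows why timing information tightens the bound: a shorter `max_reuse_time` lowers the value of every arrival curve, so more accesses remain classifiable as hits than under an analysis that assumes unbounded interference.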