Attention sinks in diffusion transformers. (a) In autoregressive LMs, attention sinks often act as stable anchors that attract
dominant attention mass. (b) In diffusion transformers, dominant recipients vary across denoising timesteps; we perform a causal test by
dynamically identifying sink tokens per step and suppressing them during inference. (c) Sink suppression preserves semantic alignment
and preference scores (CLIP-T / ImageReward / HPS-v2), yet can induce perceptual and distributional shifts relative to baseline outputs
(LPIPS / FID-shift), consistent with moving samples within the model’s output manifold.
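The per-step causal test in panel (b) can be sketched as follows. This is a minimal NumPy illustration, not the paper's actual procedure: the threshold `tau`, the mean-received-mass criterion, and the function names are assumptions chosen for clarity. A token is flagged as a sink when the attention mass it receives exceeds a multiple of the uniform share; suppression zeros its columns and renormalizes each row.

```python
import numpy as np

def identify_sinks(attn, tau=4.0):
    """Flag sink tokens in a row-stochastic attention map.

    attn: (num_queries, num_keys) array whose rows sum to 1.
    A key is a sink if its mean received mass exceeds tau times
    the uniform share 1/num_keys (tau is an illustrative choice).
    """
    received = attn.mean(axis=0)            # mean mass each key receives
    uniform = 1.0 / attn.shape[1]
    return np.where(received > tau * uniform)[0]

def suppress_sinks(attn, sink_idx):
    """Zero attention into sink tokens and renormalize each row."""
    out = attn.copy()
    out[:, sink_idx] = 0.0
    out /= out.sum(axis=1, keepdims=True)   # rows remain distributions
    return out

# Toy example: key 3 attracts dominant attention mass.
rng = np.random.default_rng(0)
logits = rng.normal(size=(8, 16))
logits[:, 3] += 5.0
attn = np.exp(logits)
attn /= attn.sum(axis=1, keepdims=True)

sinks = identify_sinks(attn)
clean = suppress_sinks(attn, sinks)
```

In a diffusion transformer this identification would be rerun at every denoising timestep, since (per panel (b)) the dominant recipients change across steps rather than staying fixed as in autoregressive LMs.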