STDec: Spatio-Temporal Stability Guided Decoding for dLLMs
Contributions
(1) Spatio-temporal stability as a decoding signal: We show that dLLM decoding exhibits strong spatial stability and temporal stability: newly decoded tokens tend to appear near decoded neighbors, and many token IDs remain consistent for multiple denoising steps before the final commit.
(2) A simple training-free decoder: We propose STDec, which combines spatial-aware decoding and temporal-aware decoding to replace a single global confidence threshold with token-adaptive thresholds derived from local decoded states and historical ID consistency.
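(3) Strong efficiency-quality trade-off across modalities: Experiments on Dream, LLaDA, and LaViDa show that STDec delivers substantial throughput gains on textual reasoning, code generation, and multimodal understanding (average speedups of 10.03× on Dream-7B-Instruct, 7.60× on LLaDA-8B-Instruct, and 3.57× on LaViDa, with average scores on par with the respective baselines), while remaining compatible with cache-based acceleration methods.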
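The temporal-stability observation in contribution (1) can be made concrete with a per-position counter of how many consecutive denoising steps the predicted token ID has stayed the same. The sketch below is illustrative only: the function name `update_streaks` and the toy token IDs are assumptions, not taken from the STDec code.

```python
from typing import List, Optional


def update_streaks(
    prev_ids: Optional[List[int]],
    curr_ids: List[int],
    streaks: List[int],
) -> List[int]:
    """Count, per position, how many consecutive steps the predicted ID repeated.

    Hypothetical helper for illustration; not the authors' implementation.
    """
    if prev_ids is None:
        # First denoising step: there is no earlier prediction to compare with.
        return [0] * len(curr_ids)
    # Extend the streak when the ID is unchanged, otherwise reset it to zero.
    return [s + 1 if p == c else 0 for p, c, s in zip(prev_ids, curr_ids, streaks)]


# Toy predicted IDs for three masked positions over three denoising steps.
steps = [[5, 7, 9], [5, 8, 9], [5, 8, 2]]
prev, streaks = None, []
for ids in steps:
    streaks = update_streaks(prev, ids, streaks)
    prev = ids
print(streaks)  # [2, 1, 0]
```

A long streak at a masked position is the kind of signal that temporal-aware decoding can use to commit the token earlier.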
Motivation
Existing dLLM decoders usually rely on a single global criterion, such as top-k selection or a fixed confidence threshold. These strategies treat all masked positions equally, even though the denoising process is highly structured. In practice, whether a token is ready to be decoded depends strongly on whether it lies near already decoded neighbors and whether its predicted ID has stabilized across previous steps.
Method
STDec computes token-adaptive decoding thresholds by combining local decoded context with cross-step prediction consistency. Spatial-aware decoding constructs a threshold map from decoded and masked states, then smooths it so positions near decoded neighbors become easier to commit. Temporal-aware decoding further relaxes thresholds for masked tokens whose predicted IDs remain consistent across denoising steps. The final decoder commits tokens when their confidence exceeds their own spatio-temporal threshold instead of a universal global one.
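As a rough illustration of the idea above, the following sketch builds one threshold per position from the decoded/masked state and from ID-consistency streaks, then commits masked tokens whose confidence exceeds their own threshold. Everything specific here is an assumption made for the example: the linear relaxation, the box-window smoothing, and the parameter names (`base_tau`, `spatial_relax`, `temporal_relax`, `min_tau`, `radius`) are hypothetical and may differ from the formulation used in STDec.

```python
from typing import List


def spatio_temporal_thresholds(
    decoded: List[bool],
    stable_steps: List[int],
    base_tau: float = 0.9,
    spatial_relax: float = 0.2,
    temporal_relax: float = 0.05,
    min_tau: float = 0.5,
    radius: int = 1,
) -> List[float]:
    """Return one commit threshold per position (assumed parameterization)."""
    n = len(decoded)
    thresholds = []
    for i in range(n):
        lo, hi = max(0, i - radius), min(n, i + radius + 1)
        neighbors = [decoded[j] for j in range(lo, hi) if j != i]
        # Spatial term: fraction of decoded neighbors in a small window
        # (a box filter stands in for whatever smoothing the method uses).
        frac = sum(neighbors) / len(neighbors) if neighbors else 0.0
        # Temporal term: relax further the longer the predicted ID has been stable.
        tau = base_tau - spatial_relax * frac - temporal_relax * stable_steps[i]
        thresholds.append(max(min_tau, tau))
    return thresholds


def select_commits(
    confidence: List[float], decoded: List[bool], thresholds: List[float]
) -> List[int]:
    """Indices of masked positions whose confidence exceeds their own threshold."""
    return [
        i
        for i, (c, d, t) in enumerate(zip(confidence, decoded, thresholds))
        if not d and c > t
    ]


decoded = [True, False, False, False]
stable_steps = [0, 0, 3, 0]
confidence = [1.0, 0.85, 0.80, 0.80]

taus = spatio_temporal_thresholds(decoded, stable_steps)
print(select_commits(confidence, decoded, taus))            # [1, 2]
print(select_commits(confidence, decoded, [0.9] * 4))       # []
```

In this toy case a single global threshold of 0.9 commits nothing, while the token-adaptive thresholds commit position 1 (next to a decoded neighbor) and position 2 (stable ID for three steps).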
Main Results
| Task | Method | TPS (Dream-7B-Instruct) | Speed (Dream-7B-Instruct) | Score (Dream-7B-Instruct) | TPS (LLaDA-8B-Instruct) | Speed (LLaDA-8B-Instruct) | Score (LLaDA-8B-Instruct) |
|---|---|---|---|---|---|---|---|
| Code | | | | | | | |
| MBPP | Vanilla | 6.57 | 1.00× | 51.40 | 5.68 | 1.00× | 37.60 |
| | + Half-Step | 13.11 | 2.00× | 35.80 | 11.51 | 2.02× | 34.80 |
| | + dKV-Cache | 11.12 | 1.69× | 49.60 | 10.18 | 1.79× | 39.00 |
| | + Fast-dLLM | 56.88 | 8.66× | 54.80 | 62.68 | 11.04× | 37.40 |
| | + LocalLeap | 63.65 | 9.69× | 53.60 | 76.48 | 13.46× | 37.60 |
| | + STDec | 91.16 | 13.88× | 55.60 | 80.47 | 14.17× | 38.40 |
| HumanEval | Vanilla | 11.82 | 1.00× | 59.15 | 10.82 | 1.00× | 48.17 |
| | + Half-Step | 23.69 | 2.00× | 35.37 | 21.67 | 2.00× | 35.37 |
| | + dKV-Cache | 15.21 | 1.29× | 56.10 | 14.23 | 1.32× | 46.95 |
| | + Fast-dLLM | 46.87 | 3.97× | 62.20 | 38.49 | 3.56× | 48.78 |
| | + LocalLeap | 53.80 | 4.55× | 58.54 | 50.33 | 4.65× | 46.34 |
| | + STDec | 64.50 | 5.46× | 60.37 | 52.92 | 4.89× | 48.78 |
| Mathematics & Science | | | | | | | |
| GPQA | Vanilla | 6.95 | 1.00× | 32.83 | 6.24 | 1.00× | 28.79 |
| | + Half-Step | 13.88 | 2.00× | 32.32 | 12.66 | 2.02× | 30.81 |
| | + dKV-Cache | 13.21 | 1.90× | 33.33 | 12.33 | 1.98× | 32.32 |
| | + Fast-dLLM | 88.95 | 12.80× | 34.85 | 62.61 | 10.03× | 28.79 |
| | + LocalLeap | 149.15 | 21.46× | 33.33 | 74.60 | 11.96× | 29.29 |
| | + STDec | 193.62 | 27.86× | 32.83 | 92.08 | 14.76× | 29.29 |
| GSM8K | Vanilla | 4.71 | 1.00× | 83.47 | 4.19 | 1.00× | 78.01 |
| | + Half-Step | 9.41 | 2.00× | 74.22 | 8.25 | 1.97× | 75.82 |
| | + dKV-Cache | 9.83 | 2.09× | 79.08 | 8.82 | 2.11× | 77.63 |
| | + Fast-dLLM | 17.20 | 3.65× | 82.94 | 12.16 | 2.90× | 78.77 |
| | + LocalLeap | 20.71 | 4.40× | 82.49 | 17.01 | 4.06× | 77.98 |
| | + STDec | 23.70 | 5.03× | 82.34 | 16.94 | 4.04× | 78.01 |
| MATH | Vanilla | 12.67 | 1.00× | 44.64 | 11.90 | 1.00× | 40.38 |
| | + Half-Step | 25.31 | 2.00× | 39.40 | 23.64 | 1.99× | 39.44 |
| | + dKV-Cache | 15.56 | 1.23× | 44.04 | 14.73 | 1.24× | 40.90 |
| | + Fast-dLLM | 40.26 | 3.18× | 44.16 | 39.00 | 3.28× | 41.00 |
| | + LocalLeap | 52.35 | 4.13× | 44.26 | 49.93 | 4.20× | 39.66 |
| | + STDec | 55.15 | 4.35× | 44.64 | 53.05 | 4.46× | 39.86 |
| Average | Vanilla | 8.54 | 1.00× | 54.30 | 7.77 | 1.00× | 46.59 |
| | + Half-Step | 17.08 | 2.00× | 43.42 | 15.55 | 2.00× | 43.25 |
| | + dKV-Cache | 12.99 | 1.52× | 52.43 | 12.06 | 1.55× | 47.36 |
| | + Fast-dLLM | 50.03 | 5.86× | 55.79 | 42.99 | 5.53× | 46.95 |
| | + LocalLeap | 67.93 | 7.95× | 54.44 | 53.67 | 6.91× | 46.17 |
| | + STDec | 85.63 | 10.03× | 55.16 | 59.09 | 7.60× | 46.87 |

| Task | Method | TPS | Speed | Score |
|---|---|---|---|---|
| MathVerse | LaViDa w/o Prefix-DLM | 5.71 | 1.00× | 28.30 |
| | + Prefix-DLM | 12.22 | 2.14× | 27.03 |
| | + Fast-dLLM | 11.94 | 2.09× | 28.68 |
| | + LocalLeap | 14.31 | 2.51× | 28.30 |
| | + STDec | 18.22 | 3.19× | 28.30 |
| MathVision | LaViDa w/o Prefix-DLM | 5.34 | 1.00× | 19.74 |
| | + Prefix-DLM | 12.09 | 2.26× | 20.39 |
| | + Fast-dLLM | 10.87 | 2.04× | 20.72 |
| | + LocalLeap | 12.81 | 2.40× | 21.71 |
| | + STDec | 16.71 | 3.13× | 21.71 |
| MathVista | LaViDa w/o Prefix-DLM | 5.66 | 1.00× | 47.20 |
| | + Prefix-DLM | 12.38 | 2.19× | 40.90 |
| | + Fast-dLLM | 15.31 | 2.70× | 47.50 |
| | + LocalLeap | 18.19 | 3.21× | 47.70 |
| | + STDec | 24.66 | 4.36× | 46.20 |
| Average | LaViDa (baseline) | 5.57 | 1.00× | 31.75 |
| | + STDec | 19.86 | 3.57× | 32.07 |
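The Speed columns in the results tables appear to be the ratio of a method's TPS to the TPS of the corresponding baseline row (Vanilla, or LaViDa w/o Prefix-DLM). The page does not state this definition, so treat it as an assumption; the short check below recomputes a few entries from the reported TPS values and compares them with the reported speedups. Small mismatches elsewhere in the tables are expected, since the published TPS values are already rounded.

```python
def speedup(tps_method: float, tps_baseline: float) -> float:
    """Speedup as a plain TPS ratio (assumed definition of the Speed column)."""
    return tps_method / tps_baseline


# (method TPS, baseline TPS, reported speedup), copied from the STDec rows.
rows = {
    "MBPP, Dream-7B-Instruct": (91.16, 6.57, 13.88),
    "GPQA, Dream-7B-Instruct": (193.62, 6.95, 27.86),
    "MathVista, LaViDa": (24.66, 5.66, 4.36),
}

for name, (tps, base, reported) in rows.items():
    print(f"{name}: computed {speedup(tps, base):.2f}x, reported {reported:.2f}x")
```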
Case Study
Case study on multimodal understanding with LaViDa-Reason
STDec preserves the same key objects and relative spatial relations in the generated scene description while substantially reducing decoding time on LaViDa-Reason.
Case study on GSM8K with LLaDA-8B-Instruct
Lily can run 12 kilometers per hour for 4 hours. After that, she runs 6 kilometers per hour. How many kilometers can she run in 8 hours?
To determine how many kilometers Lily can run in 8 hours, we need to break down the problem into two parts: the distance she runs in the first 4 hours and the distance she runs in the next 4 hours.
First, let's calculate the distance she runs in the first 4 hours:
- She runs 12 kilometers per hour for 4 hours.
- Distance = Speed × Time
- Distance in the first 4 hours = 12 km/h × 4 h = 48 km
Next, let's calculate the distance she runs in the next 4 hours:
- She runs 6 kilometers per hour for 4 hours.
- Distance = Speed × Time
- Distance in the next 4 hours = 6 km/h × 4 h = 24 km
Now, we add the distances from both parts to find the total distance she can run in 8 hours:
- Total distance = Distance in the first 4 hours + Distance in the next 4 hours
- Total distance = 48 km + 24 km = 72 km
Therefore, Lily can run a total of 72 kilometers in 8 hours.
To determine how many kilometers Lily can run in 8 hours, we need to break down the problem into two parts: the distance she runs in the first 4 hours and the distance she runs in the next 4 hours.
First, let's calculate the distance she runs in the first 4 hours:
- Speed: 12 kilometers per hour
- Time: 4 hours
- Distance = Speed × Time
- Distance = 12 km/h × 4 h
- Distance = 48 kilometers
Next, let's calculate the distance she runs in the next 4 hours:
- Speed: 6 kilometers per hour
- Time: 4 hours
- Distance = Speed × Time
- Distance = 6 km/h × 4 h
- Distance = 24 kilometers
Now, we add the distances from both parts to find the total distance she can run in 8 hours:
- Total distance = Distance in the first 4 hours + Distance in the next 4 hours
- Total distance = 48 km + 24 km
- Total distance = 72 kilometers
Therefore, Lily can run 72 kilometers in 8 hours.
Case study on GSM8K with LLaDA-8B-Instruct
Joy can read 8 pages of a book in 20 minutes. How many hours will it take her to read 120 pages?
To determine how many hours it will take Joy to read 120 pages, we need to follow these steps:
Step 1: Calculate Joy's reading rate in pages per minute.
Joy reads 8 pages in 20 minutes. Therefore, her reading rate is 8 / 20 = 0.4 pages per minute.
Step 2: Determine the total time required to read 120 pages.
Since Joy's reading rate is 0.4 pages per minute, the time required to read 120 pages is 120 / 0.4 = 300 minutes.
Step 3: Convert the total time from minutes to hours.
There are 60 minutes in an hour, so 300 / 60 = 5 hours.
Therefore, it will take Joy 5 hours to read 120 pages.
To determine how many hours it will take Joy to read 120 pages, we need to follow these steps:
Step 1: Calculate Joy's reading rate in pages per minute.
Joy reads 8 pages in 20 minutes. Therefore, her reading rate is 8 / 20 = 0.4 pages per minute.
Step 2: Determine the total time required to read 120 pages.
Since Joy reads at a rate of 0.4 pages per minute, the total time required to read 120 pages is 120 / 0.4 = 300 minutes.
Step 3: Convert the total time from minutes to hours.
Total time in hours = 300 / 60 = 5 hours.
Final answer: 5.
Citation
@article{chen2026stdec,
title={STDec: Spatio-Temporal Stability Guided Decoding for dLLMs},
author={Chen, Yuzhe and Cao, Jiale and Liu, Xuyang and Xie, Jin and Yang, Aiping and Pang, Yanwei},
journal={arXiv preprint},
year={2026}
}