STDec: Spatio-Temporal Stability Guided Decoding for dLLMs

10.03×
Average speedup on Dream-7B-Instruct
7.60×
Average speedup on LLaDA-8B-Instruct
3.57×
Average speedup on LaViDa-Reason

Highlight of STDec. Compared to existing dLLM acceleration methods, STDec achieves up to 47.80× long-context speedup, delivers 10.03× and 7.60× average speedups on Dream and LLaDA, and raises multimodal throughput by 3.57× on LaViDa-Reason, all while preserving comparable generation quality.

Contributions

(1) Spatio-temporal stability as a decoding signal: We show that dLLM decoding exhibits strong spatial stability and temporal stability: newly decoded tokens tend to appear near decoded neighbors, and many token IDs remain consistent for multiple denoising steps before the final commit.

(2) A simple training-free decoder: We propose STDec, which combines spatial-aware decoding and temporal-aware decoding to replace a single global confidence threshold with token-adaptive thresholds derived from local decoded states and historical ID consistency.

(3) Strong efficiency-quality trade-off across modalities: Experiments on Dream, LLaDA, and LaViDa show that STDec delivers substantial throughput gains on textual reasoning, code generation, and multimodal understanding, while remaining compatible with cache-based acceleration methods.

Motivation

Existing dLLM decoders usually rely on a single global selection rule, such as top-k decoding or a fixed confidence threshold. These strategies treat all masked positions equally, even though the denoising process is highly structured. In practice, whether a token can be decoded safely depends strongly on two things: whether it lies near already decoded neighbors, and whether its predicted ID has already stabilized across previous steps.

Comparison with existing decoding strategies. Top-k and anchor-based decoding policies only partially leverage the spatio-temporal stability signals that emerge during denoising.

Spatial stability. More than 90% of decoded tokens appear next to at least one previously decoded neighbor, suggesting that positions surrounded by decoded tokens are safer to decode.
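
As a rough illustration of how such a spatial statistic could be measured, the sketch below counts the share of tokens that had at least one neighbor decoded at a strictly earlier step. The function name `spatial_stability_rate`, the `radius` parameter, and the toy trace are assumptions made for this example and are not taken from the STDec code.

```python
def spatial_stability_rate(decode_step, radius=1):
    """Share of tokens that had a previously decoded neighbor when committed.

    decode_step[i] is the denoising step at which position i was committed
    (hypothetical trace format, assumed for this sketch).
    """
    n = len(decode_step)
    supported = 0
    for i, step in enumerate(decode_step):
        lo, hi = max(0, i - radius), min(n, i + radius + 1)
        # A neighbor counts only if it was committed at a strictly earlier step.
        if any(decode_step[j] < step for j in range(lo, hi) if j != i):
            supported += 1
    return supported / n


# Toy trace: position 2 is decoded first and decoding spreads outward;
# position 6 is committed while its only neighbor is still masked.
toy_steps = [2, 1, 0, 1, 2, 4, 3]
rate = spatial_stability_rate(toy_steps)  # 5 of 7 tokens had a decoded neighbor
```

On a real model, the trace would come from logging the commit step of every generated position, and the prompt could be treated as decoded before step 0.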

Temporal stability. Nearly 95% of decoded tokens already hold the same Top-1 ID for at least one step before they are committed, indicating that many stable tokens could be decoded earlier than the step at which they are finally decoded.
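
A similar measurement can be sketched for the temporal statistic: for each token, count how many consecutive steps before the commit step already predicted the committed ID. The helper names and the per-token history format below are assumptions for illustration only.

```python
def stable_steps_before_commit(top1_history, commit_step):
    """Consecutive earlier steps whose Top-1 ID equals the ID at the commit step."""
    final_id = top1_history[commit_step]
    count = 0
    for t in range(commit_step - 1, -1, -1):
        if top1_history[t] != final_id:
            break
        count += 1
    return count


def temporal_stability_rate(histories, commit_steps):
    """Share of tokens whose Top-1 ID was stable for at least one earlier step."""
    stable = sum(
        1
        for history, step in zip(histories, commit_steps)
        if stable_steps_before_commit(history, step) >= 1
    )
    return stable / len(histories)


# Toy histories: one list of Top-1 IDs per token, indexed by denoising step.
toy_histories = [[7, 7, 7], [3, 5, 5], [9, 2, 4]]
toy_commit_steps = [2, 2, 2]
temporal_rate = temporal_stability_rate(toy_histories, toy_commit_steps)  # 2 of 3
```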

Method

STDec computes token-adaptive decoding thresholds by combining local decoded context with cross-step prediction consistency. Spatial-aware decoding constructs a threshold map from decoded and masked states, then smooths it so positions near decoded neighbors become easier to commit. Temporal-aware decoding further relaxes thresholds for masked tokens whose predicted IDs remain consistent across denoising steps. The final decoder commits tokens when their confidence exceeds their own spatio-temporal threshold instead of a universal global one.

Overall framework of STDec. The decoder first builds a spatially smoothed threshold map from current decoding states, then applies temporal relaxation to tokens with stable predicted IDs, and finally decodes tokens using the resulting token-adaptive thresholds.
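
The three stages described above can be sketched in a few lines of plain Python. This is a minimal sketch under stated assumptions, not the authors' implementation: the box-filter smoothing, the multiplicative relaxation, and the constants `base`, `decoded_value`, `radius`, and `relax` are all hypothetical choices made for illustration.

```python
def spatial_thresholds(is_decoded, base=0.9, decoded_value=0.5, radius=1):
    """Build a threshold map from decoding states, then smooth it.

    Decoded positions get a low value, masked positions get the base threshold.
    A box filter (assumed here) lowers thresholds near decoded neighbors.
    """
    raw = [decoded_value if d else base for d in is_decoded]
    n = len(raw)
    smoothed = []
    for i in range(n):
        window = raw[max(0, i - radius): min(n, i + radius + 1)]
        smoothed.append(sum(window) / len(window))
    return smoothed


def temporal_relax(thresholds, prev_ids, curr_ids, relax=0.8):
    """Relax thresholds where the predicted ID matches the previous step."""
    return [
        t * relax if p == c else t
        for t, p, c in zip(thresholds, prev_ids, curr_ids)
    ]


def select_tokens(confidence, is_decoded, prev_ids, curr_ids):
    """Commit masked positions whose confidence exceeds their own threshold."""
    thresholds = temporal_relax(spatial_thresholds(is_decoded), prev_ids, curr_ids)
    return [
        i
        for i, (c, t, d) in enumerate(zip(confidence, thresholds, is_decoded))
        if not d and c > t
    ]


# Toy step: position 0 is already decoded, position 2 changed its predicted ID.
is_decoded = [True, False, False, False, False]
confidence = [1.0, 0.8, 0.8, 0.8, 0.95]
prev_ids = [11, 42, 7, 5, 9]
curr_ids = [11, 42, 8, 5, 9]
committed = select_tokens(confidence, is_decoded, prev_ids, curr_ids)  # [1, 3, 4]
```

In this toy step a single global threshold of 0.9 would commit only position 4, while the token-adaptive thresholds also commit positions 1 and 3. Position 2 stays masked because its predicted ID changed and it has no decoded neighbor.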

Main Results

Main textual reasoning results on Dream-7B-Instruct and LLaDA-8B-Instruct. The table reports throughput (TPS), speedup relative to vanilla decoding, and task score on code generation and mathematical reasoning benchmarks. STDec reaches the highest throughput on every benchmark except GSM8K on LLaDA, where it is on par with LocalLeap, and has the highest average throughput on both models, with scores comparable to vanilla decoding.

		Dream-7B-Instruct			LLaDA-8B-Instruct
Task	Method	TPS	Speed	Score	TPS	Speed	Score
Code
MBPP	Vanilla	6.57	1.00×	51.40	5.68	1.00×	37.60
	+ Half-Step	13.11	2.00×	35.80	11.51	2.02×	34.80
	+ dKV-Cache	11.12	1.69×	49.60	10.18	1.79×	39.00
	+ Fast-dLLM	56.88	8.66×	54.80	62.68	11.04×	37.40
	+ LocalLeap	63.65	9.69×	53.60	76.48	13.46×	37.60
	+ STDec	91.16	13.88×	55.60	80.47	14.17×	38.40
HumanEval	Vanilla	11.82	1.00×	59.15	10.82	1.00×	48.17
	+ Half-Step	23.69	2.00×	35.37	21.67	2.00×	35.37
	+ dKV-Cache	15.21	1.29×	56.10	14.23	1.32×	46.95
	+ Fast-dLLM	46.87	3.97×	62.20	38.49	3.56×	48.78
	+ LocalLeap	53.80	4.55×	58.54	50.33	4.65×	46.34
	+ STDec	64.50	5.46×	60.37	52.92	4.89×	48.78
Mathematics & Science
GPQA	Vanilla	6.95	1.00×	32.83	6.24	1.00×	28.79
	+ Half-Step	13.88	2.00×	32.32	12.66	2.02×	30.81
	+ dKV-Cache	13.21	1.90×	33.33	12.33	1.98×	32.32
	+ Fast-dLLM	88.95	12.80×	34.85	62.61	10.03×	28.79
	+ LocalLeap	149.15	21.46×	33.33	74.60	11.96×	29.29
	+ STDec	193.62	27.86×	32.83	92.08	14.76×	29.29
GSM8K	Vanilla	4.71	1.00×	83.47	4.19	1.00×	78.01
	+ Half-Step	9.41	2.00×	74.22	8.25	1.97×	75.82
	+ dKV-Cache	9.83	2.09×	79.08	8.82	2.11×	77.63
	+ Fast-dLLM	17.20	3.65×	82.94	12.16	2.90×	78.77
	+ LocalLeap	20.71	4.40×	82.49	17.01	4.06×	77.98
	+ STDec	23.70	5.03×	82.34	16.94	4.04×	78.01
MATH	Vanilla	12.67	1.00×	44.64	11.90	1.00×	40.38
	+ Half-Step	25.31	2.00×	39.40	23.64	1.99×	39.44
	+ dKV-Cache	15.56	1.23×	44.04	14.73	1.24×	40.90
	+ Fast-dLLM	40.26	3.18×	44.16	39.00	3.28×	41.00
	+ LocalLeap	52.35	4.13×	44.26	49.93	4.20×	39.66
	+ STDec	55.15	4.35×	44.64	53.05	4.46×	39.86
Average	Vanilla	8.54	1.00×	54.30	7.77	1.00×	46.59
	+ Half-Step	17.08	2.00×	43.42	15.55	2.00×	43.25
	+ dKV-Cache	12.99	1.52×	52.43	12.06	1.55×	47.36
	+ Fast-dLLM	50.03	5.86×	55.79	42.99	5.53×	46.95
	+ LocalLeap	67.93	7.95×	54.44	53.67	6.91×	46.17
	+ STDec	85.63	10.03×	55.16	59.09	7.60×	46.87
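
The Speed column can be reproduced from the TPS column. The sketch below recomputes the Dream-7B-Instruct speedups for STDec from the TPS values in the table. The Average row appears to be the ratio of the rounded average TPS values rather than the mean of the per-task speedups; that reading is an inference from the numbers, not something the page states.

```python
def speedup(tps, baseline_tps):
    """Relative speedup over the baseline, rounded to two decimals as in the table."""
    return round(tps / baseline_tps, 2)


# TPS values copied from the Dream-7B-Instruct columns of the table.
dream_vanilla = {"MBPP": 6.57, "HumanEval": 11.82, "GPQA": 6.95, "GSM8K": 4.71, "MATH": 12.67}
dream_stdec = {"MBPP": 91.16, "HumanEval": 64.50, "GPQA": 193.62, "GSM8K": 23.70, "MATH": 55.15}

per_task = {task: speedup(dream_stdec[task], dream_vanilla[task]) for task in dream_vanilla}

# Average row: average the TPS first (rounded as reported), then take the ratio.
avg_vanilla = round(sum(dream_vanilla.values()) / len(dream_vanilla), 2)  # 8.54
avg_stdec = round(sum(dream_stdec.values()) / len(dream_stdec), 2)        # 85.63
avg_speedup = speedup(avg_stdec, avg_vanilla)                             # 10.03

# For comparison: the mean of the per-task speedups is a different, larger number.
mean_of_ratios = sum(per_task.values()) / len(per_task)
```

Because GPQA has a much larger speedup than the other tasks, the mean of per-task ratios (about 11.3×) is higher than the reported 10.03×, so the two averaging conventions should not be mixed when comparing methods.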

Main multimodal understanding results on LaViDa-Reason. The table compares STDec with Prefix-DLM, Fast-dLLM, and LocalLeap on MathVerse, MathVision, and MathVista. STDec achieves the highest throughput on all three benchmarks while maintaining competitive multimodal reasoning quality.

Task	Method	TPS	Speed	Score
MathVerse	LaViDa w/o Prefix-DLM	5.71	1.00×	28.30
	+ Prefix-DLM	12.22	2.14×	27.03
	+ Fast-dLLM	11.94	2.09×	28.68
	+ LocalLeap	14.31	2.51×	28.30
	+ STDec	18.22	3.19×	28.30
MathVision	LaViDa w/o Prefix-DLM	5.34	1.00×	19.74
	+ Prefix-DLM	12.09	2.26×	20.39
	+ Fast-dLLM	10.87	2.04×	20.72
	+ LocalLeap	12.81	2.40×	21.71
	+ STDec	16.71	3.13×	21.71
MathVista	LaViDa w/o Prefix-DLM	5.66	1.00×	47.20
	+ Prefix-DLM	12.38	2.19×	40.90
	+ Fast-dLLM	15.31	2.70×	47.50
	+ LocalLeap	18.19	3.21×	47.70
	+ STDec	24.66	4.36×	46.20
Average	LaViDa w/o Prefix-DLM	5.57	1.00×	31.75
	+ STDec	19.86	3.57×	32.07

Composability on LLaDA. STDec can be stacked with dKV-Cache and still adds substantial extra throughput gains over the cache baseline.

Composability on LaViDa. STDec also complements Prefix-DLM-style caching on multimodal understanding, improving efficiency without degrading output quality.

Case Study

Multimodal

Case study on multimodal understanding with LaViDa-Reason

STDec preserves the same key objects and relative spatial relations in the generated scene description while substantially reducing decoding time on LaViDa-Reason.

Question 1

Case study on GSM8K with LLaDA-8B-Instruct

Lily can run 12 kilometers per hour for 4 hours. After that, she runs 6 kilometers per hour. How many kilometers can she run in 8 hours?

LLaDA-8B-Instruct 12.04 secs

To determine how many kilometers Lily can run in 8 hours, we need to break down the problem into two parts: the distance she runs in the first 4 hours and the distance she runs in the next 4 hours.

First, let's calculate the distance she runs in the first 4 hours:

- She runs 12 kilometers per hour for 4 hours.
- Distance = Speed × Time
- Distance in the first 4 hours = 12 km/h × 4 h = 48 km

Next, let's calculate the distance she runs in the next 4 hours:

- She runs 6 kilometers per hour for 4 hours.
- Distance = Speed × Time
- Distance in the next 4 hours = 6 km/h × 4 h = 24 km

Now, we add the distances from both parts to find the total distance she can run in 8 hours:

- Total distance = Distance in the first 4 hours + Distance in the next 4 hours
- Total distance = 48 km + 24 km = 72 km

Therefore, Lily can run a total of 72 kilometers in 8 hours.

+ STDec 2.55 secs (4.72×)

To determine how many kilometers Lily can run in 8 hours, we need to break down the problem into two parts: the distance she runs in the first 4 hours and the distance she runs in the next 4 hours.

First, let's calculate the distance she runs in the first 4 hours:

- Speed: 12 kilometers per hour
- Time: 4 hours
- Distance = Speed × Time
- Distance = 12 km/h × 4 h
- Distance = 48 kilometers

Next, let's calculate the distance she runs in the next 4 hours:

- Speed: 6 kilometers per hour
- Time: 4 hours
- Distance = Speed × Time
- Distance = 6 km/h × 4 h
- Distance = 24 kilometers

Now, we add the distances from both parts to find the total distance she can run in 8 hours:

- Total distance = Distance in the first 4 hours + Distance in the next 4 hours
- Total distance = 48 km + 24 km
- Total distance = 72 kilometers

Therefore, Lily can run 72 kilometers in 8 hours.

Question 2

Case study on GSM8K with LLaDA-8B-Instruct

Joy can read 8 pages of a book in 20 minutes. How many hours will it take her to read 120 pages?

LLaDA-8B-Instruct 11.94 secs

To determine how many hours it will take Joy to read 120 pages, we need to follow these steps:

Step 1: Calculate Joy's reading rate in pages per minute.
Joy reads 8 pages in 20 minutes. Therefore, her reading rate is 8 / 20 = 0.4 pages per minute.

Step 2: Determine the total time required to read 120 pages.
Since Joy's reading rate is 0.4 pages per minute, the time required to read 120 pages is 120 / 0.4 = 300 minutes.

Step 3: Convert the total time from minutes to hours.
There are 60 minutes in an hour, so 300 / 60 = 5 hours.

Therefore, it will take Joy 5 hours to read 120 pages.

+ STDec 2.60 secs (4.59×)

To determine how many hours it will take Joy to read 120 pages, we need to follow these steps:

Step 1: Calculate Joy's reading rate in pages per minute.
Joy reads 8 pages in 20 minutes. Therefore, her reading rate is 8 / 20 = 0.4 pages per minute.

Step 2: Determine the total time required to read 120 pages.
Since Joy reads at a rate of 0.4 pages per minute, the total time required to read 120 pages is 120 / 0.4 = 300 minutes.

Step 3: Convert the total time from minutes to hours.
Total time in hours = 300 / 60 = 5 hours.

Final answer: 5.

Citation

BibTeX

@article{chen2026stdec,
  title={STDec: Spatio-Temporal Stability Guided Decoding for dLLMs},
  author={Chen, Yuzhe and Cao, Jiale and Liu, Xuyang and Xie, Jin and Yang, Aiping and Pang, Yanwei},
  journal={arXiv preprint arXiv:2604.06330},
  year={2026}
}