Layout-Agnostic Control via Transformer RL

The Problem: One Policy Per Layout Doesn't Scale

Wake interactions between turbines can reduce wind farm power by up to 40%. Wake steering, which deliberately misaligns turbine yaw angles to redirect wakes away from downstream turbines, offers a promising remedy, but determining the optimal yaw configuration in real time is challenging.

Reinforcement learning has shown promise for this control problem, but existing approaches have a critical limitation: they require retraining for each new farm layout. With thousands of wind farms worldwide, each with unique geometry, per-layout retraining becomes a deployment bottleneck.

We need a policy architecture that can learn transferable wake physics rather than memorizing layout-specific control strategies.

The Solution: Turbines as Tokens

We propose a transformer-based Soft Actor-Critic architecture that treats each turbine as an independent token. The key insight is that wake physics are relational: what matters is not absolute position, but how turbines are positioned relative to each other and to the wind direction.

Per-Turbine Tokenization

Each turbine's observation (wind speed, wind direction, power, and yaw over 15 timesteps) is encoded as a 128-dimensional token. The same encoder weights are shared across all turbines, so the encoder does not depend on how many turbines the farm has.
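As a rough illustration of shared-weight tokenization, the sketch below applies one linear encoder to every turbine. It assumes that all four signals carry a 15-step history (60 features per turbine) and uses a single tanh layer; the names `tokenize`, `W`, and `b` are hypothetical, and the actual encoder may be deeper or structured differently.

```python
import numpy as np

# Assumed sizes: 4 signals (wind speed, wind direction, power, yaw) over
# 15 timesteps, flattened to 60 features; 128-dim tokens as stated in the text.
N_SIGNALS, N_STEPS, D_TOKEN = 4, 15, 128

rng = np.random.default_rng(0)
# One set of encoder weights, reused for every turbine (hypothetical linear encoder).
W = rng.normal(scale=0.1, size=(N_SIGNALS * N_STEPS, D_TOKEN))
b = np.zeros(D_TOKEN)

def tokenize(obs):
    """Map per-turbine histories (n_turbines, 4, 15) to tokens (n_turbines, 128)."""
    flat = obs.reshape(obs.shape[0], -1)   # (n_turbines, 60)
    return np.tanh(flat @ W + b)           # the same W and b act on every row

obs3 = rng.normal(size=(3, N_SIGNALS, N_STEPS))   # a 3-turbine farm
obs6 = rng.normal(size=(6, N_SIGNALS, N_STEPS))   # a 6-turbine farm
tokens3 = tokenize(obs3)
tokens6 = tokenize(obs6)
```

Because the weights are shared, the same function serves both farm sizes, and reordering the turbines simply reorders the tokens.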

Wind-Relative Positional Encoding

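Coordinates are rotated so that the wind always arrives from a canonical direction (270°). This makes the encoding invariant to joint rotations of the farm and the wind: a layout with the same geometry relative to the incoming wind produces identical positional features, whatever the absolute compass direction of the wind.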
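The rotation into the wind frame can be sketched as follows. This is a minimal example under stated assumptions: x points east, y points north, and the wind direction follows the meteorological convention (the compass bearing the wind comes from, clockwise from north). The function name `to_wind_frame` is hypothetical.

```python
import math

CANONICAL_DEG = 270.0  # wind arrives from the west, i.e. blows toward +x

def to_wind_frame(points, wind_from_deg):
    """Rotate (x=east, y=north) points so the wind arrives from CANONICAL_DEG.

    Assumes wind_from_deg is the compass bearing the wind comes from,
    measured clockwise from north.
    """
    # A counter-clockwise rotation by alpha lowers every compass bearing by alpha.
    alpha = math.radians(wind_from_deg - CANONICAL_DEG)
    c, s = math.cos(alpha), math.sin(alpha)
    return [(x * c - y * s, x * s + y * c) for x, y in points]

# Two inline turbines 500 m apart, once with westerly and once with northerly wind.
west = to_wind_frame([(0.0, 0.0), (500.0, 0.0)], wind_from_deg=270.0)
north = to_wind_frame([(0.0, 0.0), (0.0, -500.0)], wind_from_deg=0.0)
```

Both farms have the second turbine 500 m directly downwind of the first, so `west` and `north` agree up to floating-point error.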

Relative Position Bias

Pairwise displacements r_ij = p_j - p_i are fed through an MLP to compute an attention bias, directly encoding spatial relationships such as "turbine j is 5 rotor diameters (5D) upwind of turbine i."
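One way such a bias could be computed is sketched below. The hidden width, the number of heads, the ReLU activation, and the use of rotor diameters as position units are all assumptions for illustration; `relative_position_bias` is a hypothetical name and the weights are random rather than trained.

```python
import numpy as np

rng = np.random.default_rng(0)
HIDDEN, N_HEADS = 16, 4   # hypothetical sizes, not taken from the source
W1 = rng.normal(scale=0.5, size=(2, HIDDEN))
b1 = np.zeros(HIDDEN)
W2 = rng.normal(scale=0.5, size=(HIDDEN, N_HEADS))
b2 = np.zeros(N_HEADS)

def relative_position_bias(pos):
    """pos: (n, 2) wind-frame coordinates -> bias of shape (n_heads, n, n)."""
    r = pos[None, :, :] - pos[:, None, :]   # r[i, j] = p_j - p_i, shape (n, n, 2)
    h = np.maximum(r @ W1 + b1, 0.0)        # ReLU hidden layer
    bias = h @ W2 + b2                      # (n, n, n_heads)
    return np.transpose(bias, (2, 0, 1))    # one (n, n) bias map per head

# Three inline turbines spaced 5 rotor diameters apart (positions in units of D).
pos = np.array([[0.0, 0.0], [5.0, 0.0], [10.0, 0.0]])
bias = relative_position_bias(pos)
```

In a typical attention layer this bias would be added to the scaled query-key scores before the softmax. Since it depends only on displacements, shifting the whole farm leaves it unchanged, and any two pairs with the same displacement receive the same bias.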

Attention Masking

Variable farm sizes are handled by padding each farm to N_max turbines and masking the padded positions in attention. The same network therefore handles both 3-turbine and 6-turbine farms.
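The padding-and-masking step can be illustrated with a masked softmax over attention scores. This is a sketch assuming N_max = 6 and that real turbines occupy the first `n_valid` slots; `masked_attention_weights` is a hypothetical helper, not a function from the source.

```python
import numpy as np

N_MAX = 6  # assumed padding size

def masked_attention_weights(scores, n_valid):
    """Softmax over keys with padded slots excluded.

    scores: (N_MAX, N_MAX) raw attention logits.
    n_valid: number of real turbines (must be >= 1); they fill the first slots.
    """
    valid = np.arange(scores.shape[-1]) < n_valid          # (N_MAX,) key mask
    masked = np.where(valid[None, :], scores, -np.inf)     # padded keys get -inf
    masked = masked - masked.max(axis=-1, keepdims=True)   # stabilise the softmax
    weights = np.exp(masked)                               # exp(-inf) == 0
    return weights / weights.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(0)
weights = masked_attention_weights(rng.normal(size=(N_MAX, N_MAX)), n_valid=3)
```

Padded keys receive exactly zero weight, so a 3-turbine farm in a 6-slot batch attends only to its own turbines. Rows belonging to padded queries still produce values, which the caller would discard.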

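Specialist versus generalist power by layout. Cost is the generalist's power shortfall relative to that layout's specialist; a negative cost means the generalist produced more power.
Layout	Specialist (MW)	Generalist (MW)	Cost (%)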
A (3×1 inline)	15.60 ± 0.09	15.49 ± 0.28	+0.7%
B (2×2 grid)	22.22 ± 0.14	21.60 ± 0.34	+2.8%
C (4×1 inline)	20.06 ± 0.37	19.99 ± 0.16	+0.4%
D (triangle)	17.89 ± 0.14	17.93 ± 0.06	−0.2%
Average	—	—	+0.9%
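The Cost column can be approximately reproduced from the reported means. The formula below is an assumed definition (relative shortfall of the generalist) that is consistent with the table; because it starts from rounded means, it matches the reported values only to within about 0.1 percentage points.

```python
# Mean power (MW) copied from the table: (specialist, generalist)
results = {
    "A": (15.60, 15.49),
    "B": (22.22, 21.60),
    "C": (20.06, 19.99),
    "D": (17.89, 17.93),
}

def generalization_cost(specialist, generalist):
    """Percentage of specialist power lost by the generalist (negative: generalist wins)."""
    return 100.0 * (specialist - generalist) / specialist

costs = {name: generalization_cost(s, g) for name, (s, g) in results.items()}
average_cost = sum(costs.values()) / len(costs)

for name, cost in costs.items():
    print(f"{name}: {cost:+.2f}%")
print(f"Average: {average_cost:+.2f}%")
```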

Model	Mean Return	% of Specialist
Specialist (trained on E)	70.16	100.0%
Generalist (zero-shot)	46.73	66.6%
Fine-tuned (keep optimizer)	56.83	81.0%
Fine-tuned (reset optimizer)	70.72	100.8%
