Julian Quick

I design systems that validate AI trustworthiness by finding where complex systems break and by explicitly questioning assumptions.

Resume CV GitHub Scholar

Current Research

Mechanistic interpretability, adversarial fortification, and auditing frontier LLMs

Outcome probe within-category AUC by JailbreakBench harm category: Disinformation 0.85, Privacy 0.78, Government 0.70, Malware 0.68 are learnable; Fraud, Expert advice, Harassment, Economic harm are chance-level
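As a rough sketch of what "within-category AUC" measures: probe scores are ranked only against other examples from the same harm category, so the probe earns no credit for simply telling categories apart. The function names and toy records below are hypothetical and are not the project's code or data.

```python
from collections import defaultdict


def auc(scores, labels):
    """Rank-based AUC: P(positive score > negative score), ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        return None  # undefined when one class is missing
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def within_category_auc(records):
    """records: iterable of (category, probe_score, label) tuples."""
    by_cat = defaultdict(lambda: ([], []))
    for cat, score, label in records:
        by_cat[cat][0].append(score)
        by_cat[cat][1].append(label)
    return {cat: auc(s, y) for cat, (s, y) in by_cat.items()}


# Toy data (made up): the probe separates outcomes in one category only.
toy = [
    ("Disinformation", 0.9, 1), ("Disinformation", 0.8, 1),
    ("Disinformation", 0.2, 0), ("Disinformation", 0.1, 0),
    ("Fraud", 0.5, 1), ("Fraud", 0.5, 0),
]
result = within_category_auc(toy)
```

In this toy set the probe ranks every successful Disinformation example above every failed one, while the tied Fraud scores give chance-level AUC.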

LLM Red Teaming Mechanistic Interpretability

Turnstile

The outcome signal is category-specific — and real.

A 3B attacker model tries to jailbreak an 8B victim model over 5-turn dialogues on JailbreakBench (JBB) tasks. Jailbreak success is judged by larger models. I compare the geometry and steering effects of layer-specific residual-stream directions derived from linear probes and from mean differences. I also compare directions derived from JBB success with directions derived from perceived harm added to the world.

0.85 within-cat AUC (Disinformation)

+11pp L16 outcome steering

−13pp L16 intent steering
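The two kinds of direction can be illustrated on synthetic data. This is a minimal sketch, assuming NumPy, random stand-in activations, and a least-squares probe in place of whatever probe the project actually uses; all names and sizes here are hypothetical.

```python
import numpy as np


def mean_difference_direction(acts, labels):
    """Unit vector from the label-0 mean activation to the label-1 mean."""
    acts = np.asarray(acts, dtype=float)
    labels = np.asarray(labels)
    diff = acts[labels == 1].mean(axis=0) - acts[labels == 0].mean(axis=0)
    return diff / np.linalg.norm(diff)


def steer(resid, direction, alpha):
    """Add alpha * direction to every residual vector (rows of resid)."""
    return np.asarray(resid, dtype=float) + alpha * direction


def cosine(u, v):
    """Cosine similarity between two vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))


# Synthetic stand-in for one layer's residuals: 200 vectors of width 8.
rng = np.random.default_rng(0)
labels = np.repeat([0, 1], 100)
acts = rng.normal(size=(200, 8))
acts[labels == 1, 0] += 3.0  # plant the class signal along axis 0

d_mean = mean_difference_direction(acts, labels)

# Least-squares linear probe (an assumption, not the project's probe).
X = np.hstack([acts, np.ones((200, 1))])  # append a bias column
w, *_ = np.linalg.lstsq(X, labels.astype(float), rcond=None)
d_probe = w[:-1] / np.linalg.norm(w[:-1])

similarity = cosine(d_mean, d_probe)      # geometry: alignment of the two
steered = steer(acts, d_mean, alpha=4.0)  # steering: shift along d_mean
```

Because `d_mean` has unit length, `alpha` is the size of the shift along that direction, which makes steering strengths comparable across directions.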

LLM Safety COLM 2026

Silent Killers

LLM Exception Handling Audit

27 frontier and open-weight models audited for overly broad try-except blocks in ambiguous scientific coding tasks. Several frontier models have a tendency to write code that silently swallows errors.

27 models audited

1,620 code samples

7 model families

# similar fails on 100% of 20 seeds
try:
    Si = sobol.analyze(problem, Y)
    S1_maps[:, i, j] = Si['S1']
    ST_maps[:, i, j] = Si['ST']
except Exception as e:
    S1_maps[:, i, j] = 0  # silent fabrication
    ST_maps[:, i, j] = 0  # "no influence"
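
A pattern like the one above can be flagged statically. The following is a minimal sketch using Python's standard `ast` module, not the audit's actual tooling; the helper names are hypothetical, and the rule (a broad handler with no `raise` anywhere in its body) is an assumed simplification that would also flag handlers that log the error and continue.

```python
import ast

BROAD = {"Exception", "BaseException"}


def is_broad(handler):
    """True for bare `except:` and handlers catching Exception/BaseException."""
    if handler.type is None:
        return True
    if isinstance(handler.type, ast.Tuple):
        types = handler.type.elts
    else:
        types = [handler.type]
    return any(isinstance(t, ast.Name) and t.id in BROAD for t in types)


def swallows(handler):
    """True if no `raise` statement appears anywhere in the handler body."""
    return not any(
        isinstance(node, ast.Raise)
        for stmt in handler.body
        for node in ast.walk(stmt)
    )


def find_silent_handlers(source):
    """Return line numbers of broad handlers that never re-raise."""
    tree = ast.parse(source)
    return [
        node.lineno
        for node in ast.walk(tree)
        if isinstance(node, ast.ExceptHandler)
        and is_broad(node)
        and swallows(node)
    ]


sample = """
try:
    Si = sobol.analyze(problem, Y)
except Exception as e:
    S1 = 0
try:
    x = int(s)
except ValueError:
    x = None
try:
    run()
except Exception:
    raise
"""
flagged = find_silent_handlers(sample)
```

Only the first handler in `sample` is reported: the second catches a narrow exception type, and the third re-raises.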

Adversarial RL Torque 2026

Adversarial Robustness via Self-Play

Sensor Corruption in Safety-Critical Control

RL adversaries inject worst-case sensor corruption into fleet control systems. Self-play training turns a 39% power loss into a 7.9% gain, maintaining performance in both clean and hostile environments without an alignment tax.

+7.9% worst-case recovery

3 training paradigms

Wind Energy Systems Research

A decade of safety-critical systems at NREL and DTU Wind.

Scalable SGD Optimizer

Novel stochastic gradient descent for 1,200+ unit fleet optimization. 95% faster than legacy methods.

WindIO Data Standard

Machine-readable ontology adopted by five major platforms. Standardized data exchange for energy systems.

Risk-Aware Control

Optimization under uncertainty reducing extreme loads by 47%. Cited in 100+ papers.

Graph Neural Operators

Physics-constrained GNO enabling zero-shot generalization. Geometric deep learning for flow modeling.

Safe Operation Envelope

Separating epistemic from aleatoric uncertainty to define safe operating boundaries. Expanded certified safe domain by 72%.
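
One common way to separate the two kinds of uncertainty is the law of total variance applied to an ensemble. This is a minimal sketch, assuming each ensemble member predicts a mean and a noise variance; it is not necessarily the method used in this work, and the `is_safe` rule, its `k` factor, and the numbers are hypothetical.

```python
from statistics import fmean, pvariance


def decompose_uncertainty(member_means, member_variances):
    """Split predictive variance for one input via the law of total variance.

    aleatoric = average of the members' predicted noise variances
    epistemic = variance of the members' predicted means (disagreement)
    """
    aleatoric = fmean(member_variances)
    epistemic = pvariance(member_means)
    return aleatoric, epistemic, aleatoric + epistemic


def is_safe(member_means, member_variances, load_limit, k=3.0):
    """Hypothetical rule: safe if mean + k * total std is below load_limit."""
    _, _, total = decompose_uncertainty(member_means, member_variances)
    return fmean(member_means) + k * total ** 0.5 < load_limit


# Made-up numbers: four ensemble members predicting a load at one point.
means = [10.0, 12.0, 11.0, 11.0]
variances = [1.0, 1.0, 2.0, 2.0]
aleatoric, epistemic, total = decompose_uncertainty(means, variances)
```

Keeping the terms separate matters for the envelope: epistemic variance can shrink with more data or models, while aleatoric variance sets a floor that more data cannot remove.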

Condition Monitoring

Anomaly detection in safety-critical physical systems. Signal processing meets machine learning.

Additional Research

Multi-Agent RL

Tag Gym

A pursuit-evasion environment with pillars, vision rays, and safe zones. Seeker and hider agents learn tactics through 36-ray line-of-sight and obstacle geometry. 150 Optuna-tuned runs, 2,500-matchup gauntlet evaluation.

Deep RL

Layout-Agnostic Transformer RL

A single transformer-based policy generalizes across unseen physical topologies without retraining. 66.6% zero-shot transfer, 2.3x faster fine-tuning.

Validation & Safety

Visual Safety Monitor

A 33K-parameter CNN predicts robot manipulation failure under camera corruption — and generalizes to unseen corruption types. Mean AUROC 0.922 across 6 OOD conditions.