Free-Boundary Coil Optimization
===============================

This page documents the research lane toward true single-stage free-boundary
optimization with differentiable coils. The existing VMEC-compatible ``mgrid``
path remains the VMEC2000-compatibility backend; generated-``mgrid`` WOUT parity
is optional/non-promoted unless explicitly stated. The new direct-coil path
evaluates the external field from coil Fourier coefficients and currents in JAX,
so the coil parameters can become the independent optimization variables.

Architecture
------------

The intended single-stage loop is:

.. code-block:: text

   coil Fourier dofs/currents
     -> differentiable Biot-Savart external field
     -> vmec_jax free-boundary equilibrium
     -> wout/proxy diagnostics
     -> coil-only objective update

Pedagogic forward examples
--------------------------

Two short examples in ``examples/`` show the two free-boundary external-field
paths without hiding the workflow inside a large sweep driver.

The compatibility path uses ESSOS coils to write a VMEC ``mgrid`` file, then
runs ``vmec_jax`` using the same mgrid-style external-field backend used for
VMEC2000 parity:

.. code-block:: bash

   export ESSOS_ROOT=/Users/rogeriojorge/local/ESSOS_mgrid_pr
   export ESSOS_INPUT_DIR=$ESSOS_ROOT/examples/input_files
   PYTHONPATH=.:$ESSOS_ROOT:$PYTHONPATH \
     python examples/free_boundary_essos_mgrid_forward.py --max-iter 10

The direct-coil research path converts the same ESSOS coils to
``CoilFieldParams`` and passes them directly to ``run_free_boundary``. No
``mgrid`` file is written or read by the solver:

.. code-block:: bash

   export ESSOS_ROOT=/Users/rogeriojorge/local/ESSOS_mgrid_pr
   export ESSOS_INPUT_DIR=$ESSOS_ROOT/examples/input_files
   PYTHONPATH=.:$ESSOS_ROOT:$PYTHONPATH \
     python examples/free_boundary_essos_direct_forward.py --max-iter 10

Both examples accept ``--dry-run`` to write the input deck and JSON summary
without running VMEC.
This is useful for checking the generated namelist, magnetic-grid bounds, and
direct-coil provider wiring. By default, outputs go under
``results/free_boundary_essos_mgrid_forward/`` and
``results/free_boundary_essos_direct_forward/``.

Boozer/QS diagnostics are the intended promotion target for this lane, but the
current implementation keeps the single-stage optimization example on a cheap
VMEC residual plus VMEC-state ``qs_total``, aspect, and mean-iota proxy until
complete Boozer/QS full-loop gradient checks pass.

Reviewer-facing validation plots for this lane are committed only as compressed
summary panels. Generated WOUTs, magnetic grids, PDFs, and full-resolution raw
renderings stay out of git. Use the reproduction commands below to regenerate
the architecture, beta-scan, provider-parity, and benchmark figures from JSON
summaries.

Phase 1 in this lane includes JAX-native coil-field sampling, an ESSOS coil
adapter, generated-``mgrid`` compatibility, and forward free-boundary solves
from direct coils. Phase 2 targets the production custom adjoint through the
full free-boundary vacuum/NESTOR solve. Several phase-2 validation rungs are
already implemented on JAX-visible dense or accepted-state problems, but the
production ``run_free_boundary`` nonlinear-loop adjoint is not claimed as
publication-ready until complete-solve AD-vs-finite-difference checks pass.

The current post-merge evidence now includes reusable accepted-trace replay
helpers, accepted-state ``bsqvac`` replay derivatives with respect to the VMEC
state, JAX-visible nonlinear-controller primitives with fixed-length masked
``lax.scan`` control flow, and a fixed-accepted-trace custom-VJP seam for
direct-coil replay objectives. The fixed-trace path is guarded by accepted trace
fingerprints so finite-difference promotions can reject perturbations that
changed the adaptive host-controller branch.
These validate the intended full-loop adjoint contract, but they are still
production-adjacent validation gates rather than a promoted custom VJP for the
host-controlled ``run_free_boundary`` loop.

Adjoint Validation Roadmap
--------------------------

The exact-gradient lane is deliberately staged. The literature points to a
discrete-adjoint implementation around the structured spectral operators and
linear solves, not to reverse-mode differentiation through every nonlinear
iteration. In NESTOR, the free-boundary vacuum contribution is a spectral
integral-equation solve for a Neumann problem on a toroidal surface; this
naturally maps to a JAX-native operator plus an implicit transpose solve. JAX's
``custom_linear_solve`` is the relevant primitive for this layer because it
defines reverse-mode derivatives by solving the transposed linear problem at the
converged solution rather than taping the internals of the linear solver. This
is also consistent with recent spectral-PDE adjoint work, where efficient
adjoints are built from reusable operator graphs, fast transforms, and sparse or
structured linear solves.

The validation ladder is:

1. Provider derivatives: direct Biot-Savart derivatives with respect to coil
   current, Fourier curve coefficients, and evaluation coordinates.
2. Toy implicit vacuum chain: direct coils feed a dense custom-linear-solve
   vacuum problem, and gradients with respect to current and geometry are
   checked against finite differences.
3. Boundary projection: JAX vacuum-boundary projection derivatives with respect
   to sampled cylindrical fields and boundary coefficients.
4. Projected implicit vacuum chain: direct coils feed the JAX boundary
   projection and then a dense custom-linear-solve vacuum problem, with current
   and geometry gradients checked against finite differences.
5. Mode-space NESTOR chain: the same projected boundary data feeds
   ``dense_vmec_nestor_mode_solve_jax``, a JAX-native VMEC-style operator that
   combines source symmetrization, mode-RHS projection, nonsingular
   Green-function source/matrix assembly, analytic/singular ``analyt.f``
   source/matrix assembly, mode-matrix assembly, and dense mode-space solve that
   reconstructs the boundary scalar potential. This validates the differentiable
   operator blocks used by the VMEC-like NESTOR solve on low-resolution grids.
   The high-resolution matrix-free production operator remains phase-2 work.
6. Nonlinear fixed-point chain: direct-coil controls feed a dense nonlinear
   fixed-point solve with a custom implicit adjoint. The reusable
   ``direct_coil_projected_mode_fixed_point_jax`` helper implements the
   moving-boundary validation loop: the current state changes where the coil
   field is sampled, the field is projected through the JAX boundary projection
   and mode-space vacuum response, and the response updates the next state. The
   companion ``direct_coil_projected_mode_fixed_point_objective_jax`` helper
   wraps the solved state in a scalar quadratic objective with component
   diagnostics for optimizer-facing AD-vs-FD tests. The reusable
   ``pytree_directional_derivative_check_jax`` helper then compares exact pytree
   directional derivatives against central finite differences. The focused tests
   run that check on the scalar objective with respect to the full
   ``CoilFieldParams`` pytree, verifying finite, nonzero gradients and a mixed
   current/curve-coefficient directional derivative. This validates the
   mathematical reverse pass needed by the production free-boundary fixed-point
   wrapper: solve ``F_x^T lambda = dJ/dx`` at the accepted root and apply
   ``-F_p^T lambda`` to coil/current parameters. This is still a dense
   validation primitive, not the production VMEC nonlinear loop.

   The same phase-2 work now also includes a JAX-visible nonlinear-controller
   primitive: ``jax_visible_nonlinear_controller_jax`` and
   ``jax_visible_masked_nonlinear_controller_jax`` model a fixed-length
   differentiable controller with an on-device convergence mask. The tests check
   controller-level AD-vs-central-FD behavior for a direct-coil moving-boundary
   objective with current and Fourier geometry controls. This is the concrete
   replacement pattern for differentiating through convergence/early-stop logic
   without taping a Python host loop, but it is not wired in as the default
   production free-boundary controller yet.

   The accepted/rejected controller layer also includes
   ``jax_visible_segmented_accepted_nonlinear_controller_jax``. This helper
   splits a long accepted-controller scan into static-policy subcontrollers,
   preserving the accepted state and convergence mask across segment boundaries.
   The unit gate compares the segmented run against the monolithic scan and
   checks the segmented objective gradient against both the monolithic gradient
   and a central finite difference. This is the validated structure needed for
   production traces that change radial preconditioner policy without padding
   every branch-local array into one large scan payload.
7. Full direct-coil free-boundary solve: low-resolution physical scalar
   objectives, first with one coil current and then with one Fourier
   coefficient, bounded against finite differences of complete solves. The
   promoted same-branch current representative includes a VMEC-state
   quasisymmetry-ratio scalar, ``qs_total``, in addition to aspect ratio and
   accepted-vacuum scalars.
8. Boozer/QS objective: the same complete-solve finite-difference checks after
   Boozer/QS diagnostics are in the objective path.

The reviewer-facing status of this ladder is:

.. list-table::
   :header-rows: 1
   :widths: 8 18 39 35

   * - Rung
     - Status
     - Current validation evidence
     - Remaining promotion work
   * - 1
     - Complete
     - Coil Biot-Savart derivatives with respect to currents, Fourier curve
       coefficients, and evaluation coordinates are checked against finite
       differences.
     - None for the provider derivative layer.
   * - 2
     - Complete
     - Dense custom-linear-solve vacuum problems validate implicit gradients
       with respect to coil current and geometry controls.
     - None for the dense toy vacuum primitive.
   * - 3
     - Complete
     - Boundary-projection derivatives are checked with respect to sampled
       cylindrical fields and boundary coefficients.
     - None for the projection primitive.
   * - 4
     - Complete
     - Direct coils, boundary projection, and a dense implicit vacuum response
       are chained and AD-vs-FD checked for current and geometry controls.
     - None for this projected dense-chain primitive.
   * - 5
     - Complete for validation scale
     - ``dense_vmec_nestor_mode_solve_jax`` validates the JAX-visible VMEC-style
       source/RHS/matrix/mode-space blocks on low-resolution grids.
     - Replace the dense validation operator with the production
       matrix-free/high-resolution NESTOR adjoint.
   * - 6
     - Complete for validation scale
     - A dense nonlinear fixed-point loop validates the implicit-root reverse
       pass for current and Fourier-geometry controls. A JAX-visible masked
       nonlinear-controller primitive also validates the production replacement
       pattern for fixed-length scan control with early-stop masking.
     - Wrap or replace the production VMEC nonlinear free-boundary iteration
       with the same validated custom-adjoint contract.
   * - 7
     - Partial
     - Complete direct-coil solves have finite-difference response guards, and
       accepted-boundary replay has AD-vs-FD checks after freezing the accepted
       plasma boundary. Accepted-state ``bsqvac`` replay is also AD-vs-FD
       checked with respect to the VMEC boundary state. The fixed accepted-trace
       custom-VJP seam is now checked against complete-solve central finite
       differences on unchanged accepted branches for a current-only direction
       and for mixed current/Fourier-geometry directions.
     - The remaining production milestone is the general adaptive
       host-controller branch seam: accepted/rejected step selection, resets,
       activation cadence, and limiter branch changes remain unclaimed unless
       the explicit branch fingerprint is unchanged.
   * - 8
     - Open
     - The phase-1 coil-only optimization example currently uses a cheap VMEC
       residual plus VMEC-state ``qs_total``, aspect, and mean-iota proxy
       instead of Boozer/QS gradients.
     - Add Boozer/QS diagnostics to the complete-solve objective and validate
       coil-current and coil-geometry gradients against finite differences.

In short, rungs 1--6 validate the mathematical and operator pieces needed for a
production adjoint, rung 7 validates finite-response and accepted-state replay
but not the host-controlled nonlinear iteration derivative, and rung 8 remains
the publication-level coil-to-QS gradient target.

The first six AD-vs-FD rungs are implemented as fast tests today, and the
fixed-boundary dense mode-space NESTOR rung is promoted for both
stellarator-symmetric and ``LASYM`` tiny direct-coil cases: one coil current and
one Fourier geometry coefficient are checked against central finite differences
through the chain direct coils -> boundary projection -> VMEC/NESTOR
source/matrix assembly -> dense mode solve while the plasma boundary is held
fixed. The nonlinear fixed-point rung is also AD-vs-FD checked for a direct-coil
current and one Fourier geometry coefficient, including a state-dependent
boundary sample and projected mode-space vacuum response, but only on a dense
validation loop solved inside JAX. The masked-controller rung then verifies that
a fixed-length JAX scan with an on-device ``done`` mask keeps the final state
and direct-coil gradients stable against finite differences.
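The implicit-root reverse pass validated by the nonlinear fixed-point rung can
be illustrated on a scalar toy problem. The sketch below is plain Python, not
``vmec_jax`` code: the residual ``F(x, p) = x - 0.5 cos(x) - p``, the objective
``J = x**2``, and every function name are assumptions chosen only for
illustration. It solves the scalar form of ``F_x^T lambda = dJ/dx`` at the
converged root, applies ``-F_p^T lambda``, and compares the result with a
central finite difference of complete solves.

```python
import math


def solve_root(p, n_iter=200):
    """Solve the toy residual F(x, p) = x - 0.5*cos(x) - p = 0 by fixed-point iteration."""
    x = 0.0
    for _ in range(n_iter):
        x = 0.5 * math.cos(x) + p  # contraction, since |d/dx (0.5*cos(x))| <= 0.5
    return x


def objective(p):
    """Toy scalar objective J = x(p)**2 evaluated at the converged root."""
    x = solve_root(p)
    return x * x


def implicit_gradient(p):
    """dJ/dp from one adjoint solve at the root; the iteration itself is not differentiated."""
    x = solve_root(p)
    dJ_dx = 2.0 * x
    F_x = 1.0 + 0.5 * math.sin(x)  # dF/dx at the root
    F_p = -1.0                     # dF/dp
    lam = dJ_dx / F_x              # scalar form of F_x^T lambda = dJ/dx
    return -F_p * lam              # -F_p^T lambda (J has no explicit p dependence)


def central_fd(f, p, h=1e-6):
    """Central finite difference of complete solves."""
    return (f(p + h) - f(p - h)) / (2.0 * h)


p0 = 0.3
grad_adjoint = implicit_gradient(p0)
grad_fd = central_fd(objective, p0)
print(f"adjoint={grad_adjoint:.10f}  central_fd={grad_fd:.10f}")
```

Because the gradient uses only the converged root, its cost and memory do not
depend on how many iterations the forward solve took.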
Rung 7 is split deliberately: complete accepted direct-coil solves have fast
finite-difference response guards for current and one Fourier geometry
coefficient. The same complete-solve guard now also evaluates the phase-1
coil-only proxy objective used by
``examples/optimization/free_boundary_QS_coil_optimization.py`` (VMEC residual
plus aspect/iota terms) and checks finite central-difference responses to both
coil controls. The accepted-state direct-coil normal-field metric also has a JAX
replay gate whose current derivative matches central FD after freezing the
accepted plasma boundary, and the accepted-state ``bsqvac`` replay path now
matches central FD with respect to the packed VMEC state. The two-step
accepted-trace replay path is also exposed through
``direct_coil_accepted_trace_directional_check_jax`` and checks current,
Fourier-geometry, and mixed coil directions after resampling the second boundary
from the first replayed accepted state. The full accepted-trace replay also
preserves inactive/setup accepted steps and VMEC host-control reset
discontinuities, such as the free-boundary turn-on reset, instead of incorrectly
chaining every ``state_post`` into the next ``state_pre``.

The scalar ``direct_coil_fixed_trace_custom_vjp_objective_jax`` wrapper exposes
this fixed accepted replay behind an explicit custom VJP. The newer
``direct_coil_accepted_trace_controller_custom_vjp_objective_jax`` wrapper uses
the same frozen accepted steps but carries accepted/rejected masks, scalar
update controls, velocity histories, and preconditioner arrays through the
JAX-visible accepted-controller replay. This is the preferred phase-2 seam for
production-adjacent validation. When production traces change the active radial
preconditioner size across accepted steps, the controller replay keeps those
preconditioner matrices branch-local instead of padding them into the scan
payload; scalar controls and velocity histories remain scan-stacked.
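The idea of carrying accepted/rejected masks and a ``done`` flag through a
fixed-length replay, instead of branching in a host loop, can be pictured with a
small plain-Python stand-in. This is only a sketch under stated assumptions: the
update map, the recorded ``accepted`` pattern, and all names are hypothetical,
and a Python ``for`` loop stands in for a ``lax.scan`` body. It checks that the
masked replay reproduces the host-style loop on the same recorded branch.

```python
import math


def update(x, p):
    """Toy contraction standing in for one nonlinear-controller trial step."""
    return 0.5 * math.cos(x) + p


def host_loop(p, accepted, tol=1e-12):
    """Host-style control flow: skip rejected trials and break on convergence."""
    x = 0.0
    for is_accepted in accepted:
        if not is_accepted:
            continue
        x_trial = update(x, p)
        converged = abs(x_trial - x) < tol
        x = x_trial
        if converged:
            break
    return x


def masked_replay(p, accepted, tol=1e-12):
    """Fixed-length replay: every slot is evaluated, masks decide whether state changes."""
    x, done = 0.0, False
    for is_accepted in accepted:
        x_trial = update(x, p)
        take = is_accepted and not done      # accepted slot on a still-active run
        converged = abs(x_trial - x) < tol
        x = x_trial if take else x           # masked state update
        done = done or (take and converged)  # done flag carried to the next slot
    return x, done


# Hypothetical recorded branch: two rejected slots among otherwise accepted steps.
accepted_trace = [True, False, True, True, False] + [True] * 60
x_host = host_loop(0.3, accepted_trace)
x_masked, done_flag = masked_replay(0.3, accepted_trace)
print(x_host, x_masked, done_flag)
```

The two loops perform the same floating-point updates in the same order on an
unchanged branch, which is why the masked form is a faithful replay rather than
an approximation. It says nothing about perturbations that change which trial
steps are accepted.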
The reusable segmented accepted-controller primitive now validates the same
split on a JAX-visible toy controller: segment boundaries are static Python
structure, while each segment body is a differentiable ``lax.scan`` and the
state/done carry is propagated across segments. Production replay has not yet
been switched to this segmented primitive by default, but
``direct_coil_accepted_trace_controller_replay_objective_jax`` exposes an opt-in
``use_preconditioner_policy_segments`` mode that slices the stacked trace
controls by the reported static-policy segments and validates identical
accepted-output behavior against the monolithic replay. The production-backed
test passes. Segment mode now builds local trace-switch branches for each static
segment instead of recompiling a global switch over all accepted traces, but the
current one-segment gate remains dominated by strict-update replay compilation,
so the default stays monolithic until segmented replay compile cost is reduced
on real multi-policy traces.

``direct_coil_accepted_trace_fingerprint_delta`` records whether a
finite-difference perturbation stayed on the same accepted-step/control branch,
including the same traced reset pattern, scalar update controls, preconditioner
policy flags, active preconditioner size, and preconditioner/mode-shape
signatures. The current required gate exercises this same-branch contract in
three ways: a current-only perturbation validates the cleanest coil-control
direction, a Fourier-coefficient-only perturbation validates a pure
coil-geometry direction, and the existing stellsym/``LASYM`` gate validates a
mixed current plus Fourier-geometry direction. These tests compare the
custom-VJP directional derivative to the central finite difference of complete
tiny free-boundary solves after explicitly rejecting branch changes. This is
stronger than a fixed-boundary replay test, but it remains a same-branch
accepted-trace validation rather than a general derivative of the adaptive host
loop.

The current-only gate also promotes physical scalars from the same complete
base/plus/minus solve triplet: final aspect ratio, VMEC-state
quasisymmetry-ratio ``qs_total``, accepted ``Bnormal`` RMS, and accepted
``Bsqvac`` RMS. The last two scalars exercise active free-boundary vacuum
forcing seen by the accepted update, while still requiring identical
accepted-trace and residual-controller fingerprints before comparing AD against
central finite differences.

The same current-only promotion now also replays one explicit fixed rejected
controller slot with the accepted-only fast path disabled. This validates that
the JAX-visible controller seam carries accepted/rejected masks and ``done``
controls through the custom-VJP scalar path instead of silently reducing the
branch to accepted-only replay. This is still a fixed same-branch replay check;
it does not claim derivatives through a host branch change that would alter
which trial steps are accepted.

For scripts that need reviewer-facing evidence, the companion
``direct_coil_accepted_trace_fingerprint_delta_summary`` helper converts the
delta into a strict-JSON-safe payload. On the tiny forced-active default gate,
the branch-compatible complete solve also compares both the fixed-trace
custom-VJP directional derivative and the stacked-controller custom-VJP
directional derivative against a central finite difference of the final
accepted-state norm for a mixed coil current/Fourier direction, for both
stellarator-symmetric and ``LASYM`` traces. This is the current promoted
same-branch complete-solve validation, not yet a claim that arbitrary controller
branch changes are differentiable.

The same evidence can be written as a local JSON artifact without adding
generated data to the repository:

.. code-block:: bash

   JAX_ENABLE_X64=1 python tools/diagnostics/direct_coil_same_branch_adjoint_report.py \
     --out /tmp/vmec_jax_freeb_same_branch_adjoint_report.json \
     --workdir /tmp/vmec_jax_freeb_same_branch_adjoint_report_work

The default command is bounded and records the branch fingerprints,
complete-solve central finite-difference slope, and fixed-trace custom-VJP
slope. The required CI gate is stricter than this default diagnostic: it also
checks same-branch physical scalar slopes for aspect ratio, VMEC-state
``qs_total``, and accepted ``Bnormal``/``Bsqvac`` RMS on the current-only
representative. Passing ``--include-controller-vjp`` also evaluates the stacked
accepted-controller custom VJP, which is useful for deeper review but slower in
cold processes.

The JSON report includes
``accepted_trace_controls.preconditioner_policy_segment_summary`` so reviewers
can see whether the accepted trace is a single static preconditioner-policy
range or will require multiple subcontrollers.

The controller replay keeps only the tridiagonal preconditioner policy as
branch-local static trace data. Update limiting and
``divide_by_scalxc_for_update`` are JAX-visible scan controls, so accepted
controller payloads can include those switches without traced-Python-boolean
failures. The fixed accepted replay is still well-defined because each accepted
step is selected by a static ``lax.switch`` branch over the recorded trace
index; the preconditioner policy is therefore fixed for that step and is covered
by the same-branch fingerprint. The remaining controller refactor is to make the
radial preconditioner policy itself JAX-visible, or to split the future
production controller into static preconditioner-policy subcontrollers before
claiming gradients through adaptive preconditioner-policy changes.
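A static preconditioner-policy split of this kind amounts to grouping
consecutive accepted-trace steps that share the same static key. The sketch
below is a plain-Python illustration, not the ``vmec_jax`` helper: the record
fields (``policy``, ``precond_jmax``, ``shape``), the policy label, the shape
values, and the function name ``policy_segments`` are all hypothetical.

```python
from itertools import groupby


def policy_segments(trace):
    """Return (start, stop, key) for each run of consecutive steps sharing a static key."""
    keys = [(step["policy"], step["precond_jmax"], step["shape"]) for step in trace]
    segments = []
    start = 0
    for key, run in groupby(keys):  # groupby merges only adjacent equal keys
        length = len(list(run))
        segments.append((start, start + length, key))
        start += length
    return segments


# Hypothetical four-step trace in which the active preconditioner size changes once.
trace = [
    {"policy": "tridiagonal", "precond_jmax": 6, "shape": (7, 4)},
    {"policy": "tridiagonal", "precond_jmax": 7, "shape": (7, 4)},
    {"policy": "tridiagonal", "precond_jmax": 7, "shape": (7, 4)},
    {"policy": "tridiagonal", "precond_jmax": 7, "shape": (7, 4)},
]
segments = policy_segments(trace)
ranges = [(start, stop) for start, stop, _ in segments]
print(ranges)
```

Each returned range could then be replayed by one subcontroller whose static
configuration never changes inside the range, with the state carried across
range boundaries.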
The ``direct_coil_accepted_trace_preconditioner_policy_segments`` helper exposes
the consecutive trace ranges with identical static preconditioner policy,
``precond_jmax``, and preconditioner/mode payload shapes; this is the tested
data model for that subcontroller split. The accepted-controller replay returns
these ranges as ``preconditioner_policy_segments`` together with the segment
count, so diagnostics can distinguish a same-policy replay from one that will
need multiple static-policy subcontrollers before the replay implementation is
refactored. The companion ``preconditioner_policy_segment_summary`` payload is
JSON-safe and records the accepted, rejected, free-boundary replay, state-reset,
and done-marker counts inside each static-policy range.

The segmented replay timing diagnostic is separate from the same-branch adjoint
evidence:

.. code-block:: bash

   JAX_ENABLE_X64=1 python tools/diagnostics/direct_coil_segmented_replay_report.py \
     --out /tmp/vmec_jax_freeb_segmented_replay_report.json \
     --workdir /tmp/vmec_jax_freeb_segmented_replay_work

By default this diagnostic synthesizes a two-policy accepted-trace sequence by
flipping a static preconditioner policy flag on alternating traces while keeping
trace payload shapes fixed. This exercises the segmented controller machinery
and checks objective/final-state parity against the monolithic controller
replay; it is not a claim that the synthetic policy sequence came from
production. The current tiny local run passed with two segments, zero
objective/state difference, and cold timings of about ``7.59 s`` for the
monolithic replay versus ``7.56 s`` for segmented replay. That validates the
control-flow split, but it does not yet demonstrate a meaningful speedup; the
next performance target is a real multi-policy production trace or a larger
trace-width benchmark.

Running the same diagnostic on a slightly longer tiny solve without synthetic
policy edits,

.. code-block:: bash

   JAX_ENABLE_X64=1 python tools/diagnostics/direct_coil_segmented_replay_report.py \
     --out /tmp/vmec_jax_freeb_segmented_replay_nosynth_n4.json \
     --workdir /tmp/vmec_jax_freeb_segmented_replay_nosynth_n4_work \
     --niter 4 \
     --no-synthetic-multi-policy

produced a real two-segment accepted trace: the first step used
``precond_jmax=6`` and the remaining three steps used ``precond_jmax=7`` with
active free-boundary replay. The segmented and monolithic replay objectives and
final states matched exactly in the JSON report, but cold replay timing was
still comparable: about ``21.19 s`` monolithic versus ``21.38 s`` segmented.
This confirms the next optimization target is the strict-update and
preconditioner replay compilation path itself, not just the controller segment
wrapper.

The same diagnostic also exposes an opt-in
``--segment-local-preconditioner-controls`` mode. This stacks preconditioner
payloads independently inside each static-policy segment when global stacking is
impossible. On the same four-step no-synthetic trace, both the default segmented
replay and the segment-local variant preserved objective and final state
exactly. The measured cold timings were still slightly slower than the
monolithic path: about ``21.01 s`` monolithic versus ``21.80 s`` segmented
without segment-local controls, and about ``20.09 s`` monolithic versus
``20.78 s`` segmented with segment-local controls. The option is therefore kept
as a diagnostic hook, not as a promoted performance default.

A narrower strict-update diagnostic isolates the accepted VMEC force,
preconditioner, and update map by reusing stored ``freeb_bsqvac_half`` and
excluding direct-coil boundary resampling:

.. code-block:: bash

   JAX_ENABLE_X64=1 python tools/diagnostics/direct_coil_strict_update_replay_report.py \
     --out /tmp/vmec_jax_freeb_strict_update_replay_n4.json \
     --workdir /tmp/vmec_jax_freeb_strict_update_replay_n4_work \
     --niter 4

On the same tiny four-step setup, this isolated path passed with exact parity
between trace-static controls and dynamic scalar/array/preconditioner controls.
The first JIT call was about ``0.446 s`` for trace-static controls and
``0.536 s`` for dynamic controls, while warm calls were around ``0.1 ms``. This
shows the standalone strict update is not the full ``~21 s`` cold replay cost;
the next performance rung should isolate boundary-geometry, direct-coil/NESTOR
replay, and full-controller composition costs.

A second isolation diagnostic times exactly that boundary-vacuum part:

.. code-block:: bash

   JAX_ENABLE_X64=1 python tools/diagnostics/direct_coil_boundary_replay_report.py \
     --out /tmp/vmec_jax_freeb_boundary_replay_n4.json \
     --workdir /tmp/vmec_jax_freeb_boundary_replay_n4_work \
     --niter 4

On the same tiny active trace, fixed-geometry direct-coil/NESTOR replay took
about ``2.13 s`` for the first JIT call, while accepted-boundary geometry
synthesis plus direct-coil/NESTOR replay took about ``5.85 s``. Both variants
matched to ``~9e-11`` in objective value, and warm calls were below ``0.3 ms``.
The remaining cold full-controller replay overhead is therefore controller
composition across steps and repeated boundary replay compilation, not the
standalone strict update.

The remaining phase-2 blocker is differentiating through the nonlinear
``run_free_boundary`` iteration loop itself, rather than through the dense toy
nonlinear primitive, fixed-boundary operator, complete finite-response proxy, or
final fixed accepted-boundary replay.

The combined JAX operator is also threaded into the free-boundary driver behind
the opt-in ``VMEC_JAX_FREEB_JAX_NESTOR_OPERATOR=1`` diagnostic flag for
low-resolution validation.
For stellarator-symmetric runs, the JAX path reconstructs the full VMEC angular
grid internally for the nonsingular Green block while keeping the
analytic/singular block on the active grid, matching the host bridge. The JAX
operator closure can be precompiled and cached with
``VMEC_JAX_FREEB_JAX_NESTOR_JIT_OPERATOR=1`` (the default when JIT is enabled),
but the host bridge remains the production/default route because the compiled
operator is still a validation primitive, not yet the final matrix-free adjoint.

The production NESTOR adjoint is therefore still a phase-2 deliverable. The
intended design is to expose a JAX-native NESTOR operator
``A(q) phi = b(q, I, c)`` where ``q`` is the VMEC boundary state and ``I, c``
are coil currents and curve coefficients. The backward pass should solve
``A(q)^T lambda = dJ/dphi`` and then use JAX JVP/VJP rules for the operator
assembly and Biot-Savart source terms. This keeps memory independent of the
number of vacuum-solver iterations and keeps gradient cost approximately
independent of the number of coil optimization parameters.

Finite-pressure direct-coil support is currently a promoted forward validation
lane: active NESTOR diagnostics respond to coil-current changes, matched
direct/generated-``mgrid`` provider samples agree tightly, and WOUT-level
generated-``mgrid``/direct comparisons are bounded by the documented finite
tolerances for the corrected ESSOS LP-QA stellarator pressure-continuation case.
Accepted-equilibrium sensitivity and exact full-solve gradients remain phase-2
promotion gates.

Current Status
--------------

The current lane status is intentionally narrower than a production single-stage
coil optimizer:

- ``mgrid`` remains the VMEC2000-compatible parity backend.
- Direct coils are supported as a JAX external-field provider for forward
  free-boundary solves, including nonzero pressure profiles.
- The finite-pressure evidence includes active-coupling provider validation and
  an LP-QA stellarator pressure-continuation lane. Generated-``mgrid`` and
  direct-coil providers from the same ESSOS LP-QA coil set converge to actual
  WOUT beta values above 1%.
- The promoted high-resolution finite-beta reference evidence also includes the
  VMEC2000-compatible DIII-D ``mgrid`` benchmark: final ``ns=101``, final
  ``FTOL=1e-12``, and actual WOUT beta through 3.33%.
- The previous LP-QA direct-coil failure was traced to the automatic CPU
  ``lax.tridiagonal_solve`` preconditioner policy, not to direct Biot-Savart
  sampling or NESTOR ``bsqvac`` construction. The safe default now keeps the
  Thomas R/Z solve for direct free-boundary runs unless users force the lax path
  explicitly for diagnostics.
- The fast validation lane now includes same-branch complete-solve
  AD-vs-central-FD gates for direct-coil current, direct-coil Fourier geometry,
  and mixed stellsym/``LASYM`` directions. These gates compare fixed
  accepted-trace/controller custom-VJP derivatives against complete-solve finite
  differences only after rejecting accepted-branch fingerprint changes.
- Branch-local production-forward replay gates now cover aspect ratio plus
  VMEC-state ``qs_total`` plus accepted ``Bnormal`` and ``Bsqvac`` RMS physical
  scalars, with scalar/vector coverage for current and Fourier geometry
  representatives. This validates a fixed accepted branch, not arbitrary
  adaptive host-controller branch changes.
- The phase-2 full-loop refactor target has JAX-visible masked and segmented
  nonlinear-controller primitives with AD-vs-FD direct-coil gradient coverage,
  plus accepted-state replay gates for coil and VMEC-state derivatives. This
  validates the replacement contract for the host loop but does not promote a
  default production ``run_free_boundary`` exact adjoint.
- The active NESTOR sensitivity checks validate the provider/coupling layer:
  normal-field/source channels scale linearly with current changes and
  ``bsqvac`` scales quadratically. They do not yet validate a full accepted
  equilibrium derivative.
- The phase-1 optimization example is coil-only, but it still uses a cheap VMEC
  residual plus VMEC-state ``qs_total``, aspect, and mean-iota proxy. Boozer/QS
  objectives and production full-solve adjoints are next-step work.
- The experimental JAX NESTOR driver path is opt-in and guarded. It validates
  both LASYM full-grid and stellarator-symmetric reduced-grid samples, but the
  host bridge remains the default production path until complete-solve adjoints
  are promoted.

In short: direct-coil finite-pressure plumbing is present and
validation-tested; high-resolution finite-beta ``mgrid`` validation exists for
DIII-D; the LP-QA stellarator direct-coil forward lane has strict WOUT evidence
through actual beta 1.93%; publication-grade gradients through the full
free-boundary/NESTOR nonlinear loop and VMEC2000-bounded generated-``mgrid``
trace parity are not claimed yet.

Low-Resolution Beta Scan
------------------------

The first diagnostic uses unit-scale ESSOS Landreman-Paul QA coils and a
pressure scan. The zero-pressure endpoint is retained as a reference, but the
finite-pressure points are the meaningful provider-plumbing checks. The same
coil set is used two ways:

1. ESSOS coils are sampled onto an ``mgrid`` file and solved by the legacy
   free-boundary compatibility path.
2. The same ESSOS coils are converted to ``CoilFieldParams`` and sampled
   directly by the differentiable JAX Biot-Savart provider.

The scalar diagnostics from the two ``vmec_jax`` providers agree within the
recorded JSON precision/roundoff for this low-resolution validation run. The
scan records both the input ``PRES_SCALE`` and the output energy ratio
``100 W_p / W_B`` so future plots cannot accidentally validate only the vacuum
case.
The default scan deliberately uses the unit-scale VMEC input ``examples/data/input.LandremanPaul2021_QA_lowres``. Do not pair the default ESSOS LP-QA coils with the reactor-scale LP-QA input unless the coils are also scaled: the coil major radius is about 1.1 while the reactor-scale plasma has ``RBC(0,0)`` about 10.1. That mismatch was the cause of the failed high-res LP-QA run in the initial PR diagnostic. The example uses ``--activate-fsq 1e99`` by default. This forces immediate VMEC2000-style NESTOR turn-on so the short run exercises active finite-pressure vacuum coupling instead of stopping in the inactive ``ivac=-1`` cadence. That is useful for provider validation. The residuals shown here are recomputed on the accepted final state with a fresh active NESTOR sample, but this is still not a converged high-beta result: the active residual norm remains large and must be bounded against VMEC2000 before this becomes a promoted finite-beta single-stage optimization claim. Use ``--activate-fsq 1e-3`` when checking literal VMEC2000 activation cadence. Use a larger value, such as the default ``1e99``, only when the goal is to force active coupling early in a deliberately short validation run. Those early-activation runs are provider/coupling diagnostics, not evidence that the accepted equilibrium is converged to the same state as a long VMEC2000 run. The numerical summaries are runtime artifacts under the selected ``--outdir``. They are intentionally not committed, since generated WOUT files, magnetic-grid files, and validation plots are handled as release/PR artifacts. Reproduction ------------ Run all commands in this section from the repository root. The ESSOS-backed commands require an ESSOS checkout on ``PYTHONPATH`` and the Landreman-Paul QA coil JSON under ``$ESSOS_INPUT_DIR``. 
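
Before running the ESSOS-backed commands, a small preflight check can catch the two setup problems mentioned above: missing ESSOS paths and a coil/plasma scale mismatch. This is a hedged sketch only. The function names and the ``max_ratio`` threshold are hypothetical; the environment variable names are the ones used by the commands on this page.

```python
import os


def missing_essos_settings(env=None):
    """Return the ESSOS variable names that are unset or not directories."""
    env = os.environ if env is None else env
    names = ("ESSOS_ROOT", "ESSOS_INPUT_DIR")
    return [n for n in names if not os.path.isdir(env.get(n, ""))]


def scale_mismatch(coil_major_radius, plasma_major_radius, max_ratio=2.0):
    """True if coil and plasma major radii differ by more than max_ratio."""
    big = max(coil_major_radius, plasma_major_radius)
    small = min(coil_major_radius, plasma_major_radius)
    if small <= 0.0:
        raise ValueError("major radii must be positive")
    # max_ratio is an illustrative threshold, not a value used by vmec_jax.
    return big / small > max_ratio
```

With the numbers quoted above, ``scale_mismatch(1.1, 10.1)`` flags the unscaled ESSOS coils paired with the reactor-scale input.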
The beta-scan command exercises both the generated-``mgrid`` and direct-coil backends; if your ESSOS checkout does not yet provide ``Coils.to_mgrid``, add ``--skip-mgrid-runs`` to keep the direct-coil finite-beta scan runnable. The three PR-review workflows are: .. code-block:: bash export ESSOS_ROOT=/path/to/ESSOS_mgrid_pr export ESSOS_INPUT_DIR=$ESSOS_ROOT/examples/input_files PYTHONPATH=.:$ESSOS_ROOT:$PYTHONPATH \ python examples/free_boundary_essos_direct_forward.py \ --input examples/data/input.LandremanPaul2021_QA_lowres \ --max-iter 10 \ --ns 7 \ --mpol 3 \ --ntor 2 \ --nzeta 8 \ --outdir results/free_boundary_essos_direct_forward This ESSOS direct-coil forward run writes ``results/free_boundary_essos_direct_forward/input.lpqa_direct_coils``, ``wout_direct_coils.nc``, and ``summary.json``. The summary records ``fsqr/fsqz/fsql``, aspect, mean iota, coil length/current diagnostics, the ``DIRECT_COILS`` provider tag, ``mgrid: null``, and the active free-boundary/NESTOR coupling diagnostics. Use the beta-scan command below when you need the standardized finite-beta pressure profile and the generated-``mgrid`` comparison row. The generated input deck contains ``MGRID_FILE='DIRECT_COILS'`` as a provider tag for the Python direct-coil examples. It is not a standalone magnetic-grid filename: replaying that generated input through the public ``vmec`` CLI alone will not reconstruct the ESSOS coils unless a direct-coil provider object is also supplied by Python. Use the example command above, or use the generated ``mgrid`` compatibility example when you need an input deck that can be replayed without Python coil parameters. .. 
code-block:: bash export ESSOS_ROOT=/path/to/ESSOS_mgrid_pr export ESSOS_INPUT_DIR=$ESSOS_ROOT/examples/input_files PYTHONPATH=.:$ESSOS_ROOT:$PYTHONPATH \ python examples/free_boundary_essos_coils_beta_scan.py \ --outdir results/free_boundary_essos_coils_beta_scan_smoke \ --input examples/data/input.LandremanPaul2021_QA_lowres \ --phiedge=-0.025 \ --betas 0.0025 \ --pressure-profile standard \ --ns 12 \ --max-iter 1000 \ --ftol 1e-8 \ --mpol 5 \ --ntor 5 \ --mgrid-nr 16 \ --mgrid-nz 16 \ --mgrid-nphi 16 \ --activate-fsq 1e-3 This finite-beta scan writes ``summary.json`` plus ``wout_mgrid_beta_*.nc`` and ``wout_direct_beta_*.nc`` rows under the selected ``--outdir``. When ``--skip-mgrid-runs`` is used, only the direct-coil WOUT rows are expected. The root summary is checkpointed after each case and contains ``complete``, the coil/plasma scale summary, the mgrid path, the radial schedule, and per-run entries with backend, nominal beta label, WOUT path, residuals, aspect, mean iota, pressure, ``wp/wb`` beta proxy, and NESTOR history summaries. Staged scans additionally write ``case_checkpoints/*.json`` and per-stage WOUT/input files. .. code-block:: bash python examples/optimization/free_boundary_QS_coil_optimization.py \ --smoke \ --provider circle \ --max-evals 1 \ --max-iter 1 \ --vmec-max-iter 2 \ --helicity-m 1 \ --helicity-n 0 \ --qs-surfaces 0.25,0.5,0.75 \ --pressure-profile standard \ --beta 1.0 \ --activate-fsq 1e99 \ --outdir results/free_boundary_QS_coil_optimization_circle_smoke This single-stage free-boundary coil-optimization smoke is dependency-light because it uses the synthetic circular direct-coil provider. It writes ``input.direct_coil_qs``, ``history.json``, ``summary.json``, and ``wout_best_direct_coil_qs.nc``. The optimizer vector contains only coil current and selected coil Fourier degrees of freedom; the plasma boundary is recomputed by the free-boundary solve at each objective evaluation. 
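
The coil-only optimizer vector described above can be pictured with the following sketch. It is not the example's implementation: the function names, the flat-list representation, and the boolean mask are assumptions made for illustration. The point is that only currents and selected Fourier coefficients enter the vector, and no plasma-boundary coefficient does.

```python
def pack_coil_dofs(currents, fourier, optimize_mask):
    """Build the optimizer vector from currents plus selected Fourier dofs."""
    if len(fourier) != len(optimize_mask):
        raise ValueError("mask must match the Fourier coefficient list")
    return list(currents) + [c for c, keep in zip(fourier, optimize_mask) if keep]


def unpack_coil_dofs(x, n_currents, fourier_base, optimize_mask):
    """Split the vector and scatter selected dofs back; the rest stay fixed."""
    currents = list(x[:n_currents])
    selected = iter(x[n_currents:])
    fourier = [
        next(selected) if keep else base
        for base, keep in zip(fourier_base, optimize_mask)
    ]
    return currents, fourier
```

Each objective evaluation would unpack the vector into coil parameters and then rerun the free-boundary solve, so the plasma surface is always an output of the coils rather than an independent variable.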
The current deterministic objective contains accepted-state VMEC residual, VMEC-state quasisymmetry-ratio residual, aspect-ratio, and mean-iota terms. The QS residual is evaluated from the accepted VMEC state, not from a promoted coil-to-Boozer exact adjoint through adaptive branch selection. For a local same-branch validation artifact, add ``--write-same-branch-report``. The default report mode is complete-solve finite-difference only and avoids the cold branch-local replay compilation. Use ``--same-branch-report-mode scalar`` to additionally validate one fixed-accepted-branch ``qs_total`` gradient, or ``vector`` to validate several physical-scalar directional derivatives against the same complete-solve central finite-difference direction. The scalar can be changed with ``--same-branch-report-scalar-key``. Use ``aspect`` for a cheaper physical-scalar timing probe and ``qs_total`` for the QS-relevant scalar. The derivative report defaults to ``--same-branch-report-ad-mode direct``, which differentiates the fixed accepted-branch replay directly. In ``vector`` mode this direct path uses a JVP and reports ``J @ direction`` without materializing the full Jacobian. Use ``--same-branch-report-ad-mode custom_vjp`` only when explicitly auditing the custom-VJP seam; that path falls back to the more expensive full-Jacobian VJP diagnostic. The resulting ``same_branch_complete_solve_report.json`` includes a ``timings`` block. ``complete_solve_fd_wall_s`` measures the complete base/plus/minus finite-difference solves, while ``branch_local_scalar_wall_s`` or ``branch_local_vector_wall_s`` measures the fixed-accepted-branch replay derivative. On the current tiny circle-provider smoke, the forward objective evaluation is about two seconds, the complete finite-difference report is several seconds, and the first cold branch-local scalar replay can still take tens of seconds. 
That is why derivative-detail reports remain opt-in performance diagnostics rather than default example output. When scalar or vector detail is requested, the corresponding ``branch_local_*`` block also includes nested timing fields such as ``production_scalar_eval_wall_s``, ``replay_value_and_grad_dispatch_s``, ``replay_value_and_grad_ready_s``, ``replay_vjp_wall_s``, and ``replay_pullbacks_wall_s``. These fields synchronize JAX arrays before recording device-ready timings, so they are suitable for distinguishing Python dispatch, XLA compilation, and CPU/GPU execution costs in local profiling. Run the dependency-light direct-coil forward example from the repository root. This path constructs a synthetic circular ``CoilFieldParams`` object directly in ``vmec_jax`` and writes ``wout_direct_coils.nc`` plus ``summary.json`` without requiring ESSOS assets or an ``mgrid`` file. .. code-block:: bash python examples/free_boundary_direct_coils_forward.py \ --max-iter 4 \ --outdir results/free_boundary_direct_coils_forward The direct-coil examples default to JIT force kernels, matching the public ``run_free_boundary`` fast path. Add ``--no-jit-forces`` only when debugging parity or compile behavior. Run the ESSOS direct-coil forward example from the repository root. This path loads ESSOS coils, converts them to ``CoilFieldParams``, runs one low-resolution finite-pressure free-boundary forward validation run without writing an ``mgrid`` file, and writes ``wout_direct_coils.nc`` plus ``summary.json``. .. code-block:: bash export ESSOS_ROOT=/path/to/ESSOS_mgrid_pr export ESSOS_INPUT_DIR=$ESSOS_ROOT/examples/input_files PYTHONPATH=.:$ESSOS_ROOT:$PYTHONPATH \ python examples/free_boundary_essos_direct_forward.py \ --max-iter 10 \ --outdir results/free_boundary_essos_direct_forward Use ``--dry-run`` on the same command to validate the ESSOS coil conversion, the generated VMEC input deck, and the direct provider wiring without running VMEC. 
The generated input explicitly uses ``MGRID_FILE='DIRECT_COILS'`` and the JSON summary records ``mgrid: null``. As above, ``DIRECT_COILS`` is a Python-provider tag, not a filesystem ``mgrid`` file. Run the matched beta scan from the repository root. Until the ESSOS ``to_mgrid`` PR is merged and released, put the ESSOS branch checkout on ``PYTHONPATH``. If only released ESSOS is available, add ``--skip-mgrid-runs`` to run the direct-coil provider without generating a magnetic grid. .. code-block:: bash export ESSOS_ROOT=/path/to/ESSOS_mgrid_pr export ESSOS_INPUT_DIR=$ESSOS_ROOT/examples/input_files PYTHONPATH=.:$ESSOS_ROOT:$PYTHONPATH \ python examples/free_boundary_essos_coils_beta_scan.py \ --outdir results/free_boundary_essos_coils_beta_scan_readme \ --input examples/data/input.LandremanPaul2021_QA_lowres \ --phiedge=-0.025 \ --betas 0.00125 0.0025 0.00375 0.005 \ --pressure-profile standard \ --ns-array 16,31 \ --niter-array 600,1200 \ --ftol-array 1e-8,1e-8 \ --mpol 5 \ --ntor 5 \ --mgrid-nphi 24 \ --max-iter 1200 \ --activate-fsq 1e-3 To include a self-consistent Redl bootstrap-current preconditioner before each finite-beta equilibrium solve, add: .. code-block:: bash --bootstrap-current-fixed-point \ --bootstrap-helicity-n 0 \ --bootstrap-max-fixed-point-iter 2 \ --bootstrap-n-current 32 \ --bootstrap-vmec-max-iter 1200 \ --bootstrap-ns-array 16,31 \ --bootstrap-niter-array 300,1200 \ --bootstrap-ftol-array 1e-7,1e-8 \ --bootstrap-damping 0.5 \ --bootstrap-max-current-update-norm 0.1 \ --bootstrap-return-best-evaluated-current This leaves the plasma boundary and coils unchanged during the preconditioner; only the VMEC current profile is updated from the Redl formula. The scan summary records the per-case bootstrap-current history path, final ``CURTOR``, effective damping, and current-step limiter status so these runs are auditable. 
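
The damping, current-step limiter, and best-evaluated-current options can be illustrated with a generic damped fixed-point loop. The sketch below is an assumption-laden stand-in, not the ``vmec_jax`` implementation: ``evaluate`` replaces the VMEC solve plus Redl formula, the profile is a plain list of coefficients, and treating the update-norm limit as relative to the current profile norm is a guess.

```python
import math


def _norm(v):
    return math.sqrt(sum(x * x for x in v))


def limited_damped_step(current, target, damping=0.5, max_update_norm=0.1):
    """Move a damped fraction toward target, capping the relative step norm."""
    step = [damping * (t - c) for c, t in zip(current, target)]
    cap = max_update_norm * (_norm(current) or 1.0)  # relative cap: an assumption
    size = _norm(step)
    limited = size > cap
    if limited:
        step = [s * cap / size for s in step]
    return [c + s for c, s in zip(current, step)], limited


def fixed_point_best_evaluated(profile, evaluate, max_iter=2, **step_kwargs):
    """evaluate(profile) -> (mismatch, target); return the best evaluated profile."""
    best_mismatch, best_profile, history = math.inf, list(profile), []
    for _ in range(max_iter):
        # Stand-in for an equilibrium solve followed by the bootstrap formula.
        mismatch, target = evaluate(profile)
        if mismatch < best_mismatch:
            best_mismatch, best_profile = mismatch, list(profile)
        profile, limited = limited_damped_step(profile, target, **step_kwargs)
        history.append({"mismatch": mismatch, "limited": limited})
    # The last proposed profile was never evaluated, so it is not returned.
    return best_profile, best_mismatch, history
```

Returning the best evaluated profile instead of the last proposed one means the caller always receives a profile that has actually been through ``evaluate``.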
The bootstrap-stage schedule controls are intentionally separate from the final scan ``NS_ARRAY``/``NITER_ARRAY``/``FTOL_ARRAY``: use a cheaper Redl-current preconditioner schedule, then keep the final finite-beta equilibrium solve at the strict resolution required for validation. Treat the limiter as a continuation control: a coarse low-resolution Redl update can reduce the Redl mismatch while still worsening the next VMEC residual if the current step is too large. The best-evaluated-current option avoids handing the final beta solve a last proposed profile that has not yet been solved by the fixed-point loop. Use staged radial continuation for high-resolution promotion attempts. Keep ``--pressure-continuation`` enabled so each pressure point starts from the previous accepted free-boundary LCFS. Add ``--resume-existing`` when rerunning an interrupted high-resolution scan: existing ``wout_{backend}_beta_*.nc`` files are skipped and, if their residuals satisfy ``--pressure-continuation-max-fsq``, promoted as continuation seeds for the next pressure point. When ``--ns-array`` is supplied, the scan also writes ``case_checkpoints/{backend}_beta_*.json`` plus per-stage inputs and WOUT files after every accepted radial-grid stage. These stage checkpoints are independent of the root ``summary.json`` so a wall-time stop during a strict ``ns=51`` or ``ns=101`` stage still leaves the last accepted lower-resolution metrics and restart seed available to ``--resume-existing``. .. 
code-block:: bash export ESSOS_ROOT=/path/to/ESSOS_mgrid_pr export ESSOS_INPUT_DIR=$ESSOS_ROOT/examples/input_files PYTHONPATH=.:$ESSOS_ROOT:$PYTHONPATH \ python examples/free_boundary_essos_coils_beta_scan.py \ --outdir results/free_boundary_essos_coils_beta_scan_highres_attempt \ --input examples/data/input.LandremanPaul2021_QA_lowres \ --phiedge=-0.025 \ --betas 0 0.5 1.0 1.25 \ --pressure-profile standard \ --pressure-continuation \ --resume-existing \ --pressure-continuation-max-fsq 1e-6 \ --ns-array 16,31,51,101 \ --niter-array 1000,2000,4000,12000 \ --ftol-array 1e-8,1e-10,1e-11,1e-12 \ --mpol 5 \ --ntor 5 \ --activate-fsq 1.0 The ``--betas`` values are nominal pressure-scaling labels used to drive the scan. The actual physical beta must be read from ``summary.json`` or the WOUT file after convergence. High-Resolution DIII-D Finite-Beta Benchmark -------------------------------------------- The current reviewer-facing high-resolution axisymmetric finite-beta evidence is the VMEC2000-compatible DIII-D ``mgrid`` benchmark. .. figure:: _static/figures/freeb_diiid_mgrid_beta_ns101_panel.png :alt: DIII-D mgrid free-boundary finite-beta scan with iota profiles, cross sections, and LCFS magnetic-field contours. :width: 100% DIII-D ``mgrid`` free-boundary finite-beta scan at final ``ns=101`` and ``FTOL=1e-12``. The compressed figure is committed for PR review; the numerical summary is available as :download:`CSV <_static/figures/freeb_diiid_mgrid_beta_ns101_panel_summary.csv>`. Full-resolution external artifacts remain available for review without committing large vector/PDF files: - WOUT-panel SVG: https://gist.githubusercontent.com/rogeriojorge/f9bfe56c5de71445cf86ea0843dc6629/raw/diiid_mgrid_beta_ns101_panel.svg - WOUT-panel CSV: https://gist.githubusercontent.com/rogeriojorge/f9bfe56c5de71445cf86ea0843dc6629/raw/diiid_mgrid_beta_ns101_panel_summary.csv The plotted WOUTs use final ``ns=101`` and final ``FTOL=1e-12``. 
The actual WOUT beta values shown in the compressed panel are 0.00%, 0.67%, 1.49%, 2.18%, and 3.33%; all final residual sums are near ``1e-12``. The renderer annotates LCFS RMS displacement and relative LCFS ``|B|`` RMS change against the vacuum row. At actual WOUT beta 3.33%, the LCFS RMS displacement is about ``0.352``, the maximum LCFS displacement is about ``0.478``, the magnetic-axis ``R``-shift is about ``0.381``, and the relative LCFS ``|B|`` RMS change is about ``0.181``.

This is promoted as a free-boundary finite-beta ``mgrid`` validation artifact. It is not a direct-coil stellarator promotion artifact.

For this DIII-D ``mgrid`` row only, executable-backed VMEC2000 validation was run on the same 3.33% WOUT row. The VMEC2000 and ``vmec_jax`` high-beta WOUTs agree to well below the size of the finite-beta response: aspect differs by ``6.4e-7``, ``rmnc`` relative RMS by ``5.6e-7``, ``zmns`` relative RMS by ``3.5e-7``, ``bmnc`` relative RMS by ``5.1e-7``, and LCFS RMS displacement between codes by ``1.7e-6``. The beta-induced LCFS RMS shift is therefore about five orders of magnitude larger than the ``vmec_jax``-vs-VMEC2000 geometric mismatch.

Generate the DIII-D WOUTs from the bundled input and fetched ``mgrid`` asset:

.. code-block:: bash

   python tools/fetch_assets.py --bundle reference-nc
   python tools/diagnostics/run_diiid_mgrid_beta_scan.py \
     --outdir results/freeb_diiid_mgrid_beta_ns101 \
     --pressure-scales 0 0.50 1.0 1.35 1.8 \
     --ns-array 16,51,101 \
     --niter-array 1000,4000,20000 \
     --ftol-array 1e-8,1e-11,1e-12

Then render the panel directly from the generated summary:

.. code-block:: bash

   python tools/diagnostics/render_freeb_beta_wout_panels.py \
     --summary results/freeb_diiid_mgrid_beta_ns101/summary.json \
     --title "DIII-D mgrid free-boundary finite-beta scan (ns=101)" \
     --stem diiid_mgrid_beta_ns101_panel \
     --outdir /tmp/freeb_publication_panels

High-Resolution LP-QA Stellarator Gate
--------------------------------------

The corrected unit-scale LP-QA input and ESSOS coil pair has two validation layers. The strict direct-coil ``ns=101`` WOUT panel is phase-1 promoted forward-validation stellarator evidence. The lower-resolution ``ns=16,31`` rows below are provenance and pressure-continuation diagnostics that explain how the basin was reached; they are not the publication-grade promotion rows.

A local ``ns=16,31`` run with ``PHIEDGE=-0.025`` and ``PRES_SCALE = 1000 * nominal_beta_percent`` produced:

.. list-table::
   :header-rows: 1

   * - Nominal beta label
     - Actual WOUT beta
     - WOUT ``fsqr+fsqz+fsql``
     - Aspect
     - Mean iota
   * - 0.0
     - 0.00%
     - ``1.66e-8``
     - 6.013
     - 0.409
   * - 0.5
     - 0.72%
     - ``1.67e-8``
     - 6.046
     - 0.405
   * - 1.0
     - 1.49%
     - ``1.02e-8``
     - 6.098
     - 0.395
   * - 2.0
     - 3.43%
     - ``7.94e-7``
     - 6.343
     - 0.191

The same low-resolution pressure-continuation schedule also follows the direct differentiable coil provider after disabling the unsafe automatic CPU ``lax.tridiagonal_solve`` R/Z preconditioner policy:

.. list-table::
   :header-rows: 1

   * - Nominal beta label
     - Actual WOUT beta
     - WOUT ``fsqr+fsqz+fsql``
     - Aspect
     - Mean iota
   * - 0.0
     - 0.00%
     - ``1.63e-8``
     - 6.014
     - 0.405
   * - 0.5
     - 0.72%
     - ``1.73e-8``
     - 6.048
     - 0.402
   * - 1.0
     - 1.49%
     - ``1.80e-8``
     - 6.097
     - 0.393
   * - 2.0
     - 3.42%
     - ``4.74e-7``
     - 6.343
     - 0.201

These low-resolution rows are not the promoted phase-1 claim; the strict ``ns=101`` WOUT panel below is. Neither row set promotes the full nonlinear exact-adjoint path: current gradient validation still stops at accepted-boundary replay and dense low-grid NESTOR primitives.
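
The two low-resolution tables above can be compared row by row with a few lines of Python. The values below are transcribed from those tables. The helper names are hypothetical, and treating the first table as the companion (non-direct) provider run is an assumption based on the surrounding text.

```python
# (nominal label, actual beta %, residual sum, aspect, mean iota)
FIRST_TABLE = [
    (0.0, 0.00, 1.66e-8, 6.013, 0.409),
    (0.5, 0.72, 1.67e-8, 6.046, 0.405),
    (1.0, 1.49, 1.02e-8, 6.098, 0.395),
    (2.0, 3.43, 7.94e-7, 6.343, 0.191),
]
DIRECT_TABLE = [
    (0.0, 0.00, 1.63e-8, 6.014, 0.405),
    (0.5, 0.72, 1.73e-8, 6.048, 0.402),
    (1.0, 1.49, 1.80e-8, 6.097, 0.393),
    (2.0, 3.42, 4.74e-7, 6.343, 0.201),
]


def pres_scale(nominal_beta_percent):
    """PRES_SCALE convention quoted above: 1000 * nominal_beta_percent."""
    return 1000.0 * nominal_beta_percent


def row_differences(rows_a, rows_b):
    """Absolute differences in beta, aspect, and mean iota for matching labels."""
    diffs = []
    for a, b in zip(rows_a, rows_b):
        if a[0] != b[0]:
            raise ValueError("rows must share the same nominal label")
        diffs.append({
            "label": a[0],
            "d_beta": abs(a[1] - b[1]),
            "d_aspect": abs(a[3] - b[3]),
            "d_iota": abs(a[4] - b[4]),
        })
    return diffs
```

On these rows the largest mean-iota difference occurs at the nominal ``2.0`` label, which is also the row with the largest residual sum in both tables.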
Lessons from the earlier failed attempts:

- pairing the default ESSOS coils with the reactor-scale LP-QA input is invalid without coil scaling and caused the original high-resolution failure;
- the native reactor-scale ``PHIEDGE`` has the wrong sign for the vacuum subroutine, while a small hand-tuned flux magnitude destroys the scale;
- direct pressure jumps are much less robust than pressure continuation from accepted lower-beta equilibria;
- the direct provider needs the safe Thomas R/Z preconditioner by default. Forcing the CPU ``lax`` tridiagonal path can generate a nonphysical first active R/Z update even when direct and generated-``mgrid`` ``bsqvac`` agree to roughly ``1e-3`` relative RMS.

The promoted strict direct-coil ``ns=101`` local continuation run converged the vacuum, nominal ``0.5``, nominal ``1.0``, and refined nominal ``1.25`` beta labels at final ``FTOL=1e-12``. The corresponding actual WOUT beta values are ``0.00%``, ``0.724%``, ``1.508%``, and ``1.932%`` with residual sums below ``6.3e-12``. A nominal ``2.0`` label reaches actual WOUT beta about ``3.184%`` and residual sum ``3.75e-7`` from the same continuation sequence; that row is useful stress evidence but is not part of the strict promoted panel. This is committed PR-review artifact evidence, not a default CI gate and not a VMEC2000 generated-``mgrid`` WOUT parity promotion.

The strict direct-coil LP-QA reviewer WOUT panel is committed as a compressed summary figure:

.. figure:: _static/figures/freeb_lpqa_direct_coil_beta_ns101_panel.png
   :alt: LP-QA direct-coil free-boundary finite-beta scan with iota profiles, cross sections, and LCFS magnetic-field contours.
   :width: 100%

   LP-QA direct-coil free-boundary finite-beta scan at final ``ns=101``. The strict promoted rows reach actual WOUT beta through ``1.932%`` with residual sums below ``6.3e-12``. The numerical summary is available as :download:`CSV <_static/figures/freeb_lpqa_direct_coil_beta_ns101_panel_summary.csv>`.

Full-resolution external artifacts remain available for review:

- WOUT-panel SVG: https://gist.githubusercontent.com/rogeriojorge/f9bfe56c5de71445cf86ea0843dc6629/raw/lpqa_direct_coil_beta_ns101_panel.svg
- WOUT-panel CSV: https://gist.githubusercontent.com/rogeriojorge/f9bfe56c5de71445cf86ea0843dc6629/raw/lpqa_direct_coil_beta_ns101_panel_summary.csv

The WOUT-panel renderer is reusable for both ``mgrid`` and direct-coil scans.

The strict LP-QA panel was generated by first running the high-resolution pressure-continuation command above for nominal labels ``0``, ``0.5``, ``1.0``, and the refined ``1.25`` point. If the scan was interrupted, rerun the same command with ``--resume-existing`` so each accepted WOUT is reused as the next pressure-continuation seed. Then render the panel from the resulting summary:

.. code-block:: bash

   python tools/diagnostics/render_freeb_beta_wout_panels.py \
     --summary results/free_boundary_essos_coils_beta_scan_highres_attempt/summary.json \
     --backend direct \
     --max-actual-beta 2.05 \
     --title "LP-QA direct-coil free-boundary finite-beta scan (ns=101)" \
     --stem lpqa_direct_coil_beta_ns101_panel \
     --outdir /tmp/freeb_publication_panels

For ad hoc existing DIII-D WOUTs, the renderer also accepts explicit files:

.. code-block:: bash

   python tools/diagnostics/render_freeb_beta_wout_panels.py \
     --wout "0.00%=wout_diiid_b0_mg101.nc" \
     --wout "0.67%=wout_diiid_b050_mg101.nc" \
     --wout "1.49%=wout_diiid_b100_mg101.nc" \
     --wout "2.18%=wout_diiid_b135_mg101.nc" \
     --wout "3.33%=wout_diiid_b180_mg101.nc" \
     --title "DIII-D mgrid free-boundary finite-beta scan (ns=101)" \
     --stem diiid_mgrid_beta_ns101_panel \
     --outdir /tmp/freeb_publication_panels

Direct-provider nonlinear-control diagnostics now record accepted NESTOR histories for ``bnormal``, ``gsource``, ``bsqvac``, and source reuse. A short LP-QA vacuum trace showed that the unsafe ``lax`` tridiagonal path converted identical raw R/Z residual blocks into an oversized first active update.
The public driver still exposes ``limit_update_rms`` and the beta-scan example exposes ``--direct-coil-limit-update-rms`` for future nonlinear-control diagnostics, but the LP-QA promotion result above does not require that limiter. Generate the benchmark summary used by the README/docs figure renderer: .. code-block:: bash python tools/benchmarks/bench_freeb_direct_coil_matrix.py \ --quick \ --out results/bench_freeb_direct_coil_matrix/summary.json Render the README/docs figures from the generated JSON summaries: .. code-block:: bash python tools/diagnostics/render_freeb_single_stage_readme.py \ --summary results/free_boundary_essos_coils_beta_scan_readme/summary.json \ --benchmark-summary results/bench_freeb_direct_coil_matrix/summary.json \ --outdir docs/_static/figures The example writes ``input.*`` decks, ``wout_*.nc`` files, a generated mgrid, and ``summary.json`` in the output directory. Those runtime files are ignored by git; the committed figures and CSV are generated artifacts for documentation only. Single-Stage Coil-Only Optimization Validation ---------------------------------------------- The initial single-stage optimization example is a bounded validation example. It optimizes only coil currents and selected coil Fourier coefficients. The VMEC plasma boundary coefficients are never included in the optimization vector; the plasma surface is recomputed by a direct-coil free-boundary solve at every objective evaluation. The default deterministic objective is: - accepted-state VMEC residual, - VMEC-state quasisymmetry-ratio residual, - aspect-ratio target, - mean-iota target. The example records ``history.json``, ``summary.json``, and the best ``wout``. It exits with code ``77`` when optional ESSOS assets are unavailable. For a dependency-light setup check that does not run VMEC or the optimizer, use ``--dry-run``. 
This writes ``summary.json`` with the generated VMEC input path, selected coil variables, objective weights, QS helicity/surface settings, and baseline coil diagnostics. The summary also carries a ``single_stage_limitations`` list so dry-run artifacts remain self-describing when shared without this page: .. code-block:: bash python examples/optimization/free_boundary_QS_coil_optimization.py \ --smoke \ --dry-run \ --provider circle \ --helicity-m 1 \ --helicity-n 0 \ --outdir results/free_boundary_QS_coil_optimization_circle_preview The same dry-run contract is covered for the optional ESSOS provider in CI by monkeypatching a synthetic ESSOS coil provider. The generated VMEC deck uses ``MGRID_FILE='DIRECT_COILS'`` and no generated ``mgrid`` artifact, so the example remains a direct-coil path: .. code-block:: bash export ESSOS_ROOT=/path/to/ESSOS_mgrid_pr export ESSOS_INPUT_DIR=$ESSOS_ROOT/examples/input_files PYTHONPATH=.:$ESSOS_ROOT:$PYTHONPATH \ python examples/optimization/free_boundary_QS_coil_optimization.py \ --smoke \ --dry-run \ --provider essos \ --helicity-m 1 \ --helicity-n 0 \ --outdir results/free_boundary_QS_coil_optimization_essos_preview For a bounded validation run, use the synthetic circular coil provider: .. code-block:: bash python examples/optimization/free_boundary_QS_coil_optimization.py \ --smoke \ --provider circle \ --max-evals 1 \ --max-iter 1 \ --vmec-max-iter 2 \ --helicity-m 1 \ --helicity-n 0 \ --qs-surfaces 0.25,0.5,0.75 \ --pressure-profile standard \ --beta 1.0 \ --activate-fsq 1e99 \ --outdir results/free_boundary_QS_coil_optimization_circle_smoke For the ESSOS Landreman-Paul QA coils, put ESSOS on ``PYTHONPATH`` and use: .. 
code-block:: bash export ESSOS_ROOT=/path/to/ESSOS_mgrid_pr export ESSOS_INPUT_DIR=$ESSOS_ROOT/examples/input_files PYTHONPATH=.:$ESSOS_ROOT:$PYTHONPATH \ python examples/optimization/free_boundary_QS_coil_optimization.py \ --smoke \ --max-evals 3 \ --helicity-m 1 \ --helicity-n 0 \ --outdir results/free_boundary_QS_coil_optimization_essos_smoke The promoted complete-loop gate now covers a VMEC-state ``qs_total`` scalar on the same fixed accepted branch as the aspect and accepted-vacuum scalars. The next scientific promotion step is replacing that VMEC-state proxy in this example with a Boozer-space QS objective and validating the same complete-loop gradients through the full Boozer/QS diagnostic path. The adaptive host branch selection itself remains outside the promoted derivative claim. Each accepted objective evaluation records a weighted objective-term breakdown for the residual, QS, aspect-ratio, and mean-iota terms. Benchmarks ---------- This lane includes lightweight, non-CI benchmark scripts. The recommended first command is the matrix runner: .. code-block:: bash python tools/benchmarks/bench_freeb_direct_coil_matrix.py \ --quick \ --out results/bench_freeb_direct_coil_matrix/summary.json The matrix runner executes the provider, direct free-boundary solve with and without JIT force kernels, and coil-gradient scripts with small CPU-only defaults. It writes each child JSON into the output directory and records the child paths plus compact timing/status rows in ``summary.json``. GPU rows are opt-in: .. code-block:: bash python tools/benchmarks/bench_freeb_direct_coil_matrix.py \ --quick \ --include-gpu \ --backend-note "local workstation validation" \ --out results/bench_freeb_direct_coil_matrix_gpu/summary.json If no JAX GPU device is available, the matrix records a skipped GPU row rather than falling back silently to CPU. Use ``--no-quick`` only for a larger local benchmark budget. The benchmark CSV/JSON is written to the requested results directory. 
The runner probes concrete accelerator platforms, so mixed launches such as ``JAX_PLATFORMS=cpu,cuda`` still record CUDA rows even when CPU is the default backend. The current office benchmark shows tiny direct free-boundary solves are CPU-favorable, while provider and gradient microbenchmarks have small enough kernel payloads that CUDA launch overhead dominates. GPU production work should therefore focus on larger batched/tangent workloads and accepted-point replay amortization, not on claiming a speedup from these tiny validation cases. The matrix keeps two direct-solve rows: the non-JIT diagnostic path and the default fast path with ``--jit-forces``. On the 2026-05-25 office CUDA probe, ``--jit-forces`` reduced the tiny GPU warm direct solve from roughly ``2.07 s`` to ``0.31 s`` by removing the force-evaluation bucket as the dominant cost. The follow-up free-boundary-aware fused strict update then reduced the tiny CUDA warm solve further to about ``0.25 s`` by cutting the update-state bucket to about one millisecond. The remaining warm GPU overhead is dominated by host-side iteration-control dispatch between preconditioning and accepted updates, while final NESTOR sample/solve time is already small. A split control-timing probe then localized that overhead to ``iteration_control_badjac_s``, the early bad-Jacobian state check. The default keeps the first-two-iteration VMEC safety probe; use ``VMEC_JAX_BADJAC_INITIAL_STATE_PROBE_ITERS=0`` only as an explicit profiling knob while checking VMEC2000 parity. On the tiny active direct-coil CUDA probe, that opt-in path reduced warm time from ``0.269 s`` to ``0.184 s`` and reduced the bad-Jacobian control bucket from ``77 ms`` to below ``1 ms``. The 2026-05-28 office CPU/CUDA rerun with concrete-platform GPU probing showed the same conclusion with finer buckets. The best ``--jit-forces`` tiny direct-solve row was ``0.0525 s`` warm on CPU and ``0.2346 s`` warm on CUDA. 
The force kernel itself was already competitive on CUDA (``0.00855 s`` CUDA versus ``0.00921 s`` CPU), and final NESTOR sample/solve time was also comparable. The remaining CUDA overhead was setup (``0.0538 s`` versus ``0.00931 s``), residual scalar materialization (``0.0293 s`` versus ``0.000764 s``), accepted-control ``fsq1`` (``0.0142 s`` versus ``0.000146 s``), and preconditioner dispatch (``0.0126 s`` versus ``0.00109 s``). The next GPU patch should therefore cache or stage static setup and reduce scalar/control dispatch; scalar-defer is not yet the right default because those residual scalars still drive VMEC control flow and output history. At the same head, the solver also uses a host flux-profile fast path for concrete default-``APHI`` iota profiles. This is a safe setup-only optimization for non-traced forward solves; differentiated/traced profile coefficients still use the JAX path. The follow-up ``office`` matrix reported the same performance conclusion: the tiny direct-coil ``--jit-forces`` row was ``0.0521 s`` warm on CPU and ``0.2318 s`` warm on CUDA, while force assembly itself was still near parity. The remaining work is setup/control staging, not Biot-Savart kernel math. The host-profile setup path is controlled by ``VMEC_JAX_HOST_PROFILE_SETUP``. With the default ``auto`` policy, the latest office CUDA matrix keeps host ``fsq1`` norms enabled but leaves primary residual products on device. The direct-coil JIT-forces row measured ``0.224 s`` warm on CUDA with the old host-residual policy and ``0.181 s`` when residual products were kept on device. The matrix also tested ``VMEC_JAX_TRIDI_PRECOMPUTE=1`` and ``VMEC_JAX_TRIDI_SOLVE=1``; both were slower than the default on the tiny direct-coil GPU row. Preconditioner dispatch/application and cold accepted-point force setup remain the next production GPU targets. 
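
The bucket timings quoted above for the 2026-05-28 rerun can be turned into a rough attribution of the warm CUDA-versus-CPU gap. This is a back-of-the-envelope sketch: it assumes the four listed buckets are additive and non-overlapping, which the text does not state, and the function name is hypothetical.

```python
# (bucket, CUDA seconds, CPU seconds) as quoted in the text above
BUCKETS = [
    ("setup", 0.0538, 0.00931),
    ("residual scalar materialization", 0.0293, 0.000764),
    ("accepted-control fsq1", 0.0142, 0.000146),
    ("preconditioner dispatch", 0.0126, 0.00109),
]
WARM_CUDA_S, WARM_CPU_S = 0.2346, 0.0525


def overhead_breakdown(buckets, warm_cuda, warm_cpu):
    """Return per-bucket extra CUDA time and the share of the total gap."""
    total_gap = warm_cuda - warm_cpu
    rows = [
        (name, cuda - cpu, (cuda - cpu) / total_gap)
        for name, cuda, cpu in buckets
    ]
    explained = sum(extra for _, extra, _ in rows)
    return rows, explained, explained / total_gap
```

Under that additivity assumption the four buckets account for roughly half of the warm gap, with setup the largest single contributor.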
The same benchmark pass tested existing opt-in knobs and did not promote them: ``VMEC_JAX_HOST_UPDATE_ON_ACCELERATOR=1`` was slower for the tiny CUDA row, and ``VMEC_JAX_BADJAC_INITIAL_STATE_PROBE_ITERS=0`` was not a robust speedup after the current accepted-control fusion. Timing-light rows confirmed that timing instrumentation is not the dominant remaining wall-time source. The June 2026 follow-up matrix kept the same conclusion. The production ``jit_forces=True`` row remains the large win: on the tiny direct-coil free-boundary case the no-JIT CUDA warm solve was about ``13.3x`` slower than CPU, while the JIT-force row reduced that to about ``2.7x``. Host-policy ablations did not beat the production JIT row on CPU or CUDA, so they remain diagnostic controls. Quiet performance-mode direct-provider free-boundary runs now enable ``light_history`` by default, which suppresses broad per-iteration histories without changing the solver branch, NESTOR coupling, or convergence logic. The next real performance seam remains first-call force/tape construction plus GPU preconditioner/setup/finalize launch overhead. The direct-solve child JSON includes active and trial NESTOR timing summaries: sample time, scalar-potential solve time, reuse counts, failed trial counts, and the final recompute sampler/solver timings. It also records a ``final_recompute_guard`` block for direct-solve children. This block compares the final accepted-state residuals against the pre-update final residuals, records final-vacuum metric deltas, and keeps ``safe_to_skip_final_recompute`` false until an explicit cached-finalization path proves parity. The matrix runner also enables ``VMEC_JAX_TIMING=1`` and ``VMEC_JAX_TIMING_DETAIL=1`` for the direct-solve child and records compact cold/warm solve-loop buckets in ``summary.json``: force evaluation, preconditioner, update, trace construction, and unattributed iteration-loop cost. 
These fields are the first place to inspect when a direct-coil free-boundary
solve is slow, because they separate Biot-Savart sampling, the vacuum linear
solve, solver-trial replay overhead, and the higher VMEC residual/update loop.
The setup bucket is also split into static-grid rebuild, free-boundary policy,
boundary/profile construction, cache-key hashing, ``ptau`` constants,
mode-index constants, and update constants, so GPU setup work can be targeted
without conflating it with the NESTOR solve.

The child scripts are still useful when isolating one lane:

.. code-block:: bash

    python tools/benchmarks/bench_external_field_providers.py \
      --points 48 --segments 48 \
      --out results/bench_external_field_providers.json
    python tools/benchmarks/bench_freeb_direct_coil_solve.py \
      --max-iter 2 \
      --out results/bench_freeb_direct_coil_solve.json
    python tools/benchmarks/bench_freeb_coil_gradient.py \
      --points 24 --segments 48 --matrix-size 24 \
      --out results/bench_freeb_coil_gradient.json

Each benchmark writes JSON with backend/device information, cold/compile
timing, warm timing, and the problem dimensions. Defaults are intentionally
small and CPU-safe; GPU production benchmarks should raise the grid and
segment counts explicitly.

Optional VMEC2000 Diagnostics
-----------------------------

The direct-coil provider is a ``vmec_jax`` research path; VMEC2000 itself
reads external fields through ``mgrid`` files, not ``CoilFieldParams``.
VMEC2000 diagnostics therefore validate the generated-``mgrid``/free-boundary
operator side of the branch, while direct-coil evidence comes from
direct-versus-generated-``mgrid`` comparisons inside ``vmec_jax``.

The standalone three-way diagnostic writes a JSON report for the current
research case. It always compares ``vmec_jax`` generated-``mgrid`` against
``vmec_jax`` direct coils, then attempts VMEC2000 generated-``mgrid`` if the
executable is available.
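
The leg selection just described can be sketched as follows. This is a minimal
illustration, not the diagnostic's actual code: ``plan_comparison_legs`` is a
hypothetical helper, and the fallback to the ``VMEC2000_EXEC`` environment
variable is an assumption made for the example.

.. code-block:: python

    import os
    import shutil


    def plan_comparison_legs(vmec2000_exec=None, skip_vmec2000=False):
        """Return the comparison legs to run, in order."""
        # The two vmec_jax legs are always compared.
        legs = ["vmec_jax generated-mgrid", "vmec_jax direct coils"]
        # Assumption: fall back to the VMEC2000_EXEC environment variable.
        candidate = vmec2000_exec or os.environ.get("VMEC2000_EXEC")
        available = bool(candidate) and shutil.which(candidate) is not None
        # The VMEC2000 leg is optional and never blocks the vmec_jax pair.
        if available and not skip_vmec2000:
            legs.append("VMEC2000 generated-mgrid")
        return legs


    print(plan_comparison_legs(skip_vmec2000=True))

Keeping the VMEC2000 leg conditional means a missing executable reduces the
report to the provider comparison instead of failing the whole run.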
The generated ``mgrid`` is an interpolated compatibility backend, while direct
coils sample the continuous Biot-Savart field. For one-update or short bounded
traces this is a strict provider regression check. For longer active nonlinear
free-boundary traces it is a finite-resolution convergence diagnostic: the
accepted surface must remain inside the generated-``mgrid`` box, residuals
must be physical, and the direct/generated differences should decrease as the
grid is refined.

.. code-block:: bash

    export ESSOS_ROOT=/path/to/ESSOS_mgrid_pr
    export ESSOS_INPUT_DIR=$ESSOS_ROOT/examples/input_files
    PYTHONPATH=.:$ESSOS_ROOT:$PYTHONPATH \
      python tools/diagnostics/compare_freeb_coils_mgrid_vmec2000.py \
      --out results/freeb_coils_mgrid_vmec2000.json \
      --workdir results/freeb_coils_mgrid_vmec2000_work \
      --ns-array 5,9,13 \
      --niter-array 100,500,2000 \
      --ftol-array 1e-8,1e-10,1e-12

For a quick provider-only validation run, skip VMEC2000 explicitly:

.. code-block:: bash

    export ESSOS_ROOT=/path/to/ESSOS_mgrid_pr
    export ESSOS_INPUT_DIR=$ESSOS_ROOT/examples/input_files
    PYTHONPATH=.:$ESSOS_ROOT:$PYTHONPATH \
      python tools/diagnostics/compare_freeb_coils_mgrid_vmec2000.py \
      --niter 1 \
      --mgrid-nphi 4 \
      --skip-vmec2000 \
      --activate-fsq 1e99 \
      --out results/freeb_coils_mgrid_vmec2000_smoke.json

The diagnostic defaults ``NZETA`` to ``--mgrid-nphi`` so the generated
``mgrid`` toroidal grid is compatible with VMEC's free-boundary loader. If you
override ``--nzeta``, choose a value compatible with the generated grid
(``kp``).

Use ``--activate-fsq 1e99`` only for short parity diagnostics so ``vmec_jax``
exercises the active NESTOR/free-boundary coupling immediately instead of
proving only inactive-cadence bookkeeping. Do not use forced activation as
long-trace promotion evidence unless the resulting surfaces stay inside the
generated-``mgrid`` domain and the final residuals are small.
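
One way to read the ``NZETA``/``kp`` compatibility requirement is sketched
below, under the assumption that ``NZETA`` must divide ``kp`` evenly so that
every ``kp/NZETA``-th file plane is kept. The helper name ``sampled_planes``
is hypothetical, and the divisibility rule is an assumption to confirm against
the loader rather than a statement of the ``vmec_jax`` implementation.

.. code-block:: python

    def sampled_planes(kp, nzeta):
        """Return the mgrid toroidal plane indices kept for a given NZETA.

        Assumes NZETA must divide kp; the stride is then kp // nzeta.
        """
        if nzeta <= 0 or kp <= 0:
            raise ValueError("kp and nzeta must be positive")
        if kp % nzeta != 0:
            raise ValueError(f"nzeta={nzeta} does not divide kp={kp}")
        nskip = kp // nzeta
        return list(range(0, kp, nskip))


    print(sampled_planes(16, 8))
    print(sampled_planes(4, 4))

With ``--mgrid-nphi 16`` and ``--nzeta 8`` the stride is two, so only the
even-numbered planes are sampled; when ``NZETA`` equals ``kp`` every plane is
used.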
The JSON records ``active_free_boundary`` for both the direct-coil and
generated-``mgrid`` ``vmec_jax`` backends, approximate LCFS
``boundary_extents`` for each WOUT, and
``comparisons.vmec_jax_direct_vs_generated_mgrid.boundary_vs_mgrid_domain``.
That containment block reports whether each final surface is inside the
generated grid and gives signed margins to the radial and vertical domain
limits. The default direct/generated comparison tolerances are
``--jax-rtol 1e-5`` and ``--jax-atol 1e-7``; stricter values can be used for
one-update provider regressions, while resolved free-boundary traces should be
judged by mgrid-resolution convergence.

When a generated ``mgrid`` has more toroidal planes than VMEC ``NZETA``,
``vmec_jax`` follows VMEC2000's ``read_mgrid_nc`` reduction and samples file
planes ``0, nskip, 2*nskip, ...`` instead of taking the first ``NZETA``
planes.

A bounded LP-QA low-resolution check illustrates the promoted interpretation.
With ``examples/data/input.LandremanPaul2021_QA_lowres``, ``ns=12``,
``niter=300``, default activation cadence, ``mgrid=24x24x8``, and
``pressure-scale=1000``, both ``vmec_jax`` backends enter active free-boundary
coupling, stay inside the generated grid, and converge to
``fsq_total≈6e-4``. Refining the generated grid to ``48x48x16`` reduces the
direct/generated aspect relative gap to about ``1.3e-4`` and the iota-profile
relative RMS to about ``3.5e-3``. This is finite-resolution evidence for the
continuous direct-coil provider, not a claim that a coarse generated ``mgrid``
and direct Biot-Savart sampling are bitwise equivalent.

The forced-active reactor-scale LP-QA stress test is intentionally retained as
a failure-mode diagnostic. In that run the nonlinear direct/generated surfaces
leave the generated grid and cross into nonphysical ``R<=0`` geometry, so
generated-``mgrid`` interpolation clips while the direct provider continues
sampling a non-toroidal surface.
The comparator now reports this explicitly with
``vmec_jax_*_boundary_outside_generated_mgrid`` warnings rather than allowing
the result to be mistaken for provider-parity evidence.

To reproduce the bounded low-resolution finite-resolution probe:

.. code-block:: bash

    export ESSOS_ROOT=/path/to/ESSOS_mgrid_pr
    PYTHONPATH=.:$ESSOS_ROOT:$PYTHONPATH JAX_ENABLE_X64=1 \
      python tools/diagnostics/compare_freeb_coils_mgrid_vmec2000.py \
      --essos-root "$ESSOS_ROOT" \
      --skip-vmec2000 \
      --input examples/data/input.LandremanPaul2021_QA_lowres \
      --pressure-scale 1000 \
      --phiedge-scale 1 \
      --ns 12 \
      --niter 300 \
      --ftol 1e-8 \
      --mpol 4 \
      --ntor 4 \
      --mgrid-nr 48 \
      --mgrid-nz 48 \
      --mgrid-nphi 16 \
      --nzeta 8 \
      --nvacskip 8 \
      --out results/freeb_lowres_direct_vs_mgrid48.json \
      --workdir results/freeb_lowres_direct_vs_mgrid48_work \
      --no-fail-on-jax-mismatch

If VMEC2000 exits before writing ``wout_*.nc``, the JSON still records the
workdir, return code, whether VMEC2000 opened the vacuum grid, stdout/stderr
tails, ``threed1`` tail, and parsed iteration trace. The parser includes both
the force rows and free-boundary convergence channels such as ``DEL-BSQ`` and
``FEDGE``. VMEC2000 return code ``2`` is the source-level ``more_iter_flag``
and is reported as ``more_iter_exit`` when the diagnostic also has a parsed
iteration trace or an explicit request to increase ``NITER``. Other nonzero
exits remain ``nonzero_exit`` so true generated-grid crashes stay visible in
the promotion evidence.

The current low-iteration LP-QA generated-``mgrid`` VMEC2000 leg is a
``more_iter_exit`` WOUT-promotion gap, not a direct-coil provider failure:
recent traces show small force rows but ``DEL-BSQ`` still near one. The JSON
includes ``delbsq_over_ftolv`` so this free-boundary residual can be tracked
separately from ``FSQR``, ``FSQZ``, and ``FSQL``.

For local WOUT-promotion investigation, add ``--vmec2000-promotion-probes``.
This optional mode leaves the default comparison untouched, then records
bounded VMEC2000-only follow-up attempts such as loose ``FTOL_ARRAY``,
``LFULL3D1OUT=T``, and small ``MAX_MAIN_ITERATIONS`` values when the first
VMEC2000 leg exits before WOUT. These probe rows are diagnostic evidence only:
they are not used for direct-coil versus generated-``mgrid`` scoring because
they intentionally alter only the VMEC2000 input deck.

.. code-block:: bash

    PYTHONPATH=.:$ESSOS_ROOT:$PYTHONPATH \
      python tools/diagnostics/compare_freeb_coils_mgrid_vmec2000.py \
      --vmec2000-exec /path/to/xvmec2000 \
      --vmec2000-promotion-probes \
      --vmec2000-probe-ftols 1e-2,1e-3 \
      --vmec2000-probe-max-main-iterations 2,5 \
      --activate-fsq 1e99 \
      --out results/freeb_coils_mgrid_vmec2000_with_probes.json

The ``--ns-array``, ``--niter-array``, and ``--ftol-array`` options define a
shared multigrid schedule used by both the ``vmec_jax`` generated-``mgrid``
and direct-coil runs. Use this shared schedule for promotion runs. The
``--vmec2000-niter`` override is only for diagnostics because it intentionally
changes the VMEC2000 schedule without changing the ``vmec_jax`` schedule.

The stock-executable validation run needs only a local VMEC2000 binary. It
verifies that the bundled asymmetric free-boundary deck reaches the vacuum
solve:

.. code-block:: bash

    export VMEC2000_EXEC=/path/to/xvmec2000
    VMEC2000_INTEGRATION=1 \
      pytest -q tests/test_vmec2000_exec_fast_validation.py::test_vmec2000_free_boundary_lasym_true_reaches_vacuum_solve

The bounded ``freeb_scalpot`` manifest diagnostic requires an instrumented
VMEC2000 executable that honors the ``VMEC_DUMP_*`` environment variables. It
compares VMEC2000 scalpot/vacuum/bextern dumps with the dense ``vmec_jax``
free-boundary path for a self-contained generated-``mgrid`` case.

The comparator treats VMEC2000 ``scalpot`` and ``vacuum`` dumps as required.
``bextern``, ``fouri``, free-boundary coupling, and GC dumps remain optional
and are compared when present.
If a stock VMEC2000 executable exits successfully but does not emit the
required dumps, ``--json`` records a structured ``missing_vmec_dumps`` error
with the requested dump environment and dump-file inventory. Nonzero VMEC2000
exits are fatal only when the required dumps are missing; if the instrumented
dumps exist, the comparator continues and records the VMEC return codes in the
JSON output.

.. code-block:: bash

    export VMEC2000_EXEC=/path/to/xvmec2000
    VMEC2000_INTEGRATION=1 \
      PYTHONPATH=. python tools/diagnostics/parity_sweep_manifest.py \
      --ids freeb_nonaxis_lasym_true_cth_like_local \
      --output-root results/parity/freeb_lasym_true \
      --manifest tools/diagnostics/parity_manifest.toml \
      --vmec-exec "$VMEC2000_EXEC"

For one-off debugging of a specific iteration, run the comparator directly:

.. code-block:: bash

    export VMEC2000_EXEC=/path/to/xvmec2000
    VMEC2000_INTEGRATION=1 \
      VMEC_DUMP_GC=1 \
      VMEC_DUMP_GC_STAGE=precond \
      PYTHONPATH=. python tools/diagnostics/vmec2000_exec_freeb_scalpot_compare.py \
      --input examples/data/input.cth_like_free_bdy_lasym_small \
      --vmec-exec "$VMEC2000_EXEC" \
      --iter 80 \
      --max-iter 120 \
      --activate-fsq 1e99 \
      --workdir results/freeb_scalpot_cth_like_lasym \
      --json results/freeb_scalpot_cth_like_lasym/summary.json

``--activate-fsq`` is a vmec-jax-only diagnostic override. It is useful for
short traces because VMEC2000's production cadence can delay vacuum activation
until after the bounded iteration window; forcing the JAX side active makes
the dump compare the active boundary-field, scalar-potential, and
edge-pressure channels immediately. The comparator also records a JAX
``dbsq_edge_proxy`` based on ``gcon -`` extrapolated plasma ``bsq`` so
VMEC2000 ``DEL-BSQ`` failures can be localized to sampled external field,
NESTOR solve, or edge magnetic-pressure balance.

The generated-``mgrid`` VMEC2000 comparison for the ESSOS LP-QA coil
validation case is still non-promoted/xfailed.
The current promoted LP-QA signal for this branch is ``vmec_jax`` direct-coil
versus generated-``mgrid`` provider/sample agreement within documented
tolerances, active NESTOR coupling sensitivity checks, and the direct
pressure-continuation sequence above.

Validation Status
-----------------

PR-ready phase-1 evidence is split into default fast gates and optional
external evidence. The default gates are CI-safe and cover:

- direct-coil Biot-Savart derivatives with respect to currents, coil Fourier
  coefficients, and evaluation coordinates;
- ESSOS adapter value parity when ESSOS is installed;
- JAX ``mgrid`` interpolation value and gradient checks;
- a direct-coil runtime hook that does not require an ``mgrid`` file and uses
  nonzero pressure;
- active generated-``mgrid`` versus direct-coil ``vmec_jax`` provider parity
  for the ESSOS Landreman-Paul QA finite-pressure validation case when the
  optional ESSOS assets and ``Coils.to_mgrid`` path are available;
- active direct-coil NESTOR-step sensitivity to coil-current changes,
  including the expected linear normal-field/source scaling and quadratic
  ``bsqvac`` scaling;
- direct-provider source refresh on reuse and trial-state vacuum-field
  refresh, so direct coils are not scored against stale pre-update source
  data;
- dense toy vacuum-adjoint tests.
- direct-coil to implicit dense-vacuum-chain finite-difference checks for one
  current scale and one Fourier geometry perturbation.
- JAX boundary-field projection value parity with the current NumPy
  implementation plus finite-difference checks with respect to both field
  samples and boundary geometry.
- direct-coil to JAX boundary projection to implicit dense-vacuum-chain
  finite-difference checks for one current scale and one Fourier geometry
  perturbation.
- VMEC-style source symmetrization and mode-RHS projection value parity with
  the host implementation, plus finite-difference gradients with respect to
  the source values.
- dense mode-space vacuum solve and reconstruction tests, including
  stellarator-symmetric and LASYM-style basis blocks plus finite-difference
  gradients through a direct-coil projected source/RHS/mode-space chain.
- fixed-boundary dense mode-space NESTOR AD-vs-central-finite-difference
  checks for both stellarator-symmetric and ``LASYM`` tiny direct-coil cases,
  covering one coil current and one coil Fourier geometry coefficient through
  the JAX chain direct coils -> boundary projection -> VMEC/NESTOR
  source/matrix assembly -> dense mode solve.
- reusable pytree directional-derivative checks for optimizer-facing
  direct-coil objectives, so current and Fourier-geometry controls are checked
  together instead of only by bespoke scalar tests.
- same-branch complete-solve AD-vs-central-finite-difference custom-VJP gates
  for one coil current, one Fourier geometry coefficient, and a mixed
  stellsym/``LASYM`` direction, with branch-fingerprint checks that reject
  adaptive controller changes.
- branch-local production-forward scalar/vector replay gates. Both the
  current-only and Fourier-geometry representatives cover aspect ratio plus
  accepted ``Bnormal`` and ``Bsqvac`` RMS scalars; the current-only
  representative also covers VMEC-state ``qs_total``, and the Fourier-geometry
  representative also covers an LCFS boundary moment. These validate fixed
  accepted-branch replay, not a general derivative of adaptive
  ``run_free_boundary`` branch selection.

Optional evidence includes ESSOS-backed full finite-pressure response tests,
VMEC2000 executable comparisons, and ``RUN_FULL=1`` complete-solve finite
response checks. These are review and nightly lanes rather than default CI
requirements.

The optional VMEC2000 generated-``mgrid`` comparison is present but xfailed
for now. VMEC2000 reads the generated grid and advances the trace locally, but
the current generated-``mgrid`` free-boundary parity gap is not bounded
tightly enough for a promoted gate.
The comparator now handles both main and Nyquist WOUT mode bases for low-order
geometry and magnetic-field arrays, so the remaining blocker is not
array-shape handling: sign-flipped diagnostic runs can produce a VMEC2000
WOUT, but that WOUT still has underconverged/zero geometric scalars and fails
the current iota/energy limits. The diagnostic reports this explicitly as
``vmec2000_wout_available=true`` but ``vmec2000_wout_promotable=false`` with
reason ``nonpositive_geometry_scalars``.

Dump-to-dump VMEC2000 comparisons require an instrumented executable that
honors the ``VMEC_DUMP_*`` environment variables. That is a validation task,
not a reason to regress the existing VMEC2000-parity ``mgrid`` fixtures.

Next Implementation Steps
-------------------------

- Bound active accepted-equilibrium sensitivity to direct coil parameters with
  realistic ESSOS/high-beta full-solve finite-difference checks, then promote
  the optional xfail.
- Replace the phase-1 coil-only optimization proxy with a Boozer/QS objective
  once the complete direct-coil free-boundary loop has validated gradients.
- Promote the VMEC2000 generated-``mgrid`` comparison after the direct/mgrid
  trace discrepancy is bounded.
- Replace the dense validation vacuum-adjoint primitive with the production
  matrix-free/custom-linear-solve NESTOR operator.
- Promote the accepted-boundary replay gate to a complete-loop free-boundary
  gradient once the host NumPy state bridge is removed or wrapped by a
  validated custom adjoint, then add Boozer/QS gradient checks.
- Run larger CPU/GPU benchmark matrices before making broad accelerator
  claims; keep the JSON summaries and documentation plots refreshed from those
  runs.

Literature Anchors
------------------

- Merkel's NESTOR integral-equation formulation converts the toroidal Neumann
  vacuum problem into Fourier-space linear equations with singular-kernel
  regularization, which is the operator we eventually need to expose as a
  differentiable JAX linear solve.
- Antonsen, Paul, and Landreman's VMEC adjoint work demonstrates the expected
  order-of-magnitude advantage of adjoints over direct finite differences for
  stellarator equilibrium sensitivities, including objectives such as
  quasisymmetry and magnetic well.
- DESC's JAX-based quasisymmetry optimization demonstrates the practical value
  of exact derivatives from a single equilibrium solution instead of a number
  of equilibrium solves that scales with design-space dimension.
- Recent automated adjoint work for spectral PDE solvers supports the same
  implementation principle: differentiate the assembled operator graph and use
  adjoint linear solves, rather than differentiating through each solver
  iteration.
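
The adjoint principle behind these anchors can be illustrated on a toy
parameter-dependent linear system. The sketch below is not the NESTOR
operator: the 2x2 matrix, right-hand side, and objective weights are made up
for illustration, and all function names are hypothetical. For ``A(p) x = b``
and ``J = c . x``, it performs one adjoint solve ``A^T lam = c``, evaluates
``dJ/dp = -lam . (dA/dp) x``, and compares the result with a central finite
difference.

.. code-block:: python

    def solve2(A, b):
        """Solve a 2x2 linear system A x = b by Cramer's rule."""
        (a11, a12), (a21, a22) = A
        det = a11 * a22 - a12 * a21
        return [(a22 * b[0] - a12 * b[1]) / det,
                (a11 * b[1] - a21 * b[0]) / det]


    def matrix(p):
        """Toy operator; only the (0, 0) entry depends on the parameter p."""
        return [[2.0 + p, 1.0], [0.5, 3.0]]


    B = [1.0, 2.0]    # fixed right-hand side
    C = [1.0, -1.0]   # objective weights, J = C . x


    def objective(p):
        x = solve2(matrix(p), B)
        return C[0] * x[0] + C[1] * x[1]


    def adjoint_gradient(p):
        A = matrix(p)
        x = solve2(A, B)                                 # one forward solve
        A_T = [[A[0][0], A[1][0]], [A[0][1], A[1][1]]]   # transpose of A
        lam = solve2(A_T, C)                             # one adjoint solve
        dA_dp_x = [x[0], 0.0]          # (dA/dp) x with dA/dp = [[1, 0], [0, 0]]
        return -(lam[0] * dA_dp_x[0] + lam[1] * dA_dp_x[1])


    def central_difference(p, h=1e-6):
        return (objective(p + h) - objective(p - h)) / (2.0 * h)


    p0 = 0.5
    print(adjoint_gradient(p0), central_difference(p0))

The adjoint route needs one extra linear solve regardless of how many
parameters enter ``A``, whereas the finite-difference route needs two
objective evaluations per parameter. The same comparison pattern underlies the
AD-versus-central-finite-difference gates listed under Validation Status.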