Optimisation with vmec_jax =========================== ``vmec_jax`` supports end-to-end differentiable optimisation of MHD equilibria through a discrete-adjoint Jacobian that is **exact** (no finite differences) and runs entirely inside a single Python process. This page covers: - the mathematical approach and how it differs from SIMSOPT + VMEC2000, - the key source files and public API, - the algorithms (Gauss-Newton, line search, adjoint replay), - how to reproduce the QA, QH, QP, and QI examples, - regeneration targets for ``max_mode=1..5`` optimisation sweeps and explicit interpretation of archived lower-mode snapshots. Current reproducible workflow ----------------------------- Editable workflow anatomy ~~~~~~~~~~~~~~~~~~~~~~~~~ The standalone QA/QH/QP scripts are meant to be read top to bottom. They keep the scientific problem visible instead of hiding it behind a large driver: 1. Define VMEC and stage controls with top-level variables, then instantiate ``vj.FixedBoundaryVMEC.from_input(...)``. 2. Instantiate objective objects and assemble explicit SIMSOPT-style ``objective_tuples = [(objective.J, target, weight), ...]``. 3. Build ``vj.LeastSquaresProblem.from_tuples(objective_tuples)``. 4. Call ``vj.least_squares_solve(...)`` with optimizer, continuation, ESS, device, and output controls only. 5. Unpack the returned ``FixedBoundaryOptimizationResult`` to save inputs, WOUT files, history JSON, diagnostics, and plots with explicit function calls. Do not pass physics shortcuts such as ``target_aspect`` or ``qi_options`` to ``least_squares_solve``. Physics targets live in the objective tuple list, and QI sampling/threshold choices live in ``QuasiIsodynamicOptions`` plus the QI objective objects. That separation is the main API hygiene rule: changing the science problem means editing the visible objective block; changing the optimizer means editing the solve-call keywords. Use the standalone scripts for editable single-case studies. The archived README/docs comparison rows are reproduced by the sweep drivers, which preserve the exact policy matrix and selected best-case provenance. .. list-table:: :header-rows: 1 :widths: 18 42 40 * - Target - Editable example - Notes * - QA - ``examples/optimization/QA_optimization.py`` - Minimal NFP=2 deck by default, aspect target 5, signed mean-iota target, QA residual; reference-family preseed is explicit and recorded. * - QH - ``examples/optimization/QH_optimization.py`` - Minimal NFP=4 deck by default, aspect target 5, ``abs(mean_iota) >= 0.41``, QH residual; warm starts are optional user edits. * - QP - ``examples/optimization/QP_optimization.py`` - Minimal NFP=2 deck by default, aspect target 5, ``abs(mean_iota) >= 0.41``, QP residual, mirror ratio, and elongation terms. The final optimized QP row is also the seed for the separate :doc:`piecewise_omnigenous_plan`. * - QI - ``examples/optimization/QI_optimization.py`` - Minimal NFP=2 deck by default with Boozer-space QI, mirror, elongation, QI ceiling, ESS, and lower-mode continuation. Optional target-helicity/reference-family proposals are deterministic basin searches, not hidden initial conditions. * - QI seed 3127 - ``examples/optimization/QI_optimization_seed.py`` - Compact reproducibility preset for the reviewed NFP=3 seed-3127 row. It delegates to ``QI_optimization.py`` after writing the boundary-reference preconditioner controls. When using sweep commands, keep backend selection and report labeling separate. ``--backend-label`` only names output directories and table rows; it does not select a JAX backend. Select the process backend with ``JAX_PLATFORMS=cpu``, ``JAX_PLATFORM_NAME=gpu``, or ``JAX_PLATFORMS=cuda``, and use ``--solver-device cpu`` / ``--solver-device gpu`` when the optimizer should force one device. Spawned workers inherit the parent backend unless ``--worker-jax-platforms`` is set. Verify reproducibility from ``case_result.json`` or generated summary fields ``jax_backend``, ``jax_device_kind``, ``solver_device``, and ``jax_platforms`` rather than from the output path alone. Existing successful ``case_result.json`` files are reused unless ``--rerun`` is passed. Minimal far-from-goal seed files are available for NFP=1, 2, 3, and 4: ``examples/data/input.minimal_seed_nfp1`` through ``examples/data/input.minimal_seed_nfp4``. They all come from ``vj.minimal_fixed_boundary_indata(nfp=...)`` and contain only ``RBC(0,0)``, ``RBC(0,1)``, and ``ZBS(0,1)`` as nonzero boundary coefficients. Use those inputs to test whether a QA/QH/QP/QI policy can build the target field structure from a common seed that does not already encode the target helicity. Keep those raw input decks unchanged. The standalone QA/QH/QP examples use ``vj.prepare_simple_omnigenity_seed_input(...)`` as an optimization-time preprocessor: it writes ``input.simple_seed`` in the run directory, preserves only ``RBC(0,0)``, ``RBC(0,1)``, and ``ZBS(0,1)`` from the selected seed deck, and fills active ``RBC/ZBS`` modes up to ``max_mode`` with deterministic ``1e-5`` perturbations. This keeps the published input files minimal while avoiding exactly-zero Jacobian columns in direct max-mode stress tests. The lower-level staged QI policy also has a mode-1 target-helicity preconditioner with the deterministic hint set ``RBC(1,0)``, ``ZBS(1,0)``, ``RBC(-1,1)``, ``ZBS(-1,1)``, ``RBC(1,1)``, and ``ZBS(1,1)`` in VMEC input-index convention. The common-minimal showcase uses a ``1e-3`` target-helicity hint by default; the raw input decks remain unchanged. The QA and QP common-minimal rows additionally use an explicit optimization-time reference-family preseed without modifying the raw input decks: QA blends the active low-order RBC/ZBS space 25% toward ``input.nfp2_QA_omnigenity``, and QP blends 10% toward ``input.nfp2_QI``. This gives the transform residual a usable derivative before local exact optimization and is recorded in ``showcase_case.json``. The bounded common-seed showcase runner maps those inputs to QI NFP=1/2/3/4, QA NFP=2/3, QH NFP=3/4, and QP NFP=2/3 for the full common-minimal target matrix (``qp_nfp1`` is also available as a stress row). The QI rows dispatch through ``examples/optimization/qi_staged_runner.py`` to the standalone ``QI_optimization.py`` staged/reference-family policy rather than the simpler quasisymmetry sweep path. The checked-in objective panel and summary table currently contain the synced aspect-5, ``max_mode=5`` QA/QH/QP GPU rows. The common-minimal QI rows remain open; the separate QI NFP panel documents reviewed case-gated QI coverage instead of the uniform common-minimal matrix: .. code-block:: bash PYTHONPATH=. JAX_PLATFORMS=cuda python3 examples/optimization/generate_minimal_seed_showcase.py \ --cases qa_nfp2,qa_nfp3,qh_nfp3,qh_nfp4,qp_nfp2,qp_nfp3,qi_nfp1,qi_nfp2,qi_nfp3,qi_nfp4 \ --backend-label gpu --solver-device gpu --worker-jax-platforms cuda \ --policy continuation --max-mode 5 --ess on \ --max-nfev 70 --continuation-nfev 20 \ --inner-max-iter 550 --inner-ftol 1e-10 \ --trial-max-iter 550 --trial-ftol 1e-10 \ --ess-alpha 1.2 --case-timeout-s 7200 --rerun PYTHONPATH=. python examples/optimization/render_minimal_seed_showcase.py --publication-matrix For a bounded one-case smoke render after a timeout or partial QI checkpoint, use: .. code-block:: bash PYTHONPATH=. python examples/optimization/render_minimal_seed_showcase.py \ --output-root /path/to/minimal_seed_showcase \ --figure-dir /path/to/figures \ --cases qi_nfp1 --skip-missing For a clean reproduction, keep ``--rerun`` on the generator. Replace the three CUDA/GPU flags with ``cpu`` for a slower local CPU-only reproduction. Without it, existing successful ``showcase_case.json`` rows are reused, which can leave old rows on disk; the renderer skips known-stale rows by default and ``--include-stale`` should be reserved for debugging. The current common-minimal QI dispatch uses policy-case names ``minimal_nfp1_qi``, ``minimal_nfp2_qi``, ``minimal_nfp3_qi``, and ``minimal_nfp4_qi``; the public aliases ``nfp1_qi`` through ``nfp4_qi`` now point to those same minimal-seed cases. Any older QI showcase rows under ``.../qi_nfp*/continuation/qp_preseed/...`` or explicitly named far-seed cases such as ``qi_stel_seed_3127`` predate this dispatch and should not be treated as current common-minimal staged-QI evidence. Any QA/QP showcase rows without ``reference_preseed`` provenance also predate the current common-seed policy. The saved objective and state panels below are compact status artifacts for the current common-minimal QA/QH/QP mode-5 rows. The companion CSV/table in :doc:`optimization_sweep_results` includes QA NFP=2/3, QH NFP=3/4, and QP NFP=2/3. Stress artifacts that terminate away from the aspect-5 target are not part of the public README/docs matrix. Current non-stale ``minimal_nfp1_qi``/``minimal_nfp2_qi``/ ``minimal_nfp3_qi``/``minimal_nfp4_qi`` outputs are still missing from the common-minimal summary; do not infer QI common-seed success from the separate case-gated QI coverage panel. .. image:: _static/figures/minimal_seed_showcase_objective_panel.png :width: 100% :align: center :alt: Common minimal-seed optimization stress-test histories The companion state panel is generated only for non-stale rows with available initial/final WOUT provenance. It shows the raw user-facing minimal seed, final LCFS, full best-so-far objective history, and initial/final Boozer ``|B|`` line contours. Optimization-time target-helicity or reference-family preseeds are tracked separately in the CSV via ``stage_seed_kind`` and ``stage_seed_input`` so a later stage input cannot be mistaken for the raw initial seed. Regenerate the state panel from the same result root before using geometry panels as promotion evidence. The current minimal-seed QI rows remain missing. .. image:: _static/figures/minimal_seed_showcase_state_panel.png :width: 100% :align: center :alt: Common minimal-seed initial/final state and Boozer contour stress-test panel The rendered table is available as :download:`minimal_seed_showcase_summary.csv <_static/figures/minimal_seed_showcase_summary.csv>`. Its provenance columns ``initial_kind``, ``initial_input``, ``initial_wout``, ``stage_seed_kind``, ``stage_seed_input``, and ``final_wout`` distinguish the raw seed, first optimization-time seed, and final artifact used by the plots. For generated comparison artifacts, run QA/QH/QP/QI with the current QI default of no same-mode QP preseed, then run the focused QI matrix separately when preseed/no-preseed is the experiment: .. code-block:: bash PYTHONPATH=. JAX_PLATFORMS=cpu python examples/optimization/generate_qs_ess_sweep.py --backend-label cpu --solver-device cpu --policy continuation --problems qa,qh,qp,qi --modes 1,2,3,4,5 --ess both --qi-qp-preseed off --max-nfev 70 --continuation-nfev 25 --inner-max-iter 180 --inner-ftol 1e-9 --trial-max-iter 180 --trial-ftol 1e-9 --rerun PYTHONPATH=. JAX_PLATFORMS=cpu python examples/optimization/generate_qs_ess_sweep.py --backend-label cpu --solver-device cpu --policy direct --problems qa,qh,qp,qi --modes 1,2,3,4,5 --ess both --qi-qp-preseed off --max-nfev 70 --continuation-nfev 25 --inner-max-iter 180 --inner-ftol 1e-9 --trial-max-iter 180 --trial-ftol 1e-9 --rerun PYTHONPATH=. python examples/optimization/render_qs_ess_publication_panel.py PYTHONPATH=. JAX_PLATFORMS=cpu python examples/optimization/generate_qs_ess_sweep.py --backend-label cpu --solver-device cpu --policy continuation --problems qi --modes 1,2,3,4,5 --ess both --qi-qp-preseed both --max-nfev 70 --continuation-nfev 25 --inner-max-iter 180 --inner-ftol 1e-9 --trial-max-iter 180 --trial-ftol 1e-9 --rerun PYTHONPATH=. JAX_PLATFORMS=cpu python examples/optimization/generate_qs_ess_sweep.py --backend-label cpu --solver-device cpu --policy direct --problems qi --modes 1,2,3,4,5 --ess both --qi-qp-preseed both --max-nfev 70 --continuation-nfev 25 --inner-max-iter 180 --inner-ftol 1e-9 --trial-max-iter 180 --trial-ftol 1e-9 --rerun PYTHONPATH=. python examples/optimization/render_qi_constrained_sweep.py For QI seed robustness, first rank solved seed candidates without optimizing, then edit the top-level ``INPUT_FILE``, ``OUTPUT_DIR``, seed-helper flags, and reference path in ``QI_optimization.py`` before running the standalone script: .. code-block:: bash PYTHONPATH=. python examples/optimization/audit_qi_seed_suitability.py --quick --csv results/qi_seed_audit.csv PYTHONPATH=. python examples/optimization/audit_qi_seed_suitability.py --quick --smooth-qi-max 5e-3 --legacy-qi-max 2e-3 --csv results/qi_seed3127_audit.csv PYTHONPATH=. python examples/optimization/audit_qi_seed_suitability.py --quick --prefine-probes plan --prefine-manifest results/qi_seed_audit/prefine_manifest.json --prefine-output-dir results/qi_seed_audit/prefine_probes PYTHONPATH=. JAX_PLATFORMS=cpu python examples/optimization/QI_seed_robustness.py PYTHONPATH=. JAX_PLATFORMS=cpu python examples/optimization/QI_optimization.py PYTHONPATH=. JAX_PLATFORMS=cpu python examples/optimization/QI_optimization_seed.py The second audit command uses the far-seed QI gate convention from ``QI_optimization.py``: legacy QI below ``2e-3`` and smooth differentiable QI below ``5e-3``. The explicit target-aspect and output-dir overrides above reproduce the archived mixed-target rows currently embedded in the docs QI coverage figure. Omit those overrides only when regenerating the current uniform aspect-5 policy. The docs QI coverage figure is rendered from existing reviewed ``QI_optimization.py`` outputs. These rows are archived mixed-target case checks, not additional aspect-5 README best-row promotions: the NFP=1 lane uses target aspect 10, the NFP=2 target-helicity lane uses target aspect 6, the seed-3127 lane uses target aspect 4, and the NFP=4 row starts from the three-coefficient minimal seed with a same-NFP finite-beta QI reference-family preconditioner. It is not a completed common-minimal QI showcase row. Regenerate all rows with the current uniform aspect-5 policy before using this figure as current README promotion evidence. The finite-beta NFP=4 input is kept as a separate stress fixture, not as the README initial state: .. list-table:: :header-rows: 1 :widths: 20 25 10 10 10 9 8 8 8 8 10 * - Input - Output/provenance - Final J - Smooth QI - Legacy QI - Mirror - Elong. - Aspect - Iota - CPU min - Status * - ``examples/data/input.nfp1_QI`` - ``docs/_static/qi_readme_cases/nfp1`` - ``1.56e-2`` - ``1.48e-3`` - ``7.75e-4`` - ``0.242/0.30`` - ``6.24/8.2`` - ``9.999/10.0`` - ``0.5369`` - ``15.8`` - ``case-gated`` * - ``examples/data/input.minimal_seed_nfp2_target_helicity`` - ``docs/_static/qi_readme_cases/nfp2_target_helicity`` - ``1.61e-2`` - ``1.52e-3`` - ``5.25e-4`` - ``0.240/0.30`` - ``6.51/8.2`` - ``6.006/6.0`` - ``-0.5994`` - ``28.7`` - ``case-gated`` * - ``examples/data/input.QI_stel_seed_3127`` - ``docs/_static/qi_readme_cases/nfp3_seed3127`` - ``9.33e-2`` - ``4.11e-3`` - ``1.01e-3`` - ``0.304/0.35`` - ``4.00/8.0`` - ``3.541/4.0`` - ``-1.0401`` - ``4.6`` - ``case-gated`` * - ``examples/data/input.minimal_seed_nfp4`` - ``docs/_static/qi_readme_cases/nfp4_minimal`` - ``2.52e-2`` - ``2.56e-3`` - ``2.54e-4`` - ``0.287/0.35`` - ``4.14/8.2`` - ``6.011/6.0`` - ``-1.2930`` - ``0.4`` - ``case-gated`` .. image:: _static/figures/readme_qi_optimization_cases.png :width: 100% :align: center :alt: QI optimization coverage for reviewed NFP=1, NFP=2, NFP=3, and NFP=4 inputs The initial and final Boozer ``|B|`` panels in that figure use line contours only. The staged objective panel concatenates every recorded history file and plots the best-so-far value in each stage, normalized to that stage's first objective, with dashed separators where objective definitions or weights change. For the seed-3127 lane, the inset is a boundary-reference interpolation scan, not an optimizer trajectory. Regenerate the figure and CSV without launching new optimization jobs with: .. code-block:: bash PYTHONPATH=. python examples/optimization/render_qi_readme_cases.py The renderer reads reviewed local or release-asset WOUT files plus tracked diagnostics, preconditioner summaries, and raw per-stage ``history.json`` files. Large WOUTs are intentionally ignored by git to keep clones small, so fetch or regenerate the corresponding WOUT assets before rerendering from a clean clone. The plotted history uses best-so-far stage normalization for readability; inspect the raw ``history.json`` files when auditing non-monotone or stalled optimizer stages. The tracked preconditioner summaries intentionally keep only scalar scan metrics and selected-``lambda`` flags, not local scratch ``input`` or ``wout`` paths from the original optimization machine. The robustness probe is intentionally a QI+iota basin test, not a full engineering acceptance claim. ``QI_optimization.py`` is now the single staged driver for the promoted path: it can run a low-QI basin search, an iota ramp with a QI ceiling, and guarded mirror/elongation cleanup, with exact independent diagnostics deciding which stages are promoted. For far seeds it can also run a bounded basin prefilter over ESS-scaled boundary jumps before local optimization. Far-seed stages can use a cheaper QI grid for the optimization and a higher-resolution final audit recorded in ``diagnostics.json``. Promote a QI result only after the independent smooth-QI, legacy-QI, iota, mirror-ratio, elongation, and Boozer ``|B|`` contour checks agree. Landscape and basin-survey diagnostics use fast trial solves unless ``--exact-solve`` is passed; use exact solves before treating their scalar values as promotion evidence. The archived NFP=4 QI coverage row starts from ``examples/data/input.minimal_seed_nfp4``, whose boundary contains only ``RBC(0,0)``, ``RBC(0,1)``, and ``ZBS(0,1)``. The checked-in passing coverage row uses a same-NFP finite-beta QI reference-family proposal as a deterministic basin-capture step, then runs a bounded local QI audit/refine stage. This is a case-gated NFP=4 result, not a non-stale common-minimal showcase row and not yet proof that arbitrary NFP=4 seeds converge. The archived labels ``nfp3_qi``, ``nfp4_qi_finite_beta``, and ``nfp4_qh_warm_to_qi`` remain in the sweep/case registry for rendering and stress testing; the finite-beta NFP=4 entry is a finite-beta NFP=4 stress fixture, and ``nfp4_qh_warm_to_qi`` remains an explicit non-passing stress fixture for QH-to-QI conversion. Motivation: differentiability without finite differences --------------------------------------------------------- SIMSOPT's canonical fixed-boundary QH workflow calls VMEC2000 as a black-box subprocess and builds the Jacobian column by column using finite differences:: # SIMSOPT / VMEC2000 approach for i in range(n_params): perturb boundary DOF i by ε run VMEC2000 subprocess # one full solve per column J[:, i] = (f(x+ε·eᵢ) - f(x)) / ε For ``n_params = 24`` boundary DOFs this is 24 extra forward solves per Jacobian evaluation — expensive, and the result is only accurate to ``O(√ε_machine)`` due to finite-difference cancellation. ``vmec_jax`` uses a **discrete-adjoint replay** instead: 1. Run one "exact" forward solve with the adjoint flag, storing a compressed checkpoint tape of the iteration trajectory. 2. Propagate tangent vectors through the tape via batched JVP (``jax.vmap(jax.jvp(...))``, visiting each recorded iteration step in forward order exactly once per tangent batch. The Jacobian is thereby computed to **machine precision** in a time roughly equal to ``1 + fraction-of-solve × n_params`` forward-solve equivalents, rather than ``n_params`` full forward solves. In practice, for ``n_params ≤ 24`` the Jacobian build takes roughly 0.5–1.5× the cost of a single tight solve — comparable to SIMSOPT + finite differences with ``n_params = 1`` but covering the full parameter space at once. → See :doc:`discrete_adjoint` for a full mathematical description. Comparison with SIMSOPT ------------------------ .. list-table:: :header-rows: 1 :widths: 35 32 33 * - Feature - vmec_jax - SIMSOPT + VMEC2000 * - Jacobian method - Exact discrete-adjoint replay - Finite differences (columnar) * - Jacobian cost - ≈ 1.5 × forward solve - n_params × forward solve * - Subprocess required - No — pure Python/JAX - Yes — VMEC2000 Fortran binary * - Accuracy - Machine-precision (exact) - O(√ε\_machine) FD error * - GPU support - Yes (JAX device) - No * - Differentiable through optimizer - Yes (JAX autodiff) - No Current wall times and objective values are generated by ``examples/optimization/generate_qs_ess_sweep.py`` and rendered in the policy sweep below. The docs avoid fixed one-off numbers in this comparison table so that benchmark values do not drift from the generated CSV/JSON artifacts. → See :doc:`simsopt_comparison` for a detailed runtime, memory, and algorithm comparison. SIMSOPT-style objective tuples ------------------------------ The recommended public workflow mirrors the SIMSOPT ``LeastSquaresProblem`` style: create objective objects, then pass explicit ``(objective_function, target, weight)`` tuples to ``vj.LeastSquaresProblem.from_tuples``. The objective function should be the ``.J`` method on the objective object, or any callable with signature ``(ctx, state)`` returning a scalar or vector. .. code-block:: python vmec = vj.FixedBoundaryVMEC.from_input( INPUT_FILE, max_mode=MAX_MODE, min_vmec_mode=MIN_VMEC_MODE, output_dir=OUTPUT_DIR, ) aspect = vj.AspectRatio() iota_floor = vj.AbsMeanIotaFloor(TARGET_ABS_IOTA_MIN) qs = vj.QuasisymmetryRatioResidual( helicity_m=HELICITY_M, helicity_n=HELICITY_N, surfaces=SURFACES, ) objective_tuples = [ (aspect.J, TARGET_ASPECT, ASPECT_WEIGHT), (iota_floor.J, 0.0, IOTA_FLOOR_WEIGHT), (qs.J, 0.0, QS_WEIGHT), ] problem = vj.LeastSquaresProblem.from_tuples(objective_tuples) The tuple weight follows SIMSOPT semantics. ``vmec_jax`` minimizes the least-squares residual .. code-block:: text sqrt(weight) * (objective(ctx, state) - target) for each scalar entry returned by the objective. Do not pre-apply ``sqrt(weight)`` inside the callback. Put physics targets and regularization strengths in the tuple list, and put optimizer controls such as continuation stages, ESS scaling, tolerances, device selection, and output directories in ``least_squares_solve``. Finite-beta physics metrics use the same tuple form. For example, a smooth magnetic-well floor and a Mercier-stability floor can be added alongside QS or QI terms without changing the optimizer setup: .. code-block:: python well = vj.MagneticWell(minimum=0.0, softness=1.0e-3) dmerc = vj.DMerc(minimum=0.0, softness=1.0e-3) glasser = vj.GlasserResistiveInterchange(maximum=0.0, softness=1.0e-3) objective_tuples = [ (aspect.J, TARGET_ASPECT, ASPECT_WEIGHT), (iota_floor.J, 0.0, IOTA_FLOOR_WEIGHT), (qs.J, 0.0, QS_WEIGHT), (well.J, 0.0, MAGNETIC_WELL_WEIGHT), (dmerc.J, 0.0, DMERC_WEIGHT), (glasser.J, 0.0, GLASSER_WEIGHT), ] problem = vj.LeastSquaresProblem.from_tuples(objective_tuples) ``MagneticWell`` evaluates the VMEC/SIMSOPT magnetic-well proxy from the state-derived half-mesh volume derivative, also available directly as ``vj.magnetic_well_from_state(...)``. ``DMerc`` uses the differentiable state-level Mercier/JXBFORCE path and returns one smooth lower-bound residual per interior full-mesh surface. ``GlasserResistiveInterchange`` uses the same profile reductions to penalize ``D_R > maximum`` for the resistive Glasser-Greene-Johnson necessary condition ``D_R <= 0``. Since ``D_R`` has a ``1 / shear**2`` factor, this criterion is physically valid only on nonzero magnetic-shear surfaces; inspect ``glasser_shear_valid`` diagnostics before interpreting axis-adjacent or zero-shear values. A positive ``shear_epsilon`` is only an optimization regularization. These stability objectives encode their thresholds internally, so their tuple target must be ``0.0``. See :doc:`jxbforce_mercier` and :doc:`references` [10-11] for the normalization and references. QI objectives use the same tuple syntax, but the QI field-quality terms are routed through the Boozer/QI problem path so they can share one Boozer transform. QI tuple targets must be ``0.0``; encode thresholds and smoothing inside the objective object, for example ``QuasiIsodynamicOptions``. ``VMECMirrorRatio(threshold=...)`` and ``MaxElongation(threshold=...)`` are ordinary solved-VMEC-state objectives, so they can be added to QA/QH/QP problems without depending on QI options or running Boozer at every callback. ``MirrorRatio(threshold=...)`` evaluates the same mirror scalar from Boozer ``|B|`` modes and is useful when a QI solve already shares one Boozer field or when final Boozer diagnostics need exact consistency with the plotted spectra. Use ``include_bounce_endpoints=True`` when you want the smooth QI residual to sample the same normalized bounce-level endpoints as the legacy Goodman-style branch-shuffle diagnostic. .. code-block:: python qi_options = vj.QuasiIsodynamicOptions( surfaces=np.linspace(0.1, 1.0, 6), mboz=18, nboz=18, nphi=151, nalpha=31, n_bounce=51, include_bounce_endpoints=True, branch_width_weight=0.5, profile_weight=0.1, shuffle_profile_weight=1.0, jit_booz=True, # default; faster in current CPU/GPU QI diagnostics # Optional, closer to legacy arr_out=True but more expensive: # shuffle_profile_nphi_out=501, ) qi = vj.QuasiIsodynamicResidual(qi_options) mirror = vj.VMECMirrorRatio( threshold=0.21, surfaces=qi_options.surfaces, ntheta=96, nphi=96, surface_index=None, # all selected surfaces smooth_extrema=2.0e-2, # smoother gradients than hard max/min smooth_penalty=2.0e-2, ) qi_ceiling = vj.QuasiIsodynamicResidualCeiling( maximum=2.0e-3, smooth_penalty=2.0e-3, qi_options=qi_options, ) elongation = vj.MaxElongation( threshold=8.0, ntheta=48, nphi=16, ) objective_tuples = [ (vj.AspectRatio().J, 5.0, 1.0), (vj.AbsMeanIotaFloor(0.41).J, 0.0, 200.0**2), (qi.J, 0.0, 1.0), (qi_ceiling.J, 0.0, 100.0), (mirror.J, 0.0, 10.0), (elongation.J, 0.0, 10.0), ] qi_problem = vj.LeastSquaresProblem.from_tuples(objective_tuples) The staged QI example builds an explicit ``vj.make_qi_optimization_context(...)`` from the same top-level variables used to assemble the objective tuple list. That context is passed to seed and checkpoint helpers so the driver remains editable while shared implementation details stay in ``vmec_jax.qi_optimization``. For QI cleanup work, rank solved candidates with the no-solve component report before promoting any scalar objective result. Mirror-ratio cleanup must be guarded by a QI residual ceiling or by an independent engineering promotion gate. Endpoints that improve mirror but fail the independent QI gate are rejected rather than promoted. The report records the QI+iota gate separately from the full engineering gate and sorts mirror-cleanup candidates ahead of low-mirror states that already failed smooth or legacy QI: .. code-block:: bash PYTHONPATH=. JAX_PLATFORMS=cpu python tools/diagnostics/qi_objective_component_report.py \ --output results/diagnostics/qi_mirror_cleanup_rank.json \ --include-bounce-endpoints --mboz 18 --nboz 18 --nphi 151 --nalpha 31 \ --n-bounce 51 \ --case current:results/qi_mirror_probe/input_nfp2_qi_branch5_mode3_nfev12/input.final:results/qi_mirror_probe/input_nfp2_qi_branch5_mode3_nfev12/wout_final.nc \ --case cleanup:results/qi_mirror_probe/input_nfp2_qi_light_mirror20_matrixfree_nfev10/input.final:results/qi_mirror_probe/input_nfp2_qi_light_mirror20_matrixfree_nfev10/wout_final.nc Add ``--branch-width-weight 5.0`` only when reproducing the branch-heavy optimization objective components rather than the independent promotion gate. Quasi-helical symmetry example -------------------------------- The ``examples/optimization/QH_optimization.py`` script replicates the SIMSOPT QH benchmark entirely within ``vmec_jax``. It has no argparse — all parameters are top-level variables, and the objective is built explicitly as a small list. After the objective definition, the script shows the same setup-and-solve flow used by the QA/QP/QI examples: .. code-block:: python INPUT_FILE = DATA_DIR / "input.nfp4_QH_warm_start" MAX_MODE = 5 MIN_VMEC_MODE = MAX_MODE + 2 SIMPLE_SEED_PERTURBATION = 1.0e-5 MAX_NFEV = 70 METHOD = "scipy" # also: "auto", "auto_scalar", "gauss_newton", "scipy_matrix_free", "lbfgs_adjoint", "scalar_trust" SCIPY_TR_SOLVER = "lsmr" # also: "exact" for small dense trust-region solves SOLVER_DEVICE = None # set to "cpu" or "gpu" to force one backend SAVE_STAGE_INPUTS = True # keep per-stage input decks SAVE_STAGE_WOUTS = False # set True to write per-stage WOUT files HELICITY_M = 1 HELICITY_N = -1 TARGET_ASPECT = 5.0 TARGET_ABS_IOTA_MIN = 0.41 SURFACES = np.arange(0.0, 1.01, 0.1) vmec = vj.FixedBoundaryVMEC.from_input( INPUT_FILE, max_mode=MAX_MODE, min_vmec_mode=MIN_VMEC_MODE, output_dir=OUTPUT_DIR, ) STAGE_MODES = vj.qs_stage_modes( max_mode=MAX_MODE, use_mode_continuation=USE_MODE_CONTINUATION, continuation_nfev=CONTINUATION_NFEV, ) aspect = vj.AspectRatio() iota_floor = vj.AbsMeanIotaFloor(TARGET_ABS_IOTA_MIN) qs = vj.QuasisymmetryRatioResidual( helicity_m=HELICITY_M, helicity_n=HELICITY_N, surfaces=SURFACES, ) objective_tuples = [ (aspect.J, TARGET_ASPECT, ASPECT_WEIGHT), (iota_floor.J, 0.0, IOTA_WEIGHT), (qs.J, 0.0, QS_WEIGHT), # Optional physics terms: # (vj.LgradB(threshold=0.30, smooth_penalty=1.0e-3).J, 0.0, 0.01), # (vj.MagneticWell(minimum=0.0).J, 0.0, 1.0), # (vj.VolavgB().J, TARGET_VOLAVGB, VOLAVGB_WEIGHT), # (vj.BetaTotal().J, TARGET_BETA, BETA_WEIGHT), # (vj.DMerc(minimum=0.0, softness=1.0e-3).J, 0.0, DMERC_WEIGHT), # (vj.GlasserResistiveInterchange(maximum=0.0, softness=1.0e-3).J, 0.0, GLASSER_WEIGHT), # (vj.JDotB(surfaces=(0.25, 0.50, 0.75)).J, 0.0, JDOTB_WEIGHT), # (vj.BDotB(surfaces=(0.25, 0.50, 0.75)).J, TARGET_BDOTB, BDOTB_WEIGHT), # (vj.BDotGradV(surfaces=(0.25, 0.50, 0.75)).J, TARGET_BDOTGRADV, BDOTGRADV_WEIGHT), # (vj.ToroidalCurrent(surfaces=(0.25, 0.50, 0.75)).J, TARGET_TORCUR, TORCUR_WEIGHT), # (vj.ToroidalCurrentGradient(surfaces=(0.25, 0.50, 0.75)).J, TARGET_TORCUR_PRIME, TORCUR_PRIME_WEIGHT), # (vj.BVector(s_index=-1).J, TARGET_B_VECTOR, B_VECTOR_WEIGHT), # (vj.JVector(surfaces=(0.25, 0.50, 0.75)).J, TARGET_J_VECTOR, J_VECTOR_WEIGHT), ] problem = vj.LeastSquaresProblem.from_tuples(objective_tuples) result = vj.least_squares_solve( vmec, problem, stage_modes=STAGE_MODES, max_nfev=MAX_NFEV, continuation_nfev=CONTINUATION_NFEV, method=METHOD, ftol=FTOL, gtol=GTOL, xtol=XTOL, use_ess=USE_ESS, ess_alpha=ALPHA, label=LABEL, use_mode_continuation=USE_MODE_CONTINUATION, inner_max_iter=INNER_MAX_ITER, inner_ftol=INNER_FTOL, trial_max_iter=TRIAL_MAX_ITER, trial_ftol=TRIAL_FTOL, solver_device=SOLVER_DEVICE, scipy_tr_solver=SCIPY_TR_SOLVER, save_stage_inputs=SAVE_STAGE_INPUTS, save_stage_wouts=SAVE_STAGE_WOUTS, save_final_outputs=False, ) history = result.history objective_history = result.objective_history timing = result.timing_summary saved_paths = vj.save_optimization_result(result, output_dir=OUTPUT_DIR) print(f"Final aspect ratio: {history['aspect_final']:.6g}") print(f"Final mean iota: {history['iota_final']:.6g}") print(f"Final field objective: {history['qs_final']:.6e}") print(f"Wall time: {timing['total_wall_time_s']:.2f} s") print(f"Recent objectives: {objective_history[-3:]}") wout_final = vj.load_wout(saved_paths.final_wout) theta, zeta, b_lcfs = vj.vmecplot2_bmag_grid( wout_final, s_index=-1, ntheta=64, nzeta=64, zeta_max=2.0 * np.pi / float(wout_final.nfp), ) print(f"LCFS |B| grid shape: {b_lcfs.shape}, Bmax={np.max(b_lcfs):.6g}") plot_paths = { "boundary_comparison": vj.plot_3d_boundary_comparison( saved_paths.initial_wout, saved_paths.final_wout, outdir=OUTPUT_DIR, ), "boozer_lcfs_bmag_contours": vj.plot_boozer_lcfs_bmag_comparison( saved_paths.initial_wout, saved_paths.final_wout, outdir=OUTPUT_DIR, ), "objective_history": vj.plot_objective_history( saved_paths.history, outdir=OUTPUT_DIR, ), } print(plot_paths) The helper above only writes the standard artifacts. If you need custom filenames, call ``result.initial_optimizer.save_input``, ``result.initial_optimizer.save_wout``, ``result.final_optimizer.save_input``, ``result.final_optimizer.save_wout``, and ``result.final_optimizer.save_history`` directly. Objective callbacks receive ``(ctx, state)`` and may return a scalar or vector. For continuation runs, use ``result.initial_stage`` and ``result.final_stage`` for the common endpoints, and ``result.stage_histories`` or ``result.stage_timing_summaries`` for per-stage accepted exact-replay history and timing. The raw records remain available as ``result.stage_records`` for custom inspection, but user scripts do not need to unpack them just to save, plot, or report final outputs. Set ``save_final_outputs=False`` when demonstrating explicit artifact writes as above; omit it when the default ``input.final``, ``wout_final.nc``, and ``history.json`` filenames are sufficient. ``result.final_params`` and ``result.final_state`` refer to the selected exact accepted point, so saved final artifacts do not come from an unreplayed relaxed trial point. The tuple weight follows SIMSOPT semantics: vmec_jax minimizes ``sqrt(weight) * (J - target)``. Problem-specific targets and QI sampling options live on the objective objects/problem, while ``least_squares_solve`` only receives optimizer, continuation, device, and output controls. Run it with: .. code-block:: bash python examples/optimization/QH_optimization.py Recommended standalone scripts ------------------------------ The individual scripts are workflow examples, not the canonical benchmark tables. They intentionally expose all important controls as top-level variables, so users can edit the file exactly as in the SIMSOPT examples: VMEC resolution, active boundary modes, objective terms, weights, ESS scaling, continuation policy, and optimizer options. Use these four scripts as the current starting points for fixed-boundary optimization: .. list-table:: :header-rows: 1 :widths: 18 28 54 * - Target - Script - Default objective tuple set * - QA - ``examples/optimization/QA_optimization.py`` - ``AspectRatio``, signed ``MeanIota`` target, and ``QuasisymmetryRatioResidual(helicity_m=1, helicity_n=0)``. This is the recommended public QA example. * - QH - ``examples/optimization/QH_optimization.py`` - ``AspectRatio``, ``AbsMeanIotaFloor``, and ``QuasisymmetryRatioResidual(helicity_m=1, helicity_n=-1)`` on the NFP=4 warm start. * - QP - ``examples/optimization/QP_optimization.py`` - ``AspectRatio``, ``AbsMeanIotaFloor``, and ``QuasisymmetryRatioResidual(helicity_m=0, helicity_n=-1)`` from the bundled NFP=2 QI seed. * - QI - ``examples/optimization/QI_optimization.py`` - ``AspectRatio``, ``AbsMeanIotaFloor``, ``QuasiIsodynamicResidual``, ``MirrorRatio``, and ``MaxElongation`` with lower-mode continuation and ESS. The default ``STAGE_MODE_POLICY = "lower-repeat"`` repeats each spectral rung for far circular seeds; ``"lower"`` is shorter, and ``"repeat"`` repeats only the final mode for near-basin cleanup. .. code-block:: bash PYTHONPATH=. JAX_PLATFORMS=cpu python examples/optimization/QA_optimization.py PYTHONPATH=. JAX_PLATFORMS=cpu python examples/optimization/QH_optimization.py PYTHONPATH=. JAX_PLATFORMS=cpu python examples/optimization/QP_optimization.py PYTHONPATH=. JAX_PLATFORMS=cpu python examples/optimization/QI_optimization.py Those scripts write ``input.initial``, ``input.final``, ``wout_initial.nc``, ``wout_final.nc``, ``history.json``, and per-case diagnostic plots. The checked-in README QA/QH/QP rows are archived lower-mode rows from reviewed ``qs_ess_sweep`` outputs; the archived README QI row is selected from a standalone ``QI_optimization.py`` lane until the current aspect-5 constrained-QI sweep has complete passing rows. Compact panels include the source-initial LCFS, final LCFS, objective history, and initial/final Boozer :math:`|B|` line contours. The common-minimal seed inputs are intentionally close to circular tori and can sit on a zero-transform branch. The QA/QH/QP examples now use ``SIMPLE_SEED_PERTURBATION = 1e-5`` as a tiny derivative seed. QP is more sensitive to direct high-mode starts, so it uses the same tiny perturbation but keeps a lower-mode continuation sequence from ``vj.qs_stage_modes(...)`` before the final high-mode cleanup. This avoids the nonphysical zero-iota branch that can otherwise produce enormous initial QP residuals. These are script-level optimization-policy choices: change them directly when studying direct high-mode starts or alternate seeds. Full QA/QH/QP/QI policy sweep ----------------------------- The sweep below compares four target objectives: - QA: the reference omnigenity NFP=2 QA deck, aspect ratio target 5, signed mean iota target 0.42, and quasi-axisymmetry. - QH: the bundled NFP=4 warm start, aspect ratio target 5, quasi-helical symmetry, and a smooth ``abs(mean_iota) >= 0.41`` lower bound. - QP: aspect ratio target 5, quasi-poloidal symmetry, and a smooth ``abs(mean_iota) >= 0.41`` lower bound, using the same bundled NFP=2 seed as the QI runs. - QI: aspect ratio target 5, a differentiable smooth Boozer-space quasi-isodynamic residual evaluated through ``booz_xform_jax``, maximum mirror-ratio penalty, maximum-LCFS-elongation penalty, and the same smooth ``abs(mean_iota) >= 0.41`` lower bound. ``LgradB`` is available as an optional commented term in the example scripts. The current objective priority is primary symmetry/QI quality and rotational transform control first. The active QA/QH/QP/QI regeneration target uses aspect ratio 5, with case-specific staged QI diagnostics available for harder far seeds. QA also uses the signed iota-0.42 target, while QH/QP/QI use ``abs(mean_iota) >= 0.41``. ``LgradB`` remains available for users who want extra magnetic-gradient regularization, but it is not active in the default sweeps or best-row selection. When enabling ``LgradB`` in an adjoint optimization, use a small ``smooth_penalty`` so the softplus penalty remains differentiable near the threshold. Each problem is run with staged mode continuation and with direct-start mode expansion. Each policy is run with and without ESS using ``alpha = 1.2``, matching the gentler scaling used by the local omnigenity optimization workflows. For QI, the focused constrained sweep additionally compares a same-mode QP preseed against the public minimal NFP=2 start. This QP preseed is useful as a controlled basin proposal, but its scalar objective is different from the QI refinement objective and is therefore not stitched into the plotted QI history. The CLI default is ``--qi-qp-preseed off``; use ``--qi-qp-preseed on`` or ``--qi-qp-preseed both`` to run the QP-preseed diagnostic rows. The public minimal seed contains only axisymmetric elongation, so QI stages first add deterministic active-mode hints and then project the boundary to the requested active space: ``max(abs(m), abs(n)) <= max_mode``. When enabled, the QP preseed is followed by a final refinement with the full QI + mirror-ratio + elongation objective. ``LgradB`` remains an optional commented objective term in the scripts, but it is not active in the default sweeps. The previous QI-only preseed and profile-locking penalty are disabled by default because the local reference QI workflow optimizes bounce/well-width label dependence directly, plus mirror, elongation, and magnetic-gradient scale-length penalties, rather than forcing a prescribed Boozer ``|B|`` profile. This keeps QP as an explicit optional experiment while still measuring whether the preseed helps or hurts the constrained QI solve. ``QI_OPTIONS.phimin`` selects the beginning of the one-field-period well interval used by the smooth QI residual. The bundled NFP=2 seed uses the default ``0.0``; set it to ``np.pi / nfp`` when comparing against a reference configuration whose well starts one half-period later. Columns in archived checked-in panels may correspond to older ``max_mode = 1, 2, 3`` sweep rows. The current regeneration target extends through ``max_mode=5``. The vertical dotted lines mark continuation stage boundaries. QA/QH/QP continuation uses the repeated omnigenity-style policy ``[1, 1, 2, 2, 2]`` for ``max_mode=2`` and ``[1, 1, 2, 2, 2, 3, 3, 3]`` for ``max_mode=3``; higher modes follow the same repeated-stage pattern when continuation is enabled. QI defaults to ``STAGE_MODE_POLICY = "lower-repeat"`` for seed robustness; same-mode repeat remains available for near-basin cleanup via ``STAGE_MODE_POLICY = "repeat"`` or the corresponding CLI flag. Mode-4 and mode-5 rows should be promoted only after matching reviewed CSV/JSON rows and figures are regenerated. When the full matrix is regenerated, the objective panels contain the CPU/GPU policy sweep. Solid curves met the optimizer success criterion; dashed curves are stopped, failed, or budgeted lanes. The summary tables identify whether a dashed lane reached ``max_nfev``, hit the per-case timeout, or failed earlier such as from GPU OOM. Curves are split by objective stage and plotted as best-so-far values within that stage, so QP preseed and full constrained QI refinement are not treated as one continuous scalar objective. The QA input follows the omnigenity ``input.nfp2_QA_omnigenity`` deck and carries nonzero mode-1 boundary terms so the iota residual has a useful derivative. Direct QA without ESS remains a weak policy for high direct-start modes; staged continuation is the default research-grade policy for QA. The large all-policy panels, atlases, summary-table images, and PDF snapshots are generated assets. Recreate them from the sweep results with the commands below. The README keeps only the compact best-result PNGs; full publication sweeps belong in :doc:`optimization_sweep_results` as local result trees or release assets. Copy only compact, reviewed, documentation-critical artifacts into ``docs/_static/figures``. Keep checked-in PNG/JPEG figures compressed below the docs hygiene gate (currently 2 MiB each); larger atlases, PDFs, and full-sweep panels should stay in ignored result directories or release assets until they are explicitly reviewed and compacted. .. code-block:: bash PYTHONPATH=. JAX_PLATFORMS=cpu python examples/optimization/generate_qs_ess_sweep.py --backend-label cpu --solver-device cpu --policy continuation --problems qa,qh,qp,qi --modes 1,2,3,4,5 --ess both --qi-qp-preseed off --max-nfev 70 --continuation-nfev 25 --inner-max-iter 180 --inner-ftol 1e-9 --trial-max-iter 180 --trial-ftol 1e-9 --rerun PYTHONPATH=. JAX_PLATFORMS=cpu python examples/optimization/generate_qs_ess_sweep.py --backend-label cpu --solver-device cpu --policy direct --problems qa,qh,qp,qi --modes 1,2,3,4,5 --ess both --qi-qp-preseed off --max-nfev 70 --continuation-nfev 25 --inner-max-iter 180 --inner-ftol 1e-9 --trial-max-iter 180 --trial-ftol 1e-9 --rerun PYTHONPATH=. JAX_PLATFORM_NAME=gpu python examples/optimization/generate_qs_ess_sweep.py --backend-label gpu --solver-device gpu --policy continuation --problems qa,qh,qp,qi --modes 1,2,3,4,5 --ess both --qi-qp-preseed off --max-nfev 70 --continuation-nfev 25 --inner-max-iter 180 --inner-ftol 1e-9 --trial-max-iter 180 --trial-ftol 1e-9 --rerun PYTHONPATH=. JAX_PLATFORM_NAME=gpu python examples/optimization/generate_qs_ess_sweep.py --backend-label gpu --solver-device gpu --policy direct --problems qa,qh,qp,qi --modes 1,2,3,4,5 --ess both --qi-qp-preseed off --max-nfev 70 --continuation-nfev 25 --inner-max-iter 180 --inner-ftol 1e-9 --trial-max-iter 180 --trial-ftol 1e-9 --rerun PYTHONPATH=. python examples/optimization/render_qs_ess_publication_panel.py Run the constrained QI preseed/no-preseed matrix explicitly with: .. code-block:: bash PYTHONPATH=. JAX_PLATFORMS=cpu python examples/optimization/generate_qs_ess_sweep.py --backend-label cpu --solver-device cpu --policy continuation --problems qi --modes 1,2,3,4,5 --ess both --qi-qp-preseed both --max-nfev 70 --continuation-nfev 25 --inner-max-iter 180 --inner-ftol 1e-9 --trial-max-iter 180 --trial-ftol 1e-9 --rerun PYTHONPATH=. JAX_PLATFORMS=cpu python examples/optimization/generate_qs_ess_sweep.py --backend-label cpu --solver-device cpu --policy direct --problems qi --modes 1,2,3,4,5 --ess both --qi-qp-preseed both --max-nfev 70 --continuation-nfev 25 --inner-max-iter 180 --inner-ftol 1e-9 --trial-max-iter 180 --trial-ftol 1e-9 --rerun PYTHONPATH=. python examples/optimization/render_qi_constrained_sweep.py The default per-case timeout is ``1800 s``. The current omnigenity-style science configs use ``inner_max_iter = trial_max_iter = 120`` and ``ftol = trial_ftol = 1e-9``. GPU production sweeps cap those values at 180 if a future config requests a larger replay budget, so high-mode/LASYM cases are bounded without silently switching to diagnostic budgets. Add ``--diagnostic-budgets`` only for bounded quick-look GPU diagnostics, and use ``--case-timeout-s 0`` only for an unbounded local diagnostic run. Timed-out sweep workers are started in a private process group when supported; the parent terminates that group, including solver or GPU descendant processes, before writing the timeout result. The continuation runner also writes bounded ``stage_checkpoint.json`` records: a pending checkpoint before a long QI stage, a pre-diagnostic checkpoint after the solve returns, and a promotion checkpoint after exact QI diagnostics. The standalone QI runner materializes root-level ``input.initial`` and ``input.final`` files for each completed stage before advancing, so the next stage always starts from the exact accepted point rather than an unreplayed trial or a nested continuation artifact. If a later high-mode/QI stage times out, the final ``case_result.json`` is annotated from the latest checkpoint so the CSV still preserves the last accepted objective, aspect, iota, QI, mirror, elongation, timing, and stage label. These partial rows are useful diagnostics, but they are not promoted as README best rows unless their independent physics gates pass. For checkpoint/profiling probes, the QI example accepts ``--qi-mboz``, ``--qi-nboz``, ``--qi-nphi``, ``--qi-nalpha``, and ``--qi-n-bounce`` overrides; production scripts should usually leave the top-level ``OPT_QI_RESOLUTION`` defaults in place. To recreate one row, restrict ``--policy`` and ``--problems``. For example, this reruns the checked-in archived README QA row: .. code-block:: bash PYTHONPATH=. JAX_PLATFORMS=cpu python examples/optimization/generate_qs_ess_sweep.py --backend-label cpu --solver-device cpu --policy continuation --problems qa --modes 3 --ess on --rerun PYTHONPATH=. python examples/optimization/render_qs_ess_publication_panel.py The GPU rows run through the same accepted-point tape exact/replay path as CPU with GPU-calibrated optimizer budgets; vmec_jax does not silently switch GPU optimization callbacks to CPU or to scan exact. If you need a short diagnostic matrix, append ``--diagnostic-budgets``; otherwise the script uses calibrated production budgets and records any non-converged case as a normal ``max_nfev`` stop. Force ``VMEC_JAX_OPT_EXACT_PATH=scan`` only for a targeted scan-exact profiling or parity experiment. .. code-block:: bash PYTHONPATH=. JAX_PLATFORM_NAME=gpu python examples/optimization/generate_qs_ess_sweep.py --backend-label gpu --solver-device gpu --policy continuation --problems qa,qh,qp,qi --modes 1,2,3,4,5 --ess both --qi-qp-preseed off --max-nfev 70 --continuation-nfev 25 --inner-max-iter 180 --inner-ftol 1e-9 --trial-max-iter 180 --trial-ftol 1e-9 --rerun PYTHONPATH=. JAX_PLATFORM_NAME=gpu python examples/optimization/generate_qs_ess_sweep.py --backend-label gpu --solver-device gpu --policy direct --problems qa,qh,qp,qi --modes 1,2,3,4,5 --ess both --qi-qp-preseed off --max-nfev 70 --continuation-nfev 25 --inner-max-iter 180 --inner-ftol 1e-9 --trial-max-iter 180 --trial-ftol 1e-9 --rerun PYTHONPATH=. python examples/optimization/render_qs_ess_publication_panel.py Non-Stellarator-Symmetric Sweeps ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Append ``--stellarator-asymmetric`` to set ``LASYM = T`` in the in-memory VMEC input and optimize ``RBC/ZBS/RBS/ZBC`` instead of only the stellarator-symmetric ``RBC/ZBS`` families. The sweep deterministically seeds zero asymmetric ``RBS/ZBC`` degrees of freedom with ``1e-7`` so exact Jacobians do not start from a completely inactive asymmetric subspace. Results are written under ``results/qs_ess_sweep//asymmetric/`` and the renderer creates additional ``*_asymmetric_*`` objective panels, state atlases, summary tables, and full publication panels when those cases are present. .. code-block:: bash PYTHONPATH=. JAX_PLATFORMS=cpu python examples/optimization/generate_qs_ess_sweep.py --backend-label cpu --solver-device cpu --policy continuation --problems qa,qh,qp,qi --modes 1,2,3,4,5 --ess both --qi-qp-preseed off --stellarator-asymmetric --max-nfev 70 --continuation-nfev 25 --inner-max-iter 180 --inner-ftol 1e-9 --trial-max-iter 180 --trial-ftol 1e-9 --rerun PYTHONPATH=. JAX_PLATFORMS=cpu python examples/optimization/generate_qs_ess_sweep.py --backend-label cpu --solver-device cpu --policy direct --problems qa,qh,qp,qi --modes 1,2,3,4,5 --ess both --qi-qp-preseed off --stellarator-asymmetric --max-nfev 70 --continuation-nfev 25 --inner-max-iter 180 --inner-ftol 1e-9 --trial-max-iter 180 --trial-ftol 1e-9 --rerun PYTHONPATH=. JAX_PLATFORM_NAME=gpu python examples/optimization/generate_qs_ess_sweep.py --backend-label gpu --solver-device gpu --policy continuation --problems qa,qh,qp,qi --modes 1,2,3,4,5 --ess both --qi-qp-preseed off --stellarator-asymmetric --max-nfev 70 --continuation-nfev 25 --inner-max-iter 180 --inner-ftol 1e-9 --trial-max-iter 180 --trial-ftol 1e-9 --rerun PYTHONPATH=. JAX_PLATFORM_NAME=gpu python examples/optimization/generate_qs_ess_sweep.py --backend-label gpu --solver-device gpu --policy direct --problems qa,qh,qp,qi --modes 1,2,3,4,5 --ess both --qi-qp-preseed off --stellarator-asymmetric --max-nfev 70 --continuation-nfev 25 --inner-max-iter 180 --inner-ftol 1e-9 --trial-max-iter 180 --trial-ftol 1e-9 --rerun PYTHONPATH=. python examples/optimization/render_qs_ess_publication_panel.py The checked-in LASYM summary rows are partial bounded-time lanes rather than a complete matrix. This is intentional: timeout and GPU-memory failures are useful performance data for the asymmetric exact/replay path. LASYM figures/atlases must be regenerated before being described as published figures. The current frozen snapshot contains 23 CPU LASYM rows and 36 GPU LASYM rows. The CPU subset has 11 successful rows, 8 crashed rows, and 4 budgeted stops; the GPU subset has 11 successful rows, no crashed rows, and 25 budgeted stops. For NVIDIA-only JAX installations, ``JAX_PLATFORMS=cuda`` is also valid. Do not use ``JAX_PLATFORMS=gpu``: some JAX versions interpret that as both CUDA and ROCm and fail if ROCm is not installed. The renderer writes these report artifacts under ``examples/optimization/results/qs_ess_sweep``. For ordinary README updates, copy only compact best-row panels into ``docs/_static/figures``. For a full-sweep docs publication, attach bulky objective panels, initial/final state atlases, wall-time summary-table figures, and PDF snapshots to a release unless they are compact enough to pass the repository size gate and have been reviewed as documentation-critical. Keep synchronized CSV/JSON summaries compact and document the provenance in :doc:`optimization_sweep_results`. Initial and final equilibria for CPU/GPU continuation/direct cases are rendered separately so the 3D surfaces and boundary-field colorbars remain readable. The ``|B|`` panels use line contours on the LCFS, not filled contours. The renderer emits ``initial_final_state_atlas_*.png/.pdf`` files locally for reports, with legacy ``final_state_atlas_*`` aliases; they should be promoted to docs assets only after the full-sweep rows have been reviewed. Finite-beta stage-one examples ------------------------------ The finite-beta examples reproduce the VMEC-only stage-one part of the ``single_stage_optimization_finite_beta`` workflows without SIMSOPT or coils. They use the finite-pressure/current input decks in ``examples/data`` and add JAX-differentiable global residuals for aspect ratio, iota bounds, volume-averaged field proxy, and total beta. QA and QH then add the usual quasisymmetry residual; QI adds the smooth Boozer-space quasi-isodynamic residual through ``booz_xform_jax``. The scripts now build pressure and bootstrap-current profile data from the same standard SIMSOPT-style profile bundle. ``vj.standard_finite_beta_profiles`` creates polynomial density and temperature profiles ``ne = ne0 * [1, 0, 0, 0, 0, -0.99]`` and ``Te = Te0 * [1, -0.99]`` with the Landreman finite-beta scaling, then forms ``pressure_pa = e * (ne*Te + ni*Ti)``. ``vj.with_pressure_profile`` writes that pressure into the VMEC input deck, and the same ``ne``/``Te`` coefficients feed ``vj.RedlBootstrapMismatch`` when ``BOOTSTRAP_WEIGHT`` is enabled. This keeps the pressure profile and self-consistent bootstrap-current residual on a single differentiable source of truth. Unlike the public QA/QH/QP/QI teaching scripts above, these finite-beta stage-one examples call ``FixedBoundaryExactOptimizer`` directly rather than ``least_squares_solve``. That is intentional: each continuation stage builds stage-local pressure/current profile data, finite-beta global residual blocks, and optional Redl/QI residual closures before running the optimizer. The shared ``finite_beta_stage1_common.py`` helper only standardizes stage budgets, saved inputs/wouts, histories, and plots; it does not hide objective assembly behind a config object. .. code-block:: bash PYTHONPATH=. python examples/optimization/qa_optimization_finite_beta.py PYTHONPATH=. python examples/optimization/qh_optimization_finite_beta.py PYTHONPATH=. python examples/optimization/qi_optimization_finite_beta.py All controls are top-level variables in those scripts: ``MAX_MODE``, ``MAX_NFEV``, ``USE_ESS``, ``USE_MODE_CONTINUATION``, and ``SOLVER_DEVICE``. The scripts also expose ``INNER_MAX_ITER``, ``INNER_FTOL``, ``TRIAL_MAX_ITER``, and ``TRIAL_FTOL`` so deck-controlled VMEC budgets can be kept or overridden explicitly. Set ``SOLVER_DEVICE = "gpu"`` or run with ``JAX_PLATFORM_NAME=gpu`` on a machine with a working JAX GPU install. The QI finite-beta script additionally exposes ``QI_MBOZ``, ``QI_NBOZ``, ``QI_NPHI``, ``QI_NALPHA``, and ``QI_N_BOUNCE``. Its defaults are intended as diagnostic first-run settings; increase these grid controls before treating a QI finite-beta refinement as a final research-quality result. The current implementation includes differentiable finite-beta global diagnostics, current-driven iota through ``PCURR_TYPE = "cubic_spline_ip"``, a ``vj.DMerc`` lower-bound residual for stellarator-symmetric and LASYM equilibria, a ``vj.GlasserResistiveInterchange`` upper-bound residual for the ``D_R <= 0`` resistive-interchange condition, VMEC/JXBFORCE profile accessors ``vj.JDotB``, ``vj.BDotB`` and ``vj.BDotGradV``, and state-derived current-profile accessors ``vj.ToroidalCurrent`` and ``vj.ToroidalCurrentGradient``. Pass ``surfaces=(...)`` to those profile objects to select nearest full-mesh surfaces, or omit it to use all interior radial surfaces. ``ToroidalCurrent`` uses VMEC's Mercier normalization ``signgs * 2*pi * ``; its gradient is the radial derivative ``ip`` used in ``DMerc``. Two vector-valued accessors are also available for advanced targeting: ``vj.BVector(s_index=...)`` returns Cartesian ``(Bx, By, Bz)`` on one radial surface and ``vj.JVector(surfaces=...)`` returns VMEC flux-coordinate current density components ``(J^theta, J^zeta) = (itheta/sqrtg, izeta/sqrtg)`` on the selected surfaces. These are flattened residual blocks; users should choose their own target arrays and normalizations before adding them to a problem. The Redl bootstrap-current mismatch is available as ``vj.RedlBootstrapMismatch``: .. code-block:: python profiles = vj.standard_finite_beta_profiles(BETA_PERCENT) indata = vj.with_pressure_profile(indata, profiles.pressure_pa) ne_coeffs = vj.profile_to_power_series_coeffs(profiles.ne).tolist() te_coeffs = vj.profile_to_power_series_coeffs(profiles.Te).tolist() redl = vj.RedlBootstrapMismatch( helicity_n=HELICITY_N, ne_coeffs=ne_coeffs, Te_coeffs=te_coeffs, Ti_coeffs=te_coeffs, Zeff_coeffs=1.0, surfaces=(0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9), ) objective_tuples.append((redl.J, 0.0, BOOTSTRAP_WEIGHT)) The Redl algebra follows the SIMSOPT/Redl et al. fit formula. vmec_jax evaluates the geometry from differentiable VMEC state channels on nearest full-mesh surfaces using fixed quadrature for the trapped-particle fraction. This keeps the term differentiable and usable in the discrete-adjoint least-squares workflow; final validation against a Boozer-space Redl geometry choice should still be part of any publication-quality finite-beta study. The deterministic plan for converting a Redl ```` profile into a VMEC ``PCURR_TYPE = "cubic_spline_ip"`` current profile without optimizing shape or coil variables is documented in :doc:`bootstrap_current_fixed_point`. The full multi-page artifact inventory, including legacy aliases, CSV/JSON summary downloads, and exact reproduction commands for each standalone example, is collected in :doc:`optimization_sweep_results`. QI objective tuning ------------------- The current standalone ``QI_optimization.py`` defaults start from ``input.minimal_seed_nfp2``, write an ``input.simple_seed`` in the run directory, add deterministic target-helicity hints to avoid zero Jacobian columns, and keep same-NFP reference-family preconditioning disabled unless the user explicitly enables it: .. code-block:: bash PYTHONPATH=. python examples/optimization/QI_optimization.py For a smaller focused QI+iota basin probe from the bundled stellarator seed, run: .. code-block:: bash PYTHONPATH=. JAX_PLATFORMS=cpu python examples/optimization/QI_seed_robustness.py Both scripts optimize QI, aspect ratio, and a differentiable ``abs(mean_iota) >= 0.41`` floor. The public QA/QH/QP/QI examples now target aspect ratio 5 so their README panels can be compared directly. The ``input.QI_stel_seed_3127`` far-seed lane still needs deterministic same-NFP reference-family boundary preconditioning before local cleanup. The preconditioner scans interpolation points between the raw seed and the bundled NFP=3 QI reference, solves each candidate, ranks them with independent smooth/legacy QI, mirror, elongation, aspect, and iota gates, and then continues from the lowest-mirror accepted non-endpoint candidate when one exists. That preconditioned candidate is also recorded as the accepted baseline, so a later cleanup stage cannot overwrite it unless exact diagnostics improve. For this far-seed case, the legacy Goodman-style QI metric uses a tight ``2e-3`` gate while the smooth differentiable proxy uses an explicit ``5e-3`` cap; this avoids rejecting a legacy-good state only because the smooth surrogate is more conservative on the six-surface high-resolution audit. The script therefore prints both a ``QI+iota`` gate and a stricter engineering gate that also includes mirror ratio and elongation. It also writes a Boozer-coordinate ``|B|`` line-contour plot; VMEC-angle contour plots alone are not accepted as a QI visual gate. The study compared direct versus repeated-stage continuation, QP pre-seeding, aspect-ratio weights, mirror/elongation soft-wall weights, QI branch-width weights, the branch-shuffle profile residual, ``phimin`` well-interval choices, and termination tolerances against the nfp=2 ``examples/data/input.nfp2_QI`` seed and the bundled near-axis ``input.QI_stel_seed_3127`` seed. The historical far-seed QI lane used ``max_mode = 4`` with ESS, ``target_aspect = 5.0``, ``abs(mean_iota) >= 0.41``, same-NFP reference-family boundary interpolation, and a single QI/iota cleanup with a QI ceiling. The shuffle-profile term is intentionally retained because width-only and branch-width-only smooth surrogates can rank QH/QP-like false positives ahead of the branch-squash/stretch/shuffle diagnostic used in the reference Goodman et al. omnigenity workflow. Two practical lessons from that study are now reflected in the example: - ``least_squares_solve`` uses SIMSOPT tuple semantics, so tuple weights are ``sqrt(weight)`` residual multipliers. Mirror/elongation soft-wall weights should be strong enough to prevent pathological shapes but not so dominant that they block the lower-QI basin. ``MirrorRatio`` defaults to all selected Boozer surfaces when ``surface_index=None`` and exposes ``smooth_extrema`` and ``smooth_penalty`` to avoid hard max/min gradients during cleanup. For QA/QH/QP mirror cleanup, ``VMECMirrorRatio`` is the cheaper VMEC-space alternative because the mirror ratio itself is coordinate-invariant. - The QI branch-width and shuffle-profile terms smooth the well matching enough to avoid the noisy objective jumps seen in earlier direct QI attempts, while preserving the same design ranking as the legacy branch diagnostic on the seed, the reference omnigenity result, and recent vmec_jax candidates. LgradB remains available as a commented optional shaping term, but it is not part of the default best QI lane. - ``weighted_shuffle_profile_weight`` enables an opt-in branch-shuffle term that uses the legacy squash/stretch endpoint correction, monotone linear branch crossings, and differentiable field-line weights when computing the mean bounce width. It is closer to the Goodman-style branch diagnostic than the historical smooth occupancy crossing estimator and is useful for ranking and homotopy experiments. In the current bounded NFP=2 probe it improved the isolated branch-shuffle ranking but did not beat the default mirror-aware run, so the public example leaves it at ``0.0``. - ``shuffle_profile_nphi_out`` enables an opt-in dense output grid for the differentiable branch-shuffle profile residual. This mirrors the legacy ``arr_out=True`` Goodman objective more closely by comparing the shuffled well and original profile on a denser toroidal grid than the base ``nphi``. It is intended for QI homotopy/diagnostic experiments because it increases residual size, memory, and wall time. - A QI-only objective is not a robust optimization policy. It can make the branch/shuffle diagnostic small while leaving ``mean_iota`` near zero. The default examples therefore include the iota floor and only promote a result when the independent smooth-QI, legacy-QI, and iota gates all pass. Mirror ratio is reported separately. For ``input.QI_stel_seed_3127``, QI-only cleanup recovers precise QI but raises mirror ratio, while mirror-heavy cleanup lowers mirror but damages QI. The current public lane keeps the reproducible QI/iota result as the promoted output and leaves the balanced mirror policies in ``tools/diagnostics/qi_constraint_policy_scan.py``; the open lane is still a fully QI-preserving mirror cleanup schedule that is robust across unrelated seeds. - ``QuasiIsodynamicResidualCeiling`` is the preferred cleanup guard when adding mirror, elongation, or other engineering terms after a low-QI basin has been found. It adds a smooth penalty only when the shared QI residual exceeds a configured ceiling. If a simultaneous mirror-aware solve uses no ceiling, it must instead require the independent full engineering gate. Any endpoint that improves mirror ratio but exceeds the ceiling or fails smooth/legacy QI validation is rejected, not promoted. - ``AugmentedLagrangianConstraint`` is available for engineering constraints that should be enforced without manually guessing a single enormous weight. It follows the projected Powell-Hestenes-Rockafellar form used in modern constrained stellarator optimization: the wrapped objective provides a signed constraint residual ``g(x) <= 0``, and the optimizer minimizes ``sqrt(mu) * max(g(x) + lambda / mu, 0)``. Update ``lambda`` and ``mu`` only from exact accepted diagnostics between explicit stages; never update them from relaxed trial residuals. ``MirrorRatio`` and ``MaxElongation`` expose signed constraint hooks for this wrapper while preserving their ordinary least-squares penalty behavior. - ``BoozerBTarget`` is available as a differentiable steering or homotopy term when a known reference Boozer ``|B|`` spectrum should guide a run toward a specific basin. It is a steering objective, not a final acceptance diagnostic. For example: .. code-block:: python target = vj.boozer_b_target_from_wout( "wout_reference_qi.nc", surfaces=SURFACES, mboz=QI_OPTIONS.mboz, nboz=QI_OPTIONS.nboz, ) b_target = vj.BoozerBTarget( target_bmnc=target["bmnc_b"], target_bmns=target["bmns_b"], qi_options=QI_OPTIONS, ) objective_tuples.append((b_target.J, 0.0, 100.0)) - ``vj.interpolate_indata_boundary`` is the lower-level boundary-space counterpart to ``BoozerBTarget``. It deterministically interpolates selected VMEC boundary coefficient dictionaries, preserving seed-owned scalar metadata such as ``NFP`` and ``LASYM``. For very far seeds this can be more effective than a local trust-region step because it deliberately crosses basins before the differentiable local optimizer starts. The ``qi_stel_seed_3127`` case uses this as a same-NFP reference-family preconditioner and, after the independent engineering gate passes, gives mirror ratio enough selection weight to use aspect/elongation margin instead of always picking the smallest scalar QI residual. ``QI_optimization.py`` is the recommended editable QI entry point. It no longer hides seed selection behind launch-time overrides. Instead, edit these ordinary variables near the top of the script: .. code-block:: python INPUT_FILE = DATA_DIR / "input.minimal_seed_nfp2" OUTPUT_DIR = Path("results/qi_opt/ess/minimal_nfp2_qi") MAX_MODE = 5 USE_SIMPLE_SEED = True USE_TARGET_HELICITY_SEED = True USE_REFERENCE_FAMILY_SEED = False REFERENCE_INPUT_FILE = DATA_DIR / "input.nfp2_QI" MAX_NFEV = 70 CONTINUATION_NFEV = 20 TARGET_ASPECT = 5.0 TARGET_ABS_IOTA_MIN = 0.41 MAX_MIRROR_RATIO = 0.30 MAX_ELONGATION = 8.2 The script takes ``nfp`` from the VMEC input file, so NFP=1/2/3/4 do not need separate drivers. To try a different VMEC input deck, change only ``INPUT_FILE`` and ``OUTPUT_DIR`` first, then run: .. code-block:: bash PYTHONPATH=. JAX_PLATFORMS=cpu python examples/optimization/QI_optimization.py ``QI_optimization.py`` preserves the raw selected input as ``RAW_INPUT_FILE``. It only rewrites that input when ``USE_SIMPLE_SEED`` is enabled, and then the generated ``input.simple_seed`` is written under ``OUTPUT_DIR``. This matters for reviewed seeds such as ``input.QI_stel_seed_3127``: keep ``USE_SIMPLE_SEED = False`` when you want to audit and optimize the actual scientific seed rather than a three-coefficient toy boundary. The compact ``QI_optimization_seed.py`` preset is the reproducible NFP=3 seed-3127 path used for the docs/README row. It leaves the raw ``input.QI_stel_seed_3127`` as the initial artifact, scans the same-NFP ``input.nfp3_QI_fixed_resolution_final`` reference family with ``REFERENCE_LAMBDAS = (0.998, 1.0, 1.002, 1.004, 1.006, 1.008, 1.01)``, keeps the preconditioner scan cheap with ``BOUNDARY_REFERENCE_MAX_ITER = 80``, then replays the selected accepted baseline with ``INNER_MAX_ITER = 450`` so final QI/mirror/iota diagnostics are not taken from an underconverged candidate. Use it directly when you want to reproduce the reviewed seed-3127 row without editing the larger QI driver. For far seeds, use the visible seed helpers before the local optimizer: - Set ``USE_TARGET_HELICITY_SEED = True`` to add deterministic tiny target-helicity modes when the active modes are exactly zero. - Set ``USE_REFERENCE_FAMILY_SEED = True`` and point ``REFERENCE_INPUT_FILE`` at a solved same-NFP QI input when the seed is in a different basin. The helper scans ``REFERENCE_LAMBDAS`` in boundary space, ranks candidates with independent QI/mirror/iota diagnostics, and writes the selected input under ``OUTPUT_DIR`` before local cleanup. Keep ``project_input_boundary_to_max_mode=True`` in the ``FixedBoundaryVMEC.from_input`` call so a ``max_mode = 1`` run removes higher boundary modes from a richer seed, while ``max_mode = 2`` and ``3`` expose the corresponding larger parameter spaces. The standalone script owns the optimization policy; ``examples/optimization/qi_optimization_cases.py`` is now primarily a sweep/rendering registry for archived docs rows such as ``nfp3_qi``, ``nfp4_qi_finite_beta``, and ``nfp4_qh_warm_to_qi``. The ``nfp4_qi_finite_beta`` row is a finite-beta NFP=4 stress fixture, and ``nfp4_qh_warm_to_qi`` is an explicit non-passing stress fixture. If you want to rank a new seed before optimizing it, first create a matching wout with ``vmec /path/to/input.my_seed`` and then run ``examples/optimization/audit_qi_seed_suitability.py --case label:qi:input_path:wout_path`` as described in :doc:`validation`. Promote a QI result only if both the numerical metrics and the Boozer ``|B|`` line-contour plot pass the QI gate. For local cleanup after a global preconditioner, the script can unlock boundary modes anisotropically using ``stage_mode_limits``. For example, ``{"mode": 3, "max_m": 1, "max_n": 3, "label": "nfirst"}`` lets toroidal harmonics move while keeping poloidal complexity low, then a second ``{"mode": 3, "max_m": 3, "max_n": 3, "label": "full"}`` stage unlocks the full production boundary. Candidate stages are promoted only if independent exact diagnostics pass, so a lower-mirror local step cannot overwrite the accepted QI baseline if it damages legacy QI or the mirror gate. For historical comparison, the earlier same-NFP reference-family scan used by the ``qi_stel_seed_3127`` far-seed case was: .. code-block:: bash PYTHONPATH=. JAX_PLATFORMS=cpu python tools/diagnostics/qi_boundary_interpolation_scan.py \ --seed-input examples/data/input.QI_stel_seed_3127 \ --reference-input examples/data/input.nfp3_QI_fixed_resolution_final \ --out-root results/diagnostics/qi_seed3127_boundary_interpolation \ --lambdas 0.99,0.995,1.0,1.005,1.008,1.01,1.012 \ --max-mode 4 --max-iter 80 --target-aspect 5.0 \ --surfaces 0.1,0.28,0.46,0.64,0.82,1.0 \ --mboz 18 --nboz 18 --nphi 151 --nalpha 31 --n-bounce 51 \ --smooth-qi-max 5e-3 --legacy-qi-max 2e-3 \ --max-mirror-ratio 0.35 --max-elongation 8.0 The following local landscape scan illustrates why the raw ``input.QI_stel_seed_3127`` neighborhood is hard. The seed starts with low mirror ratio, but poor elongation and low transform. Nearby boundary moves in ``rc01`` and ``zs01`` can keep the mirror low only in a narrow valley and do not fix the elongation/transform issue by themselves. This is why the promoted case-gated lane uses a larger reference-family move before local cleanup. The plot is generated with contour lines, not filled contours, so the competing ridges are visible. .. image:: _static/figures/qi_seed3127_landscape_rc01_zs01.png :alt: QI seed 3127 two degree-of-freedom landscape scan :align: center :width: 100% Recreate the scan with: .. code-block:: bash PYTHONPATH=. JAX_PLATFORMS=cpu python tools/diagnostics/qi_landscape_scan.py \ --input examples/data/input.QI_stel_seed_3127 \ --output-dir results/diagnostics/qi_landscape_seed3127 \ --max-mode 3 --min-vmec-mode 6 --dofs rc01,zs01 --points 3 \ --span 0.03 --span2 0.03 --surfaces 0.35,0.65 \ --nphi 51 --nalpha 11 --n-bounce 15 \ --mirror-ntheta 32 --mirror-nphi 32 \ --elongation-ntheta 24 --elongation-nphi 8 QI diagnostics and validation plan ---------------------------------- QI optimization has more ways to get a plausible but wrong answer than QA/QH because the smooth QI residual is a differentiable proxy for the legacy branch diagnostic. The public audit helpers are ``vj.QIDiagnosticOptions``, ``vj.qi_diagnostics_from_boozer_output``, ``vj.qi_diagnostics_from_state``, and ``vj.rank_qi_seed_records``; they return and rank unweighted records with smooth QI, raw QI, legacy branch, mirror-ratio, elongation, optional ``LgradB``, resolution metadata, and diagnostic error fields. Treat the standalone QI script as a candidate generator, then run the following validation ladder before promoting a result. 1. Confirm the objective definition on the seed and any final candidate. The diagnostics compare smooth QI variants, component totals, mirror ratio, elongation, ``phimin`` shifts, and optionally the reference omnigenity implementation: .. code-block:: bash PYTHONPATH=. JAX_PLATFORMS=cpu python tools/diagnostics/qi_objective_component_report.py PYTHONPATH=. JAX_PLATFORMS=cpu python examples/optimization/compare_omnigenity_qi_objective.py VMEC_JAX_RUN_REFERENCE_OMNIGENITY=1 PYTHONPATH=. JAX_PLATFORMS=cpu python examples/optimization/compare_omnigenity_qi_objective.py ``qi_objective_component_report.py`` is intentionally cheap: it does not run VMEC or BOOZ_XFORM. It uses controlled synthetic Boozer spectra to verify that the differentiable smooth-QI surrogate ranks QI-like, mixed, and QH-like spectra in the same order as the legacy branch-shuffle diagnostic. 2. Run the constrained QI matrix before declaring a best policy. This keeps direct QI, lower-mode continuation, optional same-mode repeat, ESS, and QP-preseed rows comparable: .. code-block:: bash PYTHONPATH=. JAX_PLATFORMS=cpu python examples/optimization/generate_qs_ess_sweep.py --backend-label cpu --solver-device cpu --policy continuation --problems qi --modes 1,2,3,4,5 --ess both --qi-qp-preseed both --max-nfev 70 --continuation-nfev 25 --inner-max-iter 180 --inner-ftol 1e-9 --trial-max-iter 180 --trial-ftol 1e-9 --rerun PYTHONPATH=. JAX_PLATFORMS=cpu python examples/optimization/generate_qs_ess_sweep.py --backend-label cpu --solver-device cpu --policy direct --problems qi --modes 1,2,3,4,5 --ess both --qi-qp-preseed both --max-nfev 70 --continuation-nfev 25 --inner-max-iter 180 --inner-ftol 1e-9 --trial-max-iter 180 --trial-ftol 1e-9 --rerun PYTHONPATH=. python examples/optimization/render_qi_constrained_sweep.py 3. Re-evaluate the final candidate at higher Boozer/QI resolution before using it as a science result. Increase ``QI_MBOZ``, ``QI_NBOZ``, ``QI_NPHI``, ``QI_NALPHA``, and ``QI_N_BOUNCE`` in ``compare_omnigenity_qi_objective.py`` or ``QI_optimization.py`` and check that the ranking, mirror ratio, elongation, and branch-shuffle metrics do not change qualitatively. 4. Run the QI-specific unit and workflow tests after changing the QI objective construction: .. code-block:: bash pytest -q tests/test_quasi_isodynamic.py tests/test_qi_legacy.py tests/test_qi_diagnostics.py tests/test_qi_objective_component_report.py tests/test_booz_input.py pytest -q tests/test_optimization_examples.py -k "qi or LeastSquaresProblem" 5. For parity-grade QI validation, compare against a local VMEC2000 executable when the external deck is available: .. code-block:: bash VMEC_JAX_RUN_QI_PARITY=1 VMEC2000_EXEC=/path/to/xvmec2000 pytest -q tests/test_qi_wout_parity.py The acceptance criteria are not just a lower scalar objective. Keep the final aspect ratio and signed/absolute iota in bounds, verify mirror ratio and LCFS elongation stay below their soft-wall thresholds, inspect ``|B|`` contours, and prefer candidates whose smooth QI metrics preserve the same ranking as the legacy branch diagnostic under the higher-resolution audit. For mirror-cleanup lanes, a lower mirror ratio is only promotable when either the QI residual ceiling or the independent full engineering gate still passes. Algorithms in detail --------------------- Least-squares drivers ~~~~~~~~~~~~~~~~~~~~~ ``vmec_jax`` provides two standalone outer least-squares drivers: - :func:`~vmec_jax.gauss_newton_least_squares`, a concrete custom Gauss-Newton loop tuned for exact VMEC callbacks; - ``method="scipy"`` in :class:`~vmec_jax.FixedBoundaryExactOptimizer.run`, which uses ``scipy.optimize.least_squares`` with the same exact residual and discrete-adjoint Jacobian callbacks. The QA exact-optimization example currently uses the SciPy route because it is more robust on the higher-dimensional QA problem. The custom Gauss-Newton path is still available for lower-level experiments and tighter callback control. Discrete-adjoint Jacobian ~~~~~~~~~~~~~~~~~~~~~~~~~~ The Jacobian column ``∂r/∂pᵢ`` is computed by propagating the initial-state tangent ``∂state₀/∂pᵢ`` through all recorded iteration steps: .. code-block:: text params → boundary → state₀ (linearize via jax.linearize) state₀ → state₁ → ... → stateₙ (forward scan, checkpointed) stateₙ → residuals (linearize via jax.linearize) The tangent propagation through the iterates is done by :func:`~vmec_jax.checkpoint_tape_state_jvp_columns`, which: 1. Loads the checkpoint tape (packed state at each step, preconditioner, etc.). 2. Calls ``jax.vmap(jax.jvp(step_fn, ...))`` over all parameter tangents at once, reusing the same compiled scan kernel. Key implementation choices: - **``backtracking=False, strict_update=True``**: matches the VMEC2000 iteration path. Using ``backtracking=True`` collapses the step size to machine epsilon on QH geometry and kills convergence. - **``VMEC_JAX_DYNAMIC_REPLAY_BUCKET``**: pads nearby solve trajectories so the same XLA scan executable is reused across Jacobian evaluations with slightly different tape lengths. The default is backend-adaptive: ``32`` on CPU and ``128`` on CUDA/ROCm/GPU backends. Larger values are profiling controls, not a universal GPU speedup. - **Single-entry cache** (``_exact_cache``): stores the last tape by parameter hash. Avoids rebuilding the tape when ``residual_fun`` then ``jacobian_fun`` are called at the same ``x`` (which Gauss-Newton always does). Residuals function ~~~~~~~~~~~~~~~~~~ :func:`~vmec_jax.make_qh_residuals_fn` builds the combined QH + aspect-ratio residual vector: .. code-block:: python r[0] = aspect_weight * (aspect - target_aspect) r[1:] = qs_weight * quasisymmetry_ratio_residuals(state, surfaces, m=1, n=-1) # n is in field-period units: n=-1 → QH with nn=-nfp internally (nfp=4 → nn=-4) The quasisymmetry ratio residual is computed by :func:`~vmec_jax.quasisymmetry_ratio_residual_from_state`, which evaluates :math:`|B|` on the specified surfaces, decomposes it into helical modes, and returns the residuals of the off-helicity modes. Public API ---------- .. currentmodule:: vmec_jax Recommended workflow API ~~~~~~~~~~~~~~~~~~~~~~~~ For new examples and user scripts, prefer the small workflow layer that keeps the problem assembly in user code and standardizes only the repeated mechanics: .. list-table:: :header-rows: 1 :widths: 38 62 * - Object or function - Role in an editable script * - ``FixedBoundaryVMEC.from_input(...)`` - Load the VMEC input deck, choose active boundary modes, apply the resolution policy, and select the output directory. * - ``LeastSquaresProblem.from_tuples(...)`` - Convert the visible ``(objective.J, target, weight)`` list into regular and QI residual blocks using SIMSOPT weight semantics. * - ``least_squares_solve(...)`` - Run the regular QS or QI least-squares driver. It should receive optimizer, continuation, ESS, device, and artifact-writing controls, not hidden physics-target keyword arguments. ``method="auto"`` may route high-mode stellsym CPU/default QS/QI cases to matrix-free LSMR while preserving dense SciPy for GPU runs; ``method="auto_scalar"``/``"auto_adjoint"`` routes high-mode stellsym CPU/GPU QS/QI cases to the safeguarded scalar-adjoint trust path, with dense SciPy retained for LASYM and low-mode runs. * - ``FixedBoundaryOptimizationResult`` - Return object with teaching accessors such as ``initial_optimizer``, ``final_optimizer``, ``initial_params``, ``final_params``, ``initial_state``, ``final_state``, ``history``, ``objective_history``, and ``timing_summary``. * - Objective wrappers such as ``AspectRatio``, ``MeanIota``, ``AbsMeanIotaFloor``, ``QuasisymmetryRatioResidual``, ``QuasiIsodynamicResidual``, ``MirrorRatio``, ``VMECMirrorRatio``, and ``MaxElongation`` - Small objects whose ``.J`` methods are placed directly in ``objective_tuples``. QI field objectives share one ``QuasiIsodynamicOptions`` object so the Boozer transform is routed through a single QI solve path. The lower-level optimizer remains public for custom research workflows, but the examples should use the workflow API above unless a script truly needs to construct residual closures or stage-local finite-beta data itself. :class:`FixedBoundaryExactOptimizer` ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ The main entry point for differentiable fixed-boundary optimisation. .. list-table:: :header-rows: 1 :widths: 40 60 * - Method - Description * - ``__init__(static, indata, boundary, specs, residuals_fn)`` - Construct optimizer; derive solver settings from indata. * - ``run(params0, *, max_nfev, ftol, gtol, xtol, x_scale)`` - Run Gauss-Newton loop; returns SciPy-like result dict. * - ``save_wout(path, params, state=None)`` - Write a ``wout_*.nc`` for the equilibrium at ``params``; pass a cached solved state from the result object to avoid an extra replay. * - ``save_history(path, result)`` - Write per-iteration history to JSON. * - ``aspect_ratio(params)`` - Query aspect ratio (uses exact-solve cache). * - ``quasisymmetry_objective(params)`` - Query total QS objective (uses exact-solve cache). * - ``clear_caches()`` - Release JIT and tape caches. :func:`make_qh_residuals_fn` ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Factory returning a ``residuals_from_state(VMECState) → jnp.ndarray`` callable configured for quasi-helical symmetry. Combines one aspect-ratio residual with per-surface QS residuals. Parameters: ``static``, ``indata``, ``helicity_m``, ``helicity_n``, ``target_aspect``, ``surfaces``, ``aspect_weight``, ``qs_weight``. :func:`make_qs_residuals_fn` ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ General quasisymmetry residuals factory supporting QA (``helicity_n=0``), QH, and optional mean-iota targets. This is the lower-level/custom-workflow factory; the SIMSOPT-like example scripts should usually use objective tuples and ``least_squares_solve`` instead. Parameters: ``static``, ``indata``, ``helicity_m``, ``helicity_n``, ``target_aspect``, ``target_iota``, ``surfaces``, ``aspect_weight``, ``iota_weight``, ``qs_weight``. :func:`create_x_scale` ~~~~~~~~~~~~~~~~~~~~~~~ Build per-DOF exponential spectral scaling weights for use with :meth:`FixedBoundaryExactOptimizer.run`. .. math:: w_i = \exp(-\alpha \cdot \max(|m_i|, |n_i|)) \;/\; \exp(-\alpha) The lowest non-trivial mode level (``max(|m|,|n|)=1``) has weight 1; higher modes are progressively down-weighted. Use ``alpha=0`` for uniform weights. Parameters: ``specs``, ``alpha`` (default 1.0). :func:`gauss_newton_least_squares` ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Bare Gauss-Newton solver with exact Jacobian, Armijo line search, and hooks for expensive outer loops. See ``vmec_jax/optimization.py`` for full signature. Plotting helpers ~~~~~~~~~~~~~~~~ ``plot_3d_boundary_comparison``, ``plot_boozer_lcfs_bmag_comparison``, and ``plot_objective_history`` Generate the standard optimization figures independently, so user scripts can choose only the plots they need: - ``boundary_comparison.png`` — 3-D LCFS coloured by :math:`|B|`. - ``boozer_lcfs_bmag_comparison.png`` — initial/final :math:`|B|` contour lines on the LCFS in Boozer coordinates :math:`(\theta_B,\phi_B)`. - ``objective_history.png`` — Objective and aspect ratio vs Jacobian index. :func:`checkpoint_tape_state_jvp_columns` ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Low-level function for propagating a batch of tangents through the adjoint tape. Returns final-state tangents (one per parameter). Used internally by :class:`FixedBoundaryExactOptimizer` but exposed publicly for custom workflows. Source files ------------- .. list-table:: :header-rows: 1 :widths: 45 55 * - File - Role * - ``vmec_jax/optimization.py`` - :class:`FixedBoundaryExactOptimizer`, :func:`make_qh_residuals_fn`, :func:`make_qs_residuals_fn`, :func:`create_x_scale`, :func:`gauss_newton_least_squares`, boundary DOF helpers. * - ``vmec_jax/discrete_adjoint.py`` - Checkpoint tape build (``build_residual_checkpoint_tape_direct``), JVP propagation (``checkpoint_tape_state_jvp_columns``). * - ``vmec_jax/quasisymmetry.py`` - QS residuals (``quasisymmetry_ratio_residual_from_state``). * - ``vmec_jax/quasi_isodynamic.py`` - Differentiable smooth Boozer-space QI residuals (``quasi_isodynamic_residual_from_state``), mirror-ratio penalty, and LCFS elongation and ``LgradB`` penalties. * - ``vmec_jax/plotting.py`` - ``plot_3d_boundary_comparison``, ``plot_boozer_lcfs_bmag_comparison``, and ``plot_objective_history``. * - ``vmec_jax/driver.py`` - ``write_wout_from_fixed_boundary_run``, ``wout_from_fixed_boundary_run``. * - ``examples/optimization/QH_optimization.py`` - QH SIMSOPT-style workflow script (no argparse, variables and objective list at top). * - ``examples/optimization/QA_optimization.py`` - QA workflow with ESS toggle (aspect + mean iota + QA objectives). * - ``examples/optimization/QP_optimization.py`` - QP workflow with helicity ``M=0`` quasisymmetry and iota lower bound. * - ``vmec_jax/optimization_workflow.py`` - Shared mechanics for objective terms, stage construction, continuation, output writing, and plotting while keeping the QA/QH/QP scripts linear. * - ``examples/optimization/QI_optimization.py`` - QI workflow using ``booz_xform_jax``, a public minimal seed by default, optional explicit reference-family seeding, and explicit mirror-ratio/elongation/``LgradB`` objective blocks that users can extend in the script. * - ``examples/optimization/QI_seed_robustness.py`` - Focused QI + aspect + iota-floor seed-refinement workflow for minimal/circular-like seeds, with far seeds such as ``input.QI_stel_seed_3127`` available as explicit diagnostics. * - ``examples/optimization/compare_omnigenity_qi_objective.py`` - Diagnostic QI objective comparison against the local ``omnigenity_optimization`` reference scripts, including ``phimin`` interval scans and residual-block timings. * - ``examples/optimization/plot_optimization_results.py`` - Standalone plotting helper (regenerates figures from saved wout+JSON). * - ``examples/optimization/target_iota_aspect_volume.py`` - Simpler optimisation targeting iota, aspect, volume. Running the QH example ----------------------- .. code-block:: bash # Run optimisation (saves wout files + history.json + figures to results/qh_opt) python examples/optimization/QH_optimization.py # Regenerate figures from saved outputs python examples/optimization/plot_optimization_results.py --output-dir results/qh_opt Increase ``MAX_MODE`` at the top of ``QH_optimization.py`` for richer boundary parameterisation; increase ``MAX_NFEV`` for more optimisation budget. Running the QI objective comparison ------------------------------------ .. code-block:: bash PYTHONPATH=. python examples/optimization/compare_omnigenity_qi_objective.py PYTHONPATH=. python examples/optimization/scan_qi_boozer_mode.py These scripts are intentionally diagnostic. The comparison script writes JSON and ``wout`` artifacts under ``results/omnigenity_compare/qi_objective`` and should be used before changing the smooth QI objective weights, especially ``shuffle_profile_weight``, or the ``phimin`` well interval. The local SIMSOPT/``omnigenity_optimization`` reference leg is off by default to avoid expensive accidental runs; set ``RUN_REFERENCE_OMNIGENITY = True`` in the script when an apples-to-apples reference residual is needed. The reference leg runs in a child process with ``REFERENCE_TIMEOUT_S`` so crashes, memory pressure, or timeouts are reported in the JSON summary without killing the vmec_jax diagnostic. The Boozer-mode scan script writes ``results/omnigenity_compare/qi_boozer_mode_scan/qi_boozer_mode_scan.json`` and ``.png``. It runs one equilibrium/Boozer transform, then scales one selected ``bmnc`` coefficient and compares the smooth differentiable QI objective against the legacy Goodman-style branch/shuffle diagnostic. Use this before launching a large QI sweep when objective changes appear noisy or when the optimized configuration looks visually QP/QH-like despite a small smooth QI objective. GPU acceleration ----------------- The same workflow runs on GPU without modification. After installing GPU-enabled JAX, leave JAX's default platform selection active or set ``JAX_PLATFORM_NAME=gpu`` before running. For NVIDIA CUDA specifically, ``JAX_PLATFORMS=cuda`` is also valid. You can also pass ``solver_device="gpu"`` to the Python optimizer/driver interfaces. If ``solver_device`` is unset, vmec_jax inherits JAX's active default backend. For GPU runs, the dynamic replay bucket (``VMEC_JAX_DYNAMIC_REPLAY_BUCKET``) should be tuned only during profiling. The default keeps padding modest. Coarser buckets can reduce recompilation in some trajectories, but they can also make accepted-point replay much slower. Further reading --------------- .. seealso:: * :doc:`discrete_adjoint` * :doc:`simsopt_comparison`