Illustration generation

The consistency problem

A single illustration is easy. A set of twelve in the same style is the hard problem. Research 10 + 15: prompt words drift; reference images do not. Any illustration skill must inject a brand style reference at every call.

Routing

Model	Brand lock mechanism	Best for
Flux Pro / Flux.2	`reference_images[]` (up to 8 in Flux.2) + brand LoRA	Photoreal, stylized 3D, brand illustration sets
SDXL + brand LoRA	trained LoRA (6d recipe, ~5k steps on 20 images)	Bespoke brand style, open-weight
Recraft V3	`style_id` (brand magic)	Flat vector, editorial illustration
Ideogram 3	style codes	Loose "same vibe" — not strict lock
Midjourney v6/v7	`--sref` / `--cref` / `--mref`	Concept work; no API, community wrappers only
`gpt-image-1`	`input_image[]`	Edit / composite flows

First illustration in a set is human-gated. Once approved, its style becomes the reference injected into all subsequent generations.

Brand bundle injection

illustration prompt =
  [SUBJECT + SCENE from brief]
+ [style anchor: "in the style of the provided reference images"]
+ [palette: exact hex list from brand]
+ [do_not list as positive anchors: "flat matte surfaces" not "no glossy plastic"]
+ [typography reminder: "no text, no labels"]
+ [technical constraints: aspect, resolution, composition]

+ reference_images[]: [style_ref_01.png, style_ref_02.png, (prior approved illustration).png]
+ LoRA handle / style_id / --sref

Prompt scaffold

An illustration of [SUBJECT: concrete noun phrase] in a [SCENE: clear action/context].
Composition: [centered | rule-of-thirds | off-center-left]. Subject occupies ~60% of frame.
Style: in the style of the provided reference images. Flat vector with soft gradients. 
Line weight consistent with references.
Palette strictly limited to: [#hex, #hex, #hex, #hex, #hex].
Materials: matte surfaces, soft ambient lighting, no rim lights, no lens flare.
No text, no labels, no UI elements.
[aspect ratio]. 2048x1280 resolution.

Drop quality modifiers on Flux (no masterpiece, 8k — hurts adherence). Keep them on SD.

Post-processing

Background removal only if asset is spot art (use BiRefNet for soft edges).
Palette validation: K-means 8-color in LAB, ΔE2000 against brand palette, regenerate if max ΔE > 10.
Composition validation: VLM rubric check against style references.
Resize to target sizes (sharp premultiplied-alpha resize).

Full-set propagation

Workflow for generating N illustrations:

Generate illustration #1 with the brand bundle and the brief.
Human gates: accept / regenerate / tweak.
On accept: add illustration #1 to the style reference set.
Subsequent illustrations pull the whole augmented reference set → style locks progressively tighter.
After 3–4 accepted illustrations, train a LoRA for even tighter lock on 20+ asset sets.

Output

illustrations/
├── empty-state-projects.png
├── empty-state-tasks.png
├── onboarding-welcome.png
└── meta.json     # includes provenance + "set coherence score" across the batch