---
license: apache-2.0
pipeline_tag: image-to-image
base_model:
- black-forest-labs/FLUX.2-klein-base-4B
base_model_relation: finetune
datasets:
- boatbomber/CuneiformPhotosMSII
tags:
- image-to-image
- cuneiform
- geometry
- curvature
- multi-scale-integral-invariant
- msii
- Flux
---
<div align="center">
<h1 align="center">
NisabaRelief
</h1>
<img src="https://huggingface.co/boatbomber/NisabaRelief/resolve/main/assets/NisabaRelief-Logo.png" width="600"/>
</div>
NisabaRelief is a rectified flow transformer that converts ordinary photographs of cuneiform clay tablets into Multi-Scale Integral Invariant (MSII) curvature visualizations, without requiring 3D scanning hardware. Traditional MSII computation requires a high-resolution 3D scanner and GigaMesh postprocessing, averaging approximately 68 minutes per tablet. NisabaRelief processes a photograph in approximately 7 seconds.
Photographic images introduce a variety of noise sources: lighting direction, clay color, surface sheen, photography conditions, and surface staining. Any of these can cause wedge impressions to appear as shadows or shadows to appear as wedge impressions. MSII filtering discards this photometric variation, retaining only the geometric signal pressed into the clay. See What is MSII? for full technical details.
Built by fine-tuning FLUX.2 Klein Base 4B on paired photo/MSII data generated from 3D scans in the HeiCuBeDa corpus. The training data is available as CuneiformPhotosMSII.
Named for Nisaba, the early Sumerian goddess of writing and scribes, NisabaRelief will serve as the preprocessing backbone of NabuOCR V2, a cuneiform OCR system currently in development.
Showcase Video:

## Example Output
<table>
<thead>
<tr>
<th align="center" width="25%">Input</th>
<th align="center" width="25%">Output</th>
<th align="center" width="25%">Ground Truth</th>
<th align="center" width="25%">Difference</th>
</tr>
</thead>
<tbody>
<tr>
<td align="center"><img src="https://huggingface.co/boatbomber/NisabaRelief/resolve/main/assets/example_input_0.png" width="200"/></td>
<td align="center"><img src="https://huggingface.co/boatbomber/NisabaRelief/resolve/main/assets/example_output_0.png" width="200"/></td>
<td align="center"><img src="https://huggingface.co/boatbomber/NisabaRelief/resolve/main/assets/example_truth_0.png" width="200"/></td>
<td align="center"><img src="https://huggingface.co/boatbomber/NisabaRelief/resolve/main/assets/example_diff_0.png" width="200"/></td>
</tr>
<tr>
<td colspan="4" align="center"><b>Dice: 0.9652</b> · RMSE: 0.0775 · MS-SSIM: 0.9295 · PSNR: 22.22 dB · PSNR-HVS-M: 17.77 dB · SRE: 58.34 dB</td>
</tr>
<tr>
<td align="center"><img src="https://huggingface.co/boatbomber/NisabaRelief/resolve/main/assets/example_input_1.png" width="200"/></td>
<td align="center"><img src="https://huggingface.co/boatbomber/NisabaRelief/resolve/main/assets/example_output_1.png" width="200"/></td>
<td align="center"><img src="https://huggingface.co/boatbomber/NisabaRelief/resolve/main/assets/example_truth_1.png" width="200"/></td>
<td align="center"><img src="https://huggingface.co/boatbomber/NisabaRelief/resolve/main/assets/example_diff_1.png" width="200"/></td>
</tr>
<tr>
<td colspan="4" align="center"><b>Dice: 0.9555</b> · RMSE: 0.0788 · MS-SSIM: 0.9219 · PSNR: 22.07 dB · PSNR-HVS-M: 17.80 dB · SRE: 57.89 dB</td>
</tr>
<tr>
<td align="center"><img src="https://huggingface.co/boatbomber/NisabaRelief/resolve/main/assets/example_input_2.png" width="200"/></td>
<td align="center"><img src="https://huggingface.co/boatbomber/NisabaRelief/resolve/main/assets/example_output_2.png" width="200"/></td>
<td align="center"><img src="https://huggingface.co/boatbomber/NisabaRelief/resolve/main/assets/example_truth_2.png" width="200"/></td>
<td align="center"><img src="https://huggingface.co/boatbomber/NisabaRelief/resolve/main/assets/example_diff_2.png" width="200"/></td>
</tr>
<tr>
<td colspan="4" align="center"><b>Dice: 0.9630</b> · RMSE: 0.1108 · MS-SSIM: 0.8513 · PSNR: 19.11 dB · PSNR-HVS-M: 14.65 dB · SRE: 59.60 dB</td>
</tr>
<tr>
<td align="center"><img src="https://huggingface.co/boatbomber/NisabaRelief/resolve/main/assets/example_input_3.png" width="200"/></td>
<td align="center"><img src="https://huggingface.co/boatbomber/NisabaRelief/resolve/main/assets/example_output_3.png" width="200"/></td>
<td align="center"><img src="https://huggingface.co/boatbomber/NisabaRelief/resolve/main/assets/example_truth_3.png" width="200"/></td>
<td align="center"><img src="https://huggingface.co/boatbomber/NisabaRelief/resolve/main/assets/example_diff_3.png" width="200"/></td>
</tr>
<tr>
<td colspan="4" align="center"><b>Dice: 0.9713</b> · RMSE: 0.1035 · MS-SSIM: 0.8748 · PSNR: 19.70 dB · PSNR-HVS-M: 15.33 dB · SRE: 59.41 dB</td>
</tr>
<tr>
<td align="center"><img src="https://huggingface.co/boatbomber/NisabaRelief/resolve/main/assets/example_input_4.png" width="200"/></td>
<td align="center"><img src="https://huggingface.co/boatbomber/NisabaRelief/resolve/main/assets/example_output_4.png" width="200"/></td>
<td align="center"><img src="https://huggingface.co/boatbomber/NisabaRelief/resolve/main/assets/example_truth_4.png" width="200"/></td>
<td align="center"><img src="https://huggingface.co/boatbomber/NisabaRelief/resolve/main/assets/example_diff_4.png" width="200"/></td>
</tr>
<tr>
<td colspan="4" align="center"><b>Dice: 0.9564</b> · RMSE: 0.1054 · MS-SSIM: 0.9325 · PSNR: 19.55 dB · PSNR-HVS-M: 15.18 dB · SRE: 57.36 dB</td>
</tr>
</tbody>
</table>
## Quickstart

### Installation
Prerequisites:
- Python >= 3.10
- PyTorch with CUDA support. See https://pytorch.org/get-started/locally/.
```bash
# Install PyTorch (CUDA 12.8 example)
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu128

# Windows only: install Triton (included automatically on Linux)
pip install triton-windows
```
Install:

```bash
pip install nisaba-relief
```
### Usage

```python
from nisaba_relief import NisabaRelief

model = NisabaRelief()  # downloads weights from HF Hub automatically if needed
result = model.process("tablet.jpg")
result.save("tablet_msii.png")
```
Constructor parameters:
| Parameter | Default | Description |
|---|---|---|
| device | "cuda" if available | Device for inference |
| num_steps | 2 | Denoising steps |
| weights_dir | None | Local weights directory; if None, downloads from HF Hub or uses HF cache. Expected dir contents: model.safetensors, ae.safetensors, prompt_embedding.safetensors |
| batch_size | None | Batch size for processing tiles during inference. None (default) auto-selects the largest batch that fits in available VRAM. Set an explicit integer to override. Higher values are faster, but see the reproducibility note below. |
| seed | None | Optional random seed for reproducible noise generation; if None, randomized |
| compile | True | Use torch.compile for faster repeated inference. Requires Triton. Set to False if Triton is not installed or for one-off runs. |
Reproducibility note: Results are pixel-exact across repeated runs with the same batch_size and seed. However, changing batch_size between runs (including letting None auto-select a different value as available VRAM changes) will produce outputs that differ by up to ~1-2 pixel values (mean < 0.25) due to GPU floating-point non-determinism: CUDA selects different kernel implementations for different matrix shapes, which changes the floating-point accumulation order in the transformer attention and linear layers. The visual difference is imperceptible. If exact cross-run reproducibility is required, set a constant batch_size.
process() parameters:
| Parameter | Default | Description |
|---|---|---|
| image | required | File path (str/Path) or PIL Image |
| show_pbar | None | Progress bar visibility. None = auto (shows when >= 2 batches); True/False = always show/hide |
Returns: Grayscale PIL.Image.Image containing the MSII visualization.
Input requirements:
- Any PIL-readable format (PNG, JPG, WEBP, ...)
- Minimum 64 px on the short side; maximum aspect ratio 8:1
Large image support:
The model's native tile size is 1024 px. For images where either side exceeds 1024 px, the model automatically applies a sliding-window tiling pass. Tiles are blended with raised-cosine overlap weights to avoid seams. Each tile is also conditioned on a 128 px thumbnail of the full image with a red rectangle marking the tile's position, so the model retains global context while processing local detail.
There is no practical upper limit on input resolution, though the model may perform unexpectedly if the 1024 px tile is only a small fraction of the total image area.
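The seam-free blending can be illustrated with a toy 1D weight function. This is a sketch under stated assumptions, not the shipped implementation: the exact window shape and overlap size used by the model are not specified here, only the raised-cosine idea.

```python
import numpy as np

def raised_cosine_weights(tile: int, overlap: int) -> np.ndarray:
    """1D blend weights: flat in the middle, half-cosine ramps at the edges.

    When two adjacent tiles overlap by `overlap` pixels, the trailing ramp of
    one tile and the leading ramp of the next sum to exactly 1.0 at every
    overlap pixel, which is what removes visible seams.
    """
    w = np.ones(tile)
    ramp = 0.5 * (1.0 - np.cos(np.pi * (np.arange(overlap) + 0.5) / overlap))
    w[:overlap] = ramp          # fade in
    w[-overlap:] = ramp[::-1]   # fade out
    return w
```

A 2D weight map is simply the outer product of two such 1D windows.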
## Hardware Requirements

While CPU inference is technically supported, it is too slow for practical use. A GPU with at least 9 GB of VRAM is required; 12 GB+ is recommended for larger batches.
The 9 GB figure is substantially lower than the ~18 GB a standard FLUX.2-klein-base-4B deployment would require because the Qwen3-4B text encoder is never loaded at runtime. The conditioning prompt is pre-computed once and shipped as a 7.8 MB embedding file alongside the model weights.
## Performance
Traditional pipelines require a high-resolution 3D scanner and GigaMesh postprocessing: across the HeiCuBeDa corpus, this averages approximately 68 minutes per tablet, totalling over 2,200 hours for the full collection. NisabaRelief processes a tablet photograph in approximately 7 seconds, roughly 600x faster, with no scanning equipment required.
On a 1064×2048 px photo, an RTX 3090 performs as follows:
| Run | Time |
|---|---|
| compile warmup | 11.61s |
| 1 | 7.05s |
| 2 | 7.07s |
| 3 | 7.09s |
| Mean | 7.07 ± 0.02s |
## What is MSII?
Multi-Scale Integral Invariant (MSII) filtering is a geometry-processing algorithm that computes a robust curvature measure at every point on a 3D surface mesh. At each vertex, a sphere of radius r is centered on the surface and the algorithm measures how much of the sphere's volume falls below the surface (the "interior" volume). On a perfectly flat surface the ratio is exactly one half. Concave regions (such as the channel cut by a wedge impression) admit more of the sphere below the surface, pushing the ratio above 0.5. Convex regions such as ridges or the rounded back of a tablet expose less interior volume, pulling the ratio below 0.5. The signed difference from the flat baseline maps directly to the sign and magnitude of mean curvature at that point.
The multi-scale component repeats this computation at several sphere radii simultaneously. Small radii resolve fine wedge tips and hairline details; large radii capture broader curvature trends such as the tablet's overall convexity. The per-vertex measurements across all radii form a compact feature vector, and the final scalar output conventionally displayed as a grayscale image is the maximum component of that feature vector, capturing the strongest curvature response across all scales into a single value per pixel.
By convention the scalar is displayed with its sign inverted relative to the mean curvature: concave regions (ratio > 0.5) map to darker pixel values and convex regions (ratio < 0.5) to lighter ones. This places the flat-surface baseline at mid-gray and renders wedge channels as dark strokes against a bright background, similar to ink on paper.
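As a minimal sketch of this sign-inverted display convention (the actual rendering pipeline and any contrast stretching are assumptions, this just shows the mapping direction):

```python
def msii_to_gray(ratio: float) -> float:
    """Map the volume ratio in [0, 1] to a display intensity in [0, 1].
    ratio 0.5 (flat) -> mid-gray; concave (> 0.5) -> darker; convex -> lighter."""
    return min(1.0, max(0.0, 1.0 - ratio))
```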
Because the result depends only on the 3D shape of the surface rather than on lighting, clay color, or photograph angle, wedge impressions appear as consistent dark strokes against a bright background. This makes the surface structure considerably more legible to machine-vision OCR systems than raw photographs.
## Intended Use & Limitations
Generating an MSII visualization of a tablet requires a high-resolution laser scanner and substantial per-vertex computation. The vast majority of cuneiform tablets do not have a 3D scan available, and the computational cost is difficult to scale across large corpora.
To reduce this barrier and increase the availability of readable images, this model is trained to predict the MSII visualization directly from photographs.
Intended use:
- Preprocessing step for cuneiform OCR (specifically NabuOCR V2)
- Visualizing cuneiform tablet geometry for research and digital humanities
Limitations:
- Trained exclusively using HeiCuBeDa 3D-scan data; performance on tablet types or scribal traditions not well-represented in that corpus is unknown
- Outputs are MSII approximations inferred from 2D photographs, not computed from true 3D geometry. They are suitable for OCR preprocessing but are not a substitute for physical scanning
- Not a general-purpose MSII model; behavior on non-cuneiform inputs is undefined and out of distribution
- Designed for photographs following CDLI photography guidelines: high-resolution fatcross layout on a black background. The model may underperform on low-resolution or visually cluttered inputs such as older black-and-white excavation photographs where the background blends into the tablet
## Evaluation

The model was evaluated on 704 held-out validation pairs, all from tablets whose geometry was never seen during training (see Training Data). Each validation image was processed through the model and the output compared against the ground-truth MSII visualization computed from the 3D scan. Evaluation used seed=42 and batch_size=4.
| Metric | Value |
|------------|------------------|
| Dice | 0.9639 ± 0.0138 |
| RMSE | 0.0877 ± 0.0208 |
| MS-SSIM | 0.9026 ± 0.0308 |
| PSNR | 21.36 ± 1.91 dB |
| PSNR-HVS-M | 16.98 ± 1.89 dB |
| SRE | 59.57 ± 1.92 dB |
Dice (Binarized Dice Coefficient) thresholds both images to isolate wedge stroke regions, then measures overlap between predicted and ground-truth strokes on a 0-1 scale. This is the most task-relevant metric, as it directly measures whether the model correctly localizes wedge impressions for downstream OCR.
RMSE (Root Mean Squared Error) measures average pixel-level reconstruction error; lower is better.
MS-SSIM (Multi-Scale Structural Similarity Index) measures perceptual image similarity by comparing luminance, contrast, and local structure at multiple spatial scales simultaneously. Coarser scales capture global shape agreement; finer scales capture edge and texture detail. Scores range from 0 to 1, where 1 is a perfect match; higher is better.
PSNR (Peak Signal-to-Noise Ratio) expresses reconstruction fidelity in decibels relative to the maximum pixel value; higher is better.
PSNR-HVS-M (Peak Signal-to-Noise Ratio - Human Visual System and Masking) measures reconstruction fidelity in decibels relative to the maximum pixel value while taking into account Contrast Sensitivity Function (CSF) and between-coefficient contrast masking of DCT basis functions.
SRE (Signal-to-Reconstruction Error) ratio measures reconstruction fidelity in decibels based on signal energy vs. error energy; higher is better.
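The binarized Dice metric described above can be sketched as follows. The threshold value and the dark-stroke convention are assumptions for illustration; the evaluation harness may binarize differently.

```python
import numpy as np

def binarized_dice(pred, truth, threshold=0.5):
    """Threshold both grayscale images to isolate wedge-stroke pixels
    (strokes are dark, so below-threshold pixels count), then compute the
    Dice overlap of the two binary masks: 2|A∩B| / (|A| + |B|)."""
    a = np.asarray(pred) < threshold
    b = np.asarray(truth) < threshold
    inter = np.logical_and(a, b).sum()
    return 2.0 * inter / (a.sum() + b.sum())
```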
### Step Sweep

A sweep of step counts on a subset of 175 validation samples found that 2 steps is ideal for this model, adding one corrective step over the already strong single-step result. The rectified flow field is extremely straight (straightness_ratio=0.9989, path_length_ratio=1.0011, velocity_std=0.1565). For near-perfectly straight ODE trajectories, a single Euler step is theoretically near-exact, and each additional step accumulates small model-prediction errors faster than it reduces discretization error. Where throughput is the primary concern, one step is acceptable. The sweep used seed=42 and batch_size=4.
| Metric | Steps=1 | Steps=2 | Steps=4 | Steps=8 |
|------------|------------------|----------------------|------------------|------------------|
| Dice | 0.9582 ± 0.0153 | 0.9634 ± 0.0139 | 0.9612 ± 0.0142 | 0.9580 ± 0.0148 |
| RMSE | 0.0909 ± 0.0209 | 0.0859 ± 0.0212 | 0.0900 ± 0.0203 | 0.0949 ± 0.0197 |
| MS-SSIM | 0.8987 ± 0.0326 | 0.9081 ± 0.0310 | 0.9039 ± 0.0314 | 0.8959 ± 0.0326 |
| PSNR | 21.03 ± 1.83 dB | 21.56 ± 1.97 dB | 21.11 ± 1.84 dB | 20.63 ± 1.72 dB |
| PSNR-HVS-M | 16.65 ± 1.80 dB | 17.19 ± 1.96 dB | 16.70 ± 1.83 dB | 16.18 ± 1.70 dB |
| SRE | 58.81 ± 1.81 dB | 59.07 ± 1.87 dB | 58.85 ± 1.87 dB | 58.61 ± 1.86 dB |
## Training Data

Training uses the CuneiformPhotosMSII dataset: 13,928 image pairs generated from 1,741 tablets in HeiCuBeDa (the Heidelberg Cuneiform Benchmark Dataset), a professional research collection of 3D-scanned clay tablets. Each tablet was rendered multiple times in Blender at up to 4096 px, producing synthetic photographs alongside their corresponding MSII curvature visualizations.
Each render variant randomizes which faces of the tablet are shown, camera focal length (80-150 mm), tablet rotation (±5° Euler XYZ), lighting position/color/intensity, and background (fabric, grunge, stone, or none). This diversity encourages the model to generalize across realistic shooting conditions rather than overfitting to a specific lighting or composition style.
The dataset was split tablet-wise: 13,224 pairs (~95% of tablets) for training and 704 pairs (~5% of tablets) held out for validation. Because the split is by tablet identity, the model never sees a validation tablet's geometry during training.
## Training Pipeline
Training proceeded in three sequential stages: Pretrain, Train, and Rectify. Each stage builds directly on the weights from the previous one.
### Key Technical Decision: Text-Encoder-Free Training
All three stages skip the Qwen3-4B text encoder entirely. Text embeddings are pre-computed once and cached to disk, reducing VRAM consumption from ~18 GB to ~9 GB without any loss in conditioning fidelity.
### Key Technical Decision: VAE BatchNorm Domain Calibration
The FLUX.2 VAE contains a BatchNorm layer whose running statistics (running_mean and running_var across 128 channels: 32 latent channels × 2×2 patch size) were originally computed on diverse natural images. Applying this encoder to cuneiform tablets and MSII renderings introduces a latent-space distribution shift that manifests as screen-door dithering artifacts in decoded outputs.
To correct this, the BatchNorm statistics were recalibrated on the target domain before training began. 3,000 CDLI cuneiform tablet photographs and 2,000 synthetic MSII visualizations (5,000 images total) were encoded through the frozen VAE encoder; running mean and variance were accumulated across 19,301,093 spatial samples using float64 accumulators for numerical stability. Images from both domains were interleaved to ensure balanced sampling. The calibrated statistics are baked directly into the ae.safetensors weights shipped with this model.
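The accumulation strategy can be sketched as below. This is a simplified illustration (NumPy instead of the actual VAE forward pass, and the interleaving logic omitted); the key point is the float64 sum and sum-of-squares accumulators.

```python
import numpy as np

def calibrate_running_stats(batches, channels):
    """Accumulate per-channel mean/variance over latent spatial samples,
    using float64 sum / sum-of-squares accumulators so that tens of
    millions of samples do not lose precision."""
    count = 0
    s = np.zeros(channels, dtype=np.float64)
    ss = np.zeros(channels, dtype=np.float64)
    for latents in batches:                  # latents: (num_samples, channels)
        x = latents.astype(np.float64)
        count += x.shape[0]
        s += x.sum(axis=0)
        ss += (x * x).sum(axis=0)
    mean = s / count
    var = ss / count - mean ** 2             # population variance, as BatchNorm uses
    return mean, var
```

The resulting `mean`/`var` vectors would then overwrite the layer's `running_mean` and `running_var` buffers before saving the weights.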
### Stage 1: Pretrain (Domain Initialization)
The pretrain stage adapts the base FLUX.2 model to the cuneiform domain before any image-to-image translation is attempted. It runs standard text-to-image flow-matching training on two sources of real cuneiform imagery:
- ~60% CDLI archive photographs: real museum photos of tablets, paired with per-image text embeddings generated from CDLI metadata (period, material, object type, provenience, genre, language). Eight prompt templates were used and varied randomly.
- ~40% synthetic MSII renders: MSII visualization images from the training set, paired with MSII-specific text embeddings emphasizing curvature, surface topology, and wedge impression terminology.
Each image has its own unique cached embedding rather than a shared prompt, preventing the model from memorizing specimen identifiers and encouraging generalization.
| Hyperparameter | Value |
|---|---|
| Steps | 75,000 |
| Learning rate | 2e-4 (cosine decay, 1k warmup) |
| Effective batch size | 2 (batch 1, grad accum 2) |
| LoRA rank | 256 |
| LoRA init | PiSSA (8-iteration fast SVD) |
| Optimizer | 8-bit Adam |
| Precision | bfloat16 autocast |
| Timestep sampling | Logit-normal (mean=0, std=1) |
| Gradient clipping | 1.0 |
Images are resized to fit within 1 megapixel and rounded to 128-pixel multiples. Light augmentations are applied (horizontal flip, ±5° rotation, minor color jitter). Validation generates text-conditioned images across four aspect ratios every 1,000 steps.
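The resize rule can be sketched as a small helper. The rounding direction (down to the nearest 128-pixel multiple) is an assumption; the text above only specifies the 1-megapixel budget and the 128-pixel granularity.

```python
import math

def fit_resolution(w, h, max_pixels=1_048_576, multiple=128):
    """Scale (w, h) down uniformly so w*h <= max_pixels, then round each
    side down to the nearest multiple of 128 (never below one tile unit)."""
    scale = min(1.0, math.sqrt(max_pixels / (w * h)))
    w2 = max(multiple, int(w * scale) // multiple * multiple)
    h2 = max(multiple, int(h * scale) // multiple * multiple)
    return w2, h2
```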
### Stage 2: Train (Image-to-Image Adaptation)
The main training stage fine-tunes the pretrained weights for the target task: translating cuneiform tablet photographs into MSII visualizations. This stage introduces two significant changes over standard FLUX.2 fine-tuning.
#### Tile and global context conditioning
Rather than processing full images, the model trains on dynamic tile crops (128-1024 px, depending on image resolution) while simultaneously receiving a downscaled 128 px thumbnail of the full image with a red rectangle marking the tile's location, providing both local detail and global context.
#### Paired crop with geometric consistency
The same crop coordinates and geometric transforms (flip, rotation, perspective distortion) are applied to both the input photograph and the target MSII image, ensuring the model always receives spatially aligned pairs.
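The pairing idea can be sketched with a shared random draw driving both images (flips only here; rotation and perspective are handled the same way in principle). The function name and 50%/40% probabilities mirror the augmentation table below, but this is an illustration, not the training code.

```python
import numpy as np

def paired_augment(photo, msii, seed):
    """Draw the geometric ops once, then apply them identically to the
    input photo and the target MSII image so the pair stays aligned."""
    rng = np.random.default_rng(seed)
    if rng.random() < 0.5:                              # horizontal flip
        photo, msii = photo[:, ::-1], msii[:, ::-1]
    if rng.random() < 0.4:                              # vertical flip
        photo, msii = photo[::-1, :], msii[::-1, :]
    return photo, msii
```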
#### Augmentation Pipeline
Augmentations are split into two categories applied in sequence:
Geometric (applied identically to input and target):
- Horizontal flip (50%), vertical flip (40%), rotation ±8° (50%), perspective distortion strength 0.02 (30%)
Domain adaptation (applied to input only, to simulate real photographic variation):
- Perlin noise illumination (20%), vignette (40%), directional lighting gradient (50%), dust particles (50%), Gaussian noise (80%), gamma correction (50%), contrast adjustment (50%), brightness shift (50%), hue/saturation shift (40%), Gaussian blur (30%), grayscale conversion (3%)
Spatially-dependent effects (Perlin noise, vignette, gradient) use crop coordinates so the tile and its global thumbnail receive matching effects.
#### Loss
Flow-matching loss with Min-SNR-γ weighting (γ=5.0) to down-weight noisy high-timestep predictions, plus a multi-scale latent gradient loss weighted at 0.25. The gradient loss computes spatial gradient differences between predicted and target latents at four downsampling scales, encouraging sharp edge structure in outputs.
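A sketch of the multi-scale latent gradient term follows. The gradient operator (forward differences), the L1 distance, and the average pooling are assumptions chosen for illustration; power-of-two spatial sizes are assumed for the slicing to work.

```python
import numpy as np

def spatial_grads(x):
    """Forward-difference spatial gradients along height and width."""
    return np.diff(x, axis=0), np.diff(x, axis=1)

def multiscale_gradient_loss(pred, target, scales=4):
    """Compare spatial gradients of predicted vs target latents at
    `scales` levels of a 2x average-pooled pyramid, rewarding sharp,
    correctly-placed edge structure at every resolution."""
    total = 0.0
    for _ in range(scales):
        for gp, gt in zip(spatial_grads(pred), spatial_grads(target)):
            total += np.abs(gp - gt).mean()
        # 2x average-pool downsample for the next, coarser scale
        pred = 0.25 * (pred[::2, ::2] + pred[1::2, ::2] + pred[::2, 1::2] + pred[1::2, 1::2])
        target = 0.25 * (target[::2, ::2] + target[1::2, ::2] + target[::2, 1::2] + target[1::2, 1::2])
    return total / scales
```

In training this term would be added to the flow-matching loss with weight 0.25.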
| Hyperparameter | Value |
|---|---|
| Steps | 150,000 |
| Learning rate | 3e-4 (cosine decay to 6e-6, 1k warmup) |
| Effective batch size | 8 (batch 1, grad accum 8) |
| LoRA rank | 256, alpha √rank, RSLoRA |
| LoRA init | PiSSA (8-iteration fast SVD) |
| EMA decay | 0.999 (used for validation and final save) |
| Optimizer | 8-bit Adam |
| Gradient clipping | 0.8 (with spike detection: skip if >2.5× EMA norm) |
| Precision | bfloat16 autocast |
| Gradient loss weight | 0.25 |
| Min-SNR-γ | 5.0 |
| Timestep sampling | Logit-normal (mean=0, std=1) |
Validation runs every 2,000 steps, generating 8 sample images with 8 denoising steps.
### Stage 3: Rectify (Trajectory Straightening)
The rectify stage implements Rectified Flow to reduce the number of inference steps required at runtime.
Standard flow-matching trains on random (noise, real target) pairs, producing curved ODE trajectories that require 25-50 denoising steps to traverse accurately. Rectified training instead pairs each noise sample with the output the fully-trained model generates from that noise, creating straight-line trajectories that can be traversed in 1-4 steps without quality loss.
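Why straight trajectories enable few-step inference can be shown with a toy integration. This sketch replaces the learned transformer with an exactly constant velocity field (the ideal rectified case), so a single Euler step already lands on the target:

```python
import numpy as np

rng = np.random.default_rng(0)
noise = rng.normal(size=(4, 4))    # stands in for the initial latent noise
target = rng.normal(size=(4, 4))   # stands in for the model's coupled output

def velocity(x_t, t):
    # For a perfectly rectified coupling x_t = (1 - t) * noise + t * target,
    # the velocity along the path is constant: v = target - noise.
    return target - noise

def euler_sample(num_steps):
    """Integrate dx/dt = velocity(x, t) from t=0 to t=1 with Euler steps."""
    x, t = noise.copy(), 0.0
    dt = 1.0 / num_steps
    for _ in range(num_steps):
        x = x + dt * velocity(x, t)
        t += dt
    return x
```

In practice the learned field is only nearly straight (straightness_ratio ≈ 0.9989 per the step sweep above), which is why 1-2 steps suffice rather than exactly one.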
Before training, a one-time preprocessing pass runs the trained model over the training set. Each image is cropped deterministically (seeded RNG, same tile-sizing logic as training), then fully denoised with the trained weights to produce a (noise, generated_output) coupled pair saved to disk. This eliminates VAE encoding from the training loop, reducing VRAM further.
The loss trains the model to predict the velocity between a coupled (noise, generated) pair at a random interpolated timestep. A pseudo-Huber loss replaces the MSE used in earlier stages, providing better gradient stability when predictions are far from target.
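The pseudo-Huber loss can be sketched as follows. It is quadratic near zero (like MSE) but grows only linearly for large residuals, so its gradient stays bounded when predictions are far from the target:

```python
import numpy as np

def pseudo_huber(pred, target, c=0.001):
    """c^2 * (sqrt(1 + (d/c)^2) - 1): ~d^2/2 for |d| << c,
    ~c*|d| for |d| >> c, giving bounded gradients on large residuals."""
    d = pred - target
    return (c * c * (np.sqrt((d / c) ** 2 + 1.0) - 1.0)).mean()
```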
| Hyperparameter | Value |
|---|---|
| Steps | 50,000 |
| Learning rate | 3e-6 (cosine decay, 500 warmup) |
| Effective batch size | 4 (batch 1, grad accum 4) |
| LoRA rank | 256 |
| LoRA init | Loaded from Stage 2 weights (warm-start) |
| Loss | Pseudo-Huber (c=0.001) |
| Optimizer | 8-bit Adam |
| Gradient clipping | 1.0 |
| Precision | bfloat16 autocast |
| Timestep sampling | Logit-normal (mean=0, std=1) |
Validation runs every 2,000 steps using real validation images (not coupled pairs), generating outputs with only 2 denoising steps to directly measure few-step inference quality.
The result is usable MSII visualizations in 1-2 denoising steps, compared to the 25-50 steps standard flow-matching requires.
## Acknowledgements & Citations
### 3D Scan Data (HeiCuBeDa)
3D scans used to generate the training dataset are from the Heidelberg Cuneiform Benchmark Dataset (HeiCuBeDa):
Bogacz, B., Gertz, M., & Mara, H. (2015). Character Proposals for Cuneiform Script Digitization. Proceedings of the 15th International Conference on Frontiers in Handwriting Recognition (ICFHR). doi:10.11588/data/IE8CCN
### Archive Photographs (CDLI)
Real tablet photographs used in Stage 1 pretraining are sourced from the Cuneiform Digital Library Initiative (CDLI).
### MSII Curvature (GigaMesh)
MSII curvature values embedded in the HeiCuBeDa PLY files were computed using the GigaMesh Software Framework.
### Rectified Flow
Stage 3 (Rectify) implements the trajectory-straightening approach from:
Liu, X., et al. (2022). Flow Straight and Fast: Learning to Generate and Transfer Data with Rectified Flow. arXiv:2209.03003
### Base Model (FLUX.2 Klein Base 4B)
Fine-tuned from FLUX.2-klein-base-4B by Black Forest Labs.