---
license: apache-2.0
base_model: Qwen/Qwen3-8B
tags:
- qwen3
- thinking
- creative-writing
- screenwriting
- drama
- chain-of-thought
- reasoning
- ms-swift
- full-parameter-finetuning
datasets:
- custom-drama-thinking-dataset
language:
- en
- zh
library_name: transformers
pipeline_tag: text-generation
model-index:
- name: Qwen3-8B-Drama-Thinking
  results:
  - task:
      type: text-generation
      name: Creative Script Writing
    metrics:
    - type: thinking_depth
      value: 9.0
      name: Thinking Depth Score
    - type: script_format
      value: 9.0
      name: Script Format Score
    - type: dramatic_craft
      value: 8.5
      name: Dramatic Craft Score
---
Qwen3-8B-Drama-Thinking
This model is a full parameter fine-tuned version of Qwen/Qwen3-8B on a custom drama thinking dataset with explicit creative reasoning chains.
Model Description
- Base Model: Qwen3-8B (8 billion parameters)
- Training Method: Full Parameter Fine-tuning (NOT LoRA)
- Training Framework: ms-swift
- Training Data: Custom Drama Thinking Dataset (6,319 samples, avg ~5,000 tokens)
- Specialization: Screenwriting with explicit `<think>...</think>` creative reasoning
- Hardware: 2x NVIDIA H100 80GB SXM5
- Training Time: 2 hours 46 minutes (3 epochs)
- Training Cost: ~$17.86
Key Features
🎬 Professional Screenwriting Assistant
This model generates dramatic scripts with explicit creative deliberation:
- ✅ Thinking Process Visible: Uses `<think>...</think>` tags to show internal reasoning
- ✅ Deep Character Psychology: Analyzes motivations, defense mechanisms, subtext
- ✅ Structural Planning: Three-act structure, emotional arcs, pacing decisions
- ✅ Visual Storytelling: Symbolism, atmosphere, cinematographic choices
- ✅ Professional Format: Correct screenplay formatting (scene headers, action lines, dialogue)
📊 Performance Comparison
Compared to base Qwen3-8B:
| Metric | Base Model | Fine-Tuned | Improvement |
|--------|------------|------------|-------------|
| Output Length | 1,071 tokens | 3,874 tokens | +262% |
| Thinking Depth | 5/10 | 9/10 | +80% |
| Creative Reasoning | 500 tokens | 3,400 tokens | +580% |
| Craft Analysis | Generic | Professional | Qualitative leap |
🎯 Unique Value Proposition
This is not just a text generator - it's a creative thinking partner that externalizes the entire screenwriting process: from title analysis to character psychology to structural planning to final execution.
Training Details
Training Configuration
```
Model:          Qwen/Qwen3-8B
Template:       qwen3_thinking
Training Type:  Full Parameter (all 8B parameters)
Max Length:     8192 tokens (for long thinking chains)
Batch Size:     1 per device × 2 GPUs
Gradient Accum: 8 steps (effective batch size: 16)
Learning Rate:  1e-5
Epochs:         3
Optimization:   DeepSpeed Zero3 + Gradient Checkpointing,
                Liger Kernel, BF16 mixed precision
Loss Scale:     ignore_empty_think
GPU Memory:     ~74.62 GB per H100 (stable)
```
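For reference, a minimal sketch of how this configuration maps onto ms-swift's Python entry point. The argument names below mirror the CLI flags and are assumptions that may differ between ms-swift releases; the dataset path is hypothetical, and multi-GPU DeepSpeed runs are normally launched via the `swift sft` CLI or `torchrun` rather than a single-process script.

```python
# Hedged sketch: reproducing the run above with ms-swift's Python API.
# Argument names are assumed from the CLI flags; check them against your installed version.
from swift.llm import sft_main, TrainArguments

sft_main(TrainArguments(
    model='Qwen/Qwen3-8B',
    template='qwen3_thinking',
    train_type='full',                 # full-parameter fine-tuning, not LoRA
    dataset=['drama_thinking.jsonl'],  # hypothetical path to the private dataset
    max_length=8192,                   # long enough for ~5,000-token samples
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,     # 1 sample x 2 GPUs x 8 steps = effective batch 16
    learning_rate=1e-5,
    num_train_epochs=3,
    torch_dtype='bfloat16',
    deepspeed='zero3',                 # shards optimizer state across the 2x H100
    gradient_checkpointing=True,
))
```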
Dataset Characteristics
- Samples: 6,319 dramatic script continuations
- Average Length: ~5,000 tokens per sample
- Max Length: ~6,100 tokens
- Format: Conversations with `<think>...</think>` reasoning tags
- Content:
  - Script opening scenes (title, description, initial dialogue)
  - Extensive creative deliberation (3,000+ tokens of thinking)
  - Script continuation with proper formatting
- Style: Dramatic, emotionally intense scenarios (conflicts, reconciliation, tragedy)
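For illustration, here is what one sample plausibly looks like in conversational format. The dataset itself is not public, so every field value below is invented; only the overall structure (user prompt, then an assistant reply with a `<think>` block followed by the script) follows the description above.

```python
# Hypothetical training sample; the real dataset is private, so this content is illustrative only.
sample = {
    "messages": [
        {
            "role": "user",
            "content": (
                "Title: The Reunion\n"
                "Description: Two estranged siblings meet at their childhood home after 20 years.\n"
                "INT. FAMILY LIVING ROOM - DAY\n"
                "SARAH (35) stands by the window, looking out at the garden."
            ),
        },
        {
            "role": "assistant",
            "content": (
                "<think>\n"
                "The title tells me this story is about what twenty years of silence does to people...\n"
                "[several thousand tokens of structural and character analysis]\n"
                "</think>\n"
                "MICHAEL (38) enters, hesitant.\n\n"
                "MICHAEL\n"
                "You kept the curtains."
            ),
        },
    ]
}
```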
Training Metrics
- Final Loss: 0.844
- Average Loss: 0.978
- Loss Trajectory: 1.602 (start) → 0.82-0.83 (end)
- Training Speed: ~8 seconds/iteration
- Total Steps: 1,185
- Checkpoints: 5 saved (400, 800, 900, 1000, 1185)
Usage
Quick Start (ms-swift)
```bash
# Install ms-swift
pip install ms-swift

# Inference (interactive mode)
swift infer \
  --ckpt_dir FutureMa/Qwen3-8B-Drama-Thinking \
  --template qwen3_thinking \
  --max_new_tokens 4096 \
  --temperature 0.7
```
Python API
```python
import os
os.environ['CUDA_VISIBLE_DEVICES'] = '0'

from swift.llm import PtEngine, InferRequest, RequestConfig

# Initialize engine
engine = PtEngine(
    model_id_or_path="FutureMa/Qwen3-8B-Drama-Thinking",
    max_batch_size=1,
    torch_dtype="bfloat16"
)

# Create prompt
prompt = """Title: The Reunion
Description: Two estranged siblings meet at their childhood home after 20 years.
INT. FAMILY LIVING ROOM - DAY
SARAH (35) stands by the window, looking out at the garden.
MICHAEL (38) enters, hesitant."""

# Generate (streaming)
messages = [{'role': 'user', 'content': prompt}]
request = InferRequest(messages=messages)
config = RequestConfig(max_tokens=4096, temperature=0.7, stream=True)

for response in engine.infer([request], config)[0]:
    if response:
        print(response.choices[0].delta.content, end='', flush=True)
```
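If you do not need token-by-token streaming, the same engine can return the whole completion (thinking block plus script) in one call; a minimal sketch reusing the `engine` and `request` objects from the example above:

```python
# Non-streaming variant: drop stream=True and read the full message content.
config = RequestConfig(max_tokens=4096, temperature=0.7)
response = engine.infer([request], config)[0]
print(response.choices[0].message.content)
```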
Using with Transformers (requires adapting checkpoint)
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "FutureMa/Qwen3-8B-Drama-Thinking",
    torch_dtype="bfloat16",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-8B")

messages = [
    {"role": "system", "content": "You are a creative screenwriter assistant specialized in dramatic storytelling with internal reasoning."},
    {"role": "user", "content": "Write a scene about forgiveness..."}
]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer([text], return_tensors="pt").to(model.device)

# do_sample=True is required for temperature to take effect
outputs = model.generate(**inputs, max_new_tokens=4096, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
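Because the response opens with a long `<think>` block, you will often want to separate the reasoning from the screenplay. A minimal sketch, assuming the `<think>...</think>` tags shown throughout this card survive decoding:

```python
import re

def split_thinking(generated_text: str):
    """Return (thinking, script); thinking is empty if no <think> block is found."""
    match = re.search(r"<think>(.*?)</think>", generated_text, flags=re.DOTALL)
    if match:
        return match.group(1).strip(), generated_text[match.end():].strip()
    return "", generated_text.strip()

thinking, script = split_thinking(tokenizer.decode(outputs[0], skip_special_tokens=True))
print(script)
```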
Example Output
Input Prompt:
```
Title: The Last Apology
Description: A daughter arrives at her estranged father's deathbed...
```
Model Output (excerpt):
```
<think>
Okay. Let's go. I'm sitting here with the opening of *The Last Apology* fresh in my mind,
and I need to figure out exactly where this goes next—not just the next scene, but the next
emotional beat, the next structural pivot...

First, the title: *The Last Apology*. That's not just poetic—it's structural. It tells me
this story is about delayed recognition, about the finality of words left unsaid...

[3,400 tokens of deep creative analysis including:]
- Title deconstruction and thematic implications
- Character psychology analysis
- Three-act structure planning
- Visual language and symbolism
- Multiple narrative paths considered
- Professional screenwriting techniques
</think>

INT. HOSPITAL ROOM - NIGHT

ANNA (28), in a wrinkled business suit, hesitates at the doorway.
DAVID (65) lies in bed, breathing labored...

[Script continues with proper formatting]
```
Intended Use
✅ Recommended Use Cases
- Screenwriting Education: Learn professional creative thinking process
- Script Ideation: Generate story frameworks and narrative alternatives
- Story Consulting: Explore "what if" scenarios with explicit reasoning
- Creative Brainstorming: Understand decision-making in storytelling
- Draft Development: Plan structure before execution
❌ Not Recommended For
- Final Shooting Scripts: Requires human refinement for production
- Comedy/Action Genres: Training bias toward dramatic content
- Long-form Series: Single-pass generation may lack consistency
- Immediate Production: Dialogue needs naturalization
Evaluation Results
Quantitative Metrics (vs. Base Model)
| Aspect | Fine-Tuned | Base Model | Improvement |
|--------|------------|------------|-------------|
| Thinking Depth | 9/10 | 5/10 | +80% |
| Script Format | 9/10 | 8/10 | +13% |
| Dramatic Craft | 8.5/10 | 8/10 | +6% |
| Character Psychology | 9/10 | 6/10 | +50% |
| Decision Transparency | 9/10 | 5/10 | +80% |
| Overall | 8.1/10 | 6.9/10 | +17% |
Qualitative Improvements
- ✅ Professional Voice: Sounds like an experienced screenwriter
- ✅ Structural Thinking: Explicit three-act planning
- ✅ Meta-Awareness: "This isn't just a script. It's a reckoning."
- ✅ Non-Linear Reasoning: Considers alternatives, backtracks, refines
- ✅ Craft-Oriented: Explains why choices serve the story
Limitations
- Thinking Verbosity: Generates ~3,400 tokens of thinking (87% of output)
  - May be excessive for quick tasks
  - Consider using `max_new_tokens` to limit length
- Incomplete Execution: Token budget consumed by thinking
  - Many planned scenes not fully generated
  - May need a 6,000-8,000 token limit for complete scripts (see the sketch after this list)
- Dialogue Naturalness: More direct/literary than conversational
  - Training data style influences output
  - May need post-processing for natural speech
- Training Data Bias: Skews toward melodramatic scenarios
  - Less suited for subtle/realistic dialogue
  - Best for emotionally intense stories
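When a completion ends mid-scene, the budget was usually consumed inside the thinking block. A small sketch of one way to detect this and retry with a larger budget, building on the `PtEngine` example above (the helper name and threshold are illustrative):

```python
# Illustrative helper: retry with a larger budget when the thinking block swallows the output.
def generate_script(engine, request, max_tokens=4096, retry_max_tokens=8000):
    config = RequestConfig(max_tokens=max_tokens, temperature=0.7)
    text = engine.infer([request], config)[0].choices[0].message.content
    # If there is no closing </think> tag, or nothing follows it, the model never
    # reached the actual script, so regenerate with more room.
    parts = text.split("</think>", 1)
    if len(parts) < 2 or not parts[1].strip():
        config = RequestConfig(max_tokens=retry_max_tokens, temperature=0.7)
        text = engine.infer([request], config)[0].choices[0].message.content
    return text
```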
Training Insights
What Made This Successful
- 8192 Token Context: Essential for capturing full thinking chains (a length sanity check is sketched after this list)
  - Initial assumption of 2048 would have truncated data
  - Average sample length: ~5,000 tokens
- DeepSpeed Zero3: Required (not optional)
  - Single H100: Would need ~109-114 GB (OOM)
  - Zero3 sharding: ~74.62 GB per card ✅
- Full Parameter Training: Worth the cost
  - Deeper capability transfer than LoRA
  - Better thinking process internalization
  - Cost: $17.86 (2.8 hours) vs ~$5 for LoRA
- Quality Training Data: 6,319 long-form reasoning examples
  - Actual creative process in `<think>` tags
  - High-quality dramatic writing
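For anyone assembling a similar dataset, a short sketch of the kind of length check that motivated the 8192-token setting. The file name and field layout are assumptions matching the hypothetical sample format shown earlier, and the count ignores chat-template overhead:

```python
# Hypothetical pre-training check: confirm the longest sample fits the 8192-token context.
import json
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-8B")

lengths = []
with open("drama_thinking.jsonl") as f:  # hypothetical dataset path
    for line in f:
        sample = json.loads(line)
        text = "".join(m["content"] for m in sample["messages"])
        lengths.append(len(tokenizer(text).input_ids))

print(f"samples: {len(lengths)}, avg: {sum(lengths)/len(lengths):.0f}, max: {max(lengths)}")
assert max(lengths) <= 8192, "increase max_length or truncate samples"
```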
Citation
```bibtex
@misc{qwen3-drama-thinking-2025,
  author       = {FutureMa},
  title        = {Qwen3-8B-Drama-Thinking: Full Parameter Fine-tuning for Creative Screenwriting},
  year         = {2025},
  publisher    = {HuggingFace},
  howpublished = {\url{https://huggingface.co/FutureMa/Qwen3-8B-Drama-Thinking}},
  note         = {Full parameter fine-tuning on 6,319 drama samples with explicit reasoning chains}
}
```
Acknowledgments
- Base Model: Qwen Team - Qwen3-8B
- Training Framework: ms-swift - ModelScope SWIFT
- Infrastructure: Lambda Cloud - 2x H100 80GB SXM5
- Dataset: Custom Drama Thinking Dataset (6,319 samples)
Model Card Contact
For questions or feedback:
- HuggingFace: @FutureMa
- GitHub Issues: Report via ms-swift repository
Training Date: 2025-12-08
Training Duration: 2h 46m
Model Size: ~16 GB (BF16 precision)
Recommended VRAM: 16 GB+ for inference

