
How to Train WAN 2.2 LoRA for Person: The Pro Method That Actually Works

Complete guide to training WAN 2.2 LoRAs for consistent person/character video generation. Dataset prep, optimal settings, and pro techniques.


Training LoRAs for WAN 2.2 is nothing like training them for image models. I learned this the hard way after wasting 40+ hours applying SD LoRA techniques to video generation. The dual-model architecture, the motion considerations, the dataset requirements. Everything is different.

Quick Answer: WAN 2.2 person LoRA training requires sigmoid time step scheduling, 10-30 varied images/clips, 3000-5000 training steps at 0.0002 learning rate, and Differential Output Preservation set to "person" for character training. Use AI Toolkit or diffusion-pipe for the actual training.

Key Takeaways:
  • Use Sigmoid time step type for person/character training specifically
  • 10-30 images/clips with varied poses, lighting, and backgrounds
  • 3000-5000 steps at 0.0002 learning rate for faster convergence
  • DOP (Differential Output Preservation) set to "person" preserves base realism
  • Training produces TWO LoRAs: high_noise and low_noise for different generation phases
  • Expect 24-72 hours training time depending on hardware

Why WAN 2.2 LoRA Training Is Different

Let me explain the architecture first, because this is why standard LoRA techniques fail.

WAN 2.2 uses a Mixture of Experts (MoE) architecture with separate handling for high-noise and low-noise generation phases. When you train a LoRA, you're actually training two specialized models:

  • high_noise_lora: Optimized for initial motion planning and temporal structure
  • low_noise_lora: Optimized for refining motion details and smooth transitions

Both get applied during generation at different stages. If you only train one or use the wrong settings, your person LoRA won't transfer the identity properly across video frames.
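
To make the split concrete, here's a minimal sketch of how the two experts divide a denoising run. This is conceptual pseudocode, not any tool's actual API: load_expert() and denoise_step() are hypothetical placeholders, and the boundary value is purely illustrative.

# Conceptual sketch of WAN 2.2's two-phase denoising with a person LoRA pair.
# load_expert() and denoise_step() are hypothetical placeholders, not a real API,
# and the boundary value is illustrative only.

HIGH_NOISE_BOUNDARY = 0.875  # assumed switch point between the two experts

def generate(latents, timesteps, prompt):
    high_expert = load_expert("wan2.2_high_noise", lora="person_high_noise.safetensors")
    low_expert = load_expert("wan2.2_low_noise", lora="person_low_noise.safetensors")

    for t in timesteps:  # timesteps run from 1.0 (pure noise) down to 0.0
        if t >= HIGH_NOISE_BOUNDARY:
            # Early, high-noise steps: motion planning and temporal structure
            latents = denoise_step(high_expert, latents, t, prompt)
        else:
            # Later, low-noise steps: detail refinement and smooth transitions
            latents = denoise_step(low_expert, latents, t, prompt)
    return latents

Each LoRA only ever influences its half of the trajectory, which is why training or loading just one of them gives weak identity transfer.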

This is why SD LoRA knowledge doesn't translate directly. Different architecture, different training approach.

Hardware Reality Check

I'll be honest about the hardware requirements because I've seen people start this process without understanding the commitment.

Minimum viable:

  • 24GB VRAM GPU (RTX 3090/4090 or A6000)
  • 64GB system RAM
  • 500GB+ fast storage
  • Training time: 2-3 days

Comfortable setup:

  • 48GB+ VRAM (A6000, dual GPUs)
  • 128GB system RAM
  • NVMe storage
  • Training time: 12-24 hours

Low VRAM option (16-24GB):

  • Enable VRAM block swapping
  • Significantly longer training times
  • Works, just slower

On an NVIDIA A6000 (48GB VRAM), training took me about 24 hours for a solid person LoRA. Consumer hardware takes 2-3 days but absolutely works.

Heads Up: WAN 2.2 training takes longer than WAN 2.1 because of the dual high/low-noise model approach. Plan accordingly and consider starting training overnight.

Dataset Preparation: The Most Important Step

Here's the thing. Your LoRA will only be as good as your dataset. I've seen people spend 48 hours training on garbage data and wonder why results are bad.

Image Requirements for Person LoRAs

Quantity: 10-30 high-quality images or short clips

Variety is critical:

  • Multiple poses (front, side, three-quarter angles)
  • Different backgrounds (don't let the model learn background = person)
  • Various lighting conditions
  • Different expressions if relevant
  • Multiple outfits unless clothing is part of identity

Quality requirements:

  • Clear, well-lit images
  • Subject fully visible (not cropped awkwardly)
  • No heavy filters or extreme stylization
  • Consistent person across all images
  • Resolution: at least 512x512, ideally higher (a quick check script follows below)
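
Before committing GPU time, it's worth an automated pass over the folder. A minimal sketch with Pillow, assuming your images live in a ./dataset folder (adjust the path and threshold to match your setup):

# Quick dataset sanity check: flag unreadable or low-resolution images.
# The ./dataset path and the 512px floor are assumptions; adjust as needed.
from pathlib import Path
from PIL import Image

MIN_SIDE = 512  # matches the "at least 512x512" guideline above

for path in sorted(Path("dataset").iterdir()):
    if path.suffix.lower() not in {".jpg", ".jpeg", ".png", ".webp"}:
        continue
    try:
        with Image.open(path) as img:
            width, height = img.size
        if min(width, height) < MIN_SIDE:
            print(f"LOW RES    {path.name}: {width}x{height}")
    except OSError:
        print(f"UNREADABLE {path.name}")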

Video Clip Requirements

If training from video clips instead of images:

  • 4-8 seconds per clip
  • 256x256 minimum resolution during training
  • 12-20 high-quality clips recommended
  • Clips should show representative motions you want to reproduce
  • Avoid clips with occlusion or motion blur

Caption Format

Every image/clip needs captions. This is the structure that works:

[trigger token], [description of person], [description of scene/pose]

Example:

zxq-person, a woman with long dark hair, wearing a blue dress, standing in a garden, natural lighting, full body shot

The trigger token (like "zxq-person") should be unique and not a real word. This becomes how you invoke the LoRA during generation.

Captioning tips:

  • Include appearance details that define the person
  • Describe clothing, lighting, and framing
  • Be consistent with terminology across captions
  • Don't include elements you don't want the LoRA to learn
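
If you're writing captions by hand, keeping the phrasing consistent is easier with a tiny helper that stamps out one .txt per image from a shared template. The trigger token, folder path, and descriptions below are placeholders; swap in your own.

# Write one caption .txt per image so the trigger token and base description stay consistent.
# "zxq-person", the ./dataset path, and the scene descriptions are placeholders.
from pathlib import Path

TRIGGER = "zxq-person"
BASE_DESC = "a woman with long dark hair"

# Per-image scene/pose details, keyed by filename stem
scenes = {
    "img_001": "wearing a blue dress, standing in a garden, natural lighting, full body shot",
    "img_002": "wearing a white shirt, sitting at a cafe table, soft window light, upper body shot",
}

dataset = Path("dataset")
for stem, scene in scenes.items():
    caption = f"{TRIGGER}, {BASE_DESC}, {scene}"
    (dataset / f"{stem}.txt").write_text(caption + "\n")
    print(f"{stem}.txt -> {caption}")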

Training Tools and Setup

AI Toolkit Method

AI Toolkit is the most common approach for WAN 2.2 LoRA training. Here's the configuration that works:


Key settings:

model_type: wan2.2
time_step_type: sigmoid  # Critical for person training
learning_rate: 0.0002    # Higher than default for faster training
max_train_steps: 5000    # More steps = better results, longer time
dop_preservation_class: person  # Enables DOP for character training

Sigmoid vs other time step types:

  • Sigmoid is specifically designed for person/character training
  • Produces better identity preservation
  • Other types (uniform, cosine) work better for style/motion LoRAs

Diffusion-pipe Method

Alternative to AI Toolkit, particularly popular with some researchers.

Setup requires:

  1. Enable WSL (Windows) or use native Linux
  2. Install Ubuntu and diffusion-pipe
  3. Configure training parameters

Diffusion-pipe gives you more low-level control but has a steeper learning curve. I recommend AI Toolkit for first-time trainers.

DOP (Differential Output Preservation)

This is the secret sauce for person LoRAs that actually look good.

DOP helps maintain WAN's strong base realism while learning your specific person. Without it, LoRAs often degrade overall quality while learning the new identity.

How to configure:

  • Set preservation class to "person" for character training
  • For style LoRAs, use different preservation classes
  • The model preserves base capabilities while learning new content

I cannot stress enough how much DOP improved my results. Before using it, my person LoRAs had an "uncanny valley" quality. With it, they maintain WAN's natural motion and appearance.


The Training Process

Step 1: Prepare Your Dataset

  1. Collect 10-30 images/clips of your person
  2. Ensure variety in poses, lighting, backgrounds
  3. Create captions for each with your trigger token
  4. Organize in the expected folder structure (the pairing check sketched below catches mismatches)
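
Before kicking off a multi-day run, do a last sanity check that every image or clip has a caption and that each caption starts with the trigger token. A small sketch, assuming image/caption pairs share a filename stem in one folder (confirm the exact layout your trainer expects):

# Verify every image/clip has a caption .txt that starts with the trigger token.
# The folder layout and trigger token are assumptions; confirm the exact structure
# your trainer (AI Toolkit or diffusion-pipe) expects.
from pathlib import Path

TRIGGER = "zxq-person"
dataset = Path("dataset")
media_exts = {".jpg", ".jpeg", ".png", ".webp", ".mp4"}

for item in sorted(p for p in dataset.iterdir() if p.suffix.lower() in media_exts):
    caption_file = item.with_suffix(".txt")
    if not caption_file.exists():
        print(f"MISSING CAPTION: {item.name}")
    elif not caption_file.read_text().strip().startswith(TRIGGER):
        print(f"NO TRIGGER TOKEN: {caption_file.name}")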

Step 2: Configure Training

# Example AI Toolkit config for person LoRA
base_model: wan2.2_14b
time_step_type: sigmoid
learning_rate: 0.0002
max_train_steps: 5000
batch_size: 1
gradient_accumulation_steps: 4
dop_preservation_class: person
save_every_n_steps: 1000

Step 3: Start Training

python train.py --config your_config.yaml

And wait. This is the part nobody warns you about. Training takes a long time. I recommend:

  • Start training in the evening
  • Let it run overnight or over a weekend
  • Monitor early checkpoints for sanity checking

Step 4: Evaluate Checkpoints

LoRA saves happen at intervals (every 1000 steps in my config). Test these intermediate checkpoints:

  • Does the person identity transfer?
  • Is video quality maintained?
  • Are there artifacts or distortions?

Often the best checkpoint isn't the final one. Training can overfit. Test and compare.
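
To keep the comparison fair, render the same prompt with the same seed for every saved checkpoint so the only variable is training progress. A sketch of that sweep, where generate_test_clip() stands in for whatever inference workflow you actually use and the output folder layout is illustrative:

# Render one fixed test prompt per checkpoint so differences reflect training progress,
# not sampling randomness. generate_test_clip() is a hypothetical placeholder for your
# inference setup; the step_* directory layout is an assumption.
from pathlib import Path

TEST_PROMPT = "zxq-person walking through a park, natural lighting"
SEED = 42

for step_dir in sorted(Path("output").glob("step_*")):
    preview = Path("previews") / f"{step_dir.name}.mp4"
    # Each save step contains a high_noise/low_noise LoRA pair; load both together.
    generate_test_clip(checkpoint_dir=step_dir, prompt=TEST_PROMPT, seed=SEED, out=preview)
    print("rendered", preview)

Each save produces a high_noise and low_noise pair, so compare checkpoints as pairs rather than mixing files from different steps.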

Step 5: Use Your LoRA

Load both high_noise and low_noise LoRAs in your workflow. They apply to different phases of generation automatically.

My Actual Results and Lessons

I've trained about 15 person LoRAs for WAN 2.2 now. Here's what I've learned:

What works:

  • More images beat fewer images (until around 30, then diminishing returns)
  • Background variety is crucial to avoid background bleed
  • Higher learning rate (0.0002) produces faster convergence without quality loss
  • DOP is non-negotiable for character work
  • Testing checkpoints saves time vs. always using final

What doesn't work:

  • Using single-outfit datasets (LoRA learns outfit = identity)
  • Training without DOP (quality degrades)
  • Too few steps (underfitting, weak identity)
  • Too many steps (overfitting, artifacts)
  • Ignoring the high/low noise dual model nature

Time investment reality: My first LoRA took a week of trial and error. Now I can prep data and configure training in an afternoon, then let it run. The process is front-loaded with learning curve.

Cloud Training Options

Not everyone has 24GB+ VRAM sitting around. Cloud options exist:


WaveSpeedAI: They offer WAN 2.2 14B LoRA trainers with claims of 10x faster training. Worth considering if you lack local hardware.

RunComfy: Cloud ComfyUI with LoRA training capabilities. More accessible for those already familiar with the platform.

MimicPC: AI Toolkit hosting with WAN 2.2 base model support.

The cost/benefit depends on how many LoRAs you plan to train. For one or two, cloud is probably cheaper. For ongoing work, local hardware pays off.

Using Person LoRAs Effectively

Once you have a trained LoRA, application matters:

Strength settings:

  • Start at 0.7 and adjust
  • Too high: artifacts, frozen motion
  • Too low: weak identity preservation

Combining with other techniques:

  • LoRA + IPAdapter for additional face reference
  • LoRA + ControlNet for pose control
  • LoRA + base prompt for scene variation

Prompt usage: Always include your trigger token when using the LoRA:

zxq-person walking through a forest, cinematic lighting...

Without the trigger, the LoRA may not activate properly.
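
Pulling that together, here's a hedged sketch of the settings I'd hand to a generation run: both LoRAs loaded at matching strength, trigger token in the prompt. The dict keys and run_generation() are placeholders for whatever workflow or script you actually use, not a specific tool's API.

# Illustrative generation settings: both LoRAs at matching strength, trigger token in
# the prompt. Keys and run_generation() are placeholders, not a specific tool's API.
settings = {
    "high_noise_lora": ("person_high_noise.safetensors", 0.7),  # (file, strength)
    "low_noise_lora": ("person_low_noise.safetensors", 0.7),
    "prompt": "zxq-person walking through a forest, cinematic lighting",
}

# run_generation(settings)  # hypothetical entry point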

What Apatero Can Do for Training

Full disclosure: I'm involved with Apatero. Currently, Apatero focuses on inference (using models) rather than training (creating models). But LoRA training is something we're looking at.

For now, if you train locally and want to use your LoRA with Apatero's workflows, that's a conversation to have. The platform is primarily designed around pre-loaded models, but custom LoRA support is technically feasible.

Frequently Asked Questions

How long does WAN 2.2 LoRA training take?

With consumer hardware (RTX 4090), expect 2-3 days for 5000 steps. With enterprise hardware (A6000+), 12-24 hours. Low VRAM setups with block swapping take longer.

Can I use SD LoRA training techniques?

Not directly. WAN 2.2's dual-model architecture requires different approaches. Sigmoid time stepping and DOP are specific to video LoRA training.

How many images do I really need?

10-30 works well for person LoRAs. Under 10 tends to underfit. Over 30 has diminishing returns unless you need extreme variety.

Why does my LoRA produce artifacts?

Usually overfitting (too many steps) or dataset issues (poor quality images, insufficient variety). Try earlier checkpoints or improve your dataset.

What's the difference between high_noise and low_noise LoRAs?

High_noise handles initial structure and motion planning. Low_noise handles detail refinement and smooth transitions. Both are needed for complete results.

Can I train on video clips instead of images?

Yes, and it often works better for motion-focused LoRAs. Use 4-8 second clips at 256x256+ resolution. Around 12-20 clips is a good amount.

Final Thoughts

WAN 2.2 LoRA training is a commitment. The hardware requirements are significant, the training times are long, and the learning curve is real. But the results are worth it.

A well-trained person LoRA means consistent character identity across generated videos. No more hoping the model remembers what your character looks like. No more face drift between clips. Reliable consistency.

If you're doing production work with recurring characters, invest the time to learn this. The upfront cost pays off every time you generate without fighting identity consistency.

Start with one LoRA. Learn the process. Optimize your dataset and settings. Then scale up. The knowledge transfers to future training once you've done it right the first time.


Related guides: WAN SCAIL Character Animation, WAN 2.6 Complete Guide, Character Consistency Techniques
