How to Train a LoRA Locally for Pony Diffusion with AMD GPU 2025

Complete guide to training Pony Diffusion XL LoRAs on AMD GPUs using ROCm 6.2+ in 2025. Score tags, optimal parameters, and character training for Radeon cards.

You have an AMD GPU and want to train custom character or style LoRAs for Pony Diffusion XL, but guides focus on NVIDIA hardware and Pony's unique score tag system adds confusion. Training Pony LoRAs on AMD GPUs works identically to SDXL training using ROCm 6.2+, with specific optimizations for Pony's Danbooru-trained base and quality-scoring system that dramatically impacts results.

Quick Answer: Training Pony Diffusion XL LoRAs on AMD GPUs follows SDXL workflows with ROCm 6.2+, Python 3.10, and Kohya's sd-scripts, but with Pony-specific score tags (score_9, score_8_up, etc.) that control quality. Recommended parameters include network dimension 32, alpha 16, batch size 2-4 if VRAM allows, 350-400 total training steps (repeats x images), and 10-12 epochs. The unique score system lets you train quality levels directly into LoRAs. RX 7900 XTX (24GB) and RX 6800 XT (16GB) both work with appropriate optimization.

Key Takeaways:
  • Pony Diffusion is SDXL-based, so the same 16GB+ VRAM requirements apply
  • Score tags (score_9, score_8_up, etc.) control generation quality and must appear in training captions
  • Smaller network dimensions (dim 32, alpha 16) are recommended than for general SDXL
  • Target 350-400 total training steps via repeats × image count
  • Character LoRAs work well with 50-100 varied images

What Makes Pony Diffusion Different from Standard SDXL?

Pony Diffusion XL (PDXL) represents a specialized SDXL fine-tune trained on Danbooru and e621 datasets, optimizing for furry, anime, and character generation. Understanding Pony's unique characteristics helps you train effective LoRAs.

The base architecture remains SDXL with identical technical specifications. Pony uses SDXL's dual text encoder structure, 1024x1024 native resolution, and similar parameter count. This means SDXL training infrastructure and hardware requirements apply directly to Pony.

Danbooru and e621 training data creates Pony's specialty in character generation, particularly anthropomorphic and furry characters. The training included quality-scored images from these communities, teaching the model to understand and generate content at specific quality levels.

The score tag system represents Pony's most distinctive feature. Tags like score_9, score_8_up, and score_7_up, down to score_4_up, control generation quality. The tags reflect quality scores attached to the training images, so Pony learned associations between these scores and visual quality characteristics.

Pony Score Tag Hierarchy:
  • score_9: Highest quality, masterpiece level
  • score_8_up: High quality, well-executed
  • score_7_up: Good quality, solid execution
  • score_6_up: Decent quality, acceptable
  • score_5_up and below: Lower quality levels (rarely used in training)

Using score tags in training captions adjusts the quality level your LoRA targets. Training with score_9 and score_8_up in captions teaches the LoRA to produce higher quality outputs. Training with lower scores risks learning lower quality characteristics.

Version evolution through Pony Diffusion v6 XL established current standards. This version provides the best balance of quality, flexibility, and training stability. Earlier versions exist but v6 XL represents the recommended base for LoRA training in 2025.

Character and furry specialization makes Pony excel where general SDXL struggles. Anthropomorphic characters, detailed fur textures, expressive animal features, and character consistency across variations all benefit from Pony's specialized training.

For users wanting character generation without training custom LoRAs, platforms like Apatero.com provide access to various pre-trained models through optimized interfaces.

How Do You Set Up AMD GPUs for Pony Training?

Pony training setup uses identical configuration to SDXL and Illustrious since all share SDXL architecture. If you've configured for either, your environment works for Pony immediately.

Hardware requirements match SDXL exactly. Minimum 16GB VRAM (RX 6800 XT, RX 6900 XT), comfortable at 20GB (RX 7900 XT), ideal at 24GB (RX 7900 XTX). No additional requirements exist because the model architectures are identical.

Install ROCm 6.2 or newer together with the PyTorch build for ROCm 6.3. Follow AMD's official installation guide, verify the installation with rocm-smi, and set HSA_OVERRIDE_GFX_VERSION for your card (11.0.0 for RDNA 3, 10.3.0 for RDNA 2).
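
As a rough sketch of the override step, the mapping from GPU generation to HSA_OVERRIDE_GFX_VERSION can be written down in Python. The helper names `gfx_override_for` and `training_env`, and the `"rdna2"`/`"rdna3"` labels, are assumptions made for illustration; only the two version strings come from the paragraph above.

```python
import os

# Override values quoted in this guide. The dictionary keys are illustrative
# labels chosen for this sketch, not identifiers that ROCm itself understands.
HSA_OVERRIDES = {"rdna3": "11.0.0", "rdna2": "10.3.0"}


def gfx_override_for(architecture: str) -> str:
    """Return the HSA_OVERRIDE_GFX_VERSION value for an architecture label."""
    key = architecture.strip().lower()
    if key not in HSA_OVERRIDES:
        raise ValueError(f"no override listed for {architecture!r}")
    return HSA_OVERRIDES[key]


def training_env(architecture: str) -> dict:
    """Copy the current environment and add the override for a child process."""
    env = dict(os.environ)
    env["HSA_OVERRIDE_GFX_VERSION"] = gfx_override_for(architecture)
    return env
```

Passing `env=training_env("rdna3")` to `subprocess.run` would apply the override to a single training launch without changing the shell's own environment.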

Python 3.10 environment, Kohya sd-scripts installation, and dependency configuration follow standard SDXL procedures. Create a venv, install PyTorch for ROCm 6.3, install requirements, configure Accelerate, and add additional dependencies.

The tokenizer fix for SDXL applies to Pony identically. Edit ./sd-scripts/library/sdxl_train_util.py and change both TOKENIZER1_PATH and TOKENIZER2_PATH to "openai/clip-vit-large-patch14". Without this fix, training fails.
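
The edit described above can also be scripted. The following is a minimal sketch that assumes the two constants appear in sdxl_train_util.py as plain top-level assignments; `patch_tokenizer_paths` is a hypothetical helper name, and it operates on the file's text rather than touching the file directly.

```python
import re

CLIP_L = "openai/clip-vit-large-patch14"

# Matches a whole line that assigns TOKENIZER1_PATH or TOKENIZER2_PATH.
_ASSIGNMENT = re.compile(r"^(TOKENIZER[12]_PATH[ \t]*=[ \t]*).*$", flags=re.MULTILINE)


def patch_tokenizer_paths(source: str) -> str:
    """Point both tokenizer constants at the tokenizer named in this guide."""
    return _ASSIGNMENT.sub(lambda match: f'{match.group(1)}"{CLIP_L}"', source)
```

To apply it, read the file's text, pass it through `patch_tokenizer_paths`, and write the result back, keeping a backup copy of the original file first.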

Download Pony Diffusion v6 XL from Civitai or HuggingFace. The base model weighs approximately 6-7GB. Place it in your models directory alongside other SDXL checkpoints.

Pony AMD Training Requirements:
  • Identical to SDXL: 16GB VRAM minimum, 24GB recommended
  • Same ROCm 6.2+ and PyTorch setup as any SDXL training
  • Must apply tokenizer fix in sdxl_train_util.py
  • Download Pony Diffusion v6 XL base model (6-7GB)
  • Training takes roughly 2-6 hours for character LoRAs depending on GPU and dataset size

What Training Parameters Work Best for Pony on AMD?

Pony training parameters differ slightly from general SDXL based on community findings optimized for the model's characteristics. These settings produce quality character and style LoRAs.

Network dimension recommendations skew lower for Pony than general SDXL. Dimension 32 with alpha 16 works well for most character LoRAs. This lower dimension reflects Pony's specialized training and prevents overfitting that larger dimensions risk.

Batch size can increase to 2-4 if VRAM allows, unlike strict batch size 1 for general SDXL on AMD. Pony's training characteristics make larger batches more stable. On 24GB RX 7900 XTX, batch size 4 works reliably with dimension 32. On 16GB cards, stick with batch size 1-2.

Total training steps should land at 350-400, calculated with the formula (repeats × image count ÷ batch size) × epochs = total steps. For example, 50 images with 8 repeats at batch size 2 for 12 epochs yields (50 × 8 ÷ 2) × 12 = 2400 steps, which is well above the target. Lower the repeats and epochs until the result falls in the 350-400 step range.
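
The formula is easy to check with a few lines of Python. This is a small sketch, not part of sd-scripts; the function name `total_steps` is illustrative, and rounding a partial final batch up to a full step is an assumption of this sketch.

```python
import math


def total_steps(images: int, repeats: int, batch_size: int, epochs: int) -> int:
    """Total steps: (images x repeats / batch size) x epochs."""
    # Assumption: a final partial batch still counts as one step, hence ceil.
    steps_per_epoch = math.ceil(images * repeats / batch_size)
    return steps_per_epoch * epochs
```

With the numbers from the example, `total_steps(50, 8, 2, 12)` gives 2400, while a single repeat, `total_steps(50, 1, 2, 12)`, gives 300.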

Recommended Pony AMD Parameters:
  • Network dimension: 32 (most cases)
  • Network alpha: 16 (half of dimension)
  • Learning rate: 1e-4 with cosine scheduler
  • Batch size: 2-4 (24GB GPU), 1-2 (16GB GPU)
  • Epochs: 10-12
  • Total steps: 350-400 target
  • Resolution: 1024x1024 standard
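
The list above can be captured as a small lookup so the values travel with your training scripts. This is a sketch under stated assumptions: `recommended_params` is a hypothetical helper, and treating cards between 16GB and 24GB like 16GB cards is a choice made here, not a rule from the list.

```python
def recommended_params(vram_gb: int) -> dict:
    """Return the parameter values listed above for a given VRAM size."""
    if vram_gb < 16:
        raise ValueError("this guide lists 16GB as the minimum VRAM")
    # Assumption: cards between 16GB and 24GB follow the 16GB recommendation.
    batch_range = (2, 4) if vram_gb >= 24 else (1, 2)
    return {
        "network_dim": 32,
        "network_alpha": 16,  # half of the dimension
        "learning_rate": 1e-4,
        "lr_scheduler": "cosine",
        "batch_size_range": batch_range,
        "epochs_range": (10, 12),
        "target_total_steps": (350, 400),
        "resolution": (1024, 1024),
    }
```

Returning ranges rather than single values leaves room to start at the low end of the batch size range and raise it only once VRAM monitoring shows headroom.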

Learning rate of 1e-4 works well with cosine scheduler. Pony responds predictably to standard SDXL learning rates without requiring special adjustments. Some users report success with slightly lower rates (8e-5) for particularly complex subjects.

Dataset size for Pony character LoRAs typically ranges from 50 to 100 images, larger than Illustrious datasets usually are. The furry and character focus benefits from showing subjects in diverse poses, expressions, and contexts across these images.

Resolution stays at 1024x1024 as Pony trains at SDXL's native resolution. Avoid lowering resolution even on 16GB cards, instead using aggressive caching to manage memory.

Caching configuration remains critical. Enable all caching with disk storage: --cache_latents --cache_latents_to_disk --cache_text_encoder_outputs --cache_text_encoder_outputs_to_disk. These options are essential for 16GB cards.

Optimizer choice favors AdamW8bit for memory efficiency without quality loss. This works reliably across all Pony training scenarios on AMD GPUs.

How Do You Use Score Tags in Pony Training Captions?

Score tags represent Pony's unique captioning requirement that dramatically affects LoRA quality. Proper score tag usage ensures your LoRA generates at desired quality levels.

Include score tags at the beginning of every training caption. Start with score_9, score_8_up, score_7_up for high-quality LoRAs. This tells the model to associate your subject with top-tier quality characteristics.

The tags are cumulative: score_8_up covers images scored 8 and above, score_7_up covers 7 and above, and so on. Including multiple tags like score_9, score_8_up, score_7_up reinforces quality across the range rather than targeting a single narrow level.
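
One way to picture the cumulative reading is to list which tags would apply to an image with a given score. This sketch only illustrates the description above, taking score_9 as the top tag and score_4_up as the lowest; `tags_for_score` is a hypothetical helper, not something Pony or sd-scripts provides.

```python
LOWEST_SCORE = 4
HIGHEST_SCORE = 9


def tags_for_score(score: int) -> list:
    """List the score tags that cover an image with the given score."""
    if not LOWEST_SCORE <= score <= HIGHEST_SCORE:
        raise ValueError(f"score must be between {LOWEST_SCORE} and {HIGHEST_SCORE}")
    # Only the top score carries the plain score_9 tag.
    tags = ["score_9"] if score == HIGHEST_SCORE else []
    # Every "_up" threshold at or below the score also applies.
    for threshold in range(min(score, HIGHEST_SCORE - 1), LOWEST_SCORE - 1, -1):
        tags.append(f"score_{threshold}_up")
    return tags
```

Under this reading, an image scored 7 falls under score_7_up and every lower threshold, but not under score_8_up or score_9.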

Example Pony Caption Formats:
  • Character LoRA: `score_9, score_8_up, score_7_up, 1girl, character_name, blue_eyes, long_blonde_hair, detailed fur, standing, forest background`
  • Furry character: `score_9, score_8_up, anthro, wolf, character_name, grey_fur, green_eyes, casual_clothing, smile, outdoor_setting`
  • Style LoRA: `score_9, score_8_up, score_7_up, 1girl, detailed_shading, painterly_style, warm_colors, atmospheric_lighting`

Consistency across captions matters more than perfection. Use the same score tag combination for all images in your dataset. Mixing score levels confuses the LoRA about what quality to target.
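
A short script can enforce that consistency across a dataset. The sketch below assumes captions are comma-separated tag lists stored in .txt files; the function names and the default three-tag prefix are illustrative choices, and you should back up your captions before running anything that rewrites them.

```python
from pathlib import Path

SCORE_PREFIX = ("score_9", "score_8_up", "score_7_up")


def with_score_tags(caption: str, prefix=SCORE_PREFIX) -> str:
    """Drop any existing score tags and put the chosen prefix first."""
    tags = [tag.strip() for tag in caption.split(",") if tag.strip()]
    body = [tag for tag in tags if not tag.startswith("score_")]
    return ", ".join([*prefix, *body])


def normalize_caption_files(folder: str, prefix=SCORE_PREFIX) -> int:
    """Rewrite every .txt caption in a folder; return how many files changed."""
    changed = 0
    for path in sorted(Path(folder).glob("*.txt")):
        original = path.read_text(encoding="utf-8").strip()
        updated = with_score_tags(original, prefix)
        if updated != original:
            path.write_text(updated + "\n", encoding="utf-8")
            changed += 1
    return changed
```

Running `normalize_caption_files` a second time on the same folder should report zero changes, which doubles as a quick check that every caption now starts with the same score tags.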

Additional Danbooru tags follow score tags using standard conventions. Character count tags (1girl, 1boy, anthro), feature descriptions (hair, eyes, fur, clothing), pose and action tags, and setting descriptions all work as expected from Danbooru systems.

Quality descriptors like masterpiece, best quality, highly detailed can supplement score tags but aren't replacements. Score tags provide Pony's specific quality control mechanism, while descriptive quality terms add additional guidance.

Negative concepts still don't belong in training captions. Never describe what's not in images. Training captions should positively describe what exists.

Caption length can be substantial because the training command sets --max_token_length=225, which raises the caption limit to 225 tokens. Use 30-60 tags including score tags, character features, and scene descriptions for detailed training.

What Is a Complete Pony Training Command Example?

A typical Pony character LoRA training command for AMD GPUs combines SDXL structure with Pony-optimized parameters.

Example command: `accelerate launch --mixed_precision="fp16" sdxl_train_network.py --pretrained_model_name_or_path="/path/to/ponyDiffusionV6XL.safetensors" --train_data_dir="./train" --output_dir="./output" --output_name="pony_character_LoRA" --network_module="networks.lora" --network_dim=32 --network_alpha=16 --network_train_unet_only --learning_rate=1e-4 --lr_scheduler="cosine" --max_train_epochs=12 --save_every_n_epochs=2 --train_batch_size=2 --max_token_length=225 --cache_latents --cache_latents_to_disk --cache_text_encoder_outputs --cache_text_encoder_outputs_to_disk --no_half_vae --mixed_precision="fp16" --optimizer_type="AdamW8bit" --gradient_checkpointing --persistent_data_loader_workers --resolution="1024,1024"`

The first --mixed_precision is read by accelerate and the second by the training script. xformers stays disabled simply by leaving out the --xformers switch, which takes no value. The --network_train_unet_only switch is required because sd-scripts does not allow caching text encoder outputs while the text encoder itself is being trained.
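
If you prefer to launch training from Python, the same invocation can be assembled as an argument list. This is a sketch rather than an official interface: `build_command` and its parameters are hypothetical names, the flag values mirror the example command, the xformers switch is left out entirely, and --network_train_unet_only is included because sd-scripts expects it when text encoder outputs are cached.

```python
import shlex


def build_command(model_path: str, train_dir: str, output_dir: str, output_name: str) -> list:
    """Assemble the accelerate launch arguments for a Pony LoRA run."""
    options = {
        "pretrained_model_name_or_path": model_path,
        "train_data_dir": train_dir,
        "output_dir": output_dir,
        "output_name": output_name,
        "network_module": "networks.lora",
        "network_dim": 32,
        "network_alpha": 16,
        "learning_rate": "1e-4",
        "lr_scheduler": "cosine",
        "max_train_epochs": 12,
        "save_every_n_epochs": 2,
        "train_batch_size": 2,
        "max_token_length": 225,
        "mixed_precision": "fp16",
        "optimizer_type": "AdamW8bit",
        "resolution": "1024,1024",
    }
    switches = [
        "network_train_unet_only",  # needed alongside text encoder output caching
        "cache_latents",
        "cache_latents_to_disk",
        "cache_text_encoder_outputs",
        "cache_text_encoder_outputs_to_disk",
        "no_half_vae",
        "gradient_checkpointing",
        "persistent_data_loader_workers",
    ]
    argv = ["accelerate", "launch", "--mixed_precision=fp16", "sdxl_train_network.py"]
    argv += [f"--{name}={value}" for name, value in options.items()]
    argv += [f"--{name}" for name in switches]
    return argv


def as_shell_line(argv: list) -> str:
    """Render the argument list as a copy-pasteable shell command."""
    return shlex.join(argv)
```

Passing the list to `subprocess.run` avoids shell quoting problems, while `as_shell_line` is handy for logging exactly what was launched.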

Key Pony-specific settings include lower network dimension (32 instead of 48-64), potentially higher batch size (2 instead of 1), and targeting 12 epochs to hit the 350-400 total training steps sweet spot with appropriate repeats.

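Calculate your repeats based on image count and desired total steps. For 50 images targeting 400 total steps with batch size 2 over 12 epochs: 400 = (50 × repeats ÷ 2) × 12, so repeats = 400 × 2 ÷ (50 × 12) ≈ 1.33. Repeats must be a whole number, and here 1 repeat gives 300 steps while 2 repeats give 600, so adjust the epoch count or batch size as well if you want to land inside the 350-400 range.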
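
Because repeats and epochs both have to be whole numbers, it can help to search the combinations instead of solving by hand. The sketch below is illustrative only: the function names and the search limits (up to 10 repeats and 20 epochs) are assumptions, and a partial final batch is rounded up to a full step.

```python
import math


def steps_for(images: int, repeats: int, batch_size: int, epochs: int) -> int:
    """Total steps for one combination of settings."""
    return math.ceil(images * repeats / batch_size) * epochs


def exact_repeats(images: int, batch_size: int, epochs: int, target_steps: int) -> float:
    """Solve target = (images x repeats / batch size) x epochs for repeats."""
    return target_steps * batch_size / (images * epochs)


def find_schedules(images, batch_size, low=350, high=400, max_repeats=10, max_epochs=20):
    """Return (repeats, epochs, steps) tuples whose step count is in range."""
    matches = []
    for repeats in range(1, max_repeats + 1):
        for epochs in range(1, max_epochs + 1):
            steps = steps_for(images, repeats, batch_size, epochs)
            if low <= steps <= high:
                matches.append((repeats, epochs, steps))
    return matches
```

For 50 images at batch size 2, `find_schedules(50, 2)` includes 2 repeats over 8 epochs (400 steps), but finds no combination that uses 10-12 epochs, so one of the recommendations has to give.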

Sample generation uses --sample_every_n_epochs=2 --sample_prompts="./pony_samples.txt". Create pony_samples.txt with prompts including score tags and your trigger word to test LoRA effectiveness.
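
The prompt file is plain text with one prompt per line, and sd-scripts accepts per-line options such as --n (negative prompt), --w, --h, --s (steps), --l (CFG scale), and --d (seed). The following sketch writes such a file; the helper names, the trigger word, and the default step, CFG, and seed values are placeholders chosen for illustration.

```python
from pathlib import Path

SCORE_TAGS = ("score_9", "score_8_up", "score_7_up")


def sample_prompt_line(trigger, tags, negative, width=1024, height=1024,
                       steps=28, cfg=7, seed=42) -> str:
    """Build one prompt line with score tags, trigger word, and options."""
    prompt = ", ".join([*SCORE_TAGS, trigger, *tags])
    return (f"{prompt} --n {negative} --w {width} --h {height} "
            f"--s {steps} --l {cfg} --d {seed}")


def write_sample_prompts(path: str, lines: list) -> None:
    """Write one prompt per line to the sample prompt file."""
    Path(path).write_text("\n".join(lines) + "\n", encoding="utf-8")
```

Keeping the seed fixed across epochs makes the sample images comparable, so changes between checkpoints reflect the LoRA rather than random variation.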

Training time on RX 7900 XTX with 50 images at 12 epochs takes approximately 2-4 hours. RX 6800 XT with 16GB takes 4-6 hours due to more conservative settings required by VRAM constraints.

Frequently Asked Questions

Why do score tags matter so much for Pony Diffusion?

Pony was trained on quality-scored images from Danbooru and e621, learning associations between score values and visual quality characteristics. The model understands score_9 means masterpiece-level quality and generates accordingly. Training LoRAs without score tags results in unpredictable quality because you haven't specified what quality level to target. Including score tags in training captions directly teaches your LoRA to generate at those quality levels.

Can I use Pony training setup for other SDXL models?

Yes. The setup (ROCm 6.2+, Python 3.10, Kohya sd-scripts) works identically across all SDXL-based models including base SDXL, Illustrious, and any SDXL fine-tune. Only the base model path and parameter choices differ. If your AMD environment works for Pony, it works for any SDXL variant. This makes your setup investment worthwhile across the entire SDXL ecosystem.

What network dimension should I use for style LoRAs versus character LoRAs?

Character LoRAs work well with dimension 32 and alpha 16 for Pony. Style LoRAs capturing artistic treatments or rendering techniques benefit from slightly higher dimensions like 48-64 to capture nuanced style elements. Start with 32 for characters, try 48 for styles, and only increase further if results seem under-capacity. Higher dimensions risk overfitting and larger file sizes without proportional quality gains.

How many images do I actually need for a Pony character LoRA?

50-100 images typically produce quality character LoRAs on Pony. This is more than the 10-20 images typical for Illustrious and the 20-40 typical for photorealistic SDXL. Ensure images show diverse poses, expressions, and contexts rather than repetitive similar shots. Quality and diversity matter more than hitting specific image counts.

Do I need to include all score levels in captions?

Not necessarily, but including score_9, score_8_up, score_7_up covers the high-quality range comprehensively. This combination reinforces quality across multiple levels rather than targeting narrowly. Some users successfully use only score_9, score_8_up, but the three-tag combination is most common in community guides. Avoid going below score_7_up unless deliberately training for specific lower-quality characteristics.

Can batch size really be higher than 1 on AMD for Pony?

Yes, Pony training characteristics make larger batches more stable than general SDXL. On 24GB cards like RX 7900 XTX, batch size 2-4 works reliably with dimension 32 and standard caching. Even on 16GB RX 6800 XT, batch size 2 is possible with aggressive optimization. Start with batch size 1 and increase if VRAM monitoring shows headroom. Larger batches can slightly improve training stability.

What if I trained without score tags and results are poor?

Retrain with score tags added to all captions. The score system is fundamental to how Pony interprets quality, and omitting it creates unpredictable results. You can't fix a LoRA trained without score tags through inference prompts. The quality levels need to be trained into the LoRA itself. This is a common mistake that requires retraining to correct properly.

Should I use different learning rates for UNET and Text Encoder like Illustrious?

For Pony, separate learning rates aren't typically necessary. The standard single learning rate of 1e-4 works reliably. Unlike Illustrious where separate rates (0.0003 UNET, 0.00003 TE) are recommended, Pony doesn't require this split. Stick with unified 1e-4 unless you have specific advanced reasons to experiment with separate rates.

Can I mix Pony LoRAs with SDXL or Illustrious LoRAs during generation?

Technically yes, as all are SDXL architecture LoRAs, but practical compatibility varies. LoRAs trained on Pony work best with Pony base models. Using Pony LoRAs on base SDXL or Illustrious may produce suboptimal results due to different training distributions. Similarly, SDXL or Illustrious LoRAs on Pony bases may not perform as expected. Train LoRAs specifically for the base model you'll use during generation.

Does Pony work better than Illustrious for anime characters?

Different strengths suit different needs. Illustrious excels at traditional anime/manga aesthetics and human anime characters. Pony excels at furry, anthropomorphic characters, and character-focused generation with broader style flexibility. For pure human anime characters, Illustrious often produces better results. For anthro characters or when you need the score quality system, Pony is superior. Your specific content determines which works better.

Succeeding with Pony Diffusion Training on AMD Hardware

Pony Diffusion XL training on AMD GPUs leverages the same robust SDXL infrastructure shared across the entire SDXL ecosystem. The identical hardware requirements (16GB+ VRAM), ROCm setup, and Kohya workflows make Pony accessible to anyone with SDXL training configured.

The unique score tag system represents Pony's key differentiator, providing direct quality control through training captions. Understanding and properly implementing score tags dramatically improves LoRA quality compared to treating Pony like generic SDXL.

Lower network dimensions (32 vs 48-64) and potentially larger batch sizes (2-4 vs 1) distinguish Pony parameters from standard SDXL training. These adjustments reflect community-discovered optimizations for Pony's specific characteristics.

As specialized SDXL fine-tunes continue proliferating, understanding the common foundation (SDXL architecture, AMD ROCm setup, Kohya training) while recognizing model-specific optimizations (score tags for Pony, Danbooru for Illustrious) positions you to train effectively across the ecosystem. AMD GPU users benefit equally from these specializations through the mature ROCm foundation enabling training across all SDXL variants.
