Stable Diffusion in Practice: From Installation to Image Generation

article 2026/5/22 0:28:27

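## Preface

Stable Diffusion is one of the most popular open-source image generation models available today. It generates high-quality images from text descriptions and is widely used in creative design, game development, and other fields. I have used Stable Diffusion in several projects, from simple image generation to style transfer. Today I am sharing a complete hands-on guide.

## Environment Setup

```bash
# Create a virtual environment
conda create -n sd python=3.10
conda activate sd

# Install dependencies
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
pip install diffusers transformers accelerate safetensors
pip install gradio  # used for the web UI
```

## Basic Usage

### Text-to-Image

```python
from diffusers import StableDiffusionPipeline
import torch

# Load the model in half precision and move it to the GPU
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    torch_dtype=torch.float16,
).to("cuda")

# Generate an image
prompt = "a beautiful sunset over the ocean, golden hour, photorealistic"
image = pipe(prompt).images[0]

# Save the image
image.save("sunset.png")
```

### Controlling Generation Parameters

```python
from typing import Optional

from PIL import Image


def generate_image(
    prompt: str,
    negative_prompt: Optional[str] = None,
    num_inference_steps: int = 50,
    guidance_scale: float = 7.5,
    seed: Optional[int] = None,
) -> Image.Image:
    """Generate an image with the pipeline loaded above."""
    # Compare against None so that seed=0 is still honored
    generator = (
        torch.Generator("cuda").manual_seed(seed) if seed is not None else None
    )

    image = pipe(
        prompt=prompt,
        negative_prompt=negative_prompt,
        num_inference_steps=num_inference_steps,
        guidance_scale=guidance_scale,
        generator=generator,
    ).images[0]

    return image


# Usage example
image = generate_image(
    prompt="a cute cat playing with a ball",
    negative_prompt="ugly, blurry, low quality",
    num_inference_steps=30,
    guidance_scale=7.5,
    seed=42,
)
```

## Advanced Techniques

### Image-to-Image

```python
from diffusers import StableDiffusionImg2ImgPipeline
from PIL import Image
import torch

# Load the image-to-image pipeline
img2img_pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    torch_dtype=torch.float16,
).to("cuda")

# Load the input image
init_image = Image.open("input.jpg").convert("RGB")
init_image = init_image.resize((512, 512))

# Generate
prompt = "turn this photo into a painting in the style of Van Gogh"
image = img2img_pipe(
    prompt=prompt,
    image=init_image,
    strength=0.75,
).images[0]

image.save("output.png")
```

### Depth Guidance

```python
from diffusers import StableDiffusionDepth2ImgPipeline
import torch

# Load the depth-conditioned model
depth_pipe = StableDiffusionDepth2ImgPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-depth",
    torch_dtype=torch.float16,
).to("cuda")

# Guide generation with a depth map (init_image comes from the previous example)
prompt = "a futuristic city skyline"
image = depth_pipe(
    prompt=prompt,
    image=init_image,
    depth_map=None,  # None: the depth map is estimated automatically
).images[0]
```

## Model Fine-Tuning

### Preparing the Dataset

```python
from datasets import load_dataset

# Load the dataset
dataset = load_dataset("lambdalabs/pokemon-blip-captions")


# Preprocess: convert to RGB and resize to 512x512
def preprocess(examples):
    images = [
        image.convert("RGB").resize((512, 512)) for image in examples["image"]
    ]
    return {"images": images, "captions": examples["text"]}


dataset = dataset.map(preprocess, batched=True)
```

### Training Script

```python
from diffusers import StableDiffusionPipeline
from diffusers.training_utils import set_seed

# Set the random seed
set_seed(42)

# Load the model
model_id = "runwayml/stable-diffusion-v1-5"
pipe = StableDiffusionPipeline.from_pretrained(model_id)

# Configure the training parameters
training_args = {
    "output_dir": "./pokemon-model",
    "per_device_train_batch_size": 4,
    "gradient_accumulation_steps": 4,
    "learning_rate": 1e-5,
    "num_train_epochs": 10,
    "logging_steps": 10,
    "save_steps": 100,
}

# Start training (simplified example: `trainer` is not defined in this snippet)
# trainer.train()
```

## Web UI Deployment

```python
import gradio as gr


def generate(prompt, negative_prompt, steps, scale):
    """Generate an image with the pipeline loaded earlier."""
    image = pipe(
        prompt=prompt,
        negative_prompt=negative_prompt,
        num_inference_steps=int(steps),  # the slider may return a float
        guidance_scale=scale,
    ).images[0]
    return image


# Build the interface
with gr.Blocks() as demo:
    gr.Markdown("# Stable Diffusion Demo")

    with gr.Row():
        with gr.Column():
            prompt = gr.Textbox(label="Prompt")
            negative_prompt = gr.Textbox(label="Negative Prompt")
            steps = gr.Slider(minimum=10, maximum=100, value=50, step=1, label="Steps")
            scale = gr.Slider(minimum=1, maximum=20, value=7.5, label="Guidance Scale")
            generate_btn = gr.Button("Generate")

        with gr.Column():
            output = gr.Image(label="Output")

    generate_btn.click(
        generate,
        inputs=[prompt, negative_prompt, steps, scale],
        outputs=output,
    )

demo.launch()
```

## Common Issues

### Out of GPU Memory

```python
# Option 1: attention slicing (computes attention in chunks to lower peak memory)
pipe.enable_attention_slicing()

# Option 2: CPU offload (requires accelerate; usually applied to a pipeline
# that has not been moved to the GPU with .to("cuda"))
pipe.enable_model_cpu_offload()

# Option 3: reduce the batch size by generating one image per call
image = pipe(prompt, num_images_per_prompt=1).images[0]

# Note: pipe.set_progress_bar_config(disable=True) only hides the progress bar;
# it does not reduce memory use.
```

### Poor Generation Quality

```python
# Tips for improving quality
# 1. Use more inference steps
# 2. Adjust guidance_scale
# 3. Add a detailed negative prompt
# 4. Use a better model, such as SDXL
```

## Summary

Stable Diffusion is a powerful image generation tool:

- Basic usage: simple text-to-image generation
- Advanced techniques: image-to-image and depth guidance
- Fine-tuning: adapting the model to a specific style or subject
- Deployment: building a web application

Key takeaways:

- Prompt quality directly affects the generated result
- The negative prompt matters
- Tuning parameters takes experience
- A GPU with more VRAM can significantly improve speed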
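
As a supplement to the image-to-image example, it can help to see how `strength` interacts with the step count. The sketch below assumes the rule that, to my understanding, the diffusers img2img pipelines apply internally: only about `int(num_inference_steps * strength)` denoising steps actually run, because the input image is noised part of the way rather than starting from pure noise. The helper name `effective_img2img_steps` is hypothetical and is not part of diffusers; treat the numbers as an approximation and check them against the library version you have installed.

```python
def effective_img2img_steps(num_inference_steps: int, strength: float) -> int:
    """Approximate how many denoising steps an img2img call runs.

    Assumption: mirrors min(int(num_inference_steps * strength),
    num_inference_steps); this is a hypothetical helper, not a diffusers API.
    """
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must be between 0.0 and 1.0")
    return min(int(num_inference_steps * strength), num_inference_steps)


if __name__ == "__main__":
    # Show how the same 50-step budget shrinks as strength decreases
    for strength in (0.25, 0.5, 0.75, 1.0):
        print(strength, effective_img2img_steps(50, strength))
```

Under this assumption, a low `strength` with a small step budget leaves very few denoising steps, so raising `num_inference_steps` may be worth trying when a lightly modified image looks under-refined.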
