LTX-Video Installation and Usage Guide (Mac)

Overview

This guide explains how to install and use LTX-Video on a Mac with Apple Silicon. LTX-Video runs from Python code through the HuggingFace diffusers library, and inference can be GPU-accelerated on M1/M2/M3/M4 chips through the MPS (Metal Performance Shaders) backend.

Generation is slower than on a CUDA GPU, but the advantage is that video generation runs locally on the Mac with no external GPU.

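Prerequisites

Hardware

Apple Silicon Mac (M1 / M2 / M3 / M4 family) recommended
At least 16GB of unified memory (32GB or more recommended)
The lightweight 2B model consumes roughly 8~10GB of memory (on Apple Silicon the GPU draws on unified memory rather than dedicated VRAM)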
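
To put the memory figures above in context, the following is a rough back-of-the-envelope sketch of how much space the weights of a 2B-parameter model occupy on their own. The helper name `weights_gib` is hypothetical, and the calculation covers only the raw weights of the 2B model, not everything the pipeline loads.

```python
def weights_gib(num_params: float, bytes_per_param: int) -> float:
    """Return the size of a model's raw weights in GiB."""
    return num_params * bytes_per_param / 2**30

# 2B parameters stored in a 16-bit format (bfloat16 or float16, 2 bytes each)
half = weights_gib(2e9, 2)
# The same parameters stored as float32 (4 bytes each)
full = weights_gib(2e9, 4)

print(f"16-bit weights: {half:.2f} GiB")
print(f"32-bit weights: {full:.2f} GiB")
```

The 16-bit figure is well below the 8~10GB quoted above. The difference is presumably taken up by the other components the pipeline loads and by intermediate activations during generation, which is why headroom beyond the weights alone matters.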

Software

macOS 12.3 (Monterey) or later
Python 3.10.5 or later
Homebrew (optional; used here to install pyenv)

Installation

Step 1: Prepare a Python virtual environment

# Install Python 3.11 with pyenv (recommended)
brew install pyenv
pyenv install 3.11.9
pyenv local 3.11.9

# Create and activate the virtual environment
python -m venv ltxvideo-env
source ltxvideo-env/bin/activate

Step 2: Install PyTorch (a build with MPS support)

The Mac MPS backend runs stably on PyTorch 2.3.0 or later.

pip install --upgrade pip

# Install the latest stable PyTorch release (for macOS)
pip install torch torchvision torchaudio

For MPS, treat PyTorch 2.3.0 as the minimum; 2.6.0 or later is recommended. After installing, call torch.backends.mps.is_available() to check whether MPS can be used.
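
The version requirement above can be checked programmatically. The following is a minimal sketch, assuming the version string has the usual `major.minor[.patch]` prefix (as `torch.__version__` does, possibly followed by a suffix such as `+cpu`). The helper names `version_tuple` and `meets_minimum` are hypothetical and use only the standard library.

```python
import re

def version_tuple(version: str) -> tuple:
    """Extract the leading numeric release, e.g. '2.6.0+cpu' -> (2, 6, 0)."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    # A missing patch component (as in '12.3') is treated as 0.
    return tuple(int(part or 0) for part in match.groups())

def meets_minimum(version: str, minimum: str) -> bool:
    """Compare release numbers only; pre-release and build suffixes are ignored."""
    return version_tuple(version) >= version_tuple(minimum)

print(meets_minimum("2.6.0", "2.3.0"))
print(meets_minimum("2.2.2", "2.3.0"))
print(meets_minimum("2.3.0+cpu", "2.3.0"))
```

In practice you would pass `torch.__version__` as the first argument. Comparing integer tuples rather than raw strings avoids the pitfall where "2.10.0" sorts before "2.3.0" alphabetically.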

Step 3: Install diffusers and dependencies

# Install the latest diffusers (includes the LTX-Video pipelines)
pip install --upgrade diffusers transformers accelerate

# Additional packages for saving video
pip install imageio imageio-ffmpeg opencv-python

Step 4: Verify the installation

import torch
import diffusers

print(f"PyTorch version: {torch.__version__}")
print(f"diffusers version: {diffusers.__version__}")
print(f"MPS available: {torch.backends.mps.is_available()}")
print(f"MPS built: {torch.backends.mps.is_built()}")

Apple Silicon MPS Setup

Using the MPS device runs inference on the Mac's integrated GPU, which is faster than running on the CPU.

import torch

# Select the device, preferring MPS
if torch.backends.mps.is_available():
    device = torch.device("mps")
    print("Using MPS (Apple Silicon GPU)")
elif torch.cuda.is_available():
    device = torch.device("cuda")
    print("Using CUDA GPU")
else:
    device = torch.device("cpu")
    print("Using CPU (slow)")

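MPS memory management tips

Because the Mac uses a unified memory architecture, heavy memory pressure makes the system start swapping automatically, and performance drops sharply.

# Reduce memory usage with attention slicing
pipeline.enable_attention_slicing()

# Save memory on high-resolution output with VAE tiling
pipeline.vae.enable_tiling()

# Release memory after use
import gc
gc.collect()
torch.mps.empty_cache()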

T2V (Text-to-Video) Basic Usage

Basic text-to-video generation

import torch
from diffusers import LTXPipeline
from diffusers.utils import export_to_video

# Load the model (downloaded automatically on first run; several GB)
pipeline = LTXPipeline.from_pretrained(
    "Lightricks/LTX-Video",
    torch_dtype=torch.bfloat16
)
pipeline.enable_attention_slicing()
pipeline.to("mps")  # Use the Apple Silicon GPU

prompt = """
A serene mountain lake at sunrise, with golden light reflecting on the calm water surface.
Pine trees line the shore, and a light mist hovers above the water.
The camera slowly pans from left to right, revealing the full panoramic view.
"""
negative_prompt = "worst quality, inconsistent motion, blurry, jittery, distorted"

# Generate the video
# num_frames must have the form 8n + 1 (e.g. 25, 49, 97, 121, 161)
video = pipeline(
    prompt=prompt,
    negative_prompt=negative_prompt,
    width=768,
    height=512,
    num_frames=97,           # about 4 seconds at 24fps
    num_inference_steps=40,
    decode_timestep=0.05,
    decode_noise_scale=0.025,
    guidance_scale=5.0,
).frames[0]

export_to_video(video, "output_t2v.mp4", fps=24)
print("Video saved: output_t2v.mp4")

Recommended resolution and frame combinations

Use case	Resolution	Frames	Duration (24fps)
Quick test	512×384	25	about 1 second
Standard	768×512	97	about 4 seconds
High quality	768×512	161	about 6.7 seconds
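
The durations in the table follow directly from dividing the frame count by the frame rate. A small sketch of that arithmetic (the helper name `duration_seconds` is hypothetical):

```python
def duration_seconds(num_frames: int, fps: int = 24) -> float:
    """Playback length of a clip in seconds."""
    return num_frames / fps

for label, frames in [("Quick test", 25), ("Standard", 97), ("High quality", 161)]:
    print(f"{label}: {frames} frames -> {duration_seconds(frames):.2f} s")
```

Because valid frame counts have the form 8n + 1, the durations land slightly above whole seconds rather than exactly on them.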

I2V (Image-to-Video) Usage

This mode takes a still image as input and generates a moving video from it.

import torch
from diffusers import LTXConditionPipeline
from diffusers.pipelines.ltx.pipeline_ltx_condition import LTXVideoCondition
from diffusers.utils import export_to_video, load_image

# Use LTXConditionPipeline (supports I2V)
pipeline = LTXConditionPipeline.from_pretrained(
    "Lightricks/LTX-Video",
    torch_dtype=torch.bfloat16
)
pipeline.enable_attention_slicing()
pipeline.vae.enable_tiling()
pipeline.to("mps")

# Load the input image
image = load_image("input_image.jpg")

# Set up the image condition for the video
condition = LTXVideoCondition(
    image=image,
    frame_index=0  # pin the image to the first frame
)

prompt = """
The scene gently animates: leaves rustle in a soft breeze,
light dances across the surface, and the atmosphere breathes with subtle life.
Smooth, cinematic motion. High quality.
"""
negative_prompt = "worst quality, inconsistent motion, blurry, jittery, distorted, static"

video = pipeline(
    conditions=[condition],
    prompt=prompt,
    negative_prompt=negative_prompt,
    width=768,
    height=512,
    num_frames=97,
    num_inference_steps=40,
    decode_timestep=0.05,
    decode_noise_scale=0.025,
    image_cond_noise_scale=0.0,
    guidance_scale=5.0,
).frames[0]

export_to_video(video, "output_i2v.mp4", fps=24)
print("Image-to-video conversion complete: output_i2v.mp4")
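
If the input image does not share the aspect ratio of the requested width and height (3:2 in the example above), cropping it beforehand is one way to avoid depending on how the pipeline resizes it. The sketch below is a hypothetical helper, not part of diffusers: it computes a centered crop box in the `(left, upper, right, lower)` layout that Pillow's `Image.crop` accepts.

```python
def center_crop_box(src_w, src_h, dst_w, dst_h):
    """Largest centered region of the source with aspect ratio dst_w:dst_h."""
    # Compare aspect ratios by cross-multiplying to stay in integer arithmetic.
    if src_w * dst_h > src_h * dst_w:
        # Source is wider than the target ratio: keep full height, trim the sides.
        crop_w = src_h * dst_w // dst_h
        crop_h = src_h
    else:
        # Source is taller (or equal): keep full width, trim top and bottom.
        crop_w = src_w
        crop_h = src_w * dst_h // dst_w
    left = (src_w - crop_w) // 2
    upper = (src_h - crop_h) // 2
    return (left, upper, left + crop_w, upper + crop_h)

# A 1920x1080 source cropped for a 768x512 target
print(center_crop_box(1920, 1080, 768, 512))
```

With a Pillow image, the result could be applied as `image.crop(box).resize((768, 512))` before the image is passed to the condition.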

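Tips for faster generation

Use the lightweight 2B distilled model

The most practical option on a Mac is the 2B distilled model. It cuts the number of inference steps to 4~8, which shortens generation time considerably.

import torch
from diffusers import LTXPipeline

# Lightweight 2B model (best suited to the Mac)
# The step count and guidance scale below are meant for a distilled checkpoint;
# replace the repo ID accordingly if you have one.
pipeline = LTXPipeline.from_pretrained(
    "Lightricks/LTX-Video",  # or the latest 2B distilled checkpoint
    torch_dtype=torch.bfloat16
)
pipeline.to("mps")

# prompt and negative_prompt are the ones defined in the T2V example
video = pipeline(
    prompt=prompt,
    negative_prompt=negative_prompt,
    width=512,
    height=384,
    num_frames=49,
    num_inference_steps=8,       # 4~8 steps are enough for a distilled model
    guidance_scale=1.0,          # always use 1.0 with a distilled model
    decode_timestep=0.05,
    decode_noise_scale=0.025,
).frames[0]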
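
Since the base and distilled settings differ in several parameters at once, it can help to keep them together as named presets instead of editing individual arguments. This is one possible arrangement, not something diffusers provides: the `GenerationPreset` class and `as_pipeline_kwargs` helper are hypothetical, and the values are copied from the examples in this guide.

```python
from dataclasses import dataclass, asdict

@dataclass(frozen=True)
class GenerationPreset:
    width: int
    height: int
    num_frames: int
    num_inference_steps: int
    guidance_scale: float

# Settings from the standard T2V example
BASE = GenerationPreset(768, 512, 97, 40, 5.0)
# Settings from the distilled-model example
DISTILLED = GenerationPreset(512, 384, 49, 8, 1.0)

def as_pipeline_kwargs(preset: GenerationPreset) -> dict:
    """Convert a preset to keyword arguments for a pipeline call."""
    return asdict(preset)

print(as_pipeline_kwargs(DISTILLED))
```

A call would then look like `pipeline(prompt=prompt, **as_pipeline_kwargs(DISTILLED))`, which keeps the step count and guidance scale from drifting apart when switching models.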

Test at a lower resolution first

Before generating a full-quality video, check the composition and motion at low resolution.

# Quick preview (512×384, 25 frames)
preview = pipeline(
    prompt=prompt,
    width=512, height=384,
    num_frames=25,
    num_inference_steps=5,
    guidance_scale=1.0,
).frames[0]
export_to_video(preview, "preview.mp4", fps=24)
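
To get a feel for how much lighter the preview is, one rough proxy is the number of pixels generated across all frames. The sketch below compares the preview settings with the standard 768×512, 97-frame settings. It is only an approximation of relative workload: actual generation time also depends on the step count and does not necessarily scale linearly with this number.

```python
def pixel_frames(width: int, height: int, num_frames: int) -> int:
    """Total number of output pixels across all frames."""
    return width * height * num_frames

preview_load = pixel_frames(512, 384, 25)
standard_load = pixel_frames(768, 512, 97)

print(f"preview:  {preview_load:,} pixel-frames")
print(f"standard: {standard_load:,} pixel-frames")
print(f"ratio:    {preview_load / standard_load:.3f}")
```

By this measure the preview covers roughly an eighth of the standard output, before counting the reduction in inference steps.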

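Using `torch.compile` (for repeated generation)

When generating several videos with the same settings, the first call pays the compilation cost and later calls run faster.

pipeline.transformer = torch.compile(
    pipeline.transformer,
    mode="reduce-overhead",  # more stable on the Mac than max-autotune
    fullgraph=True           # raises an error if the model cannot be captured as a single graph
)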
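
Whether compilation pays off on a given machine is easiest to judge by timing the first call against later ones. The following is a minimal timing sketch using only the standard library. The `time_calls` helper is hypothetical, and `fake_generate` is a stand-in that simulates a slow first call; in real use you would pass a function that invokes the pipeline.

```python
import time

def time_calls(fn, repeats=3):
    """Call fn() `repeats` times and return each call's wall-clock duration in seconds."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        durations.append(time.perf_counter() - start)
    return durations

# Stand-in workload: the first call is slow (simulated compile), later calls are fast.
state = {"warm": False}

def fake_generate():
    if not state["warm"]:
        time.sleep(0.05)
        state["warm"] = True
    time.sleep(0.005)

timings = time_calls(fake_generate)
print([f"{t:.3f}s" for t in timings])
```

If the later timings are not clearly lower than the uncompiled baseline, the compile step is probably not worth its warm-up cost for that workload.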

Troubleshooting common errors

Error: `MPS backend out of memory`

An out-of-memory error. Reduce the resolution, the number of frames, or the batch size.

# Fix 1: enable attention slicing
pipeline.enable_attention_slicing(1)  # the most conservative setting

# Fix 2: enable VAE tiling
pipeline.vae.enable_tiling()

# Fix 3: reduce the resolution and frame count
# Start with width=512, height=384, num_frames=25

# Fix 4: clear the cache and retry
import gc
gc.collect()
torch.mps.empty_cache()
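
The fixes above can be combined into an automatic retry that steps down through progressively smaller settings. The sketch below shows the idea with a stand-in for the pipeline call. It assumes the out-of-memory condition surfaces as a `RuntimeError` whose message contains "out of memory", and the names `run_with_fallback`, `LADDER`, and `fake_generate` are hypothetical.

```python
def run_with_fallback(generate, settings_ladder):
    """Try each settings dict in order; return (result, settings) for the first success."""
    last_error = None
    for settings in settings_ladder:
        try:
            return generate(**settings), settings
        except RuntimeError as err:
            if "out of memory" not in str(err):
                raise  # unrelated error: do not mask it
            last_error = err
    raise RuntimeError("all settings ran out of memory") from last_error

# Settings taken from this guide, largest first
LADDER = [
    {"width": 768, "height": 512, "num_frames": 97},
    {"width": 512, "height": 384, "num_frames": 49},
    {"width": 512, "height": 384, "num_frames": 25},
]

# Stand-in for the real pipeline call: pretends anything above 5M pixel-frames does not fit.
def fake_generate(width, height, num_frames):
    if width * height * num_frames > 5_000_000:
        raise RuntimeError("MPS backend out of memory")
    return f"{width}x{height}x{num_frames}"

result, used = run_with_fallback(fake_generate, LADDER)
print(result, used)
```

With a real pipeline, it would make sense to call `gc.collect()` and `torch.mps.empty_cache()` between attempts so each retry starts from a clean state.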

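Error: `RuntimeError: Expected all tensors to be on the same device`

Occurs when some tensors are on the CPU and others are on MPS.

# Move the whole pipeline to MPS
pipeline = pipeline.to("mps")

# Move input tensors to the same device (image_tensor stands for any tensor you pass in yourself)
image_tensor = image_tensor.to("mps")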

Error: `num_frames must be of the form 8k + 1`

Occurs when the frame count is invalid. Use values such as 25, 33, 41, 49, 57, 65, 73, 81, 89, 97, 121, or 161.

# Compute a valid frame count for a target duration
def get_valid_num_frames(target_seconds, fps=24):
    total = target_seconds * fps
    # Round to the nearest k so the result is the valid count closest to the target
    k = round((total - 1) / 8)
    return int(8 * k + 1)

print(get_valid_num_frames(4))   # 97
print(get_valid_num_frames(6))   # 145
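
Checking a frame count before calling the pipeline avoids waiting for the model to load only to hit this error. A minimal sketch of the 8k + 1 rule follows; the helper name `is_valid_num_frames` is hypothetical.

```python
def is_valid_num_frames(n: int) -> bool:
    """True if n has the form 8k + 1 for a non-negative integer k."""
    return n >= 1 and (n - 1) % 8 == 0

# Every valid count between 25 and 161 inclusive
valid = [n for n in range(25, 162) if is_valid_num_frames(n)]
print(valid)

# The values listed in this guide all satisfy the rule
listed = [25, 33, 41, 49, 57, 65, 73, 81, 89, 97, 121, 161]
print(all(is_valid_num_frames(n) for n in listed))
```

The values listed in the guide are a subset of the valid counts; intermediate values such as 105 or 145 satisfy the same rule.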

Error: `bfloat16 not supported on MPS`

Occurs on some older PyTorch versions in which MPS does not support bfloat16.

# Fix: fall back to float16
pipeline = LTXPipeline.from_pretrained(
    "Lightricks/LTX-Video",
    torch_dtype=torch.float16  # use float16 instead of bfloat16
)

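When generation is very slow

This happens when the system is short on memory and starts swapping. Close as many active apps as possible, and reduce the resolution and frame count.

# Check the current memory pressure (terminal)
memory_pressure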

Reference links

Official GitHub: Lightricks/LTX-Video
HuggingFace model page: Lightricks/LTX-Video
diffusers LTX-Video documentation: HuggingFace Diffusers - LTX-Video
MPS optimization guide: HuggingFace Diffusers - MPS
Memory-saving techniques: HuggingFace Diffusers - Memory Optimization
Introduction page: LTX-Video introduction
