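Orpheus TTS Installation and Usage Guide (Mac)

Overview

This guide explains how to install and run Orpheus TTS on an Apple Silicon Mac. CUDA is not available on a Mac, so the guide uses a GGUF-quantized model served by LM Studio, which is the most reliable route on this platform. LM Studio uses Apple's Metal GPU acceleration automatically, so the model runs locally with no CUDA setup.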

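Prerequisites

Hardware

Item	Minimum	Recommended
Chip	Apple Silicon M1	M2 Pro / M3 / M4
Unified memory	8GB	16GB
Free disk space	5GB	10GB or more

Software

macOS 14 Sonoma or later
Python 3.10 or later (3.11 recommended)
Git and Git LFS
LM Studio (for GPU acceleration through the Metal backend)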
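
The command-line prerequisites can be checked from Python before going further. The following is a minimal sketch, not part of Orpheus or LM Studio: `check_environment` and `REQUIRED_TOOLS` are illustrative names, and the check covers only the Python version, the CPU architecture, and the presence of `git` and `git-lfs` on PATH. It does not verify the macOS version or the LM Studio install.

```python
import platform
import shutil
import sys

REQUIRED_TOOLS = ("git", "git-lfs")  # command names expected on PATH


def check_environment(min_python=(3, 10)):
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []
    if sys.version_info[:2] < min_python:
        problems.append(
            f"Python {min_python[0]}.{min_python[1]}+ required, "
            f"found {platform.python_version()}"
        )
    # A natively running Python on Apple Silicon reports Darwin / arm64.
    if platform.system() != "Darwin" or platform.machine() != "arm64":
        problems.append(
            f"Expected macOS on arm64, found {platform.system()} on {platform.machine()}"
        )
    for tool in REQUIRED_TOOLS:
        if shutil.which(tool) is None:
            problems.append(f"'{tool}' was not found on PATH")
    return problems


for problem in check_environment():
    print("WARNING:", problem)
```

An empty result means the basic requirements look fine; any warning points at the item to install in Step 1.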

Installation

Step 1: Install Homebrew and Python

Open Terminal and run the following commands in order.

# Install Homebrew (skip if it is already installed)
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"

# Install Python 3.11, Git, and Git LFS
brew install python@3.11 git git-lfs
git lfs install

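Step 2: Install LM Studio and download the model

LM Studio is a GUI tool for running large language models locally, and it supports Metal acceleration on Apple Silicon automatically.

Download LM Studio for macOS from lmstudio.ai and install it.
Launch LM Studio and enter the following in the search box to download the model.

isaiahbjork/orpheus-3b-0.1-ft-Q4_K_M-GGUF

Go to the Developer tab and load the downloaded model.
After loading, confirm that the backend is Metal (GPU). Running on the CPU is much slower.
Click Start Server to start the local API server. The default address is http://127.0.0.1:1234/v1.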
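
Once the server is running, it helps to confirm which model id it reports, because that id is what the `model` field of later requests should contain. The sketch below assumes the server answers `GET /v1/models` with an OpenAI-style list (an object with a `data` array of entries that each carry an `id`). `extract_model_ids` and `list_model_ids` are illustrative helper names, not LM Studio functions.

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:1234/v1"  # default LM Studio address used in this guide


def extract_model_ids(payload: dict) -> list:
    """Pull the "id" of each entry out of an OpenAI-style model list response."""
    return [entry["id"] for entry in payload.get("data", []) if "id" in entry]


def list_model_ids(base_url: str = BASE_URL, timeout: float = 5.0) -> list:
    """Query GET /models on the local server and return the model ids it reports."""
    with urllib.request.urlopen(f"{base_url}/models", timeout=timeout) as resp:
        return extract_model_ids(json.load(resp))
```

Calling `list_model_ids()` with the server started should return a list that includes the loaded Orpheus model; if the call raises a connection error, the server is not listening on that address.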

Step 3: Install the Python client

# Clone the repository
git clone https://github.com/isaiahbjork/orpheus-tts-local.git
cd orpheus-tts-local

# Create and activate a virtual environment
python3.11 -m venv .venv
source .venv/bin/activate

# Install dependencies
pip install -r requirements.txt
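
A common slip at this step is installing the dependencies into the system Python because the virtual environment was not activated. A small sketch for checking this from inside Python follows; `in_virtualenv` is an illustrative name, and the check relies on the standard behaviour that `sys.prefix` differs from `sys.base_prefix` inside a venv.

```python
import sys


def in_virtualenv() -> bool:
    """True when the interpreter runs inside a venv (prefix differs from the base install)."""
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


print("Interpreter:", sys.executable)
print("Virtual environment active:", in_virtualenv())
```

If the second line prints `False`, run `source .venv/bin/activate` again before installing the requirements.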

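Apple Silicon MPS settings

LM Studio detects and enables Metal GPU acceleration automatically. MPS (Metal Performance Shaders) is the backend that PyTorch uses for the same GPU, so it matters only if you run PyTorch code yourself. To check availability manually or to use MPS directly from PyTorch (which must be installed separately), refer to the following code.

import torch

# Pick the best available device: MPS first, then CUDA, then CPU
if torch.backends.mps.is_available():
    device = torch.device("mps")
    print("Using Apple Silicon MPS")
elif torch.cuda.is_available():
    device = torch.device("cuda")
    print("Using CUDA GPU")
else:
    device = torch.device("cpu")
    print("Using CPU (expect slower performance)")

print(f"Current device: {device}")

If you use LM Studio, no separate MPS setup is needed; confirming in the Developer tab that the Metal backend is active is enough.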

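Basic TTS usage

With the LM Studio server running, the Python script below sends a prompt and collects the audio tokens the model generates. Turning those tokens into audio is a separate decoding step that requires the snac library.

import re
import requests

LM_STUDIO_URL = "http://127.0.0.1:1234/v1/completions"
MODEL_NAME = "orpheus-3b-0.1-ft"  # must match the model id that LM Studio reports

def generate_speech(text: str, voice: str = "tara", output_path: str = "output.wav"):
    prompt = f"{voice}: {text}"

    response = requests.post(LM_STUDIO_URL, json={
        "model": MODEL_NAME,
        "prompt": prompt,
        "max_tokens": 4096,
        "temperature": 0.6,
        "top_p": 0.9,
        "stream": False,
    })
    response.raise_for_status()

    token_text = response.json()["choices"][0]["text"]

    # Parse the SNAC token ids: collect every run of digits in the output text
    ids = [int(t) for t in re.findall(r"\d+", token_text)]

    # No audio file is written here: decoding the tokens into a WAV file
    # requires the snac library
    print(f"Tokens generated: {len(ids)}")
    print(f"Decode these tokens with snac to produce '{output_path}'.")
    return ids

# Basic usage example
generate_speech("Hello. This is an Orpheus TTS test.", voice="tara")

To generate a WAV file directly from the command line, run the script from the cloned orpheus-tts-local repository: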

python gguf_orpheus.py \
  --text "Hello from a Mac. This is Orpheus running offline." \
  --voice tara \
  --output hello.wav
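
The Python example above stops at token ids, and the decoding step itself needs the snac library. Once decoded samples are available, writing them to disk needs only the standard library. The following is a minimal sketch under stated assumptions: the decoder is assumed to yield mono float samples in the range -1.0 to 1.0 at 24 kHz, and `write_wav` and `SAMPLE_RATE` are illustrative names rather than part of orpheus-tts-local.

```python
import struct
import wave

SAMPLE_RATE = 24000  # assumption: decoded Orpheus audio is 24 kHz mono


def write_wav(samples, path, sample_rate=SAMPLE_RATE):
    """Write float samples in [-1.0, 1.0] to a 16-bit mono PCM WAV file.

    Returns the number of samples written.
    """
    frames = bytearray()
    for s in samples:
        clipped = max(-1.0, min(1.0, float(s)))  # guard against overshoot
        frames += struct.pack("<h", int(round(clipped * 32767)))
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)
        wav_file.setsampwidth(2)  # 2 bytes per sample = 16-bit
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(bytes(frames))
    return len(frames) // 2
```

For example, `write_wav(decoded_samples, "output.wav")` would store the result, where `decoded_samples` stands for whatever sequence of floats the decoding step produces.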

Korean TTS example

To synthesize Korean, download the multilingual model (orpheus-3b-0.1-multilingual-preview) in LM Studio. The English fine-tuned model may produce poor-quality Korean output.

import requests

LM_STUDIO_URL = "http://127.0.0.1:1234/v1/completions"

# Sample Korean input texts
korean_texts = [
    "안녕하세요. 오늘 날씨가 참 좋네요.",
    "인공지능 음성 합성 기술이 놀랍도록 발전했습니다.",
    "이 문장은 오르페우스 TTS로 생성된 한국어 음성입니다.",
]

for text in korean_texts:
    response = requests.post(LM_STUDIO_URL, json={
        "model": "orpheus-3b-0.1-multilingual-preview",
        # The voice prefix must be one the loaded model supports;
        # adjust "ko_speaker" if needed
        "prompt": f"ko_speaker: {text}",
        "max_tokens": 4096,
        "temperature": 0.6,
        "top_p": 0.9,
        "stream": False,
    })
    print(f"Input: {text}")
    print(f"Status: {response.status_code}")
    print("---")

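Using emotion tags

Inserting an emotion tag in the text makes the model produce the matching vocal sound at that point.

import requests

LM_STUDIO_URL = "http://127.0.0.1:1234/v1/completions"

# Emotion tag examples
emotional_texts = [
    # Laughter
    "Good morning! <laugh> Have a great day today.",
    # Sigh
    "The project is due tomorrow and <sigh> I'm not even halfway done.",
    # Surprise
    "Really? <gasp> I can't believe it!",
    # Several tags in one sentence
    "It's finally over. <sigh> Good work. <chuckle> Now we can rest.",
]

for text in emotional_texts:
    response = requests.post(LM_STUDIO_URL, json={
        "model": "orpheus-3b-0.1-ft",
        "prompt": f"tara: {text}",
        "max_tokens": 4096,
        "temperature": 0.6,
        "top_p": 0.9,
    })
    response.raise_for_status()
    print(f"Input: {text[:40]}...")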

Full list of available emotion tags:

Tag	Sound
`<laugh>`	Laugh
`<chuckle>`	Chuckle
`<sigh>`	Sigh
`<cough>`	Cough
`<sniffle>`	Sniffle
`<groan>`	Groan
`<yawn>`	Yawn
`<gasp>`	Gasp (surprise)
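
A misspelled or unsupported tag is likely to be read aloud or ignored rather than rendered as a sound, so it can be worth validating input text against the list above before sending it. This is a small sketch; `unknown_tags`, `SUPPORTED_TAGS`, and `TAG_PATTERN` are illustrative names, and the match is deliberately case-sensitive on the assumption that tags must be written exactly as listed.

```python
import re

# Tag names taken from the table in this guide.
SUPPORTED_TAGS = {"laugh", "chuckle", "sigh", "cough", "sniffle", "groan", "yawn", "gasp"}
TAG_PATTERN = re.compile(r"<([A-Za-z_]+)>")


def unknown_tags(text: str) -> list:
    """Return tag names found in text that are not in the supported list, in order."""
    return [name for name in TAG_PATTERN.findall(text) if name not in SUPPORTED_TAGS]
```

For instance, `unknown_tags("Hi <giggle> there <sigh>")` returns `["giggle"]`, which flags the tag to fix before the request is made.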

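Streaming mode

For real-time applications such as chatbots and voice assistants, streaming mode reduces latency because tokens can be processed as they arrive instead of after generation finishes. The first audio chunk can arrive in roughly 100 to 200 ms, though the actual figure depends on the hardware and the model.

import requests
import json

def stream_speech(text: str, voice: str = "tara"):
    """Receive speech tokens in real time over a streaming response."""
    prompt = f"{voice}: {text}"

    with requests.post(
        "http://127.0.0.1:1234/v1/completions",
        json={
            "model": "orpheus-3b-0.1-ft",
            "prompt": prompt,
            "max_tokens": 4096,
            "temperature": 0.6,
            "stream": True,  # enable streaming
        },
        stream=True,
    ) as response:
        response.raise_for_status()
        tokens = []
        for line in response.iter_lines():
            # Each event arrives as a line of the form "data: {json}"
            if line and line.startswith(b"data: "):
                data = line[6:]
                if data == b"[DONE]":
                    break
                try:
                    chunk = json.loads(data)
                    delta = chunk["choices"][0].get("text", "")
                    if delta:
                        tokens.append(delta)
                        # Audio chunks can be processed here as they arrive
                        print(f"Chunk received: {delta[:20]}...", end="\r")
                except json.JSONDecodeError:
                    continue

        print(f"\nReceived {len(tokens)} chunks in total")
        return tokens

# Run the streaming example
stream_speech("Hello. This is being generated in streaming mode.")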
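
Latency to the first chunk varies by machine, so it is worth measuring rather than assuming. The sketch below times any iterable of chunks. It assumes a generator variant of the streaming function that yields each chunk as it arrives (hypothetical; the function above returns a list only after the stream ends), and `timed_chunks` and `first_chunk_latency` are illustrative names.

```python
import time


def timed_chunks(chunks):
    """Yield (seconds_since_iteration_started, chunk) for each item of an iterable."""
    start = time.perf_counter()
    for chunk in chunks:
        yield time.perf_counter() - start, chunk


def first_chunk_latency(chunks):
    """Return the seconds until the first chunk arrives, or None if the stream is empty."""
    for elapsed, _ in timed_chunks(chunks):
        return elapsed
    return None
```

The clock starts when iteration begins, so the request should be issued inside the generator being timed if connection setup is meant to count toward the result.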

Troubleshooting common errors

`AssertionError: Torch not compiled with CUDA enabled`

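This error occurs when the orpheus-speech package is installed directly on a Mac. The package depends on CUDA (NVIDIA GPUs) internally.

Fix: use the GGUF model with LM Studio instead of installing the package directly. The method described in this guide is the path suited to a Mac.

# Wrong approach (fails on a Mac)
# pip install orpheus-speech

# Correct approach: use LM Studio with the GGUF model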

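`Connection refused` error

This occurs when the LM Studio server is not running or is listening on a different port.

# Check the server status
curl http://127.0.0.1:1234/v1/models

# If the server responds, it prints JSON like the following
# {"object":"list","data":[{"id":"orpheus-3b-0.1-ft",...}]}

In LM Studio's Developer tab, click Start Server again to restart the server. If the server uses a different port, update the URL in the scripts to match.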
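
The same check can be done from Python before sending a request, which gives a clearer message than a stack trace. This is a minimal sketch using only the standard library; `is_port_open` is an illustrative name, and the default host and port are the LM Studio defaults quoted in this guide.

```python
import socket


def is_port_open(host: str = "127.0.0.1", port: int = 1234, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and unreachable hosts.
        return False
```

A `False` result means nothing is listening at that address, so the server needs to be started or the port corrected. A `True` result only shows that some process accepts connections there, not that it is LM Studio.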

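Audio output cuts off midway

When generating long text, the audio is truncated if max_tokens is too low. About 4,500 tokens are needed for 30 seconds of speech.

# Set max_tokens high enough
# (LM_STUDIO_URL and MODEL_NAME as in the basic usage example; prompt built the same way)
response = requests.post(LM_STUDIO_URL, json={
    "model": MODEL_NAME,
    "prompt": prompt,
    "max_tokens": 8192,  # the default of 1200 only suits short sentences
    "temperature": 0.6,
})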
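
The token estimate above can be turned into a quick budget calculation. The sketch below takes this guide's figure (about 4,500 tokens per 30 seconds) at face value, which works out to 150 tokens per second; that rate is an estimate, not a measured constant, and `tokens_for` and `seconds_for` are illustrative names.

```python
import math

TOKENS_PER_SECOND = 4500 / 30  # from this guide's estimate; measure it for your own setup


def tokens_for(seconds: float, rate: float = TOKENS_PER_SECOND) -> int:
    """Token budget needed for the given duration of speech, rounded up."""
    return math.ceil(seconds * rate)


def seconds_for(max_tokens: int, rate: float = TOKENS_PER_SECOND) -> float:
    """Approximate duration of speech that fits within a max_tokens budget."""
    return max_tokens / rate


for budget in (1200, 4096, 8192):
    print(f"max_tokens={budget}: about {seconds_for(budget):.1f} s of speech")
```

Under that assumed rate, 1200 tokens covers about 8 seconds, 4096 about 27 seconds, and 8192 about 55 seconds, so longer passages are better split into several requests.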

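Speech pace or intonation sounds unnatural

Adjusting temperature and top_p can improve the result, and a repetition penalty helps when the output loops or stalls.

# Recommended parameters
params = {
    "temperature": 0.6,   # lower = more consistent, higher = more varied
    "top_p": 0.9,         # cumulative probability threshold
    "repeat_penalty": 1.1,  # discourages repetition (the name LM Studio's API accepts)
}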

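Out-of-memory errors

On a Mac with 8GB of unified memory, loading the full 3B model can exhaust memory.

Fix: use the Q4_K_M quantized version in LM Studio, which reduces memory use to about 2 to 3GB. Using a smaller model (1B, 400M), where one is available, is another option.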
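
The effect of quantization on memory can be approximated as parameter count times bits per weight. The following back-of-the-envelope sketch rests on loud assumptions: exactly 3e9 parameters for the "3B" model, 16 bits per weight for unquantized half precision, and about 4.5 bits per weight as a rough stand-in for a 4-bit quantization. `weight_size_gb` is an illustrative name, and the result covers weights only.

```python
def weight_size_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights in GB (10**9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9


# Assumed values, for illustration only (see the paragraph above).
for label, bits in (("16-bit", 16.0), ("roughly 4-bit quantized", 4.5)):
    print(f"{label}: about {weight_size_gb(3e9, bits):.1f} GB of weights")
```

Runtime memory is higher than the weight size because of the context cache and other overhead, which is in line with the 2 to 3GB figure quoted above for the quantized model.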

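Reference links

Orpheus TTS GitHub: official repository
orpheus-3b-0.1-ft (Hugging Face): 3B fine-tuned model
Orpheus multilingual collection (Hugging Face): multilingual models, including Korean
orpheus-tts-local (GitHub): local client that works with LM Studio
LM Studio official site: tool for running models with Metal acceleration
Mac installation guide (CoderSera): detailed installation tutorial
Canopy Labs official blog: model release notes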
