Celery-Based Task Queue Architecture Design Document

O2O Castad Backend - Plan for Migrating the Lyric/Song/Video Generation Pipeline to Celery

1. Overview

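1.1 Purpose

Migrate the current in-process pipeline built on FastAPI BackgroundTasks to a Celery distributed task queue, in order to achieve:

Independent workers: each stage (lyric/song/video) processes jobs only from its own queue
Horizontal scaling: the number of workers can be adjusted independently per stage
Fault isolation: a failure in one stage does not affect the other stages
Status tracking: near-real-time status management through the Celery result backend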

1.2 Core Design Principles

┌─────────────────────────────────────────────────────────────────┐
│                     Core Design Principles                      │
├─────────────────────────────────────────────────────────────────┤
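│ 1. Single responsibility: each worker serves only its own queue │
│ 2. Loose coupling: stages are linked only by task_id            │
│ 3. Idempotency: reprocessing the same request is safe           │
│ 4. Failure recovery: automatic retries + manual reprocessing    │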
└─────────────────────────────────────────────────────────────────┘
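Principle 3 in practice usually means keying every write on the project `task_id`, so a redelivered or retried message updates the existing row instead of inserting a second one. A minimal in-memory sketch of that get-or-create pattern, assuming a dict stands in for the Lyric table; `LyricRow`, `LyricStore`, and `start_lyric` are hypothetical names:

```python
from dataclasses import dataclass, field


@dataclass
class LyricRow:
    """Hypothetical stand-in for one Lyric table row."""
    task_id: str
    status: str = "pending"
    attempts: int = 0


@dataclass
class LyricStore:
    """In-memory 'table' keyed by the project task_id (not the Celery task id)."""
    rows: dict = field(default_factory=dict)

    def get_or_create(self, task_id: str) -> LyricRow:
        row = self.rows.get(task_id)
        if row is None:
            row = LyricRow(task_id=task_id)
            self.rows[task_id] = row
        return row


def start_lyric(store: LyricStore, task_id: str) -> LyricRow:
    """Idempotent start: a redelivered or retried message reuses the same row."""
    row = store.get_or_create(task_id)
    if row.status == "completed":
        # Work already done: reprocessing is a no-op
        return row
    row.status = "processing"
    row.attempts += 1
    return row
```

Calling `start_lyric` twice with the same `task_id` leaves one row with `attempts == 2`, and once the row is completed further calls change nothing.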

2. Current Architecture Analysis

2.1 Current Pipeline Flow

sequenceDiagram
    participant Client
    participant API as FastAPI
    participant BG as BackgroundTasks
    participant DB as MySQL
    participant External as External API

    Client->>API: POST /lyric/generate
    API->>DB: Create Lyric record (status=processing)
    API->>BG: Schedule generate_lyric_background()
    API-->>Client: Return task_id

    loop Polling
        Client->>API: GET /lyric/status/{task_id}
        API->>DB: Query status
        API-->>Client: Return status
    end

    BG->>External: Call ChatGPT API
    External-->>BG: Lyric result
    BG->>DB: Update Lyric (status=completed)

    Note over Client,External: Song and video generation follow the same polling pattern

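2.2 Limitations of the Current Structure

Problem	Description
Limited scalability	BackgroundTasks run inside a single process and cannot scale horizontally
Failure propagation	In-flight jobs are lost when the API server restarts
Resource contention	API requests and background jobs share the same resources
No monitoring	Tracking job status requires separate polling
Retry logic	Must be implemented by hand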

3. Celery Architecture Design

3.1 Overall Architecture

┌─────────────────────────────────────────────────────────────────────────────┐
│                          Celery-Based Architecture                          │
└─────────────────────────────────────────────────────────────────────────────┘

                                    ┌─────────────┐
                                    │   Client    │
                                    └──────┬──────┘
                                           │
                                           ▼
                              ┌────────────────────────┐
                              │       FastAPI          │
                              │    (Producer role)     │
                              └───────────┬────────────┘
                                          │
                    ┌─────────────────────┼─────────────────────┐
                    │                     │                     │
                    ▼                     ▼                     ▼
           ┌───────────────┐     ┌───────────────┐     ┌───────────────┐
           │  lyric_queue  │     │  song_queue   │     │  video_queue  │
           │    (Redis)    │     │    (Redis)    │     │    (Redis)    │
           └───────┬───────┘     └───────┬───────┘     └───────┬───────┘
                   │                     │                     │
                   ▼                     ▼                     ▼
           ┌───────────────┐     ┌───────────────┐     ┌───────────────┐
           │ Lyric Worker  │     │  Song Worker  │     │ Video Worker  │
           │  (Consumer)   │     │  (Consumer)   │     │  (Consumer)   │
           └───────┬───────┘     └───────┬───────┘     └───────┬───────┘
                   │                     │                     │
                   │                     │                     │
                   ▼                     ▼                     ▼
           ┌─────────────────────────────────────────────────────────────┐
           │                        MySQL                                │
           │                      (Shared database)                      │
           └─────────────────────────────────────────────────────────────┘
                                          │
                                          ▼
                              ┌────────────────────────┐
                              │    Redis (Result)      │
                              │  Celery state storage  │
                              └────────────────────────┘

3.2 Queue Design

# Definition of the three independent queues
CELERY_QUEUES = {
    'lyric_queue': {
        'exchange': 'lyric',
        'routing_key': 'lyric.generate',
        'description': 'Dedicated queue for lyric generation'
    },
    'song_queue': {
        'exchange': 'song',
        'routing_key': 'song.generate',
        'description': 'Dedicated queue for song generation'
    },
    'video_queue': {
        'exchange': 'video',
        'routing_key': 'video.generate',
        'description': 'Dedicated queue for video generation'
    }
}

3.3 Worker Configuration

Worker	Queue	Concurrency	Role
`lyric-worker`	`lyric_queue`	4	Lyric generation with ChatGPT
`song-worker`	`song_queue`	2	Song generation with the Suno API
`video-worker`	`video_queue`	2	Video rendering with Creatomate
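To keep the process definitions in line with this table, the worker command lines can be generated from it. A sketch that builds them in Python; `-A`, `-Q`, `-c`, `-n` and `--loglevel` are standard `celery worker` options (`%h` expands to the host name in the node name), while `WORKERS` and `worker_command` are hypothetical helpers:

```python
import shlex

# (node name, queue, concurrency), copied from the worker table
WORKERS = [
    ("lyric-worker", "lyric_queue", 4),
    ("song-worker", "song_queue", 2),
    ("video-worker", "video_queue", 2),
]


def worker_command(name: str, queue: str, concurrency: int) -> list[str]:
    """Build the argv for one dedicated worker that consumes a single queue."""
    return [
        "celery", "-A", "app.celery_app", "worker",
        "-Q", queue,               # subscribe to this queue only
        "-c", str(concurrency),    # number of child processes
        "-n", f"{name}@%h",        # unique node name per host
        "--loglevel", "INFO",
    ]


if __name__ == "__main__":
    for name, queue, concurrency in WORKERS:
        print(shlex.join(worker_command(name, queue, concurrency)))
```

Running each printed command as its own process (or container) yields one dedicated worker pool per queue, with the concurrency shown in the table.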

3.4 Task Chaining Strategy

┌─────────────────────────────────────────────────────────────────────────────┐
│                           Task Chaining Strategy                            │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                             │
│   [Option 1] Celery chain (X) - not used                                    │
│   ─────────────────────────────────────────                                 │
│   chain(lyric_task.s() | song_task.s() | video_task.s())                   │
│   → Problem: tight coupling; a mid-stage failure fails the whole chain      │
│                                                                             │
│   [Option 2] Independent queues + task_id handoff (O) - adopted             │
│   ─────────────────────────────────────────                                 │
│   lyric_task completes → publish task_id to song_queue                      │
│   song_task completes → publish task_id to video_queue                      │
│   → Benefit: loose coupling; each stage can be retried independently        │
│                                                                             │
└─────────────────────────────────────────────────────────────────────────────┘
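The adopted option can be simulated without a broker to see why a failure stays local: each stage drains only its own queue and, on success, publishes nothing but the `task_id` to the next queue. A stdlib sketch in which `deque` objects stand in for the Redis queues; `NEXT_QUEUE` and `run_stage` are hypothetical:

```python
from collections import deque
from typing import Callable

# Successor of each queue; None marks the end of the pipeline
NEXT_QUEUE = {"lyric_queue": "song_queue", "song_queue": "video_queue", "video_queue": None}


def run_stage(queues: dict, failed: list, queue_name: str,
              handler: Callable[[str], None]) -> None:
    """Drain one queue; on success hand the task_id to the next queue."""
    q = queues[queue_name]
    while q:
        task_id = q.popleft()["task_id"]
        try:
            handler(task_id)
        except Exception:
            # The failure stays in this stage; nothing is published downstream
            failed.append((queue_name, task_id))
            continue
        next_queue = NEXT_QUEUE[queue_name]
        if next_queue is not None:
            # Loose coupling: only the task_id travels between stages
            queues[next_queue].append({"task_id": task_id})
```

A message that fails in `song_queue` never reaches `video_queue`, and re-enqueueing just that `task_id` resumes the pipeline at the failed stage without rerunning lyric generation.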

4. Data Flow in Detail

4.1 End-to-End Pipeline Sequence

sequenceDiagram
    participant C as Client
    participant API as FastAPI
    participant LQ as lyric_queue
    participant LW as Lyric Worker
    participant SQ as song_queue
    participant SW as Song Worker
    participant VQ as video_queue
    participant VW as Video Worker
    participant DB as MySQL
    participant RB as Redis (Result)

    %% Phase 1: lyric generation request
    rect rgb(240, 248, 255)
        Note over C,RB: Phase 1: Lyric generation
        C->>API: POST /lyric/generate
        API->>DB: Create Lyric record (pending)
        API->>LQ: lyric_task.delay(task_id)
        API-->>C: {"task_id": "xxx", "status": "pending"}

        LQ->>LW: Deliver message
        LW->>DB: status = "processing"
        LW->>LW: Call ChatGPT API
        LW->>DB: status = "completed", save lyric_result
        LW->>RB: Store result state
        LW->>SQ: song_task.delay(task_id)
    end

    %% Phase 2: song generation
    rect rgb(255, 248, 240)
        Note over C,RB: Phase 2: Song generation
        SQ->>SW: Deliver message
        SW->>DB: Query Lyric, create Song record
        SW->>DB: status = "processing"
        SW->>SW: Call Suno API + poll
        SW->>DB: status = "completed", save song_result_url
        SW->>RB: Store result state
        SW->>VQ: video_task.delay(task_id)
    end

    %% Phase 3: video generation
    rect rgb(248, 255, 240)
        Note over C,RB: Phase 3: Video generation
        VQ->>VW: Deliver message
        VW->>DB: Query Song, Lyric, Image
        VW->>DB: Create Video record, status = "processing"
        VW->>VW: Call Creatomate API + poll
        VW->>DB: status = "completed", save result_movie_url
        VW->>RB: Store final result
    end

    %% Status query
    rect rgb(248, 248, 248)
        Note over C,RB: Status query (available at any time)
        C->>API: GET /pipeline/status/{task_id}
        API->>DB: Query Lyric, Song, Video status
        API->>RB: Query Celery task state
        API-->>C: Aggregated status response
    end
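The aggregated status response at the end of the sequence has to merge three stage records into a single answer. A sketch of one possible merge rule, in which the first stage that is not yet completed decides the overall status; the function name and response shape are assumptions, not the existing `/pipeline/status` endpoint:

```python
STAGE_ORDER = ("lyric", "song", "video")


def aggregate_status(stages: dict[str, str]) -> dict:
    """Collapse per-stage statuses into one pipeline-level response.

    `stages` maps a stage name to its status; a missing stage counts as
    "pending" because its record has not been created yet.
    """
    per_stage = {stage: stages.get(stage, "pending") for stage in STAGE_ORDER}
    for stage in STAGE_ORDER:
        if per_stage[stage] != "completed":
            # The first unfinished stage determines the overall status
            return {"status": per_stage[stage], "current_stage": stage, "stages": per_stage}
    return {"status": "completed", "current_stage": "video", "stages": per_stage}
```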

4.2 Data Flow per Stage

4.2.1 Lyric Generation (Lyric Task)

┌─────────────────────────────────────────────────────────────────────────────┐
│                         Lyric Generation Data Flow                          │
└─────────────────────────────────────────────────────────────────────────────┘

Input data (API → lyric_queue)
─────────────────────────────────
{
    "task_id": "0192abc-...",           # UUID7 (unique project identifier)
    "customer_name": "스테이 머뭄",
    "region": "군산",
    "detail_region_info": "군산 신흥동 카페거리",
    "language": "Korean"
}

Processing steps (Lyric Worker)
─────────────────────────────────
1. Look up or create the Project in the DB
2. Create the Lyric record (status=processing)
3. Build the ChatGPT prompt
4. Call the ChatGPT API
5. Parse and validate the result
6. Update the DB (status=completed, lyric_result)
7. Publish the task_id to song_queue

Output data (Lyric Worker → song_queue)
─────────────────────────────────
{
    "task_id": "0192abc-...",           # the same task_id is passed along
    "trigger": "lyric_completed"
}
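The input and output payloads above can be pinned down as small typed structures, so a malformed message fails at the queue boundary instead of deep inside a worker. A sketch with stdlib dataclasses; `LyricRequest`, `StageHandoff`, `handoff_after_lyric`, and the validation rules are assumptions, not existing project code:

```python
from dataclasses import asdict, dataclass

ALLOWED_TRIGGERS = {"lyric_completed", "song_completed"}


@dataclass(frozen=True)
class LyricRequest:
    # Input: API → lyric_queue
    task_id: str
    customer_name: str
    region: str
    detail_region_info: str
    language: str = "Korean"

    def __post_init__(self):
        if not self.task_id:
            raise ValueError("task_id is required")


@dataclass(frozen=True)
class StageHandoff:
    # Output: worker → next queue; only the id and the trigger travel
    task_id: str
    trigger: str

    def __post_init__(self):
        if self.trigger not in ALLOWED_TRIGGERS:
            raise ValueError(f"unknown trigger: {self.trigger}")


def handoff_after_lyric(request: LyricRequest) -> dict:
    """Build the kwargs the Lyric Worker publishes to song_queue."""
    return asdict(StageHandoff(task_id=request.task_id, trigger="lyric_completed"))
```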

4.2.2 Song Generation (Song Task)

┌─────────────────────────────────────────────────────────────────────────────┐
│                          Song Generation Data Flow                          │
└─────────────────────────────────────────────────────────────────────────────┘

Input data (received from song_queue)
─────────────────────────────────
{
    "task_id": "0192abc-...",
    "trigger": "lyric_completed"
}

Processing steps (Song Worker)
─────────────────────────────────
1. Look up the Lyric in the DB (by task_id)
2. Extract the lyrics from lyric_result
3. Create the Song record (status=processing)
4. Call the Suno API (music generation request)
5. Poll Suno status (until SUCCESS)
6. Download the audio + upload it to Azure Blob
7. Save SongTimestamp rows (lyric timing)
8. Update the DB (status=completed)
9. Publish the task_id to video_queue

Output data (Song Worker → video_queue)
─────────────────────────────────
{
    "task_id": "0192abc-...",
    "trigger": "song_completed"
}

4.2.3 Video Generation (Video Task)

┌─────────────────────────────────────────────────────────────────────────────┐
│                         Video Generation Data Flow                          │
└─────────────────────────────────────────────────────────────────────────────┘

Input data (received from video_queue)
─────────────────────────────────
{
    "task_id": "0192abc-...",
    "trigger": "song_completed",
    "orientation": "vertical"           # optional; absent from the Song Worker's message above, so the worker needs a default
}

Processing steps (Video Worker)
─────────────────────────────────
1. Look up Project, Lyric, Song, and Image in the DB
2. Create the Video record (status=processing)
3. Fetch the Creatomate template
4. Modify the template (images, music, lyric timing)
5. Submit the Creatomate render request
6. Poll render status (until succeeded)
7. Download the video + upload it to Azure Blob
8. Update the DB (status=completed)
9. Pipeline complete (no next queue)

Final output
─────────────────────────────────
Video.result_movie_url = "https://blob.azure.../video.mp4"

4.3 State Transition Diagram

stateDiagram-v2
    [*] --> pending: API request received

    state "Lyric Phase" as LP {
        pending --> lyric_processing: lyric_queue processing starts
        lyric_processing --> lyric_completed: ChatGPT succeeded
        lyric_processing --> lyric_failed: ChatGPT failed
        lyric_failed --> lyric_processing: retry
    }

    state "Song Phase" as SP {
        lyric_completed --> song_processing: song_queue processing starts
        song_processing --> song_uploading: Suno succeeded, uploading
        song_uploading --> song_completed: Azure upload complete
        song_processing --> song_failed: Suno failed
        song_failed --> song_processing: retry
    }

    state "Video Phase" as VP {
        song_completed --> video_processing: video_queue processing starts
        video_processing --> video_rendering: Creatomate rendering in progress
        video_rendering --> video_completed: rendering + upload complete
        video_processing --> video_failed: rendering failed
        video_failed --> video_processing: retry
    }

    video_completed --> [*]: pipeline complete
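The diagram can double as a guard: before persisting a new status, a worker can check that the transition is one of the edges drawn above. A sketch that encodes those edges as a dict; `ALLOWED`, `InvalidTransition`, and `transition` are hypothetical names:

```python
# Edges of the state diagram: current state -> states it may move to
ALLOWED = {
    "pending": {"lyric_processing"},
    "lyric_processing": {"lyric_completed", "lyric_failed"},
    "lyric_failed": {"lyric_processing"},
    "lyric_completed": {"song_processing"},
    "song_processing": {"song_uploading", "song_failed"},
    "song_uploading": {"song_completed"},
    "song_failed": {"song_processing"},
    "song_completed": {"video_processing"},
    "video_processing": {"video_rendering", "video_failed"},
    "video_rendering": {"video_completed"},
    "video_failed": {"video_processing"},
    "video_completed": set(),  # terminal state
}


class InvalidTransition(Exception):
    pass


def transition(current: str, new: str) -> str:
    """Return the new state, or raise if the diagram has no such edge."""
    if new not in ALLOWED.get(current, set()):
        raise InvalidTransition(f"{current} -> {new}")
    return new
```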

5. Queue and Task Behavior in Detail

5.1 Redis Queue Structure

┌─────────────────────────────────────────────────────────────────────────────┐
│                             Redis Key Structure                             │
└─────────────────────────────────────────────────────────────────────────────┘

# Celery broker queues (List type; kombu's Redis transport keys each list by the queue name)
─────────────────────────────────
lyric_queue                     # lyric tasks waiting to run
song_queue                      # song tasks waiting to run
video_queue                     # video tasks waiting to run

# Celery result backend (String type)
─────────────────────────────────
celery-task-meta-{celery_task_id}   # individual task result

# Custom status tracking (Hash type)
─────────────────────────────────
pipeline:{task_id}:status       # overall pipeline status
pipeline:{task_id}:lyric        # lyric stage details
pipeline:{task_id}:song         # song stage details
pipeline:{task_id}:video        # video stage details
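Building these key names in one module avoids drift between the workers that write them and the status API that reads them. A sketch; the helper names are hypothetical, while the `celery-task-meta-` prefix is the one Celery's Redis result backend uses:

```python
STAGES = ("lyric", "song", "video")
PIPELINE_TTL_SECONDS = 86400  # 24 hours, the expiry used for these keys in this design


def pipeline_status_key(task_id: str) -> str:
    return f"pipeline:{task_id}:status"


def pipeline_stage_key(task_id: str, stage: str) -> str:
    if stage not in STAGES:
        raise ValueError(f"unknown stage: {stage}")
    return f"pipeline:{task_id}:{stage}"


def celery_result_key(celery_task_id: str) -> str:
    # Keyed by the Celery task id, not the project task_id
    return f"celery-task-meta-{celery_task_id}"


def all_pipeline_keys(task_id: str) -> list[str]:
    """Every custom key for one project, e.g. for cleanup or a debug dump."""
    return [pipeline_status_key(task_id)] + [pipeline_stage_key(task_id, s) for s in STAGES]
```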

5.2 Message Format

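# Celery message structure (JSON serialization). Simplified view: under Celery's
# default message protocol 2, id/task/retries/eta/expires travel in the message
# headers, and args/kwargs in the message body.
{
    "id": "celery-task-uuid",           # Celery task ID
    "task": "app.tasks.lyric_tasks.generate_lyric",  # registered task name
    "args": [],                          # positional arguments
    "kwargs": {                          # keyword arguments
        "task_id": "0192abc-...",
        "customer_name": "스테이 머뭄",
        "region": "군산",
        "detail_region_info": "...",
        "language": "Korean"
    },
    "retries": 0,                        # current retry count
    "eta": null,                         # scheduled run time (null = run immediately)
    "expires": null                      # expiration time
}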
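Since `task_serializer='json'`, every kwarg has to survive JSON encoding. Kombu's JSON encoder handles a few extra types such as datetimes, but arbitrary objects like an ORM instance or a DB session cannot be sent, which is one more reason to pass only the `task_id`. A hedged pre-flight check that is deliberately stricter than kombu, using only the stdlib `json` module; `ensure_json_kwargs` is a hypothetical helper:

```python
import json
from typing import Any


def ensure_json_kwargs(kwargs: dict[str, Any]) -> dict[str, Any]:
    """Return kwargs unchanged if they survive a JSON round trip, else raise TypeError."""
    try:
        encoded = json.dumps(kwargs)
    except TypeError as exc:
        raise TypeError(f"task kwargs are not JSON-serializable: {exc}") from exc
    if json.loads(encoded) != kwargs:
        # e.g. tuples come back as lists, non-str dict keys come back as str
        raise TypeError("task kwargs change shape after a JSON round trip")
    return kwargs
```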

5.3 Task Behavior in Detail

5.3.1 Lyric Task Behavior

┌─────────────────────────────────────────────────────────────────────────────┐
│                    Lyric Task Execution Flow                                │
└─────────────────────────────────────────────────────────────────────────────┘

Time →
─────────────────────────────────────────────────────────────────────────────►

[T+0ms]  BRPOP a message from lyric_queue
         │
         ▼
[T+5ms]  Task starts, Celery state = STARTED (reported only with task_track_started=True)
         │
         ▼
[T+10ms] Acquire DB session
         │
         ├─── Look up or create Project
         │
         ├─── Create Lyric record
         │    - task_id = "0192abc-..."
         │    - status = "processing"
         │    - lyric_prompt = (prompt stored)
         │
         ▼
[T+50ms] Release DB session (before the external API call)
         │
         ▼
[T+100ms ~ T+5000ms]  ChatGPT API call
         │
         ├─── On success ────────────────────────────┐
         │                                           │
         ▼                                           ▼
[T+5100ms] Re-acquire DB session         [On failure: exception raised]
         │                                           │
         ├─── Lyric.status = "completed"             ├─── retry() or
         │    Lyric.lyric_result = "..."             │    state = FAILURE
         │                                           │
         ▼                                           ▼
[T+5150ms] Commit and release DB session  [Retry logic runs]
         │
         ▼
[T+5200ms] Publish the next task to song_queue
         │
         └─── song_task.apply_async(
                  kwargs={"task_id": "0192abc-..."},
                  queue="song_queue"
              )
         │
         ▼
[T+5250ms] Celery state = SUCCESS
         │
         └─── Task complete
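The key property of this timeline is that no DB session is held during the multi-second ChatGPT call. A stdlib `asyncio` sketch of that shape, in which a fake session factory counts open sessions and a fake API call records how many were open while it ran; `FakeSessionFactory`, `call_chatgpt`, and `generate` are hypothetical stand-ins, not the project's classes:

```python
import asyncio
from contextlib import asynccontextmanager


class FakeSessionFactory:
    """Counts open 'sessions' so we can check none is held during the API call."""

    def __init__(self) -> None:
        self.open_sessions = 0
        self.max_open_during_api = 0

    @asynccontextmanager
    async def session(self):
        self.open_sessions += 1
        try:
            yield self
        finally:
            self.open_sessions -= 1


async def call_chatgpt(db: FakeSessionFactory, prompt: str) -> str:
    # Record how many sessions are open while the slow call is in flight
    db.max_open_during_api = max(db.max_open_during_api, db.open_sessions)
    await asyncio.sleep(0.01)  # stands in for a multi-second HTTP call
    return f"lyrics for {prompt}"


async def generate(db: FakeSessionFactory, prompt: str) -> str:
    async with db.session():   # T+10ms: create the Lyric row, then release
        pass
    result = await call_chatgpt(db, prompt)  # no session held here
    async with db.session():   # T+5100ms: re-acquire and store the result
        pass
    return result
```

Moving the API call inside the first `async with` block would raise `max_open_during_api` to 1, which is exactly the pooled connection the design avoids holding for seconds at a time.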

5.3.2 Song Task Behavior

┌─────────────────────────────────────────────────────────────────────────────┐
│                    Song Task Execution Flow                                 │
└─────────────────────────────────────────────────────────────────────────────┘

[Receive] Message received from song_queue
         │
         ▼
[Validate] Check Lyric status
         │
         ├─── lyric.status != "completed" → raise exception, wait for retry
         │
         ▼
[Create] Create Song record
         │
         ├─── status = "processing"
         │    song_prompt = lyric.lyric_result + genre
         │
         ▼
[API]    Call Suno API (music generation)
         │
         ├─── Receive suno_task_id
         │
         ▼
[Poll]   Poll Suno status (up to 5 minutes, every 10 seconds)
         │
         ├─── PENDING → wait
         ├─── processing → wait
         ├─── SUCCESS → next step
         └─── failed → raise exception
         │
         ▼
[Upload] Download audio + upload to Azure Blob
         │
         ├─── Obtain song_result_url
         │
         ▼
[Timestamps] Save SongTimestamp rows
         │
         ├─── Receive lyric timing data from the Suno API
         │    Store start_time and end_time for each lyric line
         │
         ▼
[Done]   Song.status = "completed"
         │
         ▼
[Handoff] Publish task_id to video_queue
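The polling step (up to 5 minutes at 10-second intervals, so roughly 30 checks) can be written as a loop with an explicit deadline. A sketch with the clock and `sleep` injected so it can be tested without waiting; `poll_until`, `PollTimeout`, `PollFailed`, and the status sets passed in are assumptions based on the flow above, not the Suno client's API:

```python
import time
from typing import Callable


class PollTimeout(Exception):
    pass


class PollFailed(Exception):
    pass


def poll_until(
    fetch_status: Callable[[], str],
    success: set,
    failure: set,
    timeout_s: float = 300.0,   # 5 minutes
    interval_s: float = 10.0,   # 10 seconds between checks
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Call fetch_status until it reports success or failure, or the deadline passes."""
    deadline = clock() + timeout_s
    while True:
        status = fetch_status()
        if status in success:
            return status
        if status in failure:
            raise PollFailed(status)
        if clock() + interval_s > deadline:
            raise PollTimeout(f"last status: {status}")
        sleep(interval_s)
```

Sleeping inside the task keeps one worker slot busy for the whole poll, which is one reason the song and video workers are sized separately from the lyric worker. The same helper fits the Creatomate poll with `timeout_s=600` and that API's status names.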

5.3.3 Video Task Behavior

┌─────────────────────────────────────────────────────────────────────────────┐
│                    Video Task Execution Flow                                │
└─────────────────────────────────────────────────────────────────────────────┘

[Receive] Message received from video_queue
         │
         ▼
[Validate] Check Song status
         │
         ├─── song.status != "completed" → raise exception
         ├─── song.song_result_url missing → raise exception
         │
         ▼
[Query]  Load related data
         │
         ├─── Project, Lyric, Song
         ├─── Image list (ordered by img_order)
         ├─── SongTimestamp list
         │
         ▼
[Template] Prepare the Creatomate template
         │
         ├─── Fetch template (vertical/horizontal)
         ├─── Map images
         ├─── Set music URL
         ├─── Set lyrics + timing
         ├─── Adjust duration
         │
         ▼
[Render] Call Creatomate API
         │
         ├─── Receive creatomate_render_id
         │
         ▼
[Poll]   Poll render status (up to 10 minutes)
         │
         ├─── planned → wait
         ├─── rendering → wait
         ├─── succeeded → next step
         └─── failed → raise exception
         │
         ▼
[Upload] Download video + upload to Azure Blob
         │
         ▼
[Done]   Video.status = "completed"
         Save result_movie_url
         │
         └─── Pipeline complete (no next queue)
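The validation step at the top of the video flow is worth making explicit, because a message can arrive before the song is actually usable, for example after a manual reprocess. A sketch with a hypothetical `SongSnapshot` record and `check_video_preconditions` helper; the empty-image check is an extra assumption beyond the two checks listed above:

```python
from dataclasses import dataclass, field
from typing import Optional


class PreconditionError(Exception):
    """Raised when the video stage cannot start yet."""


@dataclass
class SongSnapshot:
    # Hypothetical read-only view of the Song row
    status: str
    song_result_url: Optional[str]
    timestamps: list = field(default_factory=list)  # SongTimestamp rows


def check_video_preconditions(song: Optional[SongSnapshot], image_urls: list) -> None:
    """Raise PreconditionError unless the video stage has everything it needs."""
    if song is None:
        raise PreconditionError("no Song record for this task_id")
    if song.status != "completed":
        raise PreconditionError(f"song status is {song.status!r}, expected 'completed'")
    if not song.song_result_url:
        raise PreconditionError("song_result_url is missing")
    if not image_urls:
        # Assumption: a render without images is treated as an error
        raise PreconditionError("no images to map into the template")
```

Because `BaseTaskWithDB` retries on any exception, raising here turns "not ready yet" into a delayed retry rather than a lost message.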

5.4 Worker Isolation Principle

┌─────────────────────────────────────────────────────────────────────────────┐
│                         Worker Isolation Principle                          │
└─────────────────────────────────────────────────────────────────────────────┘

                    ┌──────────────────┐
                    │   lyric_queue    │
                    └────────┬─────────┘
                             │
            ┌────────────────┼────────────────┐
            │                │                │
            ▼                ▼                ▼
    ┌──────────────┐ ┌──────────────┐ ┌──────────────┐
    │Lyric Worker 1│ │Lyric Worker 2│ │Lyric Worker 3│
    └──────────────┘ └──────────────┘ └──────────────┘
            │
            │  ※ A Lyric Worker never processes song_queue or video_queue
            │  ※ Each worker processes only messages from the queues it subscribes to
            │
            ▼
    ┌─────────────────────────────────────────────────────────────────────┐
    │                      How isolation is enforced                      │
    ├─────────────────────────────────────────────────────────────────────┤
    │ 1. Name the subscribed queue with -Q when starting the worker       │
    │    celery -A app.celery_app worker -Q lyric_queue                   │
    │                                                                     │
    │ 2. Task definition: @app.task(queue='lyric_queue')                  │
    │                                                                     │
    │ 3. Task call: .apply_async(queue='lyric_queue')                     │
    └─────────────────────────────────────────────────────────────────────┘
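Because isolation depends on the routing table and the `-Q` subscriptions agreeing, a small consistency check can catch a route that points at a queue no worker consumes, or a worker that subscribes to more than one queue. A sketch over hypothetical copies of both tables (`TASK_ROUTES`, `WORKER_QUEUES`, and `isolation_problems` are illustrative names):

```python
# Hypothetical copies of the routing table and the per-worker -Q subscriptions
TASK_ROUTES = {
    "app.tasks.lyric_tasks.*": {"queue": "lyric_queue"},
    "app.tasks.song_tasks.*": {"queue": "song_queue"},
    "app.tasks.video_tasks.*": {"queue": "video_queue"},
}
WORKER_QUEUES = {
    "lyric-worker": {"lyric_queue"},
    "song-worker": {"song_queue"},
    "video-worker": {"video_queue"},
}


def isolation_problems(routes: dict, worker_queues: dict) -> list[str]:
    """List violations of 'one worker, one queue' and 'every route is consumed'."""
    problems = []
    for name, queues in worker_queues.items():
        if len(queues) != 1:
            problems.append(f"{name} subscribes to {sorted(queues)}")
    consumed = set().union(*worker_queues.values())
    for pattern, route in routes.items():
        if route["queue"] not in consumed:
            problems.append(f"{pattern} routes to {route['queue']}, which no worker consumes")
    return problems
```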

6. Code Implementation

6.1 Project Structure

app/
├── celery_app.py              # Celery app instance and configuration
├── celery_config.py           # Detailed Celery settings
├── tasks/
│   ├── __init__.py
│   ├── base.py                # Shared base task class
│   ├── lyric_tasks.py         # Lyric generation tasks
│   ├── song_tasks.py          # Song generation tasks
│   └── video_tasks.py         # Video generation tasks
├── workers/
│   ├── __init__.py
│   └── utils.py               # Worker utilities
└── api/
    └── routers/
        └── v1/
            └── pipeline.py    # Unified pipeline API

6.2 Celery App Configuration

# app/celery_app.py
"""
Creates and configures the Celery application instance.

This module is loaded when a Celery worker starts, and it defines
the broker (Redis), the result backend, and the task settings.
"""

from celery import Celery
from kombu import Queue, Exchange
import os

# ============================================================================
# Create the Celery app instance
# ============================================================================
# `include` lists the task modules the worker imports at startup;
# importing them registers the tasks they define.
celery_app = Celery(
    'o2o_castad',
    broker=os.getenv('CELERY_BROKER_URL', 'redis://localhost:6379/0'),
    backend=os.getenv('CELERY_RESULT_BACKEND', 'redis://localhost:6379/1'),
    include=[
        'app.tasks.lyric_tasks',   # lyric task module
        'app.tasks.song_tasks',    # song task module
        'app.tasks.video_tasks',   # video task module
    ]
)

# ============================================================================
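# Queue definitions - one independent queue per task type
# ============================================================================
# Exchange: defines the message routing rule (direct = exact routing_key match)
# Queue: the buffer where messages are actually stored
# routing_key: the key used to route a message to a specific queue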

lyric_exchange = Exchange('lyric', type='direct')
song_exchange = Exchange('song', type='direct')
video_exchange = Exchange('video', type='direct')

celery_app.conf.task_queues = (
    # Lyric queue: receives only messages whose routing key is exactly 'lyric.generate' (direct exchange)
    Queue(
        'lyric_queue',
        lyric_exchange,
        routing_key='lyric.generate',
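        # Priority queue. x-max-priority is a RabbitMQ (AMQP) queue argument; the Redis
        # transport emulates priorities via broker_transport_options['priority_steps'] instead
        queue_arguments={'x-max-priority': 10}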
    ),
    # Song generation queue
    Queue(
        'song_queue',
        song_exchange,
        routing_key='song.generate',
        queue_arguments={'x-max-priority': 10}
    ),
    # Video generation queue
    Queue(
        'video_queue',
        video_exchange,
        routing_key='video.generate',
        queue_arguments={'x-max-priority': 10}
    ),
)

# ============================================================================
# Task routing - task name → queue mapping
# ============================================================================
# Determines which queue each task is published to.
# With this setting, .delay() calls reach the correct queue without naming it.

celery_app.conf.task_routes = {
    # Glob pattern: app.tasks.lyric_tasks.* → lyric_queue
    'app.tasks.lyric_tasks.*': {
        'queue': 'lyric_queue',
        'routing_key': 'lyric.generate',
    },
    'app.tasks.song_tasks.*': {
        'queue': 'song_queue',
        'routing_key': 'song.generate',
    },
    'app.tasks.video_tasks.*': {
        'queue': 'video_queue',
        'routing_key': 'video.generate',
    },
}

# ============================================================================
# Detailed Celery settings
# ============================================================================
celery_app.conf.update(
    # ------------------------------------
    # Serialization
    # ------------------------------------
    task_serializer='json',          # serialization format for task arguments
    accept_content=['json'],         # accepted content types
    result_serializer='json',        # serialization format for results

    # ------------------------------------
    # Time zone
    # ------------------------------------
    timezone='Asia/Seoul',
    enable_utc=True,

    # ------------------------------------
    # Task execution
    # ------------------------------------
    task_acks_late=True,             # ACK after the task finishes (for failure recovery)
    task_reject_on_worker_lost=True, # requeue the task if the worker process is lost
    worker_prefetch_multiplier=1,    # fetch one message at a time (fair distribution)
    task_track_started=True,         # report the STARTED state (disabled by default)

    # ------------------------------------
    # Result backend
    # ------------------------------------
    result_expires=86400,            # keep results for 24 hours
    result_extended=True,            # also store task name, args, kwargs, etc.

    # ------------------------------------
    # Retries
    # ------------------------------------
    # Celery has no global task_max_retries / task_default_retry_delay settings,
    # so retry defaults live on BaseTaskWithDB and on each @celery_app.task(...).

    # ------------------------------------
    # Workers
    # ------------------------------------
    worker_concurrency=4,            # default concurrency (override per worker with -c)
    worker_max_tasks_per_child=100,  # recycle child processes to contain memory leaks
)
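To check which queue a given task name resolves to under `task_routes`, the glob lookup can be approximated with `fnmatch`. This is a test and debugging aid, not Celery's router; `resolve_queue` and its exact-key-first ordering are assumptions:

```python
from fnmatch import fnmatchcase
from typing import Optional

# Mirror of celery_app.conf.task_routes (kept in sync by hand in this sketch)
TASK_ROUTES = {
    "app.tasks.lyric_tasks.*": {"queue": "lyric_queue", "routing_key": "lyric.generate"},
    "app.tasks.song_tasks.*": {"queue": "song_queue", "routing_key": "song.generate"},
    "app.tasks.video_tasks.*": {"queue": "video_queue", "routing_key": "video.generate"},
}


def resolve_queue(task_name: str, routes: dict = TASK_ROUTES,
                  default: Optional[str] = None) -> Optional[str]:
    """Return the queue for task_name: an exact key first, then the first matching glob."""
    if task_name in routes:
        return routes[task_name]["queue"]
    for pattern, route in routes.items():
        if fnmatchcase(task_name, pattern):
            return route["queue"]
    return default
```

A task name that matches no pattern falls back to `task_default_queue` (`'celery'` unless configured), which none of the three workers consume, so such a task would sit in Redis unprocessed.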

6.3 베이스 태스크 클래스

# app/tasks/base.py
"""
Shared base task class.

Base class inherited by every task; it provides the common logic
(DB session management, error handling, status updates).
"""

from celery import Task
from typing import Optional, Any
from sqlalchemy.ext.asyncio import AsyncSession
import asyncio
import logging
import redis

from app.database.session import BackgroundSessionLocal
from app.celery_app import celery_app

logger = logging.getLogger(__name__)

# Redis client (for pipeline status tracking)
redis_client = redis.Redis.from_url(
    celery_app.conf.result_backend,
    decode_responses=True
)


class BaseTaskWithDB(Task):
    """
    Base class for tasks that work with the database.

    Features:
    - DB session management (used with `async with`)
    - Automatic retries on failure
    - Pipeline status tracking
    - Running async functions from synchronous task bodies

    Example:
        @celery_app.task(base=BaseTaskWithDB, bind=True)
        def my_task(self, task_id: str):
            async def _run():
                async with self.get_db_session() as session:
                    # DB work
                    pass
            return self.run_async(_run())
    """

    # Abstract task (not meant to be registered or run directly)
    abstract = True

    # Retry settings (subclasses may override)
    autoretry_for = (Exception,)     # retry on any exception
    retry_backoff = True              # exponential backoff
    retry_backoff_max = 600           # cap each wait at 10 minutes
    retry_jitter = True               # add random jitter to each wait
    max_retries = 3                   # at most 3 retries

    def get_db_session(self) -> AsyncSession:
        """
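        Acquire a background DB session.

        Uses BackgroundSessionLocal to take a connection from a pool that is
        isolated from the main API traffic.

        Returns:
            AsyncSession: async SQLAlchemy session
        """
        return BackgroundSessionLocal()

    def run_async(self, coro) -> Any:
        """
        Run an async coroutine synchronously.

        Celery task bodies are synchronous functions, so async DB work needs
        an event loop; this creates a fresh loop for every call.

        Note: pooled async DB connections are bound to the loop that opened
        them. If the engine behind BackgroundSessionLocal keeps a connection
        pool, verify that reuse across loops works (an engine created with
        poolclass=NullPool for workers is one way to sidestep the issue).

        Args:
            coro: coroutine to run

        Returns:
            The coroutine's result
        """
        loop = asyncio.new_event_loop()
        asyncio.set_event_loop(loop)
        try:
            return loop.run_until_complete(coro)
        finally:
            asyncio.set_event_loop(None)  # don't leave a closed loop registered
            loop.close()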

    def update_pipeline_status(
        self,
        task_id: str,
        stage: str,
        status: str,
        message: Optional[str] = None,
        extra_data: Optional[dict] = None
    ):
        """
        Update the pipeline status in Redis.

        Tracks a custom status alongside the Celery result backend, so the
        client can follow the progress of the whole pipeline.

        Args:
            task_id: project task_id (not the Celery task ID)
            stage: current stage ('lyric', 'song', 'video')
            status: status ('pending', 'processing', 'completed', 'failed', 'retrying')
            message: status message
            extra_data: additional data (e.g., a result URL)

        Redis key structure:
            pipeline:{task_id}:status → current stage
            pipeline:{task_id}:{stage} → per-stage details
        """
        from datetime import datetime, timezone

        pipeline_key = f"pipeline:{task_id}:status"
        stage_key = f"pipeline:{task_id}:{stage}"

        # Update the overall status
        redis_client.hset(pipeline_key, mapping={
            'current_stage': stage,
            'status': status,
            # Wall-clock UTC time; loop.time() is a monotonic clock, not a timestamp
            'updated_at': datetime.now(timezone.utc).isoformat(),
        })

        # Per-stage details
        stage_data = {
            'status': status,
            'message': message or '',
        }
        if extra_data:
            stage_data.update(extra_data)

        redis_client.hset(stage_key, mapping=stage_data)

        # Expire automatically after 24 hours
        redis_client.expire(pipeline_key, 86400)
        redis_client.expire(stage_key, 86400)

    def on_failure(self, exc, task_id, args, kwargs, einfo):
        """
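        Callback invoked when the task fails.

        Called once the maximum number of retries is exceeded, or on a
        non-retryable error. Records the failure status in Redis (the task
        body itself is responsible for marking the DB row as failed).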
        """
        project_task_id = kwargs.get('task_id')
        if project_task_id:
            self.update_pipeline_status(
                task_id=project_task_id,
                stage=self._get_stage_name(),
                status='failed',
                message=str(exc)
            )

        logger.error(
            f"Task {self.name} failed: {exc}",
            exc_info=einfo,
            extra={'task_id': project_task_id}
        )

    def on_retry(self, exc, task_id, args, kwargs, einfo):
        """
        Callback invoked when the task is retried.

        Logs the retry attempt and updates the status.
        """
        project_task_id = kwargs.get('task_id')
        retry_count = self.request.retries

        logger.warning(
            f"Task {self.name} retry #{retry_count}: {exc}",
            extra={'task_id': project_task_id}
        )

        if project_task_id:
            self.update_pipeline_status(
                task_id=project_task_id,
                stage=self._get_stage_name(),
                status='retrying',
                message=f"Retry #{retry_count}: {str(exc)}"
            )

    def _get_stage_name(self) -> str:
        """Derive the stage name from the task name"""
        if 'lyric' in self.name:
            return 'lyric'
        elif 'song' in self.name:
            return 'song'
        elif 'video' in self.name:
            return 'video'
        return 'unknown'
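With `autoretry_for` plus `retry_backoff=True`, the retry countdown comes from exponential backoff rather than from `default_retry_delay`. The sketch below mirrors our understanding of Celery's `get_exponential_backoff_interval` helper (factor 1 when `retry_backoff=True`, capped by `retry_backoff_max`, full jitter when `retry_jitter=True`); treat it as an approximation to verify against the installed Celery version:

```python
import random
from typing import Optional


def backoff_countdown(retries: int, factor: int = 1, maximum: int = 600,
                      full_jitter: bool = True,
                      rng: Optional[random.Random] = None) -> int:
    """Seconds to wait before the next retry, given how many retries already ran."""
    countdown = min(maximum, factor * (2 ** retries))
    if full_jitter:
        # Full jitter: pick uniformly from the integers in [0, countdown]
        countdown = (rng or random).randrange(countdown + 1)
    return max(0, countdown)
```

With `max_retries = 3` this yields upper bounds of 1, 2 and 4 seconds (full jitter then picks a value between 0 and that bound), which is short for rate-limited external APIs; setting `retry_backoff` to a number such as 30 uses it as the factor and stretches the bounds to 30, 60 and 120 seconds.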

6.4 Lyric Generation Task

# app/tasks/lyric_tasks.py
"""
Celery task for lyric generation.

This module runs only on workers subscribed to lyric_queue.
When the task completes, it triggers the next stage through song_queue.
"""

from sqlalchemy import select
import logging

from app.celery_app import celery_app
from app.tasks.base import BaseTaskWithDB
from app.home.models import Project
from app.lyric.models import Lyric
from app.utils.chatgpt_prompt import ChatgptService
from app.utils.prompts.prompts import Prompt

logger = logging.getLogger(__name__)


@celery_app.task(
    base=BaseTaskWithDB,
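    bind=True,                          # pass the task instance as `self`
    name='app.tasks.lyric_tasks.generate_lyric',
    queue='lyric_queue',                # explicit queue
    max_retries=3,
    default_retry_delay=30,             # used by manual self.retry(); autoretry with
                                        # retry_backoff computes its own countdown
    acks_late=True,                     # ACK after completion
    reject_on_worker_lost=True,         # requeue if the worker process is lost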
)
def generate_lyric(
    self,
    task_id: str,
    customer_name: str,
    region: str,
    detail_region_info: str,
    language: str = "Korean",
    auto_continue: bool = True,         # continue to song generation automatically when done
) -> dict:
    """
    Lyric generation task.

    Calls the ChatGPT API to generate marketing lyrics, then automatically
    publishes the next task to song_queue.

    Args:
        task_id: unique project identifier (UUID7)
        customer_name: customer/store name
        region: region
        detail_region_info: detailed region information
        language: language (Korean, English, etc.)
        auto_continue: if True, start song generation automatically on completion

    Returns:
        dict: {
            'task_id': str,
            'status': 'completed' | 'failed',
            'lyric_result': str (on success),
            'error': str (on failure)
        }

    Raises:
        Retry: when a retryable error occurs

    Isolation guarantees:
        - This task runs only from lyric_queue
        - It never processes messages from song_queue or video_queue
        - Worker command: celery -A app.celery_app worker -Q lyric_queue
    """

    # ========================================================================
    # Step 1: status update - processing started
    # ========================================================================
    self.update_pipeline_status(
        task_id=task_id,
        stage='lyric',
        status='processing',
        message='Starting lyric generation.'
    )

    async def _generate():
        """Async lyric generation logic"""

        # --------------------------------------------------------------------
        # Step 2: acquire a DB session and prepare the data
        # --------------------------------------------------------------------
        async with self.get_db_session() as session:
            # Look up or create the Project
            project = await session.scalar(
                select(Project).where(Project.task_id == task_id)
            )

            if not project:
                project = Project(
                    task_id=task_id,
                    customer_name=customer_name,
                    region=region,
                )
                session.add(project)
                await session.flush()  # assigns project.id

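            # Build the prompt
            prompt = Prompt(
                customer_name=customer_name,
                region=region,
                detail_region_info=detail_region_info,
                language=language,
            )
            lyric_prompt = prompt.get_full_prompt()

            # Create the Lyric record, or reuse the row left by an earlier attempt
            # so an autoretry does not insert a duplicate (assumes one Lyric row
            # per task_id, per the idempotency principle)
            lyric = await session.scalar(
                select(Lyric).where(Lyric.task_id == task_id)
            )
            if lyric is None:
                lyric = Lyric(
                    project_id=project.id,
                    task_id=task_id,
                    language=language,
                )
                session.add(lyric)
            lyric.status = 'processing'
            lyric.lyric_prompt = lyric_prompt
            await session.flush()  # assigns lyric.id
            # Read the id before commit: with expire_on_commit=True, touching an
            # expired attribute afterwards triggers an implicit refresh, which
            # raises an error under AsyncSession
            lyric_id = lyric.id
            await session.commit()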

        # --------------------------------------------------------------------
        # 3단계: 외부 API 호출 (DB 세션 외부에서)
        # --------------------------------------------------------------------
        # DB 연결을 해제한 상태에서 외부 API를 호출합니다.
        # 이는 커넥션 풀 고갈을 방지합니다.

        try:
            chatgpt = ChatgptService()
            lyric_result = await chatgpt.generate_lyric(lyric_prompt)

            if not lyric_result or len(lyric_result.strip()) < 50:
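                raise ValueError("Generated lyrics are too short.")

        except Exception as e:
            # On failure, mark the record as failed, then re-raise
            async with self.get_db_session() as session:
                lyric = await session.get(Lyric, lyric_id)
                lyric.status = 'failed'
                lyric.lyric_result = f"Error: {str(e)}"
                await session.commit()
            # Re-raise; Celery retries automatically only if the task is
            # configured with autoretry_for (otherwise self.retry must be called)
            raise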

        # --------------------------------------------------------------------
        # Step 4: Save the result
        # --------------------------------------------------------------------
        async with self.get_db_session() as session:
            lyric = await session.get(Lyric, lyric_id)
            lyric.status = 'completed'
            lyric.lyric_result = lyric_result
            await session.commit()

        return {
            'task_id': task_id,
            'lyric_id': lyric_id,
            'lyric_result': lyric_result,
        }

    # Run the coroutine from the synchronous Celery task
    result = self.run_async(_generate())

    # ========================================================================
    # Step 5: Update pipeline status and trigger the next stage
    # ========================================================================
    self.update_pipeline_status(
        task_id=task_id,
        stage='lyric',
        status='completed',
        message='Lyric generation completed.',
        extra_data={'lyric_result': result['lyric_result'][:100] + '...'}
    )

    # --------------------------------------------------------------------
    # Key point: publish a task to the next queue
    # --------------------------------------------------------------------
    # When generate_lyric finishes, it publishes a new task to song_queue.
    # Instead of a Celery chain, the next queue is targeted explicitly,
    # so each task runs fully independently and is linked only by task_id.

    if auto_continue:
        from app.tasks.song_tasks import generate_song

        # Use apply_async to publish to an explicit queue
        generate_song.apply_async(
            kwargs={
                'task_id': task_id,
                'genre': 'pop, ambient',  # default genre
                'auto_continue': True,
            },
            queue='song_queue',           # explicit queue
            routing_key='song.generate',
        )

        logger.info(f"[Lyric→Song] task_id={task_id} published to song_queue")

    return {
        'task_id': task_id,
        'status': 'completed',
        'lyric_result': result['lyric_result'],
        'next_stage': 'song' if auto_continue else None,
    }
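The explicit hand-off above can be captured in a single routing table, so that every stage's next queue is visible in one place. The sketch below is hypothetical (`NEXT_STAGE` and `next_dispatch` do not exist in the codebase); the queue names, routing keys, and default kwargs are copied from the tasks in this section, and the helper only returns the arguments a task would pass to `apply_async`.

```python
from typing import Optional

# Hypothetical routing table: stage -> (next stage, queue, routing key, default kwargs)
NEXT_STAGE = {
    'lyric': ('song', 'song_queue', 'song.generate',
              {'genre': 'pop, ambient', 'auto_continue': True}),
    'song': ('video', 'video_queue', 'video.generate',
             {'orientation': 'vertical'}),
    'video': None,  # final stage: nothing is published
}


def next_dispatch(stage: str, task_id: str, auto_continue: bool) -> Optional[dict]:
    """Return the apply_async arguments for the stage after `stage`, or None."""
    if stage not in NEXT_STAGE:
        raise ValueError(f"Unknown stage: {stage}")
    entry = NEXT_STAGE[stage]
    if not auto_continue or entry is None:
        return None
    next_stage, queue, routing_key, defaults = entry
    return {
        'stage': next_stage,
        # Only task_id links the stages; the rest are per-stage defaults
        'kwargs': {'task_id': task_id, **defaults},
        'queue': queue,
        'routing_key': routing_key,
    }
```

A task would then publish with `generate_song.apply_async(kwargs=d['kwargs'], queue=d['queue'], routing_key=d['routing_key'])`. Because only `task_id` crosses the boundary, a single stage can be re-run in isolation by publishing the same dictionary by hand.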

6.5 Song Generation Task

# app/tasks/song_tasks.py
"""
Song generation Celery task

This module runs only on workers subscribed to song_queue.
It generates music through the Suno API and uploads it to Azure Blob Storage.
On completion it triggers the next stage on video_queue.
"""

from sqlalchemy import select, desc
import aiohttp
import asyncio
import os
import logging

from app.celery_app import celery_app
from app.tasks.base import BaseTaskWithDB
from app.home.models import Project
from app.lyric.models import Lyric
from app.song.models import Song, SongTimestamp
from app.utils.suno import SunoService
from app.utils.upload_blob_as_request import AzureBlobUploader

logger = logging.getLogger(__name__)

# Suno API polling settings
SUNO_POLL_INTERVAL = 10  # seconds
SUNO_MAX_POLL_TIME = 300  # 5 minutes


@celery_app.task(
    base=BaseTaskWithDB,
    bind=True,
    name='app.tasks.song_tasks.generate_song',
    queue='song_queue',                 # song_queue only
    max_retries=3,
    default_retry_delay=60,
    acks_late=True,
    reject_on_worker_lost=True,
    # The Suno API is slow, so set soft and hard time limits
    soft_time_limit=540,                # 9-minute soft limit
    time_limit=600,                     # 10-minute hard limit
)
def generate_song(
    self,
    task_id: str,
    genre: str = "pop, ambient",
    auto_continue: bool = True,
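) -> dict:
    """
    Song generation task

    Calls the Suno API to generate music from the lyrics.
    The generated audio is uploaded to Azure Blob Storage.

    Args:
        task_id: Unique project identifier
        genre: Music genre (passed to Suno as the style)
        auto_continue: If True, start video generation automatically on completion

    Returns:
        dict: {
            'task_id': str,
            'status': str,
            'song_result_url': str,
            'next_stage': 'video' or None,
        }
        Failures are raised as exceptions rather than returned.

    Independence guarantees:
        - This task runs only on workers consuming song_queue
        - It does not process messages from lyric_queue or video_queue
        - Start the worker with: celery -A app.celery_app worker -Q song_queue

    Preconditions:
        - A Lyric record must exist with status='completed'
        - lyric_result must not be empty
    """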

    # ========================================================================
    # Step 1: Update status - processing started
    # ========================================================================
    self.update_pipeline_status(
        task_id=task_id,
        stage='song',
        status='processing',
        message='Starting song generation.'
    )

    async def _generate():
        """Async song generation logic"""

        # --------------------------------------------------------------------
        # Step 2: Check preconditions and load data
        # --------------------------------------------------------------------
        async with self.get_db_session() as session:
            # Load the most recent Lyric
            lyric = await session.scalar(
                select(Lyric)
                .where(Lyric.task_id == task_id)
                .order_by(desc(Lyric.created_at))
            )

            # Validate preconditions
            if not lyric:
                raise ValueError(f"Lyric not found for task_id={task_id}")

            if lyric.status != 'completed':
                # Lyrics are not finished yet, so retry later
                raise self.retry(
                    exc=ValueError(f"Lyric not completed: {lyric.status}"),
                    countdown=30,  # retry after 30 seconds
                )

            if not lyric.lyric_result:
                raise ValueError("Lyric result is empty")

            # Load the Project
            project = await session.get(Project, lyric.project_id)

            # Create the Song record
            song = Song(
                project_id=project.id,
                lyric_id=lyric.id,
                task_id=task_id,
                status='processing',
                song_prompt=f"{lyric.lyric_result}\n\nGenre: {genre}",
                language=lyric.language,
            )
            session.add(song)
            await session.commit()

            song_id = song.id
            lyrics_text = lyric.lyric_result

        # --------------------------------------------------------------------
        # Step 3: Call the Suno API (outside the DB session)
        # --------------------------------------------------------------------
        suno = SunoService()

        try:
            # Request music generation
            suno_response = await suno.generate_music(
                prompt=lyrics_text,
                style=genre,
            )
            suno_task_id = suno_response.get('task_id')

            if not suno_task_id:
                raise ValueError("Suno API did not return task_id")

            # Save suno_task_id to the DB
            async with self.get_db_session() as session:
                song = await session.get(Song, song_id)
                song.suno_task_id = suno_task_id
                await session.commit()

        except Exception as e:
            async with self.get_db_session() as session:
                song = await session.get(Song, song_id)
                song.status = 'failed'
                await session.commit()
            raise

        # --------------------------------------------------------------------
        # Step 4: Poll Suno for status
        # --------------------------------------------------------------------
        # The Suno API generates music asynchronously,
        # so its status is checked periodically until generation finishes.

        elapsed = 0
        audio_url = None
        duration = None
        suno_audio_id = None

        while elapsed < SUNO_MAX_POLL_TIME:
            await asyncio.sleep(SUNO_POLL_INTERVAL)
            elapsed += SUNO_POLL_INTERVAL

            # Update pipeline status (progress display)
            self.update_pipeline_status(
                task_id=task_id,
                stage='song',
                status='processing',
                message=f'Generating music with Suno... ({elapsed}s elapsed)'
            )

            status_response = await suno.get_task_status(suno_task_id)
            # Normalize casing so the 'SUCCESS' and 'FAILED' checks below
            # match regardless of how the service cases its status values
            status = (status_response.get('status') or '').upper()

            logger.info(f"Suno polling: task_id={task_id}, status={status}")

            if status == 'SUCCESS':
                # Take the first clip
                clips = status_response.get('clips', [])
                if clips:
                    audio_url = clips[0].get('audio_url')
                    duration = clips[0].get('duration')
                    suno_audio_id = clips[0].get('id')
                break

            elif status == 'FAILED':
                raise ValueError(f"Suno generation failed: {status_response}")

        if not audio_url:
            raise ValueError("Suno generation timed out or no audio_url")

        # --------------------------------------------------------------------
        # Step 5: Download the audio and upload it to Azure Blob
        # --------------------------------------------------------------------
        async with self.get_db_session() as session:
            song = await session.get(Song, song_id)
            song.status = 'uploading'
            song.suno_audio_id = suno_audio_id
            song.duration = duration
            await session.commit()

        self.update_pipeline_status(
            task_id=task_id,
            stage='song',
            status='uploading',
            message='Uploading the audio file.'
        )

        # Temporary file path
        temp_dir = f"media/temp/{task_id}"
        os.makedirs(temp_dir, exist_ok=True)
        temp_file = f"{temp_dir}/song.mp3"

        try:
            # Download the audio; fail fast on a non-2xx response
            async with aiohttp.ClientSession() as http_session:
                async with http_session.get(audio_url) as response:
                    response.raise_for_status()
                    with open(temp_file, 'wb') as f:
                        f.write(await response.read())

            # Upload to Azure Blob
            uploader = AzureBlobUploader()
            blob_url = await uploader.upload_file(
                file_path=temp_file,
                blob_name=f"songs/{task_id}/song.mp3",
                content_type='audio/mpeg',
            )

        finally:
            # Clean up temporary files; remove the directory only if it is empty,
            # because os.rmdir raises OSError on a non-empty directory
            if os.path.exists(temp_file):
                os.remove(temp_file)
            if os.path.isdir(temp_dir) and not os.listdir(temp_dir):
                os.rmdir(temp_dir)

        # --------------------------------------------------------------------
        # Step 6: Save SongTimestamp rows (lyric timing data)
        # --------------------------------------------------------------------
        try:
            timestamps = await suno.get_lyric_timestamp(suno_audio_id)

            async with self.get_db_session() as session:
                for idx, ts in enumerate(timestamps):
                    song_ts = SongTimestamp(
                        suno_audio_id=suno_audio_id,
                        order_idx=idx,
                        lyric_line=ts.get('text', ''),
                        start_time=ts.get('start_time', 0),
                        end_time=ts.get('end_time', 0),
                    )
                    session.add(song_ts)
                await session.commit()

        except Exception as e:
            logger.warning(f"Failed to save timestamps: {e}")
            # A timestamp failure is not fatal, so generation continues

        # --------------------------------------------------------------------
        # Step 7: Final status update
        # --------------------------------------------------------------------
        async with self.get_db_session() as session:
            song = await session.get(Song, song_id)
            song.status = 'completed'
            song.song_result_url = blob_url
            await session.commit()

        return {
            'task_id': task_id,
            'song_id': song_id,
            'song_result_url': blob_url,
            'duration': duration,
        }

    # Run the coroutine from the synchronous Celery task
    result = self.run_async(_generate())

    # ========================================================================
    # Step 8: Update pipeline status and trigger the next stage
    # ========================================================================
    self.update_pipeline_status(
        task_id=task_id,
        stage='song',
        status='completed',
        message='Song generation completed.',
        extra_data={'song_result_url': result['song_result_url']}
    )

    # --------------------------------------------------------------------
    # Key point: publish a task to the next queue (video_queue)
    # --------------------------------------------------------------------
    if auto_continue:
        from app.tasks.video_tasks import generate_video

        generate_video.apply_async(
            kwargs={
                'task_id': task_id,
                'orientation': 'vertical',  # default
            },
            queue='video_queue',
            routing_key='video.generate',
        )

        logger.info(f"[Song→Video] task_id={task_id} published to video_queue")

    return {
        'task_id': task_id,
        'status': 'completed',
        'song_result_url': result['song_result_url'],
        'next_stage': 'video' if auto_continue else None,
    }
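The Suno loop above and the Creatomate loop in the next task share the same shape: sleep, report progress, fetch a status, stop on success or failure, and give up after a time budget. A minimal sketch of a shared helper follows, assuming the status payload is a dict with a `status` key. `poll_until_done` and `PollTimeout` are hypothetical names; the success and failure strings are supplied by the caller and compared case-insensitively.

```python
import asyncio
from typing import Awaitable, Callable, Optional


class PollTimeout(Exception):
    """Raised when the job does not finish within max_wait seconds."""


async def poll_until_done(
    fetch_status: Callable[[], Awaitable[dict]],
    interval: float,
    max_wait: float,
    success: str,
    failure: str,
    on_tick: Optional[Callable[[float], None]] = None,
) -> dict:
    """Poll fetch_status() until its 'status' equals success or failure."""
    elapsed = 0.0
    while elapsed < max_wait:
        await asyncio.sleep(interval)
        elapsed += interval
        if on_tick is not None:
            on_tick(elapsed)  # e.g. update the pipeline progress message
        response = await fetch_status()
        status = str(response.get('status') or '').upper()
        if status == success.upper():
            return response
        if status == failure.upper():
            raise RuntimeError(f"Job failed: {response}")
    raise PollTimeout(f"Job did not finish within {max_wait} seconds")
```

For Suno this might be called as `await poll_until_done(lambda: suno.get_task_status(suno_task_id), SUNO_POLL_INTERVAL, SUNO_MAX_POLL_TIME, 'SUCCESS', 'failed')`; for Creatomate the strings would be `'succeeded'` and `'failed'`.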

6.6 Video Generation Task

# app/tasks/video_tasks.py
"""
Video generation Celery task

This module runs only on workers subscribed to video_queue.
It renders the video through the Creatomate API and uploads it to Azure Blob Storage.
This task is the final stage of the pipeline.
"""

from sqlalchemy import select, desc
import aiohttp
import asyncio
import os
import logging

from app.celery_app import celery_app
from app.tasks.base import BaseTaskWithDB
from app.home.models import Project, Image
from app.lyric.models import Lyric
from app.song.models import Song, SongTimestamp
from app.video.models import Video
from app.utils.creatomate import CreatomateService
from app.utils.upload_blob_as_request import AzureBlobUploader

logger = logging.getLogger(__name__)

# Creatomate polling settings
CREATOMATE_POLL_INTERVAL = 15  # seconds
CREATOMATE_MAX_POLL_TIME = 600  # 10 minutes

# Template IDs
TEMPLATE_ID_VERTICAL = "e8c7b43f-de4b-4ba3-b8eb-5df688569193"
TEMPLATE_ID_HORIZONTAL = "0f092a6a-f526-4ef0-9181-d4ad4426b9e7"


@celery_app.task(
    base=BaseTaskWithDB,
    bind=True,
    name='app.tasks.video_tasks.generate_video',
    queue='video_queue',                # video_queue only
    max_retries=2,                      # rendering is expensive, so retries are limited
    default_retry_delay=120,            # retry after 2 minutes
    acks_late=True,
    reject_on_worker_lost=True,
    soft_time_limit=840,                # 14-minute soft limit
    time_limit=900,                     # 15-minute hard limit
)
def generate_video(
    self,
    task_id: str,
    orientation: str = "vertical",
) -> dict:
    """
    Video generation task (final pipeline stage)

    Calls the Creatomate API to produce a video combining the music, images, and lyrics.
    The generated video is uploaded to Azure Blob Storage.

    Args:
        task_id: Unique project identifier
        orientation: Video orientation ('vertical' or 'horizontal')

    Returns:
        dict: {
            'task_id': str,
            'status': str,
            'result_movie_url': str,
            'next_stage': None,
        }
        Failures are raised as exceptions rather than returned.

    Independence guarantees:
        - This task runs only on workers consuming video_queue
        - It does not process messages from lyric_queue or song_queue
        - Start the worker with: celery -A app.celery_app worker -Q video_queue

    Preconditions:
        - A Song record must exist with status='completed'
        - song_result_url must be valid
        - The project must have at least one Image
    """

    # ========================================================================
    # Step 1: Update status - processing started
    # ========================================================================
    self.update_pipeline_status(
        task_id=task_id,
        stage='video',
        status='processing',
        message='Starting video generation.'
    )

    async def _generate():
        """Async video generation logic"""

        # --------------------------------------------------------------------
        # Step 2: Check preconditions and load data
        # --------------------------------------------------------------------
        async with self.get_db_session() as session:
            # Load the most recent Song
            song = await session.scalar(
                select(Song)
                .where(Song.task_id == task_id)
                .order_by(desc(Song.created_at))
            )

            # Validate preconditions
            if not song:
                raise ValueError(f"Song not found for task_id={task_id}")

            if song.status != 'completed':
                raise self.retry(
                    exc=ValueError(f"Song not completed: {song.status}"),
                    countdown=60,
                )

            if not song.song_result_url:
                raise ValueError("Song result URL is empty")

            # Load the Lyric
            lyric = await session.get(Lyric, song.lyric_id)

            # Load the Project and its Images
            project = await session.get(Project, song.project_id)

            images = await session.scalars(
                select(Image)
                .where(Image.project_id == project.id)
                .where(Image.is_deleted == False)
                .order_by(Image.img_order)
            )
            image_list = list(images)

            if not image_list:
                raise ValueError("No images found for project")

            # Load SongTimestamp rows
            timestamps = await session.scalars(
                select(SongTimestamp)
                .where(SongTimestamp.suno_audio_id == song.suno_audio_id)
                .order_by(SongTimestamp.order_idx)
            )
            timestamp_list = list(timestamps)

            # Create the Video record
            video = Video(
                project_id=project.id,
                lyric_id=lyric.id,
                song_id=song.id,
                task_id=task_id,
                status='processing',
            )
            session.add(video)
            await session.commit()

            video_id = video.id

            # Copy the values needed later (used outside the session)
            song_url = song.song_result_url
            song_duration = song.duration
            image_urls = [img.image_url for img in image_list]
            lyric_timestamps = [
                {
                    'text': ts.lyric_line,
                    'start': ts.start_time,
                    'end': ts.end_time,
                }
                for ts in timestamp_list
            ]

        # --------------------------------------------------------------------
        # Step 3: Prepare the Creatomate template
        # --------------------------------------------------------------------
        template_id = (
            TEMPLATE_ID_VERTICAL if orientation == 'vertical'
            else TEMPLATE_ID_HORIZONTAL
        )

        creatomate = CreatomateService()

        try:
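            # Fetch the template (note: the result is not used further below)
            template = await creatomate.get_template(template_id)

            # Build the template modifications
            modifications = {
                # Music settings
                'music_url': song_url,
                'duration': song_duration,

                # Image mapping (up to 10)
                **{f'image_{i+1}': url for i, url in enumerate(image_urls[:10])},

                # Lyric timing, passed through as-is; the keys must match
                # the element names and caption format defined in the template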
                'captions': lyric_timestamps,
            }

        except Exception as e:
            async with self.get_db_session() as session:
                video = await session.get(Video, video_id)
                video.status = 'failed'
                await session.commit()
            raise

        # --------------------------------------------------------------------
        # Step 4: Request a Creatomate render
        # --------------------------------------------------------------------
        self.update_pipeline_status(
            task_id=task_id,
            stage='video',
            status='rendering',
            message='Rendering the video on Creatomate.'
        )

        try:
            render_response = await creatomate.render(
                template_id=template_id,
                modifications=modifications,
            )
            render_id = render_response.get('id')

            if not render_id:
                raise ValueError("Creatomate did not return render_id")

            # Save render_id to the DB
            async with self.get_db_session() as session:
                video = await session.get(Video, video_id)
                video.creatomate_render_id = render_id
                await session.commit()

        except Exception as e:
            async with self.get_db_session() as session:
                video = await session.get(Video, video_id)
                video.status = 'failed'
                await session.commit()
            raise

        # --------------------------------------------------------------------
        # Step 5: Poll Creatomate for status
        # --------------------------------------------------------------------
        elapsed = 0
        video_url = None

        while elapsed < CREATOMATE_MAX_POLL_TIME:
            await asyncio.sleep(CREATOMATE_POLL_INTERVAL)
            elapsed += CREATOMATE_POLL_INTERVAL

            self.update_pipeline_status(
                task_id=task_id,
                stage='video',
                status='rendering',
                message=f'Rendering video... ({elapsed}s elapsed)'
            )

            status_response = await creatomate.get_render_status(render_id)
            status = status_response.get('status')

            logger.info(f"Creatomate polling: task_id={task_id}, status={status}")

            if status == 'succeeded':
                video_url = status_response.get('url')
                break

            elif status == 'failed':
                error_msg = status_response.get('error_message', 'Unknown error')
                raise ValueError(f"Creatomate rendering failed: {error_msg}")

        if not video_url:
            raise ValueError("Creatomate rendering timed out")

        # --------------------------------------------------------------------
        # Step 6: Download the video and upload it to Azure Blob
        # --------------------------------------------------------------------
        self.update_pipeline_status(
            task_id=task_id,
            stage='video',
            status='uploading',
            message='Uploading the video file.'
        )

        temp_dir = f"media/temp/{task_id}"
        os.makedirs(temp_dir, exist_ok=True)
        temp_file = f"{temp_dir}/video.mp4"

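        try:
            # Download the video in 1 MiB chunks so large files are not held
            # in memory; fail fast on a non-2xx response
            async with aiohttp.ClientSession() as http_session:
                async with http_session.get(video_url) as response:
                    response.raise_for_status()
                    with open(temp_file, 'wb') as f:
                        async for chunk in response.content.iter_chunked(1024 * 1024):
                            f.write(chunk)

            # Upload to Azure Blob
            uploader = AzureBlobUploader()
            blob_url = await uploader.upload_file(
                file_path=temp_file,
                blob_name=f"videos/{task_id}/video.mp4",
                content_type='video/mp4',
            )

        finally:
            # Clean up temporary files; remove the directory only if it is empty,
            # because os.rmdir raises OSError on a non-empty directory
            if os.path.exists(temp_file):
                os.remove(temp_file)
            if os.path.isdir(temp_dir) and not os.listdir(temp_dir):
                os.rmdir(temp_dir)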

        # --------------------------------------------------------------------
        # Step 7: Final status update
        # --------------------------------------------------------------------
        async with self.get_db_session() as session:
            video = await session.get(Video, video_id)
            video.status = 'completed'
            video.result_movie_url = blob_url
            await session.commit()

        return {
            'task_id': task_id,
            'video_id': video_id,
            'result_movie_url': blob_url,
        }

    # Run the coroutine from the synchronous Celery task
    result = self.run_async(_generate())

    # ========================================================================
    # Step 8: Pipeline complete
    # ========================================================================
    self.update_pipeline_status(
        task_id=task_id,
        stage='video',
        status='completed',
        message='Video generation completed. Pipeline finished.',
        extra_data={'result_movie_url': result['result_movie_url']}
    )

    # This is the final stage, so nothing is published to another queue
    logger.info(f"[Pipeline Complete] task_id={task_id} full pipeline completed")

    return {
        'task_id': task_id,
        'status': 'completed',
        'result_movie_url': result['result_movie_url'],
        'next_stage': None,  # final stage
    }
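Both tasks build a fixed `media/temp/{task_id}` directory and remove it by hand. An alternative, sketched below, is the standard library's `tempfile.TemporaryDirectory`, which creates a unique directory per run and deletes it together with its contents on exit, even when an exception is raised. `with_temp_file` and the `castad-` prefix are illustrative names only. The sketch is synchronous so it can be tested in isolation, but the same `with` block works unchanged inside an `async def` around the download and `await uploader.upload_file(...)`.

```python
import os
import tempfile
from typing import Callable, TypeVar

T = TypeVar('T')


def with_temp_file(data: bytes, filename: str, process: Callable[[str], T]) -> T:
    """Write data into a private temp dir, run process(path), then clean up.

    TemporaryDirectory removes the directory and everything in it on exit,
    even if process() raises, so no manual os.remove/os.rmdir is needed.
    """
    with tempfile.TemporaryDirectory(prefix='castad-') as temp_dir:
        path = os.path.join(temp_dir, filename)
        with open(path, 'wb') as f:
            f.write(data)
        return process(path)
```

Because each run gets its own directory, two overlapping runs for the same `task_id` (for example a manual retry racing an automatic one) cannot delete each other's files.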

6.7 FastAPI Integration API

# app/api/routers/v1/pipeline.py
"""
Unified pipeline API

Lets clients start the whole pipeline and query its status.
"""

from fastapi import APIRouter, Depends, HTTPException
from pydantic import BaseModel
from typing import Optional
import redis

from app.dependencies.auth import get_current_user
from app.tasks.lyric_tasks import generate_lyric
from app.celery_app import celery_app

router = APIRouter(prefix="/pipeline", tags=["Pipeline"])

# Redis client (synchronous redis-py)
redis_client = redis.Redis.from_url(
    celery_app.conf.result_backend,
    decode_responses=True
)


class StartPipelineRequest(BaseModel):
    """Pipeline start request"""
    task_id: str
    customer_name: str
    region: str
    detail_region_info: str
    language: str = "Korean"
    auto_continue: bool = True  # proceed to the next stage automatically


class PipelineStatusResponse(BaseModel):
    """Pipeline status response"""
    task_id: str
    current_stage: str
    overall_status: str
    stages: dict
    message: str


@router.post("/start", response_model=dict)
async def start_pipeline(
    request: StartPipelineRequest,
    current_user = Depends(get_current_user)
):
    """
    Start the pipeline

    Publishes a task to the lyric generation queue and returns immediately.
    With auto_continue=True, lyric → song → video proceed automatically.
    """
    # Publish the task to lyric_queue
    celery_task = generate_lyric.apply_async(
        kwargs={
            'task_id': request.task_id,
            'customer_name': request.customer_name,
            'region': request.region,
            'detail_region_info': request.detail_region_info,
            'language': request.language,
            'auto_continue': request.auto_continue,
        },
        queue='lyric_queue',
        routing_key='lyric.generate',
    )

    return {
        'success': True,
        'task_id': request.task_id,
        'celery_task_id': celery_task.id,
        'message': 'Pipeline started.',
        'auto_continue': request.auto_continue,
    }


@router.get("/status/{task_id}", response_model=PipelineStatusResponse)
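def get_pipeline_status(
    task_id: str,
    current_user = Depends(get_current_user)
):
    """
    Query pipeline status

    Reads the pipeline status stored in Redis and merges the status of
    each stage (lyric, song, video) into a single response.

    Declared with plain `def` because redis_client is synchronous: FastAPI
    runs such endpoints in a threadpool, so the blocking Redis calls do not
    stall the event loop.
    """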
    pipeline_key = f"pipeline:{task_id}:status"
    pipeline_status = redis_client.hgetall(pipeline_key)

    if not pipeline_status:
        raise HTTPException(
            status_code=404,
            detail=f"Pipeline not found for task_id={task_id}"
        )

    # Query the status of each stage
    stages = {}
    for stage in ['lyric', 'song', 'video']:
        stage_key = f"pipeline:{task_id}:{stage}"
        stage_status = redis_client.hgetall(stage_key)
        if stage_status:
            stages[stage] = stage_status

    # Determine the overall status
    current_stage = pipeline_status.get('current_stage', 'unknown')
    status = pipeline_status.get('status', 'unknown')

    # Build the message
    if status == 'completed' and current_stage == 'video':
        message = 'The pipeline has completed.'
        overall_status = 'completed'
    elif status == 'failed':
        message = f'Failed at the {current_stage} stage.'
        overall_status = 'failed'
    else:
        message = f'{current_stage} stage in progress...'
        overall_status = 'processing'

    return PipelineStatusResponse(
        task_id=task_id,
        current_stage=current_stage,
        overall_status=overall_status,
        stages=stages,
        message=message,
    )


@router.post("/retry/{task_id}/{stage}")
async def retry_stage(
    task_id: str,
    stage: str,
    current_user = Depends(get_current_user)
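):
    """
    Retry a specific stage

    Manually re-publishes the task for a failed stage.
    """
    if stage not in ['lyric', 'song', 'video']:
        raise HTTPException(
            status_code=400,
            detail=f"Invalid stage: {stage}"
        )

    # Re-publish the task for the requested stage
    if stage == 'lyric':
        # Retrying lyrics needs the original request data (customer_name,
        # region, ...) loaded from the DB. That lookup is omitted here, so
        # respond with 501 instead of reporting a retry that never happens.
        raise HTTPException(
            status_code=501,
            detail="Lyric retry requires the original request data (not implemented)"
        )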
    elif stage == 'song':
        from app.tasks.song_tasks import generate_song
        generate_song.apply_async(
            kwargs={'task_id': task_id, 'auto_continue': True},
            queue='song_queue',
        )
    elif stage == 'video':
        from app.tasks.video_tasks import generate_video
        generate_video.apply_async(
            kwargs={'task_id': task_id},
            queue='video_queue',
        )

    return {
        'success': True,
        'task_id': task_id,
        'stage': stage,
        'message': f'Retry requested for the {stage} stage.',
    }
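The decision logic at the end of `get_pipeline_status` can be pulled out into a pure function so it can be unit-tested without Redis. The sketch below is a hypothetical refactoring (`summarize_pipeline` is not in the codebase) that mirrors the branches of that endpoint, taking the `pipeline:{task_id}:status` hash as a plain dict.

```python
from typing import Tuple


def summarize_pipeline(pipeline_status: dict) -> Tuple[str, str, str]:
    """Return (current_stage, overall_status, message) from the status hash."""
    current_stage = pipeline_status.get('current_stage', 'unknown')
    status = pipeline_status.get('status', 'unknown')

    # Only a completed video stage counts as a finished pipeline
    if status == 'completed' and current_stage == 'video':
        return current_stage, 'completed', 'The pipeline has completed.'
    if status == 'failed':
        return current_stage, 'failed', f'Failed at the {current_stage} stage.'
    return current_stage, 'processing', f'{current_stage} stage in progress...'
```

One consequence worth noting: when a pipeline is started with `auto_continue=False`, a completed lyric stage still falls through to `'processing'`, because only `current_stage == 'video'` is treated as final.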

7. State Management and Monitoring

7.1 Dual State Management Strategy

┌─────────────────────────────────────────────────────────────────────────────┐
│                       Dual State Management Strategy                        │
└─────────────────────────────────────────────────────────────────────────────┘

                    ┌─────────────────────────────────┐
                    │       State store layout        │
                    └─────────────────────────────────┘
                                     │
            ┌────────────────────────┼────────────────────────┐
            │                        │                        │
            ▼                        ▼                        ▼
    ┌───────────────┐      ┌───────────────┐      ┌───────────────┐
    │     MySQL     │      │ Redis (Custom)│      │Redis (Celery) │
    │  Persistent   │      │   Pipeline    │      │ Task results  │
    └───────────────┘      └───────────────┘      └───────────────┘
            │                        │                        │
            │                        │                        │
            ▼                        ▼                        ▼
    ┌───────────────┐      ┌───────────────┐      ┌───────────────┐
    │ Lyric.status  │      │pipeline:{id}: │      │celery-task-   │
    │ Song.status   │      │  status       │      │  meta-{uuid}  │
    │ Video.status  │      │pipeline:{id}: │      │               │
    │               │      │  lyric/song/  │      │               │
    │               │      │  video        │      │               │
    └───────────────┘      └───────────────┘      └───────────────┘
            │                        │                        │
            │                        │                        │
            ▼                        ▼                        ▼
    Kept permanently       24-hour TTL            24-hour TTL
    For audit logs         Real-time queries      Celery internal
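The custom Redis store in the diagram consists of two hash keys per pipeline, `pipeline:{id}:status` and `pipeline:{id}:{stage}`, with a 24-hour TTL. `update_pipeline_status` is not shown in this section, so the sketch below is only an assumption of how those writes could look using redis-py's `hset(name, mapping=...)` and `expire(name, time)`. `write_stage_status` is a hypothetical name; the `current_stage` and `status` fields are the ones the status API reads.

```python
from typing import Optional, Protocol

PIPELINE_TTL_SECONDS = 24 * 60 * 60  # 24-hour TTL, as in the diagram


class HashStore(Protocol):
    """The subset of the redis-py client used here."""
    def hset(self, name: str, key=None, value=None, mapping=None): ...
    def expire(self, name: str, time: int): ...


def write_stage_status(client: HashStore, task_id: str, stage: str,
                       status: str, message: str,
                       extra: Optional[dict] = None) -> None:
    """Write the overall and per-stage hashes, refreshing the TTL on both."""
    overall_key = f"pipeline:{task_id}:status"
    stage_key = f"pipeline:{task_id}:{stage}"

    client.hset(overall_key, mapping={'current_stage': stage, 'status': status})
    # Hash values must be str/bytes/int/float, so extra should hold scalars
    client.hset(stage_key, mapping={'status': status, 'message': message, **(extra or {})})
    for key in (overall_key, stage_key):
        client.expire(key, PIPELINE_TTL_SECONDS)
```

Refreshing the TTL on every write means a long-running pipeline does not expire mid-flight; the 24 hours count from the most recent update.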

7.2 State Synchronization Flow

sequenceDiagram
    participant Task as Celery Task
    participant DB as MySQL
    participant Redis as Redis (Custom)
    participant Celery as Redis (Celery)

    Note over Task,Celery: Task starts
    Task->>DB: status = 'processing'
    Task->>Redis: pipeline:{id}:status = processing
    Task->>Celery: STARTED (only with task_track_started=True)

    Note over Task,Celery: Task in progress
    Task->>Redis: Progress updates (while polling...)

    Note over Task,Celery: Task completes
    Task->>DB: status = 'completed', result saved
    Task->>Redis: pipeline:{id}:{stage} = completed
    Task->>Celery: SUCCESS (set automatically)

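7.3 Celery Result Backend State Codes

State	Description	When it occurs
`PENDING`	Task has not started (also reported for unknown task IDs)	After publishing, before a worker picks it up
`STARTED`	Task execution has started	When a worker begins the task; reported only if task_track_started=True
`RETRY`	Retry scheduled	An exception occurred and a retry was scheduled
`FAILURE`	Task failed permanently	Retries exhausted, or an exception raised without a retry
`SUCCESS`	Task completed successfully	Normal completion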
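Because `STARTED` is not reported by default, the Celery app needs `task_track_started` enabled for every state in this table to appear. A minimal configuration fragment follows, assuming the app object defined in `app/celery_app.py`; it is recreated here only so the snippet stands alone, and the real app would call `conf.update` on its existing instance.

```python
from celery import Celery

# Illustrative only: the real app is defined in app/celery_app.py
celery_app = Celery('app')

celery_app.conf.update(
    # Report STARTED when a worker begins executing a task (off by default)
    task_track_started=True,
    # Keep task results for 24 hours (Celery's default is also one day)
    result_expires=24 * 60 * 60,
)
```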

7.4 Monitoring Tools

7.4.1 Flower (Celery Monitoring)

# Install and run Flower
pip install flower
celery -A app.celery_app flower --port=5555

# Open: http://localhost:5555

┌─────────────────────────────────────────────────────────────────────────────┐
│                        Flower 대시보드 기능                                  │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                             │
│  [Workers]                                                                  │
│  ─────────                                                                  │
│  • List of active workers                                                   │
│  • Tasks in progress per worker                                             │
│  • Worker status (online/offline)                                           │
│                                                                             │
│  [Tasks]                                                                    │
│  ────────                                                                   │
│  • Real-time task list                                                      │
│  • Filter tasks by state (PENDING/STARTED/SUCCESS/FAILURE)                  │
│  • Task details (arguments, result, runtime)                                │
│                                                                             │
│  [Queues]                                                                   │
│  ────────                                                                   │
│  • Waiting tasks per queue                                                  │
│  • Queue throughput graphs                                                  │
│                                                                             │
│  [Broker]                                                                   │
│  ────────                                                                   │
│  • Redis connection status                                                  │
│  • Memory usage                                                             │
│                                                                             │
└─────────────────────────────────────────────────────────────────────────────┘

7.4.2 Custom Status-Check CLI

# scripts/check_pipeline_status.py
"""Pipeline status check script"""

import redis
import sys

def check_status(task_id: str):
    r = redis.Redis(host='localhost', port=6379, db=1, decode_responses=True)

    print(f"\n{'='*60}")
    print(f"Pipeline Status: {task_id}")
    print('='*60)

    # Overall status
    pipeline_status = r.hgetall(f"pipeline:{task_id}:status")
    if pipeline_status:
        print(f"\nCurrent Stage: {pipeline_status.get('current_stage', 'N/A')}")
        print(f"Status: {pipeline_status.get('status', 'N/A')}")

    # Per-stage status
    for stage in ['lyric', 'song', 'video']:
        stage_data = r.hgetall(f"pipeline:{task_id}:{stage}")
        if stage_data:
            print(f"\n[{stage.upper()}]")
            for key, value in stage_data.items():
                print(f"  {key}: {value}")

    print('='*60)

if __name__ == "__main__":
    task_id = sys.argv[1] if len(sys.argv) > 1 else input("Task ID: ")
    check_status(task_id)

8. Failure Handling Strategy

8.1 Failure Types and Response Strategies

┌─────────────────────────────────────────────────────────────────────────────┐
│                        Failure Type Classification                          │
└─────────────────────────────────────────────────────────────────────────────┘

                        ┌─────────────┐
                        │   Failure   │
                        └──────┬──────┘
                               │
          ┌────────────────────┼────────────────────┐
          │                    │                    │
          ▼                    ▼                    ▼
    ┌───────────┐        ┌───────────┐        ┌───────────┐
    │ Transient │        │ Permanent │        │  System   │
    └─────┬─────┘        └─────┬─────┘        └─────┬─────┘
          │                    │                    │
          ▼                    ▼                    ▼
    ┌───────────────┐  ┌───────────────┐  ┌───────────────┐
    │• Network      │  │• Invalid input│  │• Worker OOM   │
    │  timeout      │  │• Expired API  │  │• Disk full    │
    │• Temporary API│  │  key          │  │• DB connection│
    │  outage       │  │• Auth failure │  │  failure      │
    │• Rate limit   │  │• Bad format   │  │               │
    └───────────────┘  └───────────────┘  └───────────────┘
          │                    │                    │
          ▼                    ▼                    ▼
    ┌───────────────┐  ┌───────────────┐  ┌───────────────┐
    │ Auto retry    │  │ Fail at once  │  │ Alert + manual│
    │ (exp. backoff)│  │ (no retry)    │  │ intervention  │
    └───────────────┘  └───────────────┘  └───────────────┘
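A task can turn the three categories into a single classification function and consult it before deciding whether to retry. This is a minimal sketch. The exception classes `RateLimitError`, `InvalidInputError` and `AuthError` are hypothetical stand-ins for whatever the Suno/Creatomate/ChatGPT clients actually raise, and the placement of built-in exceptions follows the examples in the diagram.

```python
class RateLimitError(Exception):
    """Hypothetical: the external API returned HTTP 429."""


class InvalidInputError(ValueError):
    """Hypothetical: the request itself is malformed."""


class AuthError(PermissionError):
    """Hypothetical: API key expired or authentication failed."""


TRANSIENT = (ConnectionError, TimeoutError, RateLimitError)
PERMANENT = (ValueError, PermissionError, KeyError)  # also covers the subclasses above


def classify_failure(exc: BaseException) -> str:
    """Return 'transient', 'permanent' or 'system' for an exception."""
    # Order matters: ConnectionError, TimeoutError and PermissionError are all
    # OSError subclasses, so the narrow groups are checked before the fallback.
    if isinstance(exc, TRANSIENT):
        return "transient"
    if isinstance(exc, PERMANENT):
        return "permanent"
    # MemoryError (worker OOM), OSError (disk full, DB connection failures) and
    # anything unrecognized: alert a human instead of retrying blindly
    return "system"


ACTIONS = {
    "transient": "retry with exponential backoff",
    "permanent": "fail immediately (no retry)",
    "system": "alert and wait for manual intervention",
}

kind = classify_failure(TimeoutError())
print(kind, "->", ACTIONS[kind])  # transient -> retry with exponential backoff
```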

8.2 Retry Strategy

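# Retry configuration example
import aiohttp

from app.celery_app import celery_app        # assumed module path (matches `celery -A app.celery_app`)
from app.services.suno import SunoAPIError   # hypothetical path to the project's Suno error class


@celery_app.task(
    bind=True,

    # Exceptions that trigger an automatic retry
    autoretry_for=(
        ConnectionError,           # network errors
        TimeoutError,              # timeouts
        aiohttp.ClientError,       # HTTP client errors
    ),

    # Exceptions that are never retried (fail immediately)
    dont_autoretry_for=(
        ValueError,                # invalid input
        PermissionError,           # permission errors
        KeyError,                  # missing data
    ),

    # Retry settings
    max_retries=3,                 # at most 3 retries
    retry_backoff=30,              # exponential backoff with a 30 s base: 30 s, 60 s, 120 s
                                   # (True would mean a 1 s base: 1 s, 2 s, 4 s)
    retry_backoff_max=600,         # cap each wait at 10 minutes
    retry_jitter=True,             # randomize each wait (avoids a thundering herd)
)
def my_task(self, task_id: str):
    try:
        # Do the work
        pass
    except SunoAPIError as e:
        # Custom retry logic
        if e.is_rate_limited:
            # Rate limit: wait longer
            raise self.retry(countdown=300, exc=e)
        elif e.is_temporary:
            # Transient error: manual retry. Without a countdown this waits
            # default_retry_delay (180 s by default), not the backoff above.
            raise self.retry(exc=e)
        else:
            # Permanent error: fail without retrying
            raise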

8.3 Exponential Backoff Visualization

┌─────────────────────────────────────────────────────────────────────────────┐
│                        Exponential Backoff Retry Timing                     │
└─────────────────────────────────────────────────────────────────────────────┘

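Attempt      Wait (base, before jitter)      Cumulative time
─────────────────────────────────────────────────────────
1st failure  │
             ├─── 30 s ──────────► 1st retry (t ≈ 30 s)
                                   │
                                   ├─── 60 s ──────────► 2nd retry (t ≈ 90 s)
                                                         │
                                                         ├─── 120 s ─────► 3rd retry (t ≈ 210 s)
                                                                           │
                                                                           ├─► final failure
                                                                               (about 3.5 min after the first failure)

※ With retry_backoff=30 the base wait doubles each time (30 s → 60 s → 120 s); retry_backoff_max=600 caps any single wait at 10 minutes.
※ retry_jitter=True replaces each wait with a random value between 0 and the value shown, so tasks that failed together do not all retry at the same moment. The times above are therefore upper bounds.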
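The schedule follows from the formula Celery applies when `retry_backoff` is a number. Before retry *n* (counting from 0), the wait is `min(retry_backoff_max, retry_backoff * 2**n)`. With `retry_jitter=True`, that wait is replaced by a random whole number of seconds between 0 and the computed value. The sketch below reimplements the calculation in plain Python, modelled on Celery's internal `get_exponential_backoff_interval` helper but not importing it, so the 30 s / 60 s / 120 s schedule and the 3.5-minute total can be checked without a broker.

```python
import random


def backoff_delay(factor: int, retries: int, maximum: int = 600, full_jitter: bool = False) -> int:
    """Seconds to wait before the next retry; retries = retries already made (starts at 0)."""
    countdown = min(maximum, factor * (2 ** retries))
    if full_jitter:
        # Full jitter: any whole number of seconds from 0 up to the computed delay
        countdown = random.randrange(countdown + 1)
    return max(0, countdown)


def schedule(factor: int, max_retries: int, maximum: int = 600) -> list:
    """Delays for every retry, without jitter."""
    return [backoff_delay(factor, n, maximum) for n in range(max_retries)]


delays = schedule(factor=30, max_retries=3)
print(delays, sum(delays))                 # [30, 60, 120] 210  (210 s = 3.5 min)
print(schedule(factor=30, max_retries=6))  # [30, 60, 120, 240, 480, 600]  (capped at 600)
```

`schedule(1, 3)` returns `[1, 2, 4]`, which is what `retry_backoff=True` produces. This is why the base has to be set explicitly to get the 30-second schedule.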

8.4 Dead Letter Queue (DLQ) Pattern

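# app/tasks/dlq.py
"""
Dead Letter Queue handling

Tasks that still fail after their maximum number of retries are recorded
separately so they can be reviewed and reprocessed manually later.
"""

import json
from datetime import datetime, timezone

from kombu import Exchange, Queue

from app.celery_app import celery_app        # assumed module path
from app.tasks.base import BaseTaskWithDB    # assumed module path of the DB-aware base task
from app.redis import redis_client           # hypothetical shared redis.Redis client

# DLQ configuration
dlq_exchange = Exchange('dlq', type='direct')

# Assumes task_queues is already a tuple of Queue objects (defined in the Celery config)
celery_app.conf.task_queues += (
    Queue(
        'dead_letter_queue',
        dlq_exchange,
        routing_key='dlq',
    ),
)


class TaskWithDLQ(BaseTaskWithDB):
    """Base task class with DLQ support"""

    def on_failure(self, exc, task_id, args, kwargs, einfo):
        """Runs once the task has failed for good (retries exhausted or a
        non-retryable error): record it in the DLQ"""
        super().on_failure(exc, task_id, args, kwargs, einfo)

        # Record the failure in the DLQ. The task_id parameter above is the
        # Celery task UUID; the pipeline task_id comes from the task's kwargs.
        # store_failed_task is defined below in this module and resolved at call time.
        store_failed_task.apply_async(
            kwargs={
                'original_task': self.name,
                'task_id': kwargs.get('task_id'),
                'args': args,
                'kwargs': kwargs,
                'exception': str(exc),
                'traceback': str(einfo),
            },
            queue='dead_letter_queue',
        )


@celery_app.task(queue='dead_letter_queue')
def store_failed_task(
    original_task: str,
    task_id: str,
    args: list,          # the original tuple arrives as a list after JSON serialization
    kwargs: dict,
    exception: str,
    traceback: str,
):
    """
    Store information about a failed task.

    An administrator can review the stored entries and reprocess them manually.
    This task only runs if some worker consumes 'dead_letter_queue'
    (e.g. a worker started with -Q dead_letter_queue).
    """
    failed_task_data = {
        'original_task': original_task,
        'task_id': task_id,
        'args': args,
        'kwargs': kwargs,
        'exception': exception,
        'traceback': traceback,
        'failed_at': datetime.now(timezone.utc).isoformat(),
    }

    # Store in Redis (or the DB)
    redis_client.lpush(
        'failed_tasks',
        json.dumps(failed_task_data)
    )

    # Slack/email alert (optional)
    # send_alert(f"Task failed: {original_task}, task_id={task_id}")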
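For manual reprocessing, an administrator has to read entries back out of the `failed_tasks` list. The sketch below shows that round trip and assumes the JSON record shape written by `store_failed_task` above. A `collections.deque` stands in for the Redis list so the example runs without a server: `appendleft` plays the role of `LPUSH` and `pop` the role of `RPOP`, which together give oldest-first order. `record_failure`, `replay_oldest`, the `dispatch` callback and the task names are hypothetical. In production, `dispatch` could be something like `celery_app.send_task(name, kwargs=kw)`.

```python
import json
from collections import deque
from datetime import datetime, timezone

failed_tasks = deque()  # stand-in for the Redis list 'failed_tasks'


def record_failure(original_task: str, task_id: str, kwargs: dict, exc: Exception) -> None:
    """Build the same kind of record store_failed_task writes, then LPUSH it."""
    record = {
        "original_task": original_task,
        "task_id": task_id,
        "kwargs": kwargs,
        "exception": str(exc),
        "failed_at": datetime.now(timezone.utc).isoformat(),
    }
    failed_tasks.appendleft(json.dumps(record))  # LPUSH


def replay_oldest(dispatch):
    """RPOP the oldest failure and hand it to dispatch(task_name, kwargs)."""
    if not failed_tasks:
        return None
    record = json.loads(failed_tasks.pop())  # RPOP
    dispatch(record["original_task"], record["kwargs"])
    return record


record_failure("app.tasks.song_tasks.generate_song", "t-1", {"task_id": "t-1"}, TimeoutError("suno"))
record_failure("app.tasks.video_tasks.generate_video", "t-2", {"task_id": "t-2"}, ValueError("template"))

replayed = []
first = replay_oldest(lambda name, kw: replayed.append((name, kw)))
print(first["task_id"], replayed)  # t-1 [('app.tasks.song_tasks.generate_song', {'task_id': 't-1'})]
```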

8.5 Recommended Failure Handling for the O2O Castad Project

┌─────────────────────────────────────────────────────────────────────────────┐
│               O2O Castad Project Failure Handling Strategy                  │
└─────────────────────────────────────────────────────────────────────────────┘

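[Lyric generation (Lyric)]
──────────────────────────
• ChatGPT API error → up to 3 retries (30 s, 60 s, 120 s)
• Generated lyrics too short → 1 retry, then mark as failed
• API key error → fail immediately and alert an administrator
• Recommended retry count: 3

[Song generation (Song)]
────────────────────────
• Suno API rate limit → retry after waiting 5 minutes
• Generation timeout → 1 retry (the Suno servers may be overloaded)
• Audio download failure → 3 retries
• Azure upload failure → 3 retries
• Recommended retry count: 3 (polling timeouts counted separately)

[Video generation (Video)]
──────────────────────────
• Creatomate rendering failure → 2 retries (to limit cost)
• Template error → fail immediately (the template must be fixed)
• Video download failure → 3 retries
• Azure upload failure → 3 retries
• Recommended retry count: 2 (because every render is billed)

[Common handling]
─────────────────
• Every final failure → store in the DLQ + Slack alert
• DB status update → always recorded, even on failure
• User notification → email/push on failure (optional)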
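Keeping the per-stage rules in one table means every task reads its limits from the same place. This is a minimal sketch with a hypothetical `RetryPolicy` dataclass and `next_countdown` helper. The retry counts and the 5-minute rate-limit wait come from the list above. The 30-second base is stated only for lyrics and is assumed for song and video.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RetryPolicy:
    max_retries: int
    backoff_base: int = 30                  # seconds; stated for lyric, assumed for song/video
    rate_limit_wait: Optional[int] = None   # fixed wait for rate-limit errors, if any


POLICIES = {
    "lyric": RetryPolicy(max_retries=3),
    "song": RetryPolicy(max_retries=3, rate_limit_wait=300),
    "video": RetryPolicy(max_retries=2),    # rendering is billed, so fewer retries
}


def next_countdown(stage: str, retries_done: int, rate_limited: bool = False) -> Optional[int]:
    """Seconds until the next retry, or None once the stage's retries are used up."""
    policy = POLICIES[stage]
    if retries_done >= policy.max_retries:
        return None  # final failure: goes to the DLQ and triggers an alert
    if rate_limited and policy.rate_limit_wait is not None:
        return policy.rate_limit_wait
    return policy.backoff_base * 2 ** retries_done


print([next_countdown("lyric", n) for n in range(4)])  # [30, 60, 120, None]
print(next_countdown("song", 0, rate_limited=True))    # 300
print(next_countdown("video", 2))                      # None
```

Inside a bound task, `self.request.retries` supplies `retries_done`. A non-`None` result can then be passed as `raise self.retry(exc=e, countdown=...)`.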

8.6 Partial Failure Recovery

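# app/api/routers/v1/pipeline.py (addition)

from fastapi import Depends, HTTPException
# router, redis_client and get_current_user are assumed to be defined elsewhere in the project

@router.post("/resume/{task_id}")
async def resume_pipeline(
    task_id: str,
    current_user = Depends(get_current_user)
):
    """
    Pipeline resume API

    Resumes a pipeline that failed midway, starting from the stage that failed.
    Example: failure during song generation → keep the lyrics and restart from the song stage.
    """
    # Check the current state (assumes redis_client uses decode_responses=True)
    pipeline_key = f"pipeline:{task_id}:status"
    status = redis_client.hgetall(pipeline_key)

    if not status:
        raise HTTPException(404, "Pipeline not found")

    current_stage = status.get('current_stage')
    current_status = status.get('status')

    if current_status != 'failed':
        raise HTTPException(400, f"Pipeline is not in failed state: {current_status}")

    # Restart from the failed stage
    if current_stage == 'lyric':
        # Restarting from the lyric stage needs the original request data, which
        # this endpoint does not have: reject instead of reporting a false success
        raise HTTPException(400, "The lyric stage cannot be resumed; submit a new generation request")
    elif current_stage == 'song':
        # Restart from the song stage
        from app.tasks.song_tasks import generate_song
        generate_song.apply_async(
            kwargs={'task_id': task_id, 'auto_continue': True},
            queue='song_queue',
        )
    elif current_stage == 'video':
        # Restart from the video stage
        from app.tasks.video_tasks import generate_video
        generate_video.apply_async(
            kwargs={'task_id': task_id},
            queue='video_queue',
        )
    else:
        raise HTTPException(400, f"Unknown pipeline stage: {current_stage}")

    return {
        'success': True,
        'task_id': task_id,
        'resumed_from': current_stage,
        'message': f'Resuming from the {current_stage} stage.',
    }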
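The branching in `resume_pipeline` is easier to unit-test once the decision is separated from FastAPI and Celery. This is a minimal sketch with a hypothetical `plan_resume` function. It takes the `pipeline:{task_id}:status` hash and returns the task name, queue and kwargs to dispatch, or raises a hypothetical `ResumeError` that carries the HTTP status. The task names assume Celery's default `module.function` naming for the modules imported in the endpoint above.

```python
RESUMABLE = {
    # stage -> (task name, queue, extra kwargs), mirroring resume_pipeline
    "song": ("app.tasks.song_tasks.generate_song", "song_queue", {"auto_continue": True}),
    "video": ("app.tasks.video_tasks.generate_video", "video_queue", {}),
}


class ResumeError(Exception):
    """Raised when a pipeline cannot be resumed; maps to an HTTP error in the API."""

    def __init__(self, status_code: int, detail: str):
        super().__init__(detail)
        self.status_code = status_code
        self.detail = detail


def plan_resume(task_id: str, status: dict) -> tuple:
    """Return (task_name, queue, kwargs) for the stage that should be re-run."""
    if not status:
        raise ResumeError(404, "Pipeline not found")
    if status.get("status") != "failed":
        raise ResumeError(400, f"Pipeline is not in failed state: {status.get('status')}")
    stage = status.get("current_stage")
    if stage not in RESUMABLE:
        # 'lyric' needs the original request data; anything else is unknown
        raise ResumeError(400, f"Stage cannot be resumed: {stage}")
    task_name, queue, extra = RESUMABLE[stage]
    return task_name, queue, {"task_id": task_id, **extra}


print(plan_resume("t-1", {"current_stage": "song", "status": "failed"}))
# ('app.tasks.song_tasks.generate_song', 'song_queue', {'task_id': 't-1', 'auto_continue': True})
```

The endpoint then needs only to translate `ResumeError` into `HTTPException` and dispatch the result, for example with `celery_app.send_task(task_name, kwargs=kwargs, queue=queue)`.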

9. Design and Behavior

9.1 Architectural Design Philosophy

9.1.1 Single Responsibility Principle

Each worker has exactly one responsibility:

┌─────────────────────────────────────────────────────────────────────────────┐
│                Applying the Single Responsibility Principle                 │
└─────────────────────────────────────────────────────────────────────────────┘

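Lyric Worker responsibilities:
├─ ✓ Call the ChatGPT API
├─ ✓ Save the generated lyrics
├─ ✓ Publish the next job to song_queue
├─ ✗ Song generation (the Song Worker's job)
└─ ✗ Video generation (the Video Worker's job)

Why this principle matters:
1. Easier debugging: when a problem occurs, only the responsible worker needs checking
2. Independent scaling: add workers only where the bottleneck is
3. Fault isolation: a problem in one worker does not affect the others
4. Simpler code: each task contains only code with a clearly bounded scope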

9.1.2 Loose Coupling

Tasks are linked to each other only through task_id:

graph LR
    subgraph "Tight coupling (anti-pattern)"
        A1[Lyric Task] -->|passes lyric_result directly| B1[Song Task]
        B1 -->|passes song_url directly| C1[Video Task]
    end

    subgraph "Loose coupling (adopted)"
        A2[Lyric Task] -->|passes only task_id| Q2[(song_queue)]
        Q2 --> B2[Song Task]
        B2 -->|passes only task_id| Q3[(video_queue)]
        Q3 --> C2[Video Task]
    end

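Benefits of loose coupling:

Each task reads the data it needs directly from the DB
If an earlier stage's result changes, later stages are unaffected, because only task_id travels in the message
A retried intermediate stage works with the latest data
No data-serialization problems between tasks (each message carries only a short string)

9.1.3 Idempotency

Running the same task multiple times produces the same result: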

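# Idempotency example (fragment of a task body; self and task_id come from the enclosing task)
from sqlalchemy import select

async def _generate():
    async with self.get_db_session() as session:
        # Check whether a completed Lyric already exists
        existing = await session.scalar(
            select(Lyric)
            .where(Lyric.task_id == task_id)
            .where(Lyric.status == 'completed')
        )

        if existing:
            # Already completed: skip the duplicate run
            return {
                'task_id': task_id,
                'lyric_result': existing.lyric_result,
                'already_exists': True,
            }

        # Otherwise generate new lyrics
        ...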
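The check-then-generate pattern is safe for sequential re-runs. However, two workers that receive the same message at the same moment can both pass the `existing` check. One common complement is a conditional UPDATE (compare-and-set): only the first writer changes the row, and later runs see zero affected rows. The sketch uses the standard-library `sqlite3` module purely for illustration. The project uses MySQL, where the same `UPDATE ... WHERE status <> 'completed'` statement and its affected-row count work the same way. The `lyric` table layout and `complete_lyric_once` helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE lyric (task_id TEXT PRIMARY KEY, status TEXT NOT NULL, lyric_result TEXT)")
# The API creates the row up front with status='processing' (see the current flow in section 2.1)
conn.execute("INSERT INTO lyric VALUES ('t-1', 'processing', NULL)")


def complete_lyric_once(conn, task_id: str, lyric_result: str) -> bool:
    """Atomically mark the lyric completed; return False if another run already did."""
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(
            "UPDATE lyric SET status = 'completed', lyric_result = ? "
            "WHERE task_id = ? AND status <> 'completed'",
            (lyric_result, task_id),
        )
    return cur.rowcount == 1  # 0 rows changed -> a previous run already completed it


print(complete_lyric_once(conn, "t-1", "first run"))   # True
print(complete_lyric_once(conn, "t-1", "second run"))  # False
print(conn.execute("SELECT lyric_result FROM lyric").fetchone())  # ('first run',)
```

This guarantees that the stored result is written once. It does not prevent a duplicate ChatGPT call; preventing that would need a lock or a claim step before the external call.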

9.2 How the Queue-Based Pipeline Works

9.2.1 Message Flow in Detail

┌─────────────────────────────────────────────────────────────────────────────┐
│                        Message Flow in Detail                               │
└─────────────────────────────────────────────────────────────────────────────┘

1. API server → lyric_queue
   ────────────────────────
   FastAPI calls generate_lyric.apply_async()
                ↓
   Celery serializes the message to JSON
                ↓
   LPUSH onto the lyric_queue list in Redis
                ↓
   Message shape (simplified): {"task": "...", "kwargs": {"task_id": "xxx", ...}}

2. lyric_queue → Lyric Worker
   ────────────────────────────
   The worker waits for messages with BRPOP (blocking pop)
                ↓
   On receipt, the message is deserialized from JSON
                ↓
   generate_lyric is called (task_id, customer_name, ...)
                ↓
   ACK (message removed). By default Celery acknowledges just before the
   task runs; with task_acks_late=True it acknowledges after the task finishes.

3. Lyric Worker → song_queue
   ────────────────────────────
   Inside the generate_lyric task:
   generate_song.apply_async(kwargs={'task_id': task_id}, queue='song_queue')
                ↓
   A new message is published to song_queue
                ↓
   The Lyric Worker does not process this message (it does not consume that queue)

4. The same pattern repeats
   song_queue → Song Worker → video_queue → Video Worker
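An in-memory model shows two things. First, LPUSH on one side and BRPOP on the other give first-in, first-out delivery. Second, a worker never sees messages from a queue it does not consume. This is a simplified sketch, not Kombu's actual Redis transport: each queue is a `collections.deque`, `appendleft` plays LPUSH, `pop` plays BRPOP (without the blocking), and the message body is reduced to the simplified shape above. `lpush` and `brpop` here are hypothetical helpers.

```python
import json
from collections import deque

queues = {name: deque() for name in ("lyric_queue", "song_queue", "video_queue")}


def lpush(queue: str, task: str, **kwargs) -> None:
    """Publish: serialize to JSON and push onto the left end of the list."""
    queues[queue].appendleft(json.dumps({"task": task, "kwargs": kwargs}))


def brpop(subscribed: tuple):
    """Consume: pop from the right end of the first non-empty subscribed queue."""
    for name in subscribed:
        if queues[name]:
            return name, json.loads(queues[name].pop())
    return None  # a real BRPOP would block here until a message arrives


lpush("lyric_queue", "generate_lyric", task_id="t-1")
lpush("lyric_queue", "generate_lyric", task_id="t-2")
lpush("song_queue", "generate_song", task_id="t-0")

lyric_worker = ("lyric_queue",)  # like `celery worker -Q lyric_queue`
print(brpop(lyric_worker))  # ('lyric_queue', {'task': 'generate_lyric', 'kwargs': {'task_id': 't-1'}})
print(brpop(lyric_worker))  # ('lyric_queue', {'task': 'generate_lyric', 'kwargs': {'task_id': 't-2'}})
print(brpop(lyric_worker))  # None: the song_queue message is left for the Song Worker
```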

9.2.2 Worker Subscription Model

┌─────────────────────────────────────────────────────────────────────────────┐
│                        Worker Subscription Model                            │
└─────────────────────────────────────────────────────────────────────────────┘

                          Redis Broker
    ┌─────────────────────────────────────────────────────┐
    │  ┌─────────────┐ ┌─────────────┐ ┌─────────────┐    │
    │  │ lyric_queue │ │ song_queue  │ │ video_queue │    │
    │  │ [msg1]      │ │ [msg2]      │ │ [msg3]      │    │
    │  │ [msg4]      │ │             │ │             │    │
    │  └──────┬──────┘ └──────┬──────┘ └──────┬──────┘    │
    └─────────┼───────────────┼───────────────┼───────────┘
              │               │               │
              │ BRPOP         │ BRPOP         │ BRPOP
              │               │               │
     ┌────────▼────────┐ ┌────▼─────┐ ┌───────▼───────┐
     │ Lyric Worker 1  │ │Song      │ │Video Worker 1 │
     │ Lyric Worker 2  │ │Worker 1  │ │               │
     │ Lyric Worker 3  │ │Song      │ │               │
     │                 │ │Worker 2  │ │               │
     └─────────────────┘ └──────────┘ └───────────────┘

    ※ Each worker group processes only messages from the queues it consumes
    ※ When several workers consume the same queue, each message goes to whichever worker fetches it first; because each worker prefetches up to concurrency × worker_prefetch_multiplier messages, the split is only roughly even, not strict round-robin
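The subscription model depends on two things: each task is routed to its own queue, and workers do not hoard messages. The configuration sketch below uses standard Celery setting names (`task_routes`, `worker_prefetch_multiplier`, `task_acks_late`, `task_track_started`, `result_expires`). The task names assume Celery's default `module.function` naming for the modules used in this document. The values are suggestions, not the project's actual configuration. The file is written as a plain settings module that `celery_app.config_from_object("app.celery_config")` could load.

```python
# app/celery_config.py (sketch)
import os

broker_url = os.environ.get("CELERY_BROKER_URL", "redis://localhost:6379/0")
result_backend = os.environ.get("CELERY_RESULT_BACKEND", "redis://localhost:6379/1")

# Route each task to its own queue so each worker group only sees its own stage
task_routes = {
    "app.tasks.lyric_tasks.generate_lyric": {"queue": "lyric_queue"},
    "app.tasks.song_tasks.generate_song": {"queue": "song_queue"},
    "app.tasks.video_tasks.generate_video": {"queue": "video_queue"},
}

# Long-running tasks: reserve one message per process at a time, so a busy
# worker does not sit on messages that an idle worker could take
worker_prefetch_multiplier = 1

# Acknowledge after the task finishes, so a crashed worker's message is
# redelivered (tasks must then be idempotent, see 9.1.3)
task_acks_late = True

# Report the STARTED state (off by default)
task_track_started = True

# Keep Celery results for 24 hours (the default is also one day)
result_expires = 24 * 60 * 60
```

With the Redis broker and `task_acks_late=True`, Redis redelivers any message left unacknowledged for longer than the transport's `visibility_timeout` (one hour by default). That timeout therefore has to exceed the longest task, including Creatomate polling.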

9.3 Status Tracking Mechanism

9.3.1 Celery Built-in State vs. Custom State

┌─────────────────────────────────────────────────────────────────────────────┐
│                  Celery State vs. Custom State Comparison                   │
└─────────────────────────────────────────────────────────────────────────────┘

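Celery built-in state (celery-task-meta-{uuid})
───────────────────────────────────────────────
Pros:
• Managed automatically (no extra code)
• Integrated with tools such as Flower
• Standardized state codes

Cons:
• Keyed by the Celery task ID (not the project's task_id)
• Detailed progress needs custom states via update_state(), still keyed by the Celery ID
• No view of the pipeline as a whole

Custom state (pipeline:{task_id}:*)
───────────────────────────────────
Pros:
• Queried by the project's task_id
• Stores detailed per-stage status
• Shows the state of the whole pipeline at a glance
• States defined to match the business logic

Cons:
• Must be managed manually
• TTLs must be set
• Adds code complexity

→ Conclusion: use both (Celery state internally, custom state for the API)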

9.3.2 How the Status API Works

sequenceDiagram
    participant Client
    participant API as FastAPI
    participant Redis as Redis (Custom)
    participant DB as MySQL

    Client->>API: GET /pipeline/status/{task_id}

    API->>Redis: HGETALL pipeline:{task_id}:status
    Redis-->>API: {current_stage, status}

    API->>Redis: HGETALL pipeline:{task_id}:lyric
    Redis-->>API: {status, message, ...}

    API->>Redis: HGETALL pipeline:{task_id}:song
    Redis-->>API: {status, message, song_url, ...}

    API->>Redis: HGETALL pipeline:{task_id}:video
    Redis-->>API: {status, message, video_url, ...}

    Note over API: Merge the statuses and build the response

    API-->>Client: PipelineStatusResponse
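The "merge the statuses" step in the diagram can be a small pure function over the four `HGETALL` results. This is a minimal sketch that assumes the hash fields shown above (`current_stage`, `status`, `message`, `song_url`, `video_url`). The response shape and the `progress` field are hypothetical; this is not the actual `PipelineStatusResponse` model.

```python
STAGES = ("lyric", "song", "video")


def build_status_response(task_id: str, overall: dict, stages: dict) -> dict:
    """Merge pipeline:{id}:status and the per-stage hashes into one response dict."""
    if not overall:
        return {"task_id": task_id, "found": False}
    completed = sum(1 for s in STAGES if stages.get(s, {}).get("status") == "completed")
    return {
        "task_id": task_id,
        "found": True,
        "current_stage": overall.get("current_stage"),
        "status": overall.get("status"),
        "progress": round(completed / len(STAGES) * 100),  # 0, 33, 67 or 100
        "stages": {s: stages.get(s, {"status": "pending"}) for s in STAGES},
        "song_url": stages.get("song", {}).get("song_url"),
        "video_url": stages.get("video", {}).get("video_url"),
    }


resp = build_status_response(
    "t-1",
    {"current_stage": "song", "status": "processing"},
    {"lyric": {"status": "completed"}, "song": {"status": "processing", "message": "Polling Suno"}},
)
print(resp["progress"], resp["stages"]["video"])  # 33 {'status': 'pending'}
```

With a real redis-py client, the four `HGETALL` calls can be sent in one round trip through `r.pipeline()`, and the results passed straight into this function.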

9.4 External API Polling Strategy

Each external API (Suno, Creatomate) processes jobs asynchronously, so the task has to poll until the job completes:

┌─────────────────────────────────────────────────────────────────────────────┐
│                        External API Polling Strategy                        │
└─────────────────────────────────────────────────────────────────────────────┘

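[Suno API polling]
──────────────────
Request → receive Suno task_id → start polling
      │
      ├─ Check status every 10 seconds
      ├─ PENDING → keep waiting
      ├─ processing → keep waiting
      ├─ SUCCESS → extract the audio URL, done
      └─ failed → raise an exception, retry

Maximum wait: 5 minutes (30 polls)
On timeout: raise TimeoutError → Celery retries (TimeoutError is in autoretry_for;
ValueError is in dont_autoretry_for and would fail the task immediately)

[Creatomate API polling]
────────────────────────
Request → receive render_id → start polling
      │
      ├─ Check status every 15 seconds
      ├─ planned → keep waiting
      ├─ rendering → keep waiting
      ├─ succeeded → extract the video URL, done
      └─ failed → raise an exception, retry

Maximum wait: 10 minutes (40 polls)
On timeout: raise TimeoutError → Celery retries

If a task has a time_limit or soft_time_limit, it must be longer than these
maximum waits; otherwise Celery terminates the task in the middle of polling.

[Status updates while polling]
──────────────────────────────
Inside the polling loop, the Redis status is updated periodically so the
client can see progress such as "Generating music with Suno... (30 s elapsed)".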
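Both polling loops have the same shape: a fixed interval, a set of terminal states, a maximum number of attempts, and a progress hook for the Redis update. This is a minimal asyncio sketch. It assumes a hypothetical `fetch_status` coroutine that returns the provider's status string and a result URL, and a hypothetical `ExternalJobFailed` exception that the task can turn into a retry as in section 8.2. When the attempts run out, the loop raises `TimeoutError`, which section 8.2 lists in `autoretry_for`. Whether Suno reports its states with exactly this casing is an assumption taken from the description above.

```python
import asyncio
from typing import Awaitable, Callable, Optional, Tuple


class ExternalJobFailed(Exception):
    """Hypothetical: the provider reported a terminal failure state."""


async def poll_until_done(
    fetch_status: Callable[[], Awaitable[Tuple[str, Optional[str]]]],
    success: set,
    failure: set,
    interval: float,
    max_attempts: int,
    on_progress: Callable[[int, str], None] = lambda attempt, status: None,
) -> Optional[str]:
    """Poll fetch_status() until a terminal state; return the result URL on success."""
    for attempt in range(1, max_attempts + 1):
        status, result = await fetch_status()
        on_progress(attempt, status)  # e.g. refresh pipeline:{id}:song in Redis
        if status in success:
            return result
        if status in failure:
            raise ExternalJobFailed(f"external job ended with status {status!r}")
        if attempt < max_attempts:
            await asyncio.sleep(interval)
    # TimeoutError is in autoretry_for (section 8.2), so Celery retries this
    raise TimeoutError(f"no terminal state after {max_attempts} polls")


# Suno: every 10 s, at most 30 polls (about 5 minutes)
SUNO = dict(success={"SUCCESS"}, failure={"failed"}, interval=10, max_attempts=30)
# Creatomate: every 15 s, at most 40 polls (about 10 minutes)
CREATOMATE = dict(success={"succeeded"}, failure={"failed"}, interval=15, max_attempts=40)


async def demo():
    """Drive the loop with canned Creatomate states and no real waiting."""
    states = iter([("planned", None), ("rendering", None), ("succeeded", "https://example.com/v.mp4")])

    async def fake_fetch():
        return next(states)

    seen = []
    url = await poll_until_done(
        fake_fetch, **{**CREATOMATE, "interval": 0}, on_progress=lambda a, s: seen.append(s)
    )
    return url, seen


print(asyncio.run(demo()))  # ('https://example.com/v.mp4', ['planned', 'rendering', 'succeeded'])
```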

9.5 DB Session Management Strategy

┌─────────────────────────────────────────────────────────────────────────────┐
│                     DB Session Management Strategy                          │
└─────────────────────────────────────────────────────────────────────────────┘

[Problem]
─────────
External API calls (Suno/Creatomate) can take several minutes.
Holding a DB connection open for that long leads to:
• Connection-pool exhaustion
• Connection timeouts (if MySQL wait_timeout is set low, e.g. 300 seconds;
  the server default is 28800 seconds)
• "Lost connection" errors

[Solution: three-phase session pattern]
───────────────────────────────────────

┌─────────────┐     ┌─────────────┐     ┌─────────────┐
│  Phase 1:   │     │  Phase 2:   │     │  Phase 3:   │
│  Prepare    │────▶│  Call       │────▶│  Save       │
│  data       │     │  external   │     │  results    │
│             │     │  API        │     │             │
└─────────────┘     └─────────────┘     └─────────────┘
      │                   │                   │
      ▼                   ▼                   ▼
┌─────────────┐     ┌─────────────┐     ┌─────────────┐
│ DB session  │     │ No DB       │     │ DB session  │
│ held        │     │ session     │     │ held        │
│ (briefly)   │     │ (long time) │     │ (briefly)   │
└─────────────┘     └─────────────┘     └─────────────┘

Code pattern:
```python
# Phase 1: prepare data
async with self.get_db_session() as session:
    # DB reads/writes
    lyric = await session.get(Lyric, lyric_id)
    lyrics_text = lyric.lyric_result
# The session is closed automatically here

# Phase 2: external API call (no session held)
audio_url = await suno.generate_and_poll(lyrics_text)

# Phase 3: save results
async with self.get_db_session() as session:
    song = await session.get(Song, song_id)
    song.song_result_url = audio_url
    await session.commit()
```

10. Deployment and Operations

10.1 Running the Development Environment

```bash
# 1. Run Redis (Docker)
docker run -d --name redis -p 6379:6379 redis:7-alpine

# 2. Set environment variables
export CELERY_BROKER_URL=redis://localhost:6379/0
export CELERY_RESULT_BACKEND=redis://localhost:6379/1

# 3. Run the FastAPI server
uv run uvicorn main:app --reload

# 4. Run the Celery workers (each in its own terminal)
# Lyric worker
uv run celery -A app.celery_app worker -Q lyric_queue -c 2 --loglevel=info -n lyric@%h

# Song worker
uv run celery -A app.celery_app worker -Q song_queue -c 2 --loglevel=info -n song@%h

# Video worker
uv run celery -A app.celery_app worker -Q video_queue -c 1 --loglevel=info -n video@%h

# 5. Flower monitoring (optional)
uv run celery -A app.celery_app flower --port=5555
```

10.2 Production Docker Compose

# docker-compose.yml
version: '3.8'

services:
  # Redis (broker + result backend)
  redis:
    image: redis:7-alpine
    ports:
      - "6379:6379"
    volumes:
      - redis_data:/data
    command: redis-server --appendonly yes
    healthcheck:
      test: ["CMD", "redis-cli", "ping"]
      interval: 10s
      timeout: 5s
      retries: 5

  # FastAPI server
  api:
    build:
      context: .
      dockerfile: Dockerfile
    ports:
      - "8000:8000"
    environment:
      - CELERY_BROKER_URL=redis://redis:6379/0
      - CELERY_RESULT_BACKEND=redis://redis:6379/1
      - DATABASE_URL=${DATABASE_URL}
    depends_on:
      redis:
        condition: service_healthy
    command: uvicorn main:app --host 0.0.0.0 --port 8000

  # Lyric Worker
  lyric-worker:
    build:
      context: .
      dockerfile: Dockerfile
    environment:
      - CELERY_BROKER_URL=redis://redis:6379/0
      - CELERY_RESULT_BACKEND=redis://redis:6379/1
      - DATABASE_URL=${DATABASE_URL}
      - OPENAI_API_KEY=${OPENAI_API_KEY}
    depends_on:
      redis:
        condition: service_healthy
    command: celery -A app.celery_app worker -Q lyric_queue -c 4 --loglevel=info -n lyric@%h
    deploy:
      replicas: 2  # scale out

  # Song Worker
  song-worker:
    build:
      context: .
      dockerfile: Dockerfile
    environment:
      - CELERY_BROKER_URL=redis://redis:6379/0
      - CELERY_RESULT_BACKEND=redis://redis:6379/1
      - DATABASE_URL=${DATABASE_URL}
      - SUNO_API_KEY=${SUNO_API_KEY}
      - AZURE_STORAGE_CONNECTION_STRING=${AZURE_STORAGE_CONNECTION_STRING}
    depends_on:
      redis:
        condition: service_healthy
    command: celery -A app.celery_app worker -Q song_queue -c 2 --loglevel=info -n song@%h
    deploy:
      replicas: 1

  # Video Worker
  video-worker:
    build:
      context: .
      dockerfile: Dockerfile
    environment:
      - CELERY_BROKER_URL=redis://redis:6379/0
      - CELERY_RESULT_BACKEND=redis://redis:6379/1
      - DATABASE_URL=${DATABASE_URL}
      - CREATOMATE_API_KEY=${CREATOMATE_API_KEY}
      - AZURE_STORAGE_CONNECTION_STRING=${AZURE_STORAGE_CONNECTION_STRING}
    depends_on:
      redis:
        condition: service_healthy
    command: celery -A app.celery_app worker -Q video_queue -c 2 --loglevel=info -n video@%h
    deploy:
      replicas: 1

  # Flower monitoring
  flower:
    build:
      context: .
      dockerfile: Dockerfile
    ports:
      - "5555:5555"
    environment:
      - CELERY_BROKER_URL=redis://redis:6379/0
      - CELERY_RESULT_BACKEND=redis://redis:6379/1
    depends_on:
      - redis
    command: celery -A app.celery_app flower --port=5555

volumes:
  redis_data:

10.3 Kubernetes Deployment (Optional)

# k8s/lyric-worker-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: lyric-worker
spec:
  replicas: 3
  selector:
    matchLabels:
      app: lyric-worker
  template:
    metadata:
      labels:
        app: lyric-worker
    spec:
      containers:
      - name: lyric-worker
        image: your-registry/castad-backend:latest
        command: ["celery", "-A", "app.celery_app", "worker", "-Q", "lyric_queue", "-c", "4"]
        env:
        - name: CELERY_BROKER_URL
          valueFrom:
            secretKeyRef:
              name: celery-secrets
              key: broker-url
        resources:
          requests:
            memory: "512Mi"
            cpu: "250m"
          limits:
            memory: "1Gi"
            cpu: "500m"
---
# HPA (Horizontal Pod Autoscaler)
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: lyric-worker-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: lyric-worker
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
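CPU utilization is a weak scaling signal for these workers. A Song or Video worker spends most of its time waiting on Suno or Creatomate, so CPU can stay low while song_queue keeps growing. Scaling on queue length tracks the backlog directly. This requires an external-metrics source, such as KEDA or a Prometheus metrics adapter, which this document does not set up. For an average-value target, the HPA computes `desired = ceil(current_replicas × current_value / target_value)`, skips changes within its default 10% tolerance, and clamps the result to `[minReplicas, maxReplicas]`. The sketch applies that arithmetic to queue length. The `messages_per_replica` target is an assumed tuning value.

```python
import math


def desired_replicas(queue_length: int, current_replicas: int, messages_per_replica: int,
                     min_replicas: int = 2, max_replicas: int = 10,
                     tolerance: float = 0.1) -> int:
    """Replica count the HPA formula gives for an average-value target on queue length."""
    current_replicas = max(current_replicas, 1)
    per_replica = queue_length / current_replicas       # current metric value (average per pod)
    ratio = per_replica / messages_per_replica          # current / target
    if abs(ratio - 1.0) <= tolerance:
        desired = current_replicas                      # close enough: no scaling
    else:
        desired = math.ceil(current_replicas * ratio)   # i.e. ceil(queue_length / target)
    return max(min_replicas, min(max_replicas, desired))


# 50 waiting messages, 2 pods, target of 10 messages per pod
print(desired_replicas(50, 2, 10))   # 5
print(desired_replicas(0, 2, 10))    # 2  (never below minReplicas)
print(desired_replicas(200, 2, 10))  # 10 (never above maxReplicas)
```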

10.4 Operations Checklist

┌─────────────────────────────────────────────────────────────────────────────┐
│                        Operations Checklist                                 │
└─────────────────────────────────────────────────────────────────────────────┘

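[Before deployment]
───────────────────
□ Redis persistence configured (AOF/RDB)
□ Environment variables set
□ DB migrations applied
□ External API keys validated
□ Azure Blob Storage access permissions verified

[During deployment]
───────────────────
□ Gracefully shut down the existing workers (SIGTERM triggers Celery's warm
  shutdown; the container stop timeout, Compose stop_grace_period, is 10 s by
  default and must exceed the longest task or the worker is killed)
□ Wait for in-progress tasks to finish
□ Deploy the new workers
□ Confirm health checks pass

[After deployment]
──────────────────
□ Check worker status in the Flower dashboard
□ Run a test pipeline
□ Monitor the logs
□ Check the error rate

[Daily monitoring]
──────────────────
□ Queue length (bottleneck detection)
□ Task failure rate (target: < 1%)
□ Average processing time
□ Redis memory usage
□ DB connection-pool usage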

10.5 Troubleshooting Guide

┌─────────────────────────────────────────────────────────────────────────────┐
│                        Troubleshooting Guide                                │
└─────────────────────────────────────────────────────────────────────────────┘

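[Problem: tasks pile up in a queue and are not processed]
──────────────────────────────────────────────────────────
1. Check that workers are running
   celery -A app.celery_app inspect active

2. Check that workers consume the right queues
   celery -A app.celery_app inspect active_queues

3. Check the Redis connection
   redis-cli ping

[Problem: a task keeps retrying]
────────────────────────────────
1. Check the worker logs
   docker compose logs lyric-worker

2. Inspect the exception (Flower or logs)

3. Check the status of the external APIs (Suno/Creatomate/ChatGPT)

[Problem: Lost connection to MySQL during query]
────────────────────────────────────────────────
1. Check whether a DB session is held open during external API calls
2. Check pool_recycle (it must be lower than MySQL's wait_timeout)
3. Check that pool_pre_ping=True is set

[Problem: Redis runs out of memory]
───────────────────────────────────
1. Expired results need no manual cleanup: the Redis result backend stores
   each result with a TTL (result_expires), and Redis deletes it on expiry.
   Note that celery -A app.celery_app purge does not clean up results; it
   deletes every message still waiting in the task queues.

2. Check the result_expires setting (default: 1 day)

3. Clean up failed tasks in the DLQ

[Problem: the pipeline stalls at one stage]
───────────────────────────────────────────
1. Check that stage's queue
   redis-cli llen song_queue

2. Check that stage's worker

3. Call the manual retry API
   POST /pipeline/retry/{task_id}/{stage}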

Appendix: Summary Diagrams

A. Overall System Architecture

┌─────────────────────────────────────────────────────────────────────────────┐
│                     O2O Castad Celery Architecture: Full View               │
└─────────────────────────────────────────────────────────────────────────────┘

                                ┌─────────────┐
                                │   Client    │
                                │ (Frontend)  │
                                └──────┬──────┘
                                       │
                                       │ REST API
                                       ▼
                              ┌────────────────────┐
                              │     FastAPI        │
                              │   (API Server)     │
                              │ ┌──────────────┐   │
                              │ │ POST /start  │   │
                              │ │ GET /status  │   │
                              │ │ POST /retry  │   │
                              │ └──────────────┘   │
                              └─────────┬──────────┘
                                        │
                    ┌───────────────────┼───────────────────┐
                    │                   │                   │
                    ▼                   ▼                   ▼
           ┌─────────────┐     ┌─────────────┐     ┌─────────────┐
           │ lyric_queue │     │ song_queue  │     │ video_queue │
           └──────┬──────┘     └──────┬──────┘     └──────┬──────┘
                  │                   │                   │
                  │                   │                   │
        ┌─────────▼─────────┐ ┌───────▼───────┐ ┌─────────▼─────────┐
        │   Lyric Workers   │ │ Song Workers  │ │  Video Workers    │
        │  ┌─────────────┐  │ │ ┌───────────┐ │ │ ┌─────────────┐   │
        │  │ ChatGPT API │  │ │ │ Suno API  │ │ │ │ Creatomate  │   │
        │  └─────────────┘  │ │ └───────────┘ │ │ │    API      │   │
        └───────────────────┘ └───────────────┘ │ └─────────────┘   │
                                               └───────────────────┘
                    │                   │                   │
                    └───────────────────┼───────────────────┘
                                        │
                                        ▼
                              ┌────────────────────┐
                              │      MySQL         │
                              │ ┌──────────────┐   │
                              │ │   Project    │   │
                              │ │   Lyric      │   │
                              │ │   Song       │   │
                              │ │   Video      │   │
                              │ └──────────────┘   │
                              └────────────────────┘
                                        │
                                        │
                              ┌────────────────────┐
                              │   Azure Blob       │
                              │ ┌──────────────┐   │
                              │ │  songs/*.mp3 │   │
                              │ │ videos/*.mp4 │   │
                              │ └──────────────┘   │
                              └────────────────────┘

┌─────────────────────────────────────────────────────────────────────────────┐
│                              Redis                                          │
│  ┌──────────────────┐  ┌──────────────────┐  ┌──────────────────┐          │
│  │ DB 0: Broker     │  │ DB 1: Results    │  │ DB 1: Pipeline   │          │
│  │ (Task Queues)    │  │ (Celery State)   │  │ (Custom State)   │          │
│  └──────────────────┘  └──────────────────┘  └──────────────────┘          │
└─────────────────────────────────────────────────────────────────────────────┘

B. Pipeline State Transition Diagram

┌─────────────────────────────────────────────────────────────────────────────┐
│                        Pipeline State Transitions                           │
└─────────────────────────────────────────────────────────────────────────────┘

    ┌──────────┐
    │ pending  │ ← API request
    └────┬─────┘
         │
         ▼
    ┌──────────┐     ┌──────────┐
    │  lyric   │────▶│  lyric   │
    │processing│     │  failed  │ → retry or DLQ
    └────┬─────┘     └──────────┘
         │ done
         ▼
    ┌──────────┐
    │  lyric   │
    │completed │
    └────┬─────┘
         │ publish to song_queue
         ▼
    ┌──────────┐     ┌──────────┐
    │   song   │────▶│   song   │
    │processing│     │  failed  │ → retry or DLQ
    └────┬─────┘     └──────────┘
         │ done
         ▼
    ┌──────────┐
    │   song   │
    │completed │
    └────┬─────┘
         │ publish to video_queue
         ▼
    ┌──────────┐     ┌──────────┐
    │  video   │────▶│  video   │
    │processing│     │  failed  │ → retry or DLQ
    └────┬─────┘     └──────────┘
         │ done
         ▼
    ┌──────────┐
    │  video   │
    │completed │
    └────┬─────┘
         │
         ▼
    ┌──────────┐
    │ PIPELINE │
    │ COMPLETE │
    └──────────┘

11. Dependencies and Installation

11.1 Required Packages

# Add to pyproject.toml
[project]
dependencies = [
    # ... existing dependencies ...

    # Celery-related
    "celery[redis]>=5.3.0",    # Celery + Redis support
    "kombu>=5.3.0",            # messaging abstraction (already a Celery dependency)
    "flower>=2.0.0",           # Celery monitoring UI
    "redis>=5.0.0",            # Redis client
]

11.2 Installation Commands

# With uv
uv add "celery[redis]>=5.3.0" "flower>=2.0.0" "redis>=5.0.0"

# With pip
pip install "celery[redis]>=5.3.0" "flower>=2.0.0" "redis>=5.0.0"

11.3 Installing Redis

# macOS (Homebrew)
brew install redis
brew services start redis

# Ubuntu/Debian
sudo apt-get install redis-server
sudo systemctl start redis

# Docker (recommended)
docker run -d --name redis -p 6379:6379 redis:7-alpine

12. Migration Plan

12.1 Phased Transition Plan

┌─────────────────────────────────────────────────────────────────────────────┐
│                        Migration Phases                                     │
└─────────────────────────────────────────────────────────────────────────────┘

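Phase 1: Infrastructure setup (1 day)
─────────────────────────────────────
□ Install and configure the Redis server
□ Install the Celery packages
□ Write celery_app.py and celery_config.py
□ Create the basic task structure

Phase 2: Migrate lyric generation (2 days)
──────────────────────────────────────────
□ Implement lyric_tasks.py
□ Port the logic from the existing lyric_task.py
□ Change the /lyric/generate API to call Celery
□ Local testing

Phase 3: Migrate song generation (2 days)
─────────────────────────────────────────
□ Implement song_tasks.py
□ Port the Suno API polling logic
□ Chain lyric_task → song_task
□ Local testing

Phase 4: Migrate video generation (2 days)
──────────────────────────────────────────
□ Implement video_tasks.py
□ Port the Creatomate polling logic
□ Chain song_task → video_task
□ End-to-end pipeline testing

Phase 5: Integration and monitoring (1 day)
───────────────────────────────────────────
□ Implement the Pipeline API
□ Configure Flower
□ Implement custom status tracking
□ Set up alerts (Slack/Email)

Phase 6: Deployment and verification (2 days)
─────────────────────────────────────────────
□ Deploy to staging
□ Load testing
□ Failure-recovery testing
□ Production deployment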

12.2 Rollback Plan

┌─────────────────────────────────────────────────────────────────────────────┐
│                        Rollback Strategy                                    │
└─────────────────────────────────────────────────────────────────────────────┘

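[Parallel Operation Period]
───────────────────────────
In the early phase of the migration, the existing BackgroundTasks path and Celery run side by side.

1. Switch with an environment variable:
   USE_CELERY=true   → use Celery
   USE_CELERY=false  → use the existing BackgroundTasks

2. Code example:
   ```python
   if settings.USE_CELERY:
       generate_lyric.apply_async(kwargs={...}, queue='lyric_queue')
   else:
       background_tasks.add_task(generate_lyric_background, ...)
   ```

[On Rollback]
─────────────
1. Set USE_CELERY=false
2. Restart the API server
3. Stop the Celery workers with a warm shutdown (SIGTERM), which lets in-flight tasks finish
4. Purge the Redis queues, e.g. celery -A app.celery_app purge -Q lyric_queue,song_queue,video_queue
   (purged jobs never run, so their DB rows stay in processing and need manual reprocessing)

[Data Integrity]
────────────────
• The DB schema does not change (the Lyric, Song, and Video models stay as they are)
• In-flight tasks can be identified by their DB status
• Failed tasks can be reprocessed manually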
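When Celery is switched off, a job that was purged from a queue or interrupted never reports back, so its row keeps `status=processing`. The following is a minimal sketch of how to find those rows for manual reprocessing. It assumes each Lyric/Song/Video row exposes `task_id`, `status`, and a timezone-aware last-update timestamp. The `JobRow` shape, the `updated_at` field name, and the 30-minute threshold are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Iterable, List


@dataclass
class JobRow:
    """Hypothetical view of a Lyric/Song/Video row (only the fields used here)."""
    task_id: str
    status: str
    updated_at: datetime  # assumed column name; must be timezone-aware


def find_stranded(
    rows: Iterable[JobRow],
    now: datetime,
    stale_after: timedelta = timedelta(minutes=30),  # hypothetical threshold
) -> List[str]:
    """Return task_ids still marked 'processing' with no update for longer than stale_after."""
    return [
        row.task_id
        for row in rows
        if row.status == "processing" and now - row.updated_at > stale_after
    ]


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    rows = [
        JobRow("a", "processing", now - timedelta(hours=2)),    # stranded
        JobRow("b", "processing", now - timedelta(minutes=5)),  # probably still running
        JobRow("c", "completed", now - timedelta(hours=2)),     # finished
    ]
    print(find_stranded(rows, now))  # ['a']
```

In the real service, the rows would come from a DB query filtered on `status='processing'`. The returned task_ids are the candidates for manual reprocessing.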


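12.3 Minimizing Changes to Existing Code

```python
# Existing: app/lyric/worker/lyric_task.py
# Kept unchanged (used during parallel operation)

# New: app/tasks/lyric_tasks.py
# Wraps the existing logic in a Celery task

# Example: reusing an existing function
from app.celery_app import celery_app
from app.tasks.base import BaseTaskWithDB  # assumed module path
from app.tasks.song_tasks import generate_song
from app.lyric.worker.lyric_task import generate_lyric_logic

@celery_app.task(base=BaseTaskWithDB, bind=True)
def generate_lyric(self, task_id: str, ...):
    """Celery task that reuses the existing logic"""

    async def _run():
        # Call the existing logic function
        return await generate_lyric_logic(
            task_id=task_id,
            ...
        )

    # Drive the coroutine from the synchronous Celery task (helper from BaseTaskWithDB)
    result = self.run_async(_run())

    # Trigger the next stage (Celery path only)
    if result['status'] == 'completed':
        generate_song.apply_async(...)

    return result
```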

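The wrapper relies on `self.run_async` from `BaseTaskWithDB` to drive a coroutine from a synchronous Celery task. Whatever the actual implementation is, one constraint matters. Async DB engines (such as SQLAlchemy's async engine) keep pooled connections bound to the event loop that created them. Calling `asyncio.run()` per task creates a fresh loop each time, which can break those reused connections. Below is a minimal sketch of the alternative: one long-lived loop per worker process. `LoopRunner` is a hypothetical name and is not necessarily how `BaseTaskWithDB` works.

```python
import asyncio
from typing import Any, Coroutine, Optional, TypeVar

T = TypeVar("T")


class LoopRunner:
    """Hypothetical helper: runs coroutines on one event loop reused for the process lifetime."""

    def __init__(self) -> None:
        self._loop: Optional[asyncio.AbstractEventLoop] = None

    def run(self, coro: Coroutine[Any, Any, T]) -> T:
        # Create the loop lazily (after a prefork worker has forked), then reuse it
        # so loop-bound resources such as pooled DB connections stay valid.
        if self._loop is None or self._loop.is_closed():
            self._loop = asyncio.new_event_loop()
        return self._loop.run_until_complete(coro)

    def close(self) -> None:
        # Call on worker shutdown to release the loop.
        if self._loop is not None and not self._loop.is_closed():
            self._loop.close()


runner = LoopRunner()  # module-level: each forked worker process gets its own lazily created loop


async def current_loop() -> asyncio.AbstractEventLoop:
    return asyncio.get_running_loop()


if __name__ == "__main__":
    first = runner.run(current_loop())
    second = runner.run(current_loop())
    print(first is second)  # True: both calls ran on the same loop
    runner.close()
```

Because the loop is owned by the process, this suits the prefork pool. With a thread-based pool, each thread would need its own loop, since one loop cannot run `run_until_complete` from two threads at once.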
13. Testing Strategy

13.1 Unit Tests

# tests/test_tasks/test_lyric_tasks.py
"""Unit tests for the lyric generation task"""

import pytest
from unittest.mock import AsyncMock, patch
from app.tasks.lyric_tasks import generate_lyric


class TestGenerateLyricTask:
    """Tests for the generate_lyric task"""

    @pytest.fixture
    def mock_chatgpt(self):
        """Mock the ChatGPT service"""
        with patch('app.tasks.lyric_tasks.ChatgptService') as mock:
            service = AsyncMock()
            service.generate_lyric.return_value = "This is a test lyric..."
            mock.return_value = service
            yield mock

    @pytest.fixture
    def mock_db_session(self):
        """Mock the DB session"""
        with patch('app.tasks.base.BackgroundSessionLocal') as mock:
            session = AsyncMock()
            mock.return_value.__aenter__.return_value = session
            yield session

    def test_generate_lyric_success(self, mock_chatgpt, mock_db_session):
        """Lyric generation succeeds"""
        # Given
        task_id = "test-task-123"

        # When
        result = generate_lyric.apply(
            kwargs={
                'task_id': task_id,
                'customer_name': 'Test Store',
                'region': 'Seoul',
                'detail_region_info': 'Gangnam-gu',
                'language': 'Korean',
                'auto_continue': False,  # do not trigger the next stage
            }
        ).get()

        # Then
        assert result['status'] == 'completed'
        assert result['task_id'] == task_id
        assert 'lyric_result' in result

    def test_generate_lyric_chatgpt_failure(self, mock_chatgpt, mock_db_session):
        """ChatGPT API failure surfaces as an exception (after any retries)"""
        # Given
        mock_chatgpt.return_value.generate_lyric.side_effect = Exception("API Error")

        # When/Then
        with pytest.raises(Exception):
            generate_lyric.apply(
                kwargs={
                    'task_id': 'test-123',
                    'customer_name': 'Test',
                    'region': 'Seoul',
                    'detail_region_info': 'Gangnam',
                    'auto_continue': False,
                }
            ).get()
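The tests above drive the whole Celery task through `apply()`, so they depend on the task's internal patch points (`ChatgptService`, `BackgroundSessionLocal`). A complementary approach assumes the async core is factored into its own coroutine (the hypothetical `build_lyric` here). That core can then be tested directly with `AsyncMock` and `asyncio.run`, with no Celery involved:

```python
import asyncio
from typing import Any, Dict
from unittest.mock import AsyncMock


async def build_lyric(service: Any, customer_name: str, region: str) -> Dict[str, str]:
    """Hypothetical async core of generate_lyric: no Celery, no DB, just the service call."""
    text = await service.generate_lyric(customer_name=customer_name, region=region)
    return {"status": "completed", "lyric_result": text}


def test_build_lyric_success() -> None:
    service = AsyncMock()
    service.generate_lyric.return_value = "test lyric"

    result = asyncio.run(build_lyric(service, "Test Store", "Seoul"))

    assert result == {"status": "completed", "lyric_result": "test lyric"}
    # AsyncMock records awaits, so the exact call arguments can be checked
    service.generate_lyric.assert_awaited_once_with(customer_name="Test Store", region="Seoul")


def test_build_lyric_propagates_error() -> None:
    service = AsyncMock()
    service.generate_lyric.side_effect = RuntimeError("API Error")

    try:
        asyncio.run(build_lyric(service, "Test Store", "Seoul"))
    except RuntimeError as exc:
        assert str(exc) == "API Error"
    else:
        raise AssertionError("expected RuntimeError")
```

Under pytest, both functions are collected as they are. `pytest.raises` could replace the try/except in the second one.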

13.2 Integration Tests

# tests/test_tasks/test_pipeline_integration.py
"""Pipeline integration tests"""

import pytest
from app.tasks.lyric_tasks import generate_lyric
from app.tasks.song_tasks import generate_song
from app.tasks.video_tasks import generate_video


class TestPipelineIntegration:
    """End-to-end pipeline integration tests"""

    @pytest.fixture
    def celery_app(self):
        """Celery app configured for tests"""
        from app.celery_app import celery_app
        celery_app.conf.update(
            task_always_eager=True,  # run tasks synchronously in-process
            task_eager_propagates=True,
        )
        return celery_app

    @pytest.mark.integration
    def test_full_pipeline(self, celery_app):
        """Runs the full pipeline"""
        # Given
        task_id = "integration-test-123"

        # When
        result = generate_lyric.apply(
            kwargs={
                'task_id': task_id,
                'customer_name': 'Integration Test Store',
                'region': 'Seoul',
                'detail_region_info': 'Test District',
                'language': 'Korean',
                'auto_continue': True,
            }
        ).get(timeout=300)  # 5 minutes; apply() runs in-process, so the timeout has no effect here

        # Then
        assert result['status'] == 'completed'
        # Check in the DB that the video was generated as well
        # ...

    @pytest.mark.integration
    def test_pipeline_failure_recovery(self, celery_app):
        """Pipeline failure recovery"""
        # To do: make an intermediate stage fail, then test the retry
        pass
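The `celery_app` fixture above changes the shared app configuration and never restores it. As a result, every later test in the same process also runs eagerly. Below is a minimal sketch of scoping the override. It assumes only that `celery_app.conf` behaves as a mutable mapping supporting `get`, `update`, and item assignment. `overridden` is a hypothetical helper.

```python
from contextlib import contextmanager
from typing import Any, Iterator, MutableMapping

_MISSING = object()  # marks keys that did not exist before the override


@contextmanager
def overridden(conf: MutableMapping[str, Any], **overrides: Any) -> Iterator[None]:
    """Temporarily apply overrides to a config mapping; restore previous values on exit."""
    saved = {key: conf.get(key, _MISSING) for key in overrides}
    conf.update(overrides)
    try:
        yield
    finally:
        # Runs even if the test fails, so later tests see the original settings
        for key, old in saved.items():
            if old is _MISSING:
                del conf[key]
            else:
                conf[key] = old
```

In the fixture, the body would become `with overridden(celery_app.conf, task_always_eager=True, task_eager_propagates=True): yield celery_app`, so the eager settings last only for the test that requested them.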

13.3 Running the Tests

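# Eager mode: tasks run in-process, no worker needed
# (assumes celery_config.py maps CELERY_TASK_ALWAYS_EAGER to task_always_eager)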
CELERY_TASK_ALWAYS_EAGER=true pytest tests/test_tasks/

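# Integration tests against a real worker (needs a separate terminal)
# Note: the celery_app fixture in 13.2 forces task_always_eager, so tasks
# reach this worker only if that override is removed
# Terminal 1: start the worker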
celery -A app.celery_app worker -Q lyric_queue,song_queue,video_queue --loglevel=debug

# Terminal 2: run the tests
pytest tests/test_tasks/test_pipeline_integration.py -v
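Celery is configured through the app configuration. The eager-mode command above therefore works only if the configuration module reads `CELERY_TASK_ALWAYS_EAGER` and maps it to the `task_always_eager` setting. Below is a minimal sketch of that mapping. It assumes settings are loaded from `celery_config.py` as module-level lowercase names. `env_flag` is a hypothetical helper.

```python
import os
from typing import Mapping

_TRUTHY = {"1", "true", "yes", "on"}


def env_flag(name: str, default: bool = False, environ: Mapping[str, str] = os.environ) -> bool:
    """Read a boolean environment variable: unset means `default`, unrecognized values mean False."""
    raw = environ.get(name)
    if raw is None:
        return default
    return raw.strip().lower() in _TRUTHY


# Hypothetical lines in celery_config.py
task_always_eager = env_flag("CELERY_TASK_ALWAYS_EAGER")
# When running eagerly, re-raise task exceptions in the caller instead of storing them
task_eager_propagates = task_always_eager
```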

13.4 Load Testing

# scripts/load_test.py
"""Pipeline load test"""

import asyncio
import time

import aiohttp


async def start_pipeline(session: aiohttp.ClientSession, index: int):
    """Start a single pipeline run"""
    url = "http://localhost:8000/api/v1/pipeline/start"
    payload = {
        "task_id": f"load-test-{index}-{int(time.time())}",
        "customer_name": f"Load Test Store {index}",
        "region": "Seoul",
        "detail_region_info": "Test District",
        "language": "Korean",
        "auto_continue": True,
    }

    async with session.post(url, json=payload) as response:
        return await response.json()


async def run_load_test(concurrency: int = 10, total_requests: int = 100):
    """Run the load test"""
    async with aiohttp.ClientSession() as session:
        tasks = []
        for i in range(total_requests):
            task = asyncio.create_task(start_pipeline(session, i))
            tasks.append(task)

            # Limit concurrency in batches: wait for each batch of `concurrency` requests to finish
            if len(tasks) >= concurrency:
                await asyncio.gather(*tasks)
                tasks = []

        if tasks:
            await asyncio.gather(*tasks)


if __name__ == "__main__":
    import sys
    concurrency = int(sys.argv[1]) if len(sys.argv) > 1 else 10
    total = int(sys.argv[2]) if len(sys.argv) > 2 else 100

    print(f"Starting load test: {concurrency} concurrent, {total} total")
    asyncio.run(run_load_test(concurrency, total))
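The loop above limits concurrency in batches. Each batch waits for its slowest request before the next batch starts, so actual concurrency drops toward the end of every batch. An `asyncio.Semaphore` instead keeps up to `concurrency` requests in flight continuously. Below is a minimal sketch, with `run_bounded` as a hypothetical helper that accepts any coroutine factory:

```python
import asyncio
from typing import Awaitable, Callable, List, TypeVar

T = TypeVar("T")


async def run_bounded(
    make_call: Callable[[int], Awaitable[T]],
    total: int,
    concurrency: int,
) -> List[T]:
    """Run make_call(0), ..., make_call(total - 1) with at most `concurrency` in flight."""
    semaphore = asyncio.Semaphore(concurrency)

    async def limited(index: int) -> T:
        # Each call waits for a free slot, so a slow request holds only its own slot
        async with semaphore:
            return await make_call(index)

    # gather returns results in submission order
    return await asyncio.gather(*(limited(i) for i in range(total)))
```

With the script above, `make_call` would be `lambda i: start_pipeline(session, i)` inside the `ClientSession` block. The start endpoint only enqueues work, so both versions measure how fast the API accepts pipeline starts, not how long the Celery pipeline takes to finish.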

14. Security Considerations

14.1 Managing Sensitive Information

┌─────────────────────────────────────────────────────────────────────────────┐
│                        Sensitive Data Management                            │
└─────────────────────────────────────────────────────────────────────────────┘

[Managed via Environment Variables]
───────────────────────────────────
• CELERY_BROKER_URL          - Redis (broker) connection info
• CELERY_RESULT_BACKEND      - result backend connection info
• OPENAI_API_KEY             - ChatGPT API key
• SUNO_API_KEY               - Suno API key
• CREATOMATE_API_KEY         - Creatomate API key
• AZURE_STORAGE_CONNECTION   - Azure connection string

[Redis Security]
────────────────
• Set a Redis password (AUTH)
• Network isolation (reachable only from the internal network)
• TLS encryption in production (e.g. a rediss:// broker URL)

[Worker Isolation]
──────────────────
• Each worker receives only the API keys it needs, via environment variables
• Lyric Worker: OPENAI_API_KEY only
• Song Worker: SUNO_API_KEY, AZURE_*
• Video Worker: CREATOMATE_API_KEY, AZURE_*
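The isolation rules above can be checked mechanically when a worker starts, so a misconfigured worker fails fast instead of failing on its first task. Below is a minimal sketch. The mapping mirrors the list above, and the queue names follow the worker `-Q` options used in this document. `AZURE_STORAGE_CONNECTION` stands in for `AZURE_*`. `REQUIRED_ENV`, `COMMON_ENV`, and `missing_env` are hypothetical names.

```python
import os
from typing import Dict, Iterable, List, Mapping

# Hypothetical mapping built from the worker isolation list; extend AZURE_* with the real variables
COMMON_ENV = ["CELERY_BROKER_URL", "CELERY_RESULT_BACKEND"]  # every worker needs the broker/backend
REQUIRED_ENV: Dict[str, List[str]] = {
    "lyric_queue": ["OPENAI_API_KEY"],
    "song_queue": ["SUNO_API_KEY", "AZURE_STORAGE_CONNECTION"],
    "video_queue": ["CREATOMATE_API_KEY", "AZURE_STORAGE_CONNECTION"],
}


def missing_env(queues: Iterable[str], environ: Mapping[str, str] = os.environ) -> List[str]:
    """Return the required variables that are unset or empty for a worker consuming `queues`."""
    needed = set(COMMON_ENV)
    for queue in queues:
        needed.update(REQUIRED_ENV.get(queue, []))
    return sorted(name for name in needed if not environ.get(name))
```

A worker start script could pass the same queue list it gives to `-Q` and exit before launching Celery if the result is non-empty. This only checks that required keys are present. It does not detect a worker that was handed keys it should not have.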

14.2 Rate Limiting

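# Handling external API rate limits
from celery import Task

from app.celery_app import celery_app


class RateLimitedTask(Task):
    """Base task that applies a Celery rate limit to external API calls"""

    # At most 10 executions per minute. Celery enforces this per worker
    # instance, not globally across all workers.
    rate_limit = '10/m'


# Hypothetical example: a task opts in by using RateLimitedTask as its base
@celery_app.task(base=RateLimitedTask, bind=True)
def call_external_api(self, task_id: str):
    ...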
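Celery applies `rate_limit` per worker instance. Two lyric workers at `'10/m'` can therefore together reach 20 calls per minute against the same external API. When the provider's limit is account-wide, one option is to dedicate a single worker to a queue. Another is a counter shared through Redis. Below is a minimal fixed-window sketch. `allow_call` and the `ratelimit:` key format are hypothetical. `store` is assumed to be anything with redis-py style `incr` and `expire` methods, such as a `redis.Redis` client.

```python
import time
from typing import Any, Callable


def allow_call(
    store: Any,
    api_name: str,
    limit: int,
    window_seconds: int = 60,
    now: Callable[[], float] = time.time,
) -> bool:
    """Fixed-window limiter shared by all workers: True if this call fits in the current window."""
    window = int(now() // window_seconds)
    key = f"ratelimit:{api_name}:{window}"  # one counter per API per window
    count = store.incr(key)  # INCR is atomic in Redis, so concurrent workers never share a count
    if count == 1:
        # First call in this window: let the counter expire once the window is over.
        # INCR and EXPIRE are separate commands here; a MULTI/EXEC pipeline would make them atomic.
        store.expire(key, window_seconds)
    return count <= limit
```

Inside a bound task, a `False` result could become `raise self.retry(countdown=...)` so the task is re-queued instead of calling the API. A fixed window can admit up to `2 * limit` calls around a window boundary. A sliding-window or token-bucket variant avoids that at the cost of more Redis work.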

Document Version

Version	Date	Changes
1.0	2024-XX-XX	Initial draft
1.1	2024-XX-XX	Added dependency, migration, testing, and security sections

Author: Claude AI
Reviewer: [Reviewer name]
Approver: [Approver name]
