# API Timeout and Retry Logic Improvement Plan

## Overview

This plan addresses stability problems caused by missing timeouts and the absence of retry logic on external API calls.

---

## Current State

| Module | External API | Timeout | Retry |
|--------|--------------|---------|-------|
| Lyric | ChatGPT (OpenAI) | ❌ Not set (SDK default ~600 s) | ❌ None (SDK built-in retries only) |
| Song | Suno API | ✅ 30-120 s | ❌ None |
| Video | Creatomate API | ✅ 30-60 s | ❌ None |

---

## Planned Changes

### 1. Set the ChatGPT API Timeout

**File:** `app/utils/chatgpt_prompt.py`

**Current code:**
```python
class ChatgptService:
    def __init__(self):
        self.client = AsyncOpenAI(api_key=apikey_settings.CHATGPT_API_KEY)
```

**Updated code:**
```python
class ChatgptService:
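    # Timeouts in seconds. httpx applies these per phase, not to the whole request.
    DEFAULT_TIMEOUT = 60.0  # default for the read, write, and pool phases
    CONNECT_TIMEOUT = 10.0  # time allowed to establish the connection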

    def __init__(self):
        self.client = AsyncOpenAI(
            api_key=apikey_settings.CHATGPT_API_KEY,
            timeout=httpx.Timeout(
                self.DEFAULT_TIMEOUT,
                connect=self.CONNECT_TIMEOUT,
            ),
        )
```

**Required import:**
```python
import httpx
```
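
Because `httpx.Timeout` values apply to individual phases (connect, read, write, pool), the 60-second setting does not cap the total duration of a request. If an overall deadline is wanted, one option is to wrap the call in `asyncio.wait_for`. The following is a minimal sketch under that assumption; `call_with_deadline` and `slow_call` are hypothetical names used only for illustration, and `slow_call` stands in for a real API request.

```python
import asyncio


async def call_with_deadline(coro_factory, deadline: float):
    """Run coro_factory() and give up once `deadline` seconds have elapsed in total."""
    # asyncio.wait_for cancels the inner coroutine and raises a timeout error
    # when the deadline passes, independent of any per-phase httpx timeouts.
    return await asyncio.wait_for(coro_factory(), timeout=deadline)


async def slow_call(duration: float) -> str:
    """Hypothetical stand-in for an API request that takes `duration` seconds."""
    await asyncio.sleep(duration)
    return "done"


async def main() -> tuple[str, bool]:
    # Finishes well inside the deadline.
    fast = await call_with_deadline(lambda: slow_call(0.01), deadline=0.5)

    # Exceeds the deadline, so a timeout error is raised.
    try:
        await call_with_deadline(lambda: slow_call(1.0), deadline=0.05)
        timed_out = False
    except asyncio.TimeoutError:
        timed_out = True
    return fast, timed_out
```

Note that a deadline like this raises `asyncio.TimeoutError` rather than an httpx or OpenAI exception, so it would only be retried if that type were added to the retry exception list.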

---

### 2. Create a Retry Utility

**File:** `app/utils/retry.py` (new file)

```python
"""
Retry utilities for API calls.

Provides retry logic with exponential backoff.
"""

import asyncio
import logging
from functools import wraps
from typing import Callable, Tuple, Type

logger = logging.getLogger(__name__)


class RetryExhaustedError(Exception):
    """Raised when every retry attempt has failed."""
    def __init__(self, message: str, last_exception: Exception):
        super().__init__(message)
        self.last_exception = last_exception


async def retry_async(
    func: Callable,
    max_retries: int = 3,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    exponential_base: float = 2.0,
    retry_on: Tuple[Type[Exception], ...] = (Exception,),
    on_retry: Callable[[int, Exception], None] | None = None,
):
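    """
    Run an async function, retrying on failure.

    Args:
        func: Zero-argument async callable to run
        max_retries: Maximum number of retries (default: 3)
        base_delay: Delay before the first retry, in seconds
        max_delay: Upper bound on the delay, in seconds
        exponential_base: Multiplier for exponential backoff (default: 2.0)
        retry_on: Exception types that trigger a retry
        on_retry: Callback invoked before each retry as (attempt, exception)

    Returns:
        The return value of func

    Raises:
        RetryExhaustedError: If every attempt fails

    Example:
        result = await retry_async(
            lambda: api_call(),
            max_retries=3,
            retry_on=(httpx.TimeoutException, httpx.HTTPStatusError),
        )
    """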
    last_exception = None

    for attempt in range(max_retries + 1):
        try:
            return await func()
        except retry_on as e:
            last_exception = e

            if attempt == max_retries:
                break

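            # Exponential backoff: base_delay * exponential_base^attempt, capped at max_delay
            delay = min(base_delay * (exponential_base ** attempt), max_delay)

            logger.warning(
                f"[retry_async] attempt {attempt + 1}/{max_retries + 1} failed, "
                f"retrying in {delay:.1f}s: {type(e).__name__}: {e}"
            )

            if on_retry:
                on_retry(attempt + 1, e)

            await asyncio.sleep(delay)

    # Chain the last exception so the original traceback is preserved.
    raise RetryExhaustedError(
        f"All {max_retries + 1} attempts failed",
        last_exception,
    ) from last_exception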


def with_retry(
    max_retries: int = 3,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    retry_on: Tuple[Type[Exception], ...] = (Exception,),
):
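    """
    Retry decorator for async functions.

    Args:
        max_retries: Maximum number of retries
        base_delay: Delay before the first retry, in seconds
        max_delay: Upper bound on the delay, in seconds
        retry_on: Exception types that trigger a retry

    Example:
        @with_retry(max_retries=3, retry_on=(httpx.TimeoutException,))
        async def call_api():
            ...
    """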
    def decorator(func: Callable):
        @wraps(func)
        async def wrapper(*args, **kwargs):
            return await retry_async(
                lambda: func(*args, **kwargs),
                max_retries=max_retries,
                base_delay=base_delay,
                max_delay=max_delay,
                retry_on=retry_on,
            )
        return wrapper
    return decorator
```

---

### 3. Apply Retry Logic to the Suno API

**File:** `app/utils/suno.py`

**Methods to update:**
- `generate()` - song generation request
- `get_task_status()` - status lookup
- `get_lyric_timestamp()` - timestamp lookup

**Example (generate method):**

```python
# Imports to add at the top of the file
import httpx
from app.utils.retry import retry_async

# Exceptions that trigger a retry
RETRY_EXCEPTIONS = (
    httpx.TimeoutException,
    httpx.ConnectError,
    httpx.ReadError,
)

async def generate(
    self,
    prompt: str,
    genre: str | None = None,
    callback_url: str | None = None,
) -> str:
    # ... existing payload construction code ...

    async def _call_api():
        async with httpx.AsyncClient() as client:
            response = await client.post(
                f"{self.BASE_URL}/generate",
                headers=self.headers,
                json=payload,
                timeout=30.0,
            )
            # HTTPStatusError is not in RETRY_EXCEPTIONS, so 4xx/5xx responses are not retried.
            response.raise_for_status()
            return response.json()

    # Apply the retry logic
    data = await retry_async(
        _call_api,
        max_retries=3,
        base_delay=1.0,
        retry_on=RETRY_EXCEPTIONS,
    )

    # ... existing response handling code ...
```

---

### 4. Apply Retry Logic to the Creatomate API

**File:** `app/utils/creatomate.py`

**Target:**
- `_request()` method (the basis for every API call)

**Updated code:**

```python
# Import to add at the top of the file
from app.utils.retry import retry_async

# Exceptions that trigger a retry
RETRY_EXCEPTIONS = (
    httpx.TimeoutException,
    httpx.ConnectError,
    httpx.ReadError,
)

async def _request(
    self,
    method: str,
    url: str,
    timeout: float = 30.0,
    max_retries: int = 3,
    **kwargs,
) -> httpx.Response:
    """Perform an HTTP request (with retry logic)."""
    logger.info(f"[Creatomate] {method} {url}")

    async def _call():
        client = await get_shared_client()
        if method.upper() == "GET":
            response = await client.get(
                url, headers=self.headers, timeout=timeout, **kwargs
            )
        elif method.upper() == "POST":
            response = await client.post(
                url, headers=self.headers, timeout=timeout, **kwargs
            )
        else:
            raise ValueError(f"Unsupported HTTP method: {method}")
        response.raise_for_status()
        return response

    response = await retry_async(
        _call,
        max_retries=max_retries,
        base_delay=1.0,
        retry_on=RETRY_EXCEPTIONS,
    )

    logger.info(f"[Creatomate] Response - Status: {response.status_code}")
    return response
```

---

### 5. Apply Retry Logic to the ChatGPT API

**File:** `app/utils/chatgpt_prompt.py`

**Updated code:**

```python
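# Imports to add at the top of the file (skip any that are already present)
import json
import httpx
from openai import APIConnectionError, APITimeoutError, AsyncOpenAI, RateLimitError
from app.utils.retry import retry_async

# Exceptions that trigger a retry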
RETRY_EXCEPTIONS = (
    APITimeoutError,
    APIConnectionError,
    RateLimitError,
)

class ChatgptService:
    DEFAULT_TIMEOUT = 60.0
    CONNECT_TIMEOUT = 10.0
    MAX_RETRIES = 3

    def __init__(self):
        self.client = AsyncOpenAI(
            api_key=apikey_settings.CHATGPT_API_KEY,
            timeout=httpx.Timeout(
                self.DEFAULT_TIMEOUT,
                connect=self.CONNECT_TIMEOUT,
            ),
        )

    async def _call_structured_output_with_response_gpt_api(
        self, prompt: str, output_format: dict, model: str
    ) -> dict:
        content = [{"type": "input_text", "text": prompt}]

        async def _call():
            response = await self.client.responses.create(
                model=model,
                input=[{"role": "user", "content": content}],
                text=output_format,
            )
            return json.loads(response.output_text) or {}

        return await retry_async(
            _call,
            max_retries=self.MAX_RETRIES,
            base_delay=2.0,  # longer wait to allow for OpenAI rate limits
            retry_on=RETRY_EXCEPTIONS,
        )
```

---

## Recommended Timeout Settings

| API | Purpose | Recommended timeout | Retries | Retry delays |
|-----|---------|---------------------|---------|--------------|
| ChatGPT | Lyric generation | 60 s | 3 | 2 s → 4 s → 8 s |
| Suno | Song generation request | 30 s | 3 | 1 s → 2 s → 4 s |
| Suno | Status lookup | 30 s | 2 | 1 s → 2 s |
| Suno | Timestamp | 120 s | 2 | 2 s → 4 s |
| Creatomate | Template lookup | 30 s | 2 | 1 s → 2 s |
| Creatomate | Render request | 60 s | 3 | 1 s → 2 s → 4 s |
| Creatomate | Status lookup | 30 s | 2 | 1 s → 2 s |
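
The delay sequences in the table follow from the backoff formula used in `retry_async` (`base_delay * exponential_base ** attempt`, capped at `max_delay`). As a rough sketch, the table could be held as configuration and the schedule derived from it. `RetryPolicy`, `POLICIES`, and the dictionary keys are hypothetical names chosen for illustration, not identifiers from the codebase.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RetryPolicy:
    """Hypothetical container for one row of the recommended-settings table."""
    timeout: float      # request timeout in seconds
    max_retries: int    # retries after the first attempt
    base_delay: float   # delay before the first retry, in seconds

    def delays(self, exponential_base: float = 2.0, max_delay: float = 30.0) -> list[float]:
        """Delay before each retry, using the same formula as retry_async."""
        return [
            min(self.base_delay * exponential_base ** attempt, max_delay)
            for attempt in range(self.max_retries)
        ]

    def total_backoff(self) -> float:
        """Total time spent sleeping if every retry is used."""
        return sum(self.delays())


# Keys are illustrative labels only.
POLICIES = {
    "chatgpt.lyrics": RetryPolicy(timeout=60.0, max_retries=3, base_delay=2.0),
    "suno.generate": RetryPolicy(timeout=30.0, max_retries=3, base_delay=1.0),
    "suno.status": RetryPolicy(timeout=30.0, max_retries=2, base_delay=1.0),
    "suno.timestamp": RetryPolicy(timeout=120.0, max_retries=2, base_delay=2.0),
    "creatomate.templates": RetryPolicy(timeout=30.0, max_retries=2, base_delay=1.0),
    "creatomate.render": RetryPolicy(timeout=60.0, max_retries=3, base_delay=1.0),
    "creatomate.status": RetryPolicy(timeout=30.0, max_retries=2, base_delay=1.0),
}

for name, policy in POLICIES.items():
    print(f"{name}: delays={policy.delays()} total_backoff={policy.total_backoff()}s")
```

Keeping the values in one place like this would make it easier to check that each call site passes the `max_retries` and `base_delay` the table recommends.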

---

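## Implementation Order

1. **Step 1: Create the retry.py utility**
   - Implement reusable retry logic
   - Write unit tests

2. **Step 2: Set the ChatGPT timeout**
   - Most urgent issue (currently the 600-second default)
   - Apply the timeout and retries together

3. **Step 3: Apply retries to the Suno API**
   - generate(), get_task_status(), get_lyric_timestamp()

4. **Step 4: Apply retries to the Creatomate API**
   - Updating the _request() method covers every call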

---

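## Test Checklist

After each change, verify:

- [ ] Normal requests behave the same as before
- [ ] On timeout, an exception is raised within the configured time
- [ ] A transient error succeeds after a retry
- [ ] When all retries fail, an appropriate error message is returned
- [ ] Retry attempts are recorded in the logs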
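
Several of these checks can be exercised without touching the network. The sketch below assumes a condensed copy of `retry_async` (same loop and backoff formula as in section 2, minus logging and the callback) so that it runs on its own; `FlakyCall` and `run_checks` are hypothetical helpers, and `ConnectionError` stands in for the httpx exceptions.

```python
import asyncio


class RetryExhaustedError(Exception):
    def __init__(self, message: str, last_exception: Exception):
        super().__init__(message)
        self.last_exception = last_exception


async def retry_async(func, max_retries=3, base_delay=1.0, max_delay=30.0,
                      exponential_base=2.0, retry_on=(Exception,)):
    """Condensed copy of the retry loop from section 2 (no logging, no callback)."""
    last_exception = None
    for attempt in range(max_retries + 1):
        try:
            return await func()
        except retry_on as e:
            last_exception = e
            if attempt == max_retries:
                break
            await asyncio.sleep(min(base_delay * exponential_base ** attempt, max_delay))
    raise RetryExhaustedError(f"All {max_retries + 1} attempts failed", last_exception)


class FlakyCall:
    """Hypothetical API stub: raises `exc` for the first `failures` calls, then succeeds."""

    def __init__(self, failures: int, exc: Exception):
        self.failures = failures
        self.exc = exc
        self.calls = 0

    async def __call__(self) -> str:
        self.calls += 1
        if self.calls <= self.failures:
            raise self.exc
        return "ok"


async def run_checks() -> dict:
    results = {}
    opts = dict(base_delay=0.001, retry_on=(ConnectionError,))  # tiny delay keeps the run fast

    # Transient error: two failures, then success on the third attempt.
    flaky = FlakyCall(failures=2, exc=ConnectionError("temporary"))
    results["recovered"] = await retry_async(flaky, max_retries=3, **opts)
    results["recovered_calls"] = flaky.calls

    # Persistent error: every attempt fails, so RetryExhaustedError is raised.
    broken = FlakyCall(failures=10, exc=ConnectionError("down"))
    try:
        await retry_async(broken, max_retries=2, **opts)
    except RetryExhaustedError as err:
        results["exhausted_calls"] = broken.calls
        results["last_exception"] = type(err.last_exception).__name__

    # Exception outside retry_on: propagates immediately, with no second attempt.
    fatal = FlakyCall(failures=1, exc=ValueError("bad request"))
    try:
        await retry_async(fatal, max_retries=3, **opts)
    except ValueError:
        results["fatal_calls"] = fatal.calls

    return results
```

In a real test suite the same cases would import `retry_async` from `app/utils/retry.py` instead of redefining it, and the log check could be added by capturing the `logger.warning` output.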

---

## Rollback Plan

If problems occur:
1. Remove the code that uses retry.py (restore the original direct calls)
2. Remove the ChatGPT timeout setting (restore the SDK default)

---

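## Notes

- The OpenAI SDK retries some failures internally (twice by default, adjustable through the client's `max_retries` argument) but offers limited custom control. Wrapping calls in `retry_async` stacks on top of those built-in retries, so consider passing `max_retries=0` to `AsyncOpenAI` to keep a single retry layer.
- httpx's `TimeoutException` covers `ConnectTimeout`, `ReadTimeout`, `WriteTimeout`, and `PoolTimeout`.
- Rate limit errors (429) need a longer wait before retrying (see the `Retry-After` header).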
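
The `Retry-After` header can carry either a number of seconds or an HTTP-date. As a minimal sketch of how the wait time might be derived from it, the helper below uses only the standard library. `retry_after_seconds` and its `default` and `now` parameters are hypothetical and not part of the plan above, and `retry_async` as written uses a fixed schedule, so it would need a hook to consume this value.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Mapping, Optional


def retry_after_seconds(headers: Mapping[str, str], default: float,
                        now: Optional[datetime] = None) -> float:
    """Return the wait suggested by Retry-After, or `default` if it is missing or unusable."""
    # A plain dict lookup is case-sensitive; httpx's Headers object is case-insensitive.
    value = headers.get("Retry-After")
    if value is None:
        return default
    value = value.strip()

    # Form 1: delay in whole seconds, e.g. "Retry-After: 120".
    if value.isdigit():
        return float(value)

    # Form 2: an HTTP-date, e.g. "Retry-After: Wed, 21 Oct 2015 07:28:00 GMT".
    try:
        target = parsedate_to_datetime(value)
    except (TypeError, ValueError):
        return default
    if target.tzinfo is None:
        target = target.replace(tzinfo=timezone.utc)

    now = now or datetime.now(timezone.utc)
    # A date in the past means the wait is already over.
    return max((target - now).total_seconds(), 0.0)
```

One way to combine this with exponential backoff would be to sleep for the larger of the computed backoff delay and the `Retry-After` value, still capped at `max_delay`.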