Initial commit

2025-12-29 09:08:37 +09:00 · 2025-12-29 09:08:37 +09:00 · 94bbc309fd
commit 94bbc309fd
33 changed files with 10761 additions and 0 deletions
--- a/.gitignore
+++ b/.gitignore
@@ -0,0 +1,5 @@
 .venv/
 __pycache__/
 *.py[cod]
 .DS_Store
 .env
--- a/requirements.txt
+++ b/requirements.txt
@@ -0,0 +1,89 @@
 aiohappyeyeballs==2.6.1
 aiohttp==3.13.2
 aiosignal==1.4.0
 aiosqlite==0.21.0
 annotated-doc==0.0.4
 annotated-types==0.7.0
 anyio==4.11.0
 asyncpg==0.31.0
 attrs==25.4.0
 certifi==2025.11.12
 charset-normalizer==3.4.4
 click==8.3.1
 cloudpickle==3.1.2
 contourpy==1.3.3
 cycler==0.12.1
 distro==1.9.0
 Farama-Notifications==0.0.4
 fastapi==0.122.0
 fonttools==4.60.1
 frozenlist==1.8.0
 greenlet==3.2.4
 gunicorn==23.0.0
 gymnasium==1.2.2
 h11==0.16.0
 h5py==3.15.1
 httpcore==1.0.9
 httpx==0.28.1
 idna==3.11
 iniconfig==2.3.0
 JayDeBeApi==1.2.3
 Jinja2==3.1.6
 jiter==0.12.0
 jpype1==1.6.0
 jsonpatch==1.33
 jsonpointer==3.0.0
 kiwisolver==1.4.9
 langchain-core==1.1.0
 langchain-openai==1.1.0
 langgraph==1.0.4
 langgraph-checkpoint==3.0.1
 langgraph-checkpoint-sqlite==3.0.0
 langgraph-prebuilt==1.0.5
 langgraph-sdk==0.2.10
 langsmith==0.4.49
 MarkupSafe==3.0.3
 matplotlib==3.10.7
 multidict==6.7.0
 numpy==2.3.5
 openai==2.8.1
 orjson==3.11.4
 ormsgpack==1.12.0
 packaging==25.0
 passlib==1.7.4
 pillow==12.0.0
 pluggy==1.6.0
 propcache==0.4.1
 psycopg2-binary==2.9.11
 pydantic==2.12.5
 pydantic-settings==2.12.0
 pydantic_core==2.41.5
 Pygments==2.19.2
 PyJWT==2.10.1
 pyparsing==3.2.5
 pypdf==6.1.3
 pytest==9.0.1
 pytest-asyncio==1.3.0
 python-dateutil==2.9.0.post0
 python-dotenv==1.2.1
 python-multipart==0.0.20
 pytz==2025.2
 PyYAML==6.0.3
 regex==2025.11.3
 requests==2.32.5
 requests-toolbelt==1.0.0
 six==1.17.0
 sniffio==1.3.1
 SQLAlchemy==2.0.44
 sqlite-vec==0.1.6
 starlette==0.50.0
 tenacity==9.1.2
 tiktoken==0.12.0
 tqdm==4.67.1
 typing-inspection==0.4.2
 typing_extensions==4.15.0
 urllib3==2.5.0
 uvicorn==0.38.0
 xxhash==3.6.0
 yarl==1.22.0
 zstandard==0.25.0
--- a/setup_env.sh
+++ b/setup_env.sh
@@ -0,0 +1,5 @@
 #!/bin/bash
 python3 -m venv .venv
 source .venv/bin/activate
 pip install -r requirements.txt
 echo "Setup complete. Activate with 'source .venv/bin/activate'"
--- a/src/negotiation_agent/Q_Table/CHANGELOG.md
+++ b/src/negotiation_agent/Q_Table/CHANGELOG.md
@@ -0,0 +1,63 @@
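 # Changelog
 ## Version 3.0 - Business terminology applied (final)
 ### Key changes
 #### 1. State variable names finalized (03_state_design.tex)
 **Version 2.0 (previous):**
 - Quote amount range, manufacturer classification, partner structure, price acceptance rate quantile, current price range
 **Version 3.0 (final):**
 - Revenue range, distribution structure, partner type, price acceptance rate range, input price zone
 **Reason for change:**
 - Unify on the terms used in actual business practice to improve readability and comprehension
 **Final State variable names:**
 | No. | Variable                        | Levels | Description                                          |
 | :-- | :------------------------------ | :----- | :--------------------------------------------------- |
 | 1   | **Revenue range**               | 3      | Low (≤ 10M KRW), Mid (10M~30M KRW), High (> 30M KRW) |
 | 2   | **Distribution structure**      | 3      | Manufacturer, wholesaler, retailer                   |
 | 3   | **Partner type**                | 3      | Single, Multiple, None                               |
 | 4   | **Price acceptance rate range** | 3      | Low (<30%), Mid (30~90%), High (>90%)                |
 | 5   | **Input price zone**            | 2      | PZ1 (A ≤ P ≤ T), PZ2 (P > T)                         |
 #### 2. Dynamic weight (W) design update (04_reward_function.tex)
 - All variable names in formulas and tables updated to match the renamed variables
 - `S_manu` → `S_dist`
 - Calculation logic and weight values are unchanged
 ```
 W_raw = w1×S_amount + w2×S_dist + w3×S_partner + w4×S_accept + w5×S_pricezone
 ```
 **Design intent of the new distribution-structure weights:**
 | Distribution structure | Normalized value | Design intent                                                        |
 | :--------------------- | :--------------- | :------------------------------------------------------------------- |
 | Manufacturer           | 0.2              | Direct supply from the manufacturer; least room for price negotiation |
 | Wholesaler             | 0.5              | Intermediate distribution tier; moderate room for negotiation        |
 | Retailer               | 1.0              | Complex distribution chain; most room for price negotiation          |
 ### File changes
 - `sections/03_state_design.tex`: variable names updated
 - `sections/04_reward_function.tex`: variable names updated
 - `EXAMPLE_CALCULATION.md`: variable names updated
 - `CHANGELOG.md`: this file (reflects Version 3.0)
 ### Final verification
 - [x] All 5 State variable names changed to business terms
 - [x] Total State count kept at 162
 - [x] Variable names in the dynamic weight W formula updated
 - [x] Variable names in the worked example updated
 This version makes no functional changes; it is a final revision focused on making the documentation more readable and easier to apply in practice.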
--- a/src/negotiation_agent/Q_Table/MIGRATION_V3.md
+++ b/src/negotiation_agent/Q_Table/MIGRATION_V3.md
@@ -0,0 +1,173 @@
 # Q-Table Version 3.0 Migration Guide
 ## Overview
 The Q-Table project has been upgraded from Version 2.0 (36 states) to Version 3.0 (162 states).
 ## Key changes
 ### 1. State structure change
 #### Before (Version 2.0 - 36 states)
 ```python
 from negotiation_agent.Q_Table.domain.model.state import (
    State, Scenario, PriceZone, AcceptanceRate
 )
 state = State(
    scenario=Scenario.PRICE_FIRST,           # 4 values
    price_zone=PriceZone.AT_OR_BELOW_ANCHOR, # 3 values
    acceptance_rate=AcceptanceRate.MEDIUM     # 3 values
 )
 # Total: 4 × 3 × 3 = 36 states
 ```
 #### After (Version 3.0 - 162 states)
 ```python
 from negotiation_agent.Q_Table.domain.model.state import (
    State,
    RevenueRange,
    DistributionStructure,
    PartnerType,
    AcceptanceRate,
    InputPriceZone,
 )
 state = State(
    revenue_range=RevenueRange.MID,                    # 3 values
    distribution=DistributionStructure.WHOLESALER,      # 3 values
    partner_type=PartnerType.SINGLE,                    # 3 values
    acceptance_rate=AcceptanceRate.MID,                 # 3 values
    input_price_zone=InputPriceZone.PZ1,                # 2 values
 )
 # Total: 3 × 3 × 3 × 3 × 2 = 162 states
 ```
 ### 2. State builder change
 #### Before (Version 2.0)
 ```python
 from negotiation_agent.Q_Table.domain.service.state_calculator import (
    NegotiationSnapshot, build_state
 )
 snapshot = NegotiationSnapshot(
    scenario_code="A",
    anchor_price=10000,
    target_price=12000,
    seller_initial_price=15000,
    current_price=11000,
 )
 state = build_state(snapshot)
 ```
 #### After (Version 3.0)
 ```python
 from negotiation_agent.Q_Table.domain.service.state_calculator import (
    NegotiationSnapshot, build_state
 )
 snapshot = NegotiationSnapshot(
    revenue_amount=2000,          # revenue (unit: 10,000 KRW)
    distribution_code="W",        # "M": manufacturer, "W": wholesaler, "R": retailer
    partner_count=1,              # number of partners
    anchor_price=10000,
    target_price=12000,
    input_price=11000,
    acceptance_ratio=0.5,         # value between 0 and 1
 )
 state = build_state(snapshot)
 ```
 ### 3. Reward calculation change
 #### Before (Version 2.0)
 ```python
 from negotiation_agent.Q_Table.domain.service.reward_calculator import (
    calculate_reward, NegotiationOutcome
 )
 breakdown = calculate_reward(
    scenario=state.scenario,
    price_zone=state.price_zone,
    current_price=11000,
    anchor_price=10000,
    target_price=12000,
    round_number=3,
    outcome=NegotiationOutcome.ONGOING,
 )
 ```
 #### After (Version 3.0)
 ```python
 from negotiation_agent.Q_Table.domain.service.reward_calculator import (
    calculate_reward, NegotiationOutcome
 )
 breakdown = calculate_reward(
    revenue_range=state.revenue_range,
    distribution=state.distribution,
    partner_type=state.partner_type,
    acceptance_rate=state.acceptance_rate,
    input_price_zone=state.input_price_zone,
    current_price=11000,
    anchor_price=10000,
    target_price=12000,
    round_number=3,
    outcome=NegotiationOutcome.ONGOING,
 )
 ```
 ### 4. Q-Table size change
 #### Before
 ```python
 q_table = QTable(state_space_size=36, action_space_size=21)
 visit_table = VisitTable(state_space_size=36, action_space_size=21)
 ```
 #### After
 ```python
 q_table = QTable(state_space_size=162, action_space_size=21)
 visit_table = VisitTable(state_space_size=162, action_space_size=21)
 ```
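 ## Migration checklist
 - [ ] Change State construction code to the new 5-variable structure
 - [ ] Update NegotiationSnapshot construction code
 - [ ] Change the arguments at calculate_reward() call sites
 - [ ] Set state_space_size to 162 when initializing Q-Table and VisitTable
 - [ ] Retrain previously trained model files (.npy) (36→162 dimension mismatch)
 - [ ] Update unit tests
 ## Compatibility notes
 ⚠️ **Existing trained models cannot be used**
 - A Q-Table trained under Version 2.0 (36 × N) is not compatible with Version 3.0 (162 × N).
 - An existing model must be retrained before it can be used.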
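
The 36-versus-162 mismatch described above can be caught before a stale table is used. The following is a minimal sketch of such a guard; `check_q_table_shape` is a hypothetical helper, not part of the project, and the default sizes simply restate the 162 states and 21 actions mentioned in this guide.

```python
from typing import Tuple


def check_q_table_shape(
    shape: Tuple[int, ...],
    expected_states: int = 162,
    expected_actions: int = 21,
) -> None:
    """Raise ValueError when a loaded table does not match the current spaces."""
    expected = (expected_states, expected_actions)
    if tuple(shape) != expected:
        raise ValueError(
            f"Q-Table shape {tuple(shape)} does not match expected {expected}; "
            "retrain the model for the current state space"
        )


check_q_table_shape((162, 21))  # a Version 3.0 table passes silently

try:
    check_q_table_shape((36, 21))  # a Version 2.0 table is rejected
except ValueError as exc:
    print(exc)
```

In practice the shape would come from the array returned by `np.load(path)`, checked before it is assigned to `q_table.q_values`.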
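 ⚠️ **Re-collecting Experience data is recommended**
 - Existing Experience records carry no information about the new State variables.
 - We recommend re-collecting Experience data under the new State definition.
 ## Reasons for the change
 1. **Business terminology**: unified on the terms used in the actual negotiation domain
 2. **Finer-grained state representation**: revenue, distribution structure, and partner information are included explicitly
 3. **Better extensibility**: new business variables can be added flexibly
 4. **More diverse negotiation strategies**: 162 states allow more refined strategies to be learned
 ## Contact
 For questions about these changes, contact the team lead.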
--- a/src/negotiation_agent/Q_Table/REFACTORING_SUMMARY.md
+++ b/src/negotiation_agent/Q_Table/REFACTORING_SUMMARY.md
@@ -0,0 +1,364 @@
 # Q_Table refactoring completion summary
 ## 📅 Work dates
 - 2025-10-29: Version 2.0 (36 states)
 - 2025-11-10: **Version 3.0 (162 states)** - business terminology applied and State redesigned
 ## ✅ Version 3.0 key changes
 ### State space redesign (36 → 162 states)
 **Previous, Version 2.0:**
 - State = (Scenario, PriceZone, AcceptanceRate)
 - Total states: 4 × 3 × 3 = **36 states**
 **New, Version 3.0:**
 - State = (revenue range, distribution structure, partner type, price acceptance rate range, input price zone)
 - Total states: 3 × 3 × 3 × 3 × 2 = **162 states**
 | No. | Variable                        | Levels | Description                                          |
 | :-- | :------------------------------ | :----- | :--------------------------------------------------- |
 | 1   | **Revenue range**               | 3      | Low (≤ 10M KRW), Mid (10M~30M KRW), High (> 30M KRW) |
 | 2   | **Distribution structure**      | 3      | Manufacturer, wholesaler, retailer                   |
 | 3   | **Partner type**                | 3      | Single, Multiple, None                               |
 | 4   | **Price acceptance rate range** | 3      | Low (<30%), Mid (30~90%), High (>90%)                |
 | 5   | **Input price zone**            | 2      | PZ1 (A ≤ P ≤ T), PZ2 (P > T)                         |
 ### Dynamic weight (W) design update
 **Previous, Version 2.0:**
 ```
 W = (scenario.weight + price_zone.weight) / 2.0
 ```
 **New, Version 3.0:**
 ```
 W_raw = w1×S_amount + w2×S_dist + w3×S_partner + w4×S_accept + w5×S_pricezone
 W = clip(W_raw, 0.2, 0.8)
 ```
 Default weight coefficients:
 - w1 = 0.20 (revenue range)
 - w2 = 0.25 (distribution structure)
 - w3 = 0.20 (partner type)
 - w4 = 0.25 (price acceptance rate)
 - w5 = 0.10 (input price zone)
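
As a rough illustration of the weighted sum above, the sketch below computes W for a single state. The per-level scores are the weights listed in the state-variable table of `VERSION_3_SUMMARY.md`; the function and constant names are hypothetical and are not the project's `reward_calculator` API.

```python
# Hypothetical sketch of the Version 3.0 dynamic weight; names are illustrative.
COEFFS = (0.20, 0.25, 0.20, 0.25, 0.10)  # w1..w5 from the list above


def clip(value: float, low: float, high: float) -> float:
    """Clamp value into the closed interval [low, high]."""
    return max(low, min(high, value))


def dynamic_weight(scores, coeffs=COEFFS, low=0.2, high=0.8) -> float:
    """Return W = clip(sum(w_i * S_i), low, high) for five normalized scores."""
    if len(scores) != len(coeffs):
        raise ValueError("expected one score per coefficient")
    w_raw = sum(w * s for w, s in zip(coeffs, scores))
    return clip(w_raw, low, high)


# MID revenue, WHOLESALER, SINGLE partner, MID acceptance, PZ1
mid_state = (0.6, 0.5, 0.5, 0.6, 1.0)
print(round(dynamic_weight(mid_state), 3))  # 0.595
```

If those table values are the ones in use, the raw sum ranges from 0.275 (every variable at its lowest score) to 1.0, so only the upper clip bound of 0.8 ever takes effect.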
 ## ✅ Completed work (Version 2.0)
 ### 1. New core components implemented
 #### **QTable** (`domain/model/q_table.py`)
 - ✅ Dynamic `action_space_size` support (21 cards)
 - ✅ Q-Learning update method
 - ✅ Serialization/deserialization support
 - ✅ Index validation
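
The update method itself is not shown in this summary. A minimal sketch of the standard tabular Q-learning rule, assuming a plain list-of-lists table and treating `next_state_index=None` as a terminal transition (as the comments in `offline_agent.py` describe), could look like this. The function name and the default hyperparameters are illustrative assumptions.

```python
from typing import List, Optional


def q_update(
    q: List[List[float]],
    state_index: int,
    action_id: int,
    reward: float,
    next_state_index: Optional[int],
    learning_rate: float = 0.1,
    discount_factor: float = 0.9,
) -> float:
    """Apply Q(s,a) += lr * (target - Q(s,a)) and return the new value."""
    if next_state_index is None:
        # Terminal transition: no bootstrap term.
        target = reward
    else:
        target = reward + discount_factor * max(q[next_state_index])
    q[state_index][action_id] += learning_rate * (target - q[state_index][action_id])
    return q[state_index][action_id]


table = [[0.0] * 21 for _ in range(162)]  # 162 states x 21 actions
q_update(table, state_index=0, action_id=5, reward=1.0, next_state_index=None)
print(table[0][5])  # 0.1
```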
 #### **EpisodePolicy** (`domain/agents/policy.py`)
 - ✅ Hard-coded size (9) replaced with a dynamic size
 - ✅ `action_space_size` injected through the constructor
 - ✅ `get_action_mask()` uses the dynamic size
 #### **State** (`domain/model/state.py`)
 - ✅ Added `to_index()` (State → 1D index)
 - ✅ Added `from_index()` (1D index → State)
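
One common way to flatten several categorical variables into a single index is mixed-radix encoding. The sketch below shows that idea for the 3 × 3 × 3 × 3 × 2 layout; the variable ordering and the exact encoding used by `State.to_index()` are not shown in this document, so treat both as assumptions.

```python
from typing import Sequence, Tuple

# Levels per State variable, in the order listed in this summary (assumed).
RADICES: Tuple[int, ...] = (3, 3, 3, 3, 2)


def to_index(levels: Sequence[int], radices: Sequence[int] = RADICES) -> int:
    """Encode per-variable level numbers as one mixed-radix index."""
    if len(levels) != len(radices):
        raise ValueError("expected one level per variable")
    index = 0
    for level, radix in zip(levels, radices):
        if not 0 <= level < radix:
            raise ValueError(f"level {level} out of range for radix {radix}")
        index = index * radix + level
    return index


def from_index(index: int, radices: Sequence[int] = RADICES) -> Tuple[int, ...]:
    """Decode an index back into per-variable level numbers."""
    total = 1
    for radix in radices:
        total *= radix
    if not 0 <= index < total:
        raise ValueError(f"index {index} out of range")
    levels = []
    for radix in reversed(radices):
        index, level = divmod(index, radix)
        levels.append(level)
    return tuple(reversed(levels))


print(to_index((2, 2, 2, 2, 1)))  # 161
print(from_index(161))            # (2, 2, 2, 2, 1)
```

A round trip over all 162 indices is a cheap consistency check for any encoding of this kind.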
 #### **ActionCardMapper** (`integration/action_card_mapper.py`) 🆕
 - ✅ Bidirectional action_id ↔ card_id mapping
 - ✅ JSON-file-based configuration
 - ✅ Utility methods
 #### **action_card_mapping.json** (`integration/data/`) 🆕
 - ✅ 21 card mappings (0-20 → "no_0" ~ "no_20")
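
A bidirectional mapping of this kind can be sketched with two dictionaries built from one JSON object. The JSON layout below (string action ids as keys, card ids as values) is an assumption for illustration; the real `action_card_mapping.json` may be structured differently.

```python
import json

# Hypothetical JSON layout; the real action_card_mapping.json may differ.
raw = json.dumps({str(i): f"no_{i}" for i in range(21)})

# JSON object keys are always strings, so convert them back to int action ids.
action_to_card = {int(key): card for key, card in json.loads(raw).items()}
card_to_action = {card: action for action, card in action_to_card.items()}

# A bidirectional mapping must be one-to-one.
assert len(card_to_action) == len(action_to_card)

print(action_to_card[5])        # no_5
print(card_to_action["no_20"])  # 20
print(len(action_to_card))      # 21
```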
 #### **GetBestActionUsecase** (`usecase/get_best_action_usecase.py`)
 - ✅ Full State → action_id → card_id flow
 - ✅ Policy use is optional
 - ✅ Top-K recommendations
 - ✅ Lookup of available actions/cards
 #### **CollectExperienceUsecase** (`usecase/collect_experience_usecase.py`) 🆕
 - ✅ Collects Experience during a negotiation
 - ✅ Per-episode management
 - ✅ Automatic saving in JSONL format
 - ✅ Collection info lookup
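
JSONL stores one JSON object per line, which makes appending new experiences cheap. The sketch below writes and reads two records using the field names from `Experience.to_dict()`; the state arrays, the temporary path, and the helper-free structure are illustrative assumptions rather than the repository's actual `ExperienceRepository` behaviour.

```python
import json
import os
import tempfile

# Illustrative records; the state arrays are made-up level numbers.
records = [
    {"state": [1, 1, 1, 1, 0], "action_id": 5, "reward": 0.0,
     "next_state": [1, 1, 1, 2, 0], "done": False,
     "episode_id": "ep_001", "step": 0},
    {"state": [1, 1, 1, 2, 0], "action_id": 7, "reward": 1.0,
     "next_state": None, "done": True,
     "episode_id": "ep_001", "step": 1},
]

path = os.path.join(tempfile.mkdtemp(), "experiences.jsonl")

# Append one JSON object per line.
with open(path, "a", encoding="utf-8") as fh:
    for record in records:
        fh.write(json.dumps(record, ensure_ascii=False) + "\n")

# Read the file back, skipping blank lines.
with open(path, encoding="utf-8") as fh:
    loaded = [json.loads(line) for line in fh if line.strip()]

print(len(loaded))              # 2
print(loaded[1]["next_state"])  # None
```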
 #### **TrainOfflineUsecase** (`usecase/train_offline_usecase.py`) 🆕
 - ✅ Trains the Q-Table from stored Experience
 - ✅ Batch training with epoch iteration
 - ✅ Training on selected episodes
 - ✅ Performance evaluation
 ### 2. Legacy code cleanup
 #### Removed files
 ```
 ❌ domain/action_space.py
 ❌ domain/model/action.py
 ❌ domain/constants.py
 ❌ domain/spaces.py
 ```
 #### Files moved to legacy (kept for reference)
 ```
 📦 legacy/environment.py
 📦 legacy/calculate_reward_usecase.py
 📦 legacy/evaluate_agent_usecase.py
 📦 legacy/execute_step_usecase.py
 📦 legacy/get_q_value_usecase.py
 📦 legacy/get_state_info_usecase.py
 📦 legacy/initialize_env_usecase.py
 📦 legacy/load_q_table_usecase.py
 📦 legacy/train_agent_usecase.py
 📦 legacy/update_q_table_usecase.py
 ```
 ## 📁 Final directory structure
 ```
 negotiation_agent/
 │
 ├── card_management/                    # Card management (DB-backed)
 │   ├── domain/
 │   │   ├── model/
 │   │   │   └── nego_card.py           ✅ DB entity
 │   │   ├── repository/
 │   │   │   ├── nego_card_repository.py
 │   │   │   └── nego_card_script_repository.py  ✅ JSON CRUD
 │   │   └── value/
 │   │       └── nego_card_types.py     ✅ 6 atomic elements
 │   └── data/
 │       └── nego_card_scripts.json     ✅ 21 card scripts
 │
 ├── Q_Table/                            # Q-Learning (abstract action_id)
 │   ├── domain/
 │   │   ├── model/
 │   │   │   ├── state.py               ✅ to_index() added
 │   │   │   ├── q_table.py             ✅ new
 │   │   │   └── experience.py          ✅ new
 │   │   ├── agents/
 │   │   │   ├── policy.py              ✅ dynamic sizing
 │   │   │   └── offline_agent.py
 │   │   ├── repository/
 │   │   │   ├── action_repository.py
 │   │   │   └── experience_repository.py  ✅ new
 │   │   └── service/
 │   │       └── state_calculator.py
 │   ├── usecase/
 │   │   ├── get_best_action_usecase.py    ✅ fully rewritten
 │   │   ├── collect_experience_usecase.py ✅ new
 │   │   └── train_offline_usecase.py      ✅ new
 │   ├── infra/
 │   │   ├── data_collector.py
 │   │   └── gym/
 │   │       └── env_wrapper.py
 │   ├── data/                           🆕 data store
 │   │   └── experiences/
 │   │       └── *.jsonl
 │   └── legacy/                         📦 legacy code archive
 │       ├── README.md
 │       └── ... (10 files)
 │
 └── integration/                        🆕 mapping layer
     ├── action_card_mapper.py          ✅
     └── data/
         └── action_card_mapping.json   ✅
 ```
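 ## 🎯 Core design principles
 ### 1. Clear separation of responsibilities
 - **Q_Table**: handles only action_id (0~20)
 - **Card_Management**: handles only card_id ("no_0"~"no_20")
 - **ActionCardMapper**: has the single responsibility of connecting the two
 ### 2. Dynamic extensibility
 - Adding or removing a card only requires editing `action_card_mapping.json`
 - Q-Table and Policy adapt to the new size automatically
 ### 3. Maintainability
 - Legacy code is kept in the legacy/ folder
 - It is clearly separated from new code
 ## 🔄 Usage examples
 ### 1. Inference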
 ```python
 from negotiation_agent.Q_Table.domain.model.q_table import QTable
 from negotiation_agent.Q_Table.domain.model.visit_table import VisitTable
 from negotiation_agent.Q_Table.domain.agents.policy import UCBPolicy
 from negotiation_agent.integration.action_card_mapper import ActionCardMapper
 from negotiation_agent.Q_Table.usecase.get_best_action_usecase import GetBestActionUsecase
 # Load the mapper
 mapper = ActionCardMapper()
 action_space_size = mapper.get_action_space_size()  # 21
 # Create the Q-Table (Version 3.0: 162 states)
 q_table = QTable(
    state_space_size=162,  # 3 x 3 x 3 x 3 x 2
    action_space_size=action_space_size  # 21
 )
 # Create the policy
 visit_table = VisitTable(state_space_size=162, action_space_size=action_space_size)
 policy = UCBPolicy(visit_table=visit_table)
 # Create the usecase
 usecase = GetBestActionUsecase(
    q_table=q_table,
    policy=policy,
    action_card_mapper=mapper
 )
 # Negotiation inference (Version 3.0)
 from negotiation_agent.Q_Table.domain.model.state import (
    State,
    RevenueRange,
    DistributionStructure,
    PartnerType,
    AcceptanceRate,
    InputPriceZone,
 )
 state = State(
    revenue_range=RevenueRange.MID,  # 10M~30M KRW
    distribution=DistributionStructure.WHOLESALER,  # wholesaler
    partner_type=PartnerType.SINGLE,  # single partner
    acceptance_rate=AcceptanceRate.MID,  # 30~90%
    input_price_zone=InputPriceZone.PZ1,  # A≤P≤T
 )
 result = usecase.execute(state)
 print(result)
 # Output:
 # {
 #     'action_id': 5,
 #     'card_id': 'no_5',
 #     'q_value': 0.0,
 #     'state_index': 1
 # }
 ```
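
The `UCBPolicy` used above scores actions as `Q(s,a)` plus an exploration bonus. The sketch below restates that selection logic in plain Python, following the `select_action` code in `policy.py`: unvisited available actions are tried first, otherwise the highest UCB score wins. It is a simplified illustration (no visit-count update, no per-episode masking), and `ucb_select` is a hypothetical name.

```python
import math
from typing import List, Optional, Sequence


def ucb_select(
    q_values: Sequence[float],
    counts: Sequence[int],
    mask: Sequence[bool],
    c: float = math.sqrt(2.0),
) -> Optional[int]:
    """Pick an action index by UCB among the actions allowed by mask."""
    candidates: List[int] = [a for a, allowed in enumerate(mask) if allowed]
    if not candidates:
        return None
    # Unvisited actions are tried first, best Q-value first.
    unvisited = [a for a in candidates if counts[a] == 0]
    if unvisited:
        return max(unvisited, key=lambda a: q_values[a])
    total = sum(counts[a] for a in candidates)

    def score(a: int) -> float:
        return q_values[a] + c * math.sqrt(math.log(total + 1.0) / counts[a])

    return max(candidates, key=score)


q = [0.5, 0.4, 0.1]
n = [10, 1, 0]
print(ucb_select(q, n, [True, True, True]))   # 2 (the only unvisited action)
print(ucb_select(q, n, [True, True, False]))  # 1 (rarely visited, large bonus)
```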
 ### 2. Experience collection
 ```python
 from negotiation_agent.Q_Table.domain.repository.experience_repository import ExperienceRepository
 from negotiation_agent.Q_Table.usecase.collect_experience_usecase import CollectExperienceUsecase
 # Create the repository
 exp_repo = ExperienceRepository()
 # Create the usecase
 collect_usecase = CollectExperienceUsecase(exp_repo)
 # Start an episode
 collect_usecase.start_episode("ep_001")
 # Negotiation in progress...
 for step in range(10):
    # Current state
    current_state = State(...)
    # Select an action (using GetBestActionUsecase)
    result = usecase.execute(current_state)
    action_id = result['action_id']
    # After executing the action, obtain the reward and the next state
    reward = calculate_reward(...)  # compute the reward
    next_state = get_next_state(...)  # next state
    done = check_done(...)  # whether the episode has ended
    # Collect the Experience
    collect_usecase.collect(
        state=current_state,
        action_id=action_id,
        reward=reward,
        next_state=next_state,
        done=done
    )
    if done:
        break
 # End the episode
 collect_usecase.end_episode()
 # Check the collection info
 info = collect_usecase.get_collection_info()
 print(info)
 # {
 #     'total_experiences': 10,
 #     'episodes': 1,
 #     'current_episode': None
 # }
 ```
 ### 3. Offline training
 ```python
 from negotiation_agent.Q_Table.usecase.train_offline_usecase import TrainOfflineUsecase
 # Create the usecase
 train_usecase = TrainOfflineUsecase(
    q_table=q_table,
    experience_repository=exp_repo
 )
 # Run training
 result = train_usecase.train(
    filename="experiences.jsonl",
    epochs=10,
    batch_size=32
 )
 print(result)
 # {
 #     'total_experiences': 1000,
 #     'epochs': 10,
 #     'updates': 10000,
 #     'avg_loss': 0.05
 # }
 # Evaluate performance
 eval_result = train_usecase.evaluate(filename="experiences.jsonl")
 print(eval_result)
 # {
 #     'avg_q_value': 0.5,
 #     'avg_reward': 1.0,
 #     'total_samples': 1000
 # }
 ```
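 ## 📋 Future work (optional)
 - [x] Implement the Experience collection usecase ✅
 - [x] Implement the offline training usecase ✅
 - [ ] Implement a Q-Table repository (persistence)
 - [ ] Automatic Q-Table expansion when cards are added
 - [ ] Statistical Q-value initialization
 ## 📚 References
 - Design document: `REFACTORING_GUIDE.md`
 - Legacy code: `legacy/README.md`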
--- a/src/negotiation_agent/Q_Table/VERSION_3_SUMMARY.md
+++ b/src/negotiation_agent/Q_Table/VERSION_3_SUMMARY.md
@@ -0,0 +1,279 @@
 # Q-Table Version 3.0 change summary
 ## 📋 Overview
 The Version 3.0 update described in CHANGELOG.md has been applied to the code.
 - **State space**: 36 states → **162 states**
 - **Variable structure**: 3 variables → **5 variables** (business terminology applied)
 - **Dynamic weight**: 2-variable average → **5-variable weighted sum**
 ---
 ## 🔄 Key changed files
 ### 1. State model (`domain/model/state.py`)
 #### Before (Version 2.0)
 ```python
 class Scenario(IntEnum): ...        # 4 values
 class PriceZone(IntEnum): ...       # 3 values
 class AcceptanceRate(IntEnum): ...  # 3 values
 State = (Scenario, PriceZone, AcceptanceRate)  # 4×3×3 = 36
 ```
 #### After (Version 3.0)
 ```python
 class RevenueRange(IntEnum): ...           # 3 values (revenue range)
 class DistributionStructure(IntEnum): ...  # 3 values (distribution structure)
 class PartnerType(IntEnum): ...            # 3 values (partner type)
 class AcceptanceRate(IntEnum): ...         # 3 values (price acceptance rate range)
 class InputPriceZone(IntEnum): ...         # 2 values (input price zone)
 State = (RevenueRange, DistributionStructure, PartnerType,
         AcceptanceRate, InputPriceZone)  # 3×3×3×3×2 = 162
 ```
 **Key characteristics:**
 - Every class gains a `.weight` attribute (used for the dynamic weight calculation)
 - Classes are named with business terms (revenue, distribution structure, partner, etc.)
 - Each variable provides its classification logic through a `from_*()` class method
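
To make the `from_*()` idea concrete, here is a standalone sketch of two such classifiers using the thresholds from the state-variable tables (revenue in units of 10,000 KRW; acceptance as a ratio between 0 and 1). The method names `from_amount` and `from_ratio` and the integer values of the enum members are assumptions; the real `state.py` is not shown in this document.

```python
from enum import IntEnum


class RevenueRange(IntEnum):
    LOW = 0
    MID = 1
    HIGH = 2

    @classmethod
    def from_amount(cls, amount: float) -> "RevenueRange":
        """Classify a revenue amount given in units of 10,000 KRW."""
        if amount <= 1000:
            return cls.LOW
        if amount <= 3000:
            return cls.MID
        return cls.HIGH


class AcceptanceRate(IntEnum):
    LOW = 0
    MID = 1
    HIGH = 2

    @classmethod
    def from_ratio(cls, ratio: float) -> "AcceptanceRate":
        """Classify an acceptance ratio in the range 0..1."""
        if ratio < 0.30:
            return cls.LOW
        if ratio <= 0.90:
            return cls.MID
        return cls.HIGH


print(RevenueRange.from_amount(2000).name)  # MID
print(AcceptanceRate.from_ratio(0.5).name)  # MID
```

Boundary values follow the tables as written: exactly 1,000 is LOW, exactly 3,000 is MID, and ratios of exactly 0.30 and 0.90 are both MID.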
 ---
 ### 2. Reward Calculator (`domain/service/reward_calculator.py`)
 #### Before (Version 2.0)
 ```python
 def calculate_reward(
    scenario: Scenario,
    price_zone: PriceZone,
    ...
 )
 def _calculate_weight(scenario, price_zone, cfg):
    raw = (scenario.priority_weight + price_zone.zone_weight) / 2.0
    return clip(raw, 0.2, 0.8)
 ```
 #### After (Version 3.0)
 ```python
 def calculate_reward(
    revenue_range: RevenueRange,
    distribution: DistributionStructure,
    partner_type: PartnerType,
    acceptance_rate: AcceptanceRate,
    input_price_zone: InputPriceZone,
    ...
 )
 def _calculate_weight(...):
    W_raw = (
        config.w1 * revenue_range.weight +
        config.w2 * distribution.weight +
        config.w3 * partner_type.weight +
        config.w4 * acceptance_rate.weight +
        config.w5 * input_price_zone.weight
    )
    return clip(W_raw, 0.2, 0.8)
 ```
 **Default weight coefficients:**
 - w1 = 0.20 (revenue)
 - w2 = 0.25 (distribution structure)
 - w3 = 0.20 (partner)
 - w4 = 0.25 (acceptance rate)
 - w5 = 0.10 (input price)
 ---
 ### 3. State Calculator (`domain/service/state_calculator.py`)
 #### Before (Version 2.0)
 ```python
 @dataclass
 class NegotiationSnapshot:
    scenario_code: str
    anchor_price: float
    target_price: float
    seller_initial_price: float
    current_price: float
 ```
 #### After (Version 3.0)
 ```python
 @dataclass
 class NegotiationSnapshot:
    revenue_amount: float          # revenue (unit: 10,000 KRW)
    distribution_code: str         # "M", "W", "R"
    partner_count: int             # number of partners
    anchor_price: float
    target_price: float
    input_price: float
    acceptance_ratio: Optional[float]
    initial_price: Optional[float]
 ```
 **Reason for change:**
 - Fields restructured to match actual business data
 - Provides the information needed to derive the 5 State variables
 ---
 ### 4. 모듈 Export (`domain/model/__init__.py`)
 ```python
 # Version 3.0
 from .state import (
    AcceptanceRate,
    DistributionStructure,
    InputPriceZone,
    PartnerType,
    RevenueRange,
    State,
 )
 ```
 ---
 ## 📊 State variable specification
 | Variable | Class | Value | Description | Weight |
 |------|--------|---|------|--------|
 | Revenue range | `RevenueRange` | LOW (≤ 10M KRW) | Low revenue | 0.3 |
 | | | MID (10M~30M KRW) | Medium revenue | 0.6 |
 | | | HIGH (> 30M KRW) | High revenue | 1.0 |
 | Distribution structure | `DistributionStructure` | MANUFACTURER | Direct supply from the manufacturer | 0.2 |
 | | | WHOLESALER | Via a wholesaler | 0.5 |
 | | | RETAILER | Via a retailer | 1.0 |
 | Partner type | `PartnerType` | NONE | No partner | 0.3 |
 | | | SINGLE | Single partner | 0.5 |
 | | | MULTIPLE | Multiple partners | 1.0 |
 | Price acceptance rate | `AcceptanceRate` | LOW (<30%) | Low acceptance rate | 0.3 |
 | | | MID (30~90%) | Medium acceptance rate | 0.6 |
 | | | HIGH (>90%) | High acceptance rate | 1.0 |
 | Input price zone | `InputPriceZone` | PZ1 (A≤P≤T) | Within the target range | 1.0 |
 | | | PZ2 (P>T or P<A) | Outside the target range | 0.3 |
 ---
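 ## 🧪 New test files
 ### `tests/Q_Table/test_state_v3.py`
 - Verifies the State variable classification logic
 - Tests State ↔ index conversion consistency
 - Verifies iteration over all 162 states
 - Verifies weight values
 ### `tests/Q_Table/test_reward_v3.py`
 - Tests NegotiationSnapshot → State conversion
 - Verifies the dynamic weight calculation
 - Verifies reward calculation for success, failure, and ongoing outcomes
 - Tests minimum/maximum weight boundary cases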
 ---
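 ## 📚 Documentation updates
 ### `REFACTORING_SUMMARY.md`
 - Added Version 3.0 changes
 - Updated usage examples (36 → 162 states)
 - Added a State structure comparison table
 ### `MIGRATION_V3.md` (new)
 - Version 2.0 → 3.0 migration guide
 - Before/After code comparison
 - Compatibility notes
 - Checklist
 ### `CHANGELOG.md` (existing file, referenced)
 - Documents the Version 3.0 changes
 - Explains the background for applying business terminology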
 ---
 ## ⚠️ Breaking Changes
 ### 1. Q-Table size change
 ```python
 # Before
 QTable(state_space_size=36, action_space_size=21)
 # After
 QTable(state_space_size=162, action_space_size=21)
 ```
 ### 2. State construction change
 ```python
 # Before
 State(
    scenario=Scenario.PRICE_FIRST,
    price_zone=PriceZone.AT_OR_BELOW_ANCHOR,
    acceptance_rate=AcceptanceRate.MEDIUM,
 )
 # After
 State(
    revenue_range=RevenueRange.MID,
    distribution=DistributionStructure.WHOLESALER,
    partner_type=PartnerType.SINGLE,
    acceptance_rate=AcceptanceRate.MID,
    input_price_zone=InputPriceZone.PZ1,
 )
 ```
 ### 3. Existing trained models are incompatible
 - Version 2.0 model files (.npy) have shape 36×N
 - Version 3.0 expects 162×N, so they cannot be loaded
 - **Retraining is required**
 ---
 ## ✅ Verification items
 - [x] State class redefined with 5 variables
 - [x] `.weight` attribute implemented for each variable
 - [x] State.to_index() / from_index() support 162 states
 - [x] w1~w5 weight coefficients added to RewardConfig
 - [x] _calculate_weight() changed to a 5-variable weighted sum
 - [x] NegotiationSnapshot fields restructured
 - [x] build_state() 5-variable mapping logic implemented
 - [x] __init__.py exports updated
 - [x] REFACTORING_SUMMARY.md updated
 - [x] MIGRATION_V3.md written
 - [x] Unit tests written (test_state_v3.py, test_reward_v3.py)
 ---
 ## 🚀 Next steps
 1. **Set up the Python environment**
   ```bash
   poetry install
   # or
   pip install -r requirements.txt
   ```
 2. **Run the tests**
   ```bash
   pytest tests/Q_Table/test_state_v3.py -v
   pytest tests/Q_Table/test_reward_v3.py -v
   ```
 3. **Migrate existing code**
   - See the `MIGRATION_V3.md` guide
   - Update all State construction code
   - Change the Q-Table initialization size
 4. **Retrain the model**
   - Re-collect training data for the new 162-state space
   - Run Q-Table retraining
 ---
 ## 📞 Contact
 If you run into problems while applying Version 3.0:
 1. See `MIGRATION_V3.md`
 2. See the test code (`test_state_v3.py`, `test_reward_v3.py`)
 3. Contact the team lead
--- a/src/negotiation_agent/Q_Table/__init__.py
+++ b/src/negotiation_agent/Q_Table/__init__.py
@@ -0,0 +1,3 @@
 """Q-Table package exports."""
 from . import domain, usecase  # noqa: F401
--- a/src/negotiation_agent/Q_Table/data/model/q_table.json
+++ b/src/negotiation_agent/Q_Table/data/model/q_table.json
--- a/src/negotiation_agent/Q_Table/data/model/q_table.xlsx
+++ b/src/negotiation_agent/Q_Table/data/model/q_table.xlsx
--- a/src/negotiation_agent/Q_Table/data/model/visit_table.json
+++ b/src/negotiation_agent/Q_Table/data/model/visit_table.json
--- a/src/negotiation_agent/Q_Table/domain/__init__.py
+++ b/src/negotiation_agent/Q_Table/domain/__init__.py
--- a/src/negotiation_agent/Q_Table/domain/agents/__init__.py
+++ b/src/negotiation_agent/Q_Table/domain/agents/__init__.py
--- a/src/negotiation_agent/Q_Table/domain/agents/offline_agent.py
+++ b/src/negotiation_agent/Q_Table/domain/agents/offline_agent.py
@@ -0,0 +1,120 @@
 """Offline Q-learning agent backed by a UCB policy."""
 from __future__ import annotations
 import os
 from typing import Optional
 import numpy as np
 from ..model.visit_table import VisitTable
 from ..model.q_table import QTable
 from .policy import UCBPolicy
 class QLearningAgent:
    def __init__(
        self,
        agent_params,
        state_size: int,
        action_size: int,
        visit_table: Optional[VisitTable] = None,
    ) -> None:
        """
        Args:
            agent_params: agent parameters (learning_rate, discount_factor, etc.)
            state_size: size of the state space
            action_size: size of the action space
            visit_table: visit-count table (a new one is created if None)
        """
        self.state_size = state_size
        self.action_size = action_size
        # Create the QTable object (composition)
        self.q_table = QTable(
            state_space_size=state_size,
            action_space_size=action_size,
            learning_rate=agent_params["learning_rate"],
            discount_factor=agent_params["discount_factor"]
        )
        self.visit_table = visit_table or VisitTable(state_size, action_size)
        self.policy = UCBPolicy(
            visit_table=self.visit_table,
            exploration_constant=agent_params.get("exploration_constant", np.sqrt(2.0)),
        )
    def get_action(self, state: int, action_mask=None):
        """
        Select an action for the current state (using the UCB policy)
        Args:
            state: current state index
            action_mask: mask of available actions (optional)
        Returns:
            ID of the selected action
        """
        # Look up the Q-values from the QTable object
        q_values = self.q_table.get_q_values_for_state(state)
        mask = None if action_mask is None else np.asarray(action_mask, dtype=bool)
        return self.policy.select_action(state, q_values, available_mask=mask)
    def learn(self, batch):
        """
        Update the Q-Table from a batch of data
        Args:
            batch: training batch (observations, actions, rewards, next_observations, terminals)
        """
        for state, action, reward, next_state, terminated in zip(
            batch["observations"],
            batch["actions"],
            batch["rewards"],
            batch["next_observations"],
            batch["terminals"],
        ):
            # Use the QTable object's update method.
            # When terminated is True, next_state is meaningless or should be None.
            # QTable.update treats next_state_index=None as a terminal state.
            next_s = next_state if not terminated else None
            self.q_table.update(
                state_index=state,
                action_id=action,
                reward=reward,
                next_state_index=next_s,
                done=terminated
            )
            self.visit_table.increment(state, action)
    def save_model(self, path):
        """
        Save the Q-Table to a file
        Args:
            path: file path to save to
        """
        # Save the numpy array held inside QTable (keeps backward compatibility)
        np.save(path, self.q_table.q_values)
        print(f"Q-Table saved to {path}")
    def load_q_table(self, file_path):
        """
        Load the Q-Table from a file
        Args:
            file_path: file path to load from
        """
        if os.path.exists(file_path):
            # Assign directly to the QTable object's q_values attribute
            self.q_table.q_values = np.load(file_path)
            print(f"Q-Table loaded from {file_path}")
        else:
            print(f"Error: No Q-Table found at {file_path}")
    def reset_episode(self):
        """Reset per-episode state (policy state, etc.)"""
        self.policy.reset_episode()
--- a/src/negotiation_agent/Q_Table/domain/agents/policy.py
+++ b/src/negotiation_agent/Q_Table/domain/agents/policy.py
@@ -0,0 +1,123 @@
 from abc import ABC, abstractmethod
 from typing import Optional
 import numpy as np
 from ..model.visit_table import VisitTable
 class Policy(ABC):
    @abstractmethod
    def select_action(
        self,
        state_index: int,
        q_values: np.ndarray,
        available_mask: Optional[np.ndarray] = None,
    ) -> Optional[int]:
        """
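        Select an action based on the given state and Q-values
        Args:
            state_index: current state index
            q_values: array of Q-values for that state
            available_mask: mask of selectable actions (True = selectable)
        Returns:
            ID of the selected action (None if nothing can be selected)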
        """
        raise NotImplementedError
    def reset_episode(self) -> None:
        """Reset state at the start of an episode"""
        raise NotImplementedError
    def get_action_mask(self) -> np.ndarray:
        """
        Return the action mask for the current policy state
        Returns:
            Action mask array (1 = selectable, 0 = not selectable)
        """
        raise NotImplementedError
 class UCBPolicy(Policy):
    """Upper Confidence Bound policy with per-episode action masking."""
    def __init__(
        self,
        visit_table: VisitTable,
        exploration_constant: float = np.sqrt(2.0),
        rng: Optional[np.random.Generator] = None,
    ) -> None:
        self.visit_table = visit_table
        self.action_space_size = visit_table.action_space_size
        self.exploration_constant = exploration_constant
        self._rng = rng or np.random.default_rng()
        self._episode_actions: set[int] = set()
    def select_action(
        self,
        state_index: int,
        q_values: np.ndarray,
        available_mask: Optional[np.ndarray] = None,
    ) -> Optional[int]:
        """
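        Select an action using the UCB algorithm
        UCB = Q(s,a) + c * sqrt(ln(N(s) + 1) / N(s,a))
        N(s) is summed over the currently selectable actions; unvisited ones are tried first.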
        """
        mask = self._prepare_mask(available_mask)
        if not mask.any():
            self.reset_episode()
            mask = self._prepare_mask(available_mask)
            if not mask.any():
                return None
        counts = self.visit_table.get_state_counts(state_index)
        masked_counts = np.where(mask, counts, 0)
        zero_visit_candidates = np.where((counts == 0) & mask)[0]
        if zero_visit_candidates.size > 0:
            best = zero_visit_candidates[np.argmax(q_values[zero_visit_candidates])]
            action = int(best)
        else:
            total = masked_counts.sum()
            # Avoid division by zero while keeping exploration pressure.
            denom = counts.astype(float) + 1e-9
            bonus = self.exploration_constant * np.sqrt(np.log(total + 1.0) / denom)
            scores = q_values + bonus
            scores[~mask] = -np.inf
            action = int(np.argmax(scores))
        self.visit_table.increment(state_index, action)
        self._episode_actions.add(action)
        return action
    def reset_episode(self) -> None:
        """Clear the record of actions used within the episode"""
        self._episode_actions.clear()
    def get_action_mask(self) -> np.ndarray:
        """
        Return a mask that excludes actions already used
        Returns:
            Mask array (1 = selectable, 0 = not selectable)
        """
        mask = np.ones(self.action_space_size, dtype=int)
        if self._episode_actions:
            for action in self._episode_actions:
                mask[action] = 0
        return mask
    def _prepare_mask(self, available_mask: Optional[np.ndarray]) -> np.ndarray:
        """Combine the input mask with already-used actions to build the final mask"""
        if available_mask is None:
            mask = np.ones(self.action_space_size, dtype=bool)
        else:
            mask = np.asarray(available_mask, dtype=bool).copy()
        if self._episode_actions:
            for action in self._episode_actions:
                mask[action] = False
        return mask
--- a/src/negotiation_agent/Q_Table/domain/model/__init__.py
+++ b/src/negotiation_agent/Q_Table/domain/model/__init__.py
@@ -0,0 +1,11 @@
 """Domain model exports for Q-Table (Version 3.0 - 162 states)."""
 from .state import (  # noqa: F401
    AcceptanceRate,
    DistributionStructure,
    InputPriceZone,
    PartnerType,
    RevenueRange,
    State,
 )
 from .visit_table import VisitTable  # noqa: F401
--- a/src/negotiation_agent/Q_Table/domain/model/experience.py
+++ b/src/negotiation_agent/Q_Table/domain/model/experience.py
@ -0,0 +1,94 @@
 """
 Experience model
 Experience data for Q-Learning (SARS: State, Action, Reward, next_State)
 """
 from dataclasses import dataclass
 from typing import Optional
 from .state import State
@dataclass
 class Experience:
    """
    Experience record used by Q-Learning
    SARS format:
    - State: current state
    - Action: selected action (action_id)
    - Reward: reward received
    - next_State: next state (None when the episode ends)
    """
    state: State
    action_id: int
    reward: float
    next_state: Optional[State]
    done: bool  # whether the episode has ended
    # Metadata (optional)
    episode_id: Optional[str] = None
    step: Optional[int] = None
    timestamp: Optional[str] = None
    def to_dict(self) -> dict:
        """
        Serialize the Experience to a dictionary
        Returns:
            {
                'state': [revenue_range, distribution, partner_type,
                          acceptance_rate, input_price_zone],
                'action_id': 5,
                'reward': 1.0,
                'next_state': same 5-element layout as 'state', or None,
                'done': False,
                'episode_id': 'ep_001',
                'step': 3,
                'timestamp': '2025-10-29T16:30:00'
            }
        """
        return {
            "state": self.state.to_array(),
            "action_id": self.action_id,
            "reward": self.reward,
            "next_state": self.next_state.to_array() if self.next_state else None,
            "done": self.done,
            "episode_id": self.episode_id,
            "step": self.step,
            "timestamp": self.timestamp,
        }
    @classmethod
    def from_dict(cls, data: dict) -> "Experience":
        """
        Restore an Experience from a dictionary
        Args:
            data: serialized Experience data
        Returns:
            Experience instance
        """
        return cls(
            state=State.from_array(data["state"]),
            action_id=data["action_id"],
            reward=data["reward"],
            next_state=(
                State.from_array(data["next_state"]) if data["next_state"] else None
            ),
            done=data["done"],
            episode_id=data.get("episode_id"),
            step=data.get("step"),
            timestamp=data.get("timestamp"),
        )
    def __repr__(self) -> str:
        return (
            f"Experience("
            f"state={self.state.to_array()}, "
            f"action_id={self.action_id}, "
            f"reward={self.reward}, "
            f"next_state={self.next_state.to_array() if self.next_state else None}, "
            f"done={self.done}"
            f")"
        )
--- a/src/negotiation_agent/Q_Table/domain/model/q_table.py
+++ b/src/negotiation_agent/Q_Table/domain/model/q_table.py
@ -0,0 +1,209 @@
 """
 Q-Table model
 Supports a dynamic action_space_size so negotiation cards can be added or removed
 """
 import numpy as np
 from typing import Optional, Dict
 class QTable:
    """
    Q-Table for Q-Learning
    The Q-Table only handles abstract action_ids;
    the actual negotiation cards (card_id) are mapped by ActionCardMapper
    """
    def __init__(
        self,
        state_space_size: int,
        action_space_size: int,
        learning_rate: float = 0.1,
        discount_factor: float = 0.95,
    ):
        """
        Args:
            state_space_size: size of the state space
                             (3 x 3 x 3 x 3 x 2 = 162 in Version 3.0; see State)
            action_space_size: size of the action space (number of negotiation cards, currently 21)
            learning_rate: learning rate (alpha)
            discount_factor: discount factor (gamma)
        """
        self.state_space_size = state_space_size
        self.action_space_size = action_space_size
        self.learning_rate = learning_rate
        self.discount_factor = discount_factor
        # Initialize the Q-Table with shape (state_size, action_size)
        self.q_values = np.zeros((state_space_size, action_space_size))
    def get_q_value(self, state_index: int, action_id: int) -> float:
        """
        Look up the Q-value of a specific (state, action) pair
        Args:
            state_index: index of the State (0 ~ state_space_size-1)
            action_id: Action ID (0 ~ action_space_size-1)
        Returns:
            Q-value
        """
        self._validate_indices(state_index, action_id)
        return float(self.q_values[state_index, action_id])
    def get_q_values_for_state(self, state_index: int) -> np.ndarray:
        """
        Q-values of every action for a specific state
        Args:
            state_index: index of the State
        Returns:
            Q-values array of shape (action_space_size,)
        """
        if state_index < 0 or state_index >= self.state_space_size:
            raise ValueError(
                f"state_index {state_index} out of range [0, {self.state_space_size})"
            )
        return self.q_values[state_index, :].copy()
    def set_q_value(self, state_index: int, action_id: int, value: float):
        """
        Directly set the Q-value of a specific (state, action) pair
        Args:
            state_index: index of the State
            action_id: Action ID
            value: Q-value to set
        """
        self._validate_indices(state_index, action_id)
        self.q_values[state_index, action_id] = value
    def update(
        self,
        state_index: int,
        action_id: int,
        reward: float,
        next_state_index: Optional[int] = None,
        done: bool = False,
    ):
        """
        Q-Learning update
        Q(s,a) ← Q(s,a) + α[r + γ·max_a'Q(s',a') - Q(s,a)]
        Args:
            state_index: index of the current state
            action_id: ID of the selected action
            reward: reward received
            next_state_index: index of the next state (None when the episode ends)
            done: whether the episode has ended
        """
        self._validate_indices(state_index, action_id)
        current_q = self.q_values[state_index, action_id]
        if done or next_state_index is None:
            # Terminal transition: the target is the reward alone
            target = reward
        else:
            # Non-terminal transition: target = reward + γ·max Q(s',a')
            if next_state_index < 0 or next_state_index >= self.state_space_size:
                raise ValueError(f"next_state_index {next_state_index} out of range")
            max_next_q = np.max(self.q_values[next_state_index, :])
            target = reward + self.discount_factor * max_next_q
        # Q-Learning update
        self.q_values[state_index, action_id] += self.learning_rate * (
            target - current_q
        )
    def get_best_action(self, state_index: int) -> int:
        """
        Return the action_id with the highest Q-value in the given state
        Args:
            state_index: index of the State
        Returns:
            action_id (the action whose Q-value is the maximum)
        """
        if state_index < 0 or state_index >= self.state_space_size:
            raise ValueError(
                f"state_index {state_index} out of range [0, {self.state_space_size})"
            )
        return int(np.argmax(self.q_values[state_index, :]))
    def get_best_actions_with_ties(self, state_index: int) -> list:
        """
        Return every action_id that shares the maximum Q-value
        Args:
            state_index: index of the State
        Returns:
            list of action_ids that have the maximum Q-value
        """
        if state_index < 0 or state_index >= self.state_space_size:
            raise ValueError(
                f"state_index {state_index} out of range [0, {self.state_space_size})"
            )
        max_q = np.max(self.q_values[state_index, :])
        max_actions = np.where(self.q_values[state_index, :] == max_q)[0]
        return max_actions.tolist()
    def to_dict(self) -> Dict:
        """
        Serialize the Q-Table to a dictionary
        Returns:
            serialized Q-Table data
        """
        return {
            "state_space_size": self.state_space_size,
            "action_space_size": self.action_space_size,
            "learning_rate": self.learning_rate,
            "discount_factor": self.discount_factor,
            "q_values": self.q_values.tolist(),
        }
    @classmethod
    def from_dict(cls, data: Dict) -> "QTable":
        """
        Restore a Q-Table from a dictionary
        Args:
            data: serialized Q-Table data
        Returns:
            restored QTable instance
        """
        q_table = cls(
            state_space_size=data["state_space_size"],
            action_space_size=data["action_space_size"],
            learning_rate=data["learning_rate"],
            discount_factor=data["discount_factor"],
        )
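        # Force a float dtype so in-place updates still work if the stored values were all integers.
        q_table.q_values = np.array(data["q_values"], dtype=float)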
        return q_table
    def _validate_indices(self, state_index: int, action_id: int):
        """Validate the indices"""
        if state_index < 0 or state_index >= self.state_space_size:
            raise ValueError(
                f"state_index {state_index} out of range [0, {self.state_space_size})"
            )
        if action_id < 0 or action_id >= self.action_space_size:
            raise ValueError(
                f"action_id {action_id} out of range [0, {self.action_space_size})"
            )
    def __repr__(self) -> str:
        return (
            f"QTable(state_space_size={self.state_space_size}, "
            f"action_space_size={self.action_space_size}, "
            f"learning_rate={self.learning_rate}, "
            f"discount_factor={self.discount_factor})"
        )
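
The update rule in `QTable.update` can be traced by hand on a tiny table. The sketch below is a stand-alone illustration using plain lists instead of NumPy; the helper name `q_update` is an assumption for this example, and the defaults mirror the `learning_rate=0.1` and `discount_factor=0.95` defaults above.

```python
from typing import List, Optional


def q_update(
    q: List[List[float]],
    state: int,
    action: int,
    reward: float,
    next_state: Optional[int] = None,
    done: bool = False,
    alpha: float = 0.1,
    gamma: float = 0.95,
) -> float:
    """Apply one Q-Learning update in place and return the new Q(s, a).

    Hypothetical helper for illustration only.
    """
    if done or next_state is None:
        # Terminal transition: no bootstrapping from the next state.
        target = reward
    else:
        target = reward + gamma * max(q[next_state])
    q[state][action] += alpha * (target - q[state][action])
    return q[state][action]


q = [[0.0, 0.0], [0.0, 0.0]]
# Terminal step: Q(0,1) moves 10% of the way from 0.0 toward the reward 1.0.
first = q_update(q, state=0, action=1, reward=1.0, done=True)
# Non-terminal step: target = 0.0 + 0.95 * max(Q(0,.)) = 0.095, so Q(1,0) becomes 0.0095.
second = q_update(q, state=1, action=0, reward=0.0, next_state=0)
```

The second call shows how value propagates backward: state 1 receives no reward itself, yet its estimate rises because it leads to state 0.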
--- a/src/negotiation_agent/Q_Table/domain/model/state.py
+++ b/src/negotiation_agent/Q_Table/domain/model/state.py
@ -0,0 +1,258 @@
 from dataclasses import dataclass
 from enum import IntEnum, nonmember  # nonmember requires Python 3.11+
 from typing import List
 class RevenueRange(IntEnum):
    """Revenue price range"""
    LOW = 0  # ≤ 10 million KRW (1,000만원)
    MID = 1  # 10 million ~ 30 million KRW
    HIGH = 2  # > 30 million KRW
    _DESCRIPTIONS = nonmember(("Low (≤1,000만원)", "Mid (1,000~3,000만원)", "High (>3,000만원)"))
    _WEIGHTS = nonmember((0.3, 0.6, 1.0))  # higher revenue leaves more room to negotiate
    @property
    def description(self) -> str:
        return self._DESCRIPTIONS[self.value]
    @property
    def weight(self) -> float:
        return self._WEIGHTS[self.value]
    @classmethod
    def from_amount(cls, amount: float) -> "RevenueRange":
        """Determine the range from a revenue amount (in units of 10,000 KRW)"""
        if amount <= 1000:
            return cls.LOW
        elif amount <= 3000:
            return cls.MID
        else:
            return cls.HIGH
 class DistributionStructure(IntEnum):
    """Distribution structure"""
    MANUFACTURER = 0  # manufacturer (제조)
    WHOLESALER = 1  # sole distributor (총판)
    RETAILER = 2  # retailer (유통)
    _DESCRIPTIONS = nonmember(("제조", "총판", "유통"))
    _WEIGHTS = nonmember((0.2, 0.5, 1.0))  # more complex distribution stages leave more room to negotiate
    @property
    def description(self) -> str:
        return self._DESCRIPTIONS[self.value]
    @property
    def weight(self) -> float:
        return self._WEIGHTS[self.value]
    @classmethod
    def from_code(cls, code: str) -> "DistributionStructure":
        """Determine the distribution structure from a code"""
        normalized = (code or "").strip().upper()
        mapping = {"M": cls.MANUFACTURER, "W": cls.WHOLESALER, "R": cls.RETAILER}
        if normalized not in mapping:
            raise ValueError(f"unknown distribution code: {code}")
        return mapping[normalized]
 class PartnerType(IntEnum):
    """Partner type"""
    SINGLE = 0  # single partner
    MULTIPLE = 1  # multiple partners
    NONE = 2  # no partner
    _DESCRIPTIONS = nonmember(("Single (단독)", "Multiple (다수)", "None (없음)"))
    _WEIGHTS = nonmember((0.5, 1.0, 0.3))  # more partners make the negotiation more complex
    @property
    def description(self) -> str:
        return self._DESCRIPTIONS[self.value]
    @property
    def weight(self) -> float:
        return self._WEIGHTS[self.value]
    @classmethod
    def from_count(cls, count: int) -> "PartnerType":
        """Determine the type from the number of partners"""
        if count == 0:
            return cls.NONE
        elif count == 1:
            return cls.SINGLE
        else:
            return cls.MULTIPLE
 class AcceptanceRate(IntEnum):
    """Price acceptance rate range"""
    LOW = 0  # < 30%
    MID = 1  # 30% ~ 90%
    HIGH = 2  # > 90%
    _DESCRIPTIONS = nonmember(("Low (<30%)", "Mid (30~90%)", "High (>90%)"))
    _WEIGHTS = nonmember((0.3, 0.6, 1.0))  # a higher acceptance rate makes success more likely
    @property
    def description(self) -> str:
        return self._DESCRIPTIONS[self.value]
    @property
    def weight(self) -> float:
        return self._WEIGHTS[self.value]
    @classmethod
    def from_ratio(cls, ratio: float) -> "AcceptanceRate":
        """Determine the range from an acceptance ratio (clamped to 0~1)"""
        normalized = max(0.0, min(1.0, ratio))
        if normalized < 0.30:
            return cls.LOW
        elif normalized <= 0.90:
            return cls.MID
        else:
            return cls.HIGH
 class InputPriceZone(IntEnum):
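    """Input price zone"""
    PZ1 = 0  # A ≤ P ≤ T (between anchor price and target price)
    PZ2 = 1  # P > T (above target price); note that from_prices also returns PZ2 when P < A
    _DESCRIPTIONS = nonmember(("PZ1 (A≤P≤T)", "PZ2 (P>T)"))
    _WEIGHTS = nonmember((1.0, 0.3))  # a price within the target is more favorable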
    @property
    def description(self) -> str:
        return self._DESCRIPTIONS[self.value]
    @property
    def weight(self) -> float:
        return self._WEIGHTS[self.value]
    @classmethod
    def from_prices(
        cls,
        input_price: float,
        anchor_price: float,
        target_price: float,
    ) -> "InputPriceZone":
        """Determine the zone from the input, anchor, and target prices"""
        if anchor_price <= 0 or target_price <= 0:
            raise ValueError("anchor_price and target_price must be positive")
        if anchor_price > target_price:
            raise ValueError("anchor_price must not exceed target_price")
        if anchor_price <= input_price <= target_price:
            return cls.PZ1
        else:
            return cls.PZ2
@dataclass
 class State:
    """
    Q-Learning State representation (Version 3.0 - 162 states)
    State = (revenue range, distribution structure, partner type, acceptance rate range, input price zone)
    Total: 3 × 3 × 3 × 3 × 2 = 162 states
    """
    revenue_range: RevenueRange
    distribution: DistributionStructure
    partner_type: PartnerType
    acceptance_rate: AcceptanceRate
    input_price_zone: InputPriceZone
    def to_array(self) -> List[int]:
        """Convert the State to an array"""
        return [
            self.revenue_range.value,
            self.distribution.value,
            self.partner_type.value,
            self.acceptance_rate.value,
            self.input_price_zone.value,
        ]
    def to_index(self) -> int:
        """
        Convert the State to a 1-D index (for Q-Table indexing)
        State space: 3 × 3 × 3 × 3 × 2 = 162
        Returns:
            index between 0 and 161
        """
        # 5D to 1D index conversion
        # index = revenue * (3*3*3*2) + dist * (3*3*2) + partner * (3*2) + accept * 2 + pricezone
        return (
            self.revenue_range.value * 54
            + self.distribution.value * 18
            + self.partner_type.value * 6
            + self.acceptance_rate.value * 2
            + self.input_price_zone.value
        )
    @classmethod
    def from_index(cls, index: int) -> "State":
        """
        Restore a State from a 1-D index
        Args:
            index: index between 0 and 161
        Returns:
            State object
        """
        if index < 0 or index >= 162:
            raise ValueError(f"index {index} out of range [0, 162)")
        # 1D to 5D index conversion
        revenue_value = index // 54
        remainder = index % 54
        dist_value = remainder // 18
        remainder = remainder % 18
        partner_value = remainder // 6
        remainder = remainder % 6
        accept_value = remainder // 2
        pricezone_value = remainder % 2
        return cls(
            revenue_range=RevenueRange(revenue_value),
            distribution=DistributionStructure(dist_value),
            partner_type=PartnerType(partner_value),
            acceptance_rate=AcceptanceRate(accept_value),
            input_price_zone=InputPriceZone(pricezone_value),
        )
    @classmethod
    def from_array(cls, arr: List[int]) -> "State":
        """Create a State from an array"""
        if len(arr) != 5:
            raise ValueError(f"Expected 5 elements, got {len(arr)}")
        return cls(
            revenue_range=RevenueRange(arr[0]),
            distribution=DistributionStructure(arr[1]),
            partner_type=PartnerType(arr[2]),
            acceptance_rate=AcceptanceRate(arr[3]),
            input_price_zone=InputPriceZone(arr[4]),
        )
    def __str__(self) -> str:
        return (
            f"State("
            f"revenue={self.revenue_range.description}, "
            f"distribution={self.distribution.description}, "
            f"partner={self.partner_type.description}, "
            f"acceptance={self.acceptance_rate.description}, "
            f"price_zone={self.input_price_zone.description})"
        )
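
`State.to_index` and `State.from_index` implement a mixed-radix encoding with radices (3, 3, 3, 3, 2). The sketch below restates that arithmetic on plain tuples so the round trip can be checked in isolation; `RADICES`, `to_index`, and `from_index` here are stand-alone names assumed for illustration, not the repository's methods.

```python
import math
from typing import Sequence, Tuple

# Cardinality of each state component, in the order used by State.to_array().
RADICES: Tuple[int, ...] = (3, 3, 3, 3, 2)
STATE_COUNT = math.prod(RADICES)  # 162


def to_index(values: Sequence[int]) -> int:
    """Flatten component values into one index (Horner's scheme)."""
    if len(values) != len(RADICES):
        raise ValueError(f"Expected {len(RADICES)} elements, got {len(values)}")
    index = 0
    for value, radix in zip(values, RADICES):
        if not 0 <= value < radix:
            raise ValueError(f"value {value} out of range [0, {radix})")
        index = index * radix + value
    return index


def from_index(index: int) -> Tuple[int, ...]:
    """Recover component values by peeling off digits from the last component."""
    if not 0 <= index < STATE_COUNT:
        raise ValueError(f"index {index} out of range [0, {STATE_COUNT})")
    digits = []
    for radix in reversed(RADICES):
        index, digit = divmod(index, radix)
        digits.append(digit)
    return tuple(reversed(digits))
```

Expanding the Horner loop gives the same multipliers as the source (54, 18, 6, 2, 1), so `(1, 0, 2, 1, 1)` maps to 54 + 0 + 12 + 2 + 1 = 69.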
--- a/src/negotiation_agent/Q_Table/domain/model/visit_table.py
+++ b/src/negotiation_agent/Q_Table/domain/model/visit_table.py
@ -0,0 +1,148 @@
 """Visit (N) table for UCB-based policies."""
 from __future__ import annotations
 from dataclasses import dataclass
 from typing import Dict
 import numpy as np
@dataclass
 class VisitTable:
    """Tracks state-action visit counts for UCB selection."""
    state_space_size: int
    action_space_size: int
    def __post_init__(self) -> None:
        """
        Args:
            state_space_size: size of the state space
            action_space_size: size of the action space
        """
        self._counts = np.zeros(
            (self.state_space_size, self.action_space_size), dtype=np.int64
        )
    def increment(self, state_index: int, action_id: int, count: int = 1) -> None:
        """
        Increase the visit count of a specific (state, action) pair
        Args:
            state_index: State index
            action_id: Action ID
            count: amount to add (default 1)
        """
        if state_index < 0 or state_index >= self.state_space_size:
            raise ValueError(
                f"state_index {state_index} out of range [0, {self.state_space_size})"
            )
        if action_id < 0 or action_id >= self.action_space_size:
            raise ValueError(
                f"action_id {action_id} out of range [0, {self.action_space_size})"
            )
        if count < 0:
            raise ValueError("count must be non-negative")
        self._counts[state_index, action_id] += count
    def get_action_count(self, state_index: int, action_id: int) -> int:
        """
        Look up the visit count of a specific (state, action) pair
        Args:
            state_index: State index
            action_id: Action ID
        Returns:
            visit count
        """
        self._validate_indices(state_index, action_id)
        return int(self._counts[state_index, action_id])
    def get_state_counts(self, state_index: int) -> np.ndarray:
        """
        Look up the visit counts of every action for a specific state
        Args:
            state_index: State index
        Returns:
            visit-count array of shape (action_space_size,)
        """
        if state_index < 0 or state_index >= self.state_space_size:
            raise ValueError(
                f"state_index {state_index} out of range [0, {self.state_space_size})"
            )
        return self._counts[state_index, :].copy()
    def total_visits(self, state_index: int) -> int:
        """
        Total visit count of a specific state (sum over all actions)
        Args:
            state_index: State index
        Returns:
            total visit count
        """
        if state_index < 0 or state_index >= self.state_space_size:
            raise ValueError(
                f"state_index {state_index} out of range [0, {self.state_space_size})"
            )
        return int(self._counts[state_index, :].sum())
    def reset_state(self, state_index: int) -> None:
        """
        Reset the visit record of a specific state
        Args:
            state_index: State index
        """
        if state_index < 0 or state_index >= self.state_space_size:
            raise ValueError(
                f"state_index {state_index} out of range [0, {self.state_space_size})"
            )
        self._counts[state_index, :] = 0
    def to_dict(self) -> Dict:
        """
        Serialize the VisitTable to a dictionary
        Returns:
            serialized VisitTable data
        """
        return {
            "state_space_size": self.state_space_size,
            "action_space_size": self.action_space_size,
            "counts": self._counts.tolist(),
        }
    @classmethod
    def from_dict(cls, data: Dict) -> "VisitTable":
        """
        Restore a VisitTable from a dictionary
        Args:
            data: serialized VisitTable data
        Returns:
            restored VisitTable instance
        """
        table = cls(
            state_space_size=data["state_space_size"],
            action_space_size=data["action_space_size"],
        )
        table._counts = np.array(data["counts"], dtype=np.int64)
        return table
    def _validate_indices(self, state_index: int, action_id: int) -> None:
        """Validate the indices"""
        if state_index < 0 or state_index >= self.state_space_size:
            raise ValueError(
                f"state_index {state_index} out of range [0, {self.state_space_size})"
            )
        if action_id < 0 or action_id >= self.action_space_size:
            raise ValueError(
                f"action_id {action_id} out of range [0, {self.action_space_size})"
            )
--- a/src/negotiation_agent/Q_Table/domain/repository/experience_repository.py
+++ b/src/negotiation_agent/Q_Table/domain/repository/experience_repository.py
@ -0,0 +1,200 @@
 """
 Experience Repository
 Saves/loads Experience data in JSONL format
 """
 import json
 from pathlib import Path
 from typing import List, Optional
 from datetime import datetime
 from ..model.experience import Experience
 class ExperienceRepository:
    """
    Repository that saves/loads Experience data on the file system
    Uses the JSONL (JSON Lines) format:
    - each line is one JSON object (an Experience)
    - can be read and written in a streaming fashion
    - well suited to large data sets
    """
    def __init__(self, data_dir: Optional[str] = None):
        """
        Args:
            data_dir: directory in which to store Experience data
                     (default: Q_Table/data/experiences/)
        """
        if data_dir is None:
            current_dir = Path(__file__).parent.parent.parent
            data_dir = current_dir / "data" / "experiences"
        self.data_dir = Path(data_dir)
        self.data_dir.mkdir(parents=True, exist_ok=True)
    def save(self, experience: Experience, filename: str = "experiences.jsonl"):
        """
        Append a single Experience to the file
        Args:
            experience: Experience to save
            filename: file name (default: experiences.jsonl)
        """
        file_path = self.data_dir / filename
        with open(file_path, "a", encoding="utf-8") as f:
            json.dump(experience.to_dict(), f, ensure_ascii=False)
            f.write("\n")
    def save_batch(
        self, experiences: List[Experience], filename: str = "experiences.jsonl"
    ):
        """
        Save several Experiences as a batch
        Args:
            experiences: list of Experiences
            filename: file name
        """
        file_path = self.data_dir / filename
        with open(file_path, "a", encoding="utf-8") as f:
            for exp in experiences:
                json.dump(exp.to_dict(), f, ensure_ascii=False)
                f.write("\n")
    def load_all(self, filename: str = "experiences.jsonl") -> List[Experience]:
        """
        Load every Experience from the file
        Args:
            filename: file name
        Returns:
            list of Experiences
        """
        file_path = self.data_dir / filename
        if not file_path.exists():
            return []
        experiences = []
        with open(file_path, "r", encoding="utf-8") as f:
            for line in f:
                if line.strip():
                    data = json.loads(line)
                    experiences.append(Experience.from_dict(data))
        return experiences
    def load_by_episode(
        self, episode_id: str, filename: str = "experiences.jsonl"
    ) -> List[Experience]:
        """
        Load only the Experiences of a specific episode
        Args:
            episode_id: episode ID
            filename: file name
        Returns:
            list of Experiences belonging to that episode
        """
        all_experiences = self.load_all(filename)
        return [exp for exp in all_experiences if exp.episode_id == episode_id]
    def count(self, filename: str = "experiences.jsonl") -> int:
        """
        Number of stored Experiences
        Args:
            filename: file name
        Returns:
            number of Experiences
        """
        file_path = self.data_dir / filename
        if not file_path.exists():
            return 0
        with open(file_path, "r", encoding="utf-8") as f:
            return sum(1 for line in f if line.strip())
    def clear(self, filename: str = "experiences.jsonl"):
        """
        Delete the stored contents (the file itself is removed)
        Args:
            filename: file name
        """
        file_path = self.data_dir / filename
        if file_path.exists():
            file_path.unlink()
    def create_new_file(self, prefix: str = "exp") -> str:
        """
        Create a new file whose name includes a timestamp
        Args:
            prefix: file name prefix
        Returns:
            name of the created file
        Example:
            >>> repo.create_new_file("train")
            'train_20251029_163000.jsonl'
        """
        timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
        filename = f"{prefix}_{timestamp}.jsonl"
        file_path = self.data_dir / filename
        file_path.touch()
        return filename
    def list_files(self) -> List[str]:
        """
        List every JSONL file in the data directory
        Returns:
            list of file names
        """
        return [f.name for f in self.data_dir.glob("*.jsonl")]
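    def get_file_info(self, filename: str) -> Optional[dict]:
        """
        Look up file information
        Args:
            filename: file name
        Returns:
            None if the file does not exist, otherwise:
            {
                'filename': 'experiences.jsonl',
                'path': '/path/to/file',
                'size': 1024,  # bytes
                'count': 100,   # number of experiences
                'created': '2025-10-29 16:30:00',  # from st_ctime (metadata-change time on Unix)
                'modified': '2025-10-29 16:30:00'
            }
        """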
        file_path = self.data_dir / filename
        if not file_path.exists():
            return None
        stat = file_path.stat()
        return {
            "filename": filename,
            "path": str(file_path),
            "size": stat.st_size,
            "count": self.count(filename),
            "created": datetime.fromtimestamp(stat.st_ctime).strftime(
                "%Y-%m-%d %H:%M:%S"
            ),
            "modified": datetime.fromtimestamp(stat.st_mtime).strftime(
                "%Y-%m-%d %H:%M:%S"
            ),
        }
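
The append-then-stream pattern the repository relies on can be exercised on its own with the standard library. The sketch below is an assumption-laden stand-in: `append_jsonl` and `read_jsonl` are hypothetical helpers that operate on plain dictionaries rather than `Experience` objects, and the records are made-up sample data.

```python
import json
import tempfile
from pathlib import Path
from typing import Dict, List


def append_jsonl(path: Path, records: List[Dict]) -> None:
    """Append each record as one JSON object per line."""
    with open(path, "a", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record, ensure_ascii=False) + "\n")


def read_jsonl(path: Path) -> List[Dict]:
    """Read every non-blank line back into a dictionary."""
    if not path.exists():
        return []
    with open(path, "r", encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "experiences.jsonl"
    # Two separate appends, as two calls to save() would produce.
    append_jsonl(demo_path, [{"state": [0, 1, 2, 1, 0], "action_id": 5,
                              "reward": 1.0, "next_state": None, "done": True}])
    append_jsonl(demo_path, [{"state": [2, 2, 1, 0, 1], "action_id": 3,
                              "reward": -0.5, "next_state": None, "done": True}])
    loaded = read_jsonl(demo_path)
```

Because every record is a complete line, an interrupted write can corrupt at most the last line, and earlier records remain readable.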
--- a/src/negotiation_agent/Q_Table/domain/service/reward_calculator.py
+++ b/src/negotiation_agent/Q_Table/domain/service/reward_calculator.py
@ -0,0 +1,166 @@
 """Reward function aligned with the Q-Table design specification (Version 3.0)."""
 from __future__ import annotations
 from dataclasses import dataclass
 from enum import Enum
 from typing import Optional
 from ..model.state import (
    AcceptanceRate,
    DistributionStructure,
    InputPriceZone,
    PartnerType,
    RevenueRange,
 )
 class NegotiationOutcome(Enum):
    ONGOING = "ongoing"
    SUCCESS = "success"
    FAILURE = "failure"
@dataclass
 class RewardConfig:
    """Reward function configuration"""
    beta: float = 0.2
    success_reward: float = 1.0
    ongoing_reward: float = 0.0
    failure_penalty: float = -0.5
    penalty_lambda: float = 0.02
    # Dynamic weight coefficients (Version 3.0)
    w1: float = 0.2  # revenue range
    w2: float = 0.25  # distribution structure
    w3: float = 0.2  # partner type
    w4: float = 0.25  # price acceptance rate
    w5: float = 0.1  # input price zone
    min_weight: float = 0.2
    max_weight: float = 0.8
@dataclass
 class RewardBreakdown:
    price_reward: float
    end_reward: float
    penalty: float
    weight: float
    total: float
 def calculate_reward(
    *,
    revenue_range: RevenueRange,
    distribution: DistributionStructure,
    partner_type: PartnerType,
    acceptance_rate: AcceptanceRate,
    input_price_zone: InputPriceZone,
    current_price: float,
    anchor_price: float,
    target_price: float,
    round_number: int,
    outcome: NegotiationOutcome,
    config: Optional[RewardConfig] = None,
 ) -> RewardBreakdown:
    """
    Compute the reward (Version 3.0)
    R = W × R_price + (1-W) × R_end - λ × t
    W = clip(W_raw, min_weight, max_weight)
    W_raw = w1×S_amount + w2×S_dist + w3×S_partner + w4×S_accept + w5×S_pricezone
    """
    cfg = config or RewardConfig()
    price_reward = _calculate_price_reward(
        current_price=current_price,
        anchor_price=anchor_price,
        target_price=target_price,
        beta=cfg.beta,
    )
    end_reward = _calculate_end_reward(outcome, cfg)
    weight = _calculate_weight(
        revenue_range=revenue_range,
        distribution=distribution,
        partner_type=partner_type,
        acceptance_rate=acceptance_rate,
        input_price_zone=input_price_zone,
        config=cfg,
    )
    penalty = _calculate_penalty(round_number, cfg)
    total = weight * price_reward + (1.0 - weight) * end_reward - penalty
    return RewardBreakdown(
        price_reward=price_reward,
        end_reward=end_reward,
        penalty=penalty,
        weight=weight,
        total=total,
    )
 def _calculate_price_reward(
    *,
    current_price: float,
    anchor_price: float,
    target_price: float,
    beta: float,
 ) -> float:
    if anchor_price <= 0 or target_price <= 0:
        raise ValueError("anchor_price and target_price must be positive")
    if anchor_price > target_price:
        raise ValueError("anchor_price must not exceed target_price")
    if current_price < anchor_price:
        improvement = (anchor_price - current_price) / anchor_price
        return 1.0 + beta * improvement
    if anchor_price <= current_price <= target_price:
        span = target_price - anchor_price
        if span <= 0:
            return 0.0
        return (target_price - current_price) / span
    return 0.0
 def _calculate_end_reward(outcome: NegotiationOutcome, cfg: RewardConfig) -> float:
    if outcome is NegotiationOutcome.SUCCESS:
        return cfg.success_reward
    if outcome is NegotiationOutcome.FAILURE:
        return cfg.failure_penalty
    return cfg.ongoing_reward
 def _calculate_weight(
    *,
    revenue_range: RevenueRange,
    distribution: DistributionStructure,
    partner_type: PartnerType,
    acceptance_rate: AcceptanceRate,
    input_price_zone: InputPriceZone,
    config: RewardConfig,
 ) -> float:
    """
    Compute the dynamic weight (Version 3.0)
    W_raw = w1×S_amount + w2×S_dist + w3×S_partner + w4×S_accept + w5×S_pricezone
    W = clip(W_raw, min_weight, max_weight)
    """
    w_raw = (
        config.w1 * revenue_range.weight
        + config.w2 * distribution.weight
        + config.w3 * partner_type.weight
        + config.w4 * acceptance_rate.weight
        + config.w5 * input_price_zone.weight
    )
    return max(config.min_weight, min(config.max_weight, w_raw))
 def _calculate_penalty(round_number: int, cfg: RewardConfig) -> float:
    if round_number < 0:
        raise ValueError("round_number must be non-negative")
    return cfg.penalty_lambda * float(round_number)
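
A worked instance of R = W × R_price + (1-W) × R_end - λ × t may help when checking the defaults in `RewardConfig`. The scenario below is hypothetical (HIGH revenue, WHOLESALER, SINGLE partner, MID acceptance, PZ1, a successful deal at round 3 with prices 100/150/200), and the helper names `clip` and `price_reward` are assumptions for this sketch; it omits the input validation that `_calculate_price_reward` performs.

```python
def clip(value: float, low: float, high: float) -> float:
    """Clamp value into [low, high]."""
    return max(low, min(high, value))


def price_reward(current: float, anchor: float, target: float, beta: float = 0.2) -> float:
    """Price term: bonus below the anchor, linear between anchor and target, 0 above."""
    if current < anchor:
        return 1.0 + beta * (anchor - current) / anchor
    if current <= target:
        span = target - anchor
        return (target - current) / span if span > 0 else 0.0
    return 0.0


# Default coefficients w1..w5 and the enum weights for the hypothetical scenario.
coefficients = (0.2, 0.25, 0.2, 0.25, 0.1)
scores = (1.0, 0.5, 0.5, 0.6, 1.0)  # HIGH, WHOLESALER, SINGLE, MID, PZ1

w_raw = sum(w * s for w, s in zip(coefficients, scores))   # 0.675
weight = clip(w_raw, 0.2, 0.8)                             # inside the clip range
r_price = price_reward(current=150.0, anchor=100.0, target=200.0)  # 0.5
r_end = 1.0                                                # success_reward
penalty = 0.02 * 3                                         # penalty_lambda * round_number
total = weight * r_price + (1.0 - weight) * r_end - penalty
```

Here the total is about 0.6025: 0.3375 from the price term, 0.325 from the outcome term, minus a 0.06 round penalty.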
--- a/src/negotiation_agent/Q_Table/domain/service/state_calculator.py
+++ b/src/negotiation_agent/Q_Table/domain/service/state_calculator.py
@ -0,0 +1,87 @@
 """Utility helpers to build Q-Table states from negotiation snapshots (Version 3.0)."""
 from __future__ import annotations
 from dataclasses import dataclass
 from typing import Optional
 from ..model.state import (
    AcceptanceRate,
    DistributionStructure,
    InputPriceZone,
    PartnerType,
    RevenueRange,
    State,
 )
@dataclass(frozen=True)
 class NegotiationSnapshot:
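    """Negotiation snapshot - actual business data"""
    # Revenue (in units of 10,000 KRW)
    revenue_amount: float
    # Distribution structure code ("M": manufacturer, "W": sole distributor, "R": retailer)
    distribution_code: str
    # Number of partners
    partner_count: int
    # Price information
    anchor_price: float
    target_price: float
    input_price: float
    # Price acceptance ratio (0~1, computed or supplied directly)
    acceptance_ratio: Optional[float] = None
    # Initial price (used to compute the acceptance ratio when acceptance_ratio is missing)
    initial_price: Optional[float] = None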
 def build_state(snapshot: NegotiationSnapshot) -> State:
    """
    Build a Q-Learning State from a negotiation snapshot (Version 3.0)
    Args:
        snapshot: negotiation snapshot
    Returns:
        State (one of 162 discrete states)
    """
    # 1. Revenue range
    revenue_range = RevenueRange.from_amount(snapshot.revenue_amount)
    # 2. Distribution structure
    distribution = DistributionStructure.from_code(snapshot.distribution_code)
    # 3. Partner type
    partner_type = PartnerType.from_count(snapshot.partner_count)
    # 4. Price acceptance rate bucket
    if snapshot.acceptance_ratio is not None:
        acceptance_rate = AcceptanceRate.from_ratio(snapshot.acceptance_ratio)
    elif snapshot.initial_price is not None:
        # Compute the acceptance ratio from the initial price: (initial - input) / initial
        if snapshot.initial_price <= 0:
            raise ValueError("initial_price must be positive")
        ratio = (snapshot.initial_price - snapshot.input_price) / snapshot.initial_price
        acceptance_rate = AcceptanceRate.from_ratio(ratio)
    else:
        raise ValueError("Either acceptance_ratio or initial_price must be provided")
    # 5. Input price zone
    input_price_zone = InputPriceZone.from_prices(
        input_price=snapshot.input_price,
        anchor_price=snapshot.anchor_price,
        target_price=snapshot.target_price,
    )
    return State(
        revenue_range=revenue_range,
        distribution=distribution,
        partner_type=partner_type,
        acceptance_rate=acceptance_rate,
        input_price_zone=input_price_zone,
    )
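The acceptance-ratio branch in `build_state` can be exercised on its own. The sketch below isolates that logic in a hypothetical helper, `resolve_acceptance_ratio`, which is not part of the repository; it mirrors the precedence shown above (an explicit ratio wins, otherwise the ratio is derived from the initial price).

```python
from typing import Optional


def resolve_acceptance_ratio(
    input_price: float,
    explicit_ratio: Optional[float] = None,
    initial_price: Optional[float] = None,
) -> float:
    """Return the explicit ratio if given, else (initial - input) / initial."""
    if explicit_ratio is not None:
        return explicit_ratio
    if initial_price is None:
        raise ValueError("Either acceptance_ratio or initial_price must be provided")
    if initial_price <= 0:
        raise ValueError("initial_price must be positive")
    return (initial_price - input_price) / initial_price
```

For example, an initial price of 100 and an input price of 80 give a ratio of 0.2, which `AcceptanceRate.from_ratio` would then place into a bucket.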
--- a/src/negotiation_agent/Q_Table/infra/repository/model_repository.py
+++ b/src/negotiation_agent/Q_Table/infra/repository/model_repository.py
@ -0,0 +1,46 @@
 import json
 from pathlib import Path
 from typing import Optional, Tuple
 from src.negotiation_agent.Q_Table.domain.model.q_table import QTable
 from src.negotiation_agent.Q_Table.domain.model.visit_table import VisitTable
 class ModelRepository:
    """Simple repository to save/load Q-Table and VisitTable models."""
    def __init__(self, model_dir: Optional[str] = None):
        if model_dir is None:
            # Current: src/negotiation_agent/Q_Table/infra/repository/model_repository.py
            # Target Data: src/negotiation_agent/Q_Table/data/model
            # Path: ../../data/model (relative to this file's directory)
            current_dir = Path(__file__).parent
            model_dir = current_dir.parent.parent / "data" / "model"
        self.model_dir = Path(model_dir)
        self.model_dir.mkdir(parents=True, exist_ok=True)
        self.q_table_path = self.model_dir / "q_table.json"
        self.visit_table_path = self.model_dir / "visit_table.json"
    def save(self, q_table: QTable, visit_table: VisitTable):
        with open(self.q_table_path, "w", encoding="utf-8") as f:
            json.dump(q_table.to_dict(), f, indent=2)
        with open(self.visit_table_path, "w", encoding="utf-8") as f:
            json.dump(visit_table.to_dict(), f, indent=2)
        print(f"Models saved to {self.model_dir}")
    def load(self) -> Tuple[Optional[QTable], Optional[VisitTable]]:
        q_table = None
        visit_table = None
        if self.q_table_path.exists():
            with open(self.q_table_path, "r", encoding="utf-8") as f:
                q_table = QTable.from_dict(json.load(f))
        if self.visit_table_path.exists():
            with open(self.visit_table_path, "r", encoding="utf-8") as f:
                visit_table = VisitTable.from_dict(json.load(f))
        return q_table, visit_table
--- a/src/negotiation_agent/Q_Table/main.pdf
+++ b/src/negotiation_agent/Q_Table/main.pdf
--- a/src/negotiation_agent/Q_Table/usecase/__init__.py
+++ b/src/negotiation_agent/Q_Table/usecase/__init__.py
--- a/src/negotiation_agent/Q_Table/usecase/collect_experience_usecase.py
+++ b/src/negotiation_agent/Q_Table/usecase/collect_experience_usecase.py
@ -0,0 +1,143 @@
 """
 CollectExperienceUsecase
 Collects Experience data during a negotiation and stores it
 """
 from typing import Optional
 from datetime import datetime
 from ..domain.model.state import State
 from ..domain.model.experience import Experience
 from ..domain.repository.experience_repository import ExperienceRepository
 class CollectExperienceUsecase:
    """
    Collects and stores the Experiences produced during a negotiation.
    Responsible for gathering data for offline training.
    """
    def __init__(
        self,
        experience_repository: ExperienceRepository,
        filename: Optional[str] = None,
    ):
        """
        Args:
            experience_repository: Experience repository
            filename: file name to save to (uses the default file if None)
        """
        self.experience_repository = experience_repository
        self.filename = filename or "experiences.jsonl"
        self.current_episode_id: Optional[str] = None
        self.current_step: int = 0
    def start_episode(self, episode_id: Optional[str] = None):
        """
        Start a new episode
        Args:
            episode_id: episode ID (auto-generated if None)
        """
        if episode_id is None:
            timestamp = datetime.now().strftime("%Y%m%d_%H%M%S_%f")
            episode_id = f"ep_{timestamp}"
        self.current_episode_id = episode_id
        self.current_step = 0
    def collect(
        self,
        state: State,
        action_id: int,
        reward: float,
        next_state: Optional[State],
        done: bool,
    ) -> Experience:
        """
        Collect and store a single Experience
        Args:
            state: current state
            action_id: ID of the selected action
            reward: reward received
            next_state: next state (None at termination)
            done: whether the episode has ended
        Returns:
            The collected Experience
        """
        if self.current_episode_id is None:
            self.start_episode()
        # Build the Experience
        experience = Experience(
            state=state,
            action_id=action_id,
            reward=reward,
            next_state=next_state,
            done=done,
            episode_id=self.current_episode_id,
            step=self.current_step,
            timestamp=datetime.now().isoformat(),
        )
        # Save
        self.experience_repository.save(experience, self.filename)
        # Advance the step counter
        self.current_step += 1
        return experience
    def end_episode(self):
        """End the current episode."""
        self.current_episode_id = None
        self.current_step = 0
    def get_episode_count(self) -> int:
        """Number of collected episodes (approximate)."""
        # Count the unique episode_id values in the file
        experiences = self.experience_repository.load_all(self.filename)
        episode_ids = set(exp.episode_id for exp in experiences if exp.episode_id)
        return len(episode_ids)
    def get_total_count(self) -> int:
        """Total number of collected Experiences."""
        return self.experience_repository.count(self.filename)
    def get_collection_info(self) -> dict:
        """
        Return collection statistics
        Returns:
            {
                'filename': 'experiences.jsonl',
                'total_experiences': 1000,
                'episodes': 50,
                'current_episode': 'ep_001' or None,
                'current_step': 5,
                'file_info': <value of experience_repository.get_file_info(filename)>
            }
        """
        return {
            "filename": self.filename,
            "total_experiences": self.get_total_count(),
            "episodes": self.get_episode_count(),
            "current_episode": self.current_episode_id,
            "current_step": self.current_step,
            "file_info": self.experience_repository.get_file_info(self.filename),
        }
    def create_new_collection_file(self, prefix: str = "exp") -> str:
        """
        Create a new collection file and switch to it
        Args:
            prefix: file name prefix
        Returns:
            Name of the created file
        """
        new_filename = self.experience_repository.create_new_file(prefix)
        self.filename = new_filename
        return new_filename
--- a/src/negotiation_agent/Q_Table/usecase/evaluate_agent_usecase.py
+++ b/src/negotiation_agent/Q_Table/usecase/evaluate_agent_usecase.py
@ -0,0 +1,72 @@
 """
 EvaluateAgentUsecase
 Usecase that evaluates the performance of a trained Q-Table
 """
 from typing import Optional
 from ..domain.model.q_table import QTable
 from ..domain.repository.experience_repository import ExperienceRepository
 class EvaluateAgentUsecase:
    """
    Evaluates the current agent (Q-Table) against stored Experience data
    """
    def __init__(
        self,
        q_table: QTable,
        experience_repository: ExperienceRepository,
    ):
        """
        Args:
            q_table: Q-Table to evaluate
            experience_repository: Experience repository
        """
        self.q_table = q_table
        self.experience_repository = experience_repository
    def execute(
        self, filename: str = "experiences.jsonl", sample_size: Optional[int] = None
    ) -> dict:
        """
        Evaluate the current Q-Table
        Args:
            filename: Experience file used for evaluation
            sample_size: number of experiences to sample (all if None)
        Returns:
            {
                'avg_q_value': 0.5,
                'avg_reward': 1.0,
                'total_samples': 100
            }
        """
        experiences = self.experience_repository.load_all(filename)
        if not experiences:
            return {"avg_q_value": 0.0, "avg_reward": 0.0, "total_samples": 0}
        # Sampling
        if sample_size and sample_size < len(experiences):
            import random
            experiences = random.sample(experiences, sample_size)
        total_q_value = 0.0
        total_reward = 0.0
        for exp in experiences:
            state_index = exp.state.to_index()
            q_value = self.q_table.get_q_value(state_index, exp.action_id)
            total_q_value += q_value
            total_reward += exp.reward
        n = len(experiences)
        return {
            "avg_q_value": total_q_value / n,
            "avg_reward": total_reward / n,
            "total_samples": n,
        }
--- a/src/negotiation_agent/Q_Table/usecase/get_best_action_usecase.py
+++ b/src/negotiation_agent/Q_Table/usecase/get_best_action_usecase.py
@ -0,0 +1,179 @@
 """
 GetBestActionUsecase
 Usecase that takes a State and selects the best negotiation card
 """
 from typing import Dict, Optional
 from ..domain.model.state import State
 from ..domain.model.q_table import QTable
 from ..domain.agents.policy import UCBPolicy
 from ...integration.action_card_mapper import ActionCardMapper
 class GetBestActionUsecase:
    """
    Negotiation inference usecase: State → action_id → card_id
    Q_Table returns only an abstract action_id,
    which ActionCardMapper converts into a concrete card_id.
    """
    def __init__(
        self,
        q_table: QTable,
        policy: UCBPolicy,
        action_card_mapper: ActionCardMapper,
    ):
        """
        Args:
            q_table: trained Q-Table
            policy: action selection policy (UCB)
            action_card_mapper: action_id ↔ card_id mapping
        """
        self.q_table = q_table
        self.policy = policy
        self.action_card_mapper = action_card_mapper
    def execute(self, state: State, use_policy: bool = True) -> Dict:
        """
        Select the best negotiation card for the current state
        Args:
            state: current negotiation state
            use_policy: if True, use the policy (UCB + duplicate prevention);
                       if False, use only the maximum Q-value
        Returns:
            {
                'action_id': 5,
                'card_id': 'no_5',
                'q_value': 0.85,
                'state_index': 12,
                'all_q_values': [0.1, 0.2, ..., 0.85, ...]  # optional
            }
            If no action is available:
            {
                'action_id': None,
                'card_id': None,
                'q_value': None,
                'state_index': 12,
                'message': 'No available actions (all actions used in this episode)'
            }
        """
        # 1. Convert State → state_index
        state_index = state.to_index()
        # 2. Look up the Q-values
        q_values = self.q_table.get_q_values_for_state(state_index)
        # 3. Select an action
        if use_policy:
            action_id = self.policy.select_action(state_index, q_values)
        else:
            # Pure greedy choice: maximum Q-value
            action_id = self.q_table.get_best_action(state_index)
        if action_id is None:
            return {
                "action_id": None,
                "card_id": None,
                "q_value": None,
                "state_index": state_index,
                "message": "No available actions (all actions used in this episode)",
            }
        # 4. Map action_id → card_id
        card_id = self.action_card_mapper.get_card_id(action_id)
        if card_id is None:
            return {
                "action_id": action_id,
                "card_id": None,
                "q_value": float(q_values[action_id]),
                "state_index": state_index,
                "message": f"action_id {action_id} has no card mapping",
            }
        # 5. Extract the Q-value
        q_value = self.q_table.get_q_value(state_index, action_id)
        return {
            "action_id": action_id,
            "card_id": card_id,
            "q_value": float(q_value),
            "state_index": state_index,
            "all_q_values": q_values.tolist(),  # for debugging
        }
    def get_top_k_actions(self, state: State, k: int = 3) -> list:
        """
        Recommend the top k actions
        Args:
            state: current negotiation state
            k: number of actions to recommend
        Returns:
            [
                {'action_id': 5, 'card_id': 'no_5', 'q_value': 0.85},
                {'action_id': 3, 'card_id': 'no_3', 'q_value': 0.72},
                {'action_id': 1, 'card_id': 'no_1', 'q_value': 0.68}
            ]
        """
        state_index = state.to_index()
        q_values = self.q_table.get_q_values_for_state(state_index)
        # Sort by Q-value in descending order
        sorted_actions = sorted(enumerate(q_values), key=lambda x: x[1], reverse=True)
        # Take the top k
        top_k = sorted_actions[:k]
        results = []
        for action_id, q_value in top_k:
            card_id = self.action_card_mapper.get_card_id(action_id)
            results.append(
                {"action_id": action_id, "card_id": card_id, "q_value": float(q_value)}
            )
        return results
    def reset_episode(self):
        """
        Reset the policy when an episode ends
        (call before starting the next negotiation session)
        """
        self.policy.reset_episode()
    def get_available_actions(self) -> list:
        """
        Actions not yet used in the current episode
        Returns:
            [0, 2, 4, 5, ...] (unused action_ids)
        """
        import numpy as np
        mask = self.policy.get_action_mask()
        available_action_ids = np.where(mask > 0)[0].tolist()
        return available_action_ids
    def get_available_cards(self) -> list:
        """
        Cards not yet used in the current episode
        Returns:
            [
                {'action_id': 0, 'card_id': 'no_0'},
                {'action_id': 2, 'card_id': 'no_2'},
                ...
            ]
        """
        available_action_ids = self.get_available_actions()
        results = []
        for action_id in available_action_ids:
            card_id = self.action_card_mapper.get_card_id(action_id)
            results.append({"action_id": action_id, "card_id": card_id})
        return results
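The usecase above delegates selection to `UCBPolicy`, whose implementation is not part of this section. As a rough illustration of what "UCB + duplicate prevention" could look like, here is a standalone UCB1 sketch with a used-action filter. The function name, its parameters, and the exploration constant are assumptions for illustration; the repository's policy may differ.

```python
import math
from typing import Optional, Sequence, Set


def select_ucb_action(
    q_values: Sequence[float],
    visit_counts: Sequence[int],
    used_actions: Set[int],
    exploration_c: float = 1.0,
) -> Optional[int]:
    """Return the unused action with the highest UCB1 score, or None if all are used."""
    total_visits = sum(visit_counts)
    best_action: Optional[int] = None
    best_score = -math.inf
    for action_id, (q, n) in enumerate(zip(q_values, visit_counts)):
        if action_id in used_actions:
            continue  # duplicate prevention: skip actions already played this episode
        if n == 0:
            score = math.inf  # untried actions are explored first
        else:
            score = q + exploration_c * math.sqrt(math.log(total_visits) / n)
        if score > best_score:
            best_action, best_score = action_id, score
    return best_action
```

Returning `None` when every action has been used matches the "No available actions" branch handled in `execute`.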
--- a/src/negotiation_agent/Q_Table/usecase/train_offline_usecase.py
+++ b/src/negotiation_agent/Q_Table/usecase/train_offline_usecase.py
@ -0,0 +1,203 @@
 """
 TrainOfflineUsecase
 Trains the Q-Table offline from stored Experience data
 """
 from typing import List, Optional
 from ..domain.model.q_table import QTable
 from ..domain.model.experience import Experience
 from ..domain.model.visit_table import VisitTable
 from ..domain.repository.experience_repository import ExperienceRepository
 class TrainOfflineUsecase:
    """
    Trains the Q-Table offline using collected Experience data
    Batch training procedure:
    - Load the stored Experiences
    - Update the Q-Table with the Q-Learning algorithm
    - Optionally repeat for several epochs
    """
    def __init__(
        self,
        q_table: QTable,
        experience_repository: ExperienceRepository,
        visit_table: Optional[VisitTable] = None,
    ):
        """
        Args:
            q_table: Q-Table to train
            experience_repository: Experience repository
            visit_table: N-Table that tracks visit counts alongside training (optional)
        """
        self.q_table = q_table
        self.experience_repository = experience_repository
        self.visit_table = visit_table
    def train(
        self,
        filename: str = "experiences.jsonl",
        epochs: int = 1,
        batch_size: Optional[int] = None,
    ) -> dict:
        """
        Train the Q-Table from stored Experiences
        Args:
            filename: Experience file name
            epochs: number of training passes
            batch_size: batch size (uses the whole dataset if None)
        Returns:
            {
                'total_experiences': 1000,
                'epochs': 10,
                'updates': 10000,
                'avg_loss': 0.05
            }
        """
        # Load the Experiences
        experiences = self.experience_repository.load_all(filename)
        if not experiences:
            return {
                "total_experiences": 0,
                "epochs": 0,
                "updates": 0,
                "avg_loss": 0.0,
                "message": "No experiences found",
            }
        total_updates = 0
        total_loss = 0.0
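        # Repeat for each epoch
        for epoch in range(epochs):
            epoch_loss = 0.0
            # Process in batches
            if batch_size:
                for i in range(0, len(experiences), batch_size):
                    batch = experiences[i : i + batch_size]
                    loss = self._train_batch(batch)
                    # _train_batch returns a per-experience mean, so weight it by
                    # the batch length before dividing by total_updates below
                    epoch_loss += loss * len(batch)
                    total_updates += len(batch)
            else:
                # Whole dataset in one pass
                loss = self._train_batch(experiences)
                epoch_loss += loss * len(experiences)
                total_updates += len(experiences)
            total_loss += epoch_loss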
        avg_loss = total_loss / total_updates if total_updates > 0 else 0.0
        return {
            "total_experiences": len(experiences),
            "epochs": epochs,
            "updates": total_updates,
            "avg_loss": avg_loss,
        }
    def _train_batch(self, experiences: List[Experience]) -> float:
        """
        Update the Q-Table with a batch of Experiences
        Args:
            experiences: list of Experiences
        Returns:
            Mean loss (mean absolute Q-value change per experience)
        """
        total_loss = 0.0
        for exp in experiences:
            # State → state_index
            state_index = exp.state.to_index()
            # Q-value before the update
            old_q = self.q_table.get_q_value(state_index, exp.action_id)
            # Update the Q-Table
            if exp.next_state is not None:
                next_state_index = exp.next_state.to_index()
                self.q_table.update(
                    state_index=state_index,
                    action_id=exp.action_id,
                    reward=exp.reward,
                    next_state_index=next_state_index,
                    done=exp.done,
                )
            else:
                # Terminal state
                self.q_table.update(
                    state_index=state_index,
                    action_id=exp.action_id,
                    reward=exp.reward,
                    next_state_index=None,
                    done=True,
                )
            if self.visit_table is not None:
                self.visit_table.increment(state_index, exp.action_id)
            # Q-value after the update
            new_q = self.q_table.get_q_value(state_index, exp.action_id)
            # Loss (magnitude of the update)
            loss = abs(new_q - old_q)
            total_loss += loss
        return total_loss / len(experiences) if experiences else 0.0
    def train_by_episodes(
        self,
        episode_ids: List[str],
        filename: str = "experiences.jsonl",
        epochs: int = 1,
    ) -> dict:
        """
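        Train on selected episodes only
        Args:
            episode_ids: list of episode IDs to train on
            filename: Experience file name
            epochs: number of training passes
        Returns:
            Training result
        """
        # Load everything, then keep only the Experiences of the requested episodes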
        all_experiences = self.experience_repository.load_all(filename)
        filtered_experiences = [
            exp for exp in all_experiences if exp.episode_id in episode_ids
        ]
        if not filtered_experiences:
            return {
                "total_experiences": 0,
                "episodes": 0,
                "epochs": 0,
                "updates": 0,
                "avg_loss": 0.0,
                "message": "No matching episodes found",
            }
        total_updates = 0
        total_loss = 0.0
        for epoch in range(epochs):
            loss = self._train_batch(filtered_experiences)
            total_loss += loss * len(filtered_experiences)
            total_updates += len(filtered_experiences)
        return {
            "total_experiences": len(filtered_experiences),
            "episodes": len(episode_ids),
            "epochs": epochs,
            "updates": total_updates,
            "avg_loss": total_loss / total_updates if total_updates > 0 else 0.0,
        }
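`QTable.update` is called above but not shown in this section. For orientation, the sketch below implements the standard tabular Q-learning rule on a plain nested list and returns the absolute change, which is the quantity `_train_batch` reports as "loss". It is an assumption that `QTable.update` follows this rule; the function name and the list-of-lists table are illustrative only. The default `lr` and `gamma` mirror the CLI defaults in `train.py`.

```python
from typing import List, Optional


def q_learning_update(
    q: List[List[float]],
    state: int,
    action: int,
    reward: float,
    next_state: Optional[int],
    done: bool,
    lr: float = 0.1,
    gamma: float = 0.9,
) -> float:
    """Apply Q(s,a) += lr * (target - Q(s,a)) in place and return |delta Q|."""
    old_q = q[state][action]
    if done or next_state is None:
        target = reward  # terminal transition: no bootstrapped future value
    else:
        target = reward + gamma * max(q[next_state])
    q[state][action] = old_q + lr * (target - old_q)
    return abs(q[state][action] - old_q)
```

Because each call returns one absolute change, averaging over a dataset means summing these values and dividing by the number of updates.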
--- a/src/negotiation_agent/integration/action_card_mapper.py
+++ b/src/negotiation_agent/integration/action_card_mapper.py
@ -0,0 +1,152 @@
 """
 Action-Card mapping layer
 Links Q_Table's action_id to Card_Management's card_id
 """
 import json
 from pathlib import Path
 from typing import Dict, Optional
 class ActionCardMapper:
    """
    action_id (0, 1, 2, ...) ↔ card_id ("no_0", "no_1", ...)
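    Q_Table deals only with abstract action_ids,
    while Card_Management deals only with concrete card_ids.
    This class has the single responsibility of linking the two.
    The mapping is loaded from a JSON file and is updated manually when cards are added or removed.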
    """
    def __init__(self, mapping_file: Optional[str] = None):
        """
        Args:
            mapping_file: path to the JSON mapping file
                         (default: integration/data/action_card_mapping.json)
        """
        if mapping_file is None:
            current_dir = Path(__file__).parent
            mapping_file = current_dir / "data" / "action_card_mapping.json"
        self.mapping_file = Path(mapping_file)
        self.action_to_card: Dict[int, str] = {}
        self.card_to_action: Dict[str, int] = {}
        self._load_mapping()
    def _load_mapping(self):
        """Load the mapping from the JSON file."""
        if not self.mapping_file.exists():
            raise FileNotFoundError(
                f"Mapping file not found: {self.mapping_file}\n"
                f"Please create action_card_mapping.json first."
            )
        with open(self.mapping_file, "r", encoding="utf-8") as f:
            data = json.load(f)
        # action_to_card: {"0": "no_0", "1": "no_1", ...}
        # JSON keys are strings, convert to int
        self.action_to_card = {int(k): v for k, v in data["action_to_card"].items()}
        # card_to_action: reverse mapping for quick lookup
        self.card_to_action = {v: int(k) for k, v in data["action_to_card"].items()}
    def get_card_id(self, action_id: int) -> Optional[str]:
        """
        Convert action_id → card_id
        Args:
            action_id: action ID used by Q_Table (0, 1, 2, ...)
        Returns:
            card_id: card ID used by Card_Management ("no_0", "no_1", ...),
                    or None if there is no mapping
        Example:
            >>> mapper.get_card_id(5)
            'no_5'
        """
        return self.action_to_card.get(action_id)
    def get_action_id(self, card_id: str) -> Optional[int]:
        """
        Convert card_id → action_id
        Args:
            card_id: card ID used by Card_Management ("no_0", "no_1", ...)
        Returns:
            action_id: action ID used by Q_Table (0, 1, 2, ...),
                      or None if there is no mapping
        Example:
            >>> mapper.get_action_id('no_5')
            5
        """
        return self.card_to_action.get(card_id)
    def get_action_space_size(self) -> int:
        """
        Number of currently mapped actions (= number of cards)
        Returns:
            Size of the action space
        Example:
            >>> mapper.get_action_space_size()
            21  # 21 cards
        """
        return len(self.action_to_card)
    def get_all_mappings(self) -> Dict[int, str]:
        """
        Return the full mapping
        Returns:
            {action_id: card_id} dictionary
        Example:
            >>> mapper.get_all_mappings()
            {0: 'no_0', 1: 'no_1', ..., 20: 'no_20'}
        """
        return self.action_to_card.copy()
    def get_all_action_ids(self) -> list:
        """
        Return the list of all action_ids (sorted)
        Returns:
            [0, 1, 2, ..., 20]
        """
        return sorted(self.action_to_card.keys())
    def get_all_card_ids(self) -> list:
        """
        Return the list of all card_ids (in action_id order)
        Returns:
            ['no_0', 'no_1', ..., 'no_20']
        """
        return [self.action_to_card[i] for i in sorted(self.action_to_card.keys())]
    def is_valid_action_id(self, action_id: int) -> bool:
        """Check whether action_id exists in the mapping."""
        return action_id in self.action_to_card
    def is_valid_card_id(self, card_id: str) -> bool:
        """Check whether card_id exists in the mapping."""
        return card_id in self.card_to_action
    def reload(self):
        """Reload the mapping file (for when the file changed at runtime)."""
        self._load_mapping()
    def __repr__(self) -> str:
        return (
            f"ActionCardMapper("
            f"action_space_size={self.get_action_space_size()}, "
            f"mapping_file={self.mapping_file}"
            f")"
        )
--- a/src/negotiation_agent/integration/data/action_card_mapping.json
+++ b/src/negotiation_agent/integration/data/action_card_mapping.json
@ -0,0 +1,13 @@
 {
  "action_to_card": {
    "0": "no_1",
    "1": "no_2",
    "2": "no_3",
    "3": "no_5",
    "4": "no_6",
    "5": "no_8",
    "6": "no_11",
    "7": "no_13",
    "8": "no_14"
  }
 }
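The mapping file above can be parsed the same way `ActionCardMapper._load_mapping` does. The short sketch below inlines the JSON as a string (an assumption made to keep the example self-contained) and builds both lookup directions. Note that the action_ids are dense (0 to 8) while the card numbers skip values, so an action_id is not the same as the number in its card_id.

```python
import json

# Inline copy of action_card_mapping.json so the sketch runs without the file.
MAPPING_JSON = """
{
  "action_to_card": {
    "0": "no_1", "1": "no_2", "2": "no_3",
    "3": "no_5", "4": "no_6", "5": "no_8",
    "6": "no_11", "7": "no_13", "8": "no_14"
  }
}
"""

data = json.loads(MAPPING_JSON)
# JSON object keys are always strings, so convert them to int action_ids.
action_to_card = {int(k): v for k, v in data["action_to_card"].items()}
# Reverse lookup: card_id -> action_id.
card_to_action = {card: action for action, card in action_to_card.items()}

print(len(action_to_card), action_to_card[3], card_to_action["no_14"])
```

With this file the action space has 9 entries, action_id 3 resolves to `no_5`, and a card such as `no_4` has no action at all.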
--- a/train.py
+++ b/train.py
@ -0,0 +1,90 @@
 import sys
 import os
 import argparse
 import numpy as np
 from pathlib import Path
 # Add project root to path to allow imports from src
 sys.path.append(os.path.dirname(os.path.abspath(__file__)))
 from src.negotiation_agent.Q_Table.domain.model.q_table import QTable
 from src.negotiation_agent.Q_Table.domain.model.visit_table import VisitTable
 from src.negotiation_agent.Q_Table.domain.repository.experience_repository import ExperienceRepository
 from src.negotiation_agent.Q_Table.infra.repository.model_repository import ModelRepository
 from src.negotiation_agent.Q_Table.usecase.train_offline_usecase import TrainOfflineUsecase
 from src.negotiation_agent.integration.action_card_mapper import ActionCardMapper
 def main():
    parser = argparse.ArgumentParser(description="Train Q-Table Agent")
    parser.add_argument("--epochs", type=int, default=10, help="Number of epochs")
    parser.add_argument("--batch-size", type=int, default=32, help="Batch size")
    parser.add_argument("--lr", type=float, default=0.1, help="Learning rate (only for new tables)")
    parser.add_argument("--gamma", type=float, default=0.9, help="Discount factor (only for new tables)")
    parser.add_argument("--data-file", type=str, default="experiences.jsonl", help="Experience file name inside data/experiences/")
    args = parser.parse_args()
    print("=== KTC V2 Agent Training ===")
    # 1. Config
    try:
        mapper = ActionCardMapper()
        ACTION_SIZE = mapper.get_action_space_size()
    except Exception as e:
        # Fallback if specific file error
        print(f"Warning: Could not load Action mapping ({e}). Defaulting to 21.")
        ACTION_SIZE = 21
    STATE_SIZE = 162
    print(f"Configuration: State Size={STATE_SIZE}, Action Size={ACTION_SIZE}")
    # 2. Repository & Models
    model_repo = ModelRepository()
    print("Loading models...")
    q_table, visit_table = model_repo.load()
    if q_table is None:
        print("[Info] No existing Q-Table found. Creating new one.")
        q_table = QTable(
            state_space_size=STATE_SIZE,
            action_space_size=ACTION_SIZE,
            learning_rate=args.lr,
            discount_factor=args.gamma,
        )
    else:
        print("[Info] Loaded existing Q-Table.")
    if visit_table is None:
        print("[Info] No existing VisitTable found. Creating new one.")
        visit_table = VisitTable(STATE_SIZE, ACTION_SIZE)
    else:
        print("[Info] Loaded existing VisitTable.")
    # 3. Data Repository
    exp_repo = ExperienceRepository()
    # Check if data file exists
    data_path = exp_repo.data_dir / args.data_file
    if not data_path.exists():
        print(f"[Warning] Experience file not found at: {data_path}")
        print("Please ensure the data file is synchronized from the main server.")
        return
    # 4. Usecase
    trainer = TrainOfflineUsecase(q_table, exp_repo, visit_table)
    # 5. Train
    print(f"Starting training for {args.epochs} epochs with batch size {args.batch_size}...")
    result = trainer.train(filename=args.data_file, epochs=args.epochs, batch_size=args.batch_size)
    print("\nTraining Result:")
    for k, v in result.items():
        print(f"  {k}: {v}")
    # 6. Save
    print("Saving models...")
    model_repo.save(q_table, visit_table)
    print("Done.")
 if __name__ == "__main__":
    main()
@ -0,0 +1,3 @@
 """Q-Table package exports."""
 from . import domain, usecase  # noqa: F401