Commit Graph

7 Commits (742c0f1bcc8fd828886342d617b9d23c05c54ace)

Author SHA1 Message Date
Haewon Kam 2cda26a649 feat: per-URL clinic folder — auto-save all scraped data to Storage
Each analysis run now creates a dedicated folder in Supabase Storage:
  clinics/{domain}/{reportId}/
    ├── scrape_data.json    (discover-channels: website scrape + Perplexity)
    ├── channel_data.json   (collect-channel-data: all channel API results)
    └── report.json         (generate-report: final AI-generated report)

Screenshots also moved from {reportId}/{id}.png to:
  clinics/{domain}/{reportId}/screenshots/{id}.png

Migration: 20260407_clinic_data_storage.sql creates 'clinic-data' bucket
(private, 10MB/file, JSON only). All writes are non-fatal — pipeline
continues even if Storage upload fails.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-07 10:04:52 +09:00
Haewon Kam 36d2f1cf49 feat: archive Firecrawl screenshots to Supabase Storage (permanent URLs)
## 문제
Firecrawl이 반환하는 스크린샷 URL은 GCS Signed URL로 7일 후 만료.
리포트에 저장된 이미지 URL이 일주일 후 전부 깨짐 (403 Access Denied).

## 해결
collect-channel-data의 Vision 단계에 아카이빙 스텝 추가.
캡처 직후 base64(이미 메모리에 있음)를 Supabase Storage에 영구 업로드.

### 처리 흐름 (변경 후)
1. captureAllScreenshots() → GCS URL + base64 반환 (기존)
2. [신규] archiveTasks: base64 → Supabase Storage 업로드 (병렬)
   - 경로: screenshots/{reportId}/{screenshotId}.png
   - 성공 시 ss.url을 영구 Supabase URL로 in-place 교체
   - 실패 시 non-fatal — GCS URL fallback으로 Vision 분석 계속 진행
3. runVisionAnalysis() — base64 여전히 메모리에 있어 정상 실행 (기존)
4. channelData.screenshots 저장 시 영구 URL 사용 (자동)
   - archived: true/false 플래그 추가 (모니터링용)

### 비용/성능
- 추가 API 호출 없음 (base64 이미 캡처 시 다운로드됨)
- 업로드: ~1-3초/장 (병렬), 5MB limit, PNG/JPEG/WebP 허용
- 버킷: public (URL만 있으면 열람) + 서비스 역할만 업로드 가능

## 마이그레이션
supabase/migrations/20260407_screenshots_storage.sql
- screenshots 버킷 생성 (public, 5MB limit)
- RLS: public read / service_role write
- delete_old_screenshots() 함수: 90일 이상 된 파일 정리 (pg_cron 연동 가능)

## 타입
ScreenshotResult.archived?: boolean 필드 추가 (영구 vs GCS fallback 구분)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-07 09:51:31 +09:00
Haewon Kam d5f7f24e0a feat: clinic registry DB + pipeline audit P0 fixes
## Clinic Registry
- data/clinic-registry/clinic_registry_working.csv — 91개 병원 채널 마스터 DB
- data/clinic-registry/INFINITH_Outbound_List.csv — BD팀 아웃바운드 리스트 (17컬럼)
- data/clinic-registry/update_csv.py — 안전 CSV 업데이트 스크립트 (빈 필드만 채움)
- data/clinic-registry/extract_place_ids.py — 네이버 플레이스 ID 추출기
- scripts/import-registry.ts — CSV → Supabase clinic_registry 테이블 임포트
- supabase/migrations/20260406_clinic_registry.sql — clinic_registry 테이블 스키마

## Pipeline P0 Bug Fixes (전수 감사 후)
- fix(collect-channel-data): 강남언니 rating 0-10 스케일 오변환 제거
  - 기존: rating ≤ 5이면 ×2 → 4.8/10을 9.6/10으로 잘못 변환
  - 수정: Firecrawl 프롬프트가 이미 0-10 지시 → rawValue 직접 신뢰
- fix(generate-report): Perplexity 단일 fetch → fetchWithRetry 교체
  - maxRetries:2, backoffMs:[5000,15000], timeoutMs:90s
  - 기존: 타임아웃/429 시 리포트 생성 전체 실패
  - 수정: 자동 재시도로 일시적 API 오류 극복

## Docs
- docs/PIPELINE_IMPROVEMENT_PLAN.md — Sprint 0/1/2 완료 표시 + 전수 감사 결과 추가
- docs/REGISTRY_FUNCTIONAL_SPECS.md, DB_SCHEMA_V3.md 외 기획 문서 다수 추가

## New Components & Features
- supabase/functions/generate-content-plan, adjust-strategy — 콘텐츠 플랜/전략 조정
- src/components/plan/EditEntryModal, StrategyAdjustmentSection — 플랜 편집 UI
- supabase/functions/_shared/dataQuality, foundingYearExtractor, urlClassifier — 데이터 품질 유틸

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-07 09:33:25 +09:00
Haewon Kam 79950925a1 fix: add Authorization header to all Edge Function calls + fix Vision Analysis
- All fetch calls to Supabase Edge Functions now include
  Authorization: Bearer <anon_key> (was missing → 401 errors)
- Fix Firecrawl screenshot API: remove invalid screenshotOptions,
  use "screenshot@fullPage" format (v2 API compatibility)
- Fix screenshot response handling: v2 returns URL not base64,
  now downloads and converts to base64 for Gemini Vision
- Add about page to Vision Analysis capture targets
- Add retry utility, channel error tracking, pipeline resume,
  enrichment retry, EmptyState improvements (Sprint 2-3)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-05 10:08:03 +09:00
Haewon Kam e011ef7357 docs: DB Schema V3 — SaaS multi-tenant with time-series + performance loop
9 tables: clinics, analysis_runs, channel_snapshots, screenshots,
content_plans, channel_configs, performance_metrics, content_performance,
strategy_adjustments

2 views: channel_latest, channel_weekly_delta

Key features:
- Clinic-centric (1 hospital = 1 row, multiple analyses)
- Time-series channel metrics (INSERT-only snapshots)
- Performance → Strategy loop (weekly KPI tracking → auto-adjust plans)
- Content performance tracking (individual post metrics)
- 8-phase implementation checklist with verification criteria

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-05 00:45:34 +09:00
Haewon Kam 7557ef774c feat: Pipeline V2 — 3-phase analysis with verified channel discovery
Restructured the entire analysis pipeline from AI-guessing social
handles to deterministic 3-phase discovery + collection + generation.

Phase 1 (discover-channels): 3-source channel discovery
  - Firecrawl scrape: extract social links from HTML
  - Perplexity search: find handles via web search
  - URL regex parsing: deterministic link extraction
  - Handle verification: HEAD requests + YouTube API
  - DB: creates row with verified_channels + scrape_data

Phase 2 (collect-channel-data): 9 parallel data collectors
  - Instagram (Apify), YouTube (Data API v3), Facebook (Apify)
  - 강남언니 (Firecrawl), Naver Blog + Place (Naver API)
  - Google Maps (Apify), Market analysis (Perplexity 4x parallel)
  - DB: stores ALL raw data in channel_data column

Phase 3 (generate-report): AI report from real data
  - Reads channel_data + analysis_data from DB
  - Builds channel summary with real metrics
  - AI generates report using only verified data
  - V1 backwards compatibility preserved (url-based flow)

Supporting changes:
  - DB migration: status, verified_channels, channel_data columns
  - _shared/extractSocialLinks.ts: regex-based social link parser
  - _shared/verifyHandles.ts: multi-platform handle verifier
  - AnalysisLoadingPage: real 3-phase progress + channel panel
  - useReport: channel_data column support + V2 enrichment merge
  - 강남언니 rating: auto-correct 5→10 scale + search fallback
  - KPIDashboard: navigate() instead of <a href>
  - Loading text: 20-30초 → 1-2분

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-03 21:49:13 +09:00
Haewon Kam 60cd055042 feat: real API integration + YouTube Data API v3 + progressive loading
- Replace mock useReport() with real Supabase API data pipeline
- Add transformReport.ts to map API responses to MarketingReport type
- Add useEnrichment() hook for background channel data enrichment
- Replace Apify YouTube scraper with YouTube Data API v3
- Add mergeEnrichment() for progressive data loading
- Add EmptyState component for graceful empty data handling
- Add socialHandles to generate-report metadata
- Graceful empty data in ClinicSnapshot, YouTube, Instagram, Facebook
- Add Supabase Edge Functions and DB migrations
- Add developer handoff documentation

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-02 10:57:14 +09:00