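## P0 bug fixes (immediate impact)
### fix(collect-channel-data): remove incorrect 강남언니 rating conversion
- Before: the `rating ≤ 5 → ×2` rule wrongly turned 4.8/10 into 9.6/10
- The Firecrawl prompt already asks for a 0-10 value, so rawValue is now trusted as-is
### fix(generate-report): replace the single Perplexity fetch with fetchWithRetry
- Configured with `maxRetries: 2`, `backoffMs: [5000, 15000]`, `timeoutMs: 90s`
- Before: a transient 429 or timeout made the whole report generation fail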
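A minimal sketch of a `fetchWithRetry` helper with these settings, assuming that 429 and 5xx responses, network errors, and per-attempt timeouts are the transient failures worth retrying. The `RetryOptions` shape and the injectable `fetchImpl` parameter are illustrative assumptions, not the project's actual signature.

```typescript
// Hypothetical option shape mirroring the values in the commit message.
interface RetryOptions {
  maxRetries: number;  // retries after the first attempt (2 means up to 3 attempts)
  backoffMs: number[]; // delay before retry i; the last value is reused if the list is short
  timeoutMs: number;   // per-attempt timeout, enforced with AbortController
}

interface MinimalResponse {
  ok: boolean;
  status: number;
}

type FetchLike<R extends MinimalResponse> = (url: string, init: { signal: AbortSignal }) => Promise<R>;

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Assumption: 429 (rate limit) and 5xx are transient; other statuses go back to the caller.
const isRetryableStatus = (status: number) => status === 429 || status >= 500;

async function fetchWithRetry<R extends MinimalResponse>(
  url: string,
  opts: RetryOptions,
  fetchImpl: FetchLike<R>,
): Promise<R> {
  let lastError: unknown;
  for (let attempt = 0; attempt <= opts.maxRetries; attempt++) {
    const controller = new AbortController();
    const timer = setTimeout(() => controller.abort(), opts.timeoutMs);
    try {
      const res = await fetchImpl(url, { signal: controller.signal });
      // Return on success, on a non-retryable status, or when retries are exhausted.
      if (!isRetryableStatus(res.status) || attempt === opts.maxRetries) return res;
      lastError = new Error(`HTTP ${res.status}`);
    } catch (err) {
      // Network errors and timeouts (aborts) are retried.
      lastError = err;
      if (attempt === opts.maxRetries) break;
    } finally {
      clearTimeout(timer);
    }
    await sleep(opts.backoffMs[Math.min(attempt, opts.backoffMs.length - 1)] ?? 0);
  }
  throw lastError;
}
```

In production it would be called with `{ maxRetries: 2, backoffMs: [5000, 15000], timeoutMs: 90_000 }` and a `fetchImpl` that forwards `signal` to the real `fetch`, so the worst case is three attempts of up to 90 s each plus 20 s of backoff.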
## P1 features (data quality)
### feat(collect-channel-data): compute health_score for channel_snapshots
- Added `computeHealthScore(channel, data)` (0-100 score per channel)
- Instagram: linear interpolation on followers + posts bonus
- YouTube: subscriber-based score + video count bonus
- 강남언니: rating×7 + reviews bonus (max 30pt)
- Google Maps: rating×12 + reviews bonus (max 40pt)
- Naver Blog: presence (50pt) + mention-count bonus (max 30pt)
- Every channel_snapshots INSERT now includes health_score
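One way `computeHealthScore` could combine these rules is sketched below. The 강남언니, Google Maps, and Naver Blog weights and caps come from the list above; the channel keys, the review and mention bonus slopes, the Instagram/YouTube interpolation anchors, and the final round-and-clamp step are hypothetical placeholders, not the values in the actual code.

```typescript
// Hypothetical channel keys; the real code may name channels differently.
type Channel = "instagram" | "youtube" | "gangnamunni" | "googleMaps" | "naverBlog";

interface ChannelData {
  followers?: number;
  posts?: number;
  subscribers?: number;
  videos?: number;
  rating?: number;   // 강남언니: 0-10 scale; Google Maps: 1-5 scale
  reviews?: number;
  present?: boolean; // whether an official Naver Blog exists
  mentions?: number; // Naver Blog search mentions
}

const clamp = (x: number, lo: number, hi: number) => Math.min(hi, Math.max(lo, x));

// Maps value linearly from [x0, x1] onto [y0, y1], clamped at both ends.
function lerpScore(value: number, x0: number, x1: number, y0: number, y1: number): number {
  const t = clamp((value - x0) / (x1 - x0), 0, 1);
  return y0 + t * (y1 - y0);
}

function computeHealthScore(channel: Channel, data: ChannelData): number {
  let score = 0;
  switch (channel) {
    case "instagram":
      // HYPOTHETICAL anchors: 0-10,000 followers map to 0-80 pt, plus up to 20 pt for posts.
      score = lerpScore(data.followers ?? 0, 0, 10_000, 0, 80) + Math.min(20, (data.posts ?? 0) / 25);
      break;
    case "youtube":
      // HYPOTHETICAL anchors: 0-5,000 subscribers map to 0-80 pt, plus up to 20 pt for videos.
      score = lerpScore(data.subscribers ?? 0, 0, 5_000, 0, 80) + Math.min(20, (data.videos ?? 0) / 5);
      break;
    case "gangnamunni":
      // rating×7 (max 70 on a 0-10 scale) + reviews bonus capped at 30 (slope is HYPOTHETICAL).
      score = (data.rating ?? 0) * 7 + Math.min(30, (data.reviews ?? 0) / 10);
      break;
    case "googleMaps":
      // rating×12 (max 60 on a 1-5 scale) + reviews bonus capped at 40 (slope is HYPOTHETICAL).
      score = (data.rating ?? 0) * 12 + Math.min(40, (data.reviews ?? 0) / 10);
      break;
    case "naverBlog":
      // 50 pt for presence + 1 pt per mention (HYPOTHETICAL slope), capped at 30.
      score = (data.present ? 50 : 0) + Math.min(30, data.mentions ?? 0);
      break;
  }
  return Math.round(clamp(score, 0, 100));
}
```

Note how the stated weights line up with the 0-100 range: a perfect 10/10 on 강남언니 gives 70 + 30, and a perfect 5/5 on Google Maps gives 60 + 40.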
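### feat(collect-channel-data): add scraping of official Naver Blog content
- Before: only third-party mentions were collected via the Naver Search API
- Added: the official blog URL confirmed in the Registry is scraped directly with Firecrawl
- Extracts the total post count, recent posts (title/date/summary), and categories
- Failures are non-critical: the existing Naver Search results are always kept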
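A sketch of the non-critical pattern, assuming the Firecrawl scrape is wrapped in a function passed in as `scrapeOfficialBlog` (a hypothetical name, as are the result types). A scrape error is recorded, but the Naver Search results are returned untouched either way.

```typescript
interface BlogPost {
  title: string;
  date: string;
  summary: string;
}

interface OfficialBlog {
  totalPosts: number;
  recentPosts: BlogPost[];
  categories: string[];
}

interface NaverBlogResult {
  searchMentions: unknown[]; // from the Naver Search API, always kept
  officialBlog: OfficialBlog | null;
  officialBlogError?: string;
}

// `scrapeOfficialBlog` stands in for the Firecrawl call; its name and signature are hypothetical.
async function collectNaverBlog(
  searchMentions: unknown[],
  officialBlogUrl: string | undefined,
  scrapeOfficialBlog: (url: string) => Promise<OfficialBlog>,
): Promise<NaverBlogResult> {
  const result: NaverBlogResult = { searchMentions, officialBlog: null };
  if (!officialBlogUrl) return result; // no confirmed official blog in the Registry
  try {
    result.officialBlog = await scrapeOfficialBlog(officialBlogUrl);
  } catch (err) {
    // Non-critical: record the error but leave the search results as they are.
    result.officialBlogError = err instanceof Error ? err.message : String(err);
  }
  return result;
}
```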
## docs: reflect audit results in PIPELINE_IMPROVEMENT_PLAN
- Marked Sprint 0 (Vision), Sprint 1, and Sprint 2 as complete
- Marked WP-10 and WP-11 as complete
- Added a 2026-04-07 full-audit section (table of implemented / fixed / remaining gaps)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Previous chunked btoa approach encoded each chunk independently,
producing corrupted base64 that Gemini couldn't parse (returned {})
- Now builds complete binary string first, then encodes once with btoa
- Added screenshot debug info to channel errors for diagnostics
- Confirmed: foundingYear 2004, doctors, gangnamunni data all extracted
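The bug is easy to reproduce: `btoa` pads its output with `=` whenever its input length is not a multiple of 3, so concatenating per-chunk results puts padding in the middle of the string unless every chunk happens to be a multiple of 3 bytes. The sketch contrasts both patterns; the function names and the 32 KiB chunk size are assumptions. Chunking is still used to build the binary string, because `String.fromCharCode.apply` over a multi-megabyte screenshot can exceed the engine's argument limit, but `btoa` runs only once.

```typescript
// Broken pattern (for comparison): btoa per chunk pads each chunk separately.
function chunkedBase64Broken(bytes: Uint8Array, chunkSize: number): string {
  let out = "";
  for (let i = 0; i < bytes.length; i += chunkSize) {
    out += btoa(String.fromCharCode.apply(null, Array.from(bytes.subarray(i, i + chunkSize))));
  }
  return out;
}

// Fixed pattern: build the complete binary string in chunks, then encode it once.
function bytesToBase64(bytes: Uint8Array, chunkSize = 0x8000): string {
  let binary = "";
  for (let i = 0; i < bytes.length; i += chunkSize) {
    binary += String.fromCharCode.apply(null, Array.from(bytes.subarray(i, i + chunkSize)));
  }
  return btoa(binary);
}
```

For the bytes `[1, 2, 3, 4]` with a chunk size of 2, the broken variant yields `AQI=AwQ=` while the fixed one yields `AQIDBA==` regardless of chunk size.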
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- All fetch calls to Supabase Edge Functions now include
Authorization: Bearer <anon_key> (was missing → 401 errors)
- Fix Firecrawl screenshot API: remove invalid screenshotOptions,
use "screenshot@fullPage" format (v2 API compatibility)
- Fix screenshot response handling: v2 returns URL not base64,
now downloads and converts to base64 for Gemini Vision
- Add about page to Vision Analysis capture targets
- Add retry utility, channel error tracking, pipeline resume,
enrichment retry, EmptyState improvements (Sprint 2-3)
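A minimal sketch of an Edge Function call that always sends the header, assuming functions are invoked over HTTP at Supabase's `/functions/v1/<name>` path. `invokeEdgeFunction`, `EdgeCallInit`, and the injected `fetchImpl` are hypothetical names used for illustration.

```typescript
interface EdgeCallInit {
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

type FetchLike = (
  url: string,
  init: EdgeCallInit,
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

async function invokeEdgeFunction(
  supabaseUrl: string,
  anonKey: string,
  name: string,
  payload: unknown,
  fetchImpl: FetchLike,
): Promise<unknown> {
  // Strip a trailing slash so the path is not doubled.
  const url = `${supabaseUrl.replace(/\/$/, "")}/functions/v1/${name}`;
  const res = await fetchImpl(url, {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      // Without this header the calls were rejected with 401.
      Authorization: `Bearer ${anonKey}`,
    },
    body: JSON.stringify(payload),
  });
  if (!res.ok) throw new Error(`${name} failed with HTTP ${res.status}`);
  return res.json();
}
```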
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
WP-1: YouTube channel ID regex {20,} → {22} after the UC prefix (exactly 24 chars in total)
WP-2: Naver Place category filtering in enrich-channels (성형/피부, i.e. plastic surgery/dermatology)
WP-3: Google Maps stores mapsUrl separately from clinicWebsite
WP-4: Naver Blog separates officialBlogUrl from search results
WP-5: 강남언니 rawRating + normalized rating (≤5 → ×2), Firecrawl
prompt explicitly states "out of 10, NOT out of 5"
WP-6: Perplexity model centralized in _shared/config.ts (env override)
WP-7: Apify Instagram timeout 30s → 45s
Frontend: transformReport uses mapsUrl and officialBlogUrl when available
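For WP-1, assuming the regex is anchored and uses the URL-safe base64 alphabet `[A-Za-z0-9_-]` (the exact character class is an assumption), the old and new patterns compare as follows. A channel ID is the literal prefix `UC` followed by 22 characters, so `{22}` pins the total length at 24, whereas `{20,}` accepted anything from 22 characters upward.

```typescript
// New pattern: "UC" + exactly 22 characters = 24 characters in total.
const YOUTUBE_CHANNEL_ID = /^UC[A-Za-z0-9_-]{22}$/;
// Old, looser pattern: 20 or more characters after "UC".
const LOOSE_CHANNEL_ID = /^UC[A-Za-z0-9_-]{20,}$/;

function isYouTubeChannelId(s: string): boolean {
  return YOUTUBE_CHANNEL_ID.test(s);
}
```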
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Google Maps: was using gm.website (clinic's own site) → now always
generates maps.google.com/search URL
Naver Blog: was linking to first search result post (random personal
blog) → now links to Naver blog search results page
Naver Place: np.link was the clinic's own website, not Naver Place →
now generates a map.naver.com search URL. Also fixed collect-channel-data
to search with the "성형외과" (plastic surgery clinic) suffix and match by
category (성형/피부, i.e. plastic surgery/dermatology) to avoid same-name dental clinics.
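The query suffix and category check can be sketched as a filter over search candidates. `PlaceCandidate`, `buildPlaceQuery`, and `pickClinicPlace` are hypothetical names, and the shape of the category string (for example `병원,의원>성형외과`) is an assumption about what the search API returns; the 성형 and 피부 keywords come from the commit.

```typescript
interface PlaceCandidate {
  name: string;
  category: string; // assumed format, e.g. "병원,의원>성형외과" or "병원,의원>치과"
}

// Keywords from the commit: 성형 (plastic surgery) and 피부 (dermatology).
const CLINIC_CATEGORY_KEYWORDS = ["성형", "피부"];

// Appends the 성형외과 ("plastic surgery clinic") suffix unless the name already ends with it.
function buildPlaceQuery(clinicName: string): string {
  const name = clinicName.trim();
  return name.endsWith("성형외과") ? name : `${name} 성형외과`;
}

// Returns the first candidate whose category matches, skipping e.g. a same-name dental clinic.
function pickClinicPlace(candidates: PlaceCandidate[]): PlaceCandidate | undefined {
  return candidates.find((c) => CLINIC_CATEGORY_KEYWORDS.some((kw) => c.category.includes(kw)));
}
```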
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Instagram HEAD requests often fail (rate limiting, blocking), causing
valid handles to be dropped. Now all discovered handles are kept
(verified or not) and Apify attempts collection on all of them.
Apify's own scraper validates existence more reliably than HEAD requests.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
discover-channels: extractHandle('youtube') now detects UC* channel IDs
and returns them without the @ prefix (previously @UC... made verification fail)
verifyHandles: verifyYouTube uses cleanHandle for UC* check, requests
part=id,snippet for richer data
collect-channel-data: if channelId missing but handle present, resolves
via forHandle/forUsername lookup or direct UC* detection before skipping
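A sketch of the lookup order, assuming the YouTube Data API v3 `channels.list` endpoint, which takes exactly one filter among `id`, `forHandle`, and `forUsername`. `buildChannelLookupUrls` is a hypothetical helper, and trying `forHandle` before `forUsername` is an assumed order: UC* IDs (with or without a stray `@`) get a single `id` query, anything else gets a handle query with a legacy-username fallback.

```typescript
const CHANNEL_ID = /^UC[A-Za-z0-9_-]{22}$/;
const CHANNELS_ENDPOINT = "https://www.googleapis.com/youtube/v3/channels";

// Returns candidate channels.list URLs to try in order until one yields an item.
function buildChannelLookupUrls(handle: string, apiKey: string): string[] {
  const clean = handle.trim().replace(/^@/, "");
  const urlFor = (filter: "id" | "forHandle" | "forUsername", value: string) => {
    const params = new URLSearchParams({ part: "id,snippet", [filter]: value, key: apiKey });
    return `${CHANNELS_ENDPOINT}?${params.toString()}`;
  };
  // A UC* ID needs no resolution: query it directly by id.
  if (CHANNEL_ID.test(clean)) return [urlFor("id", clean)];
  // Otherwise try the @handle first, then the legacy username.
  return [urlFor("forHandle", clean), urlFor("forUsername", clean)];
}
```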
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
discover-channels: new extractHandle() validates that each handle belongs to
its platform (rejecting hospital-internal URLs such as /idtube/view that were
being treated as YouTube). Extracts handles from full URLs correctly.
collect-channel-data: explicit Record<string,unknown> typing for DB JSON
fields — fixes TypeScript property access on VerifiedChannels from DB.
verifyHandles: fix TikTok double-URL concatenation.
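A sketch of platform-aware handle extraction under assumed rules: the host must be the platform's domain or a subdomain of it, YouTube accepts only `/@handle` and `/channel/<id>` paths (returning UC* IDs without `@`), and a few Instagram system paths are rejected. The host table, the reserved-path list, and keeping `@` on YouTube handles are illustrative assumptions, not the project's actual rules.

```typescript
type Platform = "instagram" | "youtube" | "tiktok";

// Assumed host table: the platform domain itself or any subdomain (www., m.).
const PLATFORM_HOSTS: Record<Platform, string> = {
  instagram: "instagram.com",
  youtube: "youtube.com",
  tiktok: "tiktok.com",
};

// Assumed list of Instagram paths that are not profiles.
const INSTAGRAM_RESERVED = new Set(["p", "reel", "reels", "explore", "stories"]);

function hostBelongsTo(hostname: string, domain: string): boolean {
  const h = hostname.toLowerCase();
  return h === domain || h.endsWith(`.${domain}`);
}

// Returns the handle for a URL on the given platform, or null if it belongs elsewhere.
function extractHandle(platform: Platform, input: string): string | null {
  let url: URL;
  try {
    url = new URL(input);
  } catch {
    return null; // not an absolute URL
  }
  // Rejects e.g. https://hospital.example/idtube/view when asked for "youtube".
  if (!hostBelongsTo(url.hostname, PLATFORM_HOSTS[platform])) return null;
  const segments = url.pathname.split("/").filter(Boolean);
  if (segments.length === 0) return null;
  if (platform === "youtube") {
    if (segments[0].startsWith("@")) return segments[0];              // @handle keeps its @
    if (segments[0] === "channel" && segments[1]) return segments[1]; // UC* ID, no @ prefix
    return null;                                                      // /watch, /results, ...
  }
  if (platform === "instagram" && INSTAGRAM_RESERVED.has(segments[0])) return null;
  return segments[0].replace(/^@/, "");
}
```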
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Restructured the entire analysis pipeline from AI-guessing social
handles to deterministic 3-phase discovery + collection + generation.
Phase 1 (discover-channels): 3-source channel discovery
- Firecrawl scrape: extract social links from HTML
- Perplexity search: find handles via web search
- URL regex parsing: deterministic link extraction
- Handle verification: HEAD requests + YouTube API
- DB: creates row with verified_channels + scrape_data
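The deterministic regex-parsing source can be sketched as follows. The patterns and the `SocialLinks` shape are assumptions for illustration, not the contents of `_shared/extractSocialLinks.ts`; a fresh `RegExp` is built per call so the `g` flag's `lastIndex` state never leaks between pages.

```typescript
interface SocialLinks {
  instagram: string[];
  youtube: string[];
  facebook: string[];
  tiktok: string[];
  naverBlog: string[];
}

// Illustrative patterns only; real pages need more cases (share links, encoded URLs, etc.).
const PATTERNS: Record<keyof SocialLinks, string> = {
  instagram: String.raw`https?://(?:www\.)?instagram\.com/[A-Za-z0-9_.]+`,
  youtube: String.raw`https?://(?:www\.|m\.)?youtube\.com/(?:@[\w.-]+|channel/UC[\w-]{22})`,
  facebook: String.raw`https?://(?:www\.)?facebook\.com/[A-Za-z0-9.]+`,
  tiktok: String.raw`https?://(?:www\.)?tiktok\.com/@[\w.]+`,
  naverBlog: String.raw`https?://(?:m\.)?blog\.naver\.com/[A-Za-z0-9_-]+`,
};

function extractSocialLinks(html: string): SocialLinks {
  const result = {} as SocialLinks;
  for (const key of Object.keys(PATTERNS) as (keyof SocialLinks)[]) {
    const re = new RegExp(PATTERNS[key], "g"); // fresh regex: no shared lastIndex state
    const found = new Set<string>();           // headers and footers often repeat a link
    let m: RegExpExecArray | null;
    while ((m = re.exec(html)) !== null) found.add(m[0]);
    result[key] = Array.from(found);
  }
  return result;
}
```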
Phase 2 (collect-channel-data): 9 parallel data collectors
- Instagram (Apify), YouTube (Data API v3), Facebook (Apify)
- 강남언니 (Firecrawl), Naver Blog + Place (Naver API)
- Google Maps (Apify), Market analysis (Perplexity 4x parallel)
- DB: stores ALL raw data in channel_data column
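A sketch of how the collectors could run in parallel without one failure sinking the rest, assuming each collector is an async function keyed by channel name. `runCollectors` and the `channelErrors` map are hypothetical names; `Promise.allSettled` keeps every fulfilled result for the `channel_data` column and records each rejection separately.

```typescript
type Collector = () => Promise<unknown>;

interface CollectionResult {
  channelData: Record<string, unknown>;  // raw payloads, stored as-is (e.g. in channel_data)
  channelErrors: Record<string, string>; // one message per failed collector
}

// Runs all collectors concurrently; a rejection is recorded instead of failing the batch.
async function runCollectors(collectors: Record<string, Collector>): Promise<CollectionResult> {
  const names = Object.keys(collectors);
  const settled = await Promise.allSettled(names.map((name) => collectors[name]()));
  const result: CollectionResult = { channelData: {}, channelErrors: {} };
  settled.forEach((outcome, i) => {
    if (outcome.status === "fulfilled") {
      result.channelData[names[i]] = outcome.value;
    } else {
      const reason: unknown = outcome.reason;
      result.channelErrors[names[i]] = reason instanceof Error ? reason.message : String(reason);
    }
  });
  return result;
}
```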
Phase 3 (generate-report): AI report from real data
- Reads channel_data + analysis_data from DB
- Builds channel summary with real metrics
- AI generates report using only verified data
- V1 backwards compatibility preserved (url-based flow)
Supporting changes:
- DB migration: status, verified_channels, channel_data columns
- _shared/extractSocialLinks.ts: regex-based social link parser
- _shared/verifyHandles.ts: multi-platform handle verifier
- AnalysisLoadingPage: real 3-phase progress + channel panel
- useReport: channel_data column support + V2 enrichment merge
- 강남언니 rating: auto-correct 5→10 scale + search fallback
- KPIDashboard: navigate() instead of <a href>
- Loading text: 20-30 seconds → 1-2 minutes
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>