- collect-channel-data: naverBlog: remove live search → fetch RSS directly from the verified handle
- collect-channel-data: naverPlace DB-first pattern (use data stored in verified_channels first; only when it is absent, search with URL-domain matching and save the result to the DB)
- transformReport: when ch.issues array entries are {issue, severity} objects, extract the .issue string instead of calling JSON.stringify
- ProblemDiagnosis: remove Lucide icons → FilledIcons (ShieldFilled, FileTextFilled, LinkExternalFilled); item separator ' ' → ' — '
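
A minimal sketch of the transformReport issue handling, assuming an entry is either a plain string or an object with an `issue` field. The type and helper names (`ChannelIssue`, `issueText`) are hypothetical and not taken from the repository.

```typescript
// Assumed shape: an issue entry is a plain string or an object that
// carries its text under `issue` (severity is optional here).
type ChannelIssue = string | { issue: string; severity?: string };

// Hypothetical helper: returns the display text for one issue entry
// instead of serializing the whole object with JSON.stringify.
function issueText(entry: ChannelIssue): string {
  return typeof entry === "string" ? entry : entry.issue;
}

const issues: ChannelIssue[] = [
  "No recent posts",
  { issue: "Missing profile link", severity: "high" },
];
const labels = issues.map(issueText);
```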
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- collect-channel-data: gangnamUnni scraping no longer requires
verified=true. Fallback: Firecrawl search for gangnamunni.com URL
when discover-channels failed to verify. Solves empty ratings/reviews.
- generate-report: Perplexity prompt now explicitly requests leadDoctor
(name, specialty, rating, reviewCount) and staffCount in clinicInfo.
- transformReport: clinicInfo type extended with leadDoctor + staffCount;
transformation prefers clinic.leadDoctor over doctors[0] fallback.
Root cause: clinic_registry table not yet in DB → discover-channels
always falls back to API search → gangnamUnni URL not found →
collect-channel-data skips gangnamUnni → all clinic metrics empty.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Each analysis run now creates a dedicated folder in Supabase Storage:
clinics/{domain}/{reportId}/
├── scrape_data.json (discover-channels: website scrape + Perplexity)
├── channel_data.json (collect-channel-data: all channel API results)
└── report.json (generate-report: final AI-generated report)
Screenshots also moved from {reportId}/{id}.png to:
clinics/{domain}/{reportId}/screenshots/{id}.png
Migration: 20260407_clinic_data_storage.sql creates 'clinic-data' bucket
(private, 10MB/file, JSON only). All writes are non-fatal — pipeline
continues even if Storage upload fails.
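
The folder layout above could be built with helpers along these lines. This is a sketch only: the function names and the `tryWrite` wrapper are hypothetical, and the actual upload call is injected rather than shown.

```typescript
type ClinicArtifact = "scrape_data.json" | "channel_data.json" | "report.json";

// Folder for one analysis run: clinics/{domain}/{reportId}
function clinicFolder(domain: string, reportId: string): string {
  return `clinics/${domain}/${reportId}`;
}

function artifactPath(domain: string, reportId: string, file: ClinicArtifact): string {
  return `${clinicFolder(domain, reportId)}/${file}`;
}

function screenshotPath(domain: string, reportId: string, id: string): string {
  return `${clinicFolder(domain, reportId)}/screenshots/${id}.png`;
}

// Non-fatal write: a failed upload is logged and reported as `false`,
// so the caller can continue the pipeline. `upload` is a stand-in for
// whatever storage call the function actually uses.
async function tryWrite(upload: () => Promise<void>): Promise<boolean> {
  try {
    await upload();
    return true;
  } catch (err) {
    console.warn("storage write failed (non-fatal)", err);
    return false;
  }
}
```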
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
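## Problem
Screenshot URLs returned by Firecrawl are GCS Signed URLs that expire after 7 days.
Every image URL stored in a report breaks a week later (403 Access Denied).
## Solution
Add an archiving step to the Vision stage of collect-channel-data.
Right after capture, upload the base64 data (already in memory) to Supabase Storage for permanent storage.
### Processing flow (after the change)
1. captureAllScreenshots() → returns GCS URL + base64 (existing)
2. [New] archiveTasks: base64 → Supabase Storage upload (parallel)
- Path: screenshots/{reportId}/{screenshotId}.png
- On success, replace ss.url in place with the permanent Supabase URL
- On failure, non-fatal: Vision analysis continues with the GCS URL as fallback
3. runVisionAnalysis(): runs normally because the base64 is still in memory (existing)
4. The permanent URL is used when channelData.screenshots is saved (automatic)
- Add archived: true/false flag (for monitoring)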
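
The archiving step in the flow above might look roughly like this. It is a sketch under assumptions: the `Uploader` type stands in for the Supabase Storage call, and the `ScreenshotResult` fields other than `url` and `archived` are guessed.

```typescript
interface ScreenshotResult {
  id: string;
  url: string;        // starts as the expiring GCS URL
  base64: string;     // already in memory after capture
  archived?: boolean; // true = permanent URL, false = GCS fallback
}

// Hypothetical uploader: resolves with the permanent URL, rejects on failure.
type Uploader = (path: string, base64: string) => Promise<string>;

async function archiveScreenshots(
  reportId: string,
  shots: ScreenshotResult[],
  upload: Uploader,
): Promise<void> {
  // One upload task per screenshot, all started in parallel.
  const tasks = shots.map(async (ss) => {
    try {
      ss.url = await upload(`screenshots/${reportId}/${ss.id}.png`, ss.base64);
      ss.archived = true;
    } catch {
      // Non-fatal: keep the GCS URL so Vision analysis can still run.
      ss.archived = false;
    }
  });
  await Promise.all(tasks);
}
```

Each task catches its own error, so one failed upload does not reject the whole batch.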
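### Cost/performance
- No additional API calls (the base64 is already downloaded at capture time)
- Upload: ~1-3 s per image (parallel), 5MB limit, PNG/JPEG/WebP allowed
- Bucket: public (viewable by anyone with the URL); only the service role can upload
## Migration
supabase/migrations/20260407_screenshots_storage.sql
- Create the screenshots bucket (public, 5MB limit)
- RLS: public read / service_role write
- delete_old_screenshots() function: cleans up files 90 days old or older (can be hooked up to pg_cron)
## Types
Add the ScreenshotResult.archived?: boolean field (distinguishes a permanent URL from the GCS fallback)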
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
## P0 bug fixes (immediate impact)
### fix(collect-channel-data): remove 강남언니 rating mis-conversion
- Before: the `rating ≤ 5 → ×2` logic wrongly converted 4.8/10 to 9.6/10
- The Firecrawl prompt already asks for a 0-10 value → trust rawValue directly
### fix(generate-report): Perplexity single fetch → fetchWithRetry
- Configured with maxRetries: 2, backoffMs: [5000, 15000], timeoutMs: 90000 (90 s)
- Before: a transient 429 or timeout failed the entire report generation
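
A minimal sketch of the retry-with-backoff pattern, using the option names from this commit. The helper name `retrying` and its signature are hypothetical; the real `fetchWithRetry` may differ, and this version does not abort the underlying request on timeout.

```typescript
interface RetryOptions {
  maxRetries: number;  // retries after the first attempt
  backoffMs: number[]; // wait before retry 1, retry 2, ...
  timeoutMs: number;   // per-attempt timeout
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Rejects if `work` does not settle within timeoutMs (the work itself is not cancelled).
function withTimeout<T>(work: Promise<T>, timeoutMs: number): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(() => reject(new Error(`timed out after ${timeoutMs}ms`)), timeoutMs);
    work.then(
      (value) => { clearTimeout(timer); resolve(value); },
      (err) => { clearTimeout(timer); reject(err); },
    );
  });
}

async function retrying<T>(attempt: () => Promise<T>, opts: RetryOptions): Promise<T> {
  let lastError: unknown;
  for (let i = 0; i <= opts.maxRetries; i++) {
    try {
      return await withTimeout(attempt(), opts.timeoutMs);
    } catch (err) {
      lastError = err;
      if (i < opts.maxRetries) {
        // Reuse the last backoff value if the list is shorter than maxRetries.
        const wait = opts.backoffMs[Math.min(i, opts.backoffMs.length - 1)] ?? 0;
        await sleep(wait);
      }
    }
  }
  throw lastError;
}

// Usage with the values from this commit (callPerplexity is a placeholder):
// await retrying(() => callPerplexity(), { maxRetries: 2, backoffMs: [5000, 15000], timeoutMs: 90000 });
```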
## P1 feature additions (data quality)
### feat(collect-channel-data): compute channel_snapshots health_score
- Add the `computeHealthScore(channel, data)` function (0-100 score per channel)
- Instagram: linear interpolation on followers + posts bonus
- YouTube: subscribers-based + video count bonus
- 강남언니: rating×7 + reviews bonus (max 30pt)
- Google Maps: rating×12 + reviews bonus (max 40pt)
- Naver Blog: presence (50pt) + mention count bonus (max 30pt)
- Include health_score in every channel_snapshots INSERT
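
As an illustration of the two rating-based formulas, here is a sketch that assumes a linear reviews bonus saturating at a fixed review count. The saturation points (1000 and 500 reviews) and the function names are assumptions; the commit only states the multipliers and the bonus caps.

```typescript
// Linear bonus that reaches maxPoints at `saturation` reviews (assumed curve).
function reviewsBonus(reviews: number, maxPoints: number, saturation: number): number {
  const ratio = Math.min(Math.max(reviews, 0) / saturation, 1);
  return ratio * maxPoints;
}

// Clamp to the 0-100 range and round to an integer score.
function clampScore(value: number): number {
  return Math.round(Math.min(Math.max(value, 0), 100));
}

// 강남언니: rating is on a 0-10 scale, so rating×7 contributes up to 70pt.
function gangnamUnniScore(rating: number, reviews: number): number {
  return clampScore(rating * 7 + reviewsBonus(reviews, 30, 1000));
}

// Google Maps: rating is on a 0-5 scale, so rating×12 contributes up to 60pt.
function googleMapsScore(rating: number, reviews: number): number {
  return clampScore(rating * 12 + reviewsBonus(reviews, 40, 500));
}
```

With these multipliers a perfect rating plus the full bonus lands exactly on 100 for both channels.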
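### feat(collect-channel-data): add scraping of official Naver Blog content
- Before: only 3rd-party mentions were collected via the Naver Search API
- Added: scrape the official blog URL confirmed in the Registry directly with Firecrawl
- Extract total post count, recent posts (title/date/summary), and categories
- Non-critical on failure: existing Naver Search results are always kept
## docs: reflect audit results in PIPELINE_IMPROVEMENT_PLAN
- Mark Sprint 0 (Vision), Sprint 1, Sprint 2 as complete
- Mark WP-10, WP-11 as complete
- Add the 2026-04-07 full audit section (table of implemented / fixed / remaining gaps)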
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
YouTube now verifies all candidates and picks best match by channel title.
Facebook tries all candidates with domain-name fallback when Firecrawl returns empty.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Hero button: gray when empty, accent gradient when URL entered
- generate-report: force-inject gangnamUnniStats from Vision Analysis
into channelAnalysis (score, rating, reviews, doctors)
- Add gangnamunni Vision data to prompt context
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Previous chunked btoa approach encoded each chunk independently,
producing corrupted base64 that Gemini couldn't parse (returned {})
- Now builds complete binary string first, then encodes once with btoa
- Added screenshot debug info to channel errors for diagnostics
- Confirmed: foundingYear 2004, doctors, gangnamunni data all extracted
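
The encoding fix can be sketched as follows, assuming the image bytes arrive as a `Uint8Array`. The function name and chunk size are illustrative. Chunking is still used to build the binary string (to avoid passing a huge argument list to `String.fromCharCode`), but `btoa` runs only once on the complete string.

```typescript
function bytesToBase64(bytes: Uint8Array, chunkSize = 8192): string {
  let binary = "";
  for (let i = 0; i < bytes.length; i += chunkSize) {
    const chunk = bytes.subarray(i, i + chunkSize);
    // Append raw characters only; do NOT base64-encode per chunk.
    binary += String.fromCharCode.apply(null, chunk as unknown as number[]);
  }
  // Single encode over the full binary string.
  return btoa(binary);
}

// Why per-chunk encoding breaks: for the bytes of "abcd" with chunkSize 2,
// btoa("ab") + btoa("cd") gives "YWI=Y2Q=", which has padding in the middle,
// while encoding the whole string gives "YWJjZA==".
const encoded = bytesToBase64(new Uint8Array([97, 98, 99, 100]), 2);
```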
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- All fetch calls to Supabase Edge Functions now include
Authorization: Bearer <anon_key> (was missing → 401 errors)
- Fix Firecrawl screenshot API: remove invalid screenshotOptions,
use "screenshot@fullPage" format (v2 API compatibility)
- Fix screenshot response handling: v2 returns URL not base64,
now downloads and converts to base64 for Gemini Vision
- Add about page to Vision Analysis capture targets
- Add retry utility, channel error tracking, pipeline resume,
enrichment retry, EmptyState improvements (Sprint 2-3)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Added A4 parallel Firecrawl call with actions: [wait 3s, scrape]
to execute JavaScript and extract social button href URLs from
header/footer. This is the most reliable source — most Korean
clinics have Facebook/Instagram/YouTube/Blog icons in their nav.
Results merged as Source 3 (buttonHandles) alongside HTML links,
JSON extraction, and API searches.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
WP-1: YouTube channel ID regex {20,} → {22} (UC prefix + 22 characters = exactly 24 chars)
WP-2: Naver Place category filtering in enrich-channels (성형/피부, i.e. plastic surgery/dermatology)
WP-3: Google Maps stores mapsUrl separately from clinicWebsite
WP-4: Naver Blog separates officialBlogUrl from search results
WP-5: 강남언니 rawRating + normalized rating (≤5 → ×2), Firecrawl
prompt explicitly states "out of 10, NOT out of 5"
WP-6: Perplexity model centralized in _shared/config.ts (env override)
WP-7: Apify Instagram timeout 30s → 45s
Frontend: transformReport uses mapsUrl and officialBlogUrl when available
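
For WP-1, a standalone check with the tightened quantifier could look like this. The anchors and the constant name are assumptions, since the commit does not show the surrounding pattern.

```typescript
// "UC" followed by exactly 22 URL-safe characters = 24 characters total.
const YOUTUBE_CHANNEL_ID = /^UC[A-Za-z0-9_-]{22}$/;

function isYouTubeChannelId(value: string): boolean {
  return YOUTUBE_CHANNEL_ID.test(value);
}
```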
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Google Maps: was using gm.website (clinic's own site) → now always
generates maps.google.com/search URL
Naver Blog: was linking to first search result post (random personal
blog) → now links to Naver blog search results page
Naver Place: np.link was the clinic's own website, not Naver Place →
now generates map.naver.com search URL. Also fixed collect-channel-data
to search with the "성형외과" (plastic surgery clinic) suffix and match by
category (성형/피부, i.e. plastic surgery/dermatology) to
avoid same-name dental clinics.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Instagram HEAD requests often fail (rate limiting, blocking) causing
valid handles to be dropped. Now all discovered handles are kept
(verified or not) and Apify attempts collection on all of them.
Apify's own scraper validates existence more reliably than HEAD requests.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Split queries performed worse. The proven working pattern is:
- Single query with Korean+English clinic name
- "검색해서 찾아줘. 검색 결과에서 발견된 계정을 모두 알려줘" phrasing ("Search and find them. Tell me every account found in the search results")
- All channels in one request
- English name in parentheses helps Perplexity find international accounts
Tested: "그랜드성형외과 (Grand Plastic Surgery)" → finds Instagram,
YouTube, Facebook, TikTok, Naver Blog all in one call.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Single mega-query returns empty results. Split into:
B4a. Instagram + YouTube (most important, focused search)
B4b. Facebook + TikTok + Naver Blog + Kakao
B4c. 강남언니 + review platforms
Each query is short and focused — matches the proven pattern of
2-5 keyword searches that Perplexity handles well.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Long system prompt caused sonar-pro to return empty results.
Reverted to sonar model with short, proven prompt pattern that
matches the user's successful manual test.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replaced simple "find handles" prompt with comprehensive research agent:
- Model: sonar → sonar-pro (advanced multi-step web search)
- System prompt: full research methodology with 2-3 keyword searches,
URL fetching, quantitative data extraction
- Output: structured JSON with channels (handles + follower counts +
subscriber counts) + platforms (강남언니 rating, reviews)
- Research results saved to scrape_data.onlinePresenceResearch for
downstream use in collect-channel-data and generate-report
Added _shared/researchPrompt.ts with prompt template + builder.
Updated agent documentation in doc/.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
B4 Perplexity: rewrote from narrow "find social accounts" to broad
"Online Presence 종합 분석" (comprehensive online presence analysis): finds Instagram, YouTube, Facebook,
TikTok, Naver, Kakao, 강남언니, 바비톡 in one query.
B5 Apify Instagram: generates handle candidates from clinic name
(english name, domain, _official, _ps, _clinic variants) and directly
checks each via Apify instagram-profile-scraper. Finds accounts that
web search misses.
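
A sketch of the B5 candidate generation, assuming the English clinic name and the site domain are the two base inputs. The function name and the exact slug rules are hypothetical; only the suffix variants come from this commit.

```typescript
function instagramCandidates(englishName: string, domain: string): string[] {
  // Keep only lowercase letters and digits for the base handle.
  const slug = (s: string) => s.toLowerCase().replace(/[^a-z0-9]/g, "");
  const domainLabel = domain.replace(/^www\./, "").split(".")[0] ?? "";
  const bases = [slug(englishName), slug(domainLabel)];
  const suffixes = ["", "_official", "_ps", "_clinic"];

  const out: string[] = [];
  for (const base of bases) {
    if (!base) continue;
    for (const suffix of suffixes) {
      const candidate = base + suffix;
      if (out.indexOf(candidate) === -1) out.push(candidate); // dedupe
    }
  }
  return out;
}

// Each candidate would then be checked against the profile scraper.
const candidates = instagramCandidates("Example Plastic Surgery", "www.example-ps.com");
```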
Removed redundant B4b (platform presence) — now merged into B4.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
API results may contain null, numbers, or objects instead of strings.
Now coerces all values to strings before processing.
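
One way to do this coercion is sketched below. The helper name is hypothetical, and mapping null and objects to an empty string is an assumption about the intended behavior.

```typescript
// Coerce an arbitrary API value into a string that is safe to process.
function toHandleString(value: unknown): string {
  if (typeof value === "string") return value.trim();
  if (typeof value === "number" && isFinite(value)) return String(value);
  // null, undefined, objects, arrays, NaN: treated as "no value" (assumption).
  return "";
}
```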
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replaced Perplexity-only approach with 5 parallel direct API searches:
B1. YouTube Data API: search?type=channel&q={clinicName} → find channel
B2a. Naver Blog API: search blog.json → find official Naver blog
B2b. Naver Web API: search webkr.json → find Instagram/YouTube/Facebook URLs
B3. Firecrawl Search: web search → extract social URLs from results
B4. Perplexity: supplement — catch what direct APIs missed
All 5 sources run in parallel after Stage A (Firecrawl scrape for clinicName).
Results merged + deduplicated + verified. Perplexity is now a fallback,
not the primary source.
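
The parallel fan-out and merge could be sketched like this, with each source modeled as a function returning handles. The `Source` type and function name are hypothetical; the real sources return richer data and the verification step is omitted.

```typescript
type Source = () => Promise<string[]>;

async function discoverHandles(sources: Source[]): Promise<string[]> {
  // Run every source in parallel; a failing source contributes nothing.
  const results = await Promise.all(
    sources.map((run) => Promise.resolve().then(run).catch((): string[] => [])),
  );

  // Merge in source order and deduplicate case-insensitively.
  const seen = new Set<string>();
  const merged: string[] = [];
  for (const handles of results) {
    for (const handle of handles) {
      const key = handle.toLowerCase();
      if (!seen.has(key)) {
        seen.add(key);
        merged.push(handle);
      }
    }
  }
  return merged;
}
```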
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Perplexity prompts changed from "find verified accounts" (returns all
null) to "search and report what you find" (returns actual handles).
Added clinicName resolution: Firecrawl Korean → English → Perplexity
URL-to-name lookup → domain fallback.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Previously Firecrawl and Perplexity ran in parallel, so Perplexity
received raw URL instead of clinic name → poor search results.
Now:
Stage A: Firecrawl scrape+map (parallel) → extract clinicName from HTML
Stage B: Perplexity searches using extracted clinicName → finds Instagram,
YouTube, Facebook handles that Firecrawl HTML parsing missed
Stage C: Merge 3 sources + verify all handles
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
discover-channels: extractHandle('youtube') now detects UC* channel IDs
and returns them without @ prefix (previously @UC... caused verify fail)
verifyHandles: verifyYouTube uses cleanHandle for UC* check, requests
part=id,snippet for richer data
collect-channel-data: if channelId missing but handle present, resolves
via forHandle/forUsername lookup or direct UC* detection before skipping
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
discover-channels: new extractHandle() validates each handle belongs to
its platform (rejects hospital-internal URLs like /idtube/view being
treated as YouTube). Extracts handles from full URLs correctly.
collect-channel-data: explicit Record<string,unknown> typing for DB JSON
fields — fixes TypeScript property access on VerifiedChannels from DB.
verifyHandles: fix TikTok double-URL concatenation.
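
The platform check can be illustrated with a sketch like the one below. The function name, the supported platforms, and the returned format are assumptions; the repository's extractHandle() may differ.

```typescript
type Platform = "youtube" | "instagram" | "facebook";

const HOSTS: Record<Platform, string> = {
  youtube: "youtube.com",
  instagram: "instagram.com",
  facebook: "facebook.com",
};

function extractPlatformHandle(platform: Platform, rawUrl: string): string | null {
  const m = rawUrl.trim().match(/^(?:https?:\/\/)?([^\/?#]+)([^?#]*)/);
  if (!m) return null;

  // Reject links whose host is not the platform's own domain,
  // e.g. a clinic-internal path such as /idtube/view.
  const host = (m[1] ?? "").toLowerCase().replace(/^(?:www\.|m\.)/, "");
  if (host !== HOSTS[platform]) return null;

  const segments = (m[2] ?? "").split("/").filter((s) => s.length > 0);
  const first = segments[0] ?? "";
  if (platform === "youtube") {
    const second = segments[1] ?? "";
    // Channel IDs are returned bare, without an @ prefix.
    if (first === "channel" && /^UC[A-Za-z0-9_-]{22}$/.test(second)) return second;
    return /^@/.test(first) ? first : null;
  }
  return first ? "@" + first.replace(/^@/, "") : null;
}
```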
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Restructured the entire analysis pipeline from AI-guessing social
handles to deterministic 3-phase discovery + collection + generation.
Phase 1 (discover-channels): 3-source channel discovery
- Firecrawl scrape: extract social links from HTML
- Perplexity search: find handles via web search
- URL regex parsing: deterministic link extraction
- Handle verification: HEAD requests + YouTube API
- DB: creates row with verified_channels + scrape_data
Phase 2 (collect-channel-data): 9 parallel data collectors
- Instagram (Apify), YouTube (Data API v3), Facebook (Apify)
- 강남언니 (Firecrawl), Naver Blog + Place (Naver API)
- Google Maps (Apify), Market analysis (Perplexity 4x parallel)
- DB: stores ALL raw data in channel_data column
Phase 3 (generate-report): AI report from real data
- Reads channel_data + analysis_data from DB
- Builds channel summary with real metrics
- AI generates report using only verified data
- V1 backwards compatibility preserved (url-based flow)
Supporting changes:
- DB migration: status, verified_channels, channel_data columns
- _shared/extractSocialLinks.ts: regex-based social link parser
- _shared/verifyHandles.ts: multi-platform handle verifier
- AnalysisLoadingPage: real 3-phase progress + channel panel
- useReport: channel_data column support + V2 enrichment merge
- 강남언니 rating: auto-correct 5→10 scale + search fallback
- KPIDashboard: navigate() instead of <a href>
- Loading text: 20-30초 → 1-2분 (20-30 seconds → 1-2 minutes)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Naver Blog search: collect blog post results for clinic name (total count + top 10 posts)
- Naver Place search: collect place info (name, category, address, telephone)
- Multi-account Instagram: AI prompt requests all IG accounts (domestic/overseas)
- enrich-channels: process multiple IG handles with fallback per handle
- transformReport: merge multiple IG accounts into instagramAudit.accounts[]
- generate-report: socialHandles.instagram now array of handles
- Hero/CTA: transition-all → transition-shadow for instant click response
- Hero/CTA: disabled state when URL is empty (opacity-50 + cursor-not-allowed)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- enrich-channels: add 강남언니 scraping module (search + structured JSON extraction)
- Collects: rating/10, reviews, doctors with ratings, procedures, certifications
- transformReport: merge 강남언니 data into clinicSnapshot + otherChannels
- Updates lead doctor info, certifications, and review counts from real data
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- enrich-channels: Instagram fallback — auto-try _ps, .ps, _clinic suffixes when <100 followers
- enrich-channels: YouTube URL normalization via normalizeYouTubeChannel (handles /c/, /user/, @handle)
- enrich-channels: Google Maps multi-query search for better hit rate
- generate-report: AI-found social handles prioritized over Firecrawl scrape
- generate-report: Added socialMedia field to AI prompt for accurate handle discovery
- normalizeHandles: Added normalizeYouTubeChannel for /c/, /user/, /channel/, @handle URLs
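
To show which URL shapes are involved, here is a sketch that classifies a YouTube channel URL by form. The function name and return type are hypothetical and are not the signature of normalizeYouTubeChannel.

```typescript
type YouTubeRef = {
  kind: "channelId" | "custom" | "user" | "handle";
  value: string;
};

const KIND_BY_PREFIX: Record<string, YouTubeRef["kind"]> = {
  "channel/": "channelId",
  "c/": "custom",
  "user/": "user",
  "@": "handle",
};

function parseYouTubeChannelUrl(input: string): YouTubeRef | null {
  // Accepts URLs with or without scheme and www./m. prefix.
  const m = input
    .trim()
    .match(/^(?:https?:\/\/)?(?:www\.|m\.)?youtube\.com\/(channel\/|c\/|user\/|@)([^\/?#]+)/);
  if (!m) return null;
  const kind = KIND_BY_PREFIX[m[1] ?? ""];
  const value = m[2] ?? "";
  return kind && value ? { kind, value } : null;
}
```

Only the `channelId` form can be sent to the Data API as an ID directly; the other forms need a lookup first.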
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- P1-5: Add kpiTargets schema to AI prompt, use AI-generated goals instead of hardcoded multipliers
- P1-6: Extend website channelAnalysis with trackingPixels, snsLinksOnSite, additionalDomains, mainCTA
- P1-7: ClinicProfilePage fetches data from DB by report ID instead of hardcoded VIEW clinic data
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Add normalizeInstagramHandle() utility (Edge + browser) to strip URLs, @ prefixes
- generate-report: normalize handles before saving, persist socialHandles in report JSONB
- enrich-channels: normalize Instagram handle before Apify call (defense in depth)
- useReport: recover socialHandles + channelEnrichment from DB on direct URL access
- ReportPage: skip redundant enrichment when data already exists in DB
Fixes: Instagram enrichment failing due to URL-format handles passed to Apify
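
A sketch of the normalization, assuming the input is a profile URL, an @-prefixed handle, or a bare handle. The helper name `cleanInstagramHandle` is hypothetical and the body is not the repository's normalizeInstagramHandle().

```typescript
function cleanInstagramHandle(input: string): string {
  return input
    .trim()
    // Drop scheme and host if a full profile URL was passed.
    .replace(/^(?:https?:\/\/)?(?:www\.)?instagram\.com\//i, "")
    // Drop trailing path, query string, or fragment.
    .replace(/[\/?#].*$/, "")
    // Drop leading @ characters.
    .replace(/^@+/, "");
}
```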
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Replace mock useReport() with real Supabase API data pipeline
- Add transformReport.ts to map API responses to MarketingReport type
- Add useEnrichment() hook for background channel data enrichment
- Replace Apify YouTube scraper with YouTube Data API v3
- Add mergeEnrichment() for progressive data loading
- Add EmptyState component for graceful empty data handling
- Add socialHandles to generate-report metadata
- Graceful empty data in ClinicSnapshot, YouTube, Instagram, Facebook
- Add Supabase Edge Functions and DB migrations
- Add developer handoff documentation
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>