Commit Graph

7 Commits (bcc0b6ea5e6244d6287bbb2c839963dbcf85f280)

Author SHA1 Message Date
Haewon Kam 2ca9ec0306 fix: YouTube name matching + Facebook domain fallback in channel discovery
YouTube now verifies all candidates and picks best match by channel title.
Facebook tries all candidates with domain-name fallback when Firecrawl returns empty.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-05 12:15:37 +09:00
Haewon Kam 79950925a1 fix: add Authorization header to all Edge Function calls + fix Vision Analysis
- All fetch calls to Supabase Edge Functions now include
  Authorization: Bearer <anon_key> (was missing → 401 errors)
- Fix Firecrawl screenshot API: remove invalid screenshotOptions,
  use "screenshot@fullPage" format (v2 API compatibility)
- Fix screenshot response handling: v2 returns URL not base64,
  now downloads and converts to base64 for Gemini Vision
- Add about page to Vision Analysis capture targets
- Add retry utility, channel error tracking, pipeline resume,
  enrichment retry, EmptyState improvements (Sprint 2-3)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-05 10:08:03 +09:00
Haewon Kam 80c57147e7 feat: Sprint 1 — 7 data quality quick wins
WP-1: YouTube channel ID regex {20,} → {22} (exactly 24 chars)
WP-2: Naver Place category filtering in enrich-channels (성형/피부)
WP-3: Google Maps stores mapsUrl separately from clinicWebsite
WP-4: Naver Blog separates officialBlogUrl from search results
WP-5: 강남언니 rawRating + normalized rating (≤5 → ×2), Firecrawl
      prompt explicitly states "out of 10, NOT out of 5"
WP-6: Perplexity model centralized in _shared/config.ts (env override)
WP-7: Apify Instagram timeout 30s → 45s

Frontend: transformReport uses mapsUrl and officialBlogUrl when available

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-04 23:35:40 +09:00
Haewon Kam 1fb1de8303 fix: keep unverified Instagram handles as candidates for collection
Instagram HEAD requests often fail (rate limiting, blocking) causing
valid handles to be dropped. Now all discovered handles are kept
(verified or not) and Apify attempts collection on all of them.
Apify's own scraper validates existence more reliably than HEAD requests.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-04 01:38:12 +09:00
Haewon Kam df8f84c3b9 fix: YouTube channel ID (UC...) handling + handle-to-channelId resolution
discover-channels: extractHandle('youtube') now detects UC* channel IDs
and returns them without @ prefix (previously @UC... caused verify fail)

verifyHandles: verifyYouTube uses cleanHandle for UC* check, requests
part=id,snippet for richer data

collect-channel-data: if channelId missing but handle present, resolves
via forHandle/forUsername lookup or direct UC* detection before skipping

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-04 01:00:21 +09:00
Haewon Kam f65f0e85b3 fix: robust handle extraction — reject non-platform URLs, fix type safety
discover-channels: new extractHandle() validates each handle belongs to
its platform (rejects hospital-internal URLs like /idtube/view being
treated as YouTube). Extracts handles from full URLs correctly.

collect-channel-data: explicit Record<string,unknown> typing for DB JSON
fields — fixes TypeScript property access on VerifiedChannels from DB.

verifyHandles: fix TikTok double-URL concatenation.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-04 00:03:26 +09:00
Haewon Kam 7557ef774c feat: Pipeline V2 — 3-phase analysis with verified channel discovery
Restructured the entire analysis pipeline from AI-guessing social
handles to deterministic 3-phase discovery + collection + generation.

Phase 1 (discover-channels): 3-source channel discovery
  - Firecrawl scrape: extract social links from HTML
  - Perplexity search: find handles via web search
  - URL regex parsing: deterministic link extraction
  - Handle verification: HEAD requests + YouTube API
  - DB: creates row with verified_channels + scrape_data

Phase 2 (collect-channel-data): 9 parallel data collectors
  - Instagram (Apify), YouTube (Data API v3), Facebook (Apify)
  - 강남언니 (Firecrawl), Naver Blog + Place (Naver API)
  - Google Maps (Apify), Market analysis (Perplexity 4x parallel)
  - DB: stores ALL raw data in channel_data column

Phase 3 (generate-report): AI report from real data
  - Reads channel_data + analysis_data from DB
  - Builds channel summary with real metrics
  - AI generates report using only verified data
  - V1 backwards compatibility preserved (url-based flow)

Supporting changes:
  - DB migration: status, verified_channels, channel_data columns
  - _shared/extractSocialLinks.ts: regex-based social link parser
  - _shared/verifyHandles.ts: multi-platform handle verifier
  - AnalysisLoadingPage: real 3-phase progress + channel panel
  - useReport: channel_data column support + V2 enrichment merge
  - 강남언니 rating: auto-correct 5→10 scale + search fallback
  - KPIDashboard: navigate() instead of <a href>
  - Loading text: 20-30초 → 1-2분

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-03 21:49:13 +09:00