30 Commits (79950925a1b0375f66c2f67cb782d826f52e0616)

Author SHA1 Message Date
Haewon Kam 79950925a1 fix: add Authorization header to all Edge Function calls + fix Vision Analysis
- All fetch calls to Supabase Edge Functions now include
  Authorization: Bearer <anon_key> (was missing → 401 errors)
- Fix Firecrawl screenshot API: remove invalid screenshotOptions,
  use "screenshot@fullPage" format (v2 API compatibility)
- Fix screenshot response handling: v2 returns URL not base64,
  now downloads and converts to base64 for Gemini Vision
- Add about page to Vision Analysis capture targets
- Add retry utility, channel error tracking, pipeline resume,
  enrichment retry, EmptyState improvements (Sprint 2-3)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-05 10:08:03 +09:00
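For commit 79950925a1, a minimal sketch of the header fix, assuming a hypothetical helper named `edgeFunctionInit` and a JSON POST body. The real call sites and variable names are not shown in the log, so everything below is illustrative only.

```typescript
// Hypothetical helper (not from the repository): builds fetch options for a
// Supabase Edge Function call. The JSON POST shape is an assumption.
interface EdgeFunctionInit {
  method: string;
  headers: Record<string, string>;
  body: string;
}

function edgeFunctionInit(anonKey: string, payload: unknown): EdgeFunctionInit {
  return {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      // The commit reports 401 errors when this header was missing.
      Authorization: `Bearer ${anonKey}`,
    },
    body: JSON.stringify(payload),
  };
}

// Usage sketch (URL and payload are placeholders):
// await fetch(functionUrl, edgeFunctionInit(anonKey, { url: "https://example.com" }));
```

Centralizing the header in one helper is a design choice of this sketch; it makes it harder for a single call site to omit the token again.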
Haewon Kam 7fe3ff82c9 feat: DB V3 dual-write — clinics + analysis_runs + channel_snapshots
Phase 2-4 of SaaS schema migration. All Edge Functions now write to
BOTH legacy marketing_reports AND new V3 tables:

discover-channels:
  - UPSERT clinics (url-based dedup)
  - INSERT analysis_runs (status: discovering)

collect-channel-data:
  - INSERT channel_snapshots (one per channel — time-series!)
  - INSERT screenshots (evidence rows)
  - UPDATE analysis_runs (raw_channel_data, vision_analysis)

generate-report:
  - UPDATE analysis_runs (report, status: complete)
  - UPDATE clinics (last_analyzed_at, established_year)

Frontend passes clinicId + runId through all 3 phases.
Legacy marketing_reports still written for backward compatibility.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-05 00:51:11 +09:00
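For commit 7fe3ff82c9, the "UPSERT clinics (url-based dedup)" step could look roughly like the sketch below. It uses an in-memory `Map` in place of the real table, and the URL normalization rule (trim, lowercase, strip trailing slashes), the `Clinic` shape, and the ID format are all assumptions, not taken from the schema.

```typescript
// In-memory stand-in for the clinics table, keyed by normalized URL.
// Field names and the ID format are hypothetical.
interface Clinic {
  id: string;
  url: string;
  name: string;
}

const clinics = new Map<string, Clinic>();

function upsertClinic(url: string, name: string): Clinic {
  // Assumed dedup key: lowercase URL without trailing slashes.
  const key = url.trim().toLowerCase().replace(/\/+$/, "");
  const existing = clinics.get(key);
  if (existing) {
    existing.name = name; // update the existing row instead of inserting a duplicate
    return existing;
  }
  const created: Clinic = { id: `clinic-${clinics.size + 1}`, url: key, name };
  clinics.set(key, created);
  return created;
}
```

Two analyses of the same site then share one clinic row, which is what lets later runs attach new analysis_runs and channel_snapshots to it.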
Haewon Kam 6a3390840d feat: Vision Analysis — screenshot capture + Gemini Vision extraction
WP-V1: Multi-page screenshot capture via Firecrawl
  - Captures 6+ pages: main, doctors, surgery, YouTube, Instagram, 강남언니
  - Runs in parallel within collect-channel-data Phase 2

WP-V2: Gemini Vision analysis per screenshot
  - Page-specific prompts (main page OCR, doctor profiles, channel stats)
  - Extracts: founding year, doctors, certifications, services, social icons,
    brand colors, slogans, YouTube/Instagram stats from screenshots

WP-V3: Vision data pipeline integration
  - channel_data.visionAnalysis: merged structured data
  - channel_data.screenshots[]: evidence for report EvidenceGallery
  - generate-report embeds screenshots as report.screenshots[]
  - buildChannelSummary includes Vision data in AI prompt

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-04 23:59:19 +09:00
Haewon Kam ed37f23f78 feat: extract social links from JS-rendered buttons on clinic website
Added A4 parallel Firecrawl call with actions: [wait 3s, scrape]
to execute JavaScript and extract social button href URLs from
header/footer. This is the most reliable source — most Korean
clinics have Facebook/Instagram/YouTube/Blog icons in their nav.

Results merged as Source 3 (buttonHandles) alongside HTML links,
JSON extraction, and API searches.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-04 23:41:27 +09:00
Haewon Kam 80c57147e7 feat: Sprint 1 — 7 data quality quick wins
WP-1: YouTube channel ID regex {20,} → {22} ("UC" prefix + exactly 22 chars = 24 chars total)
WP-2: Naver Place category filtering in enrich-channels (성형/피부, i.e. plastic surgery/dermatology)
WP-3: Google Maps stores mapsUrl separately from clinicWebsite
WP-4: Naver Blog separates officialBlogUrl from search results
WP-5: 강남언니 rawRating + normalized rating (raw values ≤5 are doubled to
      fit the 10-point scale); Firecrawl prompt explicitly states
      "out of 10, NOT out of 5"
WP-6: Perplexity model centralized in _shared/config.ts (env override)
WP-7: Apify Instagram timeout 30s → 45s

Frontend: transformReport uses mapsUrl and officialBlogUrl when available

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-04 23:35:40 +09:00
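For commit 80c57147e7, WP-1 and WP-5 can be sketched as follows. The exact character class of the real regex is not shown in the log, so `[A-Za-z0-9_-]` is an assumption, as are the function names.

```typescript
// WP-1: "UC" followed by exactly 22 URL-safe characters, 24 characters in total.
// The character class is an assumption; only the {22} quantifier comes from the commit.
const YOUTUBE_CHANNEL_ID = /^UC[A-Za-z0-9_-]{22}$/;

function isYouTubeChannelId(value: string): boolean {
  return YOUTUBE_CHANNEL_ID.test(value);
}

// WP-5: keep the raw rating and derive a 10-point rating.
// Values of 5 or less are treated as 5-point ratings and doubled.
function normalizeRating(rawRating: number): { rawRating: number; rating: number } {
  const rating = rawRating <= 5 ? rawRating * 2 : rawRating;
  return { rawRating, rating };
}
```

Note that the doubling rule cannot tell a genuine 10-point rating of 5 or less from a 5-point rating, which is presumably why the raw value is stored alongside the normalized one.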
Haewon Kam 29c1faf49e fix: correct OtherChannels URLs — Google Maps, Naver Blog, Naver Place
Google Maps: was using gm.website (clinic's own site) → now always
generates maps.google.com/search URL

Naver Blog: was linking to first search result post (random personal
blog) → now links to Naver blog search results page

Naver Place: np.link was the clinic's own website, not Naver Place →
now generates a map.naver.com search URL. Also fixed collect-channel-data
to search with the "성형외과" (plastic surgery clinic) suffix and match by
category (성형/피부, i.e. plastic surgery/dermatology) to avoid matching
same-name dental clinics.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-04 21:25:26 +09:00
Haewon Kam 1fb1de8303 fix: keep unverified Instagram handles as candidates for collection
Instagram HEAD requests often fail (rate limiting, blocking) causing
valid handles to be dropped. Now all discovered handles are kept
(verified or not) and Apify attempts collection on all of them.
Apify's own scraper validates existence more reliably than HEAD requests.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-04 01:38:12 +09:00
Haewon Kam 087f65eec1 fix: revert to single Perplexity query with proven prompt pattern
Split queries performed worse. The proven working pattern is:
- Single query with Korean+English clinic name
- "검색해서 찾아줘. 검색 결과에서 발견된 계정을 모두 알려줘" phrasing
  (roughly: "Search and find them. Tell me every account found in the search results.")
- All channels in one request
- English name in parentheses helps Perplexity find international accounts

Tested: "그랜드성형외과 (Grand Plastic Surgery)" → finds Instagram,
YouTube, Facebook, TikTok, Naver Blog all in one call.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-04 01:36:36 +09:00
Haewon Kam 5157cf446a fix: split Perplexity into 3 focused queries matching research methodology
Single mega-query returns empty results. Split into:
B4a. Instagram + YouTube (most important, focused search)
B4b. Facebook + TikTok + Naver Blog + Kakao
B4c. 강남언니 + review platforms

Each query is short and focused — matches the proven pattern of
2-5 keyword searches that Perplexity handles well.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-04 01:34:45 +09:00
Haewon Kam ac2da7a4ac fix: simplify Perplexity prompt — short system + direct user query
Long system prompt caused sonar-pro to return empty results.
Reverted to sonar model with short, proven prompt pattern that
matches the user's successful manual test.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-04 01:32:54 +09:00
Haewon Kam e64d168d34 feat: Perplexity sonar-pro research agent with structured online presence analysis
Replaced simple "find handles" prompt with comprehensive research agent:
- Model: sonar → sonar-pro (advanced multi-step web search)
- System prompt: full research methodology with 2-3 keyword searches,
  URL fetching, quantitative data extraction
- Output: structured JSON with channels (handles + follower counts +
  subscriber counts) + platforms (강남언니 rating, reviews)
- Research results saved to scrape_data.onlinePresenceResearch for
  downstream use in collect-channel-data and generate-report

Added _shared/researchPrompt.ts with prompt template + builder.
Updated agent documentation in doc/.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-04 01:31:00 +09:00
Haewon Kam c74832d764 feat: Perplexity comprehensive Online Presence analysis + Apify Instagram search
B4 Perplexity: rewrote from narrow "find social accounts" to a broad
"comprehensive Online Presence analysis" (Online Presence 종합 분석) that
finds Instagram, YouTube, Facebook, TikTok, Naver, Kakao, 강남언니, 바비톡
in one query.

B5 Apify Instagram: generates handle candidates from clinic name
(english name, domain, _official, _ps, _clinic variants) and directly
checks each via Apify instagram-profile-scraper. Finds accounts that
web search misses.

Removed redundant B4b (platform presence) — now merged into B4.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-04 01:24:56 +09:00
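For commit c74832d764, the B5 candidate generation might be sketched like this. Only the variant list (English name, domain, `_official`, `_ps`, `_clinic`) comes from the commit; the function name, the way the name and domain are cleaned, and the ordering are assumptions.

```typescript
// Hypothetical sketch: derive Instagram handle candidates from the clinic's
// English name and its website domain.
function instagramHandleCandidates(englishName: string, domain: string): string[] {
  // "Grand Plastic Surgery" -> "grandplasticsurgery"
  const fromName = englishName.toLowerCase().replace(/[^a-z0-9]/g, "");
  // "www.example.com" -> "example"
  const fromDomain = domain.toLowerCase().replace(/^www\./, "").split(".")[0];

  const seeds = [fromName, fromDomain].filter((seed) => seed.length > 0);
  const suffixes = ["", "_official", "_ps", "_clinic"];

  // A Set removes duplicates when name and domain produce the same seed.
  const candidates = new Set<string>();
  for (const seed of seeds) {
    for (const suffix of suffixes) {
      candidates.add(seed + suffix);
    }
  }
  return Array.from(candidates);
}
```

Each candidate would then be checked against the profile scraper; that network call is left out here.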
Haewon Kam 64669888c2 fix: type-safe string handling in extractSocialLinks/mergeSocialLinks
API results may contain null, numbers, or objects instead of strings.
Now coerces all values to strings before processing.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-04 01:17:49 +09:00
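For commit 64669888c2, a minimal sketch of defensive coercion. The commit only says values are coerced to strings; treating null, objects, and non-finite numbers as "no value" is an assumption of this example, and `toCleanString` is a hypothetical name.

```typescript
// Hypothetical helper: turn an untrusted API value into a string that is
// safe to pass to string methods such as match() or replace().
function toCleanString(value: unknown): string {
  if (typeof value === "string") return value.trim();
  if (typeof value === "number" && isFinite(value)) return String(value);
  // null, undefined, objects, arrays, booleans, NaN: treated as "no value".
  return "";
}
```

Returning an empty string rather than throwing lets the link extraction loop skip bad entries without aborting the whole merge.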
Haewon Kam f224d1788c feat: API-first channel discovery — YouTube API + Naver API + Firecrawl Search + Perplexity
Replaced Perplexity-only approach with 5 parallel direct API searches:

B1. YouTube Data API: search?type=channel&q={clinicName} → find channel
B2a. Naver Blog API: search blog.json → find official Naver blog
B2b. Naver Web API: search webkr.json → find Instagram/YouTube/Facebook URLs
B3. Firecrawl Search: web search → extract social URLs from results
B4. Perplexity: supplement — catch what direct APIs missed

All 5 sources run in parallel after Stage A (Firecrawl scrape for clinicName).
Results merged + deduplicated + verified. Perplexity is now a fallback,
not the primary source.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-04 01:15:49 +09:00
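For commit f224d1788c, the "run in parallel, then merge + deduplicate" flow can be sketched as below. The `Source` type, the function names, and the dedup key (lowercase URL without trailing slashes) are assumptions; the real sources are the YouTube, Naver, Firecrawl, and Perplexity calls listed in the commit.

```typescript
// Each source is an async function returning the URLs it found.
type Source = () => Promise<string[]>;

// Merge result lists in order, keeping the first spelling of each URL.
function mergeUnique(results: string[][]): string[] {
  const seen = new Set<string>();
  const merged: string[] = [];
  for (const urls of results) {
    for (const url of urls) {
      const key = url.toLowerCase().replace(/\/+$/, "");
      if (!seen.has(key)) {
        seen.add(key);
        merged.push(url);
      }
    }
  }
  return merged;
}

// Run all sources concurrently; a failing source contributes an empty list
// instead of rejecting the whole discovery step.
async function discoverUrls(sources: Source[]): Promise<string[]> {
  const results = await Promise.all(
    sources.map((run) => run().catch(() => [] as string[])),
  );
  return mergeUnique(results);
}
```

Because results are merged in source order, listing the direct APIs before Perplexity would match the commit's intent that Perplexity acts only as a supplement.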
Haewon Kam 25aece2366 fix: Perplexity prompt rewrite + clinicName fallback via AI
Perplexity prompts changed from "find verified accounts" (returns all
null) to "search and report what you find" (returns actual handles).
Added clinicName resolution: Firecrawl Korean → English → Perplexity
URL-to-name lookup → domain fallback.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-04 01:07:04 +09:00
Haewon Kam 122b1915f0 fix: 2-stage discovery — Firecrawl first for clinicName, then Perplexity
Previously Firecrawl and Perplexity ran in parallel, so Perplexity
received raw URL instead of clinic name → poor search results.

Now:
Stage A: Firecrawl scrape+map (parallel) → extract clinicName from HTML
Stage B: Perplexity searches using extracted clinicName → finds Instagram,
  YouTube, Facebook handles that Firecrawl HTML parsing missed
Stage C: Merge 3 sources + verify all handles

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-04 01:02:30 +09:00
Haewon Kam df8f84c3b9 fix: YouTube channel ID (UC...) handling + handle-to-channelId resolution
discover-channels: extractHandle('youtube') now detects UC* channel IDs
and returns them without the @ prefix (previously @UC... caused verification to fail)

verifyHandles: verifyYouTube uses cleanHandle for UC* check, requests
part=id,snippet for richer data

collect-channel-data: if channelId missing but handle present, resolves
via forHandle/forUsername lookup or direct UC* detection before skipping

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-04 01:00:21 +09:00
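For commit df8f84c3b9, the UC* special case might look like the sketch below. The function name and the character class in the regex are assumptions; the behavior shown (channel IDs returned bare, other values returned with an @ prefix) follows the commit description.

```typescript
// Hypothetical sketch of the YouTube branch of handle extraction.
function extractYouTubeHandle(raw: string): string {
  const value = raw.trim().replace(/^@/, "");
  // Channel IDs ("UC" + 22 characters) are returned without an @ prefix,
  // since "@UC..." is not a valid handle and fails verification.
  if (/^UC[A-Za-z0-9_-]{22}$/.test(value)) return value;
  return `@${value}`;
}
```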
Haewon Kam f65f0e85b3 fix: robust handle extraction — reject non-platform URLs, fix type safety
discover-channels: new extractHandle() validates each handle belongs to
its platform (rejects hospital-internal URLs like /idtube/view being
treated as YouTube). Extracts handles from full URLs correctly.

collect-channel-data: explicit Record<string,unknown> typing for DB JSON
fields — fixes TypeScript property access on VerifiedChannels from DB.

verifyHandles: fix TikTok double-URL concatenation.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-04 00:03:26 +09:00
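For commit f65f0e85b3, the platform check can be illustrated with a hostname allow-list. The host lists and the function name are assumptions; the log only says that non-platform URLs such as `/idtube/view` are now rejected.

```typescript
// Assumed allow-list of hostnames per platform (illustrative, not exhaustive).
const PLATFORM_HOSTS: Record<string, string[]> = {
  youtube: ["youtube.com", "youtu.be"],
  instagram: ["instagram.com"],
  facebook: ["facebook.com", "fb.com"],
  tiktok: ["tiktok.com"],
};

function belongsToPlatform(platform: string, rawUrl: string): boolean {
  let host: string;
  try {
    host = new URL(rawUrl).hostname.toLowerCase().replace(/^www\./, "");
  } catch {
    // Relative paths such as "/idtube/view" are not absolute URLs.
    return false;
  }
  const allowed = PLATFORM_HOSTS[platform] ?? [];
  // Accept the host itself or any subdomain of it (e.g. m.youtube.com).
  return allowed.some((h) => host === h || host.endsWith(`.${h}`));
}
```

Checking the parsed hostname rather than searching the string for "tube" or "youtube" is what prevents hospital-internal paths from being mistaken for channels.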
Haewon Kam 5239ad7382 chore: add deno.json for new Edge Functions
Required by Supabase deploy to resolve @supabase/functions-js import.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-03 22:04:13 +09:00
Haewon Kam 7557ef774c feat: Pipeline V2 — 3-phase analysis with verified channel discovery
Restructured the entire analysis pipeline from AI-guessing social
handles to deterministic 3-phase discovery + collection + generation.

Phase 1 (discover-channels): 3-source channel discovery
  - Firecrawl scrape: extract social links from HTML
  - Perplexity search: find handles via web search
  - URL regex parsing: deterministic link extraction
  - Handle verification: HEAD requests + YouTube API
  - DB: creates row with verified_channels + scrape_data

Phase 2 (collect-channel-data): 9 parallel data collectors
  - Instagram (Apify), YouTube (Data API v3), Facebook (Apify)
  - 강남언니 (Firecrawl), Naver Blog + Place (Naver API)
  - Google Maps (Apify), Market analysis (Perplexity 4x parallel)
  - DB: stores ALL raw data in channel_data column

Phase 3 (generate-report): AI report from real data
  - Reads channel_data + analysis_data from DB
  - Builds channel summary with real metrics
  - AI generates report using only verified data
  - V1 backwards compatibility preserved (url-based flow)

Supporting changes:
  - DB migration: status, verified_channels, channel_data columns
  - _shared/extractSocialLinks.ts: regex-based social link parser
  - _shared/verifyHandles.ts: multi-platform handle verifier
  - AnalysisLoadingPage: real 3-phase progress + channel panel
  - useReport: channel_data column support + V2 enrichment merge
  - 강남언니 rating: auto-correct 5→10 scale + search fallback
  - KPIDashboard: navigate() instead of <a href>
  - Loading text: "20-30초" → "1-2분" (20-30 seconds → 1-2 minutes)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-03 21:49:13 +09:00
Haewon Kam e32b8766de feat: prototype gap closure — enrichment diagnosis + brand extraction + plan assets
Phase 1: Data Pipeline Fixes
- Plan page: connect enrichment data for Asset Collection + YouTube Repurpose
- mergeEnrichment: generate 15-20 data-driven diagnosis items from enrichment
  (YouTube Shorts check, IG engagement, FB activity, 강남언니 ratings, GMaps)
- ClinicSnapshot: fill staffCount, nearestStation, certifications from enrichment

Phase 2: AI + Brand Enhancement
- AI prompt: per-channel diagnosis[] array (5-7 items), established, nameEn, newChannelProposals
- scrape-website: Firecrawl branding extraction (colors, fonts, logo, tagline)
- transformPlan: BrandGuide colors/fonts from scraped branding data
- transformPlan: cross-channel brand consistency analysis (name, phone mismatches)
- transformPlan: channel branding rules from enrichment (YT, IG, FB profiles)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-03 17:09:15 +09:00
Haewon Kam a7d8aeeddc feat: Facebook page data collection via Apify scraper
- enrich-channels: add Facebook Pages Scraper (apify~facebook-pages-scraper)
- Collects: pageName, followers, likes, categories, email, phone, website, intro, rating
- transformReport: merge Facebook data into facebookAudit.pages[] (auto-shows section)
- Frontend: pass facebookHandle through enrichment pipeline
- EnrichChannelsRequest: add facebookHandle parameter

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-03 16:16:37 +09:00
Haewon Kam ad625e08ee fix: enrichment pipeline reliability + loading page gradient + button click area
- generate-report: filter empty strings from AI social handles, add saveError logging
- useReport: 3-level fallback for social handles (report > clinicInfo > scrape_data)
- useEnrichment: always trigger enrichment if clinicName exists (not just IG/YT handles)
- Hero: pointer-events-none on decorative blobs (were blocking button clicks)
- AnalysisLoadingPage: warm gradient on INFINITH logo text (#fff3eb → #e4cfff → #f5f9ff)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-03 16:05:33 +09:00
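For commit ad625e08ee, the 3-level fallback (report > clinicInfo > scrape_data) combined with the empty-string filter could be sketched as follows. The `Handles` shape and the function name are assumptions.

```typescript
// Hypothetical shape for the handles stored at each level.
type Handles = { instagram?: string; youtube?: string; facebook?: string };

// Return the first non-empty handle, checking sources in priority order.
function pickHandle(
  key: keyof Handles,
  report?: Handles,
  clinicInfo?: Handles,
  scrapeData?: Handles,
): string | undefined {
  for (const source of [report, clinicInfo, scrapeData]) {
    const value = source?.[key]?.trim();
    if (value) return value; // empty or whitespace-only strings fall through
  }
  return undefined;
}
```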
Haewon Kam 72ea8f4a2d feat: Naver Search API + multi-account Instagram + button UX fix
- Naver Blog search: collect blog post results for clinic name (total count + top 10 posts)
- Naver Place search: collect place info (name, category, address, telephone)
- Multi-account Instagram: AI prompt requests all IG accounts (domestic/overseas)
- enrich-channels: process multiple IG handles with fallback per handle
- transformReport: merge multiple IG accounts into instagramAudit.accounts[]
- generate-report: socialHandles.instagram now array of handles
- Hero/CTA: transition-all → transition-shadow for instant click response
- Hero/CTA: disabled state when URL is empty (opacity-50 + cursor-not-allowed)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-03 15:34:10 +09:00
Haewon Kam cf482d1bd7 feat: 강남언니 real-time data collection via Firecrawl scraping
- enrich-channels: add 강남언니 scraping module (search + structured JSON extraction)
- Collects: rating/10, reviews, doctors with ratings, procedures, certifications
- transformReport: merge 강남언니 data into clinicSnapshot + otherChannels
- Updates lead doctor info, certifications, and review counts from real data

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-03 14:51:47 +09:00
Haewon Kam e5399486f7 fix: Instagram data collection pipeline — handle normalization + DB persistence
- enrich-channels: Instagram fallback — auto-try _ps, .ps, _clinic suffixes when <100 followers
- enrich-channels: YouTube URL normalization via normalizeYouTubeChannel (handles /c/, /user/, @handle)
- enrich-channels: Google Maps multi-query search for better hit rate
- generate-report: AI-found social handles prioritized over Firecrawl scrape
- generate-report: Added socialMedia field to AI prompt for accurate handle discovery
- normalizeHandles: Added normalizeYouTubeChannel for /c/, /user/, /channel/, @handle URLs

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-03 14:45:00 +09:00
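For commit e5399486f7, a sketch of how the four YouTube URL forms could be told apart. It borrows the name `normalizeYouTubeChannel` from the commit, but the return shape and the regexes are assumptions; the real function's signature is not shown in the log.

```typescript
// Assumed result shape: which kind of reference was found, plus its value.
type YouTubeRef = {
  kind: "channelId" | "custom" | "user" | "handle";
  value: string;
};

function normalizeYouTubeChannel(input: string): YouTubeRef | null {
  // Strip scheme and host so only the path (or a bare "@handle") remains.
  const path = input.trim().replace(/^https?:\/\/(www\.|m\.)?youtube\.com/i, "");
  let m: RegExpMatchArray | null;
  if ((m = path.match(/^\/channel\/(UC[A-Za-z0-9_-]{22})/))) {
    return { kind: "channelId", value: m[1] };
  }
  if ((m = path.match(/^\/c\/([^/?#]+)/))) return { kind: "custom", value: m[1] };
  if ((m = path.match(/^\/user\/([^/?#]+)/))) return { kind: "user", value: m[1] };
  if ((m = path.match(/^\/?@([^/?#]+)/))) return { kind: "handle", value: m[1] };
  return null; // not a recognizable YouTube channel reference
}
```

Keeping the kind alongside the value matters downstream, because each form needs a different lookup to resolve to a channel ID.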
Haewon Kam 200497fa1e feat: P1-5/6/7 — AI KPI targets, website tech audit, dynamic clinic profile
- P1-5: Add kpiTargets schema to AI prompt, use AI-generated goals instead of hardcoded multipliers
- P1-6: Extend website channelAnalysis with trackingPixels, snsLinksOnSite, additionalDomains, mainCTA
- P1-7: ClinicProfilePage fetches data from DB by report ID instead of hardcoded VIEW clinic data

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-02 14:30:03 +09:00
Haewon Kam 7ea9972c7e feat: P1-4 Brand Identity tab — AI-generated brand analysis
- generate-report: add brandIdentity schema to AI prompt (logo, message, tone, positioning, hashtags, channel consistency)
- transformReport: map API brandIdentity array to TransformationProposal component
- ApiReport type: add brandIdentity field

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-02 14:10:28 +09:00
Haewon Kam bd7bc45192 fix: Instagram data collection pipeline — handle normalization + DB persistence
- Add normalizeInstagramHandle() utility (Edge + browser) to strip URLs, @ prefixes
- generate-report: normalize handles before saving, persist socialHandles in report JSONB
- enrich-channels: normalize Instagram handle before Apify call (defense in depth)
- useReport: recover socialHandles + channelEnrichment from DB on direct URL access
- ReportPage: skip redundant enrichment when data already exists in DB

Fixes: Instagram enrichment failing due to URL-format handles passed to Apify

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-02 13:34:54 +09:00
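For commit bd7bc45192, a minimal sketch of the normalization it describes (strip URLs and @ prefixes). It reuses the name `normalizeInstagramHandle` from the commit, but the regexes are assumptions and the real utility may handle more cases.

```typescript
// Reduce a profile URL or "@handle" to the bare handle a scraper expects.
function normalizeInstagramHandle(input: string): string {
  return input
    .trim()
    .replace(/^https?:\/\/(www\.)?instagram\.com\//i, "") // drop scheme and host
    .replace(/^@/, "") // drop a leading @
    .replace(/[/?#].*$/, ""); // drop trailing slash, query string, or fragment
}
```

Running the same function both before saving and again right before the scraper call mirrors the "defense in depth" note in the commit: a handle that is already clean passes through unchanged.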
Haewon Kam 60cd055042 feat: real API integration + YouTube Data API v3 + progressive loading
- Replace mock useReport() with real Supabase API data pipeline
- Add transformReport.ts to map API responses to MarketingReport type
- Add useEnrichment() hook for background channel data enrichment
- Replace Apify YouTube scraper with YouTube Data API v3
- Add mergeEnrichment() for progressive data loading
- Add EmptyState component for graceful empty data handling
- Add socialHandles to generate-report metadata
- Graceful empty data in ClinicSnapshot, YouTube, Instagram, Facebook
- Add Supabase Edge Functions and DB migrations
- Add developer handoff documentation

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-02 10:57:14 +09:00