o2o-infinith-demo

Commit Graph

Author	SHA1	Message	Date
Haewon Kam	cd2463fb2d	fix: clinic_registry CSV import + NaverPlace search improvements - Add naverPlace field to VerifiedChannels (passes placeId from the registry URL) - registryToVerifiedChannels: extract placeId from naver_place_url and include it - Relax NaverPlace matching in collect-channel-data: exact match → contains match, use shortName with the 의원/병원 (clinic/hospital) suffix removed, reinforce the search with a placeId hint - Import CSV data for 73 hospitals into clinic_registry (correct YouTube/Blog/GangnamUnni/NaverPlace URLs) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-10 14:22:59 +09:00
Haewon Kam	36d2f1cf49	feat: archive Firecrawl screenshots to Supabase Storage (permanent URLs) ## Problem The screenshot URLs returned by Firecrawl are GCS signed URLs that expire after 7 days. Every image URL stored in a report breaks after a week (403 Access Denied). ## Solution Add an archiving step to the Vision stage of collect-channel-data. Immediately after capture, upload the base64 data (already in memory) to Supabase Storage for permanent storage. ### Processing flow (after the change) 1. captureAllScreenshots() → returns GCS URL + base64 (existing) 2. [New] archiveTasks: upload base64 → Supabase Storage (parallel) - Path: screenshots/{reportId}/{screenshotId}.png - On success, replace ss.url in place with the permanent Supabase URL - On failure, non-fatal: Vision analysis continues with the GCS URL as fallback 3. runVisionAnalysis(): runs normally because the base64 is still in memory (existing) 4. The permanent URL is used when channelData.screenshots is saved (automatic) - Add archived: true/false flag (for monitoring) ### Cost/performance - No additional API calls (the base64 was already downloaded at capture time) - Upload: ~1-3 s per image (parallel), 5MB limit, PNG/JPEG/WebP allowed - Bucket: public (viewable by anyone with the URL) + upload restricted to the service role ## Migration supabase/migrations/20260407_screenshots_storage.sql - Create screenshots bucket (public, 5MB limit) - RLS: public read / service_role write - delete_old_screenshots() function: cleans up files that are 90 or more days old (can be wired to pg_cron) ## Types Add ScreenshotResult.archived?: boolean field (distinguishes permanent URL from GCS fallback) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-07 09:51:31 +09:00
Haewon Kam	d5f7f24e0a	feat: clinic registry DB + pipeline audit P0 fixes ## Clinic Registry - data/clinic-registry/clinic_registry_working.csv: channel master DB for 91 hospitals - data/clinic-registry/INFINITH_Outbound_List.csv: BD team outbound list (17 columns) - data/clinic-registry/update_csv.py: safe CSV update script (fills only empty fields) - data/clinic-registry/extract_place_ids.py: Naver Place ID extractor - scripts/import-registry.ts: imports CSV → Supabase clinic_registry table - supabase/migrations/20260406_clinic_registry.sql: clinic_registry table schema ## Pipeline P0 Bug Fixes (after a full audit) - fix(collect-channel-data): remove the incorrect 0-10 scale conversion of the 강남언니 rating - Before: a rating ≤ 5 was multiplied by 2 → 4.8/10 was wrongly converted to 9.6/10 - After: the Firecrawl prompt already asks for 0-10 → trust rawValue directly - fix(generate-report): replace the single Perplexity fetch with fetchWithRetry - maxRetries:2, backoffMs:[5000,15000], timeoutMs:90s - Before: a timeout or 429 failed the entire report generation - After: automatic retries recover from transient API errors ## Docs - docs/PIPELINE_IMPROVEMENT_PLAN.md: mark Sprint 0/1/2 complete + add full audit results - docs/REGISTRY_FUNCTIONAL_SPECS.md, DB_SCHEMA_V3.md, and many other planning documents added ## New Components & Features - supabase/functions/generate-content-plan, adjust-strategy: content plan / strategy adjustment - src/components/plan/EditEntryModal, StrategyAdjustmentSection: plan editing UI - supabase/functions/_shared/dataQuality, foundingYearExtractor, urlClassifier: data quality utilities Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-07 09:33:25 +09:00
Haewon Kam	ec991057e6	feat: add API Dashboard + filled icons + pipeline improvements - Add /api-dashboard page with API connection status, env var checker, pipeline flow diagram, and cost estimator - Add 15 new filled SVG icons (Shield, Database, Server, Bolt, Eye, Copy, Check, Cross, Warning, Refresh, Flow, Coin, LinkExternal etc.) - Follow INFINITH design system: no emoji, no line icons, semantic status colors, diagonal shadows, brand gradients, font-serif headings - Improve Vision Analysis with base64 encoding fix - Add SectionErrorBoundary for graceful section-level error handling - Add Google Places API utility (prepared for future migration) - Fix Edge Function auth headers and report generation pipeline Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-06 14:59:31 +09:00
Haewon Kam	2ca9ec0306	fix: YouTube name matching + Facebook domain fallback in channel discovery YouTube now verifies all candidates and picks best match by channel title. Facebook tries all candidates with domain-name fallback when Firecrawl returns empty. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-04-05 12:15:37 +09:00
Haewon Kam	82e9ec6cc0	fix: correct base64 encoding for Vision Analysis screenshots - Previous chunked btoa approach encoded each chunk independently, producing corrupted base64 that Gemini couldn't parse (returned {}) - Now builds complete binary string first, then encodes once with btoa - Added screenshot debug info to channel errors for diagnostics - Confirmed: foundingYear 2004, doctors, gangnamunni data all extracted Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-04-05 11:38:16 +09:00
Haewon Kam	79950925a1	fix: add Authorization header to all Edge Function calls + fix Vision Analysis - All fetch calls to Supabase Edge Functions now include Authorization: Bearer <anon_key> (was missing → 401 errors) - Fix Firecrawl screenshot API: remove invalid screenshotOptions, use "screenshot@fullPage" format (v2 API compatibility) - Fix screenshot response handling: v2 returns URL not base64, now downloads and converts to base64 for Gemini Vision - Add about page to Vision Analysis capture targets - Add retry utility, channel error tracking, pipeline resume, enrichment retry, EmptyState improvements (Sprint 2-3) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-04-05 10:08:03 +09:00
Haewon Kam	6a3390840d	feat: Vision Analysis — screenshot capture + Gemini Vision extraction WP-V1: Multi-page screenshot capture via Firecrawl - Captures 6+ pages: main, doctors, surgery, YouTube, Instagram, 강남언니 - Runs in parallel within collect-channel-data Phase 2 WP-V2: Gemini Vision analysis per screenshot - Page-specific prompts (main page OCR, doctor profiles, channel stats) - Extracts: founding year, doctors, certifications, services, social icons, brand colors, slogans, YouTube/Instagram stats from screenshots WP-V3: Vision data pipeline integration - channel_data.visionAnalysis: merged structured data - channel_data.screenshots[]: evidence for report EvidenceGallery - generate-report embeds screenshots as report.screenshots[] - buildChannelSummary includes Vision data in AI prompt Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-04 23:59:19 +09:00
Haewon Kam	80c57147e7	feat: Sprint 1 (7 data quality quick wins) WP-1: YouTube channel ID regex {20,} → {22} (exactly 24 chars) WP-2: Naver Place category filtering in enrich-channels (plastic surgery/dermatology) WP-3: Google Maps stores mapsUrl separately from clinicWebsite WP-4: Naver Blog separates officialBlogUrl from search results WP-5: 강남언니 rawRating + normalized rating (≤5 → ×2), Firecrawl prompt explicitly states "out of 10, NOT out of 5" WP-6: Perplexity model centralized in _shared/config.ts (env override) WP-7: Apify Instagram timeout 30s → 45s Frontend: transformReport uses mapsUrl and officialBlogUrl when available Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-04 23:35:40 +09:00
Haewon Kam	1fb1de8303	fix: keep unverified Instagram handles as candidates for collection Instagram HEAD requests often fail (rate limiting, blocking) causing valid handles to be dropped. Now all discovered handles are kept (verified or not) and Apify attempts collection on all of them. Apify's own scraper validates existence more reliably than HEAD requests. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-04 01:38:12 +09:00
Haewon Kam	ac2da7a4ac	fix: simplify Perplexity prompt — short system + direct user query Long system prompt caused sonar-pro to return empty results. Reverted to sonar model with short, proven prompt pattern that matches the user's successful manual test. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-04 01:32:54 +09:00
Haewon Kam	e64d168d34	feat: Perplexity sonar-pro research agent with structured online presence analysis Replaced simple "find handles" prompt with comprehensive research agent: - Model: sonar → sonar-pro (advanced multi-step web search) - System prompt: full research methodology with 2-3 keyword searches, URL fetching, quantitative data extraction - Output: structured JSON with channels (handles + follower counts + subscriber counts) + platforms (강남언니 rating, reviews) - Research results saved to scrape_data.onlinePresenceResearch for downstream use in collect-channel-data and generate-report Added _shared/researchPrompt.ts with prompt template + builder. Updated agent documentation in doc/. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-04 01:31:00 +09:00
Haewon Kam	64669888c2	fix: type-safe string handling in extractSocialLinks/mergeSocialLinks API results may contain null, numbers, or objects instead of strings. Now coerces all values to strings before processing. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-04 01:17:49 +09:00
Haewon Kam	df8f84c3b9	fix: YouTube channel ID (UC...) handling + handle-to-channelId resolution discover-channels: extractHandle('youtube') now detects UC* channel IDs and returns them without @ prefix (previously @UC... caused verify fail) verifyHandles: verifyYouTube uses cleanHandle for UC* check, requests part=id,snippet for richer data collect-channel-data: if channelId missing but handle present, resolves via forHandle/forUsername lookup or direct UC* detection before skipping Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-04 01:00:21 +09:00
Haewon Kam	f65f0e85b3	fix: robust handle extraction — reject non-platform URLs, fix type safety discover-channels: new extractHandle() validates each handle belongs to its platform (rejects hospital-internal URLs like /idtube/view being treated as YouTube). Extracts handles from full URLs correctly. collect-channel-data: explicit Record<string,unknown> typing for DB JSON fields — fixes TypeScript property access on VerifiedChannels from DB. verifyHandles: fix TikTok double-URL concatenation. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-04 00:03:26 +09:00
Haewon Kam	7557ef774c	feat: Pipeline V2 - 3-phase analysis with verified channel discovery Restructured the entire analysis pipeline from AI-guessing social handles to deterministic 3-phase discovery + collection + generation. Phase 1 (discover-channels): 3-source channel discovery - Firecrawl scrape: extract social links from HTML - Perplexity search: find handles via web search - URL regex parsing: deterministic link extraction - Handle verification: HEAD requests + YouTube API - DB: creates row with verified_channels + scrape_data Phase 2 (collect-channel-data): 9 parallel data collectors - Instagram (Apify), YouTube (Data API v3), Facebook (Apify) - 강남언니 (Firecrawl), Naver Blog + Place (Naver API) - Google Maps (Apify), Market analysis (Perplexity 4x parallel) - DB: stores ALL raw data in channel_data column Phase 3 (generate-report): AI report from real data - Reads channel_data + analysis_data from DB - Builds channel summary with real metrics - AI generates report using only verified data - V1 backwards compatibility preserved (url-based flow) Supporting changes: - DB migration: status, verified_channels, channel_data columns - _shared/extractSocialLinks.ts: regex-based social link parser - _shared/verifyHandles.ts: multi-platform handle verifier - AnalysisLoadingPage: real 3-phase progress + channel panel - useReport: channel_data column support + V2 enrichment merge - 강남언니 rating: auto-correct 5→10 scale + search fallback - KPIDashboard: navigate() instead of <a href> - Loading text: 20-30 seconds → 1-2 minutes Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-03 21:49:13 +09:00
Haewon Kam	e5399486f7	fix: Instagram data collection pipeline — handle normalization + DB persistence - enrich-channels: Instagram fallback — auto-try _ps, .ps, _clinic suffixes when <100 followers - enrich-channels: YouTube URL normalization via normalizeYouTubeChannel (handles /c/, /user/, @handle) - enrich-channels: Google Maps multi-query search for better hit rate - generate-report: AI-found social handles prioritized over Firecrawl scrape - generate-report: Added socialMedia field to AI prompt for accurate handle discovery - normalizeHandles: Added normalizeYouTubeChannel for /c/, /user/, /channel/, @handle URLs Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-03 14:45:00 +09:00
Haewon Kam	bd7bc45192	fix: Instagram data collection pipeline — handle normalization + DB persistence - Add normalizeInstagramHandle() utility (Edge + browser) to strip URLs, @ prefixes - generate-report: normalize handles before saving, persist socialHandles in report JSONB - enrich-channels: normalize Instagram handle before Apify call (defense in depth) - useReport: recover socialHandles + channelEnrichment from DB on direct URL access - ReportPage: skip redundant enrichment when data already exists in DB Fixes: Instagram enrichment failing due to URL-format handles passed to Apify Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-02 13:34:54 +09:00
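
Commit 82e9ec6cc0 in the table above describes a base64 bug: each chunk was passed to btoa independently, so the concatenated output was not valid base64 for the whole image. The following is a minimal sketch of that failure mode and of the "build the binary string first, encode once" fix. The function names and the chunk size are hypothetical and are not taken from the repository; only the use of btoa comes from the commit message.

```typescript
// Hypothetical helpers illustrating commit 82e9ec6cc0; not the repository's code.

/** Buggy variant: base64-encodes every chunk on its own and joins the pieces. */
function toBase64PerChunk(bytes: Uint8Array, chunkSize: number): string {
  let out = "";
  for (let i = 0; i < bytes.length; i += chunkSize) {
    let binary = "";
    const end = Math.min(i + chunkSize, bytes.length);
    for (let j = i; j < end; j++) {
      binary += String.fromCharCode(bytes[j]);
    }
    // Padding ("=") can appear in the middle of the result whenever
    // chunkSize is not a multiple of 3, which corrupts the stream.
    out += btoa(binary);
  }
  return out;
}

/** Fixed variant: builds the complete binary string, then calls btoa once. */
function toBase64(bytes: Uint8Array): string {
  let binary = "";
  for (let j = 0; j < bytes.length; j++) {
    binary += String.fromCharCode(bytes[j]);
  }
  return btoa(binary);
}
```

With the five bytes 1, 2, 3, 4, 5 and a chunk size of 2, the per-chunk variant emits padding characters in the middle of the string, while the single-pass variant yields one well-formed base64 value. A chunk size that is a multiple of 3 would happen to mask the bug, which may be why it is easy to miss.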

18 Commits (f0bf3bb9b07e1b48042adae55b2d673c39af1936)
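
Commits 80c57147e7 and d5f7f24e0a describe the 강남언니 rating heuristic and its later removal: values of 5 or less were doubled on the assumption that they were on a 0-5 scale, which turned a genuine 4.8/10 into 9.6/10. The sketch below contrasts the two behaviors. The function names are hypothetical; the log does not show the repository's actual code.

```typescript
// Hypothetical functions illustrating the rating change; not the repository's code.

/** Old heuristic (Sprint 1): treat any value <= 5 as a 0-5 rating and double it. */
function normalizeRatingOld(raw: number): number {
  return raw <= 5 ? raw * 2 : raw;
}

/**
 * New behavior (P0 fix): the scrape prompt already asks for a 0-10 value,
 * so the raw value is trusted and returned unchanged.
 */
function normalizeRatingNew(raw: number): number {
  return raw;
}
```

Under the old heuristic a low but legitimate 0-10 score such as 4.8 is indistinguishable from a 0-5 score, so it is inflated to 9.6. The new version returns 4.8 as-is and relies on the prompt to fix the scale upstream.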