EmbedClipFarm

Semantic search over YouTube content — search by meaning, not just keywords.

EmbedClipFarm uses AI to understand what is said in each video, so a search for "economic anxiety" finds relevant moments even if those exact words are never spoken.

What gets analyzed: The primary analysis runs on speech transcripts, either auto-generated captions pulled from YouTube via yt-dlp or, for videos without captions, audio transcribed locally with Whisper (add --whisper when indexing). Optionally, a vision API can annotate keyframes with rich text descriptions for visual search (e.g. "person at a podium"); pass --vision gemini or --vision claude during indexing, or use --clip for fully local CLIP visual embeddings (no API key needed). Scene detection compares video frames one by one to find visual cut points, splitting each video into distinct scenes that can be individually searched and downloaded as clips.
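
EmbedClipFarm's own detector isn't documented here, but as a minimal sketch of the general technique, frame-to-frame histogram comparison with OpenCV looks roughly like this (the 0.5 threshold and file path are illustrative):

import cv2

def detect_cuts(path, threshold=0.5):
    """Yield timestamps (in seconds) where consecutive frames differ sharply."""
    cap = cv2.VideoCapture(path)
    fps = cap.get(cv2.CAP_PROP_FPS) or 30.0  # fall back if FPS is unreadable
    prev_hist = None
    frame_idx = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        # A coarse BGR color histogram summarizes the frame's overall look
        hist = cv2.calcHist([frame], [0, 1, 2], None, [8, 8, 8],
                            [0, 256, 0, 256, 0, 256])
        hist = cv2.normalize(hist, hist).flatten()
        if prev_hist is not None:
            # Low correlation between neighboring histograms suggests a cut
            if cv2.compareHist(prev_hist, hist, cv2.HISTCMP_CORREL) < threshold:
                yield frame_idx / fps
        prev_hist = hist
        frame_idx += 1
    cap.release()

for t in detect_cuts("video.mp4"):
    print(f"cut at {t:.1f}s")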

How it works: Transcripts are split into ~30-second chunks, and each chunk is converted into a 384-dimensional vector embedding (using all-MiniLM-L6-v2) that captures its semantic meaning. When you search, your query is embedded the same way and results are ranked by cosine similarity — how close your query's meaning is to each chunk's meaning.

Scores: strong (>= 0.3) is a high-confidence match; related (0.15–0.3) is topically related; anything below 0.15 is filtered out as noise.
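
To make the scoring concrete, here is a minimal sketch using the same model and thresholds (the chunk texts are placeholders, and the real index stores embeddings via ChromaDB per the install step, so storage details will differ):

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")

chunks = ["placeholder transcript chunk one", "placeholder transcript chunk two"]
chunk_vecs = model.encode(chunks, normalize_embeddings=True)
query_vec = model.encode("economic anxiety", normalize_embeddings=True)

# With L2-normalized vectors, cosine similarity reduces to a dot product
scores = chunk_vecs @ query_vec

for text, score in zip(chunks, scores):
    if score >= 0.3:
        print(f"strong ({score:.2f}): {text}")
    elif score >= 0.15:
        print(f"related ({score:.2f}): {text}")
    # scores below 0.15 are dropped as noise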

Before using this page: Run the CLI to index a YouTube channel, playlist, or individual videos. This creates an index.json file that you load below. Each result embeds the YouTube player at the matching timestamp and gives you a command to download the specific clip.
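
(The player presumably uses YouTube's standard embed start parameter, which takes an offset in seconds, e.g. https://www.youtube.com/embed/VIDEO_ID?start=94.)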

1. Install & index:
pip install youtube-transcript-api chromadb sentence-transformers numpy python-dotenv yt-dlp

Index a single video:
python embedclipfarm.py index "https://youtube.com/watch?v=VIDEO_ID"
Index multiple videos:
python embedclipfarm.py index "URL1" "URL2" "URL3"
Index a channel:
python embedclipfarm.py index "@channelname" --max-videos 20
Index a playlist:
python embedclipfarm.py index "https://youtube.com/playlist?list=PLxxxxx"
Index from a file of URLs:
python embedclipfarm.py index urls.txt
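
The urls.txt format is presumably one URL per line:

https://youtube.com/watch?v=VIDEO_ID_1
https://youtube.com/watch?v=VIDEO_ID_2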

Each command creates a folder (named after the source) containing index.json + transcripts. Check the CLI output for the exact folder path, then load that index.json below.
2. Optional: add visual scene annotation or speech features
Add any of these flags to the index command above:

Annotate keyframes with Gemini vision (set GEMINI_API_KEY in .env)
--vision gemini
Annotate keyframes with Claude vision (set ANTHROPIC_API_KEY in .env)
--vision claude
Local CLIP visual embeddings (no API key needed; see the sketches after this list)
--clip
Whisper transcription for videos without captions
--whisper
Speaker ID to tag who's talking (requires pip install pyannote-audio + HF_TOKEN in .env)
--whisper --speaker-id
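
For reference, here are minimal sketches of the libraries behind the local options above; they illustrate the underlying calls, not EmbedClipFarm's internals, and the model names and file paths are illustrative:

# --clip: sentence-transformers ships CLIP checkpoints that embed images and
# text into a shared space, so frames and text queries can be compared directly
from PIL import Image
from sentence_transformers import SentenceTransformer
clip = SentenceTransformer("clip-ViT-B-32")
frame_vec = clip.encode(Image.open("keyframe.jpg"))
text_vec = clip.encode("person at a podium")

# --whisper: openai-whisper transcribes extracted audio locally
import whisper
result = whisper.load_model("base").transcribe("audio.mp3")
print(result["text"])

# --whisper --speaker-id: pyannote-audio diarizes the audio; HF_TOKEN must
# grant access to the gated model on Hugging Face
import os
from pyannote.audio import Pipeline
pipeline = Pipeline.from_pretrained(
    "pyannote/speaker-diarization-3.1", use_auth_token=os.environ["HF_TOKEN"]
)
for turn, _, speaker in pipeline("audio.wav").itertracks(yield_label=True):
    print(f"{turn.start:.1f}-{turn.end:.1f}s {speaker}")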

Custom prompt for vision annotation:
--vision-prompt "Describe the fashion and clothing visible in this frame"
The default prompt describes the scene in detail for search; custom prompts let you focus on specific aspects (fashion, on-screen text, emotions, etc.).
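
For reference, annotating a keyframe with the Anthropic SDK might look like the following minimal sketch; the model name, prompt, and file handling are illustrative rather than EmbedClipFarm's exact internals:

import base64
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

def annotate_keyframe(jpeg_path, prompt):
    with open(jpeg_path, "rb") as f:
        image_b64 = base64.standard_b64encode(f.read()).decode()
    message = client.messages.create(
        model="claude-3-5-sonnet-20241022",  # illustrative model name
        max_tokens=300,
        messages=[{
            "role": "user",
            "content": [
                {"type": "image",
                 "source": {"type": "base64", "media_type": "image/jpeg",
                            "data": image_b64}},
                {"type": "text", "text": prompt},
            ],
        }],
    )
    return message.content[0].text

print(annotate_keyframe("keyframe.jpg", "Describe the scene in detail for search."))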

Punctuation is added to transcripts by default. Disable with:
--no-punctuate

Example with everything:
python embedclipfarm.py index "@channel" --vision gemini --whisper --speaker-id --max-videos 20
3. Download clips:
Download a specific segment:
python embedclipfarm.py clip VIDEO_ID --start 10 --end 41
Search and download all matching clips:
python embedclipfarm.py search "query" --download ./clips
Each search result below includes a "Copy clip command" button; paste the command into your terminal to download that exact segment.
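
The generated commands presumably wrap yt-dlp's section download. A roughly equivalent raw invocation, using flags from recent yt-dlp releases (EmbedClipFarm's clip command may behave differently):

yt-dlp --download-sections "*10-41" --force-keyframes-at-cuts "https://youtube.com/watch?v=VIDEO_ID"

Here --force-keyframes-at-cuts re-encodes around the cut points so the clip starts at 10s exactly rather than at the nearest keyframe.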

Full docs: github.com/artificialnouveau/embedclipfarm

Load index.json

Drop your index.json file here, or click to browse