Semantic search over YouTube content — search by meaning, not just keywords.
EmbedClipFarm lets you search YouTube videos by meaning, not just keywords. It uses AI to understand what's being said in videos, so searching "economic anxiety" finds relevant moments even if those exact words are never spoken.
What gets analyzed:
The primary analysis is on speech transcripts — auto-generated captions pulled from YouTube via yt-dlp,
or transcribed locally using Whisper AI for videos without captions.
Optionally, a vision API can annotate keyframes with rich text descriptions for visual search
(e.g. "person at a podium") — use --vision gemini or
--vision claude during indexing.
Or use CLIP embeddings with --clip for a fully local alternative (no API key needed).
For videos without captions, add --whisper to transcribe audio locally using Whisper AI.
Scene detection uses frame-by-frame analysis to identify visual cut points
in the video, splitting it into distinct scenes that can be individually searched and downloaded as clips.
How it works: Transcripts are split into ~30-second chunks, and each chunk is converted into a 384-dimensional vector embedding (using all-MiniLM-L6-v2) that captures its semantic meaning. When you search, your query is embedded the same way and results are ranked by cosine similarity — how close your query's meaning is to each chunk's meaning.
Scores: strong (>= 0.3) — high confidence match, related (0.15–0.3) — topically related, below 0.15 is filtered out as noise.
Before using this page: Run the CLI to index a YouTube channel or videos.
This creates an index.json file
that you load below. Each result embeds the YouTube player at the matching timestamp and gives you
a command to download the specific clip.
pip install youtube-transcript-api chromadb sentence-transformers numpy python-dotenv yt-dlppython embedclipfarm.py index "https://youtube.com/watch?v=VIDEO_ID"python embedclipfarm.py index "URL1" "URL2" "URL3"python embedclipfarm.py index "@channelname" --max-videos 20python embedclipfarm.py index "https://youtube.com/playlist?list=PLxxxxx"python embedclipfarm.py index urls.txtindex.json + transcripts. Check the CLI output for the exact folder path, then load the index.json below.
GEMINI_API_KEY in .env)--vision geminiANTHROPIC_API_KEY in .env)--vision claude--clip--whisperpip install pyannote-audio + HF_TOKEN in .env)--whisper --speaker-id--vision-prompt "Describe the fashion and clothing visible in this frame"--no-punctuatepython embedclipfarm.py index "@channel" --vision gemini --whisper --speaker-id --max-videos 20
python embedclipfarm.py clip VIDEO_ID --start 10 --end 41
Download a specific segmentpython embedclipfarm.py search "query" --download ./clips
Search and download all matching clipsFull docs: github.com/artificialnouveau/embedclipfarm
Load index.json
Drop your index.json file here, or click to browse