An AI search infrastructure that pairs CLIP and FAISS for natural-language image search, with an Elasticsearch engine and a Logstash pipeline alongside it. For an operator handling more than 50,000 image assets, it brought search time under 0.1 seconds.
Once image assets cross 50,000, 'finding it by filename' isn't a workable method.
The operator had to manage the assets, while designers and PMs looking for 'something like a blue cityscape at night' spent real time digging through folders. Manual keyword tagging became a growing operational cost as the library expanded.
Search latency was a second problem. A query taking several seconds broke flow every time.
We vectorize the meaning of each image so it can be searched in natural language, and route metadata search through Elasticsearch: a dual-engine setup.
CLIP converts images to vectors, and FAISS performs similarity search over them. When a user types 'blue cityscape at night', the sentence is embedded into the same vector space and the closest images surface.
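The shared-vector-space idea can be sketched in a few lines. This is a minimal illustration, not the production code: the three-dimensional vectors are made-up stand-ins for CLIP embeddings, the file names are hypothetical, and the brute-force loop stands in for a FAISS index (FAISS's `IndexFlatIP` performs the same exact inner-product search, just far faster at scale).

```python
import math

def normalize(vec):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

def top_k(query_vec, image_vecs, k=3):
    """Return the k image ids whose vectors are closest to the query vector."""
    q = normalize(query_vec)
    scored = []
    for image_id, vec in image_vecs.items():
        score = sum(a * b for a, b in zip(q, normalize(vec)))
        scored.append((score, image_id))
    scored.sort(reverse=True)  # highest similarity first
    return [(image_id, score) for score, image_id in scored[:k]]

# Toy embeddings (assumption): real CLIP vectors have hundreds of dimensions.
image_vecs = {
    "night_city.jpg": [0.9, 0.1, 0.2],
    "beach_noon.jpg": [0.1, 0.9, 0.3],
    "forest.jpg": [0.2, 0.3, 0.9],
}

# Stand-in for the embedded text query 'blue cityscape at night'.
query_vec = [0.8, 0.2, 0.1]

results = top_k(query_vec, image_vecs, k=2)
for image_id, score in results:
    print(f"{image_id}: {score:.3f}")
```

Because both the images and the query are normalized, ranking by inner product is ranking by cosine similarity, which is why a text vector can be compared directly against image vectors.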
Metadata-based keyword search runs on Elasticsearch, with Logstash handling the data-loading pipeline. The result is sub-0.1-second search across 50,000+ assets.
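For the metadata path, a keyword search typically becomes an Elasticsearch query body. The sketch below builds one as a plain Python dict using the standard `bool`, `multi_match`, and `term` query clauses. The field names (`title`, `tags`, `file_type`) and the helper name `build_metadata_query` are hypothetical; the actual index mapping is not described here.

```python
import json

def build_metadata_query(keyword, file_type=None, size=20):
    """Build an Elasticsearch query body for keyword search over asset metadata.

    Field names are illustrative assumptions, not the project's real mapping.
    """
    bool_query = {
        "must": [
            # Full-text match across the (hypothetical) text fields.
            {"multi_match": {"query": keyword, "fields": ["title", "tags"]}}
        ]
    }
    if file_type is not None:
        # Exact-value filter; filters do not affect relevance scoring.
        bool_query["filter"] = [{"term": {"file_type": file_type}}]
    return {"size": size, "query": {"bool": bool_query}}

body = build_metadata_query("cityscape", file_type="jpg", size=5)
print(json.dumps(body, indent=2))
```

A body like this would be sent to the index's `_search` endpoint; keeping the exact-match condition in `filter` rather than `must` lets Elasticsearch cache it separately from scoring.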
Images and text share a vector space for similarity search.
Metadata search and data-loading pipeline split into separate concerns.
Both search paths unified behind one API — clients only see one call.
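One way to picture the single-call API is a thin facade over the two engines. The sketch below is an assumption about shape only: the class name `UnifiedSearch`, the stub backends, and the interleave-and-deduplicate merge policy are all hypothetical, since how results from the two paths are actually combined is not specified.

```python
from itertools import zip_longest
from typing import Callable, List

# Each backend takes (query, limit) and returns a ranked list of asset ids.
SearchFn = Callable[[str, int], List[str]]

class UnifiedSearch:
    """Facade that hides the semantic and metadata engines behind one call."""

    def __init__(self, semantic: SearchFn, metadata: SearchFn) -> None:
        self._semantic = semantic
        self._metadata = metadata

    def search(self, query: str, limit: int = 10) -> List[str]:
        semantic_hits = self._semantic(query, limit)
        metadata_hits = self._metadata(query, limit)
        merged: List[str] = []
        seen = set()
        # Interleave the two ranked lists, skipping ids already emitted.
        for pair in zip_longest(semantic_hits, metadata_hits):
            for hit in pair:
                if hit is not None and hit not in seen:
                    seen.add(hit)
                    merged.append(hit)
        return merged[:limit]

# Stub backends standing in for the FAISS and Elasticsearch paths.
def fake_semantic(query: str, limit: int) -> List[str]:
    return ["a.jpg", "b.jpg", "c.jpg"][:limit]

def fake_metadata(query: str, limit: int) -> List[str]:
    return ["b.jpg", "d.jpg"][:limit]

api = UnifiedSearch(fake_semantic, fake_metadata)
hits = api.search("blue cityscape at night", limit=3)
print(hits)
```

The client only ever calls `search`; swapping the merge policy or replacing either backend does not change the calling code.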