Show HN: A vector database with semantic SQL-like filtering

Hi HN! It’s always bothered me that there’s no real equivalent of SQL WHERE for vector content. Filtering is one of the cornerstones of a modern database, but vector DBs only support either top-k sort, which is only useful for fuzzy search, or metadata filtering, which isn’t semantic.

I’ve found myself wanting all the results matching my semantic query, not just k! Aside from data analysis, it’s relevant if you’re trying to do any LLM reasoning: you don’t make good decisions or reach good conclusions by considering a small subset of information.

So, we’ve designed a filtering primitive on top of vectors and assembled a demo on customer reviews from Trustpilot, Yelp, App Store, etc. You can select any brand/restaurant/app, and slice the review data however you want. The filter should find all matching documents, not just the top-k.

Check it out at https://ift.tt/TuJhYrS ! It’s not super optimized yet, and really just an exploration, but hopefully it gets the point across.

FAQ:

- Can I try it on my own data? Sure, shoot me a message at hello [at] emberml [dot] com.
- How does it work? We’ve built a custom vector-based index, and we learn a high-quality decision boundary between relevant and irrelevant vectors at query time. You can think of it as forming a few-shot classifier each time.
- What’s the catch? It’s far slower and less scalable than KNN/ANN right now. But I’d rather solve quality before trying to scale up quantity; tbh I’m not satisfied with vector DB performance even at N=1,000. A hot take, maybe?
- Why don’t you just classify the data beforehand? Unstructured data has too many degrees of freedom, so it’s hard to anticipate every search/filter a priori. Our approach is somewhat analogous to schema-on-read.

https://ift.tt/SOz9nKh September 14, 2023 at 11:34PM
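To make the top-k vs. filter distinction from the post concrete, here is a minimal sketch in plain Python. It is not the project's actual index or classifier, which the post does not describe in detail. It assumes toy 2-D embeddings, a handful of hand-labeled example vectors, and a simple nearest-centroid linear boundary standing in for the "few-shot classifier" idea. All names (`top_k`, `fit_boundary`, `semantic_filter`, the document IDs) are hypothetical.

```python
import math

# Toy 2-D "embeddings" (made up for illustration): axis 0 ~ "slow delivery",
# axis 1 ~ "great food". Real embeddings would have hundreds of dimensions.
docs = {
    "slow_1": [0.90, 0.10],
    "slow_2": [0.80, 0.20],
    "slow_3": [0.85, 0.30],
    "slow_4": [0.70, 0.25],
    "food_1": [0.10, 0.90],
    "food_2": [0.20, 0.80],
}

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cosine(a, b):
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

def top_k(query, docs, k):
    """Classic vector search: rank by similarity, keep only the best k."""
    ranked = sorted(docs, key=lambda d: cosine(query, docs[d]), reverse=True)
    return ranked[:k]

def mean(vectors):
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def fit_boundary(positives, negatives):
    """Fit a linear boundary from a few labeled examples (nearest-centroid).

    Returns (w, b) such that dot(w, x) + b > 0 means "relevant".
    This is an assumed stand-in for a learned decision boundary.
    """
    mu_pos, mu_neg = mean(positives), mean(negatives)
    w = [p - n for p, n in zip(mu_pos, mu_neg)]
    midpoint = [(p + n) / 2 for p, n in zip(mu_pos, mu_neg)]
    b = -dot(w, midpoint)
    return w, b

def semantic_filter(docs, w, b):
    """WHERE-style filter: return every document on the relevant side."""
    return [d for d, v in docs.items() if dot(w, v) + b > 0]

# A few labeled examples built at query time (hypothetical).
positives = [[1.0, 0.0], [0.9, 0.2]]
negatives = [[0.0, 1.0], [0.2, 0.9]]

w, b = fit_boundary(positives, negatives)
top2 = top_k([1.0, 0.0], docs, k=2)
matches = semantic_filter(docs, w, b)

print("top-2: ", top2)
print("filter:", matches)
```

With k=2, the top-k query returns two of the four "slow delivery" documents, while the filter returns all four and excludes both "great food" documents. A centroid boundary is about the simplest choice possible; a production system would presumably need something stronger, plus an index so that not every vector has to be scored.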
Show HN: A vector database with semantic SQL-like filtering https://ift.tt/WiV14M9