Show HN: Open-Source Zero-Shot Image Model Server Enabling Model Feedback https://ift.tt/wXRzblP

Show HN: Open-Source Zero-Shot Image Model Server Enabling Model Feedback https://ift.tt/wXRzblP

Show HN: Open-Source Zero-Shot Image Model Server Enabling Model Feedback Hi everyone! Here is an open source implementation of a decently performant server hosting zero-shot image models (CLIP for image classification, OWL-ViT-ST for object detection), with an extra algorithm to allow users to give the models feedback when they make mistakes! We built a company off this flavor of tech two years ago and have clients who are currently using our commercial API. We are now moving on to other projects but want to make sure our clients still have access to the approaches that they've grown to rely on, so we're open sourcing a simple implementation that they'll be able to use after we've shut down our hosted API! I used to work at a robotics startup. After a while it seemed clear that the biggest limiting factor in our ability to ship new models wasn't innovation on model architecture, it was access to relevant, high-quality training data. Around that time CLIP was released, which got me thinking about the idea of having models with world-knowledge baked in so as to reduce the amount of training data required. A year later when Stable Diffusion dropped, my cofounder Ben Brooks and I took the plunge and founded DirectAI, where we worked on building ways to get performant models without collecting any training data, using the knowledge stored in pretrained models instead. In this implementation, we replace the linear classification head typically used in zero-shot image classifiers with a modified nearest neighbors method that lets you use multiple examples (both positive and negative) per-class to make sure the decision boundary the model is using is more aligned with what you had in mind. Our clients have found it very useful for things from interior design to content moderation to sports analytics, building models that are either too niche to be supported by a traditional cloud-hosted computer vision API or are subtly different from the models that existing cloud APIs host. For example, one of our clients wants to filter out all images containing alcohol. Hive has an API for that, but Hive explicitly allows red solo cups that don't obviously have anything alcoholic in them, whereas our client wanted to filter those out too! Feedback is welcome! There are still bugs in the Gradio frontend / codebase in general, but I have a deadline and need to be working on new stuff at a new job starting Monday so I thought I would just go ahead and get it out there! I've never tried to publish a real open source piece of code before and I must admit I am quite nervous! https://ift.tt/iDROU09 October 20, 2024 at 12:21AM

0 Comments: