Show HN: Open-Source Zero-Shot Image Model Server Enabling Model Feedback

Hi everyone! Here is an open source implementation of a decently performant server hosting zero-shot image models (CLIP for image classification, OWL-ViT-ST for object detection), with an extra algorithm to allow users to give the models feedback when they make mistakes!

We built a company off this flavor of tech two years ago and have clients who are currently using our commercial API. We are now moving on to other projects but want to make sure our clients still have access to the approaches that they've grown to rely on, so we're open sourcing a simple implementation that they'll be able to use after we've shut down our hosted API!

I used to work at a robotics startup. After a while it seemed clear that the biggest limiting factor in our ability to ship new models wasn't innovation on model architecture, it was access to relevant, high-quality training data. Around that time CLIP was released, which got me thinking about the idea of having models with world-knowledge baked in so as to reduce the amount of training data required. A year later when Stable Diffusion dropped, my cofounder Ben Brooks and I took the plunge and founded DirectAI, where we worked on building ways to get performant models without collecting any training data, using the knowledge stored in pretrained models instead.

In this implementation, we replace the linear classification head typically used in zero-shot image classifiers with a modified nearest neighbors method that lets you use multiple examples (both positive and negative) per-class to make sure the decision boundary the model is using is more aligned with what you had in mind.
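
To make the nearest neighbors idea above a bit more concrete, here is a minimal sketch in plain Python. It is not the code from the repo: the names (ClassExamples, classify), the rejection rule, and the toy 2-D vectors are all assumptions for illustration. In a real setup the vectors would be embeddings from a model like CLIP. The sketch assigns a query to the class with the most similar positive example, and skips any class where a negative example is at least as close as the best positive.

```python
import math
from dataclasses import dataclass, field


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


@dataclass
class ClassExamples:
    # Hypothetical container: embeddings a user marked as belonging to the
    # class (positives) or as counterexamples for it (negatives).
    positives: list
    negatives: list = field(default_factory=list)


def classify(query, classes):
    """Return the label whose nearest positive is most similar to the query.

    Assumed rule (not taken from the original project): a class is rejected
    when its nearest negative is at least as similar as its nearest positive.
    Returns None if every class is rejected.
    """
    best_label, best_sim = None, -math.inf
    for label, ex in classes.items():
        pos = max(cosine(query, p) for p in ex.positives)
        neg = max((cosine(query, n) for n in ex.negatives), default=-math.inf)
        if neg >= pos:
            continue  # the query sits closer to a counterexample
        if pos > best_sim:
            best_label, best_sim = label, pos
    return best_label


# Toy stand-ins for image embeddings (made-up numbers, not real CLIP output).
beer_bottle = [1.0, 0.0]
glass_of_water = [0.0, 1.0]
red_solo_cup = [0.55, 0.83]
query = [0.6, 0.8]  # a new image that looks like a red solo cup

# Before feedback: one positive example per class.
before = {
    "alcohol": ClassExamples(positives=[beer_bottle]),
    "safe": ClassExamples(positives=[glass_of_water]),
}

# After feedback: the user marks a red solo cup as "alcohol" and "not safe".
after = {
    "alcohol": ClassExamples(positives=[beer_bottle, red_solo_cup]),
    "safe": ClassExamples(positives=[glass_of_water], negatives=[red_solo_cup]),
}

print(classify(query, before))
print(classify(query, after))
```

With these toy numbers the query lands in "safe" before the feedback and in "alcohol" after it, because the added red solo cup example is now the nearest neighbor and also acts as a counterexample for "safe". No retraining is involved; only the stored examples change.
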
Our clients have found this approach very useful for everything from interior design to content moderation to sports analytics, building models that are either too niche to be supported by a traditional cloud-hosted computer vision API or are subtly different from the models that existing cloud APIs host. For example, one of our clients wants to filter out all images containing alcohol. Hive has an API for that, but Hive explicitly allows red solo cups that don't obviously have anything alcoholic in them, whereas our client wanted to filter those out too!

Feedback is welcome! There are still bugs in the Gradio frontend and in the codebase in general, but I have a deadline and need to be working on new stuff at a new job starting Monday, so I thought I would just go ahead and get it out there! I've never tried to publish a real piece of open source code before and I must admit I am quite nervous!

https://ift.tt/iDROU09

October 20, 2024 at 12:21AM
Show HN: Open-Source Zero-Shot Image Model Server Enabling Model Feedback https://ift.tt/wXRzblP