Show HN: Build an open-source computer vision model in seconds using text

Hello HN! I want to share something a few friends and I have been working on for a while now: Zeroshot, a web tool that builds image classifiers using text-image models and autolabeling.

What does this mean in practice? You can put together an image classifier in about 30 seconds that’s faster and more accurate than CLIP, but that you can deploy yourself however you’d like. It’s open source, licensed for commercial use, and doesn’t require you to pay anyone per API call.

Here's a 2-minute video that shows it off: https://www.youtube.com/watch?v=S4R1gtmM-Lo

How/why does it work? We believe that with the rise of foundation vision models, computer vision will fundamentally change. These powerful models will let any dev “compile” a model ahead of time with a subset of the foundation model’s characteristics, using only text and a web tool. The days of teams of MLEs building complex models and pipelines are ending.

Zeroshot works by using two powerful pre-trained models, CLIP and DINOv2, together. The web app lets users quickly create training sets via text search. Using pre-cached DINOv2 features, we generate a simple linear model that can be trained and deployed without any fine-tuning. Since you can see what’s going into your training set, you can tune your prompts to get the type of performance or detail you want.

CLIP Small -- Size: 335 MB, Latency: 35 ms
CLIP Large -- Size: 891 MB, Latency: 276 ms
Zeroshot -- Size: 85 MB, Latency: 20 ms

What’s next? We wanna see how people use or would use the tool before deciding what to do next. On the list: clients for iOS and Node.js, speeding up GPU inference times via TensorRT, offering larger Zeroshot models for better accuracy, easier results refining, support for bringing your own data lake, and model refinement using GPT-V. We’ve got plenty of ideas.

https://ift.tt/ZjDKcmk
November 29, 2023 at 01:48AM
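For readers who want a feel for the pipeline described in the post (text search to build a training set, then a linear model on cached features), here is a rough sketch. It is not Zeroshot's actual code. Random arrays stand in for real CLIP and DINOv2 embeddings, and the function names (`autolabel_by_text`, `train_linear_probe`, `predict`), the dimensions, and the hyperparameters are all made up for illustration.

```python
import numpy as np

def autolabel_by_text(clip_image_emb, clip_text_emb, top_k):
    """For each class prompt, pick the top_k most similar images (cosine similarity)."""
    img = clip_image_emb / np.linalg.norm(clip_image_emb, axis=1, keepdims=True)
    txt = clip_text_emb / np.linalg.norm(clip_text_emb, axis=1, keepdims=True)
    sims = txt @ img.T  # shape: (num_classes, num_images)
    indices, labels = [], []
    for class_id, row in enumerate(sims):
        best = np.argsort(row)[::-1][:top_k]  # highest similarity first
        indices.extend(best.tolist())
        labels.extend([class_id] * len(best))
    return np.array(indices), np.array(labels)

def train_linear_probe(features, labels, num_classes, lr=0.1, steps=200):
    """Fit a softmax-regression layer on frozen features with plain gradient descent."""
    n, d = features.shape
    W = np.zeros((d, num_classes))
    b = np.zeros(num_classes)
    onehot = np.eye(num_classes)[labels]
    for _ in range(steps):
        logits = features @ W + b
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        probs = np.exp(logits)
        probs /= probs.sum(axis=1, keepdims=True)
        grad = (probs - onehot) / n  # gradient of mean cross-entropy w.r.t. logits
        W -= lr * (features.T @ grad)
        b -= lr * grad.sum(axis=0)
    return W, b

def predict(features, W, b):
    """Return the highest-scoring class index for each feature row."""
    return np.argmax(features @ W + b, axis=1)

# Synthetic stand-ins (ASSUMPTION: a real setup would load cached embeddings instead).
rng = np.random.default_rng(0)
num_images, num_classes = 200, 2
true_class = rng.integers(0, num_classes, size=num_images)

clip_centers = rng.normal(size=(num_classes, 16))
clip_image_emb = clip_centers[true_class] + 0.1 * rng.normal(size=(num_images, 16))
clip_text_emb = clip_centers + 0.1 * rng.normal(size=(num_classes, 16))  # one prompt per class

dino_centers = rng.normal(size=(num_classes, 32))
dino_features = dino_centers[true_class] + 0.1 * rng.normal(size=(num_images, 32))

# 1) Text search selects a small training set; 2) a linear model is fit on cached features.
idx, labels = autolabel_by_text(clip_image_emb, clip_text_emb, top_k=20)
W, b = train_linear_probe(dino_features[idx], labels, num_classes)

accuracy = float((predict(dino_features, W, b) == true_class).mean())
print(f"accuracy on all synthetic images: {accuracy:.2f}")
```

The deployed artifact in this sketch is just `W` and `b` applied on top of a frozen feature extractor. That would be consistent with the small model size quoted in the post, though the post does not spell out the details.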