Show HN: Datalake for Computer Vision Projects

Buddhika, Kelum, and Chong Han here. We are building a self-hosted data infrastructure platform for computer vision. Our community page is https://ift.tt/EJ3OFdw

In the past, we worked in various capacities on a couple of high-scale computer vision projects in retail, farming, and hospitals. These projects involved 2D object detection, 3D object tracking, and more advanced 3D perception. Like other CV engineers, we observed a common factor across these projects: you need a large volume of high-quality data to build a production-deployable CV system.

Our biggest challenge was not having a robust data infrastructure to handle large volumes of data. Our S3 buckets were like a data swamp: we had a huge amount of raw images and video sitting in storage buckets with no tracking. Instead of working on CV, we had to develop tools for data operations. We understand that many of us have our own custom scripts and stitch them together to make things happen in the CV pipeline. However, that approach is brittle and cumbersome to maintain.

We wanted to build a system on top of cloud buckets such as S3 that stores all file indexes, labels, metadata attributes, inference outputs, model training outcomes, and literally anything else related to machine learning/computer vision. This makes it possible for us to search for anything and consume it efficiently. It behaves as a DataLake (by the way, "DataLake" is an overused term). All other downstream processes in the CV pipeline can access data more efficiently via the SDK and can also write data back to the Lake (e.g., training/inference outcomes).

We made it self-hosted to address data security and privacy concerns. Since data is fundamental to AI, we believe that companies and organizations should have complete control over it. Currently, we support AWS, GCP, and Azure cloud buckets; soon, we will support local storage. We ship this as a Docker container so you can just install it on any VM or local server.
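As a rough illustration of the catalog idea described above (an index over bucket objects that can be searched by label, with model outputs written back next to human labels), here is a minimal sketch in Python using the standard-library sqlite3 module. The table layout, the function names (create_catalog, register_object, add_label, find_by_label), and the example URIs are assumptions made for illustration only; they are not the project's actual schema or SDK.

```python
import json
import sqlite3


def create_catalog() -> sqlite3.Connection:
    """Create an in-memory catalog. Hypothetical schema, for illustration only."""
    conn = sqlite3.connect(":memory:")
    # One row per object stored in a cloud bucket.
    conn.execute(
        "CREATE TABLE objects ("
        " uri TEXT PRIMARY KEY,"        # e.g. s3://bucket/key
        " media_type TEXT NOT NULL,"    # 'image' or 'video'
        " metadata TEXT NOT NULL)"      # JSON-encoded attributes
    )
    # Labels from annotators and from model inference share one table.
    conn.execute(
        "CREATE TABLE labels ("
        " uri TEXT NOT NULL REFERENCES objects(uri),"
        " label TEXT NOT NULL,"
        " source TEXT NOT NULL)"        # 'human' or 'model'
    )
    conn.execute("CREATE INDEX idx_labels_label ON labels(label)")
    return conn


def register_object(conn, uri, media_type, metadata):
    """Index a bucket object without copying the file itself."""
    conn.execute(
        "INSERT INTO objects VALUES (?, ?, ?)",
        (uri, media_type, json.dumps(metadata)),
    )


def add_label(conn, uri, label, source):
    """Attach a label; source='model' covers inference results written back."""
    conn.execute("INSERT INTO labels VALUES (?, ?, ?)", (uri, label, source))


def find_by_label(conn, label, source=None):
    """Return the sorted URIs of objects carrying the given label."""
    query = (
        "SELECT DISTINCT o.uri FROM objects o"
        " JOIN labels l ON l.uri = o.uri WHERE l.label = ?"
    )
    params = [label]
    if source is not None:
        query += " AND l.source = ?"
        params.append(source)
    query += " ORDER BY o.uri"
    return [row[0] for row in conn.execute(query, params)]
```

A training job could call find_by_label(conn, "cow", source="human") to collect annotated files, and an inference job could call add_label(conn, uri, "cow", "model") to record its predictions. A real system would also need a persistent database, bucket scanning, and access control, none of which are shown here.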
The installation script handles all the configuration automatically. The Python SDK and documentation are available but not perfect yet. We’ve launched this under the MIT and Elastic licenses so any developer can use it. Our goal is not to charge individual developers. We make money by charging a license fee for things like multiple users, multiple buckets, scalability with K8s, and support.

Give it a try: https://ift.tt/EJ3OFdw

Let us know what you think.

July 22, 2023 at 04:45AM
Show HN: Datalake for Computer Vision Projects https://ift.tt/U3XkDmH