Show HN: Datalake for Computer Vision Projects https://ift.tt/U3XkDmH

Buddhika, Kelum, and Chong Han here. We are building a self-hosted data infrastructure platform for computer vision. Our community page is https://ift.tt/EJ3OFdw

In the past, we worked in various capacities on a couple of high-scale computer vision projects in retail, farming, and hospitals. These projects involved 2D object sections, 3D object tracking, and more advanced 3D perception. Like other CV engineers, we observed a common factor across these projects: one needs a large volume of high-quality data to build a production-deployable CV system.

Our biggest challenge was not having a robust data infrastructure to handle large volumes of data. Our S3 buckets were like a data swamp; we had so many raw images and videos sitting in storage buckets without any tracking. Instead of working on CV, we had to develop tools for data operations. We understand that many of us have our own custom scripts and stitch them together to make things happen in the CV pipeline. However, that approach is brittle and cumbersome to maintain.

We wanted to build a system on top of cloud buckets such as S3 that stores all file indexes, labels, metadata attributes, inference outputs, model training outcomes, and literally anything else related to machine learning/computer vision. This makes it possible for us to search for anything and consume it efficiently. It behaves as a DataLake (by the way, "DataLake" is an overused term). All other downstream processes in the CV pipeline can access data more efficiently via the SDK and can also write data back to the Lake (e.g., training/inference outcomes).

The reason we made it self-hosted is to address data security and privacy concerns. Since data is fundamental to AI, we believe that companies and organizations should have complete control over it.

Currently, we support AWS, GCP, and Azure cloud buckets; soon, we will support local storage. We ship this as a Docker container, so you can install it on any VM or local server. The installation script does all the configuration automatically. The Python SDK and documentation are available but not perfect yet.

We’ve launched this under MIT and Elastic licenses so any developer can use it. Our goal is not to charge individual developers. We make money by charging a license fee for things like multiple users, multiple buckets, scalability with Kubernetes, and support.

Give it a try: https://ift.tt/EJ3OFdw

Let us know what you think.

July 22, 2023 at 04:45AM
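
To make the indexing idea in the post concrete (a searchable catalog of file indexes, labels, and metadata kept alongside raw bucket storage), here is a minimal sketch. It is not the platform's SDK or implementation: `create_catalog`, `register`, and `search` are hypothetical names, the bucket and object keys are made up, and an in-memory SQLite table stands in for whatever store a real system would use.

```python
import json
import sqlite3


def create_catalog() -> sqlite3.Connection:
    """Create an in-memory catalog (hypothetical stand-in for a persistent index)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE objects (
            bucket   TEXT NOT NULL,   -- storage bucket name
            key      TEXT NOT NULL,   -- object key inside the bucket
            label    TEXT,            -- optional annotation label
            metadata TEXT NOT NULL,   -- free-form attributes stored as JSON
            PRIMARY KEY (bucket, key)
        )
        """
    )
    return conn


def register(conn, bucket, key, label=None, **metadata):
    """Record one stored object together with its label and metadata attributes."""
    conn.execute(
        "INSERT OR REPLACE INTO objects VALUES (?, ?, ?, ?)",
        (bucket, key, label, json.dumps(metadata)),
    )


def search(conn, label=None, **filters):
    """Return (bucket, key) pairs matching the label and every metadata filter."""
    if label is None:
        rows = conn.execute(
            "SELECT bucket, key, metadata FROM objects ORDER BY bucket, key"
        )
    else:
        rows = conn.execute(
            "SELECT bucket, key, metadata FROM objects "
            "WHERE label = ? ORDER BY bucket, key",
            (label,),
        )
    matches = []
    for bucket, key, raw in rows:
        meta = json.loads(raw)
        # Keep the row only if every requested attribute matches exactly.
        if all(meta.get(name) == value for name, value in filters.items()):
            matches.append((bucket, key))
    return matches


# Made-up example data: three images in one bucket.
catalog = create_catalog()
register(catalog, "raw-images", "farm/cam1/0001.jpg", label="cow", camera="cam1", split="train")
register(catalog, "raw-images", "farm/cam1/0002.jpg", label="cow", camera="cam1", split="val")
register(catalog, "raw-images", "retail/cam7/0001.jpg", label="shelf", camera="cam7", split="train")

print(search(catalog, label="cow", split="train"))
# → [('raw-images', 'farm/cam1/0001.jpg')]
```

The point of the sketch is only that the raw files stay in the bucket while a small, queryable index sits beside them, so downstream steps can ask for "all training images labeled cow" instead of scanning the bucket.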