Show HN: Version code, models, & datasets together in GitHub

Hi HN! We just launched a GitHub integration that scales your Git repos to handle 100 terabytes of files in a single repo. XetData enables data scientists and machine learning engineers to version code, models, and datasets together. Most teams have glued together clunky workflows from S3, DVC, Git, Git LFS, and other tools, which makes true reproducibility difficult: https://ift.tt/zWavudA

Instead, we embrace and extend Git, so end users don’t need to learn a new tool or a new set of commands. Our implementation is similar to Git LFS: we take over the .gitattributes file, push pointers to the large files to GitHub, and push the raw large files to us.

We have a few distinct features that we’re proud of that improve the user experience:

- Our XetData bot comments on your pull requests with links to useful dataset views and model diffs. We’re working on rendering these inside GitHub itself using browser extensions.
- Git LFS and similar tools only implement file-level deduplication. We created a new technique called block-based deduplication (published at the CIDR’23 conference) specifically for data and ML workflows. The ML lifecycle consists of lots of iterative changes, and our technique saves both storage and the time spent downloading and uploading those changes.
- You can mount large repos on your local machine using git-xet mount for exploratory work. Individual files are streamed in just in time, behind the scenes, as they are needed. We open sourced our implementation of mount, and it was well received here on HN: https://ift.tt/j0D5OcL
- To give more users access to your data, just add them to your GitHub repo.

This is a beta product and we would love all of your feedback. You can find all the instructions to try it out here: https://ift.tt/Subn4s0

While we’re in beta, our product is completely free to use. You can join our Slack or use our GitHub issue tracker:
- Slack: https://ift.tt/sK3pwBv
- GitHub: https://ift.tt/gCW0fUD

November 16, 2023 at 11:56PM
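For readers unfamiliar with the Git LFS approach the post compares itself to: a tracked file is committed to Git as a small text pointer, while the real bytes go to a separate store. The sketch below builds and parses a pointer in the format from the public Git LFS specification (v1). XetData’s own pointer format is not described in the post and may differ. In Git LFS, a .gitattributes rule such as `*.parquet filter=lfs diff=lfs merge=lfs -text` routes matching files through this filter; the `*.parquet` pattern is only an example, and the function names below are hypothetical.

```python
import hashlib

# Pointer format from the public Git LFS spec v1; XetData's format may differ.
LFS_SPEC_URL = "https://git-lfs.github.com/spec/v1"


def make_pointer(content: bytes) -> str:
    """Return the small text pointer Git would store instead of `content`."""
    oid = hashlib.sha256(content).hexdigest()
    # "version" comes first, remaining keys are sorted; every line ends in "\n".
    return f"version {LFS_SPEC_URL}\noid sha256:{oid}\nsize {len(content)}\n"


def parse_pointer(pointer: str) -> dict[str, str]:
    """Split each 'key value' line of a pointer into a dict entry."""
    fields = {}
    for line in pointer.splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    return fields


if __name__ == "__main__":
    blob = b"pretend this is a multi-gigabyte model checkpoint"
    pointer = make_pointer(blob)
    print(pointer, end="")
    print(parse_pointer(pointer)["size"])
```

The pointer is what lands in the GitHub repo, so clones stay small; the content hash in `oid` is what lets the client fetch the matching bytes from the separate store later.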
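The post does not spell out how block-based deduplication works (the details are in the CIDR’23 paper). As a rough illustration of the general idea only, here is a generic content-defined chunking sketch using a gear rolling hash, a well-known technique swapped in for explanation and not necessarily XetData’s algorithm. Files are cut wherever the hash of the most recent bytes matches a bit pattern, and each distinct chunk is stored once under its SHA-256. All names and sizes (`chunk`, `ChunkStore`, `MIN_CHUNK`, `MAX_CHUNK`, the mask) are hypothetical.

```python
import hashlib
import random

# Illustrative parameters (assumptions, not XetData's actual settings).
MIN_CHUNK = 1024                # never cut a chunk shorter than this
MAX_CHUNK = 16 * 1024           # always cut once a chunk reaches this size
BOUNDARY_MASK = 0xFFF << 52     # 12 high bits must be zero: ~1 cut per 4 KiB
U64 = (1 << 64) - 1

# Gear table: one fixed pseudo-random 64-bit value per byte value.
_table_rng = random.Random(42)
GEAR = [_table_rng.getrandbits(64) for _ in range(256)]


def chunk(data: bytes) -> list[bytes]:
    """Split data at content-defined boundaries using a gear rolling hash."""
    chunks, start, h = [], 0, 0
    for i, byte in enumerate(data):
        # Shifting left ages bytes out, so h depends only on the last 64 bytes.
        h = ((h << 1) + GEAR[byte]) & U64
        length = i - start + 1
        if (length >= MIN_CHUNK and (h & BOUNDARY_MASK) == 0) or length >= MAX_CHUNK:
            chunks.append(data[start:i + 1])
            start, h = i + 1, 0
    if start < len(data):
        chunks.append(data[start:])
    return chunks


class ChunkStore:
    """Content-addressed store: each distinct chunk is kept once, keyed by SHA-256."""

    def __init__(self) -> None:
        self.chunks: dict[str, bytes] = {}

    def put(self, data: bytes) -> list[str]:
        ids = []
        for c in chunk(data):
            cid = hashlib.sha256(c).hexdigest()
            self.chunks.setdefault(cid, c)  # identical chunks are stored only once
            ids.append(cid)
        return ids

    def get(self, ids: list[str]) -> bytes:
        return b"".join(self.chunks[cid] for cid in ids)

    def stored_bytes(self) -> int:
        return sum(len(c) for c in self.chunks.values())


if __name__ == "__main__":
    v1 = random.Random(1).randbytes(200_000)          # stand-in for a dataset file
    v2 = v1[:100_000] + b"new rows" + v1[100_000:]    # small edit in the middle
    store = ChunkStore()
    ids1, ids2 = store.put(v1), store.put(v2)
    assert store.get(ids1) == v1 and store.get(ids2) == v2
    print("file-level dedup would store:", len(v1) + len(v2))
    print("chunk-level dedup stores:   ", store.stored_bytes())
```

Because boundaries depend on the bytes themselves rather than on fixed offsets, the chunks after the edit realign with those of the earlier version, so the second version adds only the few chunks around the change. Whole-file hashing, as in file-level deduplication, would store both versions in full.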
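The mount feature is described only at a high level: file contents are fetched when they are read, not when the repo is mounted. The read-through cache below is a hypothetical sketch of that just-in-time behavior. `LazyRepoView`, the path-to-object-id manifest, and the `fetch` callback are invented for illustration; the actual git-xet mount presents files as a local directory, not as a Python API.

```python
from typing import Callable


class LazyRepoView:
    """Hypothetical read-through view of a repo: file bytes are fetched on first read."""

    def __init__(self, manifest: dict[str, str], fetch: Callable[[str], bytes]) -> None:
        self._manifest = manifest      # path -> object id, known up front (cheap to list)
        self._fetch = fetch            # downloads one object by id (a remote call in practice)
        self._cache: dict[str, bytes] = {}

    def listdir(self) -> list[str]:
        # Listing uses only the manifest; no file contents are downloaded.
        return sorted(self._manifest)

    def read(self, path: str) -> bytes:
        oid = self._manifest[path]
        if oid not in self._cache:     # first access: stream the object in
            self._cache[oid] = self._fetch(oid)
        return self._cache[oid]        # later reads are served locally


if __name__ == "__main__":
    remote = {"oid-a": b"train split", "oid-b": b"test split"}
    fetched: list[str] = []

    def fake_fetch(oid: str) -> bytes:
        fetched.append(oid)
        return remote[oid]

    view = LazyRepoView({"data/train.csv": "oid-a", "data/test.csv": "oid-b"}, fake_fetch)
    print(view.listdir())              # ['data/test.csv', 'data/train.csv']; nothing fetched
    view.read("data/train.csv")
    view.read("data/train.csv")
    print(fetched)                     # ['oid-a']: only the file that was read, fetched once
```

The design choice this illustrates is that browsing a very large repo costs only metadata, and bandwidth is spent on exactly the files an exploratory session touches.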