Show HN: Version code, models, & datasets together in GitHub

Hi HN! We just launched a GitHub integration that scales your Git repos to handle 100 terabytes of files in a single repo. XetData enables data scientists and machine learning engineers to version code, models, and datasets together. Most teams have glued together clunky workflows using S3, DVC, Git, Git LFS, and other tools, which makes true reproducibility difficult: https://ift.tt/zWavudA

We instead embrace and extend Git so end users don't need to learn a new tool and a new set of commands. Our implementation is similar to Git LFS: we take over the .gitattributes file, push pointers to large files to GitHub, and push the raw, large files to us.

We have a few distinct features that we're proud of and that improve the user experience:

- Our XetData bot comments on your pull requests with links to useful dataset views and model diffs. We're working on rendering these inside GitHub itself using browser extensions.
- Git LFS and similar tools only implement file-level deduplication. We created a new technique called block-based deduplication (published at the CIDR '23 conference) specifically for data and ML workflows. The ML lifecycle consists of many iterative changes, and our technique saves storage as well as time spent downloading and uploading changes.
- You can mount large repos on your local machine using git-xet mount for exploratory work. Individual files are streamed in just in time, behind the scenes, as they are needed. We open sourced our implementation of mount and it was well received here on HN: https://ift.tt/j0D5OcL
- To give more users access to your data, just add them to your GitHub repo.

This is a beta product and we would love all of your feedback. You can find all instructions to try this out here: https://ift.tt/Subn4s0

While we're in beta, our product is completely free to use. We have a Slack you can join and a GitHub issue tracker:

- Slack: https://ift.tt/sK3pwBv
- GitHub: https://ift.tt/gCW0fUD

November 16, 2023 at 11:56PM
Show HN: Version code, models, & datasets together in GitHub https://ift.tt/M5cFW6s
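
To make the difference between file-level and block-based deduplication in the post concrete, here is a minimal Python sketch. It is not XetData's implementation and does not follow the CIDR '23 paper. It assumes a generic content-defined chunking scheme (a simple gear-style rolling hash) with SHA-256 block addressing, and the names `chunk`, `BlockStore`, and `GEAR` as well as all size parameters are hypothetical, chosen only for illustration.

```python
from __future__ import annotations

import hashlib
import random

# Hypothetical lookup table: one pseudo-random 32-bit value per byte value.
GEAR = [random.Random(0).getrandbits(32) for _ in range(256)] if False else None
_rng = random.Random(0)
GEAR = [_rng.getrandbits(32) for _ in range(256)]


def chunk(data: bytes, mask: int = 0x3F, min_size: int = 16,
          max_size: int = 1024) -> list[bytes]:
    """Split data where a rolling hash of recent bytes hits a fixed pattern.

    Cut points depend on content, not on absolute offsets, so an insertion
    only disturbs the blocks near the edit.
    """
    blocks = []
    start = 0
    h = 0
    for i, byte in enumerate(data):
        h = ((h << 1) + GEAR[byte]) & 0xFFFFFFFF
        size = i - start + 1
        # Cut when the low bits of the hash are zero (about 1 in 64 positions),
        # or force a cut once the block reaches max_size.
        if (size >= min_size and (h & mask) == 0) or size >= max_size:
            blocks.append(data[start:i + 1])
            start = i + 1
            h = 0
    if start < len(data):
        blocks.append(data[start:])
    return blocks


class BlockStore:
    """Toy content-addressed store: each unique block is kept once."""

    def __init__(self) -> None:
        self.blocks: dict[str, bytes] = {}

    def add(self, data: bytes) -> tuple[list[str], int]:
        """Store data; return (ordered block hashes, bytes newly stored)."""
        manifest = []
        new_bytes = 0
        for block in chunk(data):
            digest = hashlib.sha256(block).hexdigest()
            if digest not in self.blocks:
                self.blocks[digest] = block
                new_bytes += len(block)
            manifest.append(digest)
        return manifest, new_bytes

    def read(self, manifest: list[str]) -> bytes:
        """Reassemble a file from its ordered list of block hashes."""
        return b"".join(self.blocks[d] for d in manifest)


# Two versions of a "dataset": the second has a small edit in the middle.
data_rng = random.Random(42)
v1 = bytes(data_rng.getrandbits(8) for _ in range(20_000))
v2 = v1[:10_000] + b"a small edit" + v1[10_000:]

store = BlockStore()
manifest1, new1 = store.add(v1)
manifest2, new2 = store.add(v2)
print(f"v1: {len(v1)} bytes, newly stored: {new1}")
print(f"v2: {len(v2)} bytes, newly stored: {new2}")
```

With file-level deduplication the two versions hash differently, so the second version would be stored and transferred in full. In this sketch only the blocks around the edit are new, which is the kind of saving the post attributes to block-based deduplication across iterative changes.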