Show HN: I built an open-source data pipeline tool in Go

Every data pipeline job I had to tackle required quite a few components to set up:

- One tool to ingest data
- Another one to transform it
- If you wanted to run Python, an orchestrator
- If you needed to check the data, a data quality tool

Besides being hard and time-consuming to set up, this is also pretty high-maintenance. I had to do a lot of infra work, and while these were billable hours for me, I didn’t enjoy the work at all. For some parts of it there were nice solutions like dbt, but in the end they didn’t work for an end-to-end workflow.

That’s why I decided to build an end-to-end solution that could take care of data ingestion, transformation, and Python stuff. Initially it was just for our own usage, but in the end we thought this could be a useful tool for everyone.

At its core, Bruin is a data framework that consists of a CLI application written in Golang and a VS Code extension that supports it with a local UI.

Bruin supports quite a few things:

- Data ingestion using ingestr ( https://ift.tt/Pry29kV )
- Data transformation in SQL & Python, similar to dbt
- Python env management using uv
- Built-in data quality checks
- Secrets management
- Query validation & SQL parsing
- Built-in templates for common scenarios, e.g. Shopify, Notion, Gorgias, BigQuery, etc.

This means that you can write end-to-end pipelines within the same framework and get them running with a single command. You can run it on your own computer, on GitHub Actions, or on an EC2 instance somewhere. Using the templates, you can also have ready-to-go pipelines with modeled data for your data warehouse in seconds.

It includes an open-source VS Code extension as well, which allows working with the data pipelines locally in a more visual way. The resulting changes are all in code, which means everything is version-controlled regardless; the extension just adds a nice layer on top.
Bruin can run SQL, Python, and data ingestion workflows, as well as quality checks. For Python stuff, we use the awesome (and it really is awesome!) uv under the hood to install dependencies in an isolated environment and to install and manage Python versions locally, all in a cross-platform way. To upload data to the data warehouse, it uses dlt under the hood. It also uses Arrow’s memory-mapped files to easily share the data between processes before uploading it to the destination.

We went with Golang because of its speed and strong concurrency primitives, but more importantly, I knew Go better than the other languages available to me and I enjoy writing Go, so there’s also that.

We had a small pool of beta testers for quite some time, and I am really excited to launch Bruin CLI to the rest of the world and get feedback from you all. I know it is not common to build data tooling in Go, but I believe we found ourselves in a nice spot in terms of features, speed, and stability.

https://ift.tt/FJgtSBH

I’d love to hear your feedback and learn more about how we can make data pipelines easier and better to work with. Looking forward to your thoughts!

Best,
Burak

https://ift.tt/FJgtSBH December 17, 2024 at 10:10PM