Show HN: I built an open-source data pipeline tool in Go

Every data pipeline job I had to tackle required quite a few components to set up:

- One tool to ingest data
- Another one to transform it
- If you wanted to run Python, an orchestrator
- If you needed to check the data, a data quality tool

Besides being hard and time-consuming to set up, this kind of stack is also pretty high-maintenance. I had to do a lot of infra work, and while those were billable hours for me, I didn’t enjoy the work at all. For some parts there were nice solutions like dbt, but none of them worked for an end-to-end workflow.

That’s why I decided to build an end-to-end solution that could take care of data ingestion, transformation, and the Python stuff. Initially it was just for our own usage, but in the end we thought it could be a useful tool for everyone.

At its core, Bruin is a data framework that consists of a CLI application written in Golang and a VS Code extension that supports it with a local UI.

Bruin supports quite a few things:

- Data ingestion using ingestr (https://ift.tt/Pry29kV)
- Data transformation in SQL & Python, similar to dbt
- Python environment management using uv
- Built-in data quality checks
- Secrets management
- Query validation & SQL parsing
- Built-in templates for common scenarios, e.g. Shopify, Notion, Gorgias, BigQuery, etc.

This means that you can write end-to-end pipelines within the same framework and get them running with a single command. You can run it on your own computer, on GitHub Actions, or on an EC2 instance somewhere. Using the templates, you can also have ready-to-go pipelines with modeled data for your data warehouse in seconds.

It includes an open-source VS Code extension as well, which allows working with the data pipelines locally in a more visual way. The resulting changes are all in code, so everything is version-controlled regardless; the extension just adds a nice layer on top.
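To make the "built-in data quality checks" item more concrete, here is a minimal Go sketch of two column-level checks, not-null and uniqueness. Everything in it (`Row`, `Check`, `notNull`, `unique`, `runChecks`) is hypothetical and only illustrates the idea; it is not Bruin’s API or implementation, and the check names simply mirror dbt’s generic `not_null` and `unique` tests.

```go
package main

import (
	"fmt"
	"strings"
)

// Row is a hypothetical in-memory table row keyed by column name.
// A nil (or missing) value stands in for SQL NULL.
type Row map[string]any

// Check is a hypothetical column-level quality check: Run inspects the
// rows and returns an error describing the first violation, or nil.
type Check struct {
	Name   string
	Column string
	Run    func(column string, rows []Row) error
}

// notNull fails if any row has a NULL (nil or missing) value in the column.
func notNull(column string, rows []Row) error {
	for i, r := range rows {
		if v, ok := r[column]; !ok || v == nil {
			return fmt.Errorf("row %d: %s is null", i, column)
		}
	}
	return nil
}

// unique fails if any non-null value appears more than once in the column.
// It assumes values are comparable scalars (ints, strings, ...), since they
// are used as map keys.
func unique(column string, rows []Row) error {
	seen := make(map[any]int) // value -> index of first row holding it
	for i, r := range rows {
		v := r[column]
		if v == nil {
			continue // NULLs are not counted as duplicates
		}
		if first, dup := seen[v]; dup {
			return fmt.Errorf("rows %d and %d: duplicate %s %v", first, i, column, v)
		}
		seen[v] = i
	}
	return nil
}

// runChecks runs every check and collects all failures instead of
// stopping at the first one, so a single run reports every problem.
func runChecks(rows []Row, checks []Check) []string {
	var failures []string
	for _, c := range checks {
		if err := c.Run(c.Column, rows); err != nil {
			failures = append(failures, fmt.Sprintf("%s(%s): %v", c.Name, c.Column, err))
		}
	}
	return failures
}

func main() {
	rows := []Row{
		{"id": 1, "email": "a@example.com"},
		{"id": 2, "email": nil},
		{"id": 2, "email": "c@example.com"},
	}
	checks := []Check{
		{Name: "not_null", Column: "id", Run: notNull},
		{Name: "unique", Column: "id", Run: unique},
		{Name: "not_null", Column: "email", Run: notNull},
	}
	failures := runChecks(rows, checks)
	if len(failures) == 0 {
		fmt.Println("all checks passed")
		return
	}
	fmt.Println(strings.Join(failures, "\n"))
}
```

The sketch checks rows in memory only to stay self-contained. A tool that runs against a data warehouse would more plausibly express such checks as SQL queries (for example, counting NULLs or duplicate keys) and fail the pipeline run when a query reports violations.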
Bruin can run SQL, Python, and data ingestion workflows, as well as quality checks. For Python, we use the awesome (and it really is awesome!) uv under the hood to install dependencies in an isolated environment and to install and manage Python versions locally, all in a cross-platform way. To manage data uploads to the data warehouse, it uses dlt under the hood, and it uses Arrow’s memory-mapped files to share the data between processes before uploading it to the destination.

We went with Golang because of its speed and strong concurrency primitives, but more importantly, I knew Go better than the other languages available to me and I enjoy writing Go, so there’s also that.

We had a small pool of beta testers for quite some time, and I am really excited to launch Bruin CLI to the rest of the world and get feedback from you all. I know it is not common to build data tooling in Go, but I believe we found ourselves in a nice spot in terms of features, speed, and stability.

https://ift.tt/FJgtSBH

I’d love to hear your feedback and learn more about how we can make data pipelines easier and better to work with. Looking forward to your thoughts!

Best,
Burak

December 17, 2024 at 10:10PM