Show HN: MyDuck Server – Supercharge MySQL and Postgres Analytics with DuckDB Hello HN! We're excited to announce MyDuck Server, an open-source project that seamlessly integrates the analytical power of DuckDB with your existing MySQL & Postgres databases. *Backstory* Currently, there are no fully satisfactory open-source OLAP solutions for MySQL & Postgres. In the MySQL ecosystem, HeatWave offers close integration, but it's a proprietary, commercial product from Oracle. The Postgres community has seen promising DuckDB-based extensions emerge, including the official pg_duckdb. However, extensions can introduce isolation concerns in mission-critical environments. Consequently, many organizations resort to setting up complex and costly data movement pipelines using tools like Debezium, Flink, or other commercial solutions to replicate data from MySQL & Postgres to OLAP systems (e.g., Snowflake, BigQuery, ClickHouse) or Lakehouses (e.g., Delta Lake + Spark). This approach introduces significant operational overhead and expense. Another emerging strategy is the zero-ETL approach, increasingly advocated by cloud providers. This model simplifies data integration by allowing the cloud provider to manage ETL pipelines, while necessitating reliance on specific cloud ecosystems and services. *Key features* MyDuck Server offers a real-time analytical replica that leverages DuckDB's native columnar storage and processing capabilities. It operates as a separate server, ensuring isolation and minimizing impact on your primary database. Key features include: - Easy Zero-ETL: Built-in real-time replication from MySQL & Postgres with no complex pipelines to manage. It feels like a standard MySQL replica or Postgres standby. With the Docker image, passing a connection string is enough. - MySQL & Postgres Protocol Compatibility: We take this seriously and are working to make this project integrate well with the existing ecosystem around MySQL & Postgres. Currently, it is already possible to connect to MyDuck with standard MySQL & PostgreSQL clients in many programming languages. - HTAP Support: A standard database proxy can be deployed in front of a MySQL/Postgres primary and its MyDuck replica to route write operations to the primary and read operations to the replica. It just works. - DuckDB SQL & Columnar I/O over Postgres Protocol: It's unnecessary to restrict ourselves to MySQL/Postgres's SQL expressiveness and row-oriented data transfer. The Postgres port accepts all DuckDB-valid SQL queries, and you can retrieve query results in columnar format via `COPY (SELECT ...) TO STDOUT (FORMAT parquet/arrow)`. - Standalone Mode: It does not need to be run as a replica. It can also act as a primary server that brings DuckDB into server mode and accepts updates from multiple connections, breaking DuckDB's single-process limitation. *Relevant Previous HN Threads* - pg_duckdb [1] ( https://ift.tt/OU0MVcj ) is the official Postgres extension for DuckDB. It uses DuckDB as an execution engine to accelerate analytical queries by scanning Postgres tables directly. - pg_mooncake [2] ( https://ift.tt/B1IPy3L ) is a Postgres extension that adds columnstore tables for PG. It uses pg_duckdb under the hood but stores data in Lakehouse formats (Iceberg & Delta Lake). - BemiDB [3] ( https://ift.tt/XDGj8SY ) is also a DuckDB-based Postgres replica. Unlike us, they focus on storing data in Lakehouse format. We believe MyDuck Server offers a compelling solution for those seeking high-performance analytics on their MySQL & Postgres data without the complexities and costs of traditional approaches. We're eager to hear your feedback and answer any questions you might have. Let me know what you think! [0] https://ift.tt/CFiX6ID [1] https://ift.tt/kYFWeso [2] https://ift.tt/LeksUa3 [3] https://ift.tt/FYRKaGS https://ift.tt/CFiX6ID November 28, 2024 at 08:20PM
0 Comments: