Show HN: SQLFrame – I ran PySpark without Spark on a SQL database

Recently I open-sourced SQLFrame, a DataFrame library that implements the PySpark DataFrame API but removes Spark as a dependency. It does this by generating the corresponding SQL for the DataFrame operations using SQLGlot. Since the output is SQL, the PySpark DataFrame API can now be used directly against other databases without the Spark middleman.

I built this because of two common problems I have faced in my career:

1. I prefer to write complex pipelines in PySpark, but they can be hard to read for SQL-proficient co-workers, so I find myself in a tradeoff between maintainability and accessibility.
2. I really enjoy using the PySpark DataFrame API, but not every project requires Spark, so I'm not always able to use the DataFrame library I am most proficient in.

The library currently focuses on transformation pipelines (reading from and writing to tables) and data analysis as key use cases. It offers some ability to read from files directly, but they must be small; this can be improved over time if there is demand for it.

SQLFrame currently supports BigQuery, DuckDB, and Postgres, with ClickHouse, Redshift, Snowflake, Spark, and Trino in development or planned. You can use the "Standalone" session to test running against any engine supported by SQLGlot, but there could be issues with more advanced functions that will be resolved once officially supported by SQLFrame.

Blog post for more details: https://ift.tt/U0pTQN4...

Would love to answer any questions or hear any feedback you may have!

https://ift.tt/gEPLHpo

May 21, 2024 at 06:39AM
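
To make the core idea in the post concrete (DataFrame-style calls that are turned into SQL and then run on an ordinary database), here is a minimal toy sketch. It is not SQLFrame's implementation or API and does not use SQLGlot: the `ToyFrame` class, its methods, and the `employees` table are all hypothetical names invented for illustration, and SQLite from the Python standard library stands in for the target database.

```python
import sqlite3
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyFrame:
    """Hypothetical DataFrame-like query builder (illustration only, not SQLFrame)."""

    table: str
    columns: tuple = ("*",)
    filters: tuple = ()

    def select(self, *cols):
        # Return a new frame with the chosen columns; nothing runs yet.
        return ToyFrame(self.table, tuple(cols), self.filters)

    def where(self, condition):
        # Accumulate filter conditions; they are combined with AND in sql().
        return ToyFrame(self.table, self.columns, self.filters + (condition,))

    def sql(self):
        # Render the accumulated operations as a single SQL string.
        query = f"SELECT {', '.join(self.columns)} FROM {self.table}"
        if self.filters:
            query += " WHERE " + " AND ".join(f"({c})" for c in self.filters)
        return query

    def collect(self, conn):
        # Execute the generated SQL on the given database connection.
        return conn.execute(self.sql()).fetchall()


# Assumed sample data in an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?)",
    [("Ana", 41), ("Bo", 29), ("Cy", 35)],
)

df = ToyFrame("employees").select("name", "age").where("age > 30")
generated_sql = df.sql()
rows = df.collect(conn)

print(generated_sql)
print(rows)
```

A real library has to handle much more than this (expressions, joins, window functions, dialect differences), which is the part the post says SQLGlot is used for. The sketch also passes conditions as raw strings, so it is only suitable for trusted input.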
Show HN: SQLFrame – I ran PySpark without Spark on a SQL database https://ift.tt/4liObUq