Snowpark & Scala — Snowflake experiment
I have been very excited about #Snowpark since Snowflake announced it, and I finally got the chance to try it out for myself.
It’s fun to write Scala and see it execute directly on #snowflake without a Spark cluster and without moving any data around.
My experiment was to rewrite SQL in Scala to see if “less is more”.
I took some SQL from my dbt training and tried to do a similar job in Snowpark.
SQL from dbt training
with orders as (
    select * from {{ ref('stg_orders') }}
),
payments as (
    select * from {{ ref('stg_payments') }}
),
order_payments as (
    select
        order_id,
        sum(case when status = 'success' then amount end) as amount
    from payments
    group by 1
),
final as (
    select
        orders.order_id,
        orders.customer_id,
        orders.order_date,
        coalesce(order_payments.amount, 0) as amount
    from orders
    left join order_payments using (order_id)
)
select * from final
And the same job as Scala code:
import com.snowflake.snowpark.functions.col

// getSnowflakeSession() is my own helper that builds a Snowpark Session
val snowflake_session = getSnowflakeSession()
val stg_orders_df = snowflake_session.table("STG_ORDERS")
val stg_payments_df = snowflake_session.table("STG_PAYMENTS").filter(col("STATUS") === "success")
val payments_sum_amounts_df = stg_payments_df.groupBy("order_id").sum(col("amount"))
val fct_orders_df = stg_orders_df.naturalJoin(payments_sum_amounts_df, "left")
val fct_orders_scala_df = fct_orders_df.rename("AMOUNT", col("SUM(AMOUNT)"))
fct_orders_scala_df.write.mode("overwrite").saveAsTable("analytics.dbt_jverdier.fct_orders_scala")
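For completeness, here is a minimal sketch of what a getSnowflakeSession() helper could look like. The connection values below are placeholder assumptions, not from my actual setup; Session.builder with a config map is Snowpark’s standard entry point:

```scala
import com.snowflake.snowpark.Session

// Hypothetical helper (assumption): builds a Snowpark Session from a config map.
// All connection values are placeholders — substitute your own account details.
def getSnowflakeSession(): Session = {
  val configs = Map(
    "URL" -> "https://<account_identifier>.snowflakecomputing.com",
    "USER" -> "<user>",
    "PASSWORD" -> "<password>", // consider key-pair auth instead of a password
    "ROLE" -> "<role>",
    "WAREHOUSE" -> "<warehouse>",
    "DB" -> "<database>",
    "SCHEMA" -> "<schema>"
  )
  Session.builder.configs(configs).create
}
```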
My first impression
So my first impressions are:
- Snowpark and Scala are easier to read (for me at least)
- Less is more (7 statements in Scala vs. 23 lines of SQL)
- Adding reusability and error handling would make the code even more robust
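To illustrate that last point, here is a hedged sketch of how the pipeline could be made reusable with basic error handling. The function names and the Try-based wrapping are my assumptions for illustration, not something from my original experiment:

```scala
import scala.util.Try
import com.snowflake.snowpark.{DataFrame, Session}
import com.snowflake.snowpark.functions.col

// Hypothetical refactoring (assumption): the transformation is a pure,
// reusable function of DataFrames, separate from reading and writing.
def buildFctOrders(orders: DataFrame, payments: DataFrame): DataFrame = {
  val successPayments = payments.filter(col("STATUS") === "success")
  val paymentSums = successPayments.groupBy("order_id").sum(col("amount"))
  orders
    .naturalJoin(paymentSums, "left")
    .rename("AMOUNT", col("SUM(AMOUNT)"))
}

// Side effects (reads and the write) wrapped in Try for error handling.
def loadFctOrders(session: Session): Try[Unit] = Try {
  val fctOrders =
    buildFctOrders(session.table("STG_ORDERS"), session.table("STG_PAYMENTS"))
  fctOrders.write.mode("overwrite").saveAsTable("analytics.dbt_jverdier.fct_orders_scala")
}
```

A caller can then pattern-match on the returned Try to log a failure or retry, instead of letting an exception kill the whole job.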
I am sure there are many possible improvements to the above, but it has been a few years since I did a lot of programming :-)
Try it out, and if you have any comments, let me know :-)
Jan
#snowpark Snowflake-inc. Tomáš Sobotík