Optimising BigQuery — Part 1
Optimising the Storage Billing Models
I’ve been working with Google Cloud Platform (GCP) for several years, and BigQuery still amazes me with what it can do. Snowflake is also a great platform, but if you’re already using GCP, BigQuery is a fantastic choice because it integrates so seamlessly. Now, if you’re looking for a true multi-cloud solution, Snowflake might be the way to go — but that’s a conversation for another time.
Optimising BigQuery for cost and performance isn’t easy and can take a lot of time. It’s a continuous effort that data architects, domain architects, or data engineers need to tackle regularly.
With many companies moving toward a data mesh strategy, it gets even trickier. Each domain is now in charge of its own optimization, and they might not have the deep expertise needed to handle BigQuery’s complexities.
The upside? A lot of companies using BigQuery don’t actually need to dive deep into optimization. If your data volumes and query needs are small, and each query costs around €0.0001, spending time and effort on storage and cost optimization might not be worth it.
So, let’s get started!
The different storage billing models
BigQuery offers two storage billing models: logical storage and physical storage. Both provide the same performance; they differ only in how storage is priced. Also, keep in mind that the first 10GiB of storage each month is free.
Active physical storage in BigQuery costs roughly twice as much per gigabyte as active logical storage, which is the default option (the exact ratio varies by region). So why consider the physical billing model? The key difference lies in data compression: logical storage is billed on the uncompressed size of your data, while physical storage is billed on the compressed size. This means that even though physical storage has a higher per-gigabyte price, you may pay less overall if your data compresses well.
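To make the trade-off concrete, here is a minimal sketch of the break-even point. The €20/TB and €44/TB prices are illustrative assumptions (they match the worked example later in this post); check the current pricing for your region before relying on them.

```python
# Sketch: when does physical (compressed) billing beat logical (uncompressed)?
# Illustrative prices (assumptions -- check your region's actual pricing):
LOGICAL_EUR_PER_TB = 20.0   # active logical storage, per TB per month
PHYSICAL_EUR_PER_TB = 44.0  # active physical storage, per TB per month

def monthly_cost_logical(logical_tb: float) -> float:
    """Logical billing: you pay for the uncompressed size."""
    return logical_tb * LOGICAL_EUR_PER_TB

def monthly_cost_physical(logical_tb: float, compression_rate: float) -> float:
    """Physical billing: you pay for the compressed size.
    compression_rate is the fraction of bytes saved (0.9 -> data shrinks to 10%)."""
    return logical_tb * (1 - compression_rate) * PHYSICAL_EUR_PER_TB

# Break-even: physical wins once compression removes more than
# 1 - logical_rate / physical_rate of the bytes (about 55% with these prices).
break_even = 1 - LOGICAL_EUR_PER_TB / PHYSICAL_EUR_PER_TB
print(f"physical pays off above ~{break_even:.0%} compression")
```

With these assumed prices, any table that compresses by more than about 55% is a candidate for physical billing (before accounting for time-travel and fail-safe bytes).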
To make things more interesting, features like time-travel and fail-safe are included at no extra cost in the logical storage model. If you choose physical storage, however, the bytes these features consume are not included and are billed on top, so you need to add them to your final price.
So, when should you consider using the Physical Storage billing model in BigQuery?
First off, take a look at the size of your tables. If they’re under 1TB, it’s probably not worth the effort to switch. For many companies, 1TB is a substantial amount of data, so changing billing models might not make a big difference.
The key factor here is the compression rate of your data. If your data compresses well — meaning you have a high compression rate — it might be beneficial to switch to the physical storage model.
Let’s walk through a simplified example without using partitioning (though you definitely should — I’ll cover that in “Optimizing BigQuery — Part 2”).
Imagine you have 100TB of active data. Using the default logical storage billing model, this would cost you €2,000 per month. In reality, much of this data will move to long-term storage over time, which cuts its price by 50%: long-term pricing applies automatically to any table or partition that hasn’t been modified for 90 consecutive days, and merely querying the data does not move it back to active logical storage. That’s another excellent reason to use partitioning!
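The effect of the long-term discount on a logical-billing bill can be sketched as a blended rate; the €20/TB price is the same illustrative assumption used throughout this example:

```python
# Sketch: blended logical-storage cost once part of the data has aged into
# long-term storage (billed at half the active rate). Prices are assumptions.
ACTIVE_EUR_PER_TB = 20.0
LONG_TERM_EUR_PER_TB = ACTIVE_EUR_PER_TB / 2  # 50% discount after 90 days

def blended_logical_cost(total_tb: float, long_term_fraction: float) -> float:
    """Monthly cost when long_term_fraction of the data is in long-term storage."""
    active_tb = total_tb * (1 - long_term_fraction)
    long_term_tb = total_tb * long_term_fraction
    return active_tb * ACTIVE_EUR_PER_TB + long_term_tb * LONG_TERM_EUR_PER_TB

print(blended_logical_cost(100, 0.0))  # all active: 2000.0
print(blended_logical_cost(100, 1.0))  # all long-term: 1000.0
```

Well-partitioned tables tend to push old partitions into the long-term tier, shifting the blend toward the cheaper rate.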
Now, consider the physical storage billing model. The same 100TB of active data would cost €4,400 per month. However, with a compression rate of 90%, you’d only pay for 10TB, which amounts to €440 per month. You’d need to add the costs for time-travel and fail-safe features, but we will ignore those in this simplified example.
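The arithmetic behind this comparison, using the €20/TB logical and €44/TB physical rates implied by the figures in the example:

```python
# Reproducing the worked example (illustrative prices implied by the text)
logical_monthly = 100 * 20.0                 # 100 TB billed on logical storage
physical_monthly = 100 * (1 - 0.90) * 44.0   # only 10 TB billed after 90% compression
annual_savings = (logical_monthly - physical_monthly) * 12
print(logical_monthly, physical_monthly, annual_savings)  # 2000.0 440.0 18720.0
```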
In this scenario, by choosing the right storage billing model, you could save €18,720 per year. Saving that amount annually is probably worth your time🙂
Enjoy analysing your BigQuery storage costs, and if you made it this far in the post, drop me an invite on LinkedIn to connect :-) https://www.linkedin.com/in/janverdier/
I have also developed a BigQuery Optimiser application; it’s available for free, so ping me if you’re interested.