Bodo Glossary

Snowflake Warehouse Sizes

Snowflake offers different warehouse sizes to accommodate various data processing workloads.

Snowflake's available warehouse sizes range from Extra Small (XS) to 4X-Large (4XL), with varying levels of compute resources (CPU, memory, and storage). The size of a Snowflake virtual warehouse influences both cost and performance, so choosing the right warehouse size is critical.

Right-sizing Snowflake warehouses

Efficient warehouse usage is critical to achieving optimal performance and cost-effectiveness. Here are the key metrics to monitor:

Remote and Local Spillage: Local spillage occurs when your warehouse's RAM is filled, and data or intermediate results need to be written to SSD. Remote spillage happens when the SSD fills up, and data must be spilled to Blob/S3 storage, which is significantly slower. Monitoring for spillage is critical, as it indicates that the warehouse may not have enough resources to handle the workload efficiently. Remediation involves sizing up the warehouse or optimizing pruning ratios to reduce the amount of data processed.

Warehouse Load: You can monitor warehouse load using the _{snowflake.account_usage.warehouse_load_history} table. It shows how many queries are running on a warehouse on average per hour. A very low load suggests that the warehouse is underutilized and may not be cost-effective. An excessively high load (e.g., over 8 for a single cluster or more for multi-cluster) can lead to queueing and performance degradation. Right-sizing involves adjusting the warehouse to match the workload's concurrency requirements. Consolidate and move more work to an underutilized warehouse, or distribute workloads to multiple warehouses if the load is too high.

Data Volume: Data volume, related to the _{scanned_partitions} metric in _{query_history}, is a crucial factor in determining the appropriate warehouse size. Scanning too much data can lead to local and remote spillage and increased query execution times. Scanning too little data can result in underutilized resources and wasted capacity.
It's important to choose a warehouse size that aligns with the volume of data being processed. For example, if a query only scans a small amount of data, it may be better suited for a smaller warehouse to avoid resource wastage.

Which Snowflake warehouse size should I start with?

Starting with a smaller warehouse size like XS (X-Small) and then gradually increasing it based on observed utilization is a cost-effective and prudent approach. This strategy allows you to optimize performance while minimizing unnecessary expenses.

1. Start with XS Warehouse: Begin with an X-Small warehouse size for your initial workload or set of queries.
The XS warehouse provides basic compute resources and is often cost-effective for light workloads or when you're initially testing and developing your data processing tasks.

2. Monitor Utilization: Keep a close eye on the utilization of the XS warehouse. Utilization refers to how much of the warehouse's capacity is being actively used. Utilization metrics can be monitored through Snowflake's performance and usage monitoring tools.

3. Keep an Eye on Sustained High Utilization: If you consistently observe high utilization levels on your XS warehouse, it indicates that the workload may be pushing the warehouse's limits. High utilization can lead to performance bottlenecks, longer query execution times, and potentially queueing of queries.

4. Upgrade Warehouse Size: When you observe sustained high utilization and performance degradation, consider upgrading your warehouse size. Snowflake allows you to easily resize your warehouse to a larger size to accommodate higher workloads and provide more compute resources.

In summary, starting with a smaller warehouse size and then upgrading based on utilization is a proactive and cost-efficient approach to managing Snowflake warehouses. It ensures that you have the right amount of compute resources to handle your workloads without overprovisioning and incurring unnecessary expenses.

If you’re looking to optimize Snowflake costs, see Bodo in action.