An Inside Look at NTT DATA’s Cost Optimization Service for Snowflake (COSS)
- January 17, 2023
There are several risk factors involved with determining cost optimization in Snowflake, and as a result, clients may have concerns about whether those risks are worth the reward. The common concerns we hear include the high initial cost, the necessary experimentation with Snowflake’s platform, and Snowflake’s current static, view-only administration dashboards for performance and recommendations on design decisions.
Here are a few of the most frequently asked questions that our team receives from new, Snowflake clients in their first-year journey:
- We followed the best practices recommended by Snowflake in multiple webinars, blogs, and documentation. They do work well, but they come with a huge credit cost. What really went wrong?
- How can optimal performance be achieved at a justified cost?
- What clustering keys should be used?
- What’s the right size of the warehouse to be used?
There are strong query performance optimization methods available in Snowflake including materialized view, clustering, and search optimization. Each optimization method is designed for different scenarios, and if it is used in the wrong scenario, this can lead to an unexpected result. Additionally, selecting the wrong choice of optimization method can lead to higher costs.
NTT DATA understands these questions and concerns, and as a result, we were motivated to build our Cost Optimization Service for Snowflake (COSS). COSS is one of our newest capabilities, and in this blog, we’ll explore more details about COSS, as well as share its block diagram and example dashboards.
Overview
COSS is designed as a Snowflake Native Application. Below, figure 1 shows a high-level block diagram of COSS. The building blocks of COSS include log and usage analytics, a recommendation engine, a scheduler, and a customer interface. In addition to the Snowflake Native Application, COSS has an external component known as Streamlit dashboards.
Fig. 1 High-Level Block Diagram of COSS
The log and usage analytics module is responsible for processing information from snowflake.account_usage and Snowflake system functions with the help of various Snowflake stored procedures and user-defined functions. The log and usage analytics module is also responsible for building useful summaries from this information.
The recommendation engine module is responsible for generating performance optimization recommendations from the summaries built by the log and usage analytics module.
Some of these recommendations include:
- What is the right size of the warehouse for the workload?
- What is the utilization of current warehouses?
- What needs to be done to reduce the consumption of my Snowflake credits?
- What needs to be done to improve the workload’s performance?
- What performance optimization method should be used considering cost/performance benefits? Ex: materialized view, clustering, and/or search optimization.
- What are the optimal parameters for performance optimization methods? Ex: clustering keys, etc.
The scheduler module is responsible for triggering the execution of the COSS application’s log and usage analytics module and recommendation engine module on a periodic interval. The default interval is set to once a week, but the clients can configure the interval’s frequency.
The customer interface module provides the ability for clients to interact with the Snowflake Native Application and COSS. This includes reading recommendations, initializing the COSS application, and executing the COSS application on schedule. For the sake of simplicity, all these interactions with the COSS application are done using SQL.
The dashboard is responsible for visually representing summaries built by the log and usage analytics module. This can be very helpful in making design decisions within Snowflake’s platform. The dashboard was designed using the Streamlit and Snowpark capabilities of Snowflake.
Dashboards
Snowflake Usage Overview
To help identify a costly set of services, users, databases, and queries, users can follow dashboards from COSS. This helps to prioritize optimization efforts.
Warehouse Analytics
Additionally, the following dashboards from COSS will help to identify warehouses that are underutilized, costly, and not in the right size or configurations.
Clustering Analytics
Clustering can substantially improve the query performance by co-locating similar rows in the same micro-partitions. If a table is queried more frequently, then clustering provides more benefits. However, if a table is changed more frequently, then keeping it clustered can be more expensive. Therefore, clustering is generally the most cost-effective for tables that are queried, but not changed, frequently.
The following dashboards from COSS are helpful to finalize the tables for clustering and clustering keys on that tables.
Materialized View Analytics
Materialized views are designed to improve the performance of the most frequently repeated queries. A materialized view is a pre-computed, stored result from a query-defining view. Because the data is pre-computed, querying a materialized view is faster than executing a query against the base table of the view. This performance difference can be significant when a query is run frequently or is sufficiently complex.
Following dashboards helps to make decisions about which view should be defined as a materialized view.
Search Optimization Analytics
Search optimization can significantly improve the performance of point lookup queries. To improve performance for point lookup queries, the search optimization service relies on a persistent data structure that serves as an optimized search access path. A maintenance service that runs in the background is responsible for creating and maintaining the search access path.
The following dashboards from COSS will help to make design decisions about enabling search optimization on a table.
Closing Thoughts
COSS (Cost Optimization Service for Snowflake) is helpful for those looking to optimize the cost of Snowflake’s platform. Even though it isn’t fully automated in its current state, it does provide valuable summaries to help make design decisions. Additionally, the deployment of COSS as a Snowflake Native Application assures governed access to data, and with that in mind, sensitive data doesn’t remain in Snowflake’s platform during the optimization process.
Do you want to learn more about COSS or what NTT DATA & Snowflake can help you achieve with your data? Connect with us for more information.
Subscribe to our blog