Accelerate Snowflake's data sharing ecosystem with data clean rooms

  • January 31, 2024

The data industry is quickly realizing the importance of sharing data and unlocking its value with partners. Today, data is among a company's greatest assets, and there's enormous potential for innovation in providing and consuming data for new purposes. In this context, Snowflake is becoming a global leader in cloud data functionality through its data-sharing capabilities.

On the other hand, data privacy is a huge challenge facing many industries. Increasingly privacy-conscious consumers and tightening regulations (for example, HIPAA or GDPR) have severely restricted how data, including personal information, can be handled. Sharing data with third parties can be even more difficult.

This tension between data sharing and data privacy is now giving rise to a trend called data clean rooms. Keep reading to learn the history of data clean rooms, the features of Snowflake’s data clean rooms and why they are gaining acceptance.

What is a data clean room?

A “clean room” is a controlled environment that limits or eliminates the presence of external contaminants to provide ideal environmental conditions for testing and/or production of sensitive products. Clean rooms have been used in semiconductor manufacturing, life sciences and many other industries. This type of environment is completely isolated from the outside world, with limited staff wearing protective clothing and working with limited tools.

While a data clean room is a more abstract concept, the intent is to use it in the same fashion as a physical clean room. The goal is an isolated system environment where data can't be corrupted or leaked. While physical clean rooms are designed to keep the air clean and prevent material leakage, data clean rooms are designed to keep data clean and prevent data leakage.

A data clean room exposes only aggregated and anonymized user information to the other party, which protects user privacy. No one can corrupt the data in the data clean room, and no one can take the data out of the room. In the room, multiple parties can jointly work with data without exchanging personally identifiable information (PII), which helps them stay within privacy regulations.
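To make the "aggregated and anonymized" idea concrete, here is a minimal sketch in Python. It is not Snowflake's implementation or any vendor's API. The function name `overlap_by_segment`, the `user_key` and `segment` fields, and the minimum group size of 3 are all hypothetical choices made for illustration. The sketch shows one common pattern: two parties' records are matched inside the function, and only group counts above a threshold are returned.

```python
from collections import defaultdict

# Hypothetical policy value for this sketch; not a Snowflake default.
MIN_GROUP_SIZE = 3


def overlap_by_segment(advertiser_rows, publisher_rows, min_group_size=MIN_GROUP_SIZE):
    """Count users present in both datasets, grouped by publisher segment.

    Only aggregate counts are returned. Segments with fewer matched users
    than min_group_size are suppressed so that small groups cannot be used
    to single out an individual.
    """
    advertiser_keys = {row["user_key"] for row in advertiser_rows}

    # Collect matched keys per segment; a set avoids double-counting a user
    # who appears in several publisher rows.
    matched = defaultdict(set)
    for row in publisher_rows:
        if row["user_key"] in advertiser_keys:
            matched[row["segment"]].add(row["user_key"])

    return {
        segment: len(keys)
        for segment, keys in matched.items()
        if len(keys) >= min_group_size
    }


if __name__ == "__main__":
    advertiser = [{"user_key": k} for k in ("a", "b", "c", "d", "e")]
    publisher = [
        {"user_key": "a", "segment": "sports"},
        {"user_key": "b", "segment": "sports"},
        {"user_key": "c", "segment": "sports"},
        {"user_key": "d", "segment": "news"},
        {"user_key": "z", "segment": "news"},
    ]
    # "news" has only one matched user, so it is suppressed.
    print(overlap_by_segment(advertiser, publisher))  # {'sports': 3}
```

In a real clean room the matching keys would typically be hashed or otherwise protected, and the threshold would be set by the data provider's policy; both are outside the scope of this sketch.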

The history of data clean rooms

Data clean rooms originated with the deprecation of third-party cookies. In the advertising industry, third-party cookies were typically used to identify and track users. These cookies allow ad publishers and advertisers to display ads customized to user preferences and to analyze which ads led to purchase behavior. However, increased privacy awareness has led to stronger regulations, such as the General Data Protection Regulation (GDPR) in the EU and the California Consumer Privacy Act (CCPA) in California. On the browser side, Apple’s Safari and Google’s Chrome browsers have each moved to end support for third-party cookies.

As deprecation accelerates, the third-party cookies used to identify customers are at risk of becoming obsolete. Against this backdrop, advertising publishers created first-generation data clean rooms with their “walled gardens” concept. Walled gardens are ecosystems and platforms controlled by ad publishers. Examples include Google Ads Data Hub, which launched in 2017, Facebook Advanced Analytics in 2018 and Amazon Marketing Cloud in 2019.

Such first-generation data clean rooms inside these walled gardens, provided by big tech platforms like Google, Meta and Amazon, allowed these ad-media giants and advertisers to analyze data within their respective platforms. Advertisers can analyze a publisher’s data that is related to their advertising, within their limited permissions. The publishers don't provide raw data, but only the means and results of the analysis, following the rules outlined in their privacy policies. Using this method, an individual’s user data never leaves the publishers’ room. Advertisers can link their first-party data to the publishers' second-party data for analysis within the data clean room.

A new generation — data clean rooms by Snowflake

A new generation of data clean rooms is now emerging. Its defining characteristics are neutrality and breadth: it covers data sharing in general, not just the advertising industry. The need to protect data privacy while using second-party data for analysis isn't unique to advertising. Organizations in all industries face growing requirements to share data while protecting data privacy. Technology vendors are increasingly focused on these requirements and, as a result, are beginning to offer data clean rooms on their respective platforms.

One of the largest new data clean room vendors is Snowflake. A longtime NTT DATA partner, Snowflake recognized the importance of sharing data early on. It not only offers a cloud data warehouse, but also operates a data marketplace for its ecosystem, built on its data-sharing technology. However, while sharing data with other companies seems like it should be simple, technical and privacy requirements often get in the way, because users' privacy must always be protected. Snowflake's implementation of data clean rooms in its data collaboration platform is a natural progression.

Snowflake's data clean rooms differ from previous generations of data clean rooms in several ways. First, Snowflake is neither a provider nor a consumer of data; it's neutral. This gives anyone with any data in any industry the possibility to be a data provider in a data clean room using Snowflake as a platform. Second, Snowflake’s data clean rooms take full advantage of Snowflake’s data-sharing capabilities, which avoid data copying and latency. As a result, both data providers and consumers can analyze data in a “virtual” data clean room, while keeping the data entirely to themselves.

Expanding data clean rooms for the next evolution

When it comes to analyzing data, particularly while maintaining its privacy, the role of data clean rooms has expanded enormously from the previous use case in the advertising industry.

In response to Snowflake's data clean rooms, Amazon and Google have recently announced their own data clean room solutions: AWS Clean Rooms and BigQuery data clean rooms. These new solutions aren't of the existing walled-garden type. In addition, other independent data clean room vendors, such as Habu, InfoSum and Optable, are also emerging. Interestingly, some of these vendors build on Snowflake and are positioned as a layer on top of Snowflake's data clean rooms. Samooha, a data clean room vendor that used Snowflake’s platform, was acquired by Snowflake in December 2023. It'll take a while before the results of this acquisition are seen in Snowflake's features, but it's one indication of the potential for change in the industry.

In the face of rising competition, Snowflake continues to invest heavily in data clean rooms. Why? Data clean rooms are in line with Snowflake’s Data Cloud philosophy. Snowflake has been investing in and implementing data clean rooms for several years, and it continues to accelerate development in this space, including advanced features.

In June 2023, Snowflake released a preview feature known as Native Data Clean Rooms. While the differential privacy feature has not yet been implemented, these Native Data Clean Rooms will make data clean rooms even easier to use. Additionally, they'll greatly expand Snowflake's overall data-sharing ecosystem. To learn more about this feature, keep an eye out for part two of this blog.

In November 2023, Snowflake announced Snowflake Horizon, a new suite of data governance capabilities that includes a differential privacy feature. This follows Snowflake's February 2023 acquisition of LeapYear, a differential privacy technology company. Differential privacy provides a mathematically quantifiable way to balance data privacy and data usefulness. It can allow organizations to analyze and share their private data without revealing anyone’s sensitive information and in compliance with data privacy regulations, such as GDPR or CCPA. Snowflake's development in this area will likely continue to reflect industry expectations.
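As a rough illustration of what "mathematically quantifiable" means here, the sketch below applies the textbook Laplace mechanism to a count query. It is not Snowflake's or LeapYear's implementation. The function names `laplace_noise` and `private_count` are hypothetical, and the privacy parameter `epsilon` is the standard symbol from the differential privacy literature rather than anything named in this post. A smaller `epsilon` adds more noise (more privacy, less accuracy), and a larger one adds less.

```python
import random


def laplace_noise(scale, rng):
    """Draw one sample from a Laplace(0, scale) distribution.

    The difference of two independent exponential draws with mean `scale`
    follows a Laplace distribution centered at zero with that scale.
    """
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(true_count, epsilon, rng=None):
    """Return a noisy count using the Laplace mechanism.

    Adding or removing one person changes a count by at most 1, so the
    sensitivity is 1 and the noise scale is sensitivity / epsilon.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    rng = rng or random.Random()
    sensitivity = 1.0
    return true_count + laplace_noise(sensitivity / epsilon, rng)


if __name__ == "__main__":
    # The same true count released under a strict and a loose privacy budget.
    print(private_count(1200, epsilon=0.1))
    print(private_count(1200, epsilon=2.0))
```

The noisy result is what the other party would see instead of the exact count. Production systems also track how much privacy budget has been spent across repeated queries, which this sketch leaves out.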

NTT DATA has a deep understanding of Snowflake and its capabilities. Our team has valuable experience with Snowflake's data clean rooms, and as a preview user, we were able to validate the Native Data Clean Rooms feature. Contact us for a demo, or to learn more about data clean rooms.


Ryo Shibuya

Ryo Shibuya is a highly accomplished Cloud Data Architect with a wealth of experience in the field. With a career spanning over a decade, Ryo holds multiple certifications in Snowflake and AWS, and he has been instrumental in implementing data analytics platforms and cloud data warehousing solutions for various organizations. Notably, Ryo played a key role in the early adoption of Snowflake technology in the Japanese market, spearheading transformative change. Ryo's expertise extends to providing technical consulting for numerous Snowflake implementation projects and developing educational programs on data analytics platforms. For his supreme Snowflake expertise, Snowflake awarded Ryo the title of Snowflake Data Superhero, and he is considered a true thought leader in the industry.

