Back to the Basics of Everything Data — Technology Expansion, Snowflake, and the Modern Data Cloud
- September 16, 2022
In the second of a two-part series, David Hrncir, regional technical expert at NTT DATA, and Ajay Bidani, digital enablement and insights manager at Powell Industries, cover a wide variety of topics on technology expansion, challenges in the modern data cloud, data warehouses, data governance and more.
A Quick Recap
We recently traveled to fabulous Las Vegas to attend and sponsor the incredible Snowflake Summit 2022. As an Elite Snowflake partner, we have a great working knowledge of Snowflake, and the capabilities of the Snowflake Data Cloud help us to succeed in our belief that we can do data better, together.
We had the perfect opportunity to present a live episode of Hashmap on Tap, our data podcast that covers a wide variety of topics in data science. This Snowflake-centered, exclusive episode was presented by David Hrncir and Ajay Bidani (one of our clients). They explored technology expansions, challenges in the modern data cloud, a debate on the state of data warehouses, data governance, and more. So, without further ado, let’s dive in and see what insights they have to offer.
Technology expansion
Have you heard of the ADKAR methodology? (Yes — another acronym, we know, but they work). If you haven’t heard of the terminology, David says that you may want to subtly send the definition to the leadership team. What is it and what is it used for?
- ADKAR Methodology
  - Awareness
  - Desire
  - Knowledge
  - Ability
  - Reinforcement
David explained that it’s a methodology used for building data culture within your organization. This is critical when thinking about the modern data cloud and all that it encompasses, such as data integration and the modern data warehouse.
Challenges of the modern data cloud
The data cloud is a big deal to Ajay — he says that it has “permanently changed my outlook” on the way data is done. In his experience, two of the biggest challenges are transformation and modernization. The ideas of being able to scale, remove friction, or not worry about building infrastructure have been game-changers. Just a few years back, these were roadblocks faced by many organizations that made it difficult to make regular progress, but Ajay offers simple advice: “Continue to look further ahead.”
Snowflake and the data cloud
David recalls when we first partnered with Snowflake and shares his first thought when seeing its architecture, “This seems so simple, extremely logical…and makes so much sense.” In his opinion, building for the cloud, particularly when it comes to Snowflake and the Snowflake Data Cloud, has changed the mindset around what we can do right now and how we can do it.
Data warehouses: in or out?
It’s important to recognize the differences between the data cloud and a data warehouse. Many people treat the two terms as synonyms, but they’re not exactly interchangeable. As Ajay explains, “data warehouse” is a term that describes where the data will go as part of an essential repository.
David and Ajay both agree that they still use the term “data warehouse” to some degree because it is a vital capability of the data cloud. David uses the term to describe a type of “workload” within an enterprise. For Powell, Ajay says they use the terms “data warehouse” or “enterprise data warehouse” to denote workloads built in the past, whereas “data cloud” is used for present and future workloads.
While the jury is still out on the terminology, David and Ajay’s perspectives help to shed light on technology expansion.
Driving innovation
What factors help to drive innovation, especially when it comes to the data cloud?
Experimentation is certainly one of them, and Ajay offers some valuable insight into doing it right. He advises that one of the best things is to remain practical about the progression and know how quickly things can move. It’s more important to start off at a slow pace and make sure people have bought into it and understand the necessary processes. Once this mindset is in place, the organization can participate to drive innovation and pick up the pace to achieve the desired outcome.
In David’s opinion, “innovation is the key” when talking about anything in data science, especially when it comes to Snowflake, data warehouses, tools, etc. Innovation is helping to drive new workloads and technologies to expand the datalink concept — which is making tsunami-size waves in data science. It’s imperative to understand the technology, business, and leadership in making decisions that will serve your goals — if something is not the right fit, move on to find something that is.
One example of this innovation at work is the ELT process. ELT has been a “source of positive things” for Powell. Benefits include reducing complexity and bringing in more data without having to stack processing in numerous places. There are some challenges that come along with it, such as increased volume of data and different types of data. However, in Ajay’s world, these are good challenges to have due to the potential opportunities that await.
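To make the ELT idea concrete, here is a minimal sketch of the pattern: load the raw data untouched, then transform it with SQL inside the database so the processing lives in one place. It assumes an in-memory SQLite database as a stand-in for a cloud warehouse; the table names, columns, and sample rows are hypothetical and do not come from Powell or Snowflake.

```python
import sqlite3

# Hypothetical raw records, as they might arrive from a source system.
RAW_ORDERS = [
    ("1001", "2022-06-14", " 250.00 ", "open"),
    ("1002", "2022-06-15", "99.5", "CLOSED"),
    ("1003", "2022-06-15", "", "closed"),  # missing amount
]


def run_elt(conn, rows):
    """Load raw rows as-is, then transform them with SQL inside the database."""
    # Extract + Load: land the data untouched, keeping every column as text.
    conn.execute(
        "CREATE TABLE raw_orders "
        "(order_id TEXT, order_date TEXT, amount TEXT, status TEXT)"
    )
    conn.executemany("INSERT INTO raw_orders VALUES (?, ?, ?, ?)", rows)

    # Transform: typing and cleanup happen in one place, after loading.
    conn.execute(
        """
        CREATE TABLE orders AS
        SELECT CAST(order_id AS INTEGER)  AS order_id,
               order_date,
               CAST(TRIM(amount) AS REAL) AS amount,
               LOWER(status)              AS status
        FROM raw_orders
        WHERE TRIM(amount) <> ''
        """
    )
    return conn.execute(
        "SELECT order_id, amount, status FROM orders ORDER BY order_id"
    ).fetchall()


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    for row in run_elt(connection, RAW_ORDERS):
        print(row)
    connection.close()
```

Because the raw table is kept alongside the cleaned one, new transformations can be added later without re-extracting from the source, which is one way to read the "without having to stack processing in numerous places" benefit.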
How can I make business decisions with innovation in mind?
Ajay’s advice here is to find the right tooling, the right people, and the right process to get up and running as quickly as possible. However, just because you have great tools and know what’s possible doesn’t mean things will happen fast — the last thing you want is for the business to get caught flat-footed.
In the past, it was common to jump to a data decision, for example, deciding to run a process as a waterfall-style data integration (DI) project. The challenge is that the market has shifted: data governance is now growing in popularity, primarily because of the influence of time-to-market expectations.
So what’s the deal with data governance?
And why is this such an important topic in data right now?
David explains that in today’s data space, many data governance issues stem from having vast numbers of datasets derived from tunnel vision based on agility. One or more parts of the organization want to expand and create more features (which is great), but this typically leaves other parts of the organization wondering about some of the “-ITYs” such as accountability, security, validity, traceability, quality, discoverability, usability, and observability.
While Ajay states that Powell handles data governance “pretty well”, he also admits that discoverability is their biggest challenge. In his experience, the quality and usability of data go down when consumers are unable to determine which dataset to use for a particular purpose, even when access is fully granted. Additionally, this leads to shadow or independent governance models and policies, causing further confusion about the data.
David says that data governance has been a key topic in most client discussions he has had over the last year or two. He states that if any of the below questions are being asked, you may not have a data problem — you most likely have a data governance problem.
- What dataset should I use?
- Is there sample data for this dataset?
- What are the data sources for this dataset?
- How was this data curated?
- Who is the data owner?
- Who should or should not be seeing this data?
- How often is the data updated/maintained?
- What is the purpose of this dataset?
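One way to picture these questions is as fields in a data catalog entry: if a field is empty, the matching question has no answer yet. The sketch below is only an illustration under that assumption. The `DatasetRecord` class, its field names, and the sample dataset are hypothetical and are not taken from any governance product mentioned in the episode.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class DatasetRecord:
    """One catalog entry; every field name here is hypothetical."""
    name: str
    purpose: Optional[str] = None
    owner: Optional[str] = None
    sources: List[str] = field(default_factory=list)
    curation_notes: Optional[str] = None
    allowed_roles: List[str] = field(default_factory=list)
    refresh_schedule: Optional[str] = None
    sample_location: Optional[str] = None


# Maps each catalog field to the governance question it answers.
QUESTIONS: Dict[str, str] = {
    "sample_location": "Is there sample data for this dataset?",
    "sources": "What are the data sources for this dataset?",
    "curation_notes": "How was this data curated?",
    "owner": "Who is the data owner?",
    "allowed_roles": "Who should or should not be seeing this data?",
    "refresh_schedule": "How often is the data updated/maintained?",
    "purpose": "What is the purpose of this dataset?",
}


def unanswered_questions(record: DatasetRecord) -> List[str]:
    """Return the questions this catalog entry cannot answer yet."""
    return [q for attr, q in QUESTIONS.items() if not getattr(record, attr)]


def find_datasets(catalog: List[DatasetRecord], keyword: str) -> List[str]:
    """Answer 'What dataset should I use?' with a simple search on purpose."""
    keyword = keyword.lower()
    return [r.name for r in catalog if r.purpose and keyword in r.purpose.lower()]


if __name__ == "__main__":
    orders = DatasetRecord(
        name="sales.orders_curated",
        purpose="Booked orders for revenue reporting",
        owner="sales-data-team",
        sources=["erp.orders"],
    )
    for question in unanswered_questions(orders):
        print("Unanswered:", question)
```

A report like this, run across the whole catalog, gives a rough measure of how much of the discoverability gap is simply missing metadata.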
Agility and speed are great for DI processes, but that’s only half the battle. If consumers are unable to find, gain access to, and understand the data that’s available, how effective is an organization in terms of data?
Ajay admits that data governance can be “a little intimidating” but oftentimes, looks can be deceiving and it’s not as scary as it seems. In his experience, it can be easy to overlook the minute details, but it’s imperative to be mindful so these details don’t slip through the cracks.
Staying vigilant about the details and processes of data governance helps build a level of trust between the business and technology teams. Ajay says that “Having more things that you can actually see [in terms of datasets] will make a big difference in trust.” Increasing visibility of these details is crucial in helping individuals understand why data should be trusted, and in turn, why they can trust their own teams and other teams within their organization.
David wraps this up by saying that many companies view data governance initiatives as “what is required for us to do legally”. While that is one aspect, data governance is truly a key component of your comprehensive data strategy. Not only is governance implemented to avert data breaches and theft, for example, but it should also be implemented to promote the effectiveness of data use, data culture, and innovation.
It's worth noting that NTT DATA has many data governance vendor partners and has recently performed some benchmark studies on data governance vendors.
Data mesh
In the world of data and data hybrids, data governance isn’t the only buzzword on the rise; data mesh is appearing pretty frequently, too. So, what exactly is data mesh? According to David, “it’s a mind shift.” Think SOA/MSA++ (service-oriented and microservice architectures, taken a step further). The core principles of data mesh are:
- Domain-driven architecture
- Data as a Product
- Self-Service Infrastructure as a Platform
- Federated Computational Governance
David gives an example in which the sales team are the data owners of a set of data. They write it, curate it, work with it, etc. (or they lead the projects for the data engineers that do). This subset of data is built for use within the organization: the sales team is not building data products simply because they want them; rather, they want everyone to use them. There is a demand and a need.
For people to use the sales team’s data products, the organization needs to have a self-service mechanism (platform). This self-service platform ensures that those who need to use data as a product have a quick way to get to it. Finally, the last piece that feeds into this is federated access, which governs who can access the sales team’s data.
With these four basic principles in place, everybody who needs the data should be able to get to it quickly within an organization. David explains, “if they can’t, you’re probably not doing it correctly.” He compares data mesh to an SOA coupled with DI (data integration): very dependent on development technology, but with technology-agnostic consumption.
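The sales team example can be sketched in a few lines of code to show how the principles fit together: a domain owns a data product, a platform lets others discover and consume it, and an access policy governs who may read it. This is a toy model under stated assumptions, not a reference implementation of data mesh. The `DataProduct` and `SelfServicePlatform` classes, the role names, and the sample rows are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set


@dataclass
class DataProduct:
    """A dataset published by a domain team for others to consume."""
    name: str
    owner_domain: str                # domain-driven ownership
    read: Callable[[], List[dict]]   # stable interface that hides the storage
    allowed_roles: Set[str] = field(default_factory=set)  # access policy


class SelfServicePlatform:
    """Hypothetical registry where consumers find and read data products."""

    def __init__(self) -> None:
        self._products: Dict[str, DataProduct] = {}

    def publish(self, product: DataProduct) -> None:
        self._products[product.name] = product

    def discover(self) -> List[str]:
        """List everything that has been published, by name."""
        return sorted(self._products)

    def consume(self, name: str, role: str) -> List[dict]:
        """Return the product's rows if the caller's role is allowed."""
        product = self._products[name]
        if role not in product.allowed_roles:
            raise PermissionError(f"role '{role}' may not read '{name}'")
        return product.read()


if __name__ == "__main__":
    platform = SelfServicePlatform()
    platform.publish(
        DataProduct(
            name="sales.orders",
            owner_domain="sales",
            read=lambda: [{"order_id": 1, "amount": 250.0}],
            allowed_roles={"analyst", "finance"},
        )
    )
    print(platform.discover())
    print(platform.consume("sales.orders", role="analyst"))
```

Consumers only ever call `discover` and `consume`, so the sales team can change how the data is stored or built without breaking anyone downstream, which is the technology-agnostic consumption David describes.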
When thinking about domain teams, they each serve a different purpose. For example, the sales and marketing teams are going to be doing their day-to-day operations while concurrently developing data products for the organization. This allows the sales and marketing teams to be data owners and enables them to present what they have to IT [and other departments within the organization] to partner and work together on new ideas. At the same time, this takes some of the work off IT’s plate, and instead of keeping new ideas in one domain team, it encourages the collaboration of ideas across different teams in the organization.
Ajay agrees, adding that “getting IT out of trying to solve everything for everyone is definitely something we need more of.” He admits that the challenges don’t get any easier to solve and they’re never really removed; one challenge is just traded off the list for another. In a way, it’s sometimes like taking two steps forward and one step back.
He shares that Powell is working towards engaging the business to be a part of solving problems, as opposed to simply relying on technology teams to solve everything. The key to success here, in his opinion, is to find a way to take these steps consistently with people who are interested and invested in making it happen.
Data mesh vs. data fabric
New ideas, such as data mesh, are changing the paradigm of how data projects are done today compared to the last 30 or 40 years by spreading responsibilities across interdepartmental teams in an organization. In thinking about these ideas, how does the concept of data mesh contrast with data fabric?
David explains that “data fabric is a much simpler concept in terms of development.” While the two are similar, data mesh is about normalizing data products where consumption is through developed APIs, while data fabric is about virtualizing data where it sits. He says that with data fabric, “You create data as a service (DaaS) very similar to data products, but it’s a virtualized DaaS.” He continues, “You’re still creating and promoting data curation throughout the organization; you’re just not developing the consumption APIs.”
David deepens this explanation with “data mesh is more organization-oriented with heavy development where data fabric is more technology-oriented allowing the virtualization tool to perform all the heavy lifting.” Data fabric typically allows easier access to and consumption of the data, but that data is going to be less refined than data products. In his opinion, “data fabrics are going to be a little bit easier to start with and implement versus data meshes.”
Ajay agrees that there are upsides and downsides to both and that data fabric relies more on technology whereas data mesh relies more on development. In his opinion, it comes down to the talent of your people when estimating what it would take to implement a data mesh or data fabric successfully.
Data applications
No matter how many data warehouses, data fabrics, data meshes, or hybrids exist, consumption and consumption agility will be in focus. This brings David and Ajay to their next (and final) section on data applications (or data apps).
“How can I get quick access or quick data?” This seems like an easy question to ask, but the answer is one that clients most likely don’t want to hear. The problem with this, Ajay explains, is that “we don’t quite have the way to give it to them — yet. The ways to do it easier or faster are just not quite there yet.”
These self-serve technologies are being developed and released, and Snowflake is joining in and taking it to the next level, for example with its new Snowsight features. The Snowsight UI now allows you to build and share dashboards, use worksheets to access, analyze, and manage your data, monitor activity, administer your account, and do anything and everything ‘data sharing’ via the Snowflake Marketplace. Additionally, the Snowflake Marketplace is no longer just data: providers can also publish consumable data apps. This allows consumers to instantly visualize what they need without having to first interpret the data heavily.
David recently performed a demo with another one of our partners, ThoughtSpot. ThoughtSpot is embracing this drive for data apps using self-serve as well as embedded, live analytics (ThoughtSpot Everywhere). The demo involved embedding live, natively drillable visualizations into a React app hosted on a network outside of ThoughtSpot. David shares, “these companies are taking this [data apps] to the next level and redefining consumption agility. It’s a new era.”
Wait…That’s a data app?
Be honest, who did a little shopping in the Amazon app during Prime Day(s)? (No judgment here because I did.)
I hate to break this to you (technically you can blame David, since I’m simply the messenger), but if you think the Amazon app is simply an ordering portal, it isn’t. It’s also a data app. In the case of Amazon’s app, David explains that “They’re giving you instant feedback on things you want…and things you might possibly want/need based on your actions and selections.”
In fact, there are quite a few apps being used in everyday life that qualify as data apps; we’re just not thinking about them in those terms. As a frequent blood donor, David recalls when he realized he had a data app for tracking his blood donations. Then, there are grocery apps, coupon apps, company portals, landing pages, you name it. In his opinion, the ability to get access to data quickly via these data apps is what breeds data culture, turning that ‘useless’ data (you had to be there) into ‘useful’ information.
As for Ajay, he thinks that “data apps are probably the thing that will change a lot of what people want from us (data engineers).” In his opinion, change is inevitable when it comes to data apps, and it’s important to skill up and provide that agility when needed. David adds that “it’s going to change the paradigm of how we think about consumption agility going forward. Exciting times are ahead!”
The wrap-up
I’d like to think that if JFK worked in data today, his famous quote would go something like, “Ask not for what your data can do, but for what you can do with your data!” David and Ajay covered a number of fascinating perspectives that do pose the question of what you can do to help your organization be successful when it comes to data and data culture. You can read part one of this series here to see how Snowflake can help you do data better, together.