Metadata Analysis with Veritas Information Map

At NetApp Insight 2016, while working the Veritas booth and presenting a breakout session later in the event, I introduced many attendees to a SaaS-based platform for metadata management and intelligent insights into unstructured data. That solution, Veritas Information Map, is now in its third year on the market.

SaaS-based metadata management platforms are all the rage now. You will hear cries of “data analytics”, “metadata analysis”, and “compliance” thrown around, but if you ask the right questions you will also hear things like “Let us protect your data, and we will show you some really cool things!” or “Put your data on our storage devices and we will provide some additional value that is beneficial in other areas of your business.” The common thread is “move your data to our solution.”

Many companies would love to gain insights into their data, but they are not willing to uproot their backup solution or replace their primary storage just to get those insights. What is desired is a solution that provides these benefits regardless of where the data resides or how it is backed up.

Veritas Information Map provides some really cool insights into the data it collects. Let’s start with a quick view of the data sources we can utilize:

  1. Veritas Information Map Cloud Connectors
  2. Veritas Information Map On-Premises CMIS and Database Connectors
  3. Veritas Information Map On-Premises Native File Server Connectors

Once Information Map has collected metadata from these sources, it consolidates the information into a graphical tool that lets you navigate your entire data inventory and filter it in a matter of seconds. Want to find all of the files in your environment that haven’t been accessed in three or more years? Give me five seconds. Want to find all of the email files (.pst, .msg, and about 40 other email types)? Give me five seconds.
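
Information Map does all of this through its interface, but if you are curious what a metadata-only query like that boils down to, here is a minimal Python sketch. The /data root, the three-year threshold, and the handful of extensions are all hypothetical; this is only an illustration of the idea, not the product’s implementation.

```python
import os
import time

# Hypothetical threshold and extensions; Information Map exposes these as
# interactive filters. This sketch only illustrates the underlying idea.
THREE_YEARS = 3 * 365 * 24 * 3600            # seconds
EMAIL_EXTENSIONS = {".pst", ".msg", ".eml"}  # the product knows ~40 more

def scan(root):
    """Yield (path, size, last-access time) for every file under root."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.stat(path)
            except OSError:
                continue  # unreadable file; skip it
            yield path, st.st_size, st.st_atime

now = time.time()
inventory = list(scan("/data"))  # "/data" is a placeholder root

# All files not accessed in three or more years
stale = [path for path, _size, atime in inventory if now - atime >= THREE_YEARS]

# All email files, judged purely by extension
email = [path for path, _size, _atime in inventory
         if os.path.splitext(path)[1].lower() in EMAIL_EXTENSIONS]
```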

With the interactive dashboard, you can quickly narrow down what you are looking for. Information Map provides some valuable insights from the main map, and you can click to dive into any location in your organization for further investigation.

You will see that Information Map has a view of all of the data in your organization across the globe. Across the top we get immediate insight into the total storage footprint along with its cost. This cost is fully customizable, allowing you to define how much you spend on each tier of storage, so the figures are accurate for your environment. We also see how much data is Orphaned, Stale, or Non-Business, along with the cost associated with it.
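
As a rough illustration of how that cost rolls up, the math is simply each tier’s footprint multiplied by whatever rate you assign to it. The tier names, prices, and footprints below are made-up examples, not product defaults.

```python
# Hypothetical per-TB monthly prices; in Information Map you supply your own
# figure for each storage tier so the dashboard costs match your environment.
COST_PER_TB = {"primary": 250.0, "archive": 60.0, "cloud": 25.0}

# Example footprints in TB -- illustrative numbers only.
footprint_tb = {"primary": 86.0, "archive": 120.0, "cloud": 40.0}

total_cost = sum(tb * COST_PER_TB[tier] for tier, tb in footprint_tb.items())
print(f"Estimated monthly storage cost: ${total_cost:,.0f}")
```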

Stale and Non-Business data are also defined by you. Your idea of stale data may be data that has not been modified in five years; my idea of stale data may be data that has not been accessed in three years. It is completely up to you. You can define Non-Business data by file types or file groups. While you may consider media and game files “Non-Business”, I have many customers in the marketing and entertainment industries who would absolutely consider those business-related.
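
If it helps to picture it, those definitions amount to an age threshold plus a set of file groups. A minimal sketch, assuming an illustrative three-year access threshold and two made-up file groups (none of these values come from Information Map itself):

```python
from datetime import datetime, timedelta

# Organization-specific definitions -- illustrative values, not shipped defaults.
STALE_AFTER = timedelta(days=3 * 365)   # "not accessed in three years"
NON_BUSINESS_GROUPS = {
    "media": {".mp3", ".mp4", ".mov"},
    "games": {".sav", ".rom"},
}

def is_stale(last_accessed, now=None):
    """Stale = not accessed within the configured window."""
    return (now or datetime.now()) - last_accessed >= STALE_AFTER

def is_non_business(extension, excluded_groups=()):
    """A marketing or entertainment shop might exclude 'media' from the check."""
    return any(extension.lower() in exts
               for group, exts in NON_BUSINESS_GROUPS.items()
               if group not in excluded_groups)
```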

Let’s consider the scenario of a hardware refresh in our Miami location. As part of the refresh, you may consider moving some data to the cloud, and you may also consider archiving or deleting some data. The policy that we have come up with is as follows (a quick sketch of the logic follows the list):

  1. Delete all data that has not been accessed in four or more years
  2. Archive data that has not been accessed in two to four years
  3. Determine what is left and decide to refresh hardware or move data to another location
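
Expressed as logic, the policy is just two age thresholds. This is my own sketch of the rules above, not anything generated by Information Map:

```python
from datetime import datetime, timedelta

NOW = datetime.now()
DELETE_AFTER = timedelta(days=4 * 365)   # four or more years  -> delete
ARCHIVE_AFTER = timedelta(days=2 * 365)  # two to four years   -> archive

def refresh_action(last_accessed):
    """Map a file's last-accessed date to an action under the Miami policy."""
    age = NOW - last_accessed
    if age >= DELETE_AFTER:
        return "delete"
    if age >= ARCHIVE_AFTER:
        return "archive"
    return "keep"  # candidate for refreshed hardware or relocation

print(refresh_action(NOW - timedelta(days=5 * 365)))  # delete
print(refresh_action(NOW - timedelta(days=3 * 365)))  # archive
print(refresh_action(NOW - timedelta(days=365)))      # keep
```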

We currently have 86TB of data in the Miami location.

Clicking on the Miami location provides immediate details about the location.

Each one of these items is hot-linked, allowing you to navigate deeper into each content source, share, item type, file owner, and so on, all in a matter of seconds.

We now filter the data (in a matter of seconds) to find what has not been accessed in four or more years. Based on our policy, we see that we can immediately delete 14TB of data.

We can now look at the data that has not been accessed in two to four years to see how much we could archive. This shows that we can move an additional 26TB of data off primary storage, for a total savings of 40TB.
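
For clarity, here is the arithmetic behind those numbers; the 14TB, 26TB, and 86TB figures come from the scenario above, and the 46TB remainder is the only derived value.

```python
total_tb = 86     # current Miami footprint
delete_tb = 14    # not accessed in four or more years
archive_tb = 26   # not accessed in two to four years

savings_tb = delete_tb + archive_tb    # 40 TB off primary storage
remaining_tb = total_tb - savings_tb   # 46 TB left to refresh or relocate
print(savings_tb, remaining_tb)        # 40 46
```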

The fictitious policies in this blog are based on metadata only. To determine the true value of data, you must combine metadata analytics with content analytics. We decided to delete all files that haven’t been accessed in four or more years; in the real world, we would need to look at the content of each file to determine whether it can be deleted or must be retained to satisfy corporate policy. My next blog article will focus on taking this information to the next level by analyzing and classifying content to help determine how we manage data.