
You're Not A Ferret - Stop Hoarding Data

April 2021


You've no doubt heard the phrase 'data is the new oil'. Or maybe you've heard someone say that data is a commodity. It must be valuable, given how much effort just about everyone puts into collecting it.


Have you ever uttered something like "We must have a lot of data scattered around that we could analyze to find out if [fill in the objective]?"


So, you set up systems a while ago that automatically collect data on a variety of things. You have tons of logs of activity from your network, traffic on your website, advertising cost and sales revenue data, insurance and legal costs, etc. If you're a DoD agency, you may have been collecting data on the well-being of your service members.


You may have collected the data solely for compliance reasons, or 'just in case'. There may be data you collected to perform one specific type of analysis, such as cost or marketing effectiveness. As the years rolled on, you collected more and more data, so much that you now need to find more storage capacity. You have collected so much that you have lost track of what data you have and what it is worth.


Can you list all the datasets you currently have, off the top of your head?


It is likely that you have data you forgot about, in datasets that are sitting idle. For starters, that's a cyber risk: if you forgot you had the data, then you won't notice when it goes missing or is otherwise compromised. It's also a compliance risk: you may be holding datasets that you should have trashed years ago. You may also be paying to store all these datasets that you forgot about, kind of like paying an employee who doesn't really have a job anymore.


What you have is Dark Data.


Yes, there's a term for what you have. Several sources, including Gartner, define it as datasets that are collected in the normal course of operation, stored, and then forgotten. There are a lot of reasons why you accumulated Dark Data, but now that you know about it, there are a lot of things you can do with it. Here are some ideas:


  • Identify the best day and time to perform network maintenance

  • Discover how to improve your staff well-being, reducing your insurance costs and employee turnover

  • See what parts of your website are working, and what parts are not

  • Spot anomalous activity on your network or in your building entry logs

  • Recommend products your customers might like, thus increasing your sales

  • Identify areas of waste in your operations


Just think about all those times you said "We must have a lot of data…". Chances are you were right.


A data custodian and data engineer can help you find and manage your Dark Data.


The first thing you'll need to do is take an inventory of all the data you have. The second thing is to ensure that all current and future datasets are fed into a data pipeline that will store the data in a searchable and usable way. The data custodian can help you find and gather your current datasets, locate external data that might be of interest, and monitor the data inventory for compliance. A data engineer can set up the pipeline to ingest, clean and parse data into structured datasets that are ready for analysis and that can be easily monitored.
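As a rough illustration of the inventory step described above, here is a minimal Python sketch that walks a storage location and records each file's size and how long it has sat untouched. It assumes your datasets live as files under a single directory; the names `DatasetRecord`, `inventory`, and `find_idle`, and the one-year idle threshold, are hypothetical choices for this example rather than part of any particular tool.

```python
import os
import time
from dataclasses import dataclass


@dataclass
class DatasetRecord:
    path: str          # where the file lives
    size_bytes: int    # how much storage it occupies
    days_idle: float   # days since the file was last modified


def inventory(root, now=None):
    """Walk `root` and return one record per file, largest first."""
    now = time.time() if now is None else now
    records = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            info = os.stat(full)
            days_idle = (now - info.st_mtime) / 86400  # seconds per day
            records.append(DatasetRecord(full, info.st_size, days_idle))
    return sorted(records, key=lambda r: r.size_bytes, reverse=True)


def find_idle(records, max_idle_days=365):
    """Return records not modified within `max_idle_days` (assumed threshold)."""
    return [r for r in records if r.days_idle > max_idle_days]
```

Calling `find_idle(inventory("/path/to/data"))` would list candidates for review. A real inventory would also need to cover databases, cloud buckets, and the compliance metadata a data custodian tracks, which this sketch leaves out.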


Now that your Dark Data is managed, you need to get some value from it.


If the datasets are staged properly, your data analysts and data scientists should be able to search on a particular topic or field and find one or more datasets that have what they need. For example, let's say that you want to figure out how to improve your employees' work-life balance. Your data folks could start by looking at the hours and days employees spend on the network, email traffic times and volumes, building entry/exit logs, and maybe even healthcare costs. The work-life analysis is already up to at least four datasets, and there are likely more that your data teams could add. Imagine a world where these datasets are already staged so that your data teams can jump right into the analysis, and maybe even discover other relevant datasets along the way.
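To make the idea of searching staged datasets by topic or field concrete, here is a small Python sketch of a catalog lookup. It assumes each dataset is registered with a name, a list of field names, and a few tags. The catalog entries, field names, and the `search` function are all hypothetical, loosely modeled on the work-life example above.

```python
from dataclasses import dataclass, field


@dataclass
class CatalogEntry:
    name: str
    fields: list
    tags: list = field(default_factory=list)


def search(catalog, term):
    """Return names of datasets whose name, fields, or tags contain `term`."""
    term = term.lower()
    hits = []
    for entry in catalog:
        haystack = [entry.name, *entry.fields, *entry.tags]
        if any(term in text.lower() for text in haystack):
            hits.append(entry.name)
    return hits


# Hypothetical catalog; real entries would come from your data inventory.
CATALOG = [
    CatalogEntry("network_logins", ["employee_id", "login_time", "logout_time"], ["network"]),
    CatalogEntry("email_traffic", ["employee_id", "sent_time", "message_count"], ["email"]),
    CatalogEntry("building_access", ["employee_id", "entry_time", "exit_time"], ["facilities"]),
    CatalogEntry("healthcare_costs", ["plan_id", "annual_cost"], ["benefits"]),
    CatalogEntry("web_traffic", ["page_url", "visit_time"], ["marketing"]),
]

# Datasets that share an employee identifier can be joined for analysis.
print(search(CATALOG, "employee_id"))
```

In this toy catalog, searching on `employee_id` surfaces the network, email, and building-access datasets, which is the kind of discovery that lets an analyst see at a glance which sources can be combined.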


Stop collecting and hoarding your data like a ferret. Find it and use it to your benefit. Cybele Data Advisory can help you find and use your data; email us at
