This is the third part in an ongoing series on Assessing Analytical Maturity – Are Your Analytical Efforts Viable? If you are new to this series, we recommend reading the first part here and the second part here.
Our last post was dedicated to establishing the power of data, and more specifically the idea that ‘Data is the new oil’. If data is in fact oil, then data infrastructure forms the machinery required to mine, refine, and generate value from it.
Before we delve into how data infrastructure plays its part in assessing the analytical maturity of an organization, let’s revisit the lifecycle of data, from its point of origin to the point where it is consumed by business users.
Gartner analyst Doug Laney introduced the 3Vs concept in a 2001 META Group research publication, 3D Data Management: Controlling Data Volume, Variety and Velocity.[1] Two extra Vs have since been proposed: variability (the range of values in a large dataset) and value, which pertains to the need for valuation of enterprise data.
In the zettabyte era we currently live in, it is imperative that we build a unified platform that supports this data and provides access to analytics and business users alike.
Importance of data infrastructure – Data sources:
Organizations today continually leverage a myriad of data sources.[2] Examples abound, spanning structured and unstructured data: first-party sources such as CRM and ERP systems or internal websites, generated and owned by the organizations themselves; second-party sources such as point-of-sale (POS) systems and social media; and third-party data obtained externally from providers such as Nielsen and IRI.
Importance of data infrastructure – Data ingestion:
The pertinent question that arises for organizations is how to extract value from this data. Data ingestion, as the term suggests, is taking in or absorbing data for immediate use or storage in a database. It involves collating the necessary data from different sources, transforming it into the required formats, and loading it into the storage area for further processing. Establishing ETL (Extract, Transform, Load) processes is now critical to ensuring the viability of analytical initiatives.[3] A minimal code sketch follows the list below.
- Extraction – During extraction, the desired data is identified and extracted from many different sources, including database systems and applications
- Transformation – After data is extracted, it must be transformed to suit the requirements using rules, lookup tables, or combining with other data
- Loading – Writing the transformed data into a target database
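Below is that sketch in Python; the source file, column names, and SQLite target are hypothetical stand-ins for real enterprise systems.

```python
import csv
import sqlite3

def extract(path):
    # Extract: read raw rows from a CSV source (file name is hypothetical)
    with open(path, newline="") as f:
        return list(csv.DictReader(f))

def transform(rows):
    # Transform: cast types and normalize values to the required format
    return [
        (int(r["id"]), r["region"].strip().upper(), float(r["revenue"]))
        for r in rows
    ]

def load(rows, db_path="warehouse.db"):
    # Load: write the transformed rows into a target table
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales "
        "(customer_id INTEGER, region TEXT, revenue REAL)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    conn.commit()
    conn.close()

load(transform(extract("sales.csv")))
```

In practice the same three steps run on dedicated ETL tooling rather than hand-rolled scripts, but the division of responsibilities is the same.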
The analytical maturity of an organization relies heavily on the systems in its data infrastructure to handle this stage of the data lifecycle. Failure here results in large amounts of data being underutilized, unable to provide the insights that every business user craves.
Importance of data infrastructure – Data storage:
The next stage, data storage, pertains to the recording (storing) of information (data) in a storage medium. Largely used to store data in relational form, databases form the foundational layer on which various data analytics software runs to serve the analytics needs of organizations.
To list a few examples of data storage means and methods:
- HDD – Hard disk drives on which large volumes of data, typically measured in terabytes (TB), are stored
- Data warehouse – A large store of data accumulated from a wide range of sources within a company and used to guide management decisions
- Cloud storage – Storage space in a commercial data centre, accessible from any computer with internet access and usually provided by a service provider; limited space may be free, with more available for a subscription fee. Examples of providers include Amazon S3, Google Drive, and Microsoft OneDrive (see the sketch after this list)
- Data lake – A storage repository that holds a vast amount of raw data in its native format until it is needed
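As a taste of how cloud storage fits into this picture, the sketch below uses Amazon S3 via the boto3 library; the bucket and object names are hypothetical, and AWS credentials are assumed to be configured already.

```python
import boto3  # AWS SDK for Python

s3 = boto3.client("s3")

# Upload a local file into the raw-data zone of a (hypothetical) bucket
s3.upload_file("sales.csv", "example-analytics-bucket", "raw/sales.csv")

# Retrieve it later, when an analysis needs the data
s3.download_file("example-analytics-bucket", "raw/sales.csv", "sales_copy.csv")
```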
Importance of data infrastructure – Data analysis:
The penultimate stage, data analysis, is the most relatable to data scientists and has been covered extensively in multiple publications. It is the inspecting, cleaning, transforming, and modelling of data with the goal of discovering useful information and interesting insights to support decision-making.
To quickly recap the types of data analysis (a toy sketch follows the list):
- Descriptive – Provides insights into the past and answers ‘What has happened?’
- Diagnostic – Delves into the reasons for a behaviour and answers ‘Why did something happen?’
- Predictive – Applies past patterns to predict the future and answers ‘What could happen?’
- Prescriptive – Advises on possible outcomes and answers ‘What should we do?’
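To make the contrast concrete, here is that toy sketch in Python, contrasting descriptive and (naive) predictive analysis on hypothetical monthly revenue figures; the linear fit is only a stand-in for a real forecasting model.

```python
import numpy as np
import pandas as pd

# Hypothetical monthly revenue figures
df = pd.DataFrame({
    "month": ["Jan", "Feb", "Mar", "Apr"],
    "revenue": [120, 135, 128, 150],
})

# Descriptive: what has happened?
print(df["revenue"].describe())

# Predictive (naive): what could happen next month?
slope, intercept = np.polyfit(range(len(df)), df["revenue"], 1)
print("Naive May forecast:", slope * len(df) + intercept)
```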
There are numerous examples of tools and technologies applied to solve these problems. From RStudio (an open-source integrated development environment for R) to Python, SAS (developed by SAS Institute), PySpark, and SPSS, the list is endless. Each, with its unique statistical computing power, data-handling capabilities, and graphics, equips data scientists with the tools required for advanced analytics.
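For instance, a minimal PySpark sketch of a distributed aggregation might look as follows; the input file and column names are hypothetical.

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("maturity-demo").getOrCreate()

# Load a (hypothetical) sales extract produced by the storage layer
df = spark.read.csv("sales.csv", header=True, inferSchema=True)

# Aggregate revenue by region across the cluster
df.groupBy("region").agg(F.sum("revenue").alias("total_revenue")).show()

spark.stop()
```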
Importance of data infrastructure – Data consumption:
The final stage, consumption, deals with presenting the results or insights generated during analysis to business users in a consumable format, so that they can optimize processes and make data-driven decisions.
Consumption largely consists of three categories:
- Reporting and BI – The analyzed data can be visualized using various Business Intelligence applications through which the business gains insights; examples include Tableau and PowerBI reports and dashboards (a small export sketch follows this list)
- Productization – Packaging analytical outputs into products, feeding insights back into the development of newer and better products
- Mobile-First – In today’s fast-moving, ‘on the go’ world, mobile-friendly reporting applications allow executives to view insights and make decisions anywhere
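As one small example of the reporting route, an analysis result can be exported as an extract for a BI tool to connect to; the data and file name below are hypothetical.

```python
import pandas as pd

# Hypothetical analysis output: revenue aggregated by region
results = pd.DataFrame({
    "region": ["NORTH", "SOUTH", "EAST", "WEST"],
    "total_revenue": [410, 385, 290, 455],
})

# Write an extract that a BI tool such as Tableau or PowerBI can connect to
results.to_csv("revenue_by_region.csv", index=False)
```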
As mentioned in our first post, the success of analytical initiatives depends on the leadership of the organization. Hence, it is vital that there is an owner (usually a CDO or CTO) with a team charged with keeping the platform clean, the processes in place, and the systems working, up to date, and compliant with governing policies.
Also, a mere focus on maintaining the status quo is a recipe for disaster and inevitable failure. There must be a dedicated innovation and exploration team that keeps up with consumers’ evolving requirements, which drive the latest trends in technology. For instance, clients’ demand for absolute control over the solutions they procure is paving the way for contextual AI. This necessitates a team that continuously works to incorporate the latest solutions whilst evangelizing them with users.
Data infrastructure forms the pillars, resting on the foundational bricks of data, on which analytically mature organizations are built. Our next part will focus on the most critical aspect of all: the people, without whom these structures stand to fail. Stay tuned for more.
Bibliography:
1. Lutkevich, Ben, and Ivy Wigmore. “What Are the 3 V’s of Big Data?: Definition from TechTarget.” WhatIs.com, March 3, 2023. https://www.techtarget.com/whatis/definition/3Vs.
2. Bridgwater, Adrian. “The 13 Types of Data.” Forbes, July 5, 2018. http://www.forbes.com/sites/adrianbridgwater/2018/07/05/the-13-types-of-data/#21ba35f33362.
3. Beal, Vangie. “What Is ETL?” Webopedia, July 22, 2022. https://www.webopedia.com/definitions/etl/.