Blog Post
What is BI? What is data analytics? What is big data?
Author
Mandy Doward
Date
April 18, 2016
Length
20 mins
What is BI? What is data analytics? What is big data and what can I use it for?
BI (Business Intelligence), data analytics, big data – we hear these buzzwords all the time. But what do they mean? How can they benefit your business? And can you live without them?
The reality is that most businesses have been using business intelligence (BI) and data systems in some form for decades, but many probably had fairly silo’d, manual setups. As time has progressed, data has become so fundamental we’ve come up with new terminology for it - and new job roles, like Business Intelligence Analyst and Data Scientist. Here we’ll cover the basics and answer a few of the most frequently asked questions about data and BI…
What do you know about your business?
Can you see at a glance if you're meeting performance goals?
Do you know who your best clients are? Do you know who your worst clients are?
Do you know which products or services are your most successful?
Can you predict your busiest and quietest times?
Do you know the characteristics of your most profitable or most challenging clients?
Which is your most successful target audience?
Are you expediting your services in the required time frame?
Do you have the necessary information available to you when you need it, to make the best decisions for the future of your business?
I could go on. You know what information you ideally need to be able to make your business more successful. Your historic and live business data holds many stories and a wealth of information that could help you fine tune your future business activities to be more informed, efficient, and successful. Do you have this information at your fingertips? Can you access it without technical support?
BI and data analytics provide the answers
Business Intelligence is an umbrella term for the components and services that create, store, interpret and present your business data in a way that enables you to gain insight. Data analytics is about presenting information in a clear and visual way, to enable business users and stakeholders to fully understand their business.
A Business Intelligence & Data Analytics solution will deliver valuable business information and insight, when and where you need it. It doesn’t matter where your live data is located, it can all be accessed from a central BI solution.
The BI world is rich with jargon, so you may be familiar with:
Business Intelligence
Data Analytics
Data Warehouses
Data Marts
Staging Databases
Relational Databases
Multidimensional Databases
Cubes
Big Data
ETL
Dashboards
Data Visualisations
Insights
Impressions
Data Governance
Data Quality
Self-Service BI
Team BI
Corporate BI
Data Mining
Pivot Tables & Charts
On-Premise
Cloud
Analytics Ecosystem
These are just some of the terms you will hear in the world of data today. We'll explain the key terms in this blog, but if you come across others and need help, please do get in touch.
What can BI and data analytics do for my business?
A well-designed BI and Data Analytics solution will:
Eliminate guesswork
Deliver faster answers to business questions
Provide customer behaviour insight
Identify cross-selling and up-selling opportunities
Improve efficiency
Show the true costs of your business
Show where your business has been, where it is now and where it might be going
Help you make faster, more informed, and more effective business decisions
Highlight new insights
Reduce fraud
Help you measure risk more accurately
Implement accurate predictive analysis
Support better budget management
Increase revenue
Ensure you focus resources and reduce costs
What is BI?
Business Intelligence (BI) is all about seeing the stories that your business data can tell! More than just extracting information from business data, really digging deep and identifying trends, successful activities, measuring against goals and predicting future events.
This involves extracting data from data sources, transforming the data to a more useful format, loading the data into a more suitable repository and then analysing and mining the transformed data.
In short, stored information will be used to analyse historic and current data to:
Identify trends in business activities
Predict future business activities
Assess the business performance
Measure successes and failures.
Business Data may be located in a variety of data sources from complex relational database structures to more straightforward flat file style data structures. These data sources could be distributed across different systems and different networks. Today we also face the challenge of “Big Data” resources containing vast quantities of unstructured (No SQL) data – I will talk more about this later. Here are a few examples of data sources:
Relational (SQL) Databases
NoSQL Databases
Hadoop
Spreadsheet
Document Management System
Proprietary Business Systems
Legacy Systems
Third Party Systems
Exported Text Files
Business Information needs to be collated and derived from all the available sources and for this reason it might be much simpler if all the data could be converged into a central information repository.
Once all relevant data has been gathered and stored centrally, the real magic can begin. Relationships must be formed between entities within the business and the numeric values that we are measuring, new business values derived from stored facts will need to be generated. What do we mean by numeric values? Well here are a few examples:
Financial Values
Sales Revenue
Cost of Sale
Running Costs
Income
Measurements
Length
Volume
Area
Temperature
Counts of records/events
Number of customers
Number of orders
Number of callouts
Number of incidents
Average values
Highest values
Lowest values
And what do we mean by entities? Well these are the things that we are measuring by, for example:
Time
Location
Product
Service
Customer
Supplier
Employee
Vehicle
Device
Promotion
So, we have all of our data in one place and we've identified the values, entities and relationships within the data. Now to illustrate it. We want to present our business data with clear, understandable, attractive visuals that people will engage with. This is the data analytics part.
Tables
Matrices
Charts
Dashboards
Reports
Excel Pivot Tables & Charts
Slicing and Dicing
BI components explained
Business Intelligence Systems will generally comprise some or all of the following components:
Data
OLTP/OLAP Databases, XML, CSV, Excel Workbooks, Access Databases Data Sources
Data Warehouse
Data Marts
Multidimensional Cubes or Models
Data Mining Structures
Reporting Systems
Client Analysis Tools such as Excel Pivot Tables & Charts, PowerBI, PowerPivot
On Premise or Cloud-based Repositories
To support and implement the above components the following services will be required:
ETL Service
Database Management System (DBMS)
Multidimensional Service
Reporting Service
On Premise or Cloud-based storage location
A BI system will enable ordinary users - people who simply need to be able to access the business information - to view, refresh and generate reports without the need to become database specialists.
In many cases technical authors will design the reports and schedule them for refresh with subscriptions enabling users to keep up to date with the latest versions of reports. They would have the option of viewing reports through an easy to use web front end, a shared network location or via email.
The BI system will also enable more skilled staff to implement Data Mining to analyse business data and analyse business trends, and make future predictions on the basis of these trends.
Data sources
We have established already that our data is varied in structure as well as location. Here I will define the main categories of data sources:
Relational (SQL) database
No SQL Database
Flat File Data
Multidimensional Database
Relational Database
A Relational Database stores data in tables where each table holds records that represent a single object, for example: a customer, a product, an order, a device. Tables consist of columns and contain records (or rows). Relational databases are perfect for working with structured data where the bulk of relationships are of a one to many nature:
one customer places many orders
one device generates many status messages
one vehicle makes many journeys
The structure of the data is known prior to loading data and therefore tables with columns can be created providing a rigid structure before data is loaded. The structure can be changed, but this invariably needs changes to applications and the loading or creation of missing data, which may result in some downtime.
No SQL Database
A No SQL Database addresses the challenge of a massive volume of data that is largely unstructured and prone to rapid change. A relational database cannot cope with such demands and if we try to use them for this purpose performance will be far short of acceptable given the volume of data involved and changes to the structure of the data would take far too long to roll out.
The key features of a No SQL Database are:
More scalable than Relational Databases and offer a much better level of performance.
Can be distributed over many geographic locations rather than being stored on one single location. Cloud computing provides the storage infrastructure for this.
Do not require that a schema is rigidly defined before data is loaded as is the case with a Relational Database. This means that as new attributes are identified the schema can be changed on the fly rather than having to add a new column to a table which might then upset existing applications.
Allows application changes to be made in real-time, whereas changes to relational database applications tend to require some downtime.
Flat File Data
Flat File Data is stored as a single flat record (no tables and lots of data duplication for certain attributes). Flat file data could include any of the following:
CSV, TSV or Delimited File
Fixed Width Text File
Excel workbook/Spreadsheet
XML file
Where the business data source is a legacy or proprietary system, data can generally be exported or extracted from these systems as text. The text may be delimited by any chosen character – the most common being a comma (CSV) or a tab (TSV) – and these text files can then be imported into a central repository.
Another common export and import format for proprietary business systems is XML, a tagged language.
Excel workbooks may be completely flat file in nature or may be more like a table, with data distributed over several worksheets.
Multidimensional Database
Multidimensional databases are designed around analysis requirements as opposed to the business process requirements that a relational database might provide. They are capable of holding millions or billions of records and calculating aggregate values very quickly.
In a multidimensional database we will work with Cubes, Measures and Dimensions, and Dimensions will consist of Hierarchies, Levels and Members. The term Multidimensional database is sometimes used to describe a Relational Database that has been denormalised and designed around aggregate analysis needs. Such databases would be termed also as Online Analytical Processing (OLAP) Databases and would adopt a star-like schema consisting of Fact and Dimension Tables. This database would then be a suitable feed for a Cube based multidimensional database.
Whereas with Relational Databases and NoSQL Databases records may change, data in a Multidimensional is predominantly historic and not changed after it is loaded, although this may not always be the case.
Extract, Transform, Load (ETL)
A complete BI solution requires processes and tools to extract and import the business data from its various data sources, to transform the data into a format that can more easily be interrogated and to generate reports to present information in a readable and usable format.
Data warehousing
A BI system enables global data to be centralised, merged and consolidated in one central location known as the data warehouse. The data warehouse stores static, read-only facts (although sometimes it might change over time it is predominantly read-only historic facts) often pulled from or derived from many different business sources. The underlying structure of the Data Warehouse database may be quite complex, although in general not as complicated as the highly normalised OLTP (Online Transaction Processing) databases that they are constructed from, and doesn’t present the data in a suitable format for business analysts or decision makers.
There are different styles of Data Warehouse design with one of the most common designs being the Multidimensional Database.
The Data Warehouse database (OLAP Database) will typically aggregate tables from OLTP databases implementing a star or snowflake schema, making the design less complex with fewer tables, but more repeated data. The tables will also be split into Fact Tables and Dimension Tables as mentioned earlier.
Populating data warehouses
Data Warehouses will generally be populated through an Extract Transform Load (ETL) system. This system will take data exported from a data source and will transform it to enable it to be imported into a Data Warehouse structure. This invariably involves cleansing data, converting data types, converting data from relational OLTP structures to Data Warehouse structures such as star schemas, and verifying the loaded data.
Data analysis
Data analysis is the process of building aggregations on the Data Warehouse data and drilling down into the business data to present management information, business trends and business predictions.
Reporting
Reporting is the process of presenting information to the end user. Business Information can be deployed to those that need it via email, network share locations or web interfaces to reports.
Mining
Data Mining is the art of looking beyond the surface and finding the stories that your historic data can tell.
Server solutions
Corporate BI solutions will require a centrally deployed server based solution.
Client tools
A common term used today when discussing BI solutions is Self Service BI. This term refers to a tool that enables a user to find out the stories behind their business data without the need to become a technical architect or specialist.
What is cloud computing?
As Microsoft helpfully explains, cloud computing is "the delivery of computing services - including servers, storage, databases, networking, software, analytics, and intelligence - over the Internet. There's much more on that here, if you're interested - Cloud computing explained.
What is data analytics?
Data Analytics describes the process of asking questions about your business. Think about those questions I listed at the beginning of this article, such as “Do you know which is your most successful product or service?”. A Data Analytics solution is part of a BI solution. It comprises client tools that enable us to ask these questions and delivers the answers often in a visual and succinct way.
What is big data?
Big data simply means the huge volumes of business data that you and your business need to include in your analysis. It's often unstructured, and it could be internal to your organisation, but it's often external – located in the cloud.
Businesses utilise a growing number of cloud-based services which generate massive volumes of data, all of which is valuable to the business and needs to be analysed in some way. Big data is often in need of cleansing before analysis as we are probably trying to use this data in a way for which it was not originally intended to be used.
Big data is as it says – very large volumes of data. The challenging thing about big data is:
It's often external to the company
It's often cloud-based
It's very dynamic
It's generally unstructured data. For example...
Text messages
Social Media data
Website content
It's not intended to be a data analysis source
Standard traditional BI tools simply cannot cope with the sheer size of the data
A large amount of Big data is text based.
Some examples of system data that would be classified as Big data:
Equipment D=data
Equipment status messages
Social Media
Twitter
Facebook
Google+
Linked-In
Search Engine data
Google Analytics
A traditional Data Warehouse is not appropriate for centralising and hosting big data. Some data analytics tools such as Microsoft’s Power BI provide integration for big data sources so the client can pull data from many sources into a suite of dashboard reports, but given that Big Data sources often require a certain amount of cleansing before they are suitable for analysing this may not be the best approach.
More and more businesses today are turning to hadoop, an Open Source platform for storing and processing big data.
Many people will also include traditional transactional database data as a part of the big data picture.
In that sense big data is an all encompassing term for all of our data :
Structured (relational database), semi-structured (XML) and unstructured (Word, PDF, Text)
Internal (On-Premise) and external (Cloud).
Big data platforms are available from vendors such as Amazon, IBM, Microsoft and Oracle, and fall into a couple of categories:
Operational
Analytical
Operational big data systems
Operational big data systems create data in real time with user interaction. The data is unstructured data such as that produced by Social media services and is hosted in a No SQL Database. These systems have the following features:
Read and write requests.
Many concurrent sessions (potentially hundreds of thousands)
Selective queries
Accessed by users/customers
MongoDB is an example of an Operational Big Data system.
Analytical big data systems
Analytical big data systems provide complex analysis services, services that are complementary to analysis services provided through SQL based systems. These systems have the following features:
Read requests only
Few concurrent users
Unselective queries
Accessed by Data Scientists
Massively Parallel Processing (MPP) and MapReduce systems are examples of Analytical Big Data Systems.
Hadoop
Hadoop is an Open Source offering written for GNU/Linux (although it is also available for Windows) that provides a Hadoop File System (HDFS), the MapReduce engine and a suite of tools and is fast becoming a key component in a corporate Business Intelligence solution, sitting alongside the traditional Data Warehouse rather than replacing it.
Hadoop is a framework consisting of four key components:
Hadoop Common: The common utilities that support the other Hadoop modules.
Hadoop Distributed File System (HDFS): A distributed file system that provides high-throughput access to application data.
Hadoop YARN: A framework for job scheduling and cluster resource management.
Hadoop MapReduce: A YARN-based system for parallel processing of large data sets.
Being open source it is an economical solution enabling unstructured data as well as structured data to be brought into a central HDFS resource. Schemas are interpreted when read rather than a pre-existing and rigid structure being required (as in the case of a relational database Data Warehouse), which is perfect for unstructured data, ever changing, such as social media data. HDFS can potentially be distributed over thousands of computers, thus distributing load.
In its native open source form a high degree of programming will be required to configure and implement a Hadoop solution, but there are a number of suppliers who are providing a more shrink wrapped offering of Hadoop.
Suppliers such as SAS, Tableau, Microsoft and Oracle provide integrations with hadoop to enable their analytic tools to be used against a Hadoop data source.
What is self-service BI?
A full blown Corporate BI solution can take years to research, plan, design and implement and requires highly skilled staff with expertise in many areas. Many of the larger organisations will have dedicated teams focussing on a Corporate BI strategy with centralised data, analytics and reporting and highly skilled technical staff and statistical analysts.
For many organisations, though, they need to get going a little quicker with analysing their data and they don’t have statistical analysts or BI staff at their disposal. For them a Self-Service BI solution is more appropriate.
In a self-service solution the client analytic tools are used to converge the data from different sources, create relationships and produce dashboards, KPIs, matrices, tables and reports. But everything is structured in a way that is relatively easy for a business user to take on themselves.
A self-service BI tool will:
Provide an easy to use user interface
Integrate with the back end data sources containing business data
Enable the user to produce dashboards
Provide a front end to producing queries against business data sources
A self-service BI solution should enable a user to design, create and publish their own reports and analytical findings with no need for the intervention of an IT team that can prove to be a bottleneck in churning the required information out.
Common self-service BI tools include PowerPivot & Power BI from Microsoft, and Tableau Desktop from Tableau.
BI vendors
Many vendors offer products to satisfy Business Intelligence Needs so choosing the right tool set for your business can be a challenge. Here are some of the vendors who offer BI technology solutions:
Oracle – Oracle Business Intelligence 12c
Microsoft
Qlik
Tibco
Tableau
MicroStrategy
OpenText
IBM
Information Builders
SAP
SAS
Their offerings range from Personal (self-service) BI solutions to full corporate BI solutions.
So, how on earth do you choose a solution for your business needs? They all seem to claim to offer pretty much the same features.
I think the first question to ask yourself is “Do I already have any BI software or servers in my business?” If you have any Microsoft SQL Server databases in your organisation then the answer will be yes. Microsoft is one of the few vendors who include their Business Intelligence services with their Database Engine platform.
Many businesses are turning to Microsoft for a Business Intelligence solution as it offers a budget friendly solution if your company already uses SQL Server.
Summary
We've simply scratched the surface of BI, data analytics and big data here, but hopefully you now have a clearer idea of what it's all about.
It's what businesses have been doing for years, but now instead of trying to do all the clever analytical stuff in an Excel spreadsheet that needs to be updated weekly, monthly, quarterly, or annually, there are fantastic tools at our fingertips giving us live dashboards that update with our data. This gives businesses the power to make much faster and more effective decisions, fully informed and back up by data.
Instead of business users having to gather Excel workbooks or CSV files from various parts of the business to import into our magic Excel workbooks we have centralised repositories automatically updated and permanently available.
The sheer array of products and services available may make it complex to implement, but the benefit of having such a complex and powerful architecture is that we have valuable business information at our fingertips and whenever we need it. A fully optimised business, and data-driven decision making at its best. We'd love to help you craft a bespoke solution for your business, so get in touch.
Discover lots more information elsewhere in our blog.
If you have any questions do get in touch at info@ptr.co.uk, and if you would like some assistance in identifying BI and Data Anyltics solutions that might suit your needs our business intelligence consultancy service is here to help.
References
With so many conflicting definitions of Business Intelligence and Data Analytics terms, and so many offerings from different vendors it is a confusing world for anyone trying to assess their needs for their business. I have come across a few helpful documents/articles that are worth a read:
Gartner’s Magic Quadrant for Business Intelligence and Analytics Platforms
Business Intelligence, Data Analytics and Data Mining Solutions Directory
The Best Self-Service Business Intelligence (BI) Tools of 2016
Eight Considerations for Utilizing Big Data Analytics with Hadoop from SAS.
Share This Page
Mandy Doward
Managing Director
PTR’s owner and Managing Director is a Microsoft MCSE certified Business Intelligence (BI) Consultant, with over 30 years of experience working with data analytics and BI.
PTR FAQs
See our FAQs or get in contact for more information on any of our services, solutions, or to start your PTR journey.