Your Future with Analytics

From New Media Business Blog



What is Analytics

Analytics is often described as an overarching concept that encompasses the processes, technologies, and techniques that help decision-makers transform data into information. According to SAS, a leader in analytics, "...it is an encompassing and multi-dimensional field that uses mathematics, statistics, predictive modelling and machine-learning techniques to find meaningful patterns and knowledge in recorded data."[1]

The Three Types of Analytics

Descriptive Analytics

Also known as data mining or business intelligence, this is the most basic form of analytics. It helps to answer the question "What has happened?" Descriptive analytics examines live and historical data to understand events that have already occurred, whether the data was collected years ago or five seconds ago. The majority of analytics that organizations use falls into this category.[2]

Predictive Analytics

Building on the understanding of "What has happened?" gained from descriptive analytics, predictive analytics answers "What could potentially happen based on the trends we have analyzed?" To predict the probability of a future outcome, it applies various statistical and machine learning algorithms. Because predictive analytics is based on probabilities, however, its predictions are never 100% accurate.

Prescriptive Analytics

Using the information gathered from descriptive and predictive analytics, the purpose of prescriptive analytics is to advise users on the possible actions they could take and the outcomes that may result. Users of prescriptive analytics typically seek to maximize efficiency and key business metrics, and ultimately to optimize the business. It helps answer the question "What should the business do?"

To answer this question, prescriptive analytics simulates the future with a set of assumptions that come from the two previous types of analytics. It is a combination of data, mathematical models, algorithms, and business rules (such as preferences, best practices, and constraints).

Evolution

Analytics was used prior to the development of computers, spreadsheets, and even pencils. Analytics evolved throughout the course of history, starting with the barter economy through to big data analytics.[3]

Barter Economy

Prior to having currency and a monetary exchange system, analytics was present in the barter economy.[3] People of this time period tracked who had what using wood, stones, or markings on walls. This system allowed wealthy individuals to find arbitrage opportunities, trading an item multiple times in order to gain more wealth.

Industrial Era

With innovations in manufacturing processes and the development of roads, railroads, and oil, there were many opportunities to use analytics to increase efficiency and productivity. During this time, Frederick W. Taylor developed the first formalized system for business analytics, called the “System of Scientific Management.” Taylor recorded production techniques and the body movements of workers to identify time-saving efficiencies, which resulted in increased production.[3]

Using this research, he became a consultant to Henry Ford and contributed to the development of the assembly line system, revolutionizing the manufacturing industry. During the industrial era, analytics strongly focused on identifying efficiencies, increasing quantities, and decreasing production costs.

Operational Reporting

Operational reporting is still used in many functional areas of today's businesses. It is characterized by analytical reports that are specific to each business unit and not shared company-wide.[3] Prior to digitization, these handwritten reports were used to record data about specific areas in order to identify relevant efficiencies, which created silos between departments. Overall, it was difficult to identify past trends and integrate different reports because many of them were paper-based.

The Beginning of Digital

During the 1970s, computers started to become the norm in large organizations and the use of analytics was led by the Decision Support System (DSS).[4] A DSS is an information system that supports business decision making in a changing environment by sorting and filtering large amounts of data. This system also allowed users to create operational reports specific to certain time ranges and pull data from different functional areas into one report. This allowed business leaders to improve their decision-making capabilities while creating a more holistic view of the business.

The Information Age

The Information Age occurred throughout the 1980s and into the 1990s, and is characterized by the rapid increase in computer storage capacity, consistent with Moore's Law. This allowed data warehouses to save large amounts of reports and to use historical data to generate insights. This data could be analyzed to determine market trends, estimate growth, and identify opportunities. These reports became a standardized way for shareholders to better understand a business and determine whether or not to invest.

Microsoft Excel was built on the DSS platform in 1985 and is a tool that lets users filter, sort, and create formulas to manipulate data.[5] With the introduction of this software, many businesses transitioned to Microsoft Excel over traditional reporting systems. It remains a universal tool that can serve as both an input of data and an output of results for many different programs, such as accounting software.

Google Analytics provides users with the capability to analyze their website, iOS, and Android traffic, allowing them to develop metrics and identify opportunities for their product, all at no cost.[6] Examples include audience demographics, click-through rates, and which devices access their site most often. Google Analytics was revolutionary because the platform is accessible to any household or business connected online.

New Age Analytics

With the introduction of programs such as SAP Analytics Cloud, Tableau, and IBM Digital Analytics, business analytics has evolved and created new capacities.

Currently, these are the most prominent uses of analytics:

  1. Analyzing the past: Although we mainly use deductive tools to examine the past, advanced analytical tools can be used to help model the past. There are many questions that are seemingly simple but are difficult to answer because they involve the interactions of multiple variables. For example, "Why did revenues drop last month?"
  2. Optimizing the present: Once a model is created to include historical data and business users understand the relationships among key variables, that information can be used to help optimize the present. For example, a market basket model can help retailers organize store layouts to maximize their revenue (a minimal sketch of this idea follows this list).
  3. Predicting the future: By continually applying the model to new information, predictive analytics can guess with a reasonable degree of accuracy the result of a decision. For example, whether a customer may respond positively to the new store layout or an in-store promotion.
  4. Testing assumptions: Advanced analytics can also be used to test assumptions about what drives the business. For example, before spending millions on a store redesign, a retailer might test an assumption that customers prefer to see more expensive items on the right side of the store.[7]
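To make the market basket idea in point 2 concrete, here is a minimal, hypothetical sketch (the transaction data and item names are invented for illustration): it counts which pairs of products are most often bought together, a signal a retailer might use when deciding which items to place near each other.

from collections import Counter
from itertools import combinations

# Hypothetical point-of-sale transactions (each list is one shopping basket).
transactions = [
    ["bread", "butter", "jam"],
    ["bread", "butter"],
    ["coffee", "milk", "bread"],
    ["coffee", "milk"],
    ["bread", "butter", "milk"],
]

# Count how often each pair of items appears in the same basket.
pair_counts = Counter()
for basket in transactions:
    for pair in combinations(sorted(set(basket)), 2):
        pair_counts[pair] += 1

# The most frequent pairs are candidates for adjacent shelf placement.
for pair, count in pair_counts.most_common(3):
    print(f"{pair[0]} + {pair[1]}: bought together in {count} baskets")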

Predictive Analytics

Predictive analytics is a way to predict future events using historical and transactional data.[8] It does not identify exactly what will happen in the future, but it aids decision makers in forecasting future events with a high degree of reliability and allows for risk assessment. It is typically used in organizational decisions, using mass amounts of data collected over time and complex algorithms.

Process

Figure 1: "The Predictive Analytics Process Must Be Continuous To Ensure Effectiveness," a process diagram from Forrester Research, Inc.[9]
  • Understand data: When using predictive analytics, it is critical that users understand the data before applying it, as the insights drawn from the models are only as strong as the data fed into the algorithms.[8] Predictions can be wrong if the data is not properly understood before it is used. Users need to ensure that the data contains all relevant information possible.
  • Prepare data: This might include data cleaning, ensuring its consistency, its accuracy, and that it is up-to-date.
  • Model: The prepared data is then fed through a model. There are many different modelling approaches to predictive analytics that can help determine many different use-cases.
  • Evaluate: Statisticians and data scientists will evaluate the models, and business analysts may also contribute from their perspective.
  • Deploy: The given information from the model is then deployed to the correct stakeholders in order to carry out the use-cases.
  • Monitor: Users must continue to monitor the data and the models to ensure they are current and displaying what is intended. If there are bugs or errors, the models could produce incorrect predictions.

The process must be a continuous and iterative cycle in order to be accurate. It allows organizations to become proactive and forward-thinking, anticipating outcomes and behaviours based on past data. The result is fewer assumptions and more certainty and clarity for decision-makers.
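A minimal sketch of the prepare, model, and evaluate steps above, assuming scikit-learn and pandas are available (the churn-style records and column names are invented for illustration): part of the historical data is held out to check how well the model predicts before it is deployed, and monitoring would repeat the same check as new data arrives.

import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Prepare data: a tiny, invented set of historical customer records.
data = pd.DataFrame({
    "monthly_spend": [20, 85, 40, 95, 15, 70, 30, 90, 25, 60],
    "support_calls": [5, 0, 3, 1, 6, 1, 4, 0, 5, 2],
    "churned":       [1, 0, 1, 0, 1, 0, 1, 0, 1, 0],  # known past outcomes
})

X = data[["monthly_spend", "support_calls"]]
y = data["churned"]

# Hold out part of the history to evaluate the model before deployment.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# Model: fit a simple classifier on the training portion.
model = LogisticRegression().fit(X_train, y_train)

# Evaluate: check accuracy on the held-out data; retrain if accuracy drifts over time.
print("Held-out accuracy:", accuracy_score(y_test, model.predict(X_test)))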

Example of Predictive Analytics

Figure 2: Google Maps and traffic.[10]

Google Maps and traffic. This is a map going from SFU Vancouver to SFU Burnaby. The estimated time, high traffic areas, and places with delays are determined using predictive analytics. It uses historical data of that day years past, traffic at the time of the search, and factors in real-time accidents and delays based off of GPS and news. [11] This benefits commuters, delivery service industries, car-sharing, and the general community by showcasing which areas are congested that can be avoided or rerouted. With this information, affected individuals can make an informed decision about what route to take and avoid to get to their destination.

Data Mining

Data mining helps businesses find meaningful patterns and relationships from their copious amounts of data. Companies data mine using statistical, mathematical, and artificial intelligence techniques to find useful insights from their daily transactions.

There are three main categories of data that companies will track to uncover insights:

  1. Usage data: includes the actions that users take when using the software. This includes who is using the software, what buttons and pages the user has clicked, when the user has completed the action, how long it takes to perform the action and any errors or tasks encountered.[12]
  2. User data: describes the demographic information of each individual customer. This includes the psycho-graphic and personal information of each user. Examples of this are: education, city, what the user has purchased, etc.[12]
  3. Corporate data: includes information related to larger business customers, such as the number of customers using the software product, the number of people working in the organization, etc.[12]

Process of Data Mining

Figure 3: The process of data mining.[13]

According to the Cross-Industry Standard Process for Data Mining (CRISP-DM) Reference Model[14], there are six phases of data mining activities: business understanding, data understanding, data preparation, modeling, evaluation and deployment.

  1. Business understanding: The first step in data mining is to identify the main goal of the project. In this stage, a project plan is created, determining what is needed and what steps are necessary to complete the data mining, as well as the criteria that define success.[15]
  2. Data understanding: The second step is finding insights in the data through querying, data visualization, and simple reporting. During this stage, users will be examining the quality of the data and ensuring that it is suitable to achieve the goals set out in the previous stage. The user needs to identify what type of data is needed: usage data, corporate data, or user data[14].
  3. Data preparation: The third step is determining which data sets will be used for analysis by limiting the analysis to data that is relevant to the objective. This is also when data cleaning is necessary to ensure quality. The user can also blend different data sets in this step to come up with more insights and information[16].
  4. Modeling: In this step, users select the modeling techniques that best suit the business goals, ensuring that the modeling assumptions are clearly defined and recorded. They then create a process to test the model's quality and validity: normally the data set is separated into a train set and a test set, the model is built on the train set, and its quality is estimated on the separate test set (a minimal sketch of this split follows the list). Once the model has been built and run against the data, users interpret the resulting models according to domain knowledge and the data mining measures of success defined during business understanding.[14]
  5. Evaluation: During evaluation, users check the degree to which the data model and the business objectives align and whether there are any issues with the model. If the final models appear to meet the business requirements, they review the data mining engagement to see whether any factors have been omitted.[14]
  6. Deployment: In the final deployment stage, users determine a plan for implementing the findings. Once the results are deployed, they create a plan to monitor and maintain daily operations. Afterwards, they assess what went right and what needs to be improved for next time.[14]
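A minimal sketch of the preparation and modeling phases, assuming pandas and scikit-learn are available and using invented data sets and column names: two data sources are blended on a shared key, cleaned, and split into train and test sets so the model's quality can be estimated on data it has not seen.

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Data preparation: blend two invented data sets on a shared key and clean them.
usage = pd.DataFrame({"user_id": [1, 2, 3, 4, 5, 6],
                      "clicks": [10, 3, None, 25, 7, 18]})
users = pd.DataFrame({"user_id": [1, 2, 3, 4, 5, 6],
                      "city": ["Vancouver", "Burnaby", "Surrey",
                               "Vancouver", "Burnaby", "Surrey"],
                      "purchased": [1, 0, 0, 1, 0, 1]})

blended = usage.merge(users, on="user_id")
blended["clicks"] = blended["clicks"].fillna(blended["clicks"].median())  # cleaning

# Modeling: build on the train set, estimate quality on the held-out test set.
X = blended[["clicks"]]
y = blended["purchased"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=1)

model = GaussianNB().fit(X_train, y_train)
print("Test-set accuracy:", model.score(X_test, y_test))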

Predictive Modeling

Predictive modeling predicts outcomes using statistics.[17] It follows this process:

  1. Sample data: Use data that describes the problem with known relationships between inputs and outputs
  2. Learn a model: Using the sample data, apply an algorithm to create a model
  3. Make predictions: The model can then be applied to new or unknown events
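A minimal sketch of these three steps, assuming scikit-learn is available and using invented numbers: the sample data pairs advertising spend with sales, a linear model learns the relationship, and the model then predicts sales for a spend level it has never seen.

from sklearn.linear_model import LinearRegression

# 1. Sample data: known inputs (ad spend, in $000s) and known outputs (sales, in units).
ad_spend = [[10], [20], [30], [40], [50]]
sales = [120, 210, 310, 405, 500]

# 2. Learn a model: fit an algorithm to the sample data.
model = LinearRegression().fit(ad_spend, sales)

# 3. Make predictions: apply the model to a previously unseen input.
print("Predicted sales at $35k spend:", model.predict([[35]])[0])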

Example of Predictive Modeling

Figure 4: Sentiment analysis used in a Twitter election study.[18]

A study was conducted by Andranik Tumasjan, Timm O. Sprenger, Philipp G. Sandner, and Isabell M. Welpe to analyze whether Twitter could be an accurate source for predicting election results, particularly the German federal election in 2009. They created algorithms, collected a dataset of over 100,000 political tweets (104,003 tweets), and fed them through their algorithm.[18]

Figure 4 depicts a sentiment analysis, one of the models created in the study using the sample data. Each coloured line represents one of the top five candidates in Germany at the time, and the markings on the web show how voters' tweets perceived them. By pooling the data together, anyone could see which candidate voters felt most negatively about (Westerwelle) or which seemed the most tentative (Steinmeier). This algorithm and model predicted, with some accuracy, who would win before the election results were released.[18]
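The study itself used specialized text-analysis software, but the underlying idea of scoring tweets against a sentiment lexicon can be sketched in a few lines of plain Python (the tweets, lexicon words, and scores below are invented for illustration):

# Invented mini-lexicon mapping words to sentiment scores.
lexicon = {"great": 1, "strong": 1, "hope": 1, "weak": -1, "scandal": -1, "fail": -1}

# Invented tweets mentioning a hypothetical candidate.
tweets = [
    "great debate performance, strong answers",
    "another scandal, such a weak response",
    "still some hope but the plan may fail",
]

def sentiment(text):
    # Sum the scores of known words; unknown words contribute nothing.
    return sum(lexicon.get(word.strip(",."), 0) for word in text.lower().split())

scores = [sentiment(t) for t in tweets]
print("Per-tweet scores:", scores)
print("Average sentiment:", sum(scores) / len(scores))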

Machine Learning

Machine learning is a subset of artificial intelligence that often uses statistical techniques to give computers the ability to "learn", that is, to progressively improve performance on a specific task using data, without being explicitly programmed.[19] Machine learning algorithms find patterns within data and help predict the future. This improves decision-making because the machine identifies all the likely outcomes based on the constraints outlined in the algorithm, without human error. From there, decision-makers can analyze how the results fit their business case and assess the risks.


Supervised Learning

In supervised learning, the algorithm finds patterns (and develops predictive models) using both input data and output data.[21] There are two possible outcomes:

  • Classification: Items in a collection are assigned to target categories or classes. The goal is to accurately predict the target class for each case in the data. For example, spam filtering is a form of classification, where the inputs are email messages and the classes are "spam" and "not spam" (a minimal sketch follows this list).[21]
  • Regression: This is a data mining technique used to predict a range of numeric values, also called continuous values, given a particular dataset. For example, regression might be used to predict the cost of a product or service, given other variables.[21]
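A minimal sketch of supervised classification in the spirit of the spam example above, assuming scikit-learn is available and using invented toy features (counts of suspicious words and exclamation marks rather than real email text): the algorithm learns from inputs and known labels, then predicts the class of a new message.

from sklearn.naive_bayes import MultinomialNB

# Invented training inputs: [count of suspicious words, count of exclamation marks]
# and known outputs (1 = spam, 0 = not spam) for each email.
X_train = [[8, 5], [6, 3], [7, 6], [0, 0], [1, 1], [0, 2]]
y_train = [1, 1, 1, 0, 0, 0]

# The algorithm learns from both inputs and known outputs (supervised learning).
classifier = MultinomialNB().fit(X_train, y_train)

# Predict the class of a new, unlabelled email.
new_email = [[5, 4]]
print("spam" if classifier.predict(new_email)[0] == 1 else "not spam")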

Unsupervised Learning

In unsupervised learning, the algorithm finds patterns (and develops predictive models) using only input data.[21] This technique is useful when decision-makers are exploring outcomes. The outcome is usually a form of cluster analysis, which is the process of grouping abstract objects into classes of similar objects; each cluster of data objects can then be treated as one group.
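A minimal sketch of cluster analysis, assuming scikit-learn is available and using invented customer data (annual spend and number of visits): k-means groups the points into clusters without ever being told what the "right" groups are.

from sklearn.cluster import KMeans

# Invented input data only: [annual spend in $, store visits per year]; no labels.
customers = [[200, 2], [220, 3], [250, 2],        # occasional shoppers
             [1500, 25], [1600, 30], [1450, 28]]  # frequent shoppers

# Ask for two clusters; the algorithm finds the grouping on its own.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(customers)

print("Cluster assignments:", kmeans.labels_)
print("Cluster centres:", kmeans.cluster_centers_)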

Approaches to Machine Learning

There are dozens of techniques and approaches to machine learning to solve problems, with new ones being developed constantly. This is not an exhaustive list, but here are the three most popular approaches:

  1. Decision tree learning: Decision tree learning uses a decision tree as a predictive model, which maps observations about an item to conclusions about the item's target value. Given a scenario (including the probabilities of events A, B, and so on occurring), it determines the probability of each possible outcome.[22] The intent of decision tree learning is to identify different outcomes with different dependencies, and to show how the outcomes change depending on the actions taken in the first few decisions. When a computer is given the dataset, it identifies the possible outcomes and the probability of each occurring, in other words, which result is most likely to occur (a minimal sketch follows this list).
  2. Artificial neural networks: An artificial neural network (ANN) learning algorithm, also called a "neural network" (NN), is a learning algorithm in which the computer's calculations are interconnected in a fashion similar to the human brain.[23] Modern neural networks are non-linear statistical data modeling tools. They are typically used to model complex relationships between inputs and outputs, to find patterns in data, or to calculate unknown joint probabilities between observed variables.
  3. Bayesian networks: A Bayesian network is a probabilistic model that represents a set of random variables and their conditional independencies using a graph.[24] For example, it could be used to represent the probabilistic relationships between diseases and symptoms. Given symptoms, the network can be used to compute the probabilities of the presence of various diseases.
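A minimal sketch of decision tree learning, assuming scikit-learn is available and using an invented loan-approval dataset: the printed rules show how early splits (here on income) change the predicted outcome, and the probabilities show which result is most likely for a new case.

from sklearn.tree import DecisionTreeClassifier, export_text

# Invented data: [annual income in $000s, years at current job] -> loan repaid (1) or defaulted (0).
X = [[30, 1], [35, 2], [40, 1], [80, 5], [90, 7], [85, 6]]
y = [0, 0, 0, 1, 1, 1]

tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)

# Probability of each outcome for a new applicant, and the learned decision rules.
print("P(default), P(repaid):", tree.predict_proba([[50, 3]])[0])
print(export_text(tree, feature_names=["income", "years_at_job"]))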

Other popular techniques:

  • Ensemble models: Ensemble models train multiple similar models and combine the results of each to improve accuracy, reduce bias, reduce variance and identify the best model to use with new data.[21]
  • Gradient boosting: This method resamples the data set several times to generate results that form a weighted average of the resampled data set.[21]
  • Incremental response (also called net lift or uplift models): These model the change in probability caused by an action. This is commonly used to assess marketing programs.[21]
  • K-nearest neighbour (knn): This method can perform classification or regression; it predicts an object’s values or class membership based on the k closest training examples (a minimal sketch follows this list).[21]
  • Partial least squares: This technique can be applied to any data set. It can model relationships between inputs and outputs when inputs are correlated, or if there are multiple outputs, or if there are more inputs than observations. This method finds factors that explain both response and predictor variations.[21]
  • Principal component analysis: The purpose of this technique is to find a small number of independent combinations in a set of variables that retain the most amount of information from the original variables.[21]
  • Support vector machine: This supervised learning technique uses associated algorithms to analyze data and identify patterns. It can be used in classification and regression.[21]
  • Time series data mining: Time series data is time-stamped and collected over time at a particular interval. For example, sales in a quarter, calls per month, web visits per day, etc. This method combines data mining and forecasting techniques. These techniques are applied to historical data to improve predictions.[21]
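A minimal sketch of k-nearest neighbour classification, assuming scikit-learn is available and using invented data: a new point is assigned the class most common among its k closest training examples.

from sklearn.neighbors import KNeighborsClassifier

# Invented training examples: [feature 1, feature 2] with known class labels.
X_train = [[1, 1], [1, 2], [2, 1], [8, 8], [8, 9], [9, 8]]
y_train = ["low", "low", "low", "high", "high", "high"]

# Classify a new point by majority vote among its 3 nearest neighbours.
knn = KNeighborsClassifier(n_neighbors=3).fit(X_train, y_train)
print(knn.predict([[7, 7]]))  # -> ['high']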

Business Intelligence

Business Intelligence is defined “as a system comprised of both technical and organizational elements that present its users with historical information for analysis to enable effective decision-making and management support, with the overall purpose of increasing organizational performance.”[25] Corporations invest millions of dollars in business intelligence tools to give business analysts the ability to make data-driven decisions, monitor key performance indicators, and gain insights on how to improve their results.


What is Business Intelligence (BI)?[26]


Top Business Applications

  • Measurement: Enterprises use BI to create a hierarchy of performance metrics and benchmarking. This information gives executives and managers insights about their progress towards their business objectives.[1]
  • Analytics: Companies use analytics to build quantitative processes so that they can make data-driven decisions and uncover insights in the data they collect. This process includes data mining, process mining, statistical analysis, predictive analytics, predictive modeling, business process modelling, data lineage, complex event processing, and prescriptive analytics.[1]
  • Reporting and enterprise reporting: Businesses use reporting as an infrastructure for communicating their progress toward their strategic goals. This process includes data visualization, executive information systems, and Online Analytical Processing (OLAP).[1]
  • Collaboration platform: This component gets different areas outside and within the business to work together with data-sharing and electronic data interchange. For example, companies use programs like Slack or Yammer to collaborate.[1]
  • Knowledge management: Companies remain competitive and data-driven by using strategies and practices to identify, create, represent, distribute, and enable adoption of insights and experiences that are true business knowledge. Knowledge management is crucial for learning management to ensure their skills are up to date, knowledge transfer within teams, and regulatory compliance.[1]

Financial Planning

Many organizations fail to execute their strategy due to misaligned resources, thereby creating a need for intelligent planning and analysis. Financial planning is understanding a business’ current and future financial state in order to predict the effect of known variables on future cash flows and outcomes.[2] This can create an overall structure for a business and help determine the allocation of resources in order to achieve operational and strategic goals.

The basic tools used for financial planning include balance sheets, income statements, and cash flow statements. Financial planning can also identify potential risks, allowing users to make data-driven decisions by running different financial simulations.[3]

Simulations

Simulations help manage risk and cash flows by predicting different outputs as inputs change. This is done by creating relationships between known inputs, assigning mathematical equations to describe the relationships, and generating predicted future outcomes.[4]
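A minimal sketch of this kind of simulation, using only Python's standard library and invented assumptions about revenue growth and costs: running many randomized scenarios shows the range of likely cash-flow outcomes rather than a single point estimate.

import random
import statistics

random.seed(42)

def simulate_year(revenue=1_000_000, fixed_costs=600_000):
    # Invented relationships: growth is uncertain, variable costs track revenue.
    growth = random.gauss(0.05, 0.10)          # mean 5% growth, 10% std dev
    variable_costs = 0.30 * revenue * (1 + growth)
    return revenue * (1 + growth) - fixed_costs - variable_costs

# Run many scenarios and summarize the predicted cash-flow outcomes.
outcomes = [simulate_year() for _ in range(10_000)]
ranked = sorted(outcomes)
print("Expected cash flow:", round(statistics.mean(outcomes)))
print("5th-95th percentile:", round(ranked[500]), "to", round(ranked[9500]))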

In order to test the quality of a financial simulation model, programs such as KNIME[5] allow the user to partition data into a "learning" data set and a "testing" data set. Once the program has been run, quality scores are given to the financial model, allowing the user to determine how reliable its outcomes are.[6] It is important to note that the quality of the input data determines the quality of a simulation's output.

Value Driver Trees

Value Driver Trees (VDT) are a simulation tool that builds connections with different functional areas of a business in order to run simulations and showcase how a change in one area can impact other areas.[7] Examples of questions that VDTs can answer include: What if I fired half my employees? What if I gave everyone a raise? What if I discontinued my products?

VDTs are able to do this by assigning numeric data to different nodes, such as data source nodes, union nodes, and calculation nodes.[8] VDTs typically start with data source nodes that represent raw data which are then built upon. The outcome is a financial model that is able to understand relationships between different data points.
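A minimal, hypothetical sketch of the node idea (the node names and figures are invented for illustration): data source nodes hold raw figures, a union node combines them, and calculation nodes derive higher-level metrics, so changing one driver ripples through the tree.

# Data source nodes: raw figures pulled from (hypothetical) functional areas.
nodes = {
    "online_revenue": 4_000_000,
    "store_revenue": 6_000_000,
    "headcount": 50,
    "avg_salary": 60_000,
}

def total_revenue(n):        # union node: combines the revenue sources
    return n["online_revenue"] + n["store_revenue"]

def labour_cost(n):          # calculation node
    return n["headcount"] * n["avg_salary"]

def operating_profit(n):     # calculation node built on other nodes
    return total_revenue(n) - labour_cost(n)

print("Baseline profit:", operating_profit(nodes))

# "What if I gave everyone a raise?" -- change one driver and re-run the tree.
nodes["avg_salary"] = 66_000
print("Profit after 10% raise:", operating_profit(nodes))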

Data Visualization

Data visualization is the graphical representation of information and data. It can range from a simple bar chart depicting the most popular pizza toppings, to a subway map colourfully representing an entire city’s transit routes, to a complex matrix linking the connections made between people on social media.[9] By using visual elements, data visualization offers an accessible way to see and understand trends, outliers, and patterns in data.
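A minimal sketch of the simplest case mentioned above, assuming matplotlib is available and using invented topping counts:

import matplotlib.pyplot as plt

# Invented survey results: votes for each pizza topping.
toppings = ["Pepperoni", "Mushroom", "Pineapple", "Olives"]
votes = [42, 27, 15, 9]

plt.bar(toppings, votes, color="steelblue")
plt.title("Most Popular Pizza Toppings")
plt.ylabel("Votes")
plt.savefig("toppings.png")  # write the chart to a file instead of opening a window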

History

The development of data visualization is often linked to recent advances in modern statistics. However, the practice of conveying quantitative information visually has existed for centuries.

The early history of data visualization encompasses cartography, statistics and statistical graphs, and several applications in science and medicine. Its rising adoption can also be attributed to the rise of statistical thinking and the vast amounts of data collected for planning and commerce throughout the 19th century.[10] Since then, the sophistication of data visualization has vastly improved to include better tools for reproducing images, data collection and observation, mapping, and more.

Figure 5: Charles Joseph Minard's map of Napoleon's Russian campaign of 1812. This graph features two dimensions of six types of data about Napoleon’s troops and the external factors affecting them.
Figure 6: The time distribution of events considered milestones in the history of data visualization.

Examples of Data Visualizations

Here are some examples of data visualizations from Duke University:

  • Bar chart: Excel, Google Charts, Tableau Public, SAP Analytics Cloud
  • Time series: Timeplot, Google Charts, Tableau Public, SAP Analytics Cloud
  • Word cloud: Wordle, Many Eyes, d3
  • Tree map: Google Charts, SAP Analytics Cloud

Best Practice Resources

The Future of Data Visualization

In the past few decades, the emergence of accessible and affordable analytical tools has contributed to the popularity of data visualization. As it becomes easier to store and process data, careers for data scientists, business analysts, and product marketers who understand analytics will grow, and these roles will require experience with data visualization. Here are a few predictions of what may be in store for the future of data visualization:[11]

  • Better tools: As it becomes easier to access, store, and interact with data, the demand for better tools will increase. These tools will have to do more than just create bar charts. They will be expected to connect to a myriad of data sources, create an intuitive user experience, and automatically analyze the data so that users can focus on the insights the software surfaces.

    Having better tools allows business users to think about why something has happened after being given the statistics instead of spending their valuable time calculating the statistics.

  • Data visualization for the masses: When better tools are developed, not only will it become easier for more people to gain access but people with varying skill sets will be able to start visualizing their data. The task of creating visualizations will not be automatically delegated to a specialist and IT teams will not be burdened with retrieving information from specific databases. Overall, the foundation of knowledge in data visualization will increase and help reduce inefficiencies within organizations who previously overburdened specific teams with creating data visualizations.
  • Constant access to data: With more online platforms being adapted for the mobile experience, there will be an expectation for business users to be able to access their data visualizations on-the-go as well. Whether it’s portable data, portable dashboards, or portable visualizations, having constant access can help business users make important data-driven decisions wherever they are.


Implementation

Excel spreadsheets alone are not robust enough to handle the large volumes of data produced by large companies. Analytics tools for corporations are delivered either via the cloud, on-premise, or a combination of both (hybrid). Cloud analytics software is hosted on the vendor’s data servers, and users access the data through their web browser. On-premise software, on the other hand, is installed locally on the user’s computers and data servers. A third option, hybrid, combines cloud and on-premise deployment to cater to a business's unique needs.

Cloud

Generally, cloud software is priced through a monthly or annual subscription, which usually includes the cost of product training and support. One of the large benefits of cloud software is its low initial cost compared to on-premise solutions. Data security is managed by the vendor, who usually abides by high security standards. Since the solutions are more standardized, there is greater stability, and updates are included in the subscription pricing, so there are no unexpected additional costs. Cloud software also takes significantly less time to get started and implement because it can be deployed through a web browser.[12]

On-Premise

Software that is implemented on-premise involves a larger upfront investment, and users have to invest in hardware and IT. However, by keeping their software on-premise, companies are able to manage their security requirements themselves and adjust them as needed. With on-premise software, there is a greater ability to customize the software to the business’s own specifications, and organizations have more control over how the implementation is carried out. However, implementation takes longer because it is so personalized.[12]

Hybrid

Hybrid deployment is a combination of cloud and on-premise implementation. Many organizations need to be agile and adapt to the market’s needs, which is why they may implement a cloud solution. However, some companies (namely governmental organizations) are restricted by compliance, data security, and delivery models, and cannot fully move to the cloud. In situations like these, companies can keep sensitive data stored locally on-premise while their day-to-day data is stored in the cloud, where it is easily accessible.[12]

Older companies that already have on-premise software deployed can still access their legacy applications using a hybrid deployment. However, due to the high amount of customization, hybrid deployments tend to be the most costly.

Analyst/Advisory Firms

A technology analyst/advisory firm breaks down the strengths and areas for improvement of an industry’s top products, and provides customers interested in implementing technology with insight into where a product’s roadmap and vision are leading. Analyst reports can have a large influence on buying decisions in the technology industry and in other sectors.

Analysts go through an extensive process to acquire all the information about a product. This could take the form of a survey, a request for information (RFI) or a request for proposal (RFP), surveys sent to customers who are references for the product, video demos, briefings, and more. This ensures the analysts have a sound understanding of the product before evaluating it.[13]

Two examples of analyst/advisory firms are Gartner and Forrester: Gartner publishes the “Magic Quadrant” and Forrester publishes the “Wave”. Once the information is compiled and evaluated, the analysts rank each vendor in terms of capabilities, value, future outlook, and other criteria.

Competitor Analysis

Figure 7: Gartner Magic Quadrant for BI and Analytics in 2018.

Figure 7 is an example of the Gartner Magic Quadrant. Gartner goes through a process to gather information about the product in question, typically information about the company, the product, and its customers, along with evidence for all three. The information is then assessed and scored on a points system that determines where the company is placed on the visual survey result, the Magic Quadrant.[14]


According to Gartner, there are four quadrants: Niche Players, Visionaries, Challengers, and Leaders.


  • Leaders execute well against their current vision and are well positioned for tomorrow.
  • Visionaries understand where the market is going or have a vision for the changing market rules, but do not execute well yet.
  • Niche Players focus successfully on a small segment, or are unfocused and do not out-innovate or outperform others.
  • Challengers execute well today or may dominate a large segment, but do not demonstrate an understanding of market direction.

The report published by Gartner presents the quadrant as a summary; the details of each company are listed later in the report with a description of the company and the product, as well as its strengths and cautions.[14]




Industry Use Cases for Analytics

Harrah’s Las Vegas (Industry: Tourism): large hotel and casino
  • Application: Used the Total Gold customer loyalty program to track customer analytics
  • Impact: Harrah’s Laughlin gained 14% in revenues[15]
Rolls Royce (Industry: Automotive): luxury car manufacturer and retailer
  • Application: Used visualization tools to explore and understand big data
  • Impact: Reduced costs by diagnosing faults, correcting them, and preventing them from reoccurring; streamlined production processes by removing past faults from the design process[16]
AirBnB (Industry: Hospitality): online platform for people to rent or lease properties
  • Application: Used Google Tag Manager to track conversions
  • Impact: Improved vendor data collection to 90%; 8% improvement in page load time[17]
Carnival Cruises (Industry: Tourism): cruise line that operates worldwide
  • Application: Applied data analytics to price optimization projects to increase profit
  • Impact: Dynamically changes prices daily for beds in one city versus another to maximize profits[18]


How analytics cured cancer.[19]


Professional Careers in Analytics

Some professional careers that are dependent on analytical skills include the following:

For each job below, example hiring companies and the average salary (USD) are shown in parentheses, followed by key skills.

Data Scientist (Google, Microsoft, Adobe; $118,709)
  • Distributed computing
  • Predictive modelling
  • Visualization
  • Math, statistics, and machine learning
Data and Analytics Manager (Slack, Google, Facebook; $116,725)
  • Database systems
  • Leadership and project management
  • Data mining and predictive analytics
Data Architect (Visa, Coca-Cola, Logitech; $100,118)
  • Data warehousing solutions
  • ETL and BI tools
  • Data modeling
  • Systems development
Data Engineer (Spotify, Facebook, Amazon; $95,936)
  • Database systems (SQL and NoSQL)
  • Data modeling and Extract, Transform, Load (ETL)
  • Data APIs
  • Data warehousing solutions
Statistician (LinkedIn, PepsiCo, Johnson & Johnson; $75,069)
  • Statistical theories
  • Data mining and machine learning
  • Hadoop
  • Database systems
  • Cloud tools
Database Administrator (Reddit, Tableau, Twitter; $67,672)
  • Backup and recovery
  • Data modelling and design
  • Hadoop
  • Database systems
  • Data security
  • ERP and business knowledge
Business Analyst (Uber, Dell, Oracle; $65,991)
  • Basic tools (MS Office)
  • Data visualization tools (Tableau)
  • “Storytelling”
  • BI and data modelling
Data Analyst (IBM, HP, DHL; $62,379)
  • Spreadsheet tools (e.g. Excel)
  • Database systems
  • Communication and visualization
  • Math, statistics, and machine learning

Information from dataversity.net

*National average salaries in the US, in USD


Future Outlooks

The term “Big Data” represents the rapidly growing amount of untapped data collected in existing analytical applications and data warehousing systems. To tackle this overwhelming collection of data, three things will emerge and reinforce one another: the Algorithmic Economy, the Analytics Architecture, and the Internet of Things (IoT).

Algorithmic Economy

Algorithms are currently used in processes that allow software to evolve past its original intent, such as in machine learning, artificial intelligence, and predictive analytics. The algorithms within software may soon be worth more than the software itself as software is becoming cheaper and more accessible.[1] Developers may begin selling the algorithms separately from software in marketplaces. The development of an algorithmic economy will derive from the need to create algorithms that are able to utilize some of the big data collected from sensors, applications, social media, and the internet of things.

Analytics Architecture

Analytics architecture refers to the data and platforms available for a business to perform analytics, such as the infrastructures, applications, data warehouses, and processes.[2] Currently, many businesses use a centralized enterprise database to consolidate data prior to performing analysis. The current architecture is manageable; however, as more data becomes available and the tools to analyze data evolve, the architecture will become increasingly complex. There will be an influx of connections within and outside the company’s architecture. Organizations will begin working together in order to complement each other's analytics solutions, creating better and faster insights.[3]

The Internet of Things (IoT)

IoT refers to everyday objects with embedded computing devices, allowing them to collect and transmit data over the internet.[4] Examples include smart TVs, digital assistants, and smart thermostats. As computing devices become smaller and cheaper to manufacture, more items will have computing devices embedded into them. With more items transmitting data to central databases, there will be greater amounts of data to analyze and a need for better algorithms.[5] The IoT will impact every industry as businesses look to understand this data and respond in real time. Businesses will need to learn how to integrate IoT data with traditional data collection methods to develop insights.

How it works: Internet of Things.[6]

All three of these concepts will create a reinforcing cycle: the IoT will collect information that enters the algorithmic economy for analysis, thereby increasing the complexity of businesses’ analytics architecture and enabling the creation and enhancement of IoT devices. As many of these processes become automated, processing speed will need to increase as data and information are constantly transmitted between platforms.

Drawbacks and Limitations

  • History cannot always predict the future: Using historical data to predict the future assumes that the historical data was complete and sound and that it can predict what may come. It also assumes that circumstances will not change, which is almost always wrong when the system involves people.[1][2]
  • The issue of unknown unknowns: A model is only as good as the data inputted, and if that is not complete, the model will misinterpret the data. Even if the data is extensively collected, there is always the possibility that new variables can arise or that the probabilities between variables will change with time, which can be critical to the outcome.[1]
  • Self-defeat of an algorithm: After an algorithm becomes the one solution to problems, it can be taken advantage of by people who understand the algorithm and have the incentive to manipulate the outcome.[1]

Security

Data Breaches

Using analytics requires a lot of data, which translates to more personal information being gathered, stored, and analyzed by businesses. In 2017, US companies took an average of 206 days to detect a data breach, compromising valuable information and costing companies millions of dollars.[3] The delay in breach detection can be attributed to software vulnerabilities, lack of access controls, third-party errors, inadequate network security, and insider threats.[4] Currently, there are few government regulations protecting personal information and governing how that information is used. With organizations creating lengthy terms and conditions that many users blindly accept, there is a risk that sensitive information could be vulnerable.

Data encryption transforms data into a code to prevent unauthorized access. Many organizations are integrating encryption into their security systems as data breaches become harder to detect.
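A minimal sketch of symmetric encryption, assuming the third-party Python `cryptography` package is installed: data encrypted with the key is unreadable to anyone who does not hold that key.

from cryptography.fernet import Fernet

# Generate a secret key; in practice this would be stored securely, not in code.
key = Fernet.generate_key()
cipher = Fernet(key)

record = b"customer_id=4821, card_ending=1234"
token = cipher.encrypt(record)   # ciphertext is unreadable without the key
print(token)

print(cipher.decrypt(token))     # only the key holder can recover the original data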

Data Rights and Ownership

Many people are not aware of how much data smartphones and other technology transmit. For example, when a user sets a destination in Google Maps, the app transmits the user’s location in order to find the optimal route. Once data has been transmitted, it is unclear who owns it. Currently, as outlined in the terms and conditions of many social media sites, content created by users is owned by the specific user; however, the social media site is able to use that content without permission and without paying the user.[5]

An example of the complexity of data ownership can be found in the case of Microsoft v. United States. Microsoft came under fire in a drug trafficking investigation for handing over only data located on US servers and not data held on servers located elsewhere, specifically in Dublin, Ireland, where the information the US government needed was stored. As this case remains unresolved, it illustrates the need for governments to evolve with technology.

Authors

Colin Chu, Cynthia Lee, Kathy Lee, Sarah Xu
Beedie School of Business
Simon Fraser University
Burnaby, BC, Canada


References

  1. Bari, Anasse; Chaouchi, Mohamed; Jung, Tommy. "The Limitations of the Data in Predictive Analytics." https://www.dummies.com/programming/big-data/data-science/the-limitations-of-the-data-in-predictive-analytics/
  2. "The Limitations of the Data in Predictive Analytics." https://www.dummies.com/programming/big-data/data-science/the-limitations-of-the-data-in-predictive-analytics/
  3. IT Governance USA. "How long does it take to detect a cyber attack?" https://www.itgovernanceusa.com/blog/how-long-does-it-take-to-detect-a-cyber-attack/
  4. Symmetry. "Cyber security threats: detection takes long." https://symmetrycorp.com/blog/cyber-security-threats-detection-takes-long/
  5. "Are you ready? Here is all the data Facebook and Google have on you." (2018). The Guardian. https://www.theguardian.com/commentisfree/2018/mar/28/all-the-data-facebook-google-has-on-you-privacy