The synthesis of research, development and innovation has long been a feature of the University and Higher Education sector.
The relationship between these institutions and technology companies is symbiotic, with research ideas, development opportunities and collaboration flowing between the sectors all the time. Many of the best-performing companies today are those that have grown from, or are closely aligned to, the research and development culture.
Many companies are focused on building their own data science capabilities in-house. For many, however, building this capability can seem a daunting task, and the focus has often been on the scarcity of data scientists, the “unicorns” of the data world who are able to use their blend of data, programming, statistics, modelling and communication skills to transform the enterprise and deliver improved products and services.
However, the reality of transforming organisations into future focused businesses lies in adopting and adapting the methods that the researchers themselves use to produce repeatable, reliable, transferable, verifiable and robust research. What’s more, the tools that are used across both sides of the academic/business divide are the same.
Research isn’t always necessary. If you know what you are looking for, or if the theories you have only need adjusting, development work will be enough, and that is what developers generally do best: following the deductive research path.
If you really need proof, need to understand complex patterns or a different kind of model, or need to find out what you are actually looking at, then the research path will take you through the full cycle.
It’s simply a matter of following a structured research cycle and using the tools that are already to hand to enable this to happen.
The Deductive Reasoning Cycle (Analytics)
Deductive reasoning is used when we apply a theory to our data and reach a logically certain conclusion. Historical database and analytics systems utilise known algorithms to calculate the values of current observations, e.g. Tax Rates or cumulative profit.
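As a minimal sketch of this deductive step, the snippet below applies two fixed, known rules to a set of observations: a running total for cumulative profit and a flat tax rate. The profit figures, the 20% rate and the function names are all hypothetical and chosen only for illustration.

```python
from itertools import accumulate

# Hypothetical monthly profit figures (illustrative values, not real data).
monthly_profit = [1200.0, 950.0, 1430.0, 1100.0]

# Hypothetical flat tax rate standing in for a known business rule.
TAX_RATE = 0.20


def cumulative_profit(profits):
    """Running total of profit: a known rule applied to the observations."""
    return list(accumulate(profits))


def tax_due(profit, rate=TAX_RATE):
    """Tax owed on a profit figure under the assumed flat rate."""
    return round(profit * rate, 2)


running_total = cumulative_profit(monthly_profit)
taxes = [tax_due(p) for p in monthly_profit]
```

Because the rules are known in advance, the same inputs always give the same outputs; nothing about the theory itself is being tested here.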
- Make Observations:
Businesses are always making observations about the world that they operate in, and collect and create lots of data on a daily basis. This data can be structured and held in databases or spreadsheets, or less structured and be based on interactions with others.
Many businesses choose to store these observations in databases, file stores or spreadsheets, and often display the most important metrics in dashboards and reports. Together with the business rules, these embody the theory of why the observations are the way they are, for example, that a higher price leads to lower order volumes.
- Think of Questions:
Analysing data and making observations leads to questions that the business will want to answer. It may be that a dip in order volumes or a spike in fulfilment costs is observed. The answers to these questions may be easily deduced by a closer examination of the figures, or by consulting with others as to why these patterns are observed.
The research journey may stop here, and it will merely be necessary to feed back the findings to the systems, processes or people that are involved.
- Formulate Hypotheses:
For more tricky and involved questions it may be necessary to look more closely into the causes that underlie the observations we are making.
Finding the root cause may involve further analysis of the problem area, possibly by using clustering or grouping to find localised patterns, or by joining the data set in question to other datasets to clarify areas of overlapping concern. Machine learning can also be applied to the data to see if there are simple statistical correlations within the observations (using naïve Bayes algorithms to include or exclude the influence of independent factors, for example).
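The grouping idea can be sketched as follows, assuming a hypothetical set of order records tagged by region. The snippet groups the records and computes a plain Pearson correlation between price and units sold within each group. This is a simple correlation check rather than a naïve Bayes model, and the data and function names are invented for illustration.

```python
from collections import defaultdict
from math import sqrt

# Hypothetical order records: (region, unit_price, units_sold).
orders = [
    ("North", 10.0, 100),
    ("North", 12.0, 90),
    ("North", 14.0, 80),
    ("South", 10.0, 60),
    ("South", 12.0, 62),
    ("South", 14.0, 58),
]


def group_by_region(rows):
    """Collect (price, units) pairs under their region."""
    groups = defaultdict(list)
    for region, price, units in rows:
        groups[region].append((price, units))
    return dict(groups)


def pearson(pairs):
    """Pearson correlation coefficient for a list of (x, y) pairs."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    var_x = sum((x - mean_x) ** 2 for x, _ in pairs)
    var_y = sum((y - mean_y) ** 2 for _, y in pairs)
    return cov / sqrt(var_x * var_y)


correlations = {
    region: pearson(pairs)
    for region, pairs in group_by_region(orders).items()
}
```

In this made-up data the price effect is much stronger in one region than the other, which is the kind of localised pattern that a single company-wide figure would hide.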
Again, if a satisfactory explanation is found to the business domain problem this can be fed back into the observations and be used to enhance the knowledge base without needing to progress further. The theory may have been tweaked or changed a little but has been validated and shown to hold.
This is a fairly well-established process in most companies, where the observations can be deduced from theories that are already known, and it underpins most of the operations and development work that businesses do.
In complex systems there may be many different factors influencing the observed behaviour.
Inductive Reasoning Cycles (Data Science)
Inductive reasoning is where the observations we have made can be shown (with reasonable probability) to support a hypothesis. That is, the observations we make support a theory, rather than the theory dictating the observations.
People who work with observations and data all the time can often generalise about the behaviour of a system from their experience of working with it.
Sometimes it may be necessary to develop a new theory as to why certain patterns are being observed, and it will be necessary to use advanced statistical analysis techniques to see if the hypothesis and any assumptions hold true.
Statistical packages are important in doing this because they can be used to define a general theory (or model) that will (in probability) explain the observations we have and offer insights into future observations and data.
Such models can be useful predictors of behaviour, and predictive analytics systems are becoming increasingly common to help inform and guide business decision making.
- Develop Testable Predictions:
Unresolved hypotheses of system behaviour can be developed into quantifiable, testable predictions.
Thus, a hypothesis may be something like “There is a relationship between price increases and the number of units sold”, whereas a prediction would be more structured, as in “A 10% increase in retail prices in the UK market will result in a 5% reduction in sales volume and profit”.
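One way to make such a prediction testable is to write it down as numbers with an explicit tolerance. The sketch below does this for the volume part of the example only; the `Prediction` class, its field names and the one-percentage-point tolerance are hypothetical choices, not part of any standard method.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Prediction:
    """A quantified prediction about how volume responds to a price change."""

    price_change: float            # 0.10 means a 10% price increase
    expected_volume_change: float  # -0.05 means a 5% drop in volume
    tolerance: float               # assumed acceptable deviation

    def implied_elasticity(self):
        """Relative volume change per unit of relative price change."""
        return self.expected_volume_change / self.price_change

    def is_supported_by(self, volume_before, volume_after):
        """True if the observed volume change is within the tolerance."""
        observed = (volume_after - volume_before) / volume_before
        return abs(observed - self.expected_volume_change) <= self.tolerance


# The example prediction from the text, with a hypothetical tolerance.
uk_prediction = Prediction(
    price_change=0.10,
    expected_volume_change=-0.05,
    tolerance=0.01,
)
```

Stating the tolerance up front matters: without it, almost any observed change could be argued to "support" the prediction after the fact.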
- Gather More Data:
One of the major jobs of data scientists is gathering, cleansing, merging and transforming the data that will be needed to test the predictions. In this example it would be necessary to extract the sales, customer, pricing and location data from the relevant systems to produce a cleansed set of data with which to carry out a statistical analysis.
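A minimal sketch of the cleansing and merging step might look like the following. The record layouts, field names and cleansing rules (trim whitespace, normalise case, drop rows with missing units, drop exact duplicates) are assumptions made for illustration; real extracts would need rules agreed with the owners of the source systems.

```python
# Hypothetical raw extracts; field names are illustrative only.
raw_sales = [
    {"sku": " A1 ", "region": "uk", "units": "120"},
    {"sku": "A1", "region": "UK", "units": "120"},  # duplicate once cleansed
    {"sku": "B2", "region": "UK", "units": ""},     # missing value
    {"sku": "C3", "region": "UK", "units": "45"},
]
raw_prices = [
    {"sku": "A1", "price": "9.99"},
    {"sku": "C3", "price": "24.50"},
]


def cleanse_sales(rows):
    """Normalise text fields, drop unusable rows and exact duplicates."""
    seen = set()
    cleaned = []
    for row in rows:
        sku = row["sku"].strip().upper()
        region = row["region"].strip().upper()
        units = row["units"].strip()
        if not units.isdigit():
            continue  # skip rows without a usable unit count
        key = (sku, region, units)
        if key in seen:
            continue  # assumed rule: identical rows are duplicates
        seen.add(key)
        cleaned.append({"sku": sku, "region": region, "units": int(units)})
    return cleaned


def merge_prices(sales, prices):
    """Attach a price to each sales row that has a matching SKU."""
    price_by_sku = {p["sku"].strip().upper(): float(p["price"]) for p in prices}
    return [
        {**sale, "price": price_by_sku[sale["sku"]]}
        for sale in sales
        if sale["sku"] in price_by_sku
    ]


analysis_set = merge_prices(cleanse_sales(raw_sales), raw_prices)
```

Note that treating identical rows as duplicates is itself an assumption; two genuine, identical sales would be collapsed into one, so a real pipeline would normally deduplicate on a transaction identifier instead.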
Many companies will already be familiar with the process of cleansing, mastering and importing data for analysis and can obtain statistical analysis packages such as R, SAS or SPSS to assist in the data science activities.
It is also possible to incorporate qualitative data into the data modelling process, as long as it is converted into a quantifiable form first. In this way event types can be grouped and coded, and even the results of questionnaires or interviews can be incorporated, to produce richer models of system behaviour.
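Two common ways of quantifying qualitative data are sketched below: mapping questionnaire answers onto an ordinal scale, and turning a category such as an event type into indicator (one-hot) columns. The five-point scale, the event type names and the helper functions are hypothetical examples.

```python
# Hypothetical five-point agreement scale for questionnaire answers.
LIKERT_SCALE = {
    "strongly disagree": 1,
    "disagree": 2,
    "neutral": 3,
    "agree": 4,
    "strongly agree": 5,
}

# Hypothetical event types used to code customer interactions.
EVENT_TYPES = ["complaint", "enquiry", "return"]


def encode_likert(answer):
    """Convert a free-text answer on the assumed scale into a score."""
    return LIKERT_SCALE[answer.strip().lower()]


def one_hot(value, categories):
    """Represent a category as a list of 0/1 indicator values."""
    return [1 if value == category else 0 for category in categories]


responses = ["Agree", "strongly agree", "Neutral "]
scores = [encode_likert(r) for r in responses]
encoded_event = one_hot("enquiry", EVENT_TYPES)
```

An ordinal mapping like this assumes the gaps between answers are meaningful as numbers, which is a modelling decision worth stating alongside the results.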
- Refine, Alter, Expand or Reject Hypothesis:
Statistical analysis of the observation data will suggest whether or not there is a relationship in the dataset between the dependent variable (the sales volume) and the independent variables (price and location data) and, depending on the algorithms used, may suggest a generalisable finding about price sensitivity that could be incorporated into a more general theory.
Sometimes the evidence linking the observations to the predictions is non-existent or weak, in which case it is necessary to go back and reject or reframe the research hypothesis and predictions. Statistical correlations invariably contain degrees of uncertainty, and these must be taken into account by either amending the hypothesis or reducing the scope and reach of the resulting theory. As an example, the price sensitivity may only be applicable to a certain price range or location, so either the hypothesis or the resulting theory needs to be amended.
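As an illustration of this kind of analysis, the sketch below fits a straight line by ordinary least squares and reports how much of the variation it explains. It assumes sales volume is treated as the outcome and price as the single predictor, uses invented figures, and leaves out location entirely; a statistical package would normally do this work and report uncertainty as well.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def r_squared(xs, ys, slope, intercept):
    """Share of the variation in y explained by the fitted line."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


# Hypothetical cleansed observations: unit price and units sold.
prices = [10.0, 11.0, 12.0, 13.0, 14.0]
volumes = [100.0, 96.0, 89.0, 86.0, 79.0]

slope, intercept = fit_line(prices, volumes)
fit_quality = r_squared(prices, volumes, slope, intercept)
```

A negative slope with a high explained share would support a price sensitivity hypothesis for this price range only; it says nothing about prices outside the observed data.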
The data science cycle usually involves several cycles of processing, interpreting results and adjusting the predictions in order to reach a generalised model of system behaviour that appears to match the observations seen.
- Develop the Theory:
The results need to be checked for accuracy, reliability, consistency and applicability to the problem domain before being incorporated into a new theory which can be utilised.
One way in which this can be done is by using sets of data to ‘train’ the model and assess its suitability prior to incorporating it into the knowledge management systems. This trained model is then trialled with additional observation data to see if the theory that the model uses can provide useful insights.
Once the theory has been tested, the business and development teams can utilise the models created to gain new insights, make new observations and incorporate this knowledge into the interpretation of the data.
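The train-then-trial idea can be sketched with a simple holdout check: fit the model on one set of observations, then measure its error on observations it has not seen. The data, the straight-line model and the acceptance threshold below are all hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def mean_absolute_error(model, xs, ys):
    """Average absolute gap between predicted and observed values."""
    slope, intercept = model
    errors = [abs(y - (intercept + slope * x)) for x, y in zip(xs, ys)]
    return sum(errors) / len(errors)


# Hypothetical training observations: unit price and units sold.
train_prices = [10.0, 12.0, 14.0]
train_volumes = [100.0, 90.0, 80.0]

# Hypothetical additional observations held back for the trial.
holdout_prices = [11.0, 13.0]
holdout_volumes = [96.0, 84.0]

# Assumed business threshold for an acceptable average error, in units.
ACCEPTABLE_ERROR = 2.0

model = fit_line(train_prices, train_volumes)
holdout_error = mean_absolute_error(model, holdout_prices, holdout_volumes)
model_accepted = holdout_error <= ACCEPTABLE_ERROR
```

Keeping the trial data separate from the training data is the important design choice here: a model judged only on the data it was fitted to will nearly always look better than it really is.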
- Apply Theory:
Once a model has been developed it can then be incorporated back into the work systems. This may involve embedding the new algorithms or rules into systems, spreadsheets or database stores to take advantage of the new capabilities.
Summary
Developing new theories about the behaviour of systems is a cornerstone of business and product development and can help organisations to transition and transform their view of the market.
The true impact of taking a Data Science approach comes from using just enough of the science to be useful. For many day-to-day operations, bringing the full data science capability to bear on researching the behaviour of the business is too much; it is often enough to utilise existing tools, processes and people to achieve the organisation's aims.
However, when modelling complex systems it is important that Data Science is incorporated into the business in a structured way. This enables the science part of Data Science to investigate problems and issues, and to provide the advanced insights that can lead to innovative business information solutions and truly data-driven business models.