Data Lake Is Not Exactly a Gold Mine, but…
Everybody is talking about “Big Data” and its enormous potential to optimize processes, enable machine learning, and (creepy!) bring Artificial Intelligence to life. The hype has reached a level where one is led to believe that implementing a Data Lake pretty much equals printing money. The skeptic in me not only dislikes the over-used term “Data Lake”, but also naturally doubts the immediate benefits. It therefore seems logical to look first at the data which is readily available and which may lead to natural improvements of business processes that save money in the short term.
The colossal potential of Predictive Maintenance
In fact, McKinsey revealed last year that in the area of “supply-chain management and manufacturing” the biggest potential impact lies in Predictive Maintenance, with a staggering $500bn to $700bn of positive impact (“Notes from the AI frontier”, featuring data from the McKinsey Global Institute analysis). This use case is closely followed by Yield Optimization with a potential of $300bn to $600bn, and both dwarf other scenarios such as spend analysis.
Most global companies, apart probably from those running Oracle, use SAP’s ERP solutions to manage their finances and logistics. Vital data is stored in SAP databases around the globe, ranging from financial ledgers, stocks, and assets to equipment and maintenance-related data in the SAP PM module (Plant Maintenance). Looking at the enormous potential of Predictive Maintenance and the importance of SAP data, it appears only natural to combine these for a powerful scenario.
In this blog post I will show how this can be achieved using Datavard Glue as middleware.
Let’s first take stock of the data and tools we will need for such a scenario, in three steps: data acquisition, data crunching, and the consumption of the results.
Step #1: Let there be data!
Obviously, we need data, both from SAP and from non-SAP sources. From the non-SAP area, we will need data depending on the type of business and machinery we want to implement the Predictive Maintenance scenario for. For example, we may need sensor data to determine if some machinery (e.g. a truck, a gearbox, …) is becoming louder or starting to vibrate more. Vibration sensors are readily available, and whether you use a simple solution put together with a mini-computer such as a Raspberry Pi (which one might do for a Proof of Concept) or a more out-of-the-box solution, this data can be collected easily. It can then be streamed into a data management platform for further processing (note how I’m avoiding the over-used and over-hyped term “data lake” here!).
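To make the vibration example a bit more tangible, here is a minimal sketch of how "becoming shakier" could be detected from a stream of readings: compute a rolling root-mean-square (RMS) and raise a flag once it clearly exceeds a known baseline. The function names, the window size, the factor of 1.5, and the readings are all assumptions for illustration; real sensor data will need proper calibration.

```python
import math
from collections import deque


def rolling_rms(samples, window):
    """Yield the root-mean-square of the most recent `window` samples."""
    buf = deque(maxlen=window)
    for x in samples:
        buf.append(x)
        if len(buf) == window:
            yield math.sqrt(sum(v * v for v in buf) / window)


def first_alert(samples, window, baseline_rms, factor=1.5):
    """Return the index of the first sample at which the rolling RMS
    exceeds factor * baseline_rms, or None if it never does."""
    for i, rms in enumerate(rolling_rms(samples, window)):
        if rms > factor * baseline_rms:
            return i + window - 1  # index of the newest sample in the window
    return None


# Hypothetical readings: a steady machine that starts to shake more.
readings = [1.0, -1.0] * 5 + [2.0, -2.0] * 5
alert_index = first_alert(readings, window=4, baseline_rms=1.0)
```

The same logic works whether the readings come from a file or from a live stream, because the generator only ever keeps one window of samples in memory.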
Weather data may be very important as well, perhaps even more important than sensor data. For example, TfL (Transport for London, i.e. the guys running the London Underground to “keep London moving”) found that hot weather is responsible for a large share of equipment failures. That makes a predictive scenario rather simple: grab readily available weather data from a data provider, or (heck!) simply use a thermometer, and correlate this with equipment data (read: trains, engines, signals, …), basically sorting with a “where last service is longest ago” priority, and there you go. Of course, in real life such a scenario will grow more accurate and more sophisticated (and also more complex) through the integration of Data Science (basically statistics).
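The "hot weather plus longest time since last service" idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the 30 °C threshold, the field names, and the equipment IDs are made up, and a real scenario would derive the threshold from failure statistics.

```python
from datetime import date


def service_priority(equipment, forecast_max_c, heat_threshold_c=30.0, today=None):
    """Return equipment IDs ordered by longest time since last service,
    but only when the forecast is hot enough to matter; otherwise []."""
    if forecast_max_c < heat_threshold_c:
        return []
    today = today or date.today()
    ranked = sorted(
        equipment,
        key=lambda e: (today - e["last_service"]).days,
        reverse=True,  # longest time since last service first
    )
    return [e["id"] for e in ranked]


# Hypothetical equipment records, e.g. exported from SAP PM.
equipment = [
    {"id": "SIGNAL-07", "last_service": date(2023, 1, 10)},
    {"id": "TRAIN-12", "last_service": date(2022, 6, 1)},
    {"id": "ENGINE-03", "last_service": date(2023, 3, 5)},
]

hot_day = service_priority(equipment, forecast_max_c=34.0, today=date(2023, 7, 1))
mild_day = service_priority(equipment, forecast_max_c=21.0, today=date(2023, 7, 1))
```

On the hot day the list starts with the item whose last service is longest ago; on the mild day nothing is flagged.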
From an SAP perspective, you will want to grab PM data, depending of course on what is used in SAP PM: Equipment, Assets, Functional Locations, Maintenance Orders, Maintenance Plans, Work Orders, and Measuring Points along with past measurements.
The figure below shows how Datavard Glue with its out-of-the-box content can help to bring SAP data (here with example data for SAP PM) to the data platform of your choice.
Both for non-SAP data (e.g. weather data from an external provider) and SAP data, our solution Datavard Glue can help to acquire the available data. In fact, most companies implementing a “Big Data” scenario such as Predictive Maintenance, but also others such as Customer 360, state that data acquisition from various data silos is one of the major tasks and challenges.
Step #2: Let there be algorithms!
Depending on your needs, you will want to use tools for statistical analysis (or Data Science if you wish), such as R. You will also want some tooling for data correlation, and of course for prediction. Various vendors will offer you different tools. Some platform vendors, such as Cloudera with their market-leading Hadoop platform, offer a Data Science Workbench which already integrates many such solutions.
However, to get started you need neither a major cloud service subscription nor a platform: you can get very far with readily available tools such as MS Excel, Python, and some patience. Python libraries such as “statsmodels” can help tremendously. Of course, it may also make sense to explore options for machine learning at scale (such as Spark’s “MLlib”), including unsupervised methods, but for this you will usually need some available infrastructure such as a Spark cluster.
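As a flavor of what "Python and some patience" can look like, the sketch below fits a straight line to past measurements of a measuring point and extrapolates when a limit would be reached. It is deliberately written with the standard library only; a library such as statsmodels fits the same kind of model and adds proper diagnostics. The readings, the limit, and the function names are assumptions for illustration, and a linear trend is of course a strong simplification.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def days_until_limit(days, readings, limit):
    """Extrapolate the fitted trend to the day the reading reaches `limit`,
    counted from the last measurement. Returns None if the trend is flat
    or falling."""
    intercept, slope = fit_line(days, readings)
    if slope <= 0:
        return None
    return (limit - intercept) / slope - days[-1]


# Hypothetical measurements: day of measurement and vibration reading.
days = [0, 10, 20, 30]
readings = [2.0, 2.5, 3.0, 3.5]
remaining = days_until_limit(days, readings, limit=5.0)
```

With these made-up numbers the reading grows by 0.05 per day, so the limit of 5.0 would be reached about 30 days after the last measurement.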
Of course, you could get one in the cloud, but if you just want to explore and test-drive options, you will be reluctant to spend money on a cloud account just to “play around”. The major cloud providers become interesting once you consider standardization and a rollout of your new use case.
Step #3: Let there be integration!
What you will want to look at even during a PoC are options on how to standardize a scenario such as Predictive Maintenance later on, and how to integrate the various pieces. The figure below illustrates how you can leverage an integration technology such as Datavard Glue.
In this architecture, we use Datavard Glue for four purposes:
- We make SAP data available to the data platform with delta-enabled data replication, and with real-time streaming if required. This includes “deSAP-ifying” the data with friendly field names, contextualization, and lineage, of course.
- We can leverage Datavard Glue to tap into REST APIs, e.g. to provide weather data.
- Datavard Glue can trigger scripts and actions on your data platform.
- Finally, Datavard Glue can consume the results of the data processing, e.g. to generate maintenance orders in SAP based on the predictions.
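For readers who wonder what "delta-enabled" means in practice, here is a generic sketch of timestamp-based delta extraction: only rows changed since the last run are transferred, and the newest change timestamp is remembered as the watermark for the next run. This is not how Datavard Glue implements it; the function name, the `changed_at` field, and the sample rows are assumptions for illustration.

```python
def extract_delta(rows, watermark):
    """Return the rows changed after `watermark` and the new watermark.

    Timestamps are ISO 8601 strings in the same format, so plain string
    comparison orders them chronologically.
    """
    changed = [r for r in rows if r["changed_at"] > watermark]
    new_watermark = max((r["changed_at"] for r in changed), default=watermark)
    return changed, new_watermark


# Hypothetical maintenance orders with a last-changed timestamp.
rows = [
    {"order": "4711", "changed_at": "2023-07-01T08:00:00"},
    {"order": "4712", "changed_at": "2023-07-02T09:30:00"},
    {"order": "4713", "changed_at": "2023-07-03T14:15:00"},
]

first_run, watermark = extract_delta(rows, "2023-07-01T12:00:00")
second_run, watermark_after = extract_delta(rows, watermark)
```

The first run picks up the two orders changed after the old watermark; the second run finds nothing new and leaves the watermark unchanged.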
The “data platform” in this architecture may well be a Cloudera Hadoop cluster, or BigQuery on the Google Cloud Platform (GCP), or (and that is a good way to start digging into the potential benefits of Predictive Maintenance in your company) a set of simple CSV files which you crunch with a simple Python script.
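Such a "simple Python script" over CSV files could look roughly like the sketch below, which joins an equipment file with a measurements file and lists the equipment whose average reading exceeds its limit. The column names, IDs, and values are made up for illustration, and the CSV content is embedded as strings only to keep the example self-contained; in practice you would open the exported files instead.

```python
import csv
import io
from collections import defaultdict

# Hypothetical exports: equipment master data and past measurements.
EQUIPMENT_CSV = """equipment_id,description,limit
PUMP-01,Cooling pump,4.0
GEAR-02,Gearbox,6.0
"""

MEASUREMENTS_CSV = """equipment_id,reading
PUMP-01,3.8
PUMP-01,4.6
GEAR-02,2.1
GEAR-02,2.3
"""


def equipment_over_limit(equipment_file, measurements_file):
    """Return IDs whose mean reading exceeds the limit in the equipment file."""
    readings = defaultdict(list)
    for row in csv.DictReader(measurements_file):
        readings[row["equipment_id"]].append(float(row["reading"]))

    flagged = []
    for row in csv.DictReader(equipment_file):
        values = readings.get(row["equipment_id"])  # None if never measured
        if values and sum(values) / len(values) > float(row["limit"]):
            flagged.append(row["equipment_id"])
    return flagged


flagged = equipment_over_limit(
    io.StringIO(EQUIPMENT_CSV), io.StringIO(MEASUREMENTS_CSV)
)
```

Here only the cooling pump is flagged, since its average reading of 4.2 is above its limit of 4.0.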