Business reporting is hardly possible without data from SAP systems. Still, professionals in business reporting and data warehousing tend to fall into two groups: those with an SAP background, and those with a technical or non-SAP background.

If you’re coming equipped with an SAP background (like myself), you will be very familiar with the complex relational data model of the system that used to be called SAP R/3 (nowadays S/4HANA), as well as with business processes and enterprise use cases. If you have a non-SAP background, you will probably know more about streaming data, sensor data, data lakes, and maybe even machine learning.

Personally, I have an SAP background. For all kinds of reporting there used to be strong SAP-based solutions. In the old days, these were ABAP Query, LIS, and CO-PA, followed by the SAP Business Warehouse, which of course was a giant leap forward. Nowadays, a similar leap forward is happening in the integration of SAP data and data lakes.

Data lakes and SAP systems – two different worlds?

When I was looking at the challenge of integrating data lakes with SAP data, I was a bit dumbfounded by two strong opinions which struck me as … let’s simply say odd.

  • I found it strange that people were doubting the use of a data lake for reporting purposes.
  • I found it equally strange that some people were even considering using the SAP Business Warehouse or SAP HANA for non-SAP data.

The second scenario is interesting: SAP BW is built for highly structured data coming from SAP transactional systems. SAP HANA is lightning fast and can handle all kinds of data (including unstructured data). However, using such technologies to basically do the job of a data lake cannot be right. Considering TCO, available tools and platforms, and skills on the market, there are strong reasons for using SAP data on a data lake (on the Hadoop distribution of your choice, e.g. Cloudera), rather than the other way round, loading data lake data into BW or HANA. If you are thinking “SAP data should stay within the SAP technology stack”, you may want to consider SAP BDS (formerly Altiscale) or HANA Vora, of course.

As a consequence, there is only one option: bringing the data generated by transactional business processes in the SAP world into the data lake. This is where SAP and non-SAP data should be combined. Such an integrated data lake can then be used for advanced data processing, for data marts, and as a next-generation business warehouse.

Hortonworks, recently (much to my delight) acquired by Cloudera, has been advocating such an approach for a while already. This seems like the natural way to go: establish a multi-purpose data lake and use it for scenarios ranging from pure data storage (e.g. archived data, aged out of SAP) to data science, machine learning, and AI-driven data computations. Along the way, reporting capabilities are a natural must-have to visualize and consume the data.

Integrating SAP with data lakes – where to start?

When it comes to the integration of SAP data and data lakes, a typical first step is to use the data lake simply for storing aged SAP data to reduce the overall SAP TCO. This is a good first step, because you deliver results fast by “picking the low-hanging fruit”. You can easily use a data lake for ERP archives and aged SAP BW data, for example.

This diagram shows the required connections and building blocks for a modern integration architecture between SAP and Data Lakes:

This is relatively easy and fast to achieve and provides tangible TCO savings. A welcome side effect is that once this kind of integration is established, the ground is already prepared for more advanced use cases: the technology integration is there, and you are ready to enrich your data lake with SAP data.
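To make the idea of "aged" data a bit more concrete, here is a minimal Python sketch of the selection step: records older than a cutoff date are marked for offloading to the data lake, while the rest stay in the SAP system. The field names (`doc_id`, `posting_date`, `amount`), the sample records, and the cutoff date are all made up for illustration. A real archiving setup would rely on SAP's own archiving and data aging mechanisms rather than on a script like this.

```python
from datetime import date


def split_by_age(records, cutoff, date_field="posting_date"):
    """Split records into (hot, aged).

    A record counts as aged if its date is strictly before the cutoff.
    The field name is a hypothetical placeholder, not an SAP field.
    """
    hot, aged = [], []
    for rec in records:
        if rec[date_field] < cutoff:
            aged.append(rec)
        else:
            hot.append(rec)
    return hot, aged


# Made-up sample documents, purely for illustration.
documents = [
    {"doc_id": "1001", "posting_date": date(2012, 3, 15), "amount": 250.0},
    {"doc_id": "1002", "posting_date": date(2018, 11, 2), "amount": 99.9},
    {"doc_id": "1003", "posting_date": date(2009, 7, 30), "amount": 1200.0},
]

# Everything posted before 2015 would be a candidate for the data lake.
hot, aged = split_by_age(documents, cutoff=date(2015, 1, 1))
```

The aged records would then be written to cheap data lake storage, and the hot records remain where they are.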

Obviously, that leaves several challenges for SAP customers:

  1. There is a major knowledge gap between experts in the SAP world and experts in the Big Data world. For SAP experts who need to work with a data lake, this gap is mainly of a technical nature. This is a big challenge, of course. In the opposite direction, the challenge is even bigger: on top of the technology, there are customer-specific business processes combined with a complex transactional data model in SAP.
  2. There are major technology gaps which need to be bridged: processing engines, different storage technologies, security, software logistics, and the question of where to run the platform: in the cloud or on-premise.
  3. Finally, there is the challenge of identifying and accessing the relevant data in SAP. There are several ways of doing this, of course, all with their own pros and cons. You can access ERP directly, or you can use SAP BW as a kind of pass-through. You can use a combination of SLT with SAP Data Services, or a third-party ETL tool. In all cases, the challenge remains to identify the right data and to treat it correctly when it comes to access control, authorizations, and personal data protection.
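
As a small illustration of the personal data protection point in the last item, the following Python sketch replaces personal fields with keyed hashes before a record is loaded into the data lake. It is only one possible approach under stated assumptions: the field names, the sample record, and the secret key are hypothetical, and a real project would also need proper key management and an assessment of which fields count as personal data.

```python
import hashlib
import hmac

# Hypothetical list of fields considered personal data in this example.
PERSONAL_FIELDS = {"customer_name", "email"}


def pseudonymize(record, secret_key, personal_fields=PERSONAL_FIELDS):
    """Return a copy of the record with personal fields replaced by
    HMAC-SHA256 tokens. The same input always yields the same token,
    so joins on the pseudonymized field still work in the data lake."""
    result = {}
    for field, value in record.items():
        if field in personal_fields:
            digest = hmac.new(secret_key, str(value).encode("utf-8"), hashlib.sha256)
            result[field] = digest.hexdigest()
        else:
            result[field] = value
    return result


# Made-up sample record and key, for illustration only.
order = {
    "order_id": "4711",
    "customer_name": "Jane Doe",
    "email": "jane@example.com",
    "net_value": 320.5,
}
masked = pseudonymize(order, secret_key=b"demo-key-not-for-production")
```

Business fields such as `order_id` and `net_value` pass through unchanged, so reporting on the data lake still works, while the personal fields are no longer readable.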