SAP data tends to be complicated and cryptic

SAP’s ERP data model consists of tens of thousands of database tables. Here is an example to illustrate the complexity of SAP systems: At Datavard, we’re happily running SAP’s S/4HANA to manage our business processes, and at last count our production system held 121,512 tables in the SAP Data Dictionary (many of them, of course, empty because they are industry- and process-specific and therefore not used). With this much complexity, identifying the correct subset of data for your use case (e.g. churn reduction) and contextualizing that data appropriately is one of the harder parts of any data lake integration.

Contextualization of data

To make matters more complicated, data from transactional SAP systems is typically rather cryptic, with a large number of fields containing abbreviations, flags, and pointers to other tables. Such data should be translated into user-friendly values that can be used directly, without constant lookups into other tables. After all, you will not want to rebuild the complete SAP data model by joining dozens of tables together on Hadoop every time you query the data. The technology used for tapping into SAP data clearly needs to support such contextualization, which is sometimes also referred to as de-normalization of data.
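To make the idea of contextualization concrete, here is a minimal sketch in Python. It assumes a tiny, made-up set of material records and two hypothetical lookup dictionaries; the field names only mimic the style of SAP ERP column names, and the sketch does not represent how any particular ETL tool does this. It resolves a coded field into a readable text at load time, so that queries on the data lake need no further joins.

```python
# Illustrative sample data only: the rows are made up, and the field names
# (MATNR, MTART) merely mimic the style of SAP ERP column names.
materials = [
    {"MATNR": "100-100", "MTART": "FERT"},
    {"MATNR": "100-200", "MTART": "ROH"},
    {"MATNR": "100-300", "MTART": "ZXYZ"},  # code without a text entry
]

# Hypothetical text lookups that would normally live in separate tables.
material_type_texts = {"FERT": "Finished product", "ROH": "Raw material"}
material_descriptions = {
    "100-100": "Pump, cast steel",
    "100-200": "Steel casing blank",
}


def contextualize(row, type_texts, descriptions):
    """Return a de-normalized copy of row with readable texts added."""
    enriched = dict(row)  # copy, so the source row stays untouched
    # Keep the original code and place the text next to it.
    enriched["material_type_text"] = type_texts.get(row["MTART"], "(unknown)")
    enriched["description"] = descriptions.get(row["MATNR"], "")
    return enriched


contextualized = [
    contextualize(row, material_type_texts, material_descriptions)
    for row in materials
]

for row in contextualized:
    print(row)
```

Keeping the original code next to its text is a deliberate choice in this sketch: downstream users get readable values, while the raw key remains available for filtering or for tracing a record back to the source system.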

Choosing the source of data: SAP BW

This raises the question of which system should serve as the source of the data. Wouldn’t it be much easier to simply access data through SAP’s Business Warehouse? After all, some of the required data discovery, contextualization, and cleansing is already performed during the load of ERP data to SAP BW. Both options, ERP and BW, come with advantages and disadvantages as the source of SAP data.

Using the SAP BW system as the source of the data offers several advantages.

SAP Business Warehouse provides a generic data model with attributes, hierarchies, and texts. Data is available at a granular level through InfoObjects, which are in turn used in InfoProviders such as InfoCubes and (advanced) DataStore Objects, or (A)DSOs. Because InfoObjects carry clear names, identifying the right data is easier than working with SAP ERP field names. Finally, software logistics and change control may be simpler when accessing data from BW instead of the transactional ERP system. BW systems are usually not considered “systems of record”, so simpler solutions for challenges such as authorizations, SOX compliance, export control, etc. tend to be available for BW when compared to ERP.

There are some disadvantages when tapping into BW, however, and ultimately you will need to weigh them against the advantages mentioned above. The most obvious one is latency: ERP data first needs to be loaded from ERP to BW, which usually happens on a daily basis. Therefore, no (near) real-time use of SAP data on your data lake is possible when loading data from BW.

Even worse, you will find that not all relevant ERP data is available in SAP BW. You may easily get 80% or more of the data you need, but when implementing complex scenarios you may find the data in the Business Warehouse lacking. Finally, failed or erroneous ERP loads to SAP BW cause tremendous pain when BW acts as a pass-through between ERP and your data lake. Whenever loads to SAP BW fail, or are contaminated with wrong data, you will need to roll back and troubleshoot not only the SAP BW data, but also the data on the data lake.
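One common way to keep such rollbacks manageable is to tag every row on the data lake with the identifier of the load that delivered it, so that a faulty load can be removed as a unit. The following Python sketch illustrates the idea with a toy in-memory table. The class `LakeTable`, its methods, and the load identifiers are hypothetical names chosen for this example; it is not a description of how SAP BW or any specific tool handles requests.

```python
from collections import defaultdict


class LakeTable:
    """Toy in-memory stand-in for a lake table partitioned by load ID."""

    def __init__(self):
        self._partitions = defaultdict(list)

    def append(self, load_id, rows):
        """Store rows under the ID of the load that delivered them."""
        self._partitions[load_id].extend(rows)

    def rollback(self, load_id):
        """Drop all rows of one load and return how many were removed."""
        return len(self._partitions.pop(load_id, []))

    def rows(self):
        """Return all rows, ordered by load ID."""
        return [
            row
            for load_id in sorted(self._partitions)
            for row in self._partitions[load_id]
        ]


table = LakeTable()
table.append("LOAD_001", [{"order": "4711", "amount": 100}])
table.append("LOAD_002", [{"order": "4712", "amount": -1}])  # faulty load

removed = table.rollback("LOAD_002")
print(removed, table.rows())
```

With this layout, cleaning up after a failed load means dropping one partition instead of hunting for individual wrong records, and the corrected load can simply be appended again under a new identifier.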

Choosing the source of data: SAP ERP

For these reasons, I see several advantages in loading data into data lakes directly from SAP ERP. First of all, near-real-time use of the data becomes possible. Of course, the ETL solution of your choice needs to support very granular change data capture to achieve this. SAP SLT and our own solution, Datavard Glue, provide such features. I will go into more detail on the differences between SLT and Datavard Glue to help you decide on the best approach for your use case.
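The principle behind change data capture can be sketched in a few lines of Python: instead of reloading whole tables, the source delivers an ordered log of inserts, updates, and deletes, and the target applies them to its replica. The record layout (`op`, `key`, `row`) and the function `apply_changes` are assumptions made for this illustration only; this is not how SAP SLT or Datavard Glue are implemented.

```python
def apply_changes(replica, change_log):
    """Apply an ordered change log to a replica keyed by primary key.

    Each change is a dict with 'op' ('I', 'U' or 'D'), 'key', and, for
    inserts and updates, 'row'. These names are hypothetical.
    """
    for change in change_log:
        op, key = change["op"], change["key"]
        if op in ("I", "U"):
            # Treat insert and update alike (upsert), so that replaying
            # the same log does not fail or duplicate data.
            replica[key] = dict(change["row"])
        elif op == "D":
            replica.pop(key, None)
        else:
            raise ValueError(f"unknown operation: {op!r}")
    return replica


# Made-up sample data: one existing order and three captured changes.
replica = {"4711": {"status": "open", "amount": 100}}
change_log = [
    {"op": "U", "key": "4711", "row": {"status": "paid", "amount": 100}},
    {"op": "I", "key": "4712", "row": {"status": "open", "amount": 250}},
    {"op": "D", "key": "4711"},
]

apply_changes(replica, change_log)
print(replica)
```

Because only the changed records travel from source to target, the replica can be refreshed in small, frequent increments, which is what makes near-real-time use feasible in the first place.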

Another advantage of accessing data directly in ERP is that all business logic is available for ETL. For example, you can use not only raw table data, but also the output of interfaces such as BAPIs. You can even take the output of SAP transactions (e.g. a stock list) or of ABAP queries and replicate it to your data lake, e.g. to Hive tables on Hadoop.

Finally, when running Datavard Glue in ERP, you are ready for bi-directional data integration: Data from your data lake can be directly consumed in business processes in SAP ERP. For example, you can use results of data processing on Hadoop during lookups in user exits when running ERP transactions.

Of course, running on SAP ERP requires a solid authorization concept. Here it is of great benefit to use a native ABAP solution for ETL. With Datavard Glue, we provide a set of SAP authorization objects which you can use to ensure that users can only work with their own data, and that they do not misuse a data discovery or ETL tool as a backdoor to data they should not be able to access.