Welcome to the third entry in my series of posts on Hadoop and SAP. This time I would like to delve into the cost and performance of different Nearline Storage solutions for SAP BW, including Hadoop, a highly scalable yet cheap-to-run platform for data storage and data processing.
In my first post I answered questions from our customers and explained why S/4HANA will not replace SAP’s Business Warehouse. In the second post I discussed different options for integrating the SAP ABAP world with Hadoop. One of these options is Glue, a middleware developed at Datavard. This middleware works for both ERP and BW and helps SAP customers leverage the power of Hadoop from an SAP GUI and ABAP-driven platform. Now the time has come to take a closer look at Nearline Storage solutions.
Reducing the footprint of BW on HANA systems
When running an SAP Business Warehouse, one of the first choices for integrating SAP with Big Data is NLS (Nearline Storage). NLS is a way to move data out of the primary database and store it in a more cost-efficient but slower secondary database. The data is still available for reporting, though: the SAP BW system will query both databases. For any end user of SAP BW, the location of the data remains invisible. Queries simply work. Performance is another question: data from a cheap storage medium will not be available as fast as, for example, data from the HANA in-memory column store.
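To make the idea concrete, here is a small Python sketch of the principle. It is a toy model only, not how SAP BW or any NLS product is implemented: the `Record` class, the two in-memory lists standing in for the primary and nearline databases, and the function names are all assumptions made up for illustration. It shows that a report returns the same result before and after old records are moved to the nearline store, because the query reads both stores.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Record:
    """Hypothetical fact record: a posting date and an amount."""
    posting_date: date
    amount: float


def archive_to_nearline(primary, nearline, cutoff):
    """Move every record older than `cutoff` from the primary to the nearline store."""
    cold = [r for r in primary if r.posting_date < cutoff]
    nearline.extend(cold)
    # Keep only the records that are on or after the cutoff in the primary store.
    primary[:] = [r for r in primary if r.posting_date >= cutoff]


def query_total(primary, nearline, start, end):
    """Sum amounts between `start` and `end` (inclusive) across both stores."""
    return sum(
        r.amount
        for store in (primary, nearline)
        for r in store
        if start <= r.posting_date <= end
    )


# Made-up sample data; plain lists stand in for the two databases.
primary = [
    Record(date(2013, 5, 1), 100.0),
    Record(date(2015, 8, 1), 250.0),
    Record(date(2016, 2, 1), 400.0),
]
nearline = []

before = query_total(primary, nearline, date(2013, 1, 1), date(2016, 12, 31))
archive_to_nearline(primary, nearline, cutoff=date(2015, 1, 1))
after = query_total(primary, nearline, date(2013, 1, 1), date(2016, 12, 31))
```

After archiving, the primary store holds fewer records, yet `before` and `after` are equal. What the toy model does not capture is the part that matters in practice: the nearline half of the query is answered by a slower database.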
SAP itself offers an NLS implementation based on the ex-Sybase IQ database to reduce HANA data volumes by moving parts of the data into cold storage. We offer our own implementation of Nearline Storage, which is fully compatible with SAP BW and certified by SAP.
Is the IQ database the best way out? At Datavard, we identified two issues with this solution:
- First of all, IQ is a good database – but it is expensive to run. For some data centres it also proves difficult to maintain and integrate into their IT processes. Of course, it is possible to have IQ in the cloud, but that comes at a hefty price – some cloud vendors offer IQ at hosting costs comparable to HANA. We see demand from our customers to have more flexibility to store data in other databases, e.g. Hadoop.
- Secondly, we see the additional need for housekeeping. That is an area all by itself, totally decoupled from archiving and Nearline Storage. Many of our customers have considerable potential for reducing data volumes by simply purging technical and redundant data.
NLS storage on Hadoop
Our solution OutBoard works on many storage types, including SAP/Sybase IQ and Hadoop. In fact, OutBoard can store data depending on its value and age in different storages connected to the same SAP BW system.
As far as Hadoop is concerned, there are different options available:
- HDFS is the most basic option. Basically, files are simply pushed into Hadoop’s File System. While easy to install and cheap to operate, performance is not very good.
- HIVE is a better option – it is an SQL engine running on Hadoop. Data in HIVE can be made available to Hadoop native applications. HIVE is not very fast, but scales very well. A query on 100 million records will have a runtime similar to a query running against a billion records.
- Impala is a data processing engine built by Cloudera on top of HIVE. Datavard partners with Cloudera due to their excellent distribution of HIVE, good support for customers and partners, and finally their cutting-edge Impala implementation.
- Spark is a similar framework for processing data on HIVE, but in our experience it is slightly slower than Impala. With HANA VORA (which is basically a Spark-based SAP extension), Spark may gain ground on Impala in the future.
Datavard recommendation: NLS on Hadoop
Looking at the operational costs, ease of implementation, and performance, my recommendation is very clear: Impala on HIVE is currently the most robust and advanced option to operate Nearline Storage for SAP BW. Some of our customers run their Hadoop cluster on premises by themselves and are satisfied with this solution. All major cloud providers such as Azure or AWS support it.
TCO of databases and storage
Our recommendation is well-grounded once you consider the total cost of ownership of BW data storage on different databases as a “side car”. In most cases, you will end up with a trade-off between cost and performance.
Here is some recent data from 2016 based on collaboration with our customers implementing NLS. All these figures are customer-specific and may vary between different regions, customers, and other factors such as choice of hardware vendor or SAP discounts. All the charts below represent data from a group of our customers and from the recent project history (for our European customers, I used the current EUR to USD conversion rate).
The graph compares different databases. I included SAP HANA as a reference only. All the other databases can serve as NLS storage using Datavard OutBoard. SAP’s NLS implementation currently only supports SAP IQ as a side car. The lower the bar, the cheaper the TCO of the storage type.
One TB (terabyte) of SAP HANA usually costs around 110,000 USD per year. A TB of SAP IQ weighs in at around 90,000 USD, so not too far below HANA itself. Both HANA and IQ compress data by a similar factor (1:7). Similar (or even better) compression can be achieved with other databases such as IBM’s DB2 BLU or Oracle. DB2 BLU costs around 55,000 USD per TB per year. Oracle tends to be a bit cheaper.
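As a back-of-the-envelope illustration, the per-TB figures above can be turned into a yearly saving. The short Python sketch below uses only the three prices quoted in this post. The moved volume of 2 TB is a made-up example, and the calculation assumes that the data occupies the same number of TB in the target database as in HANA, which is only roughly true given the similar compression factors.

```python
# Yearly cost per TB in USD, taken from the figures quoted in this post.
COST_PER_TB_YEAR_USD = {
    "SAP HANA": 110_000,
    "SAP IQ": 90_000,
    "DB2 BLU": 55_000,
}


def yearly_saving_usd(moved_tb, target, source="SAP HANA"):
    """Yearly saving when `moved_tb` TB move from `source` to `target`.

    Assumption: the data takes up the same number of TB in both databases.
    """
    price_gap = COST_PER_TB_YEAR_USD[source] - COST_PER_TB_YEAR_USD[target]
    return moved_tb * price_gap


MOVED_TB = 2  # hypothetical volume, not a customer figure

savings = {
    target: yearly_saving_usd(MOVED_TB, target)
    for target in ("SAP IQ", "DB2 BLU")
}
print(savings)  # {'SAP IQ': 40000, 'DB2 BLU': 110000}
```

With your own prices and volumes in the dictionary, the same few lines give a first estimate before a proper TCO study.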
Looking at the performance of the different storage types, the picture is very different. These two aspects need to be considered: the business requirement for the reporting performance, and the scalability of the performance with large data volumes. The following graph shows the reporting speed in BW queries with 10 million queried records in NLS and with 100 million queried records in NLS. Note the differences in performance and scalability.
SAP IQ is very fast for 10 million records, and Hadoop is much slower than the column-based SAP IQ. However, when we up the data volume, Hadoop does not look bad at all. The true power of Hadoop shows when you compare the results for 10 million and 100 million records.
Finally, it is interesting to look at the performance-to-cost ratio. To calculate it, I divided the average performance by the cost of the database. The higher the result, the better the value for money.
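The calculation itself is a simple division, sketched below in Python. Everything in the example data is a placeholder: `store_a` and `store_b` are hypothetical storage types, and their performance scores and costs are invented numbers, not measurements from our customers. The unit of the performance score is also an assumption; any measure where higher means faster will do.

```python
def performance_to_cost(avg_performance, cost_per_tb_year_usd):
    """Performance-to-cost ratio: average performance divided by yearly cost."""
    if cost_per_tb_year_usd <= 0:
        raise ValueError("cost must be positive")
    return avg_performance / cost_per_tb_year_usd


def rank_by_value(options):
    """Return the option names sorted from best to worst value for money.

    `options` maps a name to a tuple (avg_performance, cost_per_tb_year_usd).
    """
    return sorted(
        options,
        key=lambda name: performance_to_cost(*options[name]),
        reverse=True,
    )


# Placeholder inputs only: (performance score, yearly cost per TB in USD).
options = {
    "store_a": (100.0, 50_000),  # faster but more expensive
    "store_b": (40.0, 10_000),   # slower but much cheaper
}

ranking = rank_by_value(options)
print(ranking)  # ['store_b', 'store_a']
```

In this made-up example the slower store wins on value for money, because its cost drops by more than its performance does.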
The above figures lay the groundwork for our firm recommendation: Hadoop simply rules. As a side note, SAP has grasped this as well, and the data we have here explains why SAP offers SDA and HANA VORA to integrate Hadoop with SAP BW on HANA for Big Data scenarios. At Datavard, we use a functional extension called Glue to bridge the gaps between SAP and the Big Data world (see here for more information: “Glue: Hadoop and Big Data Integration with SAP”).
So, what should you do now?
If you are running a large BW system, a heavily growing BW system, or BW on HANA (or all of the above), you should investigate the concept of NLS (e.g. this article by Jan Meszaros is a good start: “Six Things to Consider when Implementing Nearline Storage for SAP BW”). Even if the figures you are looking at differ for your landscape (no matter if on premises or in the cloud), the following holds true: you can save big money and have an easier life with clever data management and NLS.
You might also want to get your hands on the Datavard NLS solution OutBoard for a test drive (PoC) with your own data in your own system landscape to familiarise yourself with the functionality and options. Datavard OutBoard is an NLS solution certified by SAP and Cloudera and has been used successfully by the largest organizations. OutBoard addresses all the limitations in the SAP NLS interface (especially write-enabled access to archived data, i.e. BW stragglers) and provides massive automation through the complete implementation cycle (lookup translation, mass creation of DAPs, and archiving request creation and scheduling).