Hadoop is the hot topic of the Big Data world. With this open source framework, large volumes of data of varying structure can be managed, used, and evaluated quickly and, above all, cost-effectively.
Hadoop alone, however, is insufficient to meet the requirements of Big Data analytics. Semi- and unstructured data is best evaluated in combination with current corporate data inside an analytical database that applies modern analytical techniques. It therefore makes sense to separate sales data processing from mass data processing and to trust the specialists in each discipline. The right mix of a high-performance SAP HANA database and a solid Hadoop platform opens new paths in real-time analytics while saving costs at the same time.
At the beginning of September, "SAP HANA Vora" was introduced as an underpinning to this ideal constellation. The tool provides even deeper integration between the high-performance in-memory data platform SAP HANA and the Big Data components of Hadoop.
THE MAIN ATTRACTION: DISTRIBUTED DATA PROCESSING
A fundamental advantage over other systems is that Hadoop does not depend on expensive proprietary hardware for data storage and processing. The benefits of the distributed file system extend to distributed data processing as well, so the platform can be scaled almost without limit across standard servers. Exactly this scalability and distributed processing makes Hadoop ideal for managing the continually growing flood of data that surrounds us.
HADOOPONOMICS: THE NUMBERS SPEAK FOR THEMSELVES
Hadoop is not only an option but essential for Big Data scenarios, according to the market research company Forrester Research. To emphasize the economic benefits of the open source software, Forrester analysts coined the term "Hadooponomics". According to Forrester, large Hadoop distributions cost 2,000 to 3,000 dollars per node per year. By contrast, a HANA node costs approximately 750,000 dollars per year.
A well-known UK company compared the costs of conventional data storage with the estimated costs of a Hadoop replacement. One TB in an Oracle database generates costs of 48,000 euros per year; the company calculated costs of 1,540 euros per year for storing the same data volume in Hadoop. In the face of this immense cost difference, it is economical to work on key data (the most valuable and most frequently used) in SAP HANA and to keep the remaining data available in Hadoop.
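The scale of these savings is easy to check. A minimal back-of-the-envelope calculation using the per-terabyte figures quoted above (the 100 TB data volume is a hypothetical example, not a figure from the company's case):

```python
# Per-terabyte annual storage costs as quoted above.
COST_ORACLE_PER_TB = 48_000   # euros per TB per year (conventional database)
COST_HADOOP_PER_TB = 1_540    # euros per TB per year (Hadoop)

terabytes = 100  # hypothetical data volume for illustration

oracle_total = terabytes * COST_ORACLE_PER_TB  # 4,800,000 euros/year
hadoop_total = terabytes * COST_HADOOP_PER_TB  # 154,000 euros/year
savings = oracle_total - hadoop_total          # 4,646,000 euros/year

print(f"Annual savings for {terabytes} TB: {savings:,} euros")
```

At this hypothetical volume, Hadoop storage costs roughly 3 percent of the conventional alternative, which is the economic case the Forrester figures make.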
On the one hand, moving mass data out of storage into a Hadoop cluster saves enormous costs in operating SAP HANA. On the other hand, Hadoop is optimally tied to SAP HANA via numerous interfaces. End users do not need any special Hadoop know-how, as they can use SAP HANA as a "single point of entry".
Another major advantage of Hadoop is data streaming: mass data can be analyzed directly in Hadoop using statistical models, and only the essence of these analyses is then passed on to SAP HANA. This approach makes it possible to evaluate large volumes of data of varying structure with high performance and in real time. The combination of SAP HANA and Hadoop is also suitable for small businesses with limited IT resources. With Predictive Analytics, SAP offers standard analyses that enable businesses with little statistical know-how to create appropriate models.
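The division of labor described above can be sketched in a few lines. This is a plain-Python illustration of the pattern only: in practice the reduction step would run as a Hadoop or Spark job on the cluster, and the device names and key figures here are hypothetical.

```python
from statistics import mean

def summarize_readings(readings):
    """Reduce raw mass data (e.g. sensor readings) to compact key figures.

    Illustrates the pattern from the text: the heavy reduction happens
    where the mass data lives (Hadoop), and only the small summary,
    the "essence", is passed on to SAP HANA for real-time analytics.
    """
    by_device = {}
    for device, value in readings:
        by_device.setdefault(device, []).append(value)
    # One small row per device instead of millions of raw readings.
    return {
        device: {"count": len(vals), "avg": mean(vals), "max": max(vals)}
        for device, vals in by_device.items()
    }

# Hypothetical raw readings standing in for the mass data in Hadoop.
raw = [("pump-1", 3.0), ("pump-1", 5.0), ("pump-2", 7.0)]
essence = summarize_readings(raw)
print(essence["pump-1"])  # {'count': 2, 'avg': 4.0, 'max': 5.0}
```

Only the `essence` dictionary would be loaded into SAP HANA, which is why the raw data volume never burdens the in-memory platform.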
This article comes from our Blue Book on integrating SAP HANA with Hadoop. Click on the link below to download the full PDF version.