What is SAP HANA Cloud Data Lake

In this SAP HANA Cloud tutorial, we will discuss SAP HANA Cloud Data Lake and its characteristics and components. We will also learn how to create a data lake instance in SAP HANA Cloud and how to access it.

What is SAP HANA Cloud Data Lake?

A data lake is a central repository where data of all kinds (structured, semi-structured, and unstructured) is stored. Data from this repository can be accessed, reviewed, and used to make data-driven decisions.

A cloud data lake enables your company to cut costs, boost performance, and gain faster access to large volumes of inexpensively stored data that can produce insights. The SAP HANA Cloud data lake combines relational capabilities with storage for objects and files.

Why do we need to build a cloud data lake in SAP HANA Cloud?

Data lakes are built to keep costs under control while storing data at petabyte scale. Other reasons include:

  • Cost-effective capacity: Use elastic scalability. Start small and expand your data lake to petabytes.
  • Combine data of different temperatures: Easily move data between hot, warm, and cold storage.
  • Get rid of data swamps: Instead of copying data around for data science purposes, provide a single, integrated data lake for your specific use cases and lines of business.
  • Create applications for data science: Use large volumes of data and models to power your data science applications.
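The following minimal Python sketch makes the idea of data temperatures concrete. It assigns a data set to a hot, warm, or cold tier based on how recently the data was accessed.

Several parts are assumptions for illustration only:
  • The thresholds (30 and 365 days) are hypothetical.
  • The function name `data_temperature` is hypothetical.
  • The comments follow the way SAP commonly describes tiering: hot data in SAP HANA Cloud memory, warm data in the native storage extension, and cold data in the data lake.

Real tiering is a design decision for your workload. This code does not configure anything in SAP HANA Cloud.

```python
from datetime import date

# Hypothetical thresholds, chosen only for illustration.
HOT_MAX_AGE_DAYS = 30
WARM_MAX_AGE_DAYS = 365


def data_temperature(last_accessed: date, today: date) -> str:
    """Classify a data set as hot, warm, or cold by days since its last access."""
    age_days = (today - last_accessed).days
    if age_days <= HOT_MAX_AGE_DAYS:
        return "hot"   # e.g. kept in SAP HANA Cloud memory
    if age_days <= WARM_MAX_AGE_DAYS:
        return "warm"  # e.g. native storage extension (disk-based)
    return "cold"      # e.g. moved to the data lake


if __name__ == "__main__":
    today = date(2024, 1, 1)
    # Hypothetical data set names and access dates.
    for name, last_accessed in [
        ("orders_current", date(2023, 12, 20)),
        ("orders_2023_h1", date(2023, 6, 1)),
        ("orders_2019", date(2020, 1, 1)),
    ]:
        print(name, data_temperature(last_accessed, today))
```

Moving a data set between tiers then amounts to reclassifying it as it ages.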

Characteristics of a data lake in SAP HANA Cloud

A data lake has three key characteristics, each corresponding to a zone:

  • Landing Zone: A location where your raw data can reside.
  • Staging Zone: A staging area where data is transformed for analytical purposes.
  • Data Exploration Zone: The area where analytics, applications, and machine learning models consume the data.
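The three zones can be pictured as stages in a pipeline. The following Python sketch is purely conceptual and does not use any SAP API. The functions `land`, `stage`, and `explore` are hypothetical, and the sample CSV data is made up.

Each function plays the role of one zone:
  • `land` keeps the raw data unchanged, as in the landing zone.
  • `stage` parses and cleans the data, as in the staging zone.
  • `explore` aggregates the cleaned data for analysis, as in the exploration zone.

```python
import csv
import io
from collections import defaultdict

# Made-up raw input, as it might arrive in the landing zone.
RAW_SALES_CSV = """region,amount
EMEA,100
emea,50
APJ,not_a_number
AMER,75
"""


def land(raw: str) -> str:
    """Landing zone: keep the raw data exactly as it arrived."""
    return raw


def stage(raw: str) -> list[dict]:
    """Staging zone: parse and clean raw records for analytical use."""
    rows = []
    for record in csv.DictReader(io.StringIO(raw)):
        try:
            amount = float(record["amount"])
        except ValueError:
            continue  # drop records whose amount cannot be parsed
        rows.append({"region": record["region"].strip().upper(), "amount": amount})
    return rows


def explore(rows: list[dict]) -> dict[str, float]:
    """Exploration zone: consume cleaned data, here as a total per region."""
    totals: dict[str, float] = defaultdict(float)
    for row in rows:
        totals[row["region"]] += row["amount"]
    return dict(totals)


if __name__ == "__main__":
    print(explore(stage(land(RAW_SALES_CSV))))  # → {'EMEA': 150.0, 'AMER': 75.0}
```

The unparseable APJ record is dropped during staging, and the two EMEA spellings are normalized before aggregation. This is the kind of work the staging zone exists for.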

Components of a data lake in SAP HANA Cloud

The data lake Relational Engine component and the default data lake files component are the two essential parts of the SAP HANA Cloud data lake.


Data Lake Files

Data lake Files provides controlled access to storage for structured, semi-structured, and unstructured data files.

Data lake Files uses multiple file containers. When you create a data lake instance, both a default file container and a diagnostics file container are provided. You control the default file container, while SAP is in charge of the diagnostics file container.

  • File Container: Also referred to as the managed object store, this is a cloud storage repository that is owned and administered by SAP.

You, the customer, control who has access to the file container. You can manage its contents and write to it directly through its REST API, or indirectly through services such as the data lake Relational Engine or SQL on Files.

  • The file container lets you use Files as a "data lake", that is, a repository for big data. You can store structured, unstructured, and semi-structured files in it.
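The following Python sketch roughly illustrates direct REST access, using only the standard library. It builds a WebHDFS-style `LISTSTATUS` request for a directory in the file container.

It rests on these assumptions, which you should verify against the SAP HANA Cloud documentation:
  • The data lake Files REST API follows the WebHDFS path convention (`/webhdfs/v1/...`).
  • The target container is selected with an `x-sap-filecontainer` header that carries the instance ID.
  • Authentication uses a client certificate (mutual TLS).

The endpoint, the instance ID, and the certificate and key file names are hypothetical placeholders.

```python
import ssl
import urllib.parse
import urllib.request

# Hypothetical placeholders: use the Files REST API endpoint and instance ID
# shown for your own instance in SAP HANA Cloud Central.
REST_ENDPOINT = "files-rest-endpoint.example.com"
INSTANCE_ID = "example-instance-id"


def build_list_request(path: str = "/") -> urllib.request.Request:
    """Build a WebHDFS-style LISTSTATUS request for a directory in the file container."""
    quoted_path = urllib.parse.quote(path.lstrip("/"))
    url = f"https://{REST_ENDPOINT}/webhdfs/v1/{quoted_path}?op=LISTSTATUS"
    request = urllib.request.Request(url, method="GET")
    # Assumption: this header selects the file container (the data lake instance).
    request.add_header("x-sap-filecontainer", INSTANCE_ID)
    return request


def send(request: urllib.request.Request, cert_file: str, key_file: str) -> bytes:
    """Send the request with client-certificate (mutual TLS) authentication."""
    context = ssl.create_default_context()
    context.load_cert_chain(certfile=cert_file, keyfile=key_file)
    with urllib.request.urlopen(request, context=context) as response:
        return response.read()


if __name__ == "__main__":
    req = build_list_request("/raw/sales")
    print(req.get_method(), req.full_url)
    # To actually call the service, supply your own certificate and key, e.g.:
    # print(send(req, "client.crt", "client.key"))
```

Building the request is separated from sending it. This lets you inspect the URL and headers without a live instance or certificates.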
[Figure: Data lake file container and diagnostics file container]

The diagnostics file container exists only if the data lake Relational Engine is enabled when the data lake instance is created.

Data Lake Relational Engine

The data lake Relational Engine stores and analyzes large volumes of data. To reduce cost, it uses inexpensive storage while preserving high performance and full SQL access to the data.

Structured data is kept in the Relational Engine. If you want to store unstructured data, use data lake Files. The data lake Relational Engine is activated by default when you provision a data lake instance, but it can be deactivated.

How to create a data lake in SAP HANA Cloud

A data lake can be created as a standalone instance or integrated with an SAP HANA database. In either case, enabling the data lake Relational Engine is optional. The data lake Files component, however, is always active regardless of how you create the data lake.

Creating a standalone Data Lake instance

We will use SAP HANA Cloud Central to create a data lake instance with the Relational Engine component enabled.

In SAP HANA Cloud Central, click the Create button and select Data Lake.


Then choose SAP HANA Cloud, Data Lake and click the Next Step button.


In the next step, provide the details of your data lake instance. In the Basic section, give the instance a name such as hana_data_lake and, optionally, a description. Then click the Next Step button.

[Screenshot: Data lake instance basic details]

Under Allowed Connections, choose Allow All IP Addresses, then click the Next Step button.

[Screenshot: Data lake instance connections]

In the next step, set the parameters for your data lake Relational Engine:

  • The data lake instance has the Data Lake Relational Engine component activated by default.
  • Under Credentials, provide a password for the HDLADMIN user, then re-enter it to confirm. Note this password, because you will need it to connect later.
  • HDLADMIN is the default user created along with the data lake Relational Engine.
[Screenshot: Data lake instance credentials]

Accept the Coordinator and Workers default values for Size on the same page, then click Next Step.

[Screenshot: Data lake instance size]

In the next step, configure the advanced options of your data lake instance:

  • Under Initialization Mode, choose Configure to be most compatible with SAP IQ.
  • Accept the default values under the General and Nchar options.
  • The Automatically Backup Database check box is already selected if you have a paid account. On a trial account, it is disabled.
  • Click the Review and Create button.
[Screenshot: Data lake instance advanced settings]

In the next step, review the instance settings, then click the Create button to create the data lake instance.

[Screenshot: Reviewing the instance settings]

After these steps, the new instance appears in SAP HANA Cloud Central.

[Screenshot: Created data lake instance]

How to access the data lake in SAP HANA Cloud

Once the data lake is created, you can access it and begin storing, accessing, and manipulating data. Before accessing the data lake, make sure that the data lake instance is running.

To check whether the data lake instance is running, use the SAP BTP cockpit to open SAP HANA Cloud Central.

  • The page shows your running instances. Choose to view all instances to find your standalone data lake instance. If it is not running, start it.

You can connect to a data lake instance in several ways:

  • Interactive SQL command line interface (CLI)
  • SAP HANA Database explorer
  • Interactive SQL (dbisql) – a graphical interface
  • isql
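Client tools such as Interactive SQL (dbisql) take a connection string of semicolon-separated key=value pairs. The following Python sketch assembles one for the data lake Relational Engine.

The host, user, and password are hypothetical placeholders. The `enc` value mirrors the TLS parameters used in SAP's data lake tutorials. Check the SAP HANA Cloud documentation for the exact parameters your instance requires.

```python
def build_connection_string(host: str, user: str, password: str, port: int = 443) -> str:
    """Assemble a key=value;... connection string for the data lake Relational Engine."""
    for value in (host, user, password):
        # A ';' would end the parameter early, so reject it instead of guessing at escaping.
        if ";" in value:
            raise ValueError("connection values must not contain ';'")
    params = {
        "uid": user,
        "pwd": password,
        "host": f"{host}:{port}",
        # Assumption: TLS settings as shown in SAP's data lake tutorials.
        "enc": "TLS(tls_type=rsa;direct=yes)",
    }
    return ";".join(f"{key}={value}" for key, value in params.items())


def dbisql_args(connection_string: str) -> list[str]:
    """Argument list for starting dbisql in command-line (no GUI) mode."""
    return ["dbisql", "-c", connection_string, "-nogui"]


if __name__ == "__main__":
    # Hypothetical placeholder values; never hard-code real credentials.
    conn = build_connection_string("your-sql-endpoint.example.com", "HDLADMIN", "YourPassword1")
    print(dbisql_args(conn))
```

The argument list can be passed to Python's `subprocess.run` on a machine where the data lake client is installed.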

Here, we will connect to the data lake using the SAP HANA Database Explorer:

Using SAP HANA Database Explorer

Once you have verified that the data lake instance is running, click the three dots next to it and choose Open SQL Console.

[Screenshot: Accessing the data lake using the SAP HANA Database Explorer]

After you click Open SQL Console, you are prompted for the credentials you defined when creating the data lake instance. Enter the user name and password for the instance.

You now have access to the data lake instance through the SAP HANA Database Explorer.

  • In the database explorer’s left-hand database menu, search for the name of your data lake instance.
  • To launch a new SQL Console, click the SQL icon in the top-left area.
[Screenshot: Launching a new SQL Console in the Database Explorer]

To connect using one of the other methods, follow the official SAP HANA Cloud documentation.

Conclusion

In this SAP HANA Cloud tutorial, we learned what a data lake in SAP HANA Cloud is and what its characteristics are. We also looked at its components, data lake Files and the Relational Engine. Finally, we covered how to create a data lake in SAP HANA Cloud and the different approaches to accessing a data lake instance.
