Configure Access Privileges for Non-HDFS Users

This page describes how to configure permissions in Apache Ranger or another authorization system so that Pulse can collect logs, metadata, and query data from Spark, Hive, Kafka, and other services. To enable this collection, you must grant explicit permissions to non-HDFS users.

Steps to Configure

Grant permissions to the Spark log directories.

For non-HDFS users, create a policy in Ranger (or equivalent authorization service) to allow read and execute access to the Spark history log directories. The default directory locations vary by Hadoop distribution:

ODP 3.x, ODP 2.x, and HDP 2.x – /spark2-history
CDH and CDP – /user/spark/applicationHistory, /user/spark/spark2ApplicationHistory
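Where policies are managed by script rather than through the Ranger UI, the grant above can be expressed as a policy document. The following is a minimal sketch that only builds and prints the JSON body; it does not contact Ranger. The service name `cm_hdfs`, the policy name `pulse-spark-history`, and the user `pulse` are hypothetical placeholders, and the field layout is an assumption based on Ranger's public v2 policy model, so check it against your Ranger version before submitting it.

```python
import json

# Default Spark history log directories per distribution, as listed above.
SPARK_HISTORY_DIRS = {
    "ODP": ["/spark2-history"],
    "HDP": ["/spark2-history"],
    "CDH": ["/user/spark/applicationHistory", "/user/spark/spark2ApplicationHistory"],
    "CDP": ["/user/spark/applicationHistory", "/user/spark/spark2ApplicationHistory"],
}


def build_hdfs_read_policy(service, policy_name, user, paths):
    """Return a policy body granting read and execute on the given HDFS paths.

    The structure is assumed to match Ranger's policy JSON; verify before use.
    """
    return {
        "service": service,
        "name": policy_name,
        "isEnabled": True,
        "resources": {
            "path": {"values": list(paths), "isRecursive": True},
        },
        "policyItems": [
            {
                "users": [user],
                "accesses": [
                    {"type": "read", "isAllowed": True},
                    {"type": "execute", "isAllowed": True},
                ],
            }
        ],
    }


# Hypothetical service, policy, and user names; replace with your own.
policy = build_hdfs_read_policy(
    "cm_hdfs", "pulse-spark-history", "pulse", SPARK_HISTORY_DIRS["CDP"]
)
print(json.dumps(policy, indent=2))
```

The printed document can be reviewed first and then submitted through whichever mechanism your Ranger deployment supports.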

Grant permissions to the Hive query data paths.

Provide read and execute permissions for the following Hive query data paths based on the distribution:

ODP 2.x, HDP 2.x, CDH 5.x, and CDH 6.x – /tmp/ad
ODP 3.x and HDP 3.x – /warehouse/tablespace/external/hive/sys.db/dag_data, /warehouse/tablespace/external/hive/sys.db/query_data
CDP 7.x – /warehouse/tablespace/managed/hive/sys.db/dag_data, /warehouse/tablespace/managed/hive/sys.db/query_data
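After the permissions are in place, it can help to confirm that the non-HDFS user can list the paths. The sketch below maps each distribution to the paths listed above and builds `hdfs dfs -ls` commands for them. The helper names are hypothetical, and `check_access` assumes it runs as the non-HDFS user on a host where the `hdfs` client is installed, so it is defined here but not called.

```python
import subprocess

# Hive query data paths per distribution, as listed above.
HIVE_QUERY_PATHS = {
    "ODP 2.x": ["/tmp/ad"],
    "HDP 2.x": ["/tmp/ad"],
    "CDH 5.x": ["/tmp/ad"],
    "CDH 6.x": ["/tmp/ad"],
    "ODP 3.x": [
        "/warehouse/tablespace/external/hive/sys.db/dag_data",
        "/warehouse/tablespace/external/hive/sys.db/query_data",
    ],
    "HDP 3.x": [
        "/warehouse/tablespace/external/hive/sys.db/dag_data",
        "/warehouse/tablespace/external/hive/sys.db/query_data",
    ],
    "CDP 7.x": [
        "/warehouse/tablespace/managed/hive/sys.db/dag_data",
        "/warehouse/tablespace/managed/hive/sys.db/query_data",
    ],
}


def listing_commands(distribution):
    """Build one `hdfs dfs -ls` command per Hive query data path."""
    return [["hdfs", "dfs", "-ls", path] for path in HIVE_QUERY_PATHS[distribution]]


def check_access(distribution):
    """Run each listing and report, per path, whether the command succeeded.

    Requires the hdfs client on PATH; run it as the non-HDFS user.
    """
    results = {}
    for command in listing_commands(distribution):
        completed = subprocess.run(command, capture_output=True, text=True)
        results[command[-1]] = completed.returncode == 0
    return results


# Print the commands only; nothing is executed against the cluster here.
for command in listing_commands("CDP 7.x"):
    print(" ".join(command))
```

A failed listing for any path suggests that the read or execute permission is still missing for that user.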

Add the Describe permission for Kafka.

If an HDFS user or another user connects to Kafka without full privileges to read metadata from all topics, add that user to the default access policy in Ranger with the Describe permission on all topics.

Add the SELECT privileges in Ranger for Hadoop SQL.
1. Log in to the Ranger UI.
2. Navigate to Hadoop SQL.
3. Locate the policy named all–databases, tables, and columns.
4. Click Edit under Action.
5. Add SELECT privileges for the non-HDFS user.
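For scripted setups, the UI steps above amount to adding a policy item for the user to an existing policy. The sketch below shows that edit on a plain dictionary and makes no calls to Ranger. The `grant_access` helper, the user `pulse`, and the stand-in policy `example-policy` are hypothetical, and the `policyItems` layout and the `select` access type name are assumptions about Ranger's policy JSON that should be checked against a policy exported from your own instance.

```python
import copy


def grant_access(policy, user, access_type):
    """Return a copy of a policy dict in which `user` is allowed `access_type`.

    If an existing policy item already grants the access to the user, the copy
    is returned unchanged. Otherwise a new policy item is appended, so other
    users' grants are left untouched.
    """
    updated = copy.deepcopy(policy)
    items = updated.setdefault("policyItems", [])
    for item in items:
        granted = {a["type"] for a in item.get("accesses", []) if a.get("isAllowed")}
        if user in item.get("users", []) and access_type in granted:
            return updated
    items.append(
        {"users": [user], "accesses": [{"type": access_type, "isAllowed": True}]}
    )
    return updated


# Stand-in for a policy exported from Ranger; names are placeholders.
existing = {
    "name": "example-policy",
    "policyItems": [
        {"users": ["hive"], "accesses": [{"type": "select", "isAllowed": True}]}
    ],
}

updated = grant_access(existing, "pulse", "select")
print(updated["policyItems"])
```

The helper returns a modified copy rather than changing its input, so the original policy remains available for comparison. Under the same assumptions, calling it with `"describe"` on a Kafka topic policy would correspond to the Kafka step earlier on this page.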

Result

Non-HDFS users gain the necessary permissions to collect Spark logs, Hive query data, and Kafka topic metadata.
Pulse can successfully retrieve and display the required monitoring metrics.
