Configure Access Privileges for Non-HDFS Users

This page describes how to configure permissions in Apache Ranger or another authorization system so that Pulse can collect logs, metadata, and query data from services such as Spark, Hive, and Kafka. To enable this, you must grant explicit permissions to non-HDFS users.

Steps to Configure

  1. Grant permissions to the Spark log directories.

For non-HDFS users, create a policy in Ranger (or equivalent authorization service) to allow read and execute access to the Spark history log directories. The default directory locations vary by Hadoop distribution:

  • ODP 3.x, ODP 2.x, and HDP 2.x: /spark2-history
  • CDH and CDP: /user/spark/applicationHistory, /user/spark/spark2ApplicationHistory
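
As an illustration of this step, the following Python sketch builds a JSON body of the kind the Ranger Admin REST API accepts for an HDFS path policy. The service name `cluster_hadoop`, the user `pulse_user`, and the policy name are hypothetical placeholders, and the field layout is an assumption based on the general Ranger policy model, so check it against your Ranger version before relying on it.

```python
import json

# Default Spark history directories from the list above, keyed by distribution.
SPARK_HISTORY_DIRS = {
    "ODP 3.x, ODP 2.x, HDP 2.x": ["/spark2-history"],
    "CDH, CDP": [
        "/user/spark/applicationHistory",
        "/user/spark/spark2ApplicationHistory",
    ],
}


def build_hdfs_read_policy(service, policy_name, user, paths):
    """Return a Ranger-style policy dict granting read and execute on paths."""
    return {
        "service": service,  # hypothetical Ranger HDFS service name
        "name": policy_name,
        "isEnabled": True,
        "resources": {
            "path": {"values": list(paths), "isRecursive": True},
        },
        "policyItems": [
            {
                "users": [user],
                "accesses": [
                    {"type": "read", "isAllowed": True},
                    {"type": "execute", "isAllowed": True},
                ],
            }
        ],
    }


policy = build_hdfs_read_policy(
    service="cluster_hadoop",                # assumption: replace with your service
    policy_name="pulse-spark-history-read",  # assumption: any unique name
    user="pulse_user",                       # assumption: the non-HDFS user
    paths=SPARK_HISTORY_DIRS["CDH, CDP"],
)
print(json.dumps(policy, indent=2))
```

The printed body can serve as a reference when filling in the policy form in the Ranger UI, or as a starting point for scripted policy creation.
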
  2. Grant permissions to the Hive query data paths.

Provide read and execute permissions for the following Hive query data paths based on the distribution:

  • ODP 2.x, HDP 2.x, CDH 5.x, and CDH 6.x: /tmp/ad
  • ODP 3.x and HDP 3.x: /warehouse/tablespace/external/hive/sys.db/dag_data, /warehouse/tablespace/external/hive/sys.db/query_data
  • CDP 7.x: /warehouse/tablespace/managed/hive/sys.db/dag_data, /warehouse/tablespace/managed/hive/sys.db/query_data
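
One way to check the grant afterwards is to list each directory as the non-HDFS user, since listing an HDFS directory needs both read and execute access on it. The sketch below only assembles the `hdfs dfs -ls` commands for a chosen distribution and does not run them. The dictionary keys are labels invented for this example, and how you switch to the non-HDFS user (for example, Kerberos login) is outside its scope.

```python
# Hive query data paths from the list above, keyed by distribution label.
HIVE_QUERY_PATHS = {
    "ODP 2.x, HDP 2.x, CDH 5.x, CDH 6.x": ["/tmp/ad"],
    "ODP 3.x, HDP 3.x": [
        "/warehouse/tablespace/external/hive/sys.db/dag_data",
        "/warehouse/tablespace/external/hive/sys.db/query_data",
    ],
    "CDP 7.x": [
        "/warehouse/tablespace/managed/hive/sys.db/dag_data",
        "/warehouse/tablespace/managed/hive/sys.db/query_data",
    ],
}


def listing_commands(distribution):
    """Build one `hdfs dfs -ls` argument list per path for the distribution."""
    return [["hdfs", "dfs", "-ls", path] for path in HIVE_QUERY_PATHS[distribution]]


# Print the commands so they can be run manually as the non-HDFS user.
for cmd in listing_commands("CDP 7.x"):
    print(" ".join(cmd))
```

A permission-denied error from any of these commands suggests that the policy does not yet cover that path for the user.
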
  3. Add the Describe permissions for Kafka.

If an HDFS user or other user connects to Kafka but does not have full privileges to read metadata from all topics:

  • Add the user to the default access policy for the Describe permissions on all topics in Ranger.
  4. Add the SELECT privileges in Ranger for Hadoop SQL.
    1. Log in to the Ranger UI.
    2. Navigate to Hadoop SQL.
    3. Locate the policy named all - databases, tables, and columns.
    4. Click Edit under Action.
    5. Add SELECT privileges for the non-HDFS user.
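
For readers who script this change instead of using the UI, the edit amounts to adding the user to an existing policy with a single access type. The Python sketch below models that on a plain dictionary. The policy content, the policy name, and the user `pulse_user` are hypothetical, and the field names are an assumption based on the general Ranger policy JSON layout, not an export from a real cluster.

```python
import copy


def add_user_access(policy, user, access_type):
    """Return a copy of policy in which user is granted access_type.

    An existing policy item that grants exactly this one access type is
    reused; otherwise a new item is appended, so the privileges of users
    already listed in the policy are left unchanged.
    """
    updated = copy.deepcopy(policy)
    items = updated.setdefault("policyItems", [])
    for item in items:
        types = [access["type"] for access in item.get("accesses", [])]
        if types == [access_type]:
            if user not in item.setdefault("users", []):
                item["users"].append(user)
            return updated
    items.append({
        "users": [user],
        "accesses": [{"type": access_type, "isAllowed": True}],
    })
    return updated


# Hypothetical, trimmed-down stand-in for the existing Hadoop SQL policy.
existing = {
    "name": "example-hadoop-sql-policy",
    "policyItems": [
        {"users": ["hive"], "accesses": [{"type": "all", "isAllowed": True}]},
    ],
}

updated = add_user_access(existing, "pulse_user", "select")
print(updated["policyItems"][-1])
```

Appending a separate policy item keeps the new user limited to SELECT. The same helper called with `"describe"` would model the Kafka step, again as an illustration only.
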

Result

  • Non-HDFS users gain the necessary permissions to collect Spark logs, Hive query data, and Kafka topic metadata.
  • Pulse can successfully retrieve and display the required monitoring metrics.