xStore Catalog Iceberg

Apache Iceberg Catalog

Overview

The Apache Iceberg catalog connects xStore to Iceberg table metadata so you can browse schemas and tables in the Catalog Browser and sync them to xCompute Compute Clusters for querying.

Two separate Iceberg surfaces exist in xStore — this guide covers the Catalog Browser catalog only.

  • This guide — Registering an Iceberg catalog inside a Metalake in the Catalog Browser. Use this when you have existing Iceberg tables managed by an external metadata store (Hive Metastore, a relational database, or a remote Iceberg REST service) and want to make them visible and queryable through xStore.
  • Iceberg REST tab — A built-in Iceberg REST service for creating and managing Iceberg namespaces and tables directly within your xStore cluster. See xStore Iceberg for that workflow.

When to Use the Iceberg Catalog

Use the Iceberg catalog when:

  • You have existing Iceberg tables on S3, GCS, ADLS, or HDFS and want to register their metadata in xStore.
  • Your Iceberg metadata is managed externally — in a Hive Metastore, a JDBC-accessible database, or a remote Iceberg REST catalog (such as Polaris or Nessie).
  • You want Trino or Spark on xCompute to query those Iceberg tables after a single sync operation.

Choosing a Catalog Backend

When creating an Iceberg catalog, the first choice is the Catalog Backend — where Iceberg's table metadata is stored. The data files themselves live in your object store or file system; the backend only stores the metadata (schemas, snapshot history, partition specs).

BackendMetadata stored inURI format
HiveHive Metastore (Thrift)thrift://host:9083
JDBCA relational database (PostgreSQL, MySQL)jdbc:postgresql://host:5432/iceberg_meta
RESTA remote Iceberg REST catalog (Polaris, Nessie, etc.)http://catalog-host:8181/api/catalog

Select the backend that matches where your Iceberg metadata currently lives.

Prerequisites

  • You have a running xStore cluster. See (Link Removed).
  • You have a metalake. See (Link Removed) .
  • Access to the metadata backend (Hive Metastore URI, database credentials, or REST catalog endpoint).
  • Access credentials for the storage system where your Iceberg data files are stored (S3, GCS, ADLS, or HDFS).

Creating an Iceberg Catalog

  1. Navigate to Data Catalog → Browse → Catalog in the sidebar.
  2. Select your metalake in the Catalog Browser tree and click New Catalog (or the + icon).
  3. Set a Catalog Name — for example, lakehouse. This becomes the Trino catalog identifier after sync.
  4. For Provider, select Apache Iceberg.

Common Fields (All Backends)

FieldRequiredDescription
Catalog BackendYesSelect hive, jdbc, or rest
URIYesThe endpoint for the metadata backend (format depends on backend — see below)
WarehouseYesThe root storage path where Iceberg data files are stored (e.g. s3://my-bucket/warehouse or hdfs://namenode/warehouse)

Hive Backend

The Hive backend stores Iceberg table metadata in an existing Hive Metastore. This is the most common configuration when you are adding Iceberg tables to an existing Hive environment.

URI format: thrift://hive-metastore.internal:9083

Hive Backend Fields

FieldRequiredDescription
Catalog BackendYesSelect hive
URIYesHive Metastore Thrift URI (e.g. thrift://hive-metastore.internal:9083)
WarehouseYesRoot path for Iceberg table data (e.g. s3://my-bucket/iceberg)
Hive Service PrincipalNoRequired only when using Kerberos auth. The Hive Metastore service principal (e.g. hive/_HOST@EXAMPLE.COM)

Hive Backend — Authentication

Select Kerberos if your Hive Metastore is Kerberos-secured. Otherwise, select Simple.

Kerberos fields:

FieldRequiredDescription
Kerberos PrincipalYesThe principal xStore authenticates as (e.g. xstore@EXAMPLE.COM)
Kerberos KeytabYesUpload the .keytab file for the Kerberos principal

Hive Backend — Optional Configuration Files

FileDescription
Core Site XMLUpload core-site.xml for custom HDFS configuration
HDFS Site XMLUpload hdfs-site.xml
Hive Site XMLUpload hive-site.xml for Hive-specific settings
Kerberos ConfigUpload krb5.conf for Kerberos environments

JDBC Backend

The JDBC backend stores Iceberg metadata in a relational database. This is common in setups where you want a lightweight, SQL-queryable metadata store without running a full Hive Metastore.

URI format: jdbc:postgresql://host:5432/iceberg_meta (or jdbc:mysql://host:3306/iceberg_meta)

JDBC Backend Fields

FieldRequiredDescription
Catalog BackendYesSelect jdbc
URIYesJDBC connection string pointing to your metadata database (e.g. jdbc:postgresql://db-host:5432/iceberg_catalog)
WarehouseYesRoot path for Iceberg data files (e.g. s3://my-bucket/warehouse)

JDBC Backend — Authentication

The JDBC backend always uses password authentication.

FieldRequiredDescription
JDBC UserYesUsername for the metadata database
JDBC PasswordYesPassword for the metadata database

REST Backend

The REST backend connects to any Iceberg REST-compatible catalog service — including Apache Polaris, Project Nessie, or any other service that implements the Iceberg REST Catalog specification.

URI format: http://polaris-server:8181/api/catalog (or https:// for TLS-secured services)

REST Backend Fields

FieldRequiredDescription
Catalog BackendYesSelect rest
URIYesBase URL of the Iceberg REST catalog service (e.g. http://polaris.internal:8181/api/catalog)
WarehouseYesWarehouse identifier as expected by the REST catalog service (may be a storage path or a catalog-specific name depending on the service)

REST Backend — Authentication

Select Simple for unauthenticated or token-based REST catalogs (where the token is passed via catalog properties). No additional credential fields are required in the form for Simple auth. Consult your REST catalog's documentation for token configuration if needed.

Storage Configuration

Regardless of which metadata backend you choose, Iceberg data files need to be read from and written to a storage system. Configure the storage section to provide credentials for the object store or file system where your Iceberg data lives.

Storage Type

Select the storage system that holds your Iceberg data files:

Storage TypeUse when data is on
S3Amazon S3 or S3-compatible storage (MinIO)
GCSGoogle Cloud Storage
ADLSAzure Data Lake Storage Gen2
HDFSHadoop Distributed File System

S3 Storage Configuration

FieldRequiredDescription
S3 EndpointNoCustom endpoint for MinIO or non-AWS S3 (e.g. http://minio.internal:9000). Leave blank for standard AWS S3.
S3 RegionNoAWS region (e.g. us-east-1)

S3 Authentication:

  • Access Key — Provide an S3 Access Key ID and S3 Secret Access Key for programmatic access.
  • IAM Role — Use the IAM role attached to the xStore cluster's node. No credential fields are required. This option is only available when xStore is running on AWS.

GCS Storage Configuration

GCS Authentication:

  • Service Account — Upload a GCS Service Account JSON key file. The service account must have Storage Object Viewer (read) or Storage Object Admin (read/write) permissions on the bucket.

ADLS Storage Configuration

FieldRequiredDescription
Azure Storage AccountYesThe name of your Azure Storage account (without .dfs.core.windows.net)

ADLS Authentication:

  • Account Key — Provide the Azure Account Key (storage account key from the Azure portal).
  • Service Principal — Provide Azure Tenant ID, Azure Client ID, and Azure Client Secret for an Entra ID (Azure AD) service principal with Storage Blob Data Contributor access.

HDFS Storage Configuration

HDFS Authentication:

  • Simple — No authentication. Use for non-Kerberos HDFS clusters.
  • Kerberos — Provide a Storage Kerberos Principal and upload a Storage Kerberos Keytab file. This is separate from the Hive Metastore Kerberos credentials if you are using the Hive backend.

Completing the Catalog

After filling in all the required fields:

  1. Click Create Catalog.
  2. The catalog appears in the Catalog Browser tree under your metalake.
  3. Expand the catalog to browse Iceberg namespaces (shown as schemas) and tables.
  4. Click any table to open the Table Detail Panel — showing column definitions, partition specs, and Iceberg properties.

Syncing and Querying

Sync the Iceberg catalog to your Compute Cluster to enable Trino and Spark queries:

  1. Select the catalog in the tree.
  2. In the right-hand panel, click Linked Clusters.
  3. Click Sync next to your Compute Cluster.
  4. In the SQL Editor, verify the catalog is available:
SQL
Copy

Common Issues

No namespaces appear after catalog creation

  • For the Hive backend: confirm the Metastore Thrift URI is reachable from the xStore cluster on port 9083.
  • For the JDBC backend: verify the JDBC URL, username, and password, and confirm the Iceberg catalog tables exist in the database (they are created by Iceberg on first write).
  • For the REST backend: confirm the REST catalog URL is reachable and returns a valid response.

Storage access errors when browsing tables

  • Verify that the storage credentials you provided have read access to the warehouse path.
  • For S3, ensure the bucket policy or IAM role allows s3:GetObject and s3:ListBucket on the warehouse prefix.
  • For ADLS, confirm the service principal has Storage Blob Data Reader or higher on the container.

Trino cannot read tables after sync

  • Ensure the Compute Cluster can reach both the metadata backend and the storage system.
  • Verify that the Warehouse path in the catalog matches the actual location of your Iceberg metadata files.
VariableType to search · ESC to discard
GlossaryType to search · ESC to discard
InsertType to search · ESC to discard
No matches