xStore Architecture

xStore — Unified Data Catalog and Architecture

Overview

What is xStore?

xStore is the unified data catalog and governance layer for the Acceldata xDP platform. It provides a single place to register, organize, and govern all of your catalogs, whether they live in a legacy or a modern data stack (Hadoop, cloud object stores, relational databases, Iceberg, data lakes, and so on), and makes those catalogs discoverable and queryable by any connected compute engine, such as Spark or Trino.

Supported Catalog Providers

Type | Provider(s) | Use Case
Relational | PostgreSQL, MySQL, Hive | Operational and analytical RDBMS
Object Store | Files on Azure (ADLS Gen2), GCS, S3 (and S3-compatible: MinIO, Vast S3, Ceph S3, Apache Ozone) | Raw and curated zones on cloud or sovereign object storage
Lakehouse — Open Table Format | Apache Iceberg | Upsert-capable open table formats; ACID, schema evolution, time travel
Metadata / Catalog Layer | AWS Glue | Centralized metadata catalog for schema, partitions, and table discovery across data lake
Distributed File System | Hadoop / HDFS | Existing on-prem Hadoop estates and HDFS-resident tables
Streaming | Apache Kafka | Kafka topics exposed as queryable tables
Cloud Data Warehouse | Snowflake | Snowflake databases and schemas
Lakehouse — Federation | Databricks Unity Catalog | Federate an existing Unity Catalog without data movement

Capabilities

  • Multi-Source Metadata Federation: Register and govern data assets across heterogeneous systems — Iceberg, Hive, PostgreSQL, Snowflake, S3, Kafka, and more — through a single UI and API, without moving the underlying data.
  • Hive-Compatible for Legacy Workloads: Existing Spark jobs and Hive workloads discover and query xStore-managed tables without modification, through a built-in Hive Metastore-compatible interface.
  • Open Standards, No Vendor Lock-In: Built on Apache Gravitino and the Apache Iceberg REST Catalog specification. Any Iceberg-native client — Spark, Trino, PyIceberg — can connect using the open REST protocol without proprietary drivers or SDKs.
  • Automatic Governance Registration: Every catalog registered in xStore is automatically backed by an Apache Ranger service definition. Fine-grained access policies at the catalog, schema, and table level are enforced at query time — from the moment a catalog is created.
  • Native Iceberg Lakehouse Management: Create and manage Iceberg namespaces and tables directly from the portal, with built-in support for schema evolution, hidden partitioning, time travel, and snapshot history — no external metadata store required.
  • One-Click Catalog Sync to Compute: Push catalog configuration to Trino and Spark in one click. No manual reconfiguration — compute engines pick up new catalogs within seconds.

Architecture

xStore is positioned between query engines and metadata backends. It exposes multiple protocol surfaces upward so any engine can connect using its native catalog protocol, and routes through a set of core capabilities to one or more metadata backends downward. The same control point is where governance, lineage, and observability are evaluated, so a policy authored once is enforced uniformly across every engine that goes through xStore.

Protocol surface

xStore speaks several catalog protocols upward so engines can connect using whatever they already speak:

  • Iceberg REST API — the open standard catalog protocol. Engines that speak Iceberg REST (Spark, Trino, Flink, PyIceberg, Snowflake external tables) connect natively. This is the recommended path for new integrations.
  • Hive Metastore Thrift — legacy compatibility for engines that still speak Hive Thrift. xStore terminates Thrift requests and serves them from the same underlying state.
  • Glue API passthrough — drop-in replacement for direct Glue API calls. Engines pointed at xStore's Glue endpoint behave as if they were calling Glue directly but benefit from xStore's caching and federation transparently.
  • JDBC and REST — for catalog browsing, metadata search, and integration with BI tools and data discovery applications.
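
As an illustration of the Iceberg REST path, the sketch below builds the Spark properties that register an Iceberg REST catalog. The property keys are the standard Apache Iceberg Spark settings; the catalog name "xstore" and the endpoint URL are placeholders assumed for this example, not documented xStore values, and authentication settings are omitted.

```python
from typing import Dict


def iceberg_rest_spark_conf(catalog_name: str, rest_uri: str) -> Dict[str, str]:
    """Build Spark properties that register an Iceberg REST catalog."""
    prefix = f"spark.sql.catalog.{catalog_name}"
    return {
        # Enables Iceberg SQL extensions (MERGE INTO, CALL procedures, ...).
        "spark.sql.extensions":
            "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions",
        # Registers the catalog implementation under the chosen name.
        prefix: "org.apache.iceberg.spark.SparkCatalog",
        # Selects the REST catalog protocol and points it at the endpoint.
        f"{prefix}.type": "rest",
        f"{prefix}.uri": rest_uri,
    }


# Hypothetical catalog name and endpoint, for illustration only.
conf = iceberg_rest_spark_conf("xstore", "https://xstore.example.com/iceberg")
for key, value in sorted(conf.items()):
    print(f"--conf {key}={value}")
```

The printed flags can be passed to spark-submit or spark-sql; a real deployment would also need whatever credential properties the endpoint requires.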

Core capabilities

Six capabilities sit between the protocol surface and the metadata backends:

  • Metadata cache — hot tables, partitions, and statistics are cached in memory and periodically refreshed against the backend. Read latency drops by a factor of 10 to 100, and backend API quota consumption drops correspondingly. Cache invalidation is event-driven where backends support change notification, and time-based otherwise.
  • Federation router — presents a single namespace tree that spans multiple backends. A query can reference us_east.glue.sales.orders and eu_central.hive.crm.customers in the same statement; xStore routes each table reference to the correct backend transparently.
  • Schema evolution — Iceberg snapshot semantics are exposed uniformly. Time-travel queries (read the table as of a specific snapshot or timestamp), branch and tag operations for staged changes, and safe schema changes (add column, rename column, reorder) are first-class.
  • Partition optimizer — predicate pushdown, partition pruning, and statistics-driven planning. xStore maintains column-level statistics (min, max, null count, distinct count) and serves them to engines that support cost-based optimization.
  • Domain isolation — the same xStore instance can serve many tenants or business domains, each with its own logical namespace, policy bindings, and quota. This is what makes xStore work as a multi-tenant catalog without exposing one tenant's metadata to another.
  • Compaction service — Iceberg tables accumulate small files over time, especially in streaming write patterns. The compaction service rewrites them periodically into larger files, manages snapshot expiration, and keeps query performance stable.
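
The federation router and metadata cache described above can be pictured with a small in-memory sketch. It is not xStore's implementation: the class names, the backend identifiers, and the assumption that a table name always has the four parts region.catalog.schema.table (inferred from the two example names in the text) are all hypothetical. The cache expires entries after a time-to-live and also exposes an invalidate hook that a backend change notification could call.

```python
import time
from typing import Callable, Dict, Tuple


class FederationRouter:
    """Resolves a four-part table name to the backend that owns it."""

    def __init__(self) -> None:
        self._backends: Dict[Tuple[str, str], str] = {}

    def register(self, region: str, catalog: str, backend: str) -> None:
        self._backends[(region, catalog)] = backend

    def resolve(self, qualified_name: str) -> Tuple[str, str, str]:
        parts = qualified_name.split(".")
        if len(parts) != 4:
            raise ValueError(
                f"expected region.catalog.schema.table, got {qualified_name!r}")
        region, catalog, schema, table = parts
        if (region, catalog) not in self._backends:
            raise LookupError(f"no backend registered for {region}.{catalog}")
        return self._backends[(region, catalog)], schema, table


class MetadataCache:
    """Time-based cache with a hook for event-driven invalidation."""

    def __init__(self, loader: Callable[[str], dict], ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._loader = loader
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[float, dict]] = {}

    def get(self, name: str) -> dict:
        now = self._clock()
        entry = self._entries.get(name)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]  # still fresh: no backend call
        value = self._loader(name)
        self._entries[name] = (now, value)
        return value

    def invalidate(self, name: str) -> None:
        """Drop an entry, e.g. when a backend reports a change."""
        self._entries.pop(name, None)


router = FederationRouter()
router.register("us_east", "glue", "glue-backend-us-east")        # hypothetical
router.register("eu_central", "hive", "hive-backend-eu-central")  # hypothetical

backend_calls = []


def load_from_backend(name: str) -> dict:
    backend, schema, table = router.resolve(name)
    backend_calls.append(backend)  # stands in for a real metadata fetch
    return {"backend": backend, "schema": schema, "table": table}


fake_now = [0.0]  # injectable clock so expiry can be shown without sleeping
cache = MetadataCache(load_from_backend, ttl_seconds=60,
                      clock=lambda: fake_now[0])

orders = cache.get("us_east.glue.sales.orders")         # miss: loads from Glue
orders_again = cache.get("us_east.glue.sales.orders")   # hit: served from cache
customers = cache.get("eu_central.hive.crm.customers")  # miss: loads from Hive
cache.invalidate("eu_central.hive.crm.customers")       # simulated change event
cache.get("eu_central.hive.crm.customers")              # reloads after the event
fake_now[0] = 61.0
cache.get("us_east.glue.sales.orders")                  # TTL expired: reloads
print(backend_calls)
```

Injecting the clock keeps the example deterministic; a production cache would also need size limits and thread safety, which are left out here.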

Replication and lineage

Three additional capabilities handle multi-region operation and observability:

  • Async Glue replication — for AWS-native services that read directly from Glue (Athena, Redshift Spectrum, EMR), xStore replicates schema changes back to Glue asynchronously. Glue stays current; engines that go through xStore get richer features; nothing breaks.
  • Cross-region metadata replication — for multi-site disaster recovery, xStore replicates its own metadata to a passive site. RPO is typically seconds; failover is a configuration change at the engine endpoint.
  • Lineage capture — every read and write through xStore is recorded with column-level provenance. The lineage graph is queryable through GraphQL and powers impact analysis, compliance reporting, and data discovery.
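
To make the idea of impact analysis over column-level lineage concrete, here is a minimal in-memory sketch. It does not use xStore's GraphQL interface, whose schema is not described here; the LineageGraph class and the column names are hypothetical. Each recorded write links source columns to a target column, and a breadth-first traversal finds everything downstream of a given column.

```python
from collections import defaultdict, deque
from typing import Dict, Iterable, Set


class LineageGraph:
    """Directed graph of column-level lineage: source column -> derived columns."""

    def __init__(self) -> None:
        self._downstream: Dict[str, Set[str]] = defaultdict(set)

    def record_write(self, target: str, sources: Iterable[str]) -> None:
        """Record that `target` was computed from each column in `sources`."""
        for source in sources:
            self._downstream[source].add(target)

    def impacted_by(self, column: str) -> Set[str]:
        """Return every column reachable downstream of `column` (BFS)."""
        seen: Set[str] = set()
        queue = deque([column])
        while queue:
            current = queue.popleft()
            for nxt in self._downstream.get(current, ()):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        return seen


graph = LineageGraph()
# Hypothetical pipeline: raw amounts and FX rates feed a cleaned table,
# which in turn feeds a daily revenue rollup.
graph.record_write("sales.orders_clean.amount_usd",
                   ["sales.orders.amount", "ref.fx_rates.rate"])
graph.record_write("finance.revenue_daily.total_usd",
                   ["sales.orders_clean.amount_usd"])

impacted = graph.impacted_by("ref.fx_rates.rate")
print(sorted(impacted))
```

A change to ref.fx_rates.rate is reported as affecting both the cleaned column and the rollup, which is the kind of answer impact analysis is meant to give.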

Object storage as the data layer

xStore catalogs metadata, but the actual data files (Parquet, ORC, Iceberg manifests) live on object storage. The data layer is intentionally simple — append-mostly, immutable, content-addressable. This separation is what enables independent scaling:

  • Object storage scales to petabytes and beyond without capacity planning. S3, GCS, ADLS, Vast S3, Ceph S3 and Ozone all provide effectively unlimited capacity at flat per-GB pricing.
  • Compute is provisioned per workload — transient clusters spin up against the same files that persistent clusters are reading.
  • Apache Ozone, Vast S3 and Ceph S3 provide an S3-compatible interface for sovereign on-prem deployments, allowing the same Iceberg layout to work in air-gapped environments.

Governance enforced at the catalog protocol layer

The architectural payoff of routing through xStore is that governance is enforced at the catalog protocol layer, not in each engine. A Ranger policy that masks a column for a particular role applies to Spark, Trino, Flink, and Snowflake equally — because none of them can see the column without going through xStore, and xStore evaluates the policy on every metadata request.
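
The sketch below illustrates why evaluating a masking policy at the catalog gives engine-independent results. It is a simplified model, not the Apache Ranger API or xStore's policy engine: MaskPolicy, ColumnView, describe_table, and the table, column, and role names are all hypothetical. The point is that the engine name never enters the decision, so every engine receives the same answer for the same role.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Sequence


@dataclass(frozen=True)
class MaskPolicy:
    """Hypothetical policy: mask `column` of `table` for the listed roles."""
    table: str
    column: str
    masked_roles: FrozenSet[str]


@dataclass(frozen=True)
class ColumnView:
    name: str
    masked: bool


def describe_table(table: str, columns: Sequence[str], role: str,
                   policies: Sequence[MaskPolicy]) -> List[ColumnView]:
    """Answer a metadata request, applying every matching mask policy."""
    masked = {p.column for p in policies
              if p.table == table and role in p.masked_roles}
    return [ColumnView(name, name in masked) for name in columns]


policies = [MaskPolicy("crm.customers", "email", frozenset({"analyst"}))]
columns = ["id", "email", "country"]

# The requesting engine is not an input to the policy decision, so the
# result for a given role is identical whichever engine asks.
views = {
    engine: describe_table("crm.customers", columns, "analyst", policies)
    for engine in ("spark", "trino", "flink")
}
admin_view = describe_table("crm.customers", columns, "admin", policies)
print(views["spark"])
print(admin_view)
```

An analyst sees email flagged as masked from any engine, while an admin role that is not named in the policy sees it unmasked.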

Four governance services compose at this layer:

  • Apache Ranger — table, column, and row-level policies. RBAC and ABAC supported; tag-based policies for sensitive data classification.
  • Lineage and impact analysis — captured automatically; queryable; powers discovery and compliance review.