Data Sources and Extensions

Python Data Source APIs

Spark 4.1.1 supports writing custom data sources and sinks entirely in Python, with no Scala or Java required, for both batch and streaming queries.


XML Connector

Spark includes built-in support for reading and writing XML files, with configurable row tag (rowTag) and root tag (rootTag) options.
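The sketch below shows how the row and root tag options might be used from PySpark. The sample document, the file name `books.xml`, and the helper names are assumptions made for illustration. The `spark` argument is expected to be an existing SparkSession.

```python
from pathlib import Path
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    from pyspark.sql import DataFrame, SparkSession

# Illustrative input: <catalog> is the root element, each <book> becomes one row.
SAMPLE_XML = """<catalog>
  <book id="1"><title>Learning Spark</title><price>39.99</price></book>
  <book id="2"><title>Spark in Action</title><price>44.50</price></book>
</catalog>
"""


def write_sample(directory: str) -> str:
    """Write the sample document to `directory` and return the file path."""
    path = Path(directory) / "books.xml"
    path.write_text(SAMPLE_XML, encoding="utf-8")
    return str(path)


def read_books(spark: "SparkSession", path: str) -> "DataFrame":
    """Read XML, treating every <book> element as a row."""
    return spark.read.format("xml").option("rowTag", "book").load(path)


def write_books(df: "DataFrame", out_dir: str) -> None:
    """Write rows back out as <book> elements wrapped in a <catalog> root."""
    (
        df.write.format("xml")
        .option("rootTag", "catalog")
        .option("rowTag", "book")
        .mode("overwrite")
        .save(out_dir)
    )
```

A typical call sequence would be `write_books(read_books(spark, write_sample("/tmp")), "/tmp/books_out")`, where both paths are placeholders.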


Data Source V2 (DSV2)

DSV2 provides a cleaner, higher-performance data source API with support for filter (predicate) pushdown, and is intended to supersede the older V1 APIs.
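One way to observe pushdown from PySpark is to inspect the physical plan for a `PushedFilters` entry. The sketch below assumes a Parquet dataset with an `id` column. It also assumes that clearing the internal setting `spark.sql.sources.useV1SourceList` routes built-in file formats through DSV2, which should be verified against your Spark version. The helper names are hypothetical.

```python
import contextlib
import io
import re
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    from pyspark.sql import DataFrame, SparkSession

# Assumption: an empty V1 fallback list makes built-in file sources use DSV2.
# This is an internal Spark SQL setting and may change between releases.
V2_FILE_SOURCE_CONF = {"spark.sql.sources.useV1SourceList": ""}


def explain_text(df: "DataFrame") -> str:
    """Capture what df.explain() prints, since PySpark writes the plan to stdout."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        df.explain()
    return buffer.getvalue()


def pushed_filters(plan: str) -> list:
    """Return the contents of every 'PushedFilters: [...]' entry in a plan string."""
    return re.findall(r"PushedFilters: \[(.*?)\]", plan)


def show_pushdown(spark: "SparkSession", parquet_path: str) -> list:
    """Filter a Parquet dataset and report which filters reached the scan node."""
    for key, value in V2_FILE_SOURCE_CONF.items():
        spark.conf.set(key, value)
    df = spark.read.parquet(parquet_path).where("id > 5")
    return pushed_filters(explain_text(df))
```

An empty result from `show_pushdown` would suggest that no filter was pushed to the source, for example because the plan text was truncated or the source does not support the predicate.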


Delta Lake 4.0

Delta Lake 4.0, compatible with Spark 4.1.1, introduces major lakehouse improvements.

| Feature | Description |
| --- | --- |
| Delta Connect | Full Spark Connect support for Delta operations |
| Coordinated Commits | Multi-cloud concurrent write support |
| Liquid Clustering | Adaptive clustering for faster reads and writes |
| Time Travel | Query historical versions by timestamp or version |
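Time Travel and Liquid Clustering can be sketched from PySpark as follows. This assumes a SparkSession already configured with the Delta Lake package and extensions. The table name `events`, its columns, and the helper names are hypothetical.

```python
from typing import TYPE_CHECKING, Optional

if TYPE_CHECKING:
    from pyspark.sql import DataFrame, SparkSession


def time_travel_options(version: Optional[int] = None,
                        timestamp: Optional[str] = None) -> dict:
    """Build Delta reader options for exactly one of a version or a timestamp."""
    if (version is None) == (timestamp is None):
        raise ValueError("pass exactly one of version or timestamp")
    if version is not None:
        return {"versionAsOf": str(version)}
    return {"timestampAsOf": timestamp}


def read_snapshot(spark: "SparkSession", table_path: str,
                  version: Optional[int] = None,
                  timestamp: Optional[str] = None) -> "DataFrame":
    """Load a historical snapshot of the Delta table stored at `table_path`."""
    options = time_travel_options(version, timestamp)
    return spark.read.format("delta").options(**options).load(table_path)


def create_clustered_table(spark: "SparkSession", table_name: str = "events") -> None:
    """Create a Delta table that uses Liquid Clustering on a hypothetical date column."""
    # CLUSTER BY declares clustering columns in place of PARTITIONED BY.
    spark.sql(
        f"CREATE TABLE IF NOT EXISTS {table_name} "
        "(id BIGINT, event_date DATE) USING DELTA CLUSTER BY (event_date)"
    )
```

For example, `read_snapshot(spark, "/data/events", version=0)` would read the first committed version of a table at that placeholder path.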