Apache Spark 4.1.1

Apache Spark 4.1.1 is a major milestone in the Spark 4.x series, delivering a comprehensive set of improvements across streaming, SQL, Python, connectivity, and usability.

This page consolidates all key features with working code examples and configuration references.

| Area | Features |
| --- | --- |
| Connectivity | Spark Connect, Spark ML on Connect GA |
| Structured Streaming | Real-Time Mode, Arbitrary Stateful Processing V2 |
| SQL | ANSI Mode, SQL Scripting GA, VARIANT GA + Shredding, Recursive CTE, Approximate Sketches, Collation |
| PySpark | Arrow-native UDFs/UDTFs, Python Worker Logging, Python Data Source with Filter Pushdown, pandas 2.x |
| Data Sources | Python Data Source APIs, XML Connector, Data Source V2 (DSV2), Delta Lake 4.0 |
| Custom Functions | SQL UDFs, Python UDTFs, Arrow UDFs, Python UDFs, UDF Profiler |
| Usability | Structured Logging, Error Class Framework, Behavior Change Process |

For details, see the following pages:
