Connectivity
Spark Connect
Spark Connect is a client-server architecture that decouples Spark client applications from the cluster, enabling remote connectivity using the standard DataFrame API. Developers can connect from any IDE — PyCharm, Jupyter, VS Code — using a lightweight 1.5 MB client instead of the full 355 MB PySpark installation.
Install:

```bash
pip install pyspark-connect
```

Example:
```python
from pyspark.sql import SparkSession

# Use .remote() with an sc:// URL to reach a Spark Connect server;
# .master() is for classic cluster URLs and does not accept sc://
spark = SparkSession \
    .builder \
    .appName("Spark Connect Example") \
    .remote("sc://your-spark-server:15002") \
    .getOrCreate()

df = spark.read.format("csv").option("header", "true").load("/path/to/data.csv")
df.show()
```
Spark 4.1.1 Spark Connect improvements:

| Enhancement | Detail |
|---|---|
| Protobuf plan compression | Execution plans compressed with zstd, reducing network overhead for large/complex plans |
| Chunked Arrow result streaming | Query results streamed in chunks over gRPC, improving stability for large result sets |
| Large local relations | Removed the previous 2 GB size limit, enabling DataFrames from large Pandas or in-memory objects |
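Of these, the large local relations change is the most directly visible from client code: a large in-memory pandas DataFrame can now be turned into a Spark DataFrame over Connect without hitting the old 2 GB ceiling. A minimal sketch, assuming the same placeholder endpoint as above and an illustrative data size:

```python
import numpy as np
import pandas as pd
from pyspark.sql import SparkSession

spark = SparkSession.builder.remote("sc://your-spark-server:15002").getOrCreate()

# An in-memory pandas DataFrame built on the client; size is illustrative
n = 50_000_000  # roughly 800 MB of raw numeric data
pdf = pd.DataFrame({"id": np.arange(n), "value": np.random.rand(n)})

# The local relation is serialized into the query plan and shipped to the
# server; previously this upload path was capped at 2 GB
df = spark.createDataFrame(pdf)
print(df.count())
```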
Spark ML on Connect
Spark ML on Spark Connect is now Generally Available for the Python client. A new model size estimation mechanism allows intelligent model caching on the driver — models are cached in memory or spilled to disk based on estimated size.
```python
from pyspark.ml.classification import LogisticRegression
from pyspark.sql import SparkSession

spark = SparkSession.builder.remote("sc://your-spark-server:15002").getOrCreate()

data = spark.read.format("libsvm").load("/path/to/data.txt")
lr = LogisticRegression(maxIter=10, regParam=0.3)
model = lr.fit(data)
print(model.coefficients)
```
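The fitted model lives in the driver-side cache described above and is used for inference through the same session. A short continuation of the example, assuming the default libsvm column names (label, features) and binary labels:

```python
# Inference goes through the cached model on the server side
predictions = model.transform(data)
predictions.select("label", "prediction", "probability").show(5)

# Training summary metrics are available as usual
print(model.summary.accuracy)
```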