Documentation
ODP 3.3.6.4-1
Release Notes
Data Sources and Extensions
Python Data Source APIs
Spark 4.1.1 supports creating custom data sources and sinks entirely in Python — no Scala or Java required — for both batch and streaming queries.
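The contract behind this API is small: Spark asks the reader for a list of partitions, then calls `read()` once per partition and consumes the rows each call yields. A stdlib-only sketch of that contract (no pyspark here; `SketchReader` and its fields are illustrative, not part of the real API):

```python
# Plain-Python sketch of the DataSourceReader contract: partitions()
# is called once, then read(partition) once per entry, and the rows
# each call yields are collected. Names are illustrative only.
class SketchReader:
    def __init__(self, rows, num_partitions=2):
        self.rows = rows
        self.num_partitions = num_partitions

    def partitions(self):
        # One entry per partition; Spark schedules one task per entry.
        return list(range(self.num_partitions))

    def read(self, partition):
        # Each partition yields only its own slice of the data.
        for i, row in enumerate(self.rows):
            if i % self.num_partitions == partition:
                yield row


reader = SketchReader([("a", 1), ("b", 2), ("c", 3)])
rows = sorted(r for p in reader.partitions() for r in reader.read(p))
print(rows)  # all three rows come back, split across two partitions
```

In the real API the same shape applies, except partitions are `InputPartition` objects and Spark runs each `read()` call in a separate task.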
```python
from pyspark.sql.datasource import DataSource, DataSourceReader, InputPartition


class CustomDataSource(DataSource):
    @classmethod
    def name(cls):
        return "custom_source"

    def reader(self, schema):
        return CustomReader(schema, self.options)


class CustomReader(DataSourceReader):
    def __init__(self, schema, options):
        self.schema = schema
        self.options = options

    def partitions(self):
        # A single partition; return more InputPartitions to parallelize.
        return [InputPartition(0)]

    def read(self, partition):
        for row in self._load_records(self.options.get("path")):
            yield row


# Assumes an active SparkSession `spark`.
spark.dataSource.register(CustomDataSource)
df = spark.read.format("custom_source").option("path", "/path/to/data").load()
df.show()
```

XML Connector
Built-in XML support for reading and writing XML files with configurable row and root tag options.
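What the row and root tag options mean structurally can be shown with a stdlib-only sketch: the root tag wraps the document, and each row-tag element maps to one row with its children as columns. This uses Python's `xml.etree`, not Spark, and the element names are just examples:

```python
import xml.etree.ElementTree as ET

# rootTag="records" wraps the document; rowTag="record" marks one row.
doc = """
<records>
  <record><id>1</id><name>alpha</name></record>
  <record><id>2</id><name>beta</name></record>
</records>
"""

root = ET.fromstring(doc)
# Each <record> element corresponds to one DataFrame row; its child
# elements become columns. This mirrors how rowTag is interpreted.
rows = [{child.tag: child.text for child in rec} for rec in root.findall("record")]
print(rows)  # [{'id': '1', 'name': 'alpha'}, {'id': '2', 'name': 'beta'}]
```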
```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("XML Connector").getOrCreate()

df = spark.read \
    .format("xml") \
    .option("rowTag", "record") \
    .load("/path/to/data.xml")

df.write \
    .format("xml") \
    .option("rootTag", "records") \
    .option("rowTag", "record") \
    .save("/path/to/output.xml")
```

Data Source V2 (DSV2)
DSV2 provides a cleaner, higher-performance data source API with support for constraint pushdown, replacing older V1 APIs.
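The pushdown idea can be sketched without Spark: the engine offers predicates to the source, and the source applies the ones it accepts during the scan, so filtered-out rows are never materialized. The class and method names below are illustrative, not the DSV2 API:

```python
# Sketch of constraint pushdown: predicates handed to the source are
# applied while scanning, so fewer rows ever leave the scan.
class PushdownScan:
    def __init__(self, rows):
        self.rows = rows
        self.pushed = []

    def push_filters(self, filters):
        # The engine offers predicates; the source accepts what it can
        # and returns the rest for the engine to re-apply afterwards.
        self.pushed = list(filters)
        return []

    def scan(self):
        for row in self.rows:
            if all(f(row) for f in self.pushed):
                yield row


scan = PushdownScan([{"amount": 50}, {"amount": 150}, {"amount": 200}])
scan.push_filters([lambda r: r["amount"] > 100])
print(list(scan.scan()))  # only the rows with amount > 100 are produced
```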
```python
from pyspark.sql.datasource import DataSource


class CustomDSV2(DataSource):
    @classmethod
    def name(cls):
        return "custom_dsv2"

    def reader(self, schema):
        return MyDSV2Reader(schema, self.options)

    def writer(self, schema, overwrite):
        return MyDSV2Writer(self.options)


# Assumes an active SparkSession `spark`.
spark.dataSource.register(CustomDSV2)
df = spark.read.format("custom_dsv2").option("option_key", "value").load()
df.write.format("custom_dsv2").option("option_key", "value").save()
```

Delta Lake 4.0
Delta Lake 4.0, compatible with Spark 4.1.1, introduces major lakehouse improvements.
| Feature | Description |
|---|---|
| Delta Connect | Full Spark Connect support for Delta operations |
| Coordinated Commits | Multi-cloud concurrent write support |
| Liquid Clustering | Adaptive clustering for faster reads and writes |
| Time Travel | Query historical versions by timestamp or version |
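Time travel, the last feature in the table, can be modeled in a few lines: each commit is a numbered version with a timestamp, and an `AS OF` query picks the matching snapshot. This is a pure-Python model of the semantics, not the Delta implementation:

```python
from bisect import bisect_right

# Model of time travel: each commit stores (timestamp, snapshot);
# VERSION AS OF indexes by commit number, TIMESTAMP AS OF picks the
# latest commit at or before the given time.
class VersionedTable:
    def __init__(self):
        self.commits = []  # list of (iso_timestamp, snapshot)

    def commit(self, timestamp, snapshot):
        self.commits.append((timestamp, snapshot))

    def as_of_version(self, version):
        return self.commits[version][1]

    def as_of_timestamp(self, ts):
        # ISO-8601 strings sort chronologically, so string bisect works.
        idx = bisect_right([t for t, _ in self.commits], ts) - 1
        return self.commits[idx][1]


t = VersionedTable()
t.commit("2025-01-01T00:00:00", [(1, 99.99, "pending")])
t.commit("2025-01-02T00:00:00", [(1, 99.99, "pending"), (2, 149.50, "completed")])
print(t.as_of_version(0))  # the first snapshot, before the second insert
```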
```sql
CREATE TABLE orders (id INT, amount DOUBLE, status STRING) USING delta;
INSERT INTO orders VALUES (1, 99.99, 'pending'), (2, 149.50, 'completed');

-- Time travel
SELECT * FROM orders VERSION AS OF 0;
SELECT * FROM orders TIMESTAMP AS OF '2025-01-01T00:00:00';

-- Optimize with liquid clustering
ALTER TABLE orders CLUSTER BY (status);
```
Last updated on May 14, 2026