JupyterHub Interactive Session

JupyterHub Interactive Session provides a browser-based JupyterLab environment integrated with xDP, letting you interactively develop and run PySpark code against cluster resources and data stores without any local setup.

Launching a Session

  1. From the left navigation, select Notebooks and click Launch Notebook.
  2. Resource Configuration — Set CPU and Memory request/limits for the session. Accept defaults for a first session.
  3. Select Data Store — Click + Add data store dependency and choose a pre-configured Data Store (HDFS, S3, ADLS, ODP etc.). This securely injects the credentials and configuration needed to access the source.
  4. Advanced Settings — Expand to configure Spark dynamic allocation, a custom Docker image, environment variables, and Python paths.
  5. Click Launch Notebook. xDP provisions resources and opens JupyterLab in a new tab.
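
Environment variables set under Advanced Settings can be read from inside the notebook with Python's standard library. The following is a minimal sketch, assuming a hypothetical variable named MY_APP_ENV (not defined by xDP; substitute a key you configured yourself):

```python
import os


def read_setting(name, default=None):
    """Return the value of an environment variable, or `default` if it is unset."""
    return os.environ.get(name, default)


# MY_APP_ENV is a hypothetical key used only for illustration.
app_env = read_setting("MY_APP_ENV", default="dev")
print(f"MY_APP_ENV = {app_env}")
```

Supplying a default keeps the notebook runnable in sessions where the variable was not configured.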

Running Your First Query

In the JupyterLab Launcher, click AccelData Pyspark to open a new notebook.

JupyterLab Launcher — AccelData Pyspark kernel

In the first cell, enter a Spark SQL query and press Shift + Enter:
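
The query itself is not shown here, so the following is only a sketch of what a first cell might contain. It uses a placeholder query that needs no tables, and it assumes the kernel may already expose a SparkSession named spark. Both FIRST_QUERY and run_first_query are hypothetical names introduced for illustration.

```python
# Placeholder query for illustration; replace it with a query against your own data.
FIRST_QUERY = "SELECT 1 AS id, 'hello from xDP' AS message"


def run_first_query(session, query=FIRST_QUERY):
    """Run `query` on a SparkSession and return the result as a DataFrame."""
    return session.sql(query)


# Assumption: the kernel may predefine a SparkSession named `spark`.
# If it does not, create one with SparkSession.builder.getOrCreate() first.
if "spark" in globals():
    run_first_query(spark).show()
```

Once the placeholder works, point the query at a table or path exposed by the attached Data Store.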

Configuration Reference

| Parameter | Description | Default |
|---|---|---|
| Request CPU / Memory | Resources requested for the session container. | 1 CPU / 1G |
| Limit CPU / Memory | Maximum resources the session container can use. | 1 CPU / 2G |
| Data Store | Pre-configured xDP Data Store to attach for data access. | None |
| Enable Dynamic Allocation | Spark dynamically scales executors. Set Min, Max, and Initial Executors. | Off |
| Jupyter Driver Image | Custom Docker image for the session. | System default |
| Image Pull Secrets | Kubernetes secret for private registry authentication. | None |
| Image Pull Policy | Always, IfNotPresent, or Never. | IfNotPresent |
| File to Mount | Mount files or directories into the session container. | None |
| Environment Variables | Key-value pairs injected into the session container. | None |
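
For orientation, the Min, Max, and Initial Executors fields correspond conceptually to Apache Spark's standard dynamic allocation properties. The sketch below builds those properties from three counts and applies a simple sanity check. Whether xDP sets exactly these properties from the form is an assumption, the ordering check belongs to this sketch rather than to xDP, and dynamic_allocation_conf is a hypothetical helper.

```python
def dynamic_allocation_conf(min_executors, max_executors, initial_executors):
    """Build standard Spark dynamic allocation properties from three executor counts."""
    if min_executors < 0:
        raise ValueError("min_executors must be non-negative")
    # Sanity check used by this sketch: the initial count should sit between min and max.
    if not (min_executors <= initial_executors <= max_executors):
        raise ValueError("expected min <= initial <= max executors")
    # Spark configuration values are passed as strings.
    return {
        "spark.dynamicAllocation.enabled": "true",
        "spark.dynamicAllocation.minExecutors": str(min_executors),
        "spark.dynamicAllocation.maxExecutors": str(max_executors),
        "spark.dynamicAllocation.initialExecutors": str(initial_executors),
    }


# Example values chosen for illustration only.
for key, value in dynamic_allocation_conf(1, 4, 2).items():
    print(f"{key}={value}")
```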

Best Practices

  • Always use Data Store dependencies to connect to data sources — avoids hardcoded credentials and enables centralized access governance.
  • When you are done, shut down sessions from the JupyterHub control panel to release cluster resources.
  • Version your notebooks — use the integrated Terminal to commit to a Git repository regularly.
  • Start with small resources (1 CPU, 2G Memory) and relaunch with more if the workload requires it.