Resource Management (Spark)
Choosing the Best Size for a Spark Cluster
Spark is a parallel data processing engine that can handle large amounts of data in memory. A Spark cluster consists of one driver and several executors.
The process of sizing a Spark cluster includes evaluating the best hardware and configuration for your individual use case and workloads. First, you need to determine the amount of data you will be processing and the desired performance level. This will help you decide how many executors and how much memory each executor should have. Additionally, you should consider the available resources in your infrastructure and budget constraints to ensure optimal cluster sizing.
Here are the essential considerations for sizing a Spark cluster:
- Data size and complexity
- Workload types
- Memory and CPU
- Storage
- Data partitioning
- Driver and executor configuration
Driver Sizing
Spark drivers can usually be configured with fewer cores and less memory than executors. In general, the driver is responsible for dynamic code generation, query plan generation, storing some collected results (for example, from `collect()`), and similar coordination work.
Executor Sizing
A Spark cluster has one or more executors, and executor sizing is where most tuning effort goes. There are numerous techniques for adjusting Spark clusters to achieve higher performance while lowering hardware costs.
Resource Allocation Strategies for Spark Data Processing
Key | Description |
---|---|
N | The size of the data in uncompressed form, expressed in terabytes. |
M | The number of partitions Spark forms after reading the data. |
Case 1 - Entire data processed in a single pass and all in memory, no data spillage to disk:
- Total Executor Memory (RAM) required: N + N x 0.3 (The 0.3 here represents 30% overhead memory.)
- Total number of Executor Cores required: M (equivalent to the number of data partitions)
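The Case 1 formulas can be sketched as a small helper. This is an illustrative calculation only; the function name and the 30% overhead default are taken from the guideline above, not from any Spark API.

```python
def case1_requirements(n_tb: float, m_partitions: int, overhead: float = 0.3):
    """Case 1 sizing: all data in memory, processed in a single pass.

    n_tb: uncompressed data size in TB (N).
    m_partitions: partition count after reading (M).
    overhead: the 30% overhead-memory figure from the guideline above.
    """
    total_memory_tb = n_tb + n_tb * overhead  # N + N x 0.3
    total_cores = m_partitions                # one core per partition for one pass
    return total_memory_tb, total_cores

mem_tb, cores = case1_requirements(n_tb=1.0, m_partitions=1000)
print(mem_tb, cores)  # 1.3 TB of executor RAM, 1000 cores
```

For 1 TB with 1000 partitions, this gives 1.3 TB of total executor RAM and 1000 cores for a single pass.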
Case 2 - Entire data processed in a single pass, with RAM and disk spill:
- Total Executor Memory required (RAM + disk spill): You can reduce the RAM to just enough for Spark task execution and provision a larger disk so the cached data can spill to disk.
- Total number of Executor Cores required: M (equal to the number of data partitions, if you want the data processed in a single pass).
Case 3 - Entire data processed in multiple passes, with RAM and disk spill:
- Total Executor Memory required (RAM + disk spill): You can reduce the RAM to just enough for Spark task execution and provision a larger disk so the cached data can spill to disk.
Assume you need to process 1TB of data with 1000 partitions.
As a result, the total memory requirement will be 1TB plus 30% of 1TB, which will be shared by all executors in the Spark cluster. You can choose to distribute this memory across RAM and disk. The cached data is written to disk, while RAM is used for execution and overhead. This reduces the amount of RAM allocated to each executor node.
- Total Executor Cores required: M can be split across multiple passes here; each core processes several partitions in sequence rather than all partitions running at once.
For example, if a 1 TB dataset has 500 partitions, then M = 500.
If you are fine with the data being processed in 5 passes, you can run the Spark executors with a total of 100 cores.
The catch is that processing the 500 partitions now takes 5 sequential waves of tasks, which takes longer than a single pass. This is how a single core can be tuned to process several partitions.
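The trade-off between total cores and the number of passes can be sketched as a quick calculation (an illustration of the arithmetic above, not Spark API code):

```python
import math

def passes_needed(m_partitions: int, total_cores: int) -> int:
    """Each pass runs one task per core, so M partitions are processed
    in ceil(M / cores) sequential waves."""
    return math.ceil(m_partitions / total_cores)

print(passes_needed(500, 100))  # 5 passes with 100 total cores
print(passes_needed(500, 500))  # 1 pass when cores match partitions
```

Halving the cores roughly doubles the number of passes, trading runtime for hardware cost.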
Sample cluster size computation for multiple passes with RAM + disk spill:
Consider a 1 TB dataset with data quality rules such as range check, not null, contains, aggregates, and so on.
You can begin with the following cluster configuration:
- A node pool with 160 cores and 640 GB of memory (i.e., 20 nodes, each with 8 cores and 32 GB of RAM).
- An executor cluster with 144 CPU cores (across executor nodes) is a good starting point. Distribute these cores evenly across 48 executors.
- Each executor with at least 3 to 5 cores is preferable.
- 14 GB of memory per executor is a decent starting point; this can be increased to 20 GB.
- You can use up to 80-90% of memory resources per node (including Spark executor and overhead RAM), with the remainder reserved for the operating system.
- It is reasonable to use up to 90% of the cores for running your jobs.
- To begin, the driver will require 1-2 cores and 2GB of RAM.
- Allocate 20% of the executor RAM to overhead memory. That gives 48 executors, each with 3-5 cores and 14-20 GB of RAM.
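The 20%-overhead guideline can be sketched as a small helper. This is illustrative arithmetic under the assumption that overhead is provisioned on top of the executor heap (as Spark's `spark.executor.memoryOverhead` is); the function name is hypothetical.

```python
def executor_memory_layout(executor_ram_gb: float, overhead_frac: float = 0.2):
    """Split an executor's memory budget into heap and overhead.

    Follows the 20%-overhead guideline above; figures are rough starting points.
    Returns (heap_gb, overhead_gb, total_gb).
    """
    overhead_gb = round(executor_ram_gb * overhead_frac, 2)
    total_gb = round(executor_ram_gb + overhead_gb, 2)
    return executor_ram_gb, overhead_gb, total_gb

heap, overhead, total = executor_memory_layout(14)
print(heap, overhead, total)  # 14 GB heap + 2.8 GB overhead = 16.8 GB per executor
```

At the upper end of the range, a 20 GB executor carries 4 GB of overhead, for 24 GB total per executor.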
Scenario: spilling cached data to disk:
- In this scenario, the data size exceeds the memory allocated to the cluster, so attach 2-3X the executor memory as disk capacity, e.g., 3 × 20 GB = 60 GB per executor, plus some buffer across the executor pool.
- Make sure that the sum of executor memory + executor disk size is more than the data size.
In this situation, the Spark cluster's memory cannot hold the entire dataset, so Spark spills to disk several times during execution. The job still completes; the catch is a large number of IO calls to disk.
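The capacity check above (executor memory + executor disk must exceed the data size) can be sketched as follows; the function name and the example figures are illustrative, taken from the sample cluster above.

```python
def fits_with_spill(data_gb: float, executors: int,
                    ram_per_executor_gb: float, disk_per_executor_gb: float):
    """Check that combined executor RAM + spill disk exceeds the data size.

    Returns (total_capacity_gb, fits).
    """
    capacity_gb = executors * (ram_per_executor_gb + disk_per_executor_gb)
    return capacity_gb, capacity_gb > data_gb

# 48 executors, 20 GB RAM and 60 GB spill disk each, ~1 TB of data
cap, ok = fits_with_spill(1024, 48, 20, 60)
print(cap, ok)  # 3840 GB of combined capacity comfortably covers 1 TB
```

If the check fails, either add disk per executor or add executors before rerunning the job.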
General Guidelines
- Begin with 3-5 executor cores.
- Each executor should have 14-20 GB of RAM.
- Total cores across executors can be roughly one-third to one-fifth of the total partition count, assuming each partition is 128 MB. For 1000 GB: 1000 GB / 128 MB ≈ 8000 partitions, so across the executors in a Spark cluster you can go up to roughly 8000 / (3 to 5) total cores.
- Overhead memory should account for at least 20-30% of executor memory.
- Always keep more disk space on hand.
- In cases where the data is skewed, some executors will take much longer than the rest even after most have finished, increasing the overall job time.
- The decision to increase cores per executor, increase executors with fewer cores, or increase executor memory all depends on the processing design, job distributions, partitions, and so on.
- You must continue monitoring for a few runs to fine-tune the resources; the same setup will not work across different runs because of the dynamic nature of data.
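The 128 MB-partition guideline above can be turned into a quick estimate. This is a rough starting-point calculation, not a Spark API; the 3-5 partitions-per-core figure follows the guideline.

```python
def core_estimate(data_gb: float, partition_mb: int = 128,
                  partitions_per_core: int = 3):
    """Estimate total executor cores from data size.

    Assumes ~128 MB partitions and 3-5 partitions handled per core over
    the life of the job. Returns (partitions, total_cores).
    """
    partitions = int(data_gb * 1024 // partition_mb)
    return partitions, partitions // partitions_per_core

parts, cores = core_estimate(1000)
print(parts, cores)  # 8000 partitions -> ~2666 cores at 3 partitions per core
```

At 5 partitions per core the same dataset needs only about 1600 cores, at the cost of more sequential waves of tasks.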
FAQ
Q1: Why is the size of data tricky in Spark?
In Spark, the size of data can be tricky because the data size at the source is often compressed. When Spark reads this data, it decompresses it, leading to a significant increase in size.
Q2: How much can data size increase when decompressed in Spark?
Decompressed data in Spark can increase substantially. For instance, 1TB of compressed data could expand up to 10 times its original size after being decompressed.
Q3: What should be considered when provisioning memory for executor nodes in Spark?
It's crucial to account for the potential increase in data size after decompression when provisioning memory for executor nodes. This helps in effectively managing memory resources and avoiding performance issues.
Q4: How does the increase in data size after decompression affect Spark's performance?
An increase in data size can lead to memory pressure on executor nodes, potentially causing performance degradation or out-of-memory errors if not properly managed.
Q5: What strategies can be used to manage large data sizes in Spark?
Strategies include optimizing data partitioning, efficient memory management, and tuning Spark configurations to balance memory allocation and processing needs.
Q6: Can data size expansion be predicted in Spark?
While exact prediction can be challenging, understanding the compression ratio and data characteristics can help estimate the potential increase in data size after decompression.
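As Q6 notes, the compression ratio gives a rough estimate of in-memory size. A minimal sketch, assuming you have measured (or conservatively guessed) the ratio for your own data; the function name is hypothetical:

```python
def estimate_decompressed_gb(compressed_gb: float, compression_ratio: float) -> float:
    """Estimate uncompressed size from on-disk compressed size.

    compression_ratio = uncompressed / compressed. Columnar formats with
    codecs like snappy or gzip can land anywhere from ~3x to ~10x, so
    measure a sample of your own data where possible.
    """
    return compressed_gb * compression_ratio

print(estimate_decompressed_gb(1024, 10))  # 1 TB compressed -> ~10 TB in memory at 10x
```

Feed the estimated uncompressed size, not the on-disk size, into the sizing formulas above (it is the N in the resource-allocation table).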
Handling Out of Memory (OOM) issues
Q1. What should I do if I encounter an OOM error in Spark?
If you face an Out-of-Memory (OOM) error, there are strategies to mitigate it. These include adjusting memory and core allocations in executors.
Q2. How can I resolve OOM errors without increasing executor memory?
If increasing executor memory isn't an option (e.g., due to node-level memory limits), consider reducing the core allocation in executors. This approach decreases concurrent processing but can help manage memory more effectively.
Q3. Will reducing core allocation in executors affect processing speed?
Yes, reducing core allocation will decrease concurrent processing, which may lead to slower processing times. However, it can be a necessary trade-off to resolve OOM errors.
Q4. What if additional memory is available? How should I adjust settings?
If extra memory is available, you can increase both the Executor Memory and the overhead memory while keeping the core allocation per executor constant. This adjustment helps handle larger data without reducing concurrent processing.
Q5. Does increasing Executor Memory affect concurrent processing?
Increasing Executor Memory and overhead memory without altering core allocation does not reduce concurrent processing. This method is preferable when additional memory resources are available.
Q6. Are there any other general tips for avoiding OOM errors in Spark?
Regularly monitor memory usage and adjust configurations as needed. Properly sizing your executors and being aware of the data size after decompression are key to preventing OOM errors.
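The two mitigation paths from Q2 and Q4 can be expressed as `spark-submit` flags. These are standard Spark options, but the specific sizes and the application file `your_job.py` are placeholders for illustration.

```shell
# Option 1: node memory is capped, so reduce cores per executor.
# Fewer concurrent tasks share the same heap, easing memory pressure.
spark-submit \
  --executor-cores 2 \
  --executor-memory 14g \
  --conf spark.executor.memoryOverhead=3g \
  your_job.py   # placeholder application

# Option 2: nodes have headroom, so raise executor and overhead memory.
# Core count is unchanged, so concurrency is unaffected.
spark-submit \
  --executor-cores 5 \
  --executor-memory 20g \
  --conf spark.executor.memoryOverhead=5g \
  your_job.py   # placeholder application
```

Option 2 is generally preferable when the extra memory exists, since Option 1 trades processing speed for stability.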
Spark History Server Integration
ADOC extends its capabilities with the integration of the Spark History Server, offering a robust solution for monitoring Spark applications. This integration is pivotal for a comprehensive understanding of Spark job performance and behavior over extended periods.
The Spark History Server in ADOC plays a vital role in:
- Debugging: Facilitating the troubleshooting of Spark applications.
- Performance Analysis: Assisting in the detailed analysis of performance metrics.
- Job Tracking: Providing insights into job status and execution behavior through historical data.
The Spark History Server, integral to Apache Spark's open-source distributed computing system, specializes in monitoring and analyzing Spark applications. It enables users to review completed Spark jobs, offering a persistent historical view that extends beyond the capabilities of the Spark Web UI, which is limited to active jobs. This server is essential for a comprehensive understanding of an application's performance over its entire lifecycle.
Configuration for Optimal Data Management
For effective utilization of the Spark History Server:
- Storage Directory Configuration: Set up a storage directory on cloud platforms like GCP, GCS, S3, etc., to facilitate data read-write operations.
- Access Permissions: Ensure appropriate permissions are granted to the service user for directory management.
Key configurable properties of the Spark History Server include:
- Application Retention: Control the number of applications retained by the server.
- Data Reading Interval: Set the frequency at which the server reads data.
These properties are specific to the history server and do not directly affect Spark job configurations.
To integrate the Spark History Server with ADOC:
- Deployment: Integrate the history server as an integral component of the data platform.
- Storage Setup: Configure the designated storage directory.
- Parameter Configuration : Enable the necessary parameters for optimal server functionality.
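As a sketch, a minimal configuration fragment covering the storage directory, data-reading interval, and application retention might look like the following. The property names are standard Spark History Server settings; the bucket path is a placeholder for your own storage location.

```
# Jobs write event logs here (bucket path is a placeholder)
spark.eventLog.enabled              true
spark.eventLog.dir                  s3a://my-bucket/spark-events

# The history server reads from the same location
spark.history.fs.logDirectory       s3a://my-bucket/spark-events

# Data-reading interval: how often the server scans for new event logs
spark.history.fs.update.interval    10s

# Application retention: prune old applications from the log directory
spark.history.fs.cleaner.enabled    true
spark.history.fs.cleaner.maxAge     7d
```

Equivalent settings apply for GCS (`gs://…`) or other cloud storage, provided the service user has read-write access to the directory.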
Feature | Description |
---|---|
Integration with Data Planes | Customized History Server setup for each data plane. By modifying the history server setup in ADOC per data plane, you can verify that all historical data and events are captured in the designated directory. This is critical because different data planes may have different needs or data sources that must be recorded accurately. For example, a data plane focused on consumer behavior data may have different processing requirements than one that analyzes sales data. Customizing the history server arrangement for each data plane maximizes data collection and analysis for specific use cases. |
Monitoring the Completed Jobs | Comprehensive job monitoring capabilities. The Spark History Server allows users to track and analyze all completed jobs, providing detailed information on the success or failure of each job, including metrics such as execution time, resource utilization, and output data. These insights are presented through an intuitive user interface, making it easy to navigate and understand the performance of Spark jobs. With this level of monitoring, users can quickly identify bottlenecks, optimize job configurations, and improve overall efficiency. |