Summary of New Features
This section consists of a summary of new features introduced in this release.
Java 11 is now supported at runtime for all ODP stack components.
Hadoop
Support for more than 2 NameNodes
The initial implementation of HDFS NameNode high-availability provided for a single active NameNode and a single Standby NameNode. By replicating edits to a quorum of three JournalNodes, this architecture can tolerate the failure of any one node in the system.
However, some deployments require a higher degree of fault tolerance. This feature meets that need by allowing users to run multiple standby NameNodes. For instance, by configuring three NameNodes and five JournalNodes, the cluster can tolerate the failure of two nodes rather than just one.
The HDFS high-availability documentation is now updated with instructions on how to configure more than two NameNodes.
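The fault-tolerance arithmetic above can be checked with a small sketch. It assumes that at least one NameNode must survive and that a strict majority of JournalNodes must stay reachable; the function name `tolerated_failures` is hypothetical and is not part of Hadoop.

```python
def tolerated_failures(namenodes: int, journalnodes: int) -> int:
    """Number of simultaneous node failures an HA layout can absorb.

    Assumption: one NameNode must remain, and a strict majority of
    JournalNodes must remain reachable so that edits can still be written.
    """
    if namenodes < 1 or journalnodes < 1:
        raise ValueError("need at least one NameNode and one JournalNode")
    namenode_budget = namenodes - 1            # one NameNode must stay up
    journal_budget = (journalnodes - 1) // 2   # a majority quorum must stay up
    return min(namenode_budget, journal_budget)


print(tolerated_failures(2, 3))  # initial HA layout: 1
print(tolerated_failures(3, 5))  # three NameNodes, five JournalNodes: 2
```

The smaller of the two budgets is the limit, which is why adding standby NameNodes alone does not help unless the JournalNode quorum grows as well.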
HDFS Router-Based Federation
HDFS Router-Based Federation adds an RPC routing layer that provides a federated view of multiple HDFS namespaces. This is similar to the existing ViewFs and HDFS Federation functionalities, except the mount table is managed on the server side by the routing layer rather than on the client. This simplifies access to a federated cluster for existing HDFS clients.
For more information, see HDFS-10467 and the HDFS Router-based Federation documentation.
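To illustrate what a server-side mount table does, the sketch below resolves a client path to a namespace by longest matching mount point. It is not the Router's implementation; the table contents, the nameservice names (`ns0`, `ns1`, `ns2`), and the `resolve` helper are all hypothetical.

```python
from typing import Dict, Tuple

# Hypothetical mount table: mount point -> (nameservice, path in that namespace).
MOUNT_TABLE: Dict[str, Tuple[str, str]] = {
    "/": ("ns0", "/"),
    "/data": ("ns1", "/data"),
    "/data/logs": ("ns2", "/logs"),
}


def resolve(path: str, table: Dict[str, Tuple[str, str]] = MOUNT_TABLE) -> Tuple[str, str]:
    """Map a client path to (nameservice, remote path) via the longest matching mount point."""
    best = None
    for mount in table:
        prefix = mount.rstrip("/")
        # Match whole path components only, so "/database" does not match "/data".
        if path == mount or path.startswith(prefix + "/"):
            if best is None or len(mount) > len(best):
                best = mount
    if best is None:
        raise KeyError(f"no mount point covers {path}")
    nameservice, target = table[best]
    remainder = path[len(best.rstrip("/")):]
    return nameservice, (target.rstrip("/") + remainder) or "/"


print(resolve("/data/logs/app.log"))  # ('ns2', '/logs/app.log')
print(resolve("/database"))           # ('ns0', '/database')
```

Because the lookup happens in the routing layer, a client only needs the router's address; the table can change without touching client configuration.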
HDFS RBF: RDBMS-based token storage support
HDFS Router-Based Federation now supports storing delegation tokens in MySQL (HADOOP-18535), which improves token operation throughput over the original ZooKeeper-based implementation.
HDFS: Dynamic DataNode Reconfiguration
Related issues: HDFS-16400, HDFS-16399, HDFS-16396, HDFS-16397, HDFS-16413, HDFS-16457.
Several DataNode configuration options can now be changed without restarting the DataNode. This makes it possible to tune deployment configurations without a cluster-wide DataNode restart.
Support for Microsoft Azure Data Lake and Aliyun Object Storage System FileSystem Connector
Hadoop now supports integration with Microsoft Azure Data Lake and Aliyun Object Storage System as alternative Hadoop-compatible filesystems.
API-based Configuration of Capacity Scheduler Queue Configuration
The OrgQueue extension to the capacity scheduler provides a programmatic way to change configurations by providing a REST API that users can call to modify queue configurations. This enables automation of queue configuration management by administrators in the queue’s administer_queue ACL.
For more information, see YARN-5734 and the Capacity Scheduler documentation.
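As a rough sketch of how such a call might be automated, the snippet below builds, but does not send, a PUT request that changes one queue's settings. The endpoint path `/ws/v1/cluster/scheduler-conf` and the XML element names are written from memory of the Capacity Scheduler documentation and should be verified against it; the ResourceManager address, the queue name `root.analytics`, and the helper `build_queue_update` are hypothetical.

```python
import urllib.request
import xml.etree.ElementTree as ET


def build_queue_update(rm_address: str, queue: str, params: dict) -> urllib.request.Request:
    """Build (but do not send) a PUT request that updates one queue's settings."""
    # Assumed payload shape: <sched-conf><update-queue>...</update-queue></sched-conf>
    root = ET.Element("sched-conf")
    update = ET.SubElement(root, "update-queue")
    ET.SubElement(update, "queue-name").text = queue
    params_el = ET.SubElement(update, "params")
    for key, value in params.items():
        entry = ET.SubElement(params_el, "entry")
        ET.SubElement(entry, "key").text = key
        ET.SubElement(entry, "value").text = str(value)
    return urllib.request.Request(
        url=f"http://{rm_address}/ws/v1/cluster/scheduler-conf",  # assumed endpoint
        data=ET.tostring(root),
        headers={"Content-Type": "application/xml"},
        method="PUT",
    )


# Hypothetical ResourceManager address and queue name.
request = build_queue_update("rm.example.com:8088", "root.analytics", {"capacity": 40})
print(request.get_method(), request.full_url)
```

Sending the request (for example with `urllib.request.urlopen`) would additionally require authenticating as a user listed in the queue's administer_queue ACL.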
Support for New File System APIs
HADOOP-18671 moved several HDFS-specific APIs to Hadoop Common to make it possible for certain applications that depend on HDFS semantics to run on other Hadoop compatible file systems.
In particular, recoverLease() and isFileClosed() are exposed through the LeaseRecoverable interface, while setSafeMode() is exposed through the SafeMode interface.
Adoption of lz4-java and snappy-java
For the LZ4 and Snappy compression codecs, Hadoop now uses lz4-java and snappy-java instead of requiring the corresponding native libraries to be installed on the systems running Hadoop.
Support for non-volatile Storage Class Memory (SCM) in the HDFS Cache Directives
This feature aims to enable storage class memory first in the read cache. Although storage class memory is non-volatile, its persistent characteristics are not currently used, so that it behaves the same as the existing read-only cache.
Application Catalog for YARN Applications
The application catalog system provides an editorial and search interface for YARN applications. This improves the usability of YARN for managing the life cycle of applications.
Scheduling of Opportunistic Containers
This release supports scheduling of opportunistic containers through the central RM (YARN-5220) and through distributed scheduling (YARN-2877), as well as scheduling of containers based on actual node utilization (YARN-1011) and container promotion or demotion (YARN-5085).
MapReduce Task-level Native Optimization
MapReduce now provides support for a native implementation of the map output collector. For shuffle-intensive jobs, this can lead to a performance improvement of 30% or more.
For more information, see the release notes for MAPREDUCE-2841.
Apache Hive: Summary of New Features
- Hive Iceberg Integration: Streamlines data management with seamless integration of Apache Iceberg tables;
- Improved Transaction and Locking Capability: Enhances the ACID compliance of Hive with improved transaction handling and locking mechanisms;
- Table Maintenance: Introduces compaction mechanisms for both Hive ACID and Iceberg tables to optimize storage and performance;
- Hive Docker Support: Simplifies deployment with official Apache Hive Docker images for easier setup and configuration. Explore the Docker images on Docker Hub for seamless deployment;
- Compiler Improvements: Anti-join support, branch pruning, column histogram statistics, HPL/SQL support, scheduled queries, new and improved cost-based optimization (CBO) rules leading to better query plans;
- Materialized Views Support: Enables the creation and management of materialized views for accelerated query processing;
- Runtime Optimizations: Enhances query performance with optimizations in Apache Tez and Apache Hive LLAP, ensuring faster data processing;
- Hive Replication: Introduces improved replication features both for external and ACID tables for efficient data distribution and disaster recovery; and
- Support for Apache Ozone: Introduces support for Apache Ozone, enabling seamless integration with Ozone-based object stores for scalable and efficient storage solutions.
Apache HBase: Summary of New Features
[HBASE-28168] - Add an Option in RegionMover.java to Isolate One or More Regions on the RegionServer
- Description: This enhancement allows users to isolate one or more regions on a RegionServer, providing better control over region management during operations.
- Impact: Improves operational flexibility and aids in maintenance tasks related to region management.
- Priority: Minor
- Link: JIRA Issue HBASE-28168
[HBASE-27314] - Make Index Block Customizable and Configurable.
- Description: This feature enables customization and configuration of the index block, allowing users to optimize index performance based on their use case.
- Impact: Increases flexibility in data retrieval and storage strategies.
- Priority: Major
- Link: JIRA Issue HBASE-27314
[HBASE-27104] - Add a Tool Command to List Unknown Servers
- Description: A new command has been added that allows users to list unknown servers, enhancing server management capabilities.
- Impact: Improves operational awareness and troubleshooting of unknown server instances.
- Priority: Major
- Link: JIRA Issue HBASE-27104
[HBASE-27129] - Add a Configuration that Allows Configuration of Region-Level Storage Policies
- Description: This update introduces configuration options for defining storage policies at the region level, allowing for better data management based on storage requirements.
- Impact: Enhances the granularity of data management strategies within HBase.
- Priority: Major
- Link: JIRA Issue HBASE-27129
[HBASE-27028] - Add a Shell Command for Flushing Master Local Region
- Description: A new shell command has been added to flush the master local region, providing additional tools for region management.
- Impact: Facilitates better maintenance operations within HBase.
- Priority: Minor
- Link: JIRA Issue HBASE-27028
[HBASE-26826] - Backport StoreFileTracker to Branch 2.5
- Description: Backported functionality related to StoreFileTracker to branch 2.5, enhancing operability and support for legacy systems.
- Impact: Improves support and performance for users on the 2.5 branch.
- Priority: Major
- Link: JIRA Issue HBASE-26826
[HBASE-26342] - Support Custom Paths of Independent Configuration and Pool for HFile Cleaner
- Description: This feature allows users to specify custom paths for independent configurations and pools for the HFile cleaner.
- Impact: Enhances configurability and management of HFile cleaning processes.
- Priority: Major
- Link: JIRA Issue HBASE-26342
[HBASE-27018] - Add Tool Command to List Live Servers
- Description: A command has been introduced to list live servers, improving monitoring and server management capabilities.
- Impact: Enhances visibility into the current state of the HBase cluster.
- Priority: Major
- Link: JIRA Issue HBASE-27018
[HBASE-26617] - Use Spotless to Reduce the Pain on Fixing Checkstyle Issues
- Description: Integration of Spotless has been introduced to help manage and automate code formatting, simplifying the process of fixing checkstyle issues.
- Impact: Enhances code quality and reduces maintenance overhead.
- Priority: Major
- Link: JIRA Issue HBASE-26617
[HBASE-26959] - Brotli Compression Support
- Description: Support for Brotli compression has been added, allowing users to utilize this efficient compression algorithm within HBase.
- Impact: Enhances data storage efficiency and potentially improves read/write performance.
- Priority: Minor
- Link: JIRA Issue HBASE-26959
[HBASE-25865] - Visualize Current State of Region Assignment
- Description: A new feature to visualize the current state of region assignment has been introduced, aiding in monitoring and management.
- Impact: Improves operational awareness and region management capabilities.
- Priority: Blocker
- Link: JIRA Issue HBASE-25865
[HBASE-26703] - Allow Configuration of IPC Queue Balancer
- Description: Users can now configure the IPC queue balancer, enhancing the management of inter-process communication within HBase.
- Impact: Improves the performance and stability of HBase communication.
- Priority: Minor
- Link: JIRA Issue HBASE-26703
[HBASE-26576] - Allow Pluggable Queue to Belong to Fast Path or Normal Balanced Executor
- Description: This update allows pluggable queues to be designated as belonging to either the fast path or normal balanced executor, providing more flexibility in queue management.
- Impact: Enhances the performance tuning capabilities of HBase.
- Priority: Minor
- Link: JIRA Issue HBASE-26576
[HBASE-26347] - Support Detecting and Excluding Slow DNS in Fan-Out of WAL
- Description: Enhancements have been made to detect and exclude slow DNS entries during the write-ahead log (WAL) fan-out process.
- Impact: Improves the reliability and performance of the WAL mechanism.
- Priority: Major
- Link: JIRA Issue HBASE-26347
[HBASE-26284] - Add HBase Thrift API to Get All Table Names Along with Whether It Is Enabled or Not
- Description: The HBase Thrift API has been enhanced to retrieve all table names, including their enabled status, improving API usability.
- Impact: Enhances the API's functionality and ease of use.
- Priority: Major
- Link: JIRA Issue HBASE-26284
[HBASE-26141] - Add Tracing Support for HTable and Sync Connection on Branch 2
- Description: Tracing support has been added for HTable and sync connections, aiding in performance monitoring and troubleshooting.
- Impact: Enhances observability within HBase operations.
- Priority: Major
- Link: JIRA Issue HBASE-26141
[HBASE-6908] - Pluggable Call Blocking Queue for HBase Server
- Description: A pluggable call blocking queue has been introduced for HBase Server, providing improved management of RPC calls.
- Impact: Enhances performance tuning capabilities and server responsiveness.
- Priority: Major
- Link: JIRA Issue HBASE-6908
[HBASE-25841] - Add Basic JShell Support
- Description: Basic support for JShell has been added, enhancing usability and interactivity with HBase.
- Impact: Improves the user experience for those using HBase in a shell environment.
- Priority: Minor
- Link: JIRA Issue HBASE-25841
[HBASE-25756] - Support Alternate Compression for Major and Minor Compactions
- Description: This feature adds support for using alternate compression methods during major and minor compactions.
- Impact: Enhances flexibility in data storage and performance optimization.
- Priority: Minor
- Link: JIRA Issue HBASE-25756
[HBASE-25751] - Add Writable Time to Purge Deletes to Scan Options
- Description: Users can now add a writable time to purge deletes in scan options, providing more control over data visibility.
- Impact: Improves data management capabilities.
- Priority: Major
- Link: JIRA Issue HBASE-25751
[HBASE-25665] - Disable Reverse DNS Lookup for SASL Kerberos Client Connection
- Description: The option to disable reverse DNS lookup for SASL Kerberos client connections has been added, improving security and connection reliability.
- Impact: Enhances performance and security in Kerberos environments.
- Priority: Major
- Link: JIRA Issue HBASE-25665
[HBASE-25587] - [HBCK2] Schedule SCP for All Unknown Servers
- Description: The HBCK2 tool now includes functionality to schedule SCP for all unknown servers, improving server recovery and management processes.
- Impact: Enhances operational capabilities within HBase.
- Priority: Major
- Link: JIRA Issue HBASE-25587
[HBASE-25460] - Expose Draining Servers as Cluster Metric
- Description: Draining servers can now be exposed as a cluster metric, aiding in cluster monitoring and management.
- Impact: Improves operational awareness and helps manage cluster resources effectively.
- Priority: Major
- Link: JIRA Issue HBASE-25460
[HBASE-25496] - Add Get Namespace RSGroups Command
- Description: A new command has been introduced to retrieve namespace region server groups, enhancing administrative capabilities.
- Impact: Improves cluster management functionality.
- Priority: Major
- Link: JIRA Issue HBASE-25496
[HBASE-24620] - Add a Cluster Manager That Submits Commands to ZooKeeper and Its Agent That Picks and Executes Those Commands
- Description: This enhancement introduces a cluster manager for improved command submission and execution in ZooKeeper environments.
- Impact: Enhances cluster management and operational efficiency.
- Priority: Major
- Link: JIRA Issue HBASE-24620
[HBASE-22749] - Distributed MOB Compactions
- Description: Enhancements for distributed MOB (Medium Object) compactions have been made, improving data management within HBase.
- Impact: Increases performance and efficiency of data storage and retrieval.
- Priority: Major
- Link: JIRA Issue HBASE-22749
Apache ZooKeeper: Summary of New Features
[ZOOKEEPER-3301] - Enforce the Quota Limit
- Description: This enhancement enforces a quota limit on ZooKeeper nodes, helping manage resource allocation and ensuring that clients do not exceed predefined limits.
- Impact: Improves the stability and predictability of ZooKeeper clusters by preventing resource overuse.
- Link: ZOOKEEPER-3301
[ZOOKEEPER-3601] - Introduce the Fault Injection Framework: Byteman for ZooKeeper
- Description: The integration of Byteman enables fault injection in ZooKeeper, facilitating more rigorous testing of failure scenarios and resilience.
- Impact: Enhances the ability to simulate failures and test the robustness of ZooKeeper deployments.
- Link: ZOOKEEPER-3601
[ZOOKEEPER-4211] - Expose Quota Metrics to Prometheus
- Description: Quota metrics are now exposed to Prometheus, allowing users to monitor quota usage and performance.
- Impact: Improves observability and performance tracking within ZooKeeper environments.
- Link: ZOOKEEPER-4211
[ZOOKEEPER-1112] - Add Support for C Client for SASL Authentication
- Description: This feature adds support for SASL authentication in the C client of ZooKeeper, enhancing security and access control.
- Impact: Expands authentication capabilities for clients using the C programming language.
- Link: ZOOKEEPER-1112
[ZOOKEEPER-3264] - Benchmark Tools for ZooKeeper
- Description: New benchmarking tools have been introduced to measure ZooKeeper's performance and capacity under various conditions.
- Impact: Provides developers and administrators with valuable insights into ZooKeeper's performance metrics.
- Link: ZOOKEEPER-3264
[ZOOKEEPER-3681] - Add s390x Support for Travis Build
- Description: This update introduces support for building ZooKeeper on s390x architecture using Travis CI.
- Impact: Expands the build environment compatibility for ZooKeeper, enabling broader support for different hardware platforms.
- Link: ZOOKEEPER-3681
[ZOOKEEPER-3714] - Add (Cyrus) SASL Authentication Support to Perl Client
- Description: The Perl client for ZooKeeper now supports Cyrus SASL authentication, improving security features for Perl-based applications.
- Impact: Enhances security for applications interacting with ZooKeeper using Perl.
- Link: ZOOKEEPER-3714
[ZOOKEEPER-3874] - Official API to Start ZooKeeper Server from Java
- Description: An official API has been introduced to allow starting the ZooKeeper server directly from Java applications.
- Impact: Simplifies the integration of ZooKeeper with Java applications and frameworks.
- Link: ZOOKEEPER-3874
[ZOOKEEPER-3948] - Introduce a Deterministic Runtime Behavior Injection Framework for ZooKeeperServer Testing
- Description: This feature provides a framework to inject deterministic runtime behaviors, improving the testing of ZooKeeperServer under controlled conditions.
- Impact: Enhances the testing capabilities for developers, leading to more robust and reliable code.
- Link: ZOOKEEPER-3948
[ZOOKEEPER-3959] - Allow Multiple SuperUsers with SASL
- Description: This update allows the configuration of multiple super users in ZooKeeper when using SASL authentication.
- Impact: Enhances security management by providing greater flexibility in user permissions.
- Link: ZOOKEEPER-3959
[ZOOKEEPER-3969] - Add Whoami API and CLI Command
- Description: A new API and CLI command have been added to identify the current user, improving transparency in user actions.
- Impact: Facilitates better user management and auditing within ZooKeeper.
- Link: ZOOKEEPER-3969
[ZOOKEEPER-4030] - Optionally Canonicalize Host Names in Quorum SASL Authentication
- Description: This feature allows for the optional canonicalization of host names during SASL authentication in quorum configurations.
- Impact: Improves security and compatibility in distributed ZooKeeper deployments.
- Link: ZOOKEEPER-4030
[ZOOKEEPER-27] - Unique DB Identifiers for Servers and Clients
- Description: Unique identifiers for servers and clients have been introduced to improve tracking and management within ZooKeeper.
- Impact: Enhances clarity and reduces confusion when managing ZooKeeper instances.
- Link: ZOOKEEPER-27
[ZOOKEEPER-1260] - Audit Logging in ZooKeeper Servers
- Description: Audit logging has been added to ZooKeeper servers, allowing for better monitoring and compliance.
- Impact: Improves security and compliance monitoring capabilities in ZooKeeper.
- Link: ZOOKEEPER-1260
[ZOOKEEPER-1634] - A New Feature Proposal to ZooKeeper: Authentication Enforcement
- Description: A proposal has been made to enforce stricter authentication measures within ZooKeeper, improving overall security.
- Impact: Enhances security practices in ZooKeeper environments.
- Link: ZOOKEEPER-1634
[ZOOKEEPER-1703] - Please Add Instructions for Running the Tutorial
- Description: This update provides clearer instructions for running ZooKeeper tutorials, facilitating better onboarding for new users.
- Impact: Improves user experience and accessibility for newcomers to ZooKeeper.
- Link: ZOOKEEPER-1703
[ZOOKEEPER-1962] - Add a CLI Command to Recursively List a Znode and Children
- Description: A new CLI command has been introduced to list a znode and its children recursively, improving usability.
- Impact: Enhances the ease of use for managing znodes within ZooKeeper.
- Link: ZOOKEEPER-1962
[ZOOKEEPER-2875] - Add Ant Task for Running OWASP Dependency Report
- Description: An Ant task has been added to facilitate the generation of OWASP dependency reports, improving security management.
- Impact: Enhances the security posture of ZooKeeper by making vulnerability tracking easier.
- Link: ZOOKEEPER-2875
[ZOOKEEPER-2933] - Ability to Monitor the jute.maxBuffer Usage in Real-Time
- Description: This feature enables real-time monitoring of jute.maxBuffer usage, providing insights into buffer limits.
- Impact: Improves performance tuning and resource management capabilities.
- Link: ZOOKEEPER-2933
[ZOOKEEPER-2994] - Tool Required to Recover Log and Snapshot Entries with CRC Errors
- Description: A new tool has been developed to recover log and snapshot entries that have CRC errors, enhancing data integrity.
- Impact: Improves data recovery options for ZooKeeper administrators.
- Link: ZOOKEEPER-2994
[ZOOKEEPER-3066] - Expose on JMX of Followers the ID of the Current Leader
- Description: The ID of the current leader is now exposed on JMX for followers, enhancing monitoring capabilities.
- Impact: Improves observability and management of leader-follower relationships in ZooKeeper.
- Link: ZOOKEEPER-3066
[ZOOKEEPER-3091] - Prometheus Integration
- Description: Integration with Prometheus, a monitoring system and time series database, has been added, allowing users to monitor ZooKeeper metrics effectively.
- Impact: Enhances observability and monitoring capabilities for ZooKeeper environments.
- Link: ZOOKEEPER-3091
[ZOOKEEPER-3092] - Pluggable Metrics System
- Description: A pluggable metrics system is introduced to allow users to define and use custom metrics in ZooKeeper.
- Impact: Provides flexibility in monitoring and enables integration with various monitoring systems.
- Link: ZOOKEEPER-3092
[ZOOKEEPER-3114] - Built-in Data Consistency Check
- Description: A built-in data consistency check feature is implemented to ensure data integrity within ZooKeeper.
- Impact: Improves reliability and trustworthiness of ZooKeeper data.
- Link: ZOOKEEPER-3114
[ZOOKEEPER-3137] - Utility to Truncate Logs to a zxid
- Description: A new utility has been added to truncate logs to a specified zxid (ZooKeeper Transaction ID).
- Impact: Aids in log management and reduces storage requirements.
- Link: ZOOKEEPER-3137
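For readers unfamiliar with zxids, the sketch below shows how a 64-bit zxid splits into an epoch (high 32 bits) and a counter (low 32 bits), and what keeping only the entries up to a cutoff zxid looks like on a plain list. It is illustrative only and does not reproduce the utility from ZOOKEEPER-3137; the helper names and the sample log are hypothetical.

```python
def split_zxid(zxid: int) -> tuple:
    """Split a 64-bit zxid into (epoch, counter)."""
    if not 0 <= zxid < 1 << 64:
        raise ValueError("zxid must fit in 64 bits")
    return zxid >> 32, zxid & 0xFFFFFFFF


def entries_to_keep(zxids, cutoff):
    """Return the zxids at or before the cutoff, preserving order."""
    return [z for z in zxids if z <= cutoff]


# Hypothetical transaction log: two transactions in epoch 1, two in epoch 2.
log = [0x100000001, 0x100000002, 0x200000001, 0x200000002]
print(split_zxid(0x200000001))  # (2, 1)
print([hex(z) for z in entries_to_keep(log, 0x200000001)])
```

Because the epoch occupies the high bits, comparing zxids as plain integers orders transactions first by epoch and then by counter.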
[ZOOKEEPER-3140] - Allow Followers to Host Observers
- Description: Followers can now host observer nodes, enhancing the flexibility of the ZooKeeper ensemble.
- Impact: Improves scalability and performance in ZooKeeper deployments.
- Link: ZOOKEEPER-3140
[ZOOKEEPER-3160] - Custom User SSLContext
- Description: This feature allows users to define a custom SSLContext for secure connections.
- Impact: Enhances security and customization of ZooKeeper communication.
- Link: ZOOKEEPER-3160
[ZOOKEEPER-3167] - API to Get Total Count of Recursive Sub Nodes
- Description: An API and corresponding CLI command are added to retrieve the total count of recursive sub-nodes under a specific path.
- Impact: Simplifies the management of hierarchical data in ZooKeeper.
- Link: ZOOKEEPER-3167
[ZOOKEEPER-3209] - New getEphemerals API
- Description: A new API to retrieve all ephemeral nodes created by a session is introduced.
- Impact: Enhances the usability and management of ephemeral nodes in ZooKeeper.
- Link: ZOOKEEPER-3209
[ZOOKEEPER-3244] - Option to Snapshot Based on Log Size
- Description: Users can now configure ZooKeeper to create snapshots based on log size.
- Impact: Improves snapshot management and efficiency.
- Link: ZOOKEEPER-3244
[ZOOKEEPER-3269] - Testable Facade with QueueEvent Method
- Description: A queueEvent() method is added to the testable facade, improving the testability of ZooKeeper components.
- Impact: Facilitates better unit testing practices for ZooKeeper developers.
- Link: ZOOKEEPER-3269
[ZOOKEEPER-3311] - Delay in Transaction Log Flush
- Description: Users can now allow a delay in the transaction log flush process.
- Impact: Enhances performance tuning capabilities for ZooKeeper deployments.
- Link: ZOOKEEPER-3311
[ZOOKEEPER-3331] - Automatic IP Authorization for Netty Connections
- Description: ZooKeeper now automatically adds IP authorization for Netty connections, enhancing security.
- Impact: Improves security measures for client connections to ZooKeeper.
- Link: ZOOKEEPER-3331
[ZOOKEEPER-3343] - New Documentation for ZooKeeper Tools
- Description: A new documentation file, zookeeperTools.md, is introduced to provide guidance on ZooKeeper tools.
- Impact: Enhances user experience by providing clear instructions on using ZooKeeper tools.
- Link: ZOOKEEPER-3343
[ZOOKEEPER-3344] - New Script for Snapshot Toolkit
- Description: A new script, zkSnapShotToolkit.sh, is created to encapsulate SnapshotFormatter with usage documentation.
- Impact: Simplifies snapshot management for ZooKeeper users.
- Link: ZOOKEEPER-3344
[ZOOKEEPER-3371] - Port Unification for Admin Server
- Description: This update unifies the ports used for the admin server in ZooKeeper.
- Impact: Simplifies configuration and management of the ZooKeeper admin server.
- Link: ZOOKEEPER-3371
Apache NiFi: Summary of New Features
[NIFI-12231] - Add Completion Strategy to FetchSmb
- Description: This update introduces a completion strategy feature for the FetchSmb processor, which enhances the control over how files are handled after they are fetched via the SMB protocol. Users can now define what happens to the files once the fetching process is completed.
- Impact: Improves the flexibility and efficiency of file processing in NiFi workflows involving SMB file systems.
- Link: JIRA Issue NIFI-12231
[NIFI-13030] - Provide Endpoint for Comparing Different Versions of Registered Flows
- Description: An endpoint has been added that allows users to compare different versions of registered flows. This feature simplifies the management and understanding of changes across different versions of NiFi flows.
- Impact: Facilitates better version control and flow management within the NiFi environment.
- Link: JIRA Issue NIFI-13030
[NIFI-13304] - Add SplitExcel Processor
- Description: The new SplitExcel processor enables users to split Excel files into multiple FlowFiles based on specified criteria, such as rows or sheets. This is especially useful for processing large Excel files within a NiFi flow.
- Impact: Enhances NiFi’s capabilities in handling Excel data, making it more versatile in dealing with large datasets.
- Link: JIRA Issue NIFI-13304
[NIFI-11992] - Add PutZendeskTicket Processor and ZendeskRecordSink
- Description: This update introduces the PutZendeskTicket processor and ZendeskRecordSink, which allow NiFi to interact with Zendesk by creating and managing tickets. This improves integration with the Zendesk support platform.
- Impact: Expands NiFi’s integration capabilities with third-party support systems like Zendesk.
- Link: JIRA Issue NIFI-11992
[NIFI-12241] - Add Processors Supporting Efficient Parquet Splitting
- Description: New processors have been added to support efficient splitting of Parquet files, which is critical for processing large datasets and improving performance in data pipelines.
- Impact: Enhances the performance and scalability of data processing pipelines that handle Parquet files.
- Link: JIRA Issue NIFI-12241
[NIFI-12382] - Add DatabaseTableSchemaRegistry Service
- Description: A new DatabaseTableSchemaRegistry service has been introduced, allowing NiFi to store and retrieve table schemas directly from databases. This service improves schema management and integration within NiFi dataflows.
- Impact: Enhances schema management capabilities and database integration within NiFi.
- Link: JIRA Issue NIFI-12382
[NIFI-12386] - Add a FilterAttribute Processor
- Description: The FilterAttribute processor allows users to filter FlowFiles based on their attributes. This provides more granular control over which FlowFiles are processed or discarded within a dataflow.
- Impact: Increases the flexibility of data processing by enabling attribute-based filtering of FlowFiles.
- Link: JIRA Issue NIFI-12386
[NIFI-12639] - Backport JSON Schema Registry for ValidateJson
- Description: The JSON Schema Registry has been backported to work with the ValidateJson processor, enabling schema validation against a registry. This helps in ensuring JSON data consistency within NiFi flows.
- Impact: Improves data quality by allowing JSON schema validation against a central registry.
- Link: JIRA Issue NIFI-12639
[NIFI-12614] - Create Record Reader Service for Protobuf Messages
- Description: A record reader service specifically for Protobuf messages has been added, allowing NiFi to efficiently read and process Protobuf data within its dataflows.
- Impact: Expands NiFi's support for different data formats, particularly for those using Protobuf.
- Link: JIRA Issue NIFI-12614
[NIFI-12960] - Support Reading Password-Protected Files in ExcelReader
- Description: The ExcelReader service has been enhanced to support reading password-protected Excel files. This makes NiFi more versatile in handling secured documents within dataflows.
- Impact: Increases the usability of NiFi in environments where data security is critical.
- Link: JIRA Issue NIFI-12960
[NIFI-12115] - Add ListenOTLP Processor for Collecting OpenTelemetry
- Description: The ListenOTLP processor has been added to NiFi, enabling the collection of OpenTelemetry data directly into NiFi flows. This is crucial for monitoring and observability within data pipelines.
- Impact: Enhances NiFi’s capabilities in observability and monitoring by integrating with OpenTelemetry.
- Link: JIRA Issue NIFI-12115
EncryptContentAge and DecryptContentAge Processors Supporting the age Encryption Specification
- Description: New EncryptContentAge and DecryptContentAge processors have been introduced, supporting the age encryption specification (FiloSottile/age on GitHub: a simple, modern, and secure encryption tool and Go library with small explicit keys, no config options, and UNIX-style composability). These processors provide modern and simple encryption for content within NiFi.
- Impact: Improves the security of data processed within NiFi by supporting contemporary encryption standards.
[NIFI-11197] - Add YAML Record Reader
- Description: A new YAML Record Reader has been added, allowing NiFi to read and process YAML formatted data within its dataflows. This expands NiFi’s capabilities to handle different data formats.
- Impact: Enhances NiFi’s versatility in dealing with YAML data.
- Link: JIRA Issue NIFI-11197
PackageFlowFile Processor for Writing File Streams and Attributes as FlowFile Version 3
- Description: The PackageFlowFile processor has been updated to support writing file streams and attributes as FlowFile Version 3, enhancing compatibility and functionality.
- Impact: Improves the management and processing of FlowFiles within NiFi, especially for more complex workflows.
Migrated from H2 Database Engine to JetBrains Xodus for Storing Flow Configuration History
- Description: NiFi has migrated from the H2 database engine to JetBrains Xodus for storing flow configuration history. This change improves the performance and scalability of configuration storage within NiFi.
- Impact: Enhances the stability and scalability of flow configuration management in NiFi.
Apache Knox: Summary of New Features
[KNOX-2631] - KnoxSSO for Secure Shell Access
- Description: This feature introduces Single Sign-On (SSO) capabilities for Secure Shell (SSH) access, allowing users to authenticate once and gain secure access to shell environments without repeated logins.
- Impact: Enhances security and user experience by simplifying the authentication process for shell access.
- Link: KNOX-2631
[KNOX-2703] - Make Acceptable JWT Types Configurable
- Description: This enhancement allows administrators to configure acceptable JWT (JSON Web Token) types for authentication, providing more flexibility in managing security policies.
- Impact: Improves security customization and adaptability to various authentication scenarios.
- Link: KNOX-2703
[KNOX-2776] - Concurrent Session Limit for UIs
- Description: Introduces the ability to set concurrent session limits for user interfaces, enabling better control over resource usage and improving system stability during peak loads.
- Impact: Enhances resource management and system reliability by preventing excessive concurrent user sessions.
- Link: KNOX-2776
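The idea of a concurrent session limit can be sketched as follows. This is a minimal illustration only, not Knox code: the `SessionLimiter` class, its method names, and the per-user limit of 2 are all hypothetical, and Knox's actual configuration properties are described in KNOX-2776.

```python
from collections import defaultdict


class SessionLimiter:
    """Hypothetical per-user cap on concurrent UI sessions (not Knox code)."""

    def __init__(self, max_sessions_per_user: int) -> None:
        self.max_sessions_per_user = max_sessions_per_user
        self._active = defaultdict(set)  # user -> set of active session ids

    def open_session(self, user: str, session_id: str) -> bool:
        """Register a session; return False when the user is at the limit."""
        sessions = self._active[user]
        if session_id in sessions:
            return True  # already registered, nothing to do
        if len(sessions) >= self.max_sessions_per_user:
            return False  # reject: limit reached
        sessions.add(session_id)
        return True

    def close_session(self, user: str, session_id: str) -> None:
        """Release a slot on logout or session expiry."""
        self._active[user].discard(session_id)


limiter = SessionLimiter(max_sessions_per_user=2)
results = [limiter.open_session("alice", sid) for sid in ("s1", "s2", "s3")]
limiter.close_session("alice", "s1")
retry = limiter.open_session("alice", "s3")
```

Here the third login is rejected until an earlier session is closed, which is the behavior a concurrent session limit is meant to enforce.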
Apache Ranger: Summary of New Features
[RANGER-3828] - Fine-grained Access Control over Nested Structures
- Description: This enhancement provides fine-grained access control capabilities for nested structures, allowing for more precise permission management within data hierarchies.
- Impact: Improves data security by allowing administrators to enforce stricter access controls on complex data structures.
- Link: RANGER-3828
[RANGER-3852] - Performance and Scalability Analyzer Tool for Ranger
- Description: Introduces a new tool for analyzing the performance and scalability of Ranger, helping users identify bottlenecks and optimize their configurations.
- Impact: Enhances operational efficiency by providing insights into performance metrics and scalability options.
- Link: RANGER-3852
[RANGER-3855] - RangerExternalUserStoreRetriever Class
- Description: This feature adds the RangerExternalUserStoreRetriever class, facilitating the retrieval of external user store information for better integration and management.
- Impact: Improves user management by allowing Ranger to interact with external user stores more effectively.
- Link: RANGER-3855
[RANGER-3971] - Upgrade HBase Version to 2.4.6
- Description: Updates the HBase version in Ranger to 2.4.6, ensuring compatibility with the latest HBase features and improvements.
- Impact: Enhances functionality and stability by utilizing the latest enhancements in HBase.
- Link: RANGER-3971
[RANGER-4028] - Ranger - Upgrade Bootbox.js
- Description: Upgrades the Bootbox.js library within Ranger, providing improvements in dialog management and user interface responsiveness.
- Impact: Enhances user experience with improved modal dialogs and interactions.
- Link: RANGER-4028
[RANGER-3815] - PolicyItem Supports Validity Period Setting
- Description: This feature allows administrators to set validity periods for policy items, enabling better control over access rights over time.
- Impact: Improves security by allowing temporary access rights to be enforced and automatically revoked after a specified period.
- Link: RANGER-3815
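As a rough illustration of how a validity period gates access, the sketch below checks whether a time window is in effect at a given instant. The `ValidityPeriod` class and its field names are hypothetical and do not reflect Ranger's policy model or API.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class ValidityPeriod:
    """Hypothetical validity window; None means unbounded on that side."""

    start: Optional[datetime] = None
    end: Optional[datetime] = None

    def is_active(self, now: datetime) -> bool:
        """Return True when `now` falls inside the window (inclusive)."""
        if self.start is not None and now < self.start:
            return False
        if self.end is not None and now > self.end:
            return False
        return True


# Assumed example: access granted for January 2024 only.
window = ValidityPeriod(
    start=datetime(2024, 1, 1, tzinfo=timezone.utc),
    end=datetime(2024, 1, 31, 23, 59, 59, tzinfo=timezone.utc),
)
inside = window.is_active(datetime(2024, 1, 15, tzinfo=timezone.utc))
expired = window.is_active(datetime(2024, 2, 1, tzinfo=timezone.utc))
```

A policy item carrying such a window would simply be skipped during evaluation once the window has passed, so temporary access lapses without manual cleanup.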
[RANGER-4025] - Ranger Improvement - Roles Import/Export API for Ranger Admin
- Description: Introduces an API for importing and exporting roles within Ranger admin, facilitating easier role management across different environments.
- Impact: Simplifies the migration and management of user roles in Ranger.
- Link: RANGER-4025
[RANGER-4047] - Ranger KMS Health Metrics
- Description: Adds health metrics for Ranger KMS (Key Management Service), providing insights into the operational health of the key management system.
- Impact: Enhances monitoring capabilities for administrators, ensuring better oversight of key management operations.
- Link: RANGER-4047
[RANGER-4221] - Enable File Sync Source for Ranger Usersync in Docker
- Description: This feature enables file synchronization as a source for Ranger UserSync in Docker environments, simplifying user management in containerized setups.
- Impact: Improves user synchronization capabilities in Docker deployments.
- Link: RANGER-4221
[RANGER-4230] - New REST APIs for Force Deletes of Users & Groups
- Description: Introduces new REST APIs to forcefully delete users and groups in Ranger, providing administrators with more control over user management.
- Impact: Enhances administrative control by enabling quick removal of users and groups when necessary.
- Link: RANGER-4230
[RANGER-4255] - Introduce an Option in Ranger to Control Retention Period of x_auth_sess Table Data
- Description: This enhancement introduces an option to control the retention period of data in the x_auth_sess table, allowing for better data governance and compliance.
- Impact: Improves the data management capabilities related to session data retention.
- Link: RANGER-4255
[RANGER-4303] - Plugin Memory Sizing
- Description: Adds functionality for sizing memory for Ranger plugins, enabling better resource allocation and performance tuning.
- Impact: Enhances the performance and stability of Ranger by allowing administrators to optimize memory usage for plugins.
- Link: RANGER-4303
Apache Druid: Summary of New Features
[#15049] - DDSketch
- Description: A new DDSketch extension is now available as a community contribution. The DDSketch extension (druid-ddsketch) supports approximate quantile queries using the DDSketch library.
- Impact: Enables efficient and accurate quantile queries in Druid, enhancing analytical capabilities.
- Link: DDSketch Extension
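To give a sense of how DDSketch-style approximate quantiles work, here is a simplified sketch of the underlying technique: values are mapped to logarithmically sized buckets so that any quantile estimate stays within a chosen relative error. The `TinyDDSketch` class is an illustrative assumption, handles positive values only, and is not the Druid extension's implementation or API.

```python
import math
from collections import Counter


class TinyDDSketch:
    """Simplified DDSketch-style quantile sketch for positive values only."""

    def __init__(self, relative_accuracy: float = 0.01) -> None:
        self.alpha = relative_accuracy
        self.gamma = (1 + relative_accuracy) / (1 - relative_accuracy)
        self.log_gamma = math.log(self.gamma)
        self.buckets = Counter()  # bucket index -> number of values
        self.count = 0

    def add(self, value: float) -> None:
        if value <= 0:
            raise ValueError("this sketch only handles positive values")
        # Bucket i covers the interval (gamma^(i-1), gamma^i].
        index = math.ceil(math.log(value) / self.log_gamma)
        self.buckets[index] += 1
        self.count += 1

    def quantile(self, q: float) -> float:
        if self.count == 0:
            raise ValueError("sketch is empty")
        rank = q * (self.count - 1)
        seen = 0
        for index in sorted(self.buckets):
            seen += self.buckets[index]
            if seen > rank:
                # This representative value is within alpha (relative)
                # of every value that falls in bucket `index`.
                return 2 * self.gamma ** index / (self.gamma + 1)
        raise AssertionError("unreachable: rank is always below count")


sketch = TinyDDSketch(relative_accuracy=0.01)
for v in range(1, 1001):
    sketch.add(float(v))
p50 = sketch.quantile(0.5)
p99 = sketch.quantile(0.99)
```

The sketch stores only bucket counts rather than raw values, which is what makes this family of algorithms suitable for aggregation at query time.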
[#15340] - Spectator Histogram
- Description: A new histogram extension is introduced as a community contribution. The Spectator-based histogram extension (druid-spectator-histogram) provides approximate histogram aggregators and percentile post-aggregators based on Spectator fixed-bucket histograms.
- Impact: Improves histogram aggregation and percentile calculations in Druid, allowing for better data insights.
- Link: Spectator Histogram Extension
[#15755] - Delta Lake
- Description: A new Delta Lake extension is available as a community contribution. The Delta Lake extension (druid-deltalake-extensions) allows users to ingest data stored in a Delta Lake table into Apache Druid.
- Impact: Enhances data ingestion capabilities by enabling integration with Delta Lake, streamlining data workflows.
- Link: Delta Lake Extension
Apache Spark: Summary of New Features
[SPARK-45360] - Initialize Spark Session Builder Configuration from SPARK_REMOTE
- Description: This feature allows the initialization of the Spark session using configuration settings provided by the SPARK_REMOTE key, enabling dynamic setup of a remote Spark cluster's connection parameters at runtime.
- Impact: Simplifies remote cluster connections.
- Link: SPARK-45360
[SPARK-47717] - Support Hive Tables as a Streaming Source and Sink
- Description: Adds support for using Hive tables as both a source and sink in Spark streaming, enabling native streaming of Hive data without intermediary batch jobs.
- Impact: Streamlines data processing workflows.
- Link: SPARK-47717
[SPARK-42423] - Add Metadata Column for File Block Start and Length
- Description: Introduces support for the metadata columns _metadata.file_block_start and _metadata.file_block_length to enhance observability by including file block metadata.
- Impact: Improves data observability and analysis.
- Link: SPARK-42423
[SPARK-44066] - Support Positional Parameters in Scala/Java sql()
- Description: Allows the use of positional parameters in SQL queries, aligning with SQL standards and JDBC/ODBC protocols.
- Impact: Enhances query flexibility.
- Link: SPARK-44066
[SPARK-43922] - Add Named Parameter Support in Parser for Function Calls
- Description: Supports named arguments in user-defined functions, built-in functions, and table-valued functions.
- Impact: Improves function call clarity and usability.
- Link: SPARK-43922
[SPARK-43071] - Support SELECT DEFAULT with ORDER BY, LIMIT, OFFSET for INSERT Source Relation
- Description: Extends column default support to allow the use of ORDER BY, LIMIT, and OFFSET clauses in INSERT source relations.
- Impact: Increases flexibility of INSERT operations.
- Link: SPARK-43071
- For example:

      create table t1(i boolean, s bigint default 42) using parquet;
      insert into t1 values (true, 41), (false, default);
      create table t2(i boolean default true, s bigint default 42, t string default 'abc') using parquet;
      insert into t2 (i, s) select default, s from t1 order by s limit 1;
      select * from t2;
      > true, 41L, "abc"

[SPARK-44503] - Add SQL Grammar for PARTITION BY and ORDER BY Clauses After Table Arguments for TVF Calls
- Description: Enhances SQL grammar by allowing PARTITION BY and ORDER BY clauses after table-valued function (TVF) arguments.
- Impact: Improves SQL syntax and usability.
- Link: SPARK-44503
[SPARK-42123] - Include Column Default Values in DESCRIBE and SHOW CREATE TABLE Output
- Description: Displays column default values in DESCRIBE and SHOW CREATE TABLE outputs, improving schema transparency across V1 and V2 tables.
- Impact: Enhances schema management.
- Link: SPARK-42123
[SPARK-43792] - Add Optional Pattern for Catalog.listCatalogs
- Description: Introduces an optional pattern argument for Catalog.listCatalogs, allowing users to filter catalogs based on specific names.
- Impact: Facilitates catalog management.
- Link: SPARK-43792
[SPARK-43881] - Add Optional Pattern for Catalog.listDatabases
- Description: Similar to Catalog.listCatalogs, introduces a pattern option for Catalog.listDatabases, enabling better filtering of database names.
- Impact: Improves database management.
- Link: SPARK-43881
[SPARK-44145] - Callback When Ready for Execution
- Description: Adds a callback feature that triggers when a Spark job is ready for execution, allowing developers to hook into the job lifecycle for custom actions.
- Impact: Enhances job lifecycle management.
- Link: SPARK-44145
[SPARK-42750] - Support Insert By Name Statement
- Description: Enables insertion of values into a table by specifying column names, enhancing the readability and robustness of INSERT INTO statements.
- Impact: Improves insert operation clarity.
- Link: SPARK-42750
[SPARK-44131] - Add call_function for Scala API
- Description: Introduces a call_function API for Scala, simplifying the invocation of functions within Spark's SQL API.
- Impact: Enhances usability of the Scala API.
- Link: SPARK-44131
[SPARK-40822] - Stable Derived Column Aliases
- Description: Ensures that column aliases in derived tables remain stable across Spark SQL transformations, improving the predictability of alias usage.
- Impact: Increases SQL query predictability.
- Link: SPARK-40822
[SPARK-43529] - Support General Constant Expressions as CREATE/REPLACE TABLE OPTIONS Values
- Description: Extends support for using general constant expressions in CREATE/REPLACE TABLE OPTIONS clauses, providing more flexibility in defining table properties.
- Impact: Enhances table property management.
- Link: SPARK-43529
[SPARK-36124] - Support Subqueries with Correlation Through INTERSECT/EXCEPT
- Description: Adds support for correlated subqueries within INTERSECT and EXCEPT SQL statements, enhancing Spark SQL's ability to handle complex queries.
- Impact: Expands query capabilities.
- Link: SPARK-36124
[SPARK-43205] - IDENTIFIER Clause
- Description: Introduces the IDENTIFIER clause to SQL parsing, improving the handling of specific identifier-based use cases.
- Impact: Enhances SQL parsing capabilities.
- Link: SPARK-43205
[SPARK-42427] - ANSI Mode: Conv Should Return an Error if Internal Conversion Overflows
- Description: In ANSI mode, the conv function will now throw an error if an internal conversion overflows, ensuring stricter compliance with ANSI SQL standards during data conversions.
- Impact: Improves data conversion reliability.
- Link: SPARK-42427
Apache Livy: Summary of New Features
[LIVY-423] - Adding Scala 2.12 Support
- Description: This update introduces support for Scala 2.12 in Livy, allowing users to run Scala applications with the newer Scala version.
- Impact: Enhances compatibility and enables the use of Scala 2.12 features in Livy applications.
- Link: LIVY-423
[LIVY-970] - Ordering and Pagination Support in GET /statement Request
- Description: The Livy API now includes support for ordering and pagination in the GET /statement request, improving the usability of the API for retrieving statements.
- Impact: Facilitates better management and retrieval of statements in Livy, making it easier to navigate through large sets of data.
- Link: LIVY-970
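The combination of ordering and pagination described above can be sketched in a few lines. This is an in-memory illustration only: the `page_statements` function and its `start`, `size`, and `descending` parameters are hypothetical names, not the Livy REST API's actual query parameters, which are documented in LIVY-970.

```python
from typing import Dict, List


def page_statements(
    statements: List[Dict],
    start: int = 0,
    size: int = 100,
    descending: bool = False,
) -> List[Dict]:
    """Order statements by id, then return one page of them."""
    ordered = sorted(statements, key=lambda s: s["id"], reverse=descending)
    return ordered[start:start + size]


# Assumed sample data: ten statements with ids 0..9.
statements = [{"id": i, "state": "available"} for i in range(10)]
newest_first = page_statements(statements, start=0, size=3, descending=True)
second_page = page_statements(statements, start=3, size=3, descending=True)
newest_ids = [s["id"] for s in newest_first]
second_ids = [s["id"] for s in second_page]
```

Sorting before slicing is what keeps pages stable: a client that asks for the newest statements first can walk backwards through a long session without fetching the full list.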
[LIVY-969] - New Docker-Based Integration and Debugging Environments
- Description: This feature introduces a Docker-based integration and debugging environment for Livy, simplifying the setup process and improving the development experience.
- Impact: Streamlines the development and testing processes for Livy applications, enhancing productivity for developers.
- Link: LIVY-969
Apache Impala: Summary of New Features
| JIRA | Description |
|---|---|
| IMPALA-3268 | Added the SHOW VIEWS command. |
| IMPALA-5323 | Added support for the Kudu BINARY type. |
| IMPALA-9118 | Added a debug page for in-flight DDLs in catalogd. |
| IMPALA-11495 | Added the glibc version and effective locale to the Web UI. |
| IMPALA-12156 | Added support for Impala Statestore HA. |
| IMPALA-12229 | Added support for soft-deleted tables in Kudu. |
| IMPALA-12243 | Added support for DROP PARTITION on Iceberg tables. |
| IMPALA-12313 | Added support for UPDATE statements on Iceberg tables. |
| IMPALA-12426 | Added a SQL interface to completed queries, DDLs, and DMLs. |
| IMPALA-12486 | Added catalog metrics for ParallelFileMetadataLoader. |
| IMPALA-12965 | Added a debug query option to skip runtime filters. |
Apache Zeppelin: Summary of New Features
Apache Zeppelin 0.11.2 introduces several new features and enhancements that improve performance and usability. Some of the highlights include:
- Enhanced Interpreter Management: The ability to download interpreters dynamically at runtime, reducing the package size and allowing for more flexible deployment options.
- Updated Interpreter Support: Updated support for newer versions of Apache Spark and Flink. Specifically, the Docker image is pre-configured to support Spark 3.5.1 and Flink 1.19.1.
- Docker Improvements: Easier containerized deployment of Zeppelin through Docker, including support for persisting logs and notebooks via Docker volume options, making it simpler to run Zeppelin in isolated environments.
- Security Enhancements: New security improvements for role-based access control and interpreter process isolation to safeguard multi-user environments.