NiFi Upgrade Toolkit: High-Level Design for Upgrading ODP NiFi 1.x to 2.7.2

1. Problem and Scope

Upgrading NiFi 1.x to 2.x involves more than replacing a binary. The flow file, configuration, authorizers, controller service wiring, scripted processors, and Kerberos integration all have breaking changes. Without a structured approach, operators face:

  • Silent failures at startup (unreadable sensitive properties, missing controller services, removed component types)
  • No safe rollback path if something goes wrong mid-upgrade
  • No confirmation that NiFi 2 is actually healthy after restart

This toolkit addresses those three failure modes with four automation stages. It deliberately does not rewrite flow.json.gz: flow editing belongs to the NiFi UI and Registry, not an offline script.

Out of scope:

  • Automated flow file rewriting or bulk processor replacement
  • Multi-node rolling-cluster orchestration (documented as node-by-node; future release)
  • Custom NAR rebuild or ExecuteScript port
  • NiFi 2 platform features (flow analysis rules, Python processors)

2. System Overview

System Diagram

Each stage is a standalone script. They share nothing at runtime; Stage 2a's archive is the only cross-stage artifact (the installer requires it for rollback).

3. Components

3.1 Stage 1 - Pre-upgrade Check

Component | Role
pre_upgrade_check.sh | Shell wrapper: discovers NiFi home, resolves paths, dispatches to the Python engine. Optionally invokes Registry pre-check.
pre_upgrade_check.py | Python engine: parses the flow file, runs 17 checks, writes JSON + TXT reports.
registry_pre_upgrade_check.sh | Thin wrapper for Registry-only invocation.
registry_pre_upgrade_check.py | 5 checks against Registry 1.x config files.
resources/*.csv | Rule data: deprecated types, Kerberos mappings, bundle renames, removed script engines, connector CS requirements.

Key Design Properties

  • The shell wrapper owns all host-side discovery (process scan, split-conf layout, common install paths). The Python engine receives resolved absolute paths and never rediscovers them.
  • The flow is parsed once. All component-level checks populate a shared CheckContext in a single pass; check functions only format output. This keeps runtime linear in flow size, regardless of how many checks run.
  • Three parsing strategies are selected automatically: XML streaming (lowest memory, no deps), ijson streaming (low memory, requires ijson), full JSON load (fallback). All produce identical output.
  • Rules live in CSV files. Adding a new deprecated component type requires no code change.
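The single-pass, CSV-driven model described above can be sketched roughly as follows. This is an illustrative sketch, not the toolkit's code: only the CheckContext name comes from this document, while the CSV column names (type, replacement), the helper names, and the com.example component types are assumptions made for the example.

```python
import csv
import io
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class CheckContext:
    """Findings collected during the single traversal, keyed by check name."""
    findings: dict = field(default_factory=lambda: defaultdict(list))


def load_rules(csv_text, key_column):
    """Index rule rows by one column so lookups during traversal are O(1)."""
    return {row[key_column]: row for row in csv.DictReader(io.StringIO(csv_text))}


def walk(group):
    """Yield every processor in a process group and in its nested groups."""
    yield from group.get("processors", [])
    for child in group.get("processGroups", []):
        yield from walk(child)


def collect(root_group, deprecated):
    """Visit each component once; every check records into the same context."""
    ctx = CheckContext()
    for proc in walk(root_group):
        rule = deprecated.get(proc["type"])
        if rule is not None:
            ctx.findings["deprecated"].append((proc["name"], rule["replacement"]))
    return ctx


def format_deprecated(ctx):
    """Formatting-only step: no flow traversal happens here."""
    return [f"{name}: replace with {repl}" for name, repl in ctx.findings["deprecated"]]
```

Because the rules are plain rows, adding a row to the CSV text changes what collect() reports without touching any of the functions.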

3.2 Stage 2a — Backup Helper

upgrade_pre_activity_nifi.sh creates a timestamped .tar.gz of conf/, the flow file, H2 database, embedded ZooKeeper, and all content and provenance repository paths (resolved from nifi.properties).

This archive is the rollback source for Stage 2b. The installer checks for its existence as a hard preflight; no archive means no install.

Default mode is dry-run. --execute creates the archive.
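The dry-run-by-default behaviour can be illustrated with a short Python sketch. The real helper is a shell script; the function name, the archive naming scheme, and the now parameter below are hypothetical and exist only to make the example self-contained.

```python
import tarfile
import time
from pathlib import Path


def plan_backup(paths, dest_dir, execute=False, now=None):
    """Return (archive_path, included_paths); write the archive only if execute is True."""
    # Hypothetical naming scheme: a timestamp keeps successive backups distinct.
    stamp = time.strftime("%Y%m%d-%H%M%S", time.localtime(now))
    archive = Path(dest_dir) / f"nifi-backup-{stamp}.tar.gz"
    included = [Path(p) for p in paths if Path(p).exists()]
    if not execute:
        # Dry-run: report what would be archived and leave the filesystem untouched.
        return archive, included
    with tarfile.open(archive, "w:gz") as tar:
        for p in included:
            # Store paths relative to / so a restore can be extracted over the root.
            tar.add(p, arcname=str(p).lstrip("/"))
    return archive, included
```

Calling plan_backup(paths, dest) only reports the plan; passing execute=True mirrors the --execute flag.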

3.3 Stage 2b — Tarball Installer

upgrade_nifi.sh performs the binary swap on a single host. It has no runtime dependencies beyond standard system tools (wget, tar, sha256sum, systemctl).

Key Design Properties

  • sha256 gate before any mutation. Tarball and checksum file are downloaded and verified before services are stopped. A corrupted or partial download fails with zero side-effects.
  • Rollback by trap. A trap rollback ERR is armed at the moment services stop and disarmed only after the new services are confirmed listening. Any failure in between — extract, conf merge, service start, port wait — triggers the rollback: symlinks revert, the Stage 2a archive is extracted over the filesystem, old services restart.
  • Atomic cutover via symlink. New binaries are extracted into a versioned directory. The current/ symlink is flipped as a single atomic operation (ln -sfn). A failed extract never touches the live directory.
  • OS dispatch only where necessary. RHEL 8/9 and Debian/Ubuntu follow the same extract → merge → symlink → start path. The only per-OS branch is service-user creation (useradd vs adduser).
  • All paths are configurable flags. The defaults match ODP standard layouts; non-standard installs override them.
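The checksum gate, symlink cutover, and revert-on-failure ideas above can be sketched in Python for illustration. The installer itself is a shell script that uses sha256sum -c, ln -sfn, and a trap; the function names here are hypothetical, start_service stands in for whatever starts and port-checks the new service, and the sketch covers only the symlink revert, not the archive restore or old-service restart.

```python
import hashlib
import os
from pathlib import Path


def sha256_matches(tarball, expected_hex):
    """Hash the file in chunks and compare against the published digest."""
    digest = hashlib.sha256()
    with open(tarball, "rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected_hex.strip().lower()


def flip_symlink(link, new_target):
    """Replace a symlink in one rename: build a temp link, then rename over the old one."""
    link = Path(link)
    tmp = link.with_name(link.name + ".tmp")
    if tmp.is_symlink() or tmp.exists():
        tmp.unlink()
    os.symlink(new_target, tmp)
    os.replace(tmp, link)  # rename(2) swaps the link itself without following it


def cutover(link, new_dir, start_service):
    """Flip to the new version; revert the symlink if the service fails to start."""
    previous = os.readlink(link)
    flip_symlink(link, new_dir)
    try:
        start_service()
    except Exception:
        flip_symlink(link, previous)
        raise
```

In this arrangement a failed start leaves current/ pointing at the old version, which is the property the trap-based rollback is described as providing.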

3.4 Stage 3 — Post-upgrade Check

Component | Role
post_upgrade_check.sh | Unified wrapper: runs NiFi REST checks, then Registry checks if --registry-base-url is given. Aggregate exit = max(NIFI_EXIT, REG_EXIT).
post_upgrade_check.py | 6 REST checks against the NiFi 2 API: version, invalid processors, invalid controller services, cluster node health, system diagnostics, error bulletins.
registry_post_upgrade_check.py | 3 REST checks against Registry 2: version (/about), bucket reachability (/buckets), auth config reachability (/access/config).

Auth-gated endpoints responding with 401/403 are treated as WARN (endpoint alive) rather than ERROR when no credentials are supplied. This avoids false failures in environments where auth is enforced.
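A minimal sketch of this classification and of the wrapper's aggregate exit rule might look like the following. The numeric scale (0 = OK, 1 = WARN, 2 = ERROR) and the function names are assumptions for illustration; the document only states that 401/403 without credentials is a WARN and that the aggregate is max(NIFI_EXIT, REG_EXIT). Treating 401/403 as ERROR when credentials were supplied is likewise an assumption.

```python
# Hypothetical severity scale; the toolkit's actual exit codes are not stated here.
OK, WARN, ERROR = 0, 1, 2


def classify(status_code, credentials_supplied):
    """Map an HTTP status from a check endpoint to a severity."""
    if 200 <= status_code < 300:
        return OK
    if status_code in (401, 403) and not credentials_supplied:
        # The endpoint answered, so the service is alive; it just wants auth.
        return WARN
    return ERROR


def aggregate_exit(nifi_exit, reg_exit=None):
    """Worst result wins; Registry is skipped when no Registry URL was given."""
    return nifi_exit if reg_exit is None else max(nifi_exit, reg_exit)
```

Taking the maximum means a Registry warning is never hidden by a clean NiFi result, and vice versa.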

4. Stage Interactions and Data Flow


5. Failure and Safety Properties

Scenario | Behavior
Corrupted or partial tarball download | sha256sum -c fails; script aborts before any service is stopped
Extract fails mid-way | trap ERR fires; symlinks revert; backup archive restores conf/; old services restart
NiFi 2 fails to start (port timeout) | Same trap path; old service comes back
No Stage 2a backup present | Installer hard-fails at preflight; nothing runs
Registry not installed | --skip-registry (auto-set if --nifi-registry-home is empty); installer proceeds NiFi-only
Clustered node detected | Warning printed; operator proceeds node-by-node; no blocking

6. Extensibility

The toolkit is designed so that most rule changes require no code change:

Change | How
New deprecated component type | Add row to NiFi2-deprecated-components.csv
New Kerberos-affected component | Add row to NiFi1-components-with-legacy-kerberos.csv
New bundle coordinate rename | Add row to NiFi2-bundle-coordinate-changes.csv
New connector requiring a CS | Add row to NiFi2-connector-cs-requirements.csv
EL functions removed in a future upgrade | Add NiFi2-removed-el-functions.csv; Check 11 activates automatically
Support a different NiFi version pair | Refresh CSVs; update default tarball URL

Adding a new component-level check that fits the single-pass model requires adding a case to apply component_checks() and a formatting function — the flow traversal and context infrastructure are already in place.
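One way a check could "activate automatically" when its rule file appears is to treat a missing CSV as "check disabled" rather than as an error. The sketch below assumes that behaviour; the function names and the check-name keys are hypothetical, and only the CSV file names come from the table above.

```python
import csv
from pathlib import Path


def load_optional_rules(resources_dir, filename):
    """Return the rule rows, or None when the rule file is absent."""
    path = Path(resources_dir) / filename
    if not path.is_file():
        return None
    with path.open(newline="") as fh:
        return list(csv.DictReader(fh))


def active_checks(resources_dir, rule_files):
    """Names of the checks whose rule file is present in the resources directory."""
    return [
        name
        for name, filename in rule_files.items()
        if load_optional_rules(resources_dir, filename) is not None
    ]
```

Under this assumption, dropping NiFi2-removed-el-functions.csv into the resources directory is enough for the corresponding check to appear in the active list.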

7. Deployment Model

The toolkit ships as a .tar.gz containing shell scripts, Python scripts, and CSV files. No compilation step, no installed agent, no persistent daemon. Operators extract it on the NiFi host, run the four stages in order, and discard it.

Runtime requirements: Python 3.6+, bash, wget, tar, sha256sum, systemctl. The ijson Python package is optional (improves memory efficiency for large flows); install_deps.sh installs it.
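Since ijson is optional, the engine needs some way to fall back when it is missing. The selection logic might be sketched as below; the criteria (file extension first, then ijson availability), the function name, and the strategy labels are assumptions, since this document states only that one of three strategies is chosen automatically.

```python
import importlib.util


def pick_strategy(flow_path, ijson_available=None):
    """Choose a parsing strategy for the flow file (illustrative criteria only)."""
    if ijson_available is None:
        # Probe for the optional dependency without importing it.
        ijson_available = importlib.util.find_spec("ijson") is not None
    name = str(flow_path)
    if name.endswith((".xml", ".xml.gz")):
        return "xml-stream"      # standard library only, lowest memory
    if ijson_available:
        return "ijson-stream"    # streaming JSON, needs the optional package
    return "json-load"           # fallback: load the whole document into memory
```

The ijson_available parameter exists so the choice can be forced in tests; leaving it unset probes the running interpreter.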

The scripts run on the NiFi host itself and make no remote API calls to any management plane. The only external endpoint they contact is the Acceldata tarball repository in Stage 2b.
