NiFi Upgrade Toolkit: High-Level Design for Upgrading ODP NiFi 1.x to 2.7.2
1. Problem and Scope
Upgrading NiFi 1.x to 2.x involves more than replacing a binary. The flow file, configuration, authorizers, controller service wiring, scripted processors, and Kerberos integration all have breaking changes. Without a structured approach, operators face:
- Silent failures at startup (unreadable sensitive props, missing CS, removed types)
- No safe rollback path if something goes wrong mid-upgrade
- No confirmation that NiFi 2 is actually healthy after restart
This toolkit addresses those three failure modes with four automation stages. It deliberately does not rewrite flow.json.gz — flow editing belongs to the NiFi UI and Registry, not an offline script. Explicitly out of scope:
- Automated flow file rewriting or bulk processor replacement
- Multi-node rolling-cluster orchestration (documented as node-by-node; future release)
- Custom NAR rebuild or ExecuteScript port
- NiFi 2 platform features (flow analysis rules, Python processors)
2. System Overview
The pipeline reads the live install (flow.json.gz, nifi.properties, authorizers.xml, lib/) and runs four stages:

- Stage 1 — pre_upgrade_check: CSV rulesets → issues report
- Stage 2a — backup helper → backup.tar.gz
- Stage 2b — tarball installer: wget + sha256, extract + flip
- Stage 3 — post_upgrade_check: REST API → health report

Each stage is a standalone script. They share nothing at runtime; Stage 2a's archive is the only cross-stage artifact (the installer requires it for rollback).
3. Components
3.1 Stage 1 — Pre-upgrade Check
| Component | Role |
|---|---|
| pre_upgrade_check.sh | Shell wrapper: discovers NiFi home, resolves paths, dispatches to the Python engine. Optionally invokes Registry pre-check. |
| pre_upgrade_check.py | Python engine: parses the flow file, runs 17 checks, writes JSON + TXT reports. |
| registry_pre_upgrade_check.sh | Thin wrapper for Registry-only invocation. |
| registry_pre_upgrade_check.py | 5 checks against Registry 1.x config files. |
| resources/*.csv | Rule data: deprecated types, Kerberos mappings, bundle renames, removed script engines, connector CS requirements. |
Key Design Properties
- The shell wrapper owns all host-side discovery (process scan, split-conf layout, common install paths). The Python engine receives resolved absolute paths and never rediscovers them.
- The flow is parsed once. All component-level checks populate a shared CheckContext in a single pass; check functions only format output. This keeps runtime linear regardless of check count.
- One of three parsing strategies is selected automatically: XML streaming (lowest memory, no dependencies), ijson streaming (low memory, requires ijson), or full JSON load (fallback). All produce identical output.
- Rules live in CSV files. Adding a new deprecated component type requires no code change.
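The CSV-driven rule model can be illustrated with a toy lookup. The column layout and rows below are assumptions for illustration, not the toolkit's actual schema:

```shell
set -euo pipefail

rules="$(mktemp)"; flow_types="$(mktemp)"
trap 'rm -f "$rules" "$flow_types"' EXIT

# Hypothetical rule rows: deprecated type, migration hint.
cat > "$rules" <<'EOF'
org.apache.nifi.processors.standard.PostHTTP,replace with InvokeHTTP
org.apache.nifi.processors.standard.GetHTTP,replace with InvokeHTTP
EOF

# Hypothetical component types extracted from a flow.
printf '%s\n' \
  org.apache.nifi.processors.standard.GetHTTP \
  org.apache.nifi.processors.standard.LogAttribute > "$flow_types"

# Flagging a newly deprecated type means appending a CSV row;
# the lookup loop itself never changes.
findings=""
while IFS=, read -r type hint; do
  if grep -qxF "$type" "$flow_types"; then
    findings="${findings}DEPRECATED: $type ($hint)"$'\n'
  fi
done < "$rules"
printf '%s' "$findings"
```

The same shape holds for every ruleset: the engine iterates whatever rows exist, so extending coverage is a data change, not a code change.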
3.2 Stage 2a — Backup Helper
upgrade_pre_activity_nifi.sh creates a timestamped .tar.gz of conf/, the flow file, H2 database, embedded ZooKeeper, and all content and provenance repository paths (resolved from nifi.properties).
This archive is the rollback source for Stage 2b. The installer checks for its existence as a hard preflight; no archive means no install.
Default mode is dry-run. --execute creates the archive.
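The dry-run default can be sketched as follows. Option handling is simplified here; the real script's parsing may differ:

```shell
set -euo pipefail

MODE="dry-run"
for arg in "$@"; do
  [ "$arg" = "--execute" ] && MODE="execute"
done

# In dry-run mode, describe the action; with --execute, perform it.
run() {
  if [ "$MODE" = "dry-run" ]; then
    echo "DRY-RUN: $*"
  else
    "$@"
  fi
}

out="$(run tar -czf /var/tmp/nifi-pre-upgrade-backup-example.tar.gz conf/)"
echo "$out"
```

Making mutation opt-in means an operator who runs the script bare gets a preview of the archive command, never a surprise write.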
3.3 Stage 2b — Tarball Installer
upgrade_nifi.sh performs the binary swap on a single host. It has no runtime dependencies beyond standard system tools (wget, tar, sha256sum, systemctl).
Key Design Properties
- sha256 gate before any mutation. Tarball and checksum file are downloaded and verified before services are stopped. A corrupted or partial download fails with zero side-effects.
- Rollback by trap. A trap rollback ERR is armed at the moment services stop and disarmed only after the new services are confirmed listening. Any failure in between — extract, conf merge, service start, port wait — triggers the rollback: symlinks revert, the Stage 2a archive is extracted over the filesystem, old services restart.
- Atomic cutover via symlink. New binaries are extracted into a versioned directory. The current/ symlink is flipped as a single atomic operation (ln -sfn). A failed extract never touches the live directory.
- OS dispatch only where necessary. RHEL 8/9 and Debian/Ubuntu follow the same extract → merge → symlink → start path. The only per-OS branch is service-user creation (useradd vs adduser).
- All paths are configurable flags. The defaults match ODP standard layouts; non-standard installs override them.
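The three safety mechanisms above (checksum gate, armed rollback window, atomic symlink cutover) can be sketched together. Paths, file names, and function bodies are placeholders, not upgrade_nifi.sh's actual code:

```shell
set -euo pipefail

WORK="$(mktemp -d)"
trap 'rm -rf "$WORK"' EXIT
cd "$WORK"

# 1. Checksum gate: verify the download before anything is mutated.
printf 'pretend-tarball\n' > nifi-2.7.2-bin.tar.gz       # stand-in download
sha256sum nifi-2.7.2-bin.tar.gz > nifi-2.7.2-bin.tar.gz.sha256
sha256sum -c nifi-2.7.2-bin.tar.gz.sha256 >/dev/null     # aborts on mismatch

# 2. Versioned extract target plus the pre-upgrade state.
mkdir -p versions/2.7.2 versions/1.x
ln -sfn versions/1.x current

# 3. Rollback armed only for the mutation window.
rollback() { ln -sfn versions/1.x current; echo "ROLLED BACK"; }
trap rollback ERR
# ... stop services, extract into versions/2.7.2, merge conf ...
ln -sfn versions/2.7.2 current   # atomic cutover: single symlink flip
trap - ERR                       # disarm once the new version is confirmed

echo "current -> $(readlink current)"
```

Any command failing between `trap rollback ERR` and `trap - ERR` flips the symlink back before the script exits, which is the property the real installer relies on.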
3.4 Stage 3 — Post-upgrade Check
| Component | Role |
|---|---|
| post_upgrade_check.sh | Unified wrapper: runs NiFi REST checks, then Registry checks if --registry-base-url is given. Aggregate exit = max(NIFI_EXIT, REG_EXIT). |
| post_upgrade_check.py | 6 REST checks against the NiFi 2 API: version, invalid processors, invalid controller services, cluster node health, system diagnostics, error bulletins. |
| registry_post_upgrade_check.py | 3 REST checks against Registry 2: version (/about), bucket reachability (/buckets), auth config reachability (/access/config). |
Auth-gated endpoints responding with 401/403 are treated as WARN (endpoint alive) rather than ERROR when no credentials are supplied. This avoids false failures in environments where auth is enforced.
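The WARN-on-auth behavior and the wrapper's exit aggregation can be sketched as follows (function names are illustrative, not the toolkit's actual code):

```shell
set -euo pipefail

# Map an HTTP status to a check result. Without credentials,
# 401/403 means the endpoint is alive but gated: WARN, not ERROR.
classify_status() {
  case "$1" in
    2??)     echo PASS ;;
    401|403) echo WARN ;;
    *)       echo ERROR ;;
  esac
}

# Aggregate exit = max(NIFI_EXIT, REG_EXIT).
max_exit() { [ "$1" -ge "$2" ] && echo "$1" || echo "$2"; }

r1="$(classify_status 200)"
r2="$(classify_status 403)"
agg="$(max_exit 0 1)"
echo "$r1 $r2 $agg"
```

Taking the max means a Registry warning can never mask a NiFi failure, and vice versa.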
4. Stage Interactions and Data Flow
Stage 1: pre_upgrade_check.sh
- reads: flow.json.gz, nifi.properties, bootstrap.conf, authorizers.xml, lib/*.nar, resources/*.csv
- writes: nifi-pre-upgrade-report-<ts>.json / .txt
- exit 0 → safe to proceed

Stage 2a: upgrade_pre_activity_nifi.sh --execute
- reads: nifi.properties (for repo paths)
- writes: /var/tmp/nifi-upgrade-backups/nifi-pre-upgrade-backup-<ts>.tar.gz

Stage 2b: upgrade_nifi.sh
- reads: backup archive (preflight), /etc/os-release
- downloads: nifi-2.7.2-odp-bin.tar.gz + .sha256 (Acceldata repo)
- mutates: /usr/odp/<version>/nifi (extract), /usr/odp/current/nifi (symlink flip), conf/ (merge from backup), bootstrap.conf (java.home rewrite)
- on failure: full restore from backup archive, old services restart
- exit 0 → NiFi 2 responding on port

Stage 3: post_upgrade_check.sh
- reads: NiFi 2 / Registry 2 REST API (live)
- writes: stdout (pass/warn/fail per check)
- exit 0 → upgrade confirmed healthy

5. Failure and Safety Properties
| Scenario | Behavior |
|---|---|
| Corrupted or partial tarball download | sha256sum -c fails; script aborts before any service is stopped |
| Extract fails mid-way | trap ERR fires; symlinks revert; backup archive restores conf/; old services restart |
| NiFi 2 fails to start (port timeout) | Same trap path — old service comes back |
| No Stage 2a backup present | Installer hard-fails at preflight; nothing runs |
| Registry not installed | --skip-registry (auto-set if --nifi-registry-home is empty); installer proceeds NiFi-only |
| Clustered node detected | Warning printed; operator proceeds node-by-node; no blocking |
6. Extensibility
The toolkit is designed so that most rule changes require no code change:
| Change | How |
|---|---|
| New deprecated component type | Add row to NiFi2-deprecated-components.csv |
| New Kerberos-affected component | Add row to NiFi1-components-with-legacy-kerberos.csv |
| New bundle coordinate rename | Add row to NiFi2-bundle-coordinate-changes.csv |
| New connector requiring a CS | Add row to NiFi2-connector-cs-requirements.csv |
| EL functions removed in a future upgrade | Add NiFi2-removed-el-functions.csv; Check 11 activates automatically |
| Support a different NiFi version pair | Refresh CSVs; update default tarball URL |
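The "Check 11 activates automatically" row relies on an optional-ruleset pattern: a check runs only when its CSV exists. The activation logic below is illustrative (the file name comes from the table above; the function is not the toolkit's actual code):

```shell
set -euo pipefail

RES="$(mktemp -d)"
trap 'rm -rf "$RES"' EXIT

# A check keyed to a ruleset file: present means active, absent means skipped.
run_optional_check() {
  local csv="$1"
  if [ -f "$csv" ]; then
    echo "ACTIVE: $(basename "$csv")"
  else
    echo "SKIPPED: $(basename "$csv") not present"
  fi
}

before="$(run_optional_check "$RES/NiFi2-removed-el-functions.csv")"
touch "$RES/NiFi2-removed-el-functions.csv"   # ship the ruleset later
after="$(run_optional_check "$RES/NiFi2-removed-el-functions.csv")"
echo "$before"
echo "$after"
```

Shipping a future ruleset is therefore just dropping a file into resources/; no release of the engine is needed.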
Adding a new component-level check that fits the single-pass model requires adding a case to the component-check dispatcher (apply_component_checks()) and a formatting function — the flow traversal and context infrastructure are already in place.
7. Deployment Model
The toolkit ships as a .tar.gz containing shell scripts, Python scripts, and CSV files. No compilation step, no installed agent, no persistent daemon. Operators extract it on the NiFi host, run the four stages in order, and discard it.
Runtime requirements: Python 3.6+, bash, wget, tar, sha256sum, systemctl. The ijson Python package is optional (improves memory efficiency for large flows); install_deps.sh installs it.
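A minimal preflight over those runtime requirements might look like the following. The tool list is taken from this section; the check itself is a sketch, not the toolkit's actual code:

```shell
set -euo pipefail

# Required tools, per the runtime requirements above.
missing=""
for tool in python3 bash wget tar sha256sum systemctl; do
  command -v "$tool" >/dev/null 2>&1 || missing="$missing $tool"
done

# ijson is optional: report, don't fail.
if python3 -c 'import ijson' 2>/dev/null; then
  echo "ijson: available (streaming JSON parse enabled)"
else
  echo "ijson: missing (optional; run install_deps.sh for large flows)"
fi

if [ -n "$missing" ]; then
  result="MISSING:$missing"
else
  result="PREFLIGHT: OK"
fi
echo "$result"
```

Running such a check before Stage 1 surfaces a missing tool immediately instead of mid-upgrade.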
The scripts run on the NiFi host itself — no remote API calls to any management plane other than the Acceldata tarball repository in Stage 2b.