Check Kudu Cluster Health with ksck
Kudu provides the ksck tool to check cluster health and gather detailed information. It can detect issues such as under-replicated tablets, unreachable tablet servers, or tablets without a leader.
Run ksck to Check the Cluster Health
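- Run ksck as the kudu user, passing a comma-separated list of master addresses:
sudo -u kudu kudu cluster ksck kudu1.acceldata.dvl:7051,kudu2.acceldata.dvl:7051
- To see all available options, use:
kudu cluster ksck --help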
- If the cluster is healthy, ksck prints cluster information and a success message, and returns a status code of 0.
- If ksck finds errors, it returns a non-zero status code.
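Because the health result is carried in the exit status, ksck is easy to wrap in a monitoring script. The following is a minimal sketch, assuming the kudu CLI is on the PATH and the script runs as the kudu user (for example, through sudo -u kudu). The function name ksck_status is hypothetical and not part of Kudu.

```shell
#!/usr/bin/env bash
# Sketch: report Kudu cluster health based only on the ksck exit status.
# Assumptions: the kudu CLI is on PATH and this runs as the kudu user.
# ksck_status is a hypothetical helper name, not a Kudu command.

# ksck_status MASTER_ADDRESSES
# Prints HEALTHY or UNHEALTHY and returns the ksck exit status unchanged.
ksck_status() {
  local masters="$1"
  local rc=0
  # Discard the detailed report; only the exit status is used here.
  kudu cluster ksck "$masters" > /dev/null 2>&1 || rc=$?
  if [ "$rc" -eq 0 ]; then
    echo "HEALTHY"
  else
    echo "UNHEALTHY (ksck exit status $rc)"
  fi
  return "$rc"
}

# Example:
# ksck_status kudu1.acceldata.dvl:7051,kudu2.acceldata.dvl:7051
```

Returning the original status code lets a caller such as cron or a monitoring agent treat any non-zero value as a failure without parsing the report.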
Example: ksck Output

If an error occurs (for example, when a tablet server is down), ksck returns a non-zero status code and displays output similar to the following:
Master Summary
UUID | Address | Status
----------------------------------+--------------------------+---------
0382cc3a47ba4dcf84abb15fe4740775 | kudu2.acceldata.dvl:7051 | HEALTHY
41bd768443104bcbad3b9c28a6485208 | kudu1.acceldata.dvl:7051 | HEALTHY
Flags of checked categories for Master:
Flag | Value | Master
---------------------+-------------------------------------------------------------+-------------------------
builtin_ntp_servers | 0.pool.ntp.org,1.pool.ntp.org,2.pool.ntp.org,3.pool.ntp.org | all 2 server(s) checked
time_source | system | all 2 server(s) checked
Tablet Server Summary
UUID | Address | Status | Location | Tablet Leaders | Active Scanners
----------------------------------+--------------------------+-------------+----------+----------------+-----------------
203f859ed1f14e729e9ef223d8af3bdc | kudu3.acceldata.dvl:7050 | HEALTHY | <none> | 3 | 0
a159de83d21b40efae306c419e04719d | kudu2.acceldata.dvl:7050 | HEALTHY | <none> | 0 | 0
84e1942cf050461d8c14552c4fbd0431 | kudu1.acceldata.dvl:7050 | UNAVAILABLE | <none> | n/a | n/a
Tablet Server Location Summary
Location | Count
----------+---------
<none> | 3
Error from kudu1.acceldata.dvl:7050: Network error: could not get status from server: Client connection negotiation failed: client connection to 10.100.11.22:7050: connect: Connection refused (error 111) (UNAVAILABLE)
Flags of checked categories for Tablet Server:
Flag | Value | Tablet Server
---------------------+-------------------------------------------------------------+----------------------------------------------------
builtin_ntp_servers | 0.pool.ntp.org,1.pool.ntp.org,2.pool.ntp.org,3.pool.ntp.org | kudu3.acceldata.dvl:7050, kudu2.acceldata.dvl:7050
time_source | system | kudu3.acceldata.dvl:7050, kudu2.acceldata.dvl:7050
Version Summary
Version | Servers
------------------+---------------------------------------------------------------------------------------------------------------------------
1.17.0.3.3.6.1-1 | master@kudu1.acceldata.dvl:7051, master@kudu2.acceldata.dvl:7051, tserver@kudu3.acceldata.dvl:7050, and 1 other server(s)
Tablet Summary
Tablet 952c5219eac34b268cf619d507b6cb4b of table 'kudu_test_cases' is under-replicated: 1 replica(s) not RUNNING
203f859ed1f14e729e9ef223d8af3bdc (kudu3.acceldata.dvl:7050): RUNNING [LEADER]
84e1942cf050461d8c14552c4fbd0431 (kudu1.acceldata.dvl:7050): TS unavailable
a159de83d21b40efae306c419e04719d (kudu2.acceldata.dvl:7050): RUNNING
All reported replicas are:
A = 203f859ed1f14e729e9ef223d8af3bdc
B = 84e1942cf050461d8c14552c4fbd0431
C = a159de83d21b40efae306c419e04719d
The consensus matrix is:
Config source | Replicas | Current term | Config index | Committed?
---------------+------------------------+--------------+--------------+------------
master | A* B C | | | Yes
A | A* B C | 2 | -1 | Yes
B | [config not available] | | |
C | A* B C | 2 | -1 | Yes
Tablet f762102c567f4edeaf9458a79b6c5916 of table 'kudu_test_cases' is under-replicated: 1 replica(s) not RUNNING
203f859ed1f14e729e9ef223d8af3bdc (kudu3.acceldata.dvl:7050): RUNNING [LEADER]
a159de83d21b40efae306c419e04719d (kudu2.acceldata.dvl:7050): RUNNING
84e1942cf050461d8c14552c4fbd0431 (kudu1.acceldata.dvl:7050): TS unavailable
All reported replicas are:
A = 203f859ed1f14e729e9ef223d8af3bdc
B = a159de83d21b40efae306c419e04719d
C = 84e1942cf050461d8c14552c4fbd0431
The consensus matrix is:
Config source | Replicas | Current term | Config index | Committed?
---------------+------------------------+--------------+--------------+------------
master | A* B C | | | Yes
A | A* B C | 2 | -1 | Yes
B | A* B C | 2 | -1 | Yes
C | [config not available] | | |
The cluster doesn't have any matching system tables
Summary by table
Name | RF | Status | Total Tablets | Healthy | Recovering | Under-replicated | Unavailable
-----------------+----+------------------+---------------+---------+------------+------------------+-------------
kudu_test_cases | 3 | UNDER_REPLICATED | 3 | 0 | 0 | 3 | 0
Tablet Replica Count Summary
Statistic | Replica Count
----------------+---------------
Minimum | 3
First Quartile | 3
Median | 3
Third Quartile | 3
Maximum | 3
Total Count Summary
| Total Count
----------------+-------------
Masters | 2
Tablet Servers | 3
Tables | 1
Tablets | 3
Replicas | 9
==================
Warnings:
==================
tserver unusual flags check error: 1 of 3 tservers were not available to retrieve unusual flags
tserver diverged flags check error: 1 of 3 tservers were not available to retrieve time_source category flags
==================
Errors:
==================
Network error: error fetching info from tablet servers: failed to gather info for all tablet servers: 1 of 3 had errors
Corruption: table consistency check error: 1 out of 1 table(s) are not healthy
FAILED
Runtime error: ksck discovered errors
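The non-zero status code only says that something is wrong; the report itself names the affected servers and tablets. The following is a minimal sketch for pulling those entries out of a saved report. It assumes the plain-text layout shown in the example output above, which is not a stable interface and may differ between Kudu versions. The function names unhealthy_tservers and under_replicated_tablets are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: extract problem entries from a saved ksck report.
# Assumes the text layout of the example output; the layout may change
# between Kudu versions. Function names are hypothetical.

# unhealthy_tservers FILE
# Prints the address of every tablet server in the "Tablet Server Summary"
# table whose Status column is not HEALTHY.
unhealthy_tservers() {
  awk -F'|' '
    /^Tablet Server Summary/ { in_ts = 1; next }
    # The table ends at the first line that is neither a row nor a dashed rule.
    in_ts && !/\|/ && !/^-/ { in_ts = 0 }
    in_ts && NF >= 3 {
      addr = $2; status = $3
      gsub(/^[ \t]+|[ \t]+$/, "", addr)
      gsub(/^[ \t]+|[ \t]+$/, "", status)
      if (status != "Status" && status != "HEALTHY") print addr
    }
  ' "$1"
}

# under_replicated_tablets FILE
# Prints "<tablet_id> <table_name>" for each tablet reported as under-replicated.
under_replicated_tablets() {
  sed -n "s/^Tablet \([0-9a-f]*\) of table '\([^']*\)' is under-replicated.*/\1 \2/p" "$1"
}

# Example:
# kudu cluster ksck kudu1.acceldata.dvl:7051,kudu2.acceldata.dvl:7051 > ksck.txt 2>&1
# unhealthy_tservers ksck.txt
# under_replicated_tablets ksck.txt
```

For the example output above, the first function would report kudu1.acceldata.dvl:7050, and the second would list the under-replicated tablets of kudu_test_cases.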
Verify Cluster Data Consistency Using ksck
You can use the --checksum_scan option to verify data consistency across the cluster. With this option, ksck scans the replicas of each tablet, computes a checksum for each replica, and compares the checksums.
- Use --tables to limit the scan to specific tables.
- Use --tablets to limit the scan to specific tablets.
Note: The --tablets option takes tablet IDs, not tablet server addresses.
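Both options take a comma-separated list. The following is a minimal sketch of wrappers that build that list from separate arguments, assuming the kudu CLI is on the PATH and the flag placement used in the commands on this page. The function names join_by_comma, checksum_scan_tables, and checksum_scan_tablets are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: run a checksum scan limited to specific tables or tablets.
# Assumes the kudu CLI is on PATH; all function names are hypothetical.

# join_by_comma ITEM...
# Joins the arguments with commas, the list format used by --tables and --tablets.
join_by_comma() {
  local IFS=,
  echo "$*"
}

# checksum_scan_tables MASTER_ADDRESSES TABLE_NAME...
checksum_scan_tables() {
  local masters="$1"
  shift
  kudu cluster ksck --checksum_scan --tables "$(join_by_comma "$@")" "$masters"
}

# checksum_scan_tablets MASTER_ADDRESSES TABLET_ID...
# Expects tablet IDs, not tablet server addresses.
checksum_scan_tablets() {
  local masters="$1"
  shift
  kudu cluster ksck --checksum_scan --tablets "$(join_by_comma "$@")" "$masters"
}

# Example:
# checksum_scan_tables kudu1.acceldata.dvl:7051,kudu2.acceldata.dvl:7051 kudu_test_cases kudu_test_cases2
```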
To retrieve the tablet IDs of one or more tables, run a checksum scan on those tables:
kudu cluster ksck --checksum_scan --tables kudu_test_cases,kudu_test_cases2 kudu1.acceldata.dvl:7051,kudu2.acceldata.dvl:7051
...
Checksum Summary
-----------------------
kudu_test_cases
-----------------------
T 2e93daf36d6444ccb47467030dced13d P 203f859ed1f14e729e9ef223d8af3bdc (kudu3.acceldata.dvl:7050): Checksum: 71467469766368
T 2e93daf36d6444ccb47467030dced13d P 84e1942cf050461d8c14552c4fbd0431 (kudu1.acceldata.dvl:7050): Checksum: 71467469766368
T 2e93daf36d6444ccb47467030dced13d P a159de83d21b40efae306c419e04719d (kudu2.acceldata.dvl:7050): Checksum: 71467469766368
T 31b8c9032cb24f33a3c87bf480c8dc9a P 203f859ed1f14e729e9ef223d8af3bdc (kudu3.acceldata.dvl:7050): Checksum: 71709626373140
T 31b8c9032cb24f33a3c87bf480c8dc9a P 84e1942cf050461d8c14552c4fbd0431 (kudu1.acceldata.dvl:7050): Checksum: 71709626373140
T 31b8c9032cb24f33a3c87bf480c8dc9a P a159de83d21b40efae306c419e04719d (kudu2.acceldata.dvl:7050): Checksum: 71709626373140
T a05f3b7a68e64ed69880ba18827c9089 P 203f859ed1f14e729e9ef223d8af3bdc (kudu3.acceldata.dvl:7050): Checksum: 71767527903577
T a05f3b7a68e64ed69880ba18827c9089 P 84e1942cf050461d8c14552c4fbd0431 (kudu1.acceldata.dvl:7050): Checksum: 71767527903577
T a05f3b7a68e64ed69880ba18827c9089 P a159de83d21b40efae306c419e04719d (kudu2.acceldata.dvl:7050): Checksum: 71767527903577
-----------------------
kudu_test_cases2
-----------------------
T 1bfed8820f044fd186a4ba8a05599c8c P 203f859ed1f14e729e9ef223d8af3bdc (kudu3.acceldata.dvl:7050): Checksum: 71709626373140
T 1bfed8820f044fd186a4ba8a05599c8c P 84e1942cf050461d8c14552c4fbd0431 (kudu1.acceldata.dvl:7050): Checksum: 71709626373140
T 1bfed8820f044fd186a4ba8a05599c8c P a159de83d21b40efae306c419e04719d (kudu2.acceldata.dvl:7050): Checksum: 71709626373140
T 2e7bfb4cb69548a3b8662d1837ca463e P 203f859ed1f14e729e9ef223d8af3bdc (kudu3.acceldata.dvl:7050): Checksum: 71767527903577
T 2e7bfb4cb69548a3b8662d1837ca463e P 84e1942cf050461d8c14552c4fbd0431 (kudu1.acceldata.dvl:7050): Checksum: 71767527903577
T 2e7bfb4cb69548a3b8662d1837ca463e P a159de83d21b40efae306c419e04719d (kudu2.acceldata.dvl:7050): Checksum: 71767527903577
T bda2d804309c48aea0d690de9099c141 P 203f859ed1f14e729e9ef223d8af3bdc (kudu3.acceldata.dvl:7050): Checksum: 71467469766368
T bda2d804309c48aea0d690de9099c141 P 84e1942cf050461d8c14552c4fbd0431 (kudu1.acceldata.dvl:7050): Checksum: 71467469766368
T bda2d804309c48aea0d690de9099c141 P a159de83d21b40efae306c419e04719d (kudu2.acceldata.dvl:7050): Checksum: 71467469766368
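In this example, every replica of a tablet reports the same checksum, so the data is consistent. When reviewing a large saved checksum summary by hand, a small filter can list only the tablets whose replicas disagree. The following is a minimal sketch, assuming the "T <tablet_id> P <replica_uuid> (<host:port>): Checksum: <value>" line format shown above; the function name checksum_mismatches is hypothetical, and the ksck exit status remains the authoritative result.

```shell
#!/usr/bin/env bash
# Sketch: list tablets whose replicas report different checksums.
# Assumes the checksum summary line format shown in the example output.
# checksum_mismatches is a hypothetical helper name.

# checksum_mismatches FILE
# Prints, in sorted order, each tablet ID with more than one distinct checksum.
checksum_mismatches() {
  awk '
    $1 == "T" && $3 == "P" && $(NF-1) == "Checksum:" {
      # Count each distinct checksum value only once per tablet.
      key = $2 SUBSEP $NF
      if (!(key in seen)) { seen[key] = 1; distinct[$2]++ }
    }
    END { for (t in distinct) if (distinct[t] > 1) print t }
  ' "$1" | sort
}

# Example:
# kudu cluster ksck --checksum_scan --tables kudu_test_cases kudu1.acceldata.dvl:7051,kudu2.acceldata.dvl:7051 > checksum.txt 2>&1
# checksum_mismatches checksum.txt
```

An empty result means every tablet in the file had matching checksums across its replicas.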
To scan only two of the tablets from kudu_test_cases, pass their tablet IDs to --tablets:
kudu cluster ksck --checksum_scan --tablets 2e93daf36d6444ccb47467030dced13d,31b8c9032cb24f33a3c87bf480c8dc9a kudu1.acceldata.dvl:7051,kudu2.acceldata.dvl:7051
In each line of the checksum summary, the value after T is the tablet ID, and the value after P is the UUID of the tablet server that hosts that replica.
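Copying IDs by hand is error-prone for tables with many tablets. The following is a minimal sketch that collects the unique tablet IDs listed under one table in a saved checksum summary and joins them with commas, ready for --tablets. It assumes the layout shown above (a table-name line between dashed rules, followed by T/P lines); the function name tablet_ids_for_table is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: collect the tablet IDs of one table from a saved checksum summary.
# Assumes the layout of the example output; tablet_ids_for_table is hypothetical.

# tablet_ids_for_table FILE TABLE_NAME
# Prints the unique tablet IDs listed under TABLE_NAME, joined with commas.
tablet_ids_for_table() {
  awk -v table="$2" '
    /^-+$/ { next }                                # skip dashed separator lines
    $1 != "T" && NF == 1 { current = $1; next }    # a table-name heading
    current == table && $1 == "T" && !($2 in seen) {
      seen[$2] = 1
      ids = ids (ids == "" ? "" : ",") $2
    }
    END { print ids }
  ' "$1"
}

# Example:
# tablet_ids_for_table checksum.txt kudu_test_cases
```

The comma-separated result can be passed directly as the value of --tablets in the command shown above.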
For more details, see the Apache Kudu documentation.