Kudu provides the ksck tool to check cluster health and gather detailed information. It can detect issues such as under-replicated tablets, unreachable tablet servers, or tablets without a leader.
Run ksck to Check the Cluster Health
- Run the command as the kudu user.
- To see all available commands and options, use:
    kudu cluster ksck --help
- If the cluster is healthy, ksck prints cluster information, a success message, and returns 0.
- If there are errors, ksck returns a non-zero status code.
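Because ksck signals health through its exit status, the check is easy to script. The following is a minimal sketch rather than an official Kudu utility: `run_health_check` is a hypothetical helper name, and the `sudo -u kudu` invocation shown in the comment is one assumed way of running the command as the kudu user. The master addresses are the ones used in the examples on this page.

```shell
# Run the given command and translate its exit status into a short message.
# Assumed usage (adjust the master addresses for your cluster):
#   run_health_check sudo -u kudu kudu cluster ksck \
#     kudu1.acceldata.dvl:7051,kudu2.acceldata.dvl:7051
run_health_check() {
  _status=0
  # Run the command exactly as passed; remember a non-zero exit status.
  "$@" || _status=$?
  if [ "$_status" -eq 0 ]; then
    echo "cluster healthy"
  else
    echo "ksck reported problems (exit status $_status)"
  fi
  # Propagate the original status so callers (cron, CI) can react to it.
  return "$_status"
}
```

The helper passes the command's own output through unchanged, so the full ksck report is still available for troubleshooting.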
Example: ksck Output

If an error occurs—for example, when a tablet server is down—ksck returns a non-zero status code and displays output similar to the following:
    Master Summary
                   UUID               |         Address          | Status
    ----------------------------------+--------------------------+---------
     0382cc3a47ba4dcf84abb15fe4740775 | kudu2.acceldata.dvl:7051 | HEALTHY
     41bd768443104bcbad3b9c28a6485208 | kudu1.acceldata.dvl:7051 | HEALTHY

    Flags of checked categories for Master:
            Flag         |                            Value                            |         Master
    ---------------------+-------------------------------------------------------------+-------------------------
     builtin_ntp_servers | 0.pool.ntp.org,1.pool.ntp.org,2.pool.ntp.org,3.pool.ntp.org | all 2 server(s) checked
     time_source         | system                                                      | all 2 server(s) checked

    Tablet Server Summary
                   UUID               |         Address          |   Status    | Location | Tablet Leaders | Active Scanners
    ----------------------------------+--------------------------+-------------+----------+----------------+-----------------
     203f859ed1f14e729e9ef223d8af3bdc | kudu3.acceldata.dvl:7050 | HEALTHY     | <none>   | 3              | 0
     a159de83d21b40efae306c419e04719d | kudu2.acceldata.dvl:7050 | HEALTHY     | <none>   | 0              | 0
     84e1942cf050461d8c14552c4fbd0431 | kudu1.acceldata.dvl:7050 | UNAVAILABLE | <none>   | n/a            | n/a

    Tablet Server Location Summary
     Location | Count
    ----------+---------
     <none>   | 3

    Error from kudu1.acceldata.dvl:7050: Network error: could not get status from server: Client connection negotiation failed: client connection to 10.100.11.22:7050: connect: Connection refused (error 111) (UNAVAILABLE)

    Flags of checked categories for Tablet Server:
            Flag         |                            Value                            |                   Tablet Server
    ---------------------+-------------------------------------------------------------+----------------------------------------------------
     builtin_ntp_servers | 0.pool.ntp.org,1.pool.ntp.org,2.pool.ntp.org,3.pool.ntp.org | kudu3.acceldata.dvl:7050, kudu2.acceldata.dvl:7050
     time_source         | system                                                      | kudu3.acceldata.dvl:7050, kudu2.acceldata.dvl:7050

    Version Summary
         Version      | Servers
    ------------------+---------------------------------------------------------------------------------------------------------------------------
     1.17.0.3.3.6.1-1 | master@kudu1.acceldata.dvl:7051, master@kudu2.acceldata.dvl:7051, tserver@kudu3.acceldata.dvl:7050, and 1 other server(s)

    Tablet Summary
    Tablet 952c5219eac34b268cf619d507b6cb4b of table 'kudu_test_cases' is under-replicated: 1 replica(s) not RUNNING
      203f859ed1f14e729e9ef223d8af3bdc (kudu3.acceldata.dvl:7050): RUNNING [LEADER]
      84e1942cf050461d8c14552c4fbd0431 (kudu1.acceldata.dvl:7050): TS unavailable
      a159de83d21b40efae306c419e04719d (kudu2.acceldata.dvl:7050): RUNNING
    All reported replicas are:
      A = 203f859ed1f14e729e9ef223d8af3bdc
      B = 84e1942cf050461d8c14552c4fbd0431
      C = a159de83d21b40efae306c419e04719d
    The consensus matrix is:
     Config source |        Replicas        | Current term | Config index | Committed?
    ---------------+------------------------+--------------+--------------+------------
     master        | A* B C                 |              |              | Yes
     A             | A* B C                 | 2            | -1           | Yes
     A             | A* B C                 | 2            | -1           | Yes
     B             | [config not available] |              |              |
     C             | A* B C                 | 2            | -1           | Yes

    Tablet f762102c567f4edeaf9458a79b6c5916 of table 'kudu_test_cases' is under-replicated: 1 replica(s) not RUNNING
      203f859ed1f14e729e9ef223d8af3bdc (kudu3.acceldata.dvl:7050): RUNNING [LEADER]
      a159de83d21b40efae306c419e04719d (kudu2.acceldata.dvl:7050): RUNNING
      84e1942cf050461d8c14552c4fbd0431 (kudu1.acceldata.dvl:7050): TS unavailable
    All reported replicas are:
      A = 203f859ed1f14e729e9ef223d8af3bdc
      B = a159de83d21b40efae306c419e04719d
      C = 84e1942cf050461d8c14552c4fbd0431
    The consensus matrix is:
     Config source |        Replicas        | Current term | Config index | Committed?
    ---------------+------------------------+--------------+--------------+------------
     master        | A* B C                 |              |              | Yes
     A             | A* B C                 | 2            | -1           | Yes
     B             | A* B C                 | 2            | -1           | Yes
     C             | [config not available] |              |              |

    The cluster doesn't have any matching system tables

    Summary by table
          Name       | RF |      Status      | Total Tablets | Healthy | Recovering | Under-replicated | Unavailable
    -----------------+----+------------------+---------------+---------+------------+------------------+-------------
     kudu_test_cases | 3  | UNDER_REPLICATED | 3             | 0       | 0          | 3                | 0

    Tablet Replica Count Summary
       Statistic    | Replica Count
    ----------------+---------------
     Minimum        | 3
     First Quartile | 3
     Median         | 3
     Third Quartile | 3
     Maximum        | 3

    Total Count Summary
                    | Total Count
    ----------------+-------------
     Masters        | 2
     Tablet Servers | 3
     Tables         | 1
     Tablets        | 3
     Replicas       | 9

    ==================
    Warnings:
    ==================
    tserver unusual flags check error: 1 of 3 tservers were not available to retrieve unusual flags
    tserver diverged flags check error: 1 of 3 tservers were not available to retrieve time_source category flags

    ==================
    Errors:
    ==================
    Network error: error fetching info from tablet servers: failed to gather info for all tablet servers: 1 of 3 had errors
    Corruption: table consistency check error: 1 out of 1 table(s) are not healthy

    FAILED
    Runtime error: ksck discovered errors

Verify Cluster Data Consistency Using ksck
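You can use the --checksum_scan option to verify data consistency across the cluster. With this option, ksck scans the replicas of each tablet, computes a checksum for each replica, and reports the checksums so that replicas of the same tablet can be compared.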
- Use --tables to limit the scan to specific tables.
- Use --tablets to limit the scan to specific tablets.
The --tablets option refers to tablet IDs, not tablet servers.
To retrieve tablet IDs, run:
    kudu cluster ksck --checksum_scan --tables kudu_test_cases,kudu_test_cases2 kudu1.acceldata.dvl:7051,kudu2.acceldata.dvl:7051
    ...
    Checksum Summary
    -----------------------
    kudu_test_cases
    -----------------------
    T 2e93daf36d6444ccb47467030dced13d P 203f859ed1f14e729e9ef223d8af3bdc (kudu3.acceldata.dvl:7050): Checksum: 71467469766368
    T 2e93daf36d6444ccb47467030dced13d P 84e1942cf050461d8c14552c4fbd0431 (kudu1.acceldata.dvl:7050): Checksum: 71467469766368
    T 2e93daf36d6444ccb47467030dced13d P a159de83d21b40efae306c419e04719d (kudu2.acceldata.dvl:7050): Checksum: 71467469766368
    T 31b8c9032cb24f33a3c87bf480c8dc9a P 203f859ed1f14e729e9ef223d8af3bdc (kudu3.acceldata.dvl:7050): Checksum: 71709626373140
    T 31b8c9032cb24f33a3c87bf480c8dc9a P 84e1942cf050461d8c14552c4fbd0431 (kudu1.acceldata.dvl:7050): Checksum: 71709626373140
    T 31b8c9032cb24f33a3c87bf480c8dc9a P a159de83d21b40efae306c419e04719d (kudu2.acceldata.dvl:7050): Checksum: 71709626373140
    T a05f3b7a68e64ed69880ba18827c9089 P 203f859ed1f14e729e9ef223d8af3bdc (kudu3.acceldata.dvl:7050): Checksum: 71767527903577
    T a05f3b7a68e64ed69880ba18827c9089 P 84e1942cf050461d8c14552c4fbd0431 (kudu1.acceldata.dvl:7050): Checksum: 71767527903577
    T a05f3b7a68e64ed69880ba18827c9089 P a159de83d21b40efae306c419e04719d (kudu2.acceldata.dvl:7050): Checksum: 71767527903577
    -----------------------
    kudu_test_cases2
    -----------------------
    T 1bfed8820f044fd186a4ba8a05599c8c P 203f859ed1f14e729e9ef223d8af3bdc (kudu3.acceldata.dvl:7050): Checksum: 71709626373140
    T 1bfed8820f044fd186a4ba8a05599c8c P 84e1942cf050461d8c14552c4fbd0431 (kudu1.acceldata.dvl:7050): Checksum: 71709626373140
    T 1bfed8820f044fd186a4ba8a05599c8c P a159de83d21b40efae306c419e04719d (kudu2.acceldata.dvl:7050): Checksum: 71709626373140
    T 2e7bfb4cb69548a3b8662d1837ca463e P 203f859ed1f14e729e9ef223d8af3bdc (kudu3.acceldata.dvl:7050): Checksum: 71767527903577
    T 2e7bfb4cb69548a3b8662d1837ca463e P 84e1942cf050461d8c14552c4fbd0431 (kudu1.acceldata.dvl:7050): Checksum: 71767527903577
    T 2e7bfb4cb69548a3b8662d1837ca463e P a159de83d21b40efae306c419e04719d (kudu2.acceldata.dvl:7050): Checksum: 71767527903577
    T bda2d804309c48aea0d690de9099c141 P 203f859ed1f14e729e9ef223d8af3bdc (kudu3.acceldata.dvl:7050): Checksum: 71467469766368
    T bda2d804309c48aea0d690de9099c141 P 84e1942cf050461d8c14552c4fbd0431 (kudu1.acceldata.dvl:7050): Checksum: 71467469766368
    T bda2d804309c48aea0d690de9099c141 P a159de83d21b40efae306c419e04719d (kudu2.acceldata.dvl:7050): Checksum: 71467469766368

    # Here we take the IDs of two of the tablets from kudu_test_cases:
    kudu cluster ksck --checksum_scan --tablets 2e93daf36d6444ccb47467030dced13d,31b8c9032cb24f33a3c87bf480c8dc9a kudu1.acceldata.dvl,kudu2.acceldata.dvl

The first column after the T is the ID of each tablet.
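When the checksum output is saved to a file, the tablet IDs can be pulled out of the rows instead of being copied by hand. The sketch below assumes rows shaped like the ones in the Checksum Summary shown on this page (`T <tablet ID> P <tablet server UUID> (<address>): Checksum: <value>`). The function names `list_tablet_ids` and `find_checksum_mismatches` are hypothetical helpers, not part of Kudu, and the second one only illustrates how the rows are structured; it is not a replacement for ksck's own reporting.

```shell
# Print each distinct tablet ID found in ksck checksum output read from stdin.
list_tablet_ids() {
  awk '$1 == "T" && $3 == "P" { if (!seen[$2]++) print $2 }'
}

# Print the IDs of tablets whose replicas reported differing checksums.
# Checksums are compared as strings so large values are not rounded.
find_checksum_mismatches() {
  awk '$1 == "T" && $3 == "P" {
         sum = $NF
         if (!($2 in first)) first[$2] = sum
         else if ((first[$2] "") != (sum "") && !reported[$2]++) print $2
       }'
}

# Assumed usage, with checksum.txt holding saved ksck output:
#   list_tablet_ids < checksum.txt | paste -sd, -    # comma-separated list for --tablets
#   find_checksum_mismatches < checksum.txt          # empty output means all replicas agree
```

Joining the IDs with commas yields the format that the --tablets option expects in the example above.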
For more details, see the Apache Kudu documentation.