Check Kudu Cluster Health with ksck
Kudu provides the ksck tool to check cluster health and gather detailed information. It can detect issues such as under-replicated tablets, unreachable tablet servers, or tablets without a leader.
Run ksck to Check the Cluster Health
- Run the command as the kudu user.
- To see all available commands and options, use:
    kudu cluster ksck --help
- If the cluster is healthy, ksck prints cluster information and a success message, and returns 0.
- If there are errors, ksck returns a non-zero status code.
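Because ksck signals health through its exit status, a script can branch on that status. The following is a minimal sketch, not an official Kudu utility: the function name check_kudu_health is hypothetical, and the master addresses in the commented invocation are taken from the example cluster on this page and should be replaced with your own.

```shell
# Sketch: run ksck and report cluster health based on its exit status.
# check_kudu_health is a hypothetical helper name, not part of Kudu.
# Argument 1: comma-separated list of master addresses.
check_kudu_health() {
  local masters="$1"
  local status=0

  # ksck returns 0 for a healthy cluster and a non-zero status otherwise.
  kudu cluster ksck "$masters" || status=$?

  if [ "$status" -eq 0 ]; then
    echo "Kudu cluster is healthy"
  else
    echo "ksck reported problems (exit status ${status})" >&2
  fi
  return "$status"
}

# Example invocation (assumed addresses; run it as the kudu user):
# check_kudu_health "kudu1.acceldata.dvl:7051,kudu2.acceldata.dvl:7051"
```

The function passes the ksck exit status through unchanged, so a scheduler or monitoring job that calls it can treat any non-zero result as a failed health check.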
 
Example: ksck Output

If an error occurs—for example, when a tablet server is down—ksck returns a non-zero status code and displays output similar to the following:
    Master Summary
                   UUID               |         Address          | Status
    ----------------------------------+--------------------------+---------
     0382cc3a47ba4dcf84abb15fe4740775 | kudu2.acceldata.dvl:7051 | HEALTHY
     41bd768443104bcbad3b9c28a6485208 | kudu1.acceldata.dvl:7051 | HEALTHY

    Flags of checked categories for Master:
            Flag         |                            Value                            |         Master
    ---------------------+-------------------------------------------------------------+-------------------------
     builtin_ntp_servers | 0.pool.ntp.org,1.pool.ntp.org,2.pool.ntp.org,3.pool.ntp.org | all 2 server(s) checked
     time_source         | system                                                      | all 2 server(s) checked

    Tablet Server Summary
                   UUID               |         Address          |   Status    | Location | Tablet Leaders | Active Scanners
    ----------------------------------+--------------------------+-------------+----------+----------------+-----------------
     203f859ed1f14e729e9ef223d8af3bdc | kudu3.acceldata.dvl:7050 | HEALTHY     | <none>   |       3        |       0
     a159de83d21b40efae306c419e04719d | kudu2.acceldata.dvl:7050 | HEALTHY     | <none>   |       0        |       0
     84e1942cf050461d8c14552c4fbd0431 | kudu1.acceldata.dvl:7050 | UNAVAILABLE | <none>   | n/a            | n/a

    Tablet Server Location Summary
     Location |  Count
    ----------+---------
     <none>   |       3
    Error from kudu1.acceldata.dvl:7050: Network error: could not get status from server: Client connection negotiation failed: client connection to 10.100.11.22:7050: connect: Connection refused (error 111) (UNAVAILABLE)

    Flags of checked categories for Tablet Server:
            Flag         |                            Value                            |                   Tablet Server
    ---------------------+-------------------------------------------------------------+----------------------------------------------------
     builtin_ntp_servers | 0.pool.ntp.org,1.pool.ntp.org,2.pool.ntp.org,3.pool.ntp.org | kudu3.acceldata.dvl:7050, kudu2.acceldata.dvl:7050
     time_source         | system                                                      | kudu3.acceldata.dvl:7050, kudu2.acceldata.dvl:7050

    Version Summary
         Version      |                                                          Servers
    ------------------+---------------------------------------------------------------------------------------------------------------------------
     1.17.0.3.3.6.1-1 | master@kudu1.acceldata.dvl:7051, master@kudu2.acceldata.dvl:7051, tserver@kudu3.acceldata.dvl:7050, and 1 other server(s)

    Tablet Summary
    Tablet 952c5219eac34b268cf619d507b6cb4b of table 'kudu_test_cases' is under-replicated: 1 replica(s) not RUNNING
      203f859ed1f14e729e9ef223d8af3bdc (kudu3.acceldata.dvl:7050): RUNNING [LEADER]
      84e1942cf050461d8c14552c4fbd0431 (kudu1.acceldata.dvl:7050): TS unavailable
      a159de83d21b40efae306c419e04719d (kudu2.acceldata.dvl:7050): RUNNING
    All reported replicas are:
      A = 203f859ed1f14e729e9ef223d8af3bdc
      B = 84e1942cf050461d8c14552c4fbd0431
      C = a159de83d21b40efae306c419e04719d
    The consensus matrix is:
     Config source |        Replicas        | Current term | Config index | Committed?
    ---------------+------------------------+--------------+--------------+------------
     master        | A*  B   C              |              |              | Yes
     A             | A*  B   C              | 2            | -1           | Yes
     A             | A*  B   C              | 2            | -1           | Yes
     B             | [config not available] |              |              |
     C             | A*  B   C              | 2            | -1           | Yes

    Tablet f762102c567f4edeaf9458a79b6c5916 of table 'kudu_test_cases' is under-replicated: 1 replica(s) not RUNNING
      203f859ed1f14e729e9ef223d8af3bdc (kudu3.acceldata.dvl:7050): RUNNING [LEADER]
      a159de83d21b40efae306c419e04719d (kudu2.acceldata.dvl:7050): RUNNING
      84e1942cf050461d8c14552c4fbd0431 (kudu1.acceldata.dvl:7050): TS unavailable
    All reported replicas are:
      A = 203f859ed1f14e729e9ef223d8af3bdc
      B = a159de83d21b40efae306c419e04719d
      C = 84e1942cf050461d8c14552c4fbd0431
    The consensus matrix is:
     Config source |        Replicas        | Current term | Config index | Committed?
    ---------------+------------------------+--------------+--------------+------------
     master        | A*  B   C              |              |              | Yes
     A             | A*  B   C              | 2            | -1           | Yes
     B             | A*  B   C              | 2            | -1           | Yes
     C             | [config not available] |              |              |

    The cluster doesn't have any matching system tables

    Summary by table
          Name       | RF |      Status      | Total Tablets | Healthy | Recovering | Under-replicated | Unavailable
    -----------------+----+------------------+---------------+---------+------------+------------------+-------------
     kudu_test_cases | 3  | UNDER_REPLICATED | 3             | 0       | 0          | 3                | 0

    Tablet Replica Count Summary
       Statistic    | Replica Count
    ----------------+---------------
     Minimum        | 3
     First Quartile | 3
     Median         | 3
     Third Quartile | 3
     Maximum        | 3

    Total Count Summary
                    | Total Count
    ----------------+-------------
     Masters        | 2
     Tablet Servers | 3
     Tables         | 1
     Tablets        | 3
     Replicas       | 9

    ==================
    Warnings:
    ==================
    tserver unusual flags check error: 1 of 3 tservers were not available to retrieve unusual flags
    tserver diverged flags check error: 1 of 3 tservers were not available to retrieve time_source category flags

    ==================
    Errors:
    ==================
    Network error: error fetching info from tablet servers: failed to gather info for all tablet servers: 1 of 3 had errors
    Corruption: table consistency check error: 1 out of 1 table(s) are not healthy

    FAILED
    Runtime error: ksck discovered errors

Verify Cluster Data Consistency Using ksck
You can use the --checksum_scan option to verify data consistency across the cluster. With this option, ksck scans the replicas of each tablet, computes a checksum for each replica, and compares the checksums.
- Use --tables to limit the scan to specific tables.
- Use --tablets to limit the scan to specific tablets.
The --tablets option takes tablet IDs, not tablet server addresses.
To retrieve tablet IDs, run:
    kudu cluster ksck --checksum_scan --tables kudu_test_cases,kudu_test_cases2 kudu1.acceldata.dvl:7051,kudu2.acceldata.dvl:7051
    ...
    Checksum Summary
    -----------------------
    kudu_test_cases
    -----------------------
    T 2e93daf36d6444ccb47467030dced13d P 203f859ed1f14e729e9ef223d8af3bdc (kudu3.acceldata.dvl:7050): Checksum: 71467469766368
    T 2e93daf36d6444ccb47467030dced13d P 84e1942cf050461d8c14552c4fbd0431 (kudu1.acceldata.dvl:7050): Checksum: 71467469766368
    T 2e93daf36d6444ccb47467030dced13d P a159de83d21b40efae306c419e04719d (kudu2.acceldata.dvl:7050): Checksum: 71467469766368
    T 31b8c9032cb24f33a3c87bf480c8dc9a P 203f859ed1f14e729e9ef223d8af3bdc (kudu3.acceldata.dvl:7050): Checksum: 71709626373140
    T 31b8c9032cb24f33a3c87bf480c8dc9a P 84e1942cf050461d8c14552c4fbd0431 (kudu1.acceldata.dvl:7050): Checksum: 71709626373140
    T 31b8c9032cb24f33a3c87bf480c8dc9a P a159de83d21b40efae306c419e04719d (kudu2.acceldata.dvl:7050): Checksum: 71709626373140
    T a05f3b7a68e64ed69880ba18827c9089 P 203f859ed1f14e729e9ef223d8af3bdc (kudu3.acceldata.dvl:7050): Checksum: 71767527903577
    T a05f3b7a68e64ed69880ba18827c9089 P 84e1942cf050461d8c14552c4fbd0431 (kudu1.acceldata.dvl:7050): Checksum: 71767527903577
    T a05f3b7a68e64ed69880ba18827c9089 P a159de83d21b40efae306c419e04719d (kudu2.acceldata.dvl:7050): Checksum: 71767527903577
    -----------------------
    kudu_test_cases2
    -----------------------
    T 1bfed8820f044fd186a4ba8a05599c8c P 203f859ed1f14e729e9ef223d8af3bdc (kudu3.acceldata.dvl:7050): Checksum: 71709626373140
    T 1bfed8820f044fd186a4ba8a05599c8c P 84e1942cf050461d8c14552c4fbd0431 (kudu1.acceldata.dvl:7050): Checksum: 71709626373140
    T 1bfed8820f044fd186a4ba8a05599c8c P a159de83d21b40efae306c419e04719d (kudu2.acceldata.dvl:7050): Checksum: 71709626373140
    T 2e7bfb4cb69548a3b8662d1837ca463e P 203f859ed1f14e729e9ef223d8af3bdc (kudu3.acceldata.dvl:7050): Checksum: 71767527903577
    T 2e7bfb4cb69548a3b8662d1837ca463e P 84e1942cf050461d8c14552c4fbd0431 (kudu1.acceldata.dvl:7050): Checksum: 71767527903577
    T 2e7bfb4cb69548a3b8662d1837ca463e P a159de83d21b40efae306c419e04719d (kudu2.acceldata.dvl:7050): Checksum: 71767527903577
    T bda2d804309c48aea0d690de9099c141 P 203f859ed1f14e729e9ef223d8af3bdc (kudu3.acceldata.dvl:7050): Checksum: 71467469766368
    T bda2d804309c48aea0d690de9099c141 P 84e1942cf050461d8c14552c4fbd0431 (kudu1.acceldata.dvl:7050): Checksum: 71467469766368
    T bda2d804309c48aea0d690de9099c141 P a159de83d21b40efae306c419e04719d (kudu2.acceldata.dvl:7050): Checksum: 71467469766368

    # Here we take the id of two of the tablets from kudu_test_cases:
    kudu cluster ksck --checksum_scan --tablets 2e93daf36d6444ccb47467030dced13d,31b8c9032cb24f33a3c87bf480c8dc9a kudu1.acceldata.dvl,kudu2.acceldata.dvl
The first column after the T is the ID of each tablet.
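ksck performs the checksum comparison itself, but saved output can also be post-processed. The sketch below assumes the output has been captured to a file or pipe and that each replica line has the form shown in the Checksum Summary (T, tablet ID, P, replica UUID, address, Checksum:, value). The function names list_tablet_ids and find_checksum_mismatches are hypothetical helpers, not part of Kudu.

```shell
# Sketch: post-process saved "ksck --checksum_scan" output read from stdin.
# Assumed line format (taken from the Checksum Summary on this page):
#   T <tablet-id> P <replica-uuid> (<host>:<port>): Checksum: <value>

# list_tablet_ids (hypothetical helper): print each tablet ID once.
# The tablet ID is the field that follows the leading "T".
list_tablet_ids() {
  awk '$1 == "T" && $3 == "P" { print $2 }' | sort -u
}

# find_checksum_mismatches (hypothetical helper): print the tablets whose
# replicas report different checksums; return 1 if any are found, else 0.
find_checksum_mismatches() {
  awk '
    $1 == "T" && $3 == "P" && $(NF-1) == "Checksum:" {
      tablet = $2
      sum = $NF
      if (!(tablet in first)) {
        first[tablet] = sum      # first checksum seen for this tablet
      } else if (first[tablet] != sum) {
        bad[tablet] = 1          # another replica disagrees
      }
    }
    END {
      n = 0
      for (t in bad) { print "MISMATCH " t; n++ }
      exit (n > 0 ? 1 : 0)
    }
  '
}
```

For example, after redirecting the scan output to a file such as checksum.out (an assumed file name), running list_tablet_ids < checksum.out yields the IDs that can be passed to --tablets, and find_checksum_mismatches < checksum.out prints nothing when every replica of every tablet agrees.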
For more details, see the Apache Kudu documentation.