In my talk at Percona Live (download the presentation), I spoke about how we can use Percona Kubernetes Operators to deploy our own Database-as-a-Service, based on fully open source components and independent of any particular cloud provider.
Today I want to mention an important tool that I use to test our Operators: ChaosMesh, which is part of the CNCF and recently reached GA with version 1.0.
ChaosMesh is designed to run chaos engineering experiments in Kubernetes deployments, which lets you test how resilient a deployment is against different kinds of failures.
Obviously, this tool is important for database deployments in Kubernetes, and I believe it can also be very useful for testing your application deployments, to understand how the application performs under and handles different failures.
ChaosMesh allows you to emulate:
- Pod failure: kill a pod or make a pod return errors
- Network failure: network partitioning, network delays, network corruption
- IO failure: IO delays and IO errors
- Stress emulation: stress memory and CPU usage
- Kernel failure: return errors on system calls
- Time skew: emulate time drift on pods
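Before running any experiments, ChaosMesh itself needs to be deployed in the cluster. A minimal install sketch using Helm, following the ChaosMesh documentation (the chart repository and namespace are the documented defaults; depending on your container runtime, additional chart values may be needed):

kubectl create ns chaos-testing
helm repo add chaos-mesh https://charts.chaos-mesh.org
helm repo update
helm install chaos-mesh chaos-mesh/chaos-mesh --namespace=chaos-testing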
For our Percona Kubernetes Operators, I found Network Failure especially interesting, as clusters that rely on network communication should provide enough resiliency against network issues.
Let’s review an example of how we can emulate a network failure on one of the pods. Assume we have cluster2 running:
kubectl get pods
NAME                 READY   STATUS    RESTARTS   AGE
cluster2-haproxy-0   2/2     Running   1          12d
cluster2-haproxy-1   2/2     Running   2          12d
cluster2-haproxy-2   2/2     Running   2          12d
cluster2-pxc-0       1/1     Running   0          12d
cluster2-pxc-1       1/1     Running   0          12d
cluster2-pxc-2       1/1     Running   0          12d
And we will isolate cluster2-pxc-1 from the rest of the cluster by using the following chaos experiment:
apiVersion: chaos-mesh.org/v1alpha1
kind: NetworkChaos
metadata:
  name: pxc-network-delay
spec:
  action: partition # the specific chaos action to inject
  mode: one # the mode to run chaos action; supported modes are one/all/fixed/fixed-percent/random-max-percent
  selector: # pods where to inject chaos actions
    pods:
      pxc: # namespace of the target pods
        - cluster2-pxc-1
  direction: to
  target:
    selector:
      pods:
        pxc: # namespace of the target pods
          - cluster2-pxc-0
    mode: one
  duration: "3s"
  scheduler: # scheduler rules for the running time of the chaos experiments about pods.
    cron: "@every 1000s"
---
apiVersion: chaos-mesh.org/v1alpha1
kind: NetworkChaos
metadata:
  name: pxc-network-delay2
spec:
  action: partition # the specific chaos action to inject
  mode: one # the mode to run chaos action; supported modes are one/all/fixed/fixed-percent/random-max-percent
  selector: # pods where to inject chaos actions
    pods:
      pxc: # namespace of the target pods
        - cluster2-pxc-1
  direction: to
  target:
    selector:
      pods:
        pxc: # namespace of the target pods
          - cluster2-pxc-2
    mode: one
  duration: "3s"
  scheduler: # scheduler rules for the running time of the chaos experiments about pods.
    cron: "@every 1000s"
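Assuming both experiments are saved in a single file (the filename here is just an example), we apply them and can confirm that the NetworkChaos objects were created:

kubectl apply -f pxc-network-partition.yaml
kubectl get networkchaos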
This will isolate the pod cluster2-pxc-1 for three seconds. Let's see what happens with the workload that we directed at the cluster2-pxc-0 node (the output is from a sysbench-tpcc benchmark):
1041,56,1232.46,36566.42,16717.16,17383.33,2465.93,90.78,4.99,0.00
1042,56,1305.42,35841.03,16295.74,16934.44,2610.84,71.83,6.01,0.00
1043,56,1084.73,30647.99,14056.49,14422.06,2169.45,68.05,5.99,0.00
1044,56,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00
1045,56,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00
1046,56,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00
1047,56,129.00,4219.97,1926.99,2034.98,258.00,4683.57,0.00,0.00
1048,56,939.41,25800.68,11706.55,12215.31,1878.82,960.30,2.00,0.00
1049,56,1182.09,34390.72,15708.49,16318.05,2364.18,66.84,4.00,0.00
And the log from the cluster2-pxc-1 pod:
2020-11-05T17:36:27.962719Z 0 [Warning] WSREP: Failed to report last committed 133737, -110 (Connection timed out)
2020-11-05T17:36:29.962975Z 0 [Warning] WSREP: Failed to report last committed 133888, -110 (Connection timed out)
2020-11-05T17:36:30.243902Z 0 [Note] WSREP: (11fdd640, 'ssl://0.0.0.0:4567') turning message relay requesting on, nonlive peers: ssl://192.168.66.9:4567 ssl://192.168.71.201:4567
2020-11-05T17:36:31.161485Z 0 [Note] WSREP: SSL handshake successful, remote endpoint ssl://192.168.66.9:34760 local endpoint ssl://192.168.61.137:4567 cipher: ECDHE-RSA-AES256-GCM-SHA384 compression: none
2020-11-05T17:36:31.162325Z 0 [Note] WSREP: (11fdd640, 'ssl://0.0.0.0:4567') connection established to 0008bac8 ssl://192.168.66.9:4567
2020-11-05T17:36:31.162694Z 0 [Note] WSREP: (11fdd640, 'ssl://0.0.0.0:4567') reconnecting to 448e265d (ssl://192.168.71.201:4567), attempt 0
2020-11-05T17:36:31.174019Z 0 [Note] WSREP: SSL handshake successful, remote endpoint ssl://192.168.71.201:4567 local endpoint ssl://192.168.61.137:47252 cipher: ECDHE-RSA-AES256-GCM-SHA384 compression: none
2020-11-05T17:36:31.176521Z 0 [Note] WSREP: SSL handshake successful, remote endpoint ssl://192.168.71.201:56892 local endpoint ssl://192.168.61.137:4567 cipher: ECDHE-RSA-AES256-GCM-SHA384 compression: none
2020-11-05T17:36:31.177086Z 0 [Note] WSREP: (11fdd640, 'ssl://0.0.0.0:4567') connection established to 448e265d ssl://192.168.71.201:4567
2020-11-05T17:36:31.177289Z 0 [Note] WSREP: (11fdd640, 'ssl://0.0.0.0:4567') connection established to 448e265d ssl://192.168.71.201:4567
2020-11-05T17:36:34.244970Z 0 [Note] WSREP: (11fdd640, 'ssl://0.0.0.0:4567') turning message relay requesting off
We can see that the node lost communication for three seconds and then recovered.
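One thing to keep in mind: because of the scheduler section in the manifests, the partition will be re-injected every 1000 seconds. Once the experiments are no longer needed, the chaos objects can be deleted by the names we gave them above:

kubectl delete networkchaos pxc-network-delay pxc-network-delay2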
There is a variable, evs.suspect_timeout, with a default of five seconds, which defines how long the nodes will wait before forming a new quorum without the affected node. So let's see what will happen if we isolate cluster2-pxc-1 for nine seconds.
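To do that, we only need to change the duration field in both experiments above (a sketch of the changed fragment):

  duration: "9s" # was "3s" in the experiments above

With the nine-second partition applied, the workload output on cluster2-pxc-0 looks like this: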
369,56,1326.66,38898.39,17789.62,18462.43,2646.33,77.19,5.99,0.00
370,56,1341.82,38812.61,17741.30,18382.65,2688.65,74.46,5.01,0.00
371,56,364.33,11058.76,5070.72,5256.38,731.66,68.05,0.00,0.00
372,56,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00
373,56,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00
374,56,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00
375,56,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00
376,56,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00
377,56,613.56,17233.62,7862.12,8139.38,1232.12,6360.91,5.00,0.00
378,56,1474.66,43070.96,19684.16,20439.47,2947.33,75.82,4.00,0.00
The workload stalled for five seconds but continued after that, and we can see from the log what happened with node cluster2-pxc-1. The log is quite verbose, but to summarize what happened:
- After five seconds, the node declared that it lost connection to the other nodes
- It figured out it was in the minority and could not form a quorum, so it declared itself NON-PRIMARY
- After the network was restored, the node reconnected with the cluster
- The node caught up with the other nodes using the IST (Incremental State Transfer) method
- The cluster became a three-node cluster again
2020-11-05T17:39:18.282832Z 0 [Warning] WSREP: Failed to report last committed 334386, -110 (Connection timed out)
2020-11-05T17:39:19.283066Z 0 [Warning] WSREP: Failed to report last committed 334516, -110 (Connection timed out)
2020-11-05T17:39:20.768879Z 0 [Note] WSREP: (11fdd640, 'ssl://0.0.0.0:4567') turning message relay requesting on, nonlive peers: ssl://192.168.66.9:4567 ssl://192.168.71.201:4567
2020-11-05T17:39:21.769154Z 0 [Note] WSREP: (11fdd640, 'ssl://0.0.0.0:4567') reconnecting to 0008bac8 (ssl://192.168.66.9:4567), attempt 0
2020-11-05T17:39:21.769544Z 0 [Note] WSREP: (11fdd640, 'ssl://0.0.0.0:4567') reconnecting to 448e265d (ssl://192.168.71.201:4567), attempt 0
2020-11-05T17:39:24.769604Z 0 [Note] WSREP: (11fdd640, 'ssl://0.0.0.0:4567') connection to peer 00000000 with addr ssl://192.168.66.9:4567 timed out, no messages seen in PT3S (gmcast.peer_timeout), socket stats: rtt: 0 rttvar: 250000 rto: 2000000 lost: 1 last_data_recv: 2949502432 cwnd: 1 last_queued_since: 2949803921272502 last_delivered_since: 2949803921272502 send_queue_length: 0 send_queue_bytes: 0
2020-11-05T17:39:25.269672Z 0 [Note] WSREP: (11fdd640, 'ssl://0.0.0.0:4567') connection to peer 00000000 with addr ssl://192.168.71.201:4567 timed out, no messages seen in PT3S (gmcast.peer_timeout), socket stats: rtt: 0 rttvar: 250000 rto: 4000000 lost: 1 last_data_recv: 2949502932 cwnd: 1 last_queued_since: 2949804421325209 last_delivered_since: 2949804421325209 send_queue_length: 0 send_queue_bytes: 0
2020-11-05T17:39:25.879338Z 0 [Note] WSREP: declaring node with index 0 suspected, timeout PT5S (evs.suspect_timeout)
2020-11-05T17:39:25.879373Z 0 [Note] WSREP: declaring node with index 2 suspected, timeout PT5S (evs.suspect_timeout)
2020-11-05T17:39:25.879399Z 0 [Note] WSREP: evs::proto(11fdd640, OPERATIONAL, view_id(REG,0008bac8,3)) suspecting node: 0008bac8
2020-11-05T17:39:25.879414Z 0 [Note] WSREP: evs::proto(11fdd640, OPERATIONAL, view_id(REG,0008bac8,3)) suspected node without join message, declaring inactive
2020-11-05T17:39:25.879431Z 0 [Note] WSREP: evs::proto(11fdd640, OPERATIONAL, view_id(REG,0008bac8,3)) suspecting node: 448e265d
2020-11-05T17:39:25.879445Z 0 [Note] WSREP: evs::proto(11fdd640, OPERATIONAL, view_id(REG,0008bac8,3)) suspected node without join message, declaring inactive
2020-11-05T17:39:26.379920Z 0 [Note] WSREP: declaring node with index 0 inactive (evs.inactive_timeout)
2020-11-05T17:39:26.379956Z 0 [Note] WSREP: declaring node with index 2 inactive (evs.inactive_timeout)
2020-11-05T17:39:26.791118Z 0 [Note] WSREP: SSL handshake successful, remote endpoint ssl://192.168.66.9:4567 local endpoint ssl://192.168.61.137:51672 cipher: ECDHE-RSA-AES256-GCM-SHA384 compression: none
2020-11-05T17:39:26.791958Z 0 [Note] WSREP: (11fdd640, 'ssl://0.0.0.0:4567') connection established to 0008bac8 ssl://192.168.66.9:4567
2020-11-05T17:39:26.879766Z 0 [Note] WSREP: Current view of cluster as seen by this node
view (view_id(NON_PRIM,0008bac8,3)
memb {
    11fdd640,0
    }
joined {
    }
left {
    }
partitioned {
    0008bac8,0
    448e265d,0
    }
)
2020-11-05T17:39:26.879962Z 0 [Note] WSREP: New COMPONENT: primary = no, bootstrap = no, my_idx = 0, memb_num = 1
2020-11-05T17:39:26.879975Z 0 [Note] WSREP: Current view of cluster as seen by this node
view (view_id(NON_PRIM,11fdd640,4)
memb {
    11fdd640,0
    }
joined {
    }
left {
    }
partitioned {
    0008bac8,0
    448e265d,0
    }
)
2020-11-05T17:39:26.880029Z 0 [Note] WSREP: Flow-control interval: [100, 100]
2020-11-05T17:39:26.880066Z 0 [Note] WSREP: Received NON-PRIMARY.
2020-11-05T17:39:26.880076Z 0 [Note] WSREP: Shifting SYNCED -> OPEN (TO: 334573)
2020-11-05T17:39:26.880095Z 0 [Warning] WSREP: FLOW message from member 139968689209344 in non-primary configuration. Ignored.
2020-11-05T17:39:26.880121Z 0 [Note] WSREP: New COMPONENT: primary = no, bootstrap = no, my_idx = 0, memb_num = 1
2020-11-05T17:39:26.880134Z 0 [Note] WSREP: Flow-control interval: [100, 100]
2020-11-05T17:39:26.880140Z 0 [Note] WSREP: Received NON-PRIMARY.
2020-11-05T17:39:26.880255Z 2 [Note] WSREP: New cluster view: global state: f2d3cb29-1578-11eb-857b-624f681f446d:334573, view# -1: non-Primary, number of nodes: 1, my index: 0, protocol version 3
2020-11-05T17:39:26.880287Z 2 [Note] WSREP: Setting wsrep_ready to false
2020-11-05T17:39:26.880310Z 2 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
2020-11-05T17:39:26.880428Z 2 [Note] WSREP: New cluster view: global state: f2d3cb29-1578-11eb-857b-624f681f446d:334573, view# -1: non-Primary, number of nodes: 1, my index: 0, protocol version 3
2020-11-05T17:39:26.880438Z 2 [Note] WSREP: Setting wsrep_ready to false
2020-11-05T17:39:26.880445Z 2 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
2020-11-05T17:39:27.193945Z 0 [Note] WSREP: SSL handshake successful, remote endpoint ssl://192.168.71.201:57892 local endpoint ssl://192.168.61.137:4567 cipher: ECDHE-RSA-AES256-GCM-SHA384 compression: none
2020-11-05T17:39:27.194926Z 0 [Note] WSREP: (11fdd640, 'ssl://0.0.0.0:4567') connection established to 448e265d ssl://192.168.71.201:4567
2020-11-05T17:39:27.305150Z 0 [Note] WSREP: SSL handshake successful, remote endpoint ssl://192.168.71.201:4567 local endpoint ssl://192.168.61.137:48990 cipher: ECDHE-RSA-AES256-GCM-SHA384 compression: none
2020-11-05T17:39:27.306328Z 0 [Note] WSREP: (11fdd640, 'ssl://0.0.0.0:4567') connection established to 448e265d ssl://192.168.71.201:4567
2020-11-05T17:39:27.882743Z 0 [Note] WSREP: declaring 0008bac8 at ssl://192.168.66.9:4567 stable
2020-11-05T17:39:27.882774Z 0 [Note] WSREP: declaring 448e265d at ssl://192.168.71.201:4567 stable
2020-11-05T17:39:27.883565Z 0 [Note] WSREP: Node 0008bac8 state primary
2020-11-05T17:39:27.884475Z 0 [Note] WSREP: Current view of cluster as seen by this node
view (view_id(PRIM,0008bac8,5)
memb {
    0008bac8,0
    11fdd640,0
    448e265d,0
    }
joined {
    }
left {
    }
partitioned {
    }
)
2020-11-05T17:39:27.884499Z 0 [Note] WSREP: Save the discovered primary-component to disk
2020-11-05T17:39:27.885430Z 0 [Note] WSREP: New COMPONENT: primary = yes, bootstrap = no, my_idx = 1, memb_num = 3
2020-11-05T17:39:27.885465Z 0 [Note] WSREP: STATE EXCHANGE: Waiting for state UUID.
2020-11-05T17:39:27.886654Z 0 [Note] WSREP: STATE EXCHANGE: sent state msg: da55f2d8-1f8d-11eb-80cf-075e56823087
2020-11-05T17:39:27.887174Z 0 [Note] WSREP: STATE EXCHANGE: got state msg: da55f2d8-1f8d-11eb-80cf-075e56823087 from 0 (cluster2-pxc-0)
2020-11-05T17:39:27.887194Z 0 [Note] WSREP: STATE EXCHANGE: got state msg: da55f2d8-1f8d-11eb-80cf-075e56823087 from 1 (cluster2-pxc-1)
2020-11-05T17:39:27.887208Z 0 [Note] WSREP: STATE EXCHANGE: got state msg: da55f2d8-1f8d-11eb-80cf-075e56823087 from 2 (cluster2-pxc-2)
2020-11-05T17:39:27.887225Z 0 [Note] WSREP: Quorum results:
    version    = 6,
    component  = PRIMARY,
    conf_id    = 4,
    members    = 2/3 (primary/total),
    act_id     = 338632,
    last_appl. = 334327,
    protocols  = 0/9/3 (gcs/repl/appl),
    group UUID = f2d3cb29-1578-11eb-857b-624f681f446d
2020-11-05T17:39:27.887244Z 0 [Note] WSREP: Flow-control interval: [173, 173]
2020-11-05T17:39:27.887252Z 0 [Note] WSREP: Shifting OPEN -> PRIMARY (TO: 338632)
2020-11-05T17:39:27.887439Z 2 [Note] WSREP: State transfer required:
    Group state: f2d3cb29-1578-11eb-857b-624f681f446d:338632
    Local state: f2d3cb29-1578-11eb-857b-624f681f446d:334573
2020-11-05T17:39:27.887476Z 2 [Note] WSREP: REPL Protocols: 9 (4, 2)
2020-11-05T17:39:27.887486Z 2 [Note] WSREP: REPL Protocols: 9 (4, 2)
2020-11-05T17:39:27.887504Z 2 [Note] WSREP: New cluster view: global state: f2d3cb29-1578-11eb-857b-624f681f446d:338632, view# 5: Primary, number of nodes: 3, my index: 1, protocol version 3
2020-11-05T17:39:27.887516Z 2 [Note] WSREP: Setting wsrep_ready to true
2020-11-05T17:39:27.887524Z 2 [Warning] WSREP: Gap in state sequence. Need state transfer.
2020-11-05T17:39:27.887530Z 2 [Note] WSREP: Setting wsrep_ready to false
2020-11-05T17:39:27.887540Z 2 [Note] WSREP: You have configured 'xtrabackup-v2' state snapshot transfer method which cannot be performed on a running server. Wsrep provider won't be able to fall back to it if other means of state transfer are unavailable. In that case you will need to restart the server.
2020-11-05T17:39:27.887556Z 2 [Note] WSREP: Auto Increment Offset/Increment re-align with cluster membership change (Offset: 2 -> 2) (Increment: 3 -> 3)
2020-11-05T17:39:27.887563Z 2 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
2020-11-05T17:39:27.907686Z 2 [Note] WSREP: Assign initial position for certification: 338632, protocol version: 4
2020-11-05T17:39:27.908853Z 0 [Note] WSREP: Service thread queue flushed.
2020-11-05T17:39:27.909023Z 2 [Note] WSREP: Check if state gap can be serviced using IST
2020-11-05T17:39:27.909165Z 2 [Note] WSREP: IST receiver addr using ssl://192.168.61.137:4568
2020-11-05T17:39:27.909236Z 2 [Note] WSREP: IST receiver using ssl
2020-11-05T17:39:27.910176Z 2 [Note] WSREP: Prepared IST receiver, listening at: ssl://192.168.61.137:4568
2020-11-05T17:39:27.910195Z 2 [Note] WSREP: State gap can be likely serviced using IST. SST request though present would be void.
2020-11-05T17:39:27.922651Z 0 [Note] WSREP: Member 1.0 (cluster2-pxc-1) requested state transfer from '*any*'. Selected 0.0 (cluster2-pxc-0)(SYNCED) as donor.
2020-11-05T17:39:27.922679Z 0 [Note] WSREP: Shifting PRIMARY -> JOINER (TO: 338687)
2020-11-05T17:39:27.922744Z 2 [Note] WSREP: Requesting state transfer: success, donor: 0
2020-11-05T17:39:27.922791Z 2 [Note] WSREP: GCache history reset: f2d3cb29-1578-11eb-857b-624f681f446d:334573 -> f2d3cb29-1578-11eb-857b-624f681f446d:338632
2020-11-05T17:39:27.956992Z 2 [Note] WSREP: GCache DEBUG: RingBuffer::seqno_reset(): discarded 133734664 bytes
2020-11-05T17:39:27.957016Z 2 [Note] WSREP: GCache DEBUG: RingBuffer::seqno_reset(): found 1/56 locked buffers
2020-11-05T17:39:27.958791Z 2 [Note] WSREP: Receiving IST: 4059 writesets, seqnos 334573-338632
2020-11-05T17:39:27.958908Z 0 [Note] WSREP: 0.0 (cluster2-pxc-0): State transfer to 1.0 (cluster2-pxc-1) complete.
2020-11-05T17:39:27.958929Z 0 [Note] WSREP: Member 0.0 (cluster2-pxc-0) synced with group.
2020-11-05T17:39:27.958946Z 0 [Note] WSREP: Receiving IST... 0.0% ( 0/4059 events) complete.
2020-11-05T17:39:30.770542Z 0 [Note] WSREP: (11fdd640, 'ssl://0.0.0.0:4567') turning message relay requesting off
2020-11-05T17:39:31.851914Z 0 [Note] WSREP: Receiving IST...100.0% (4059/4059 events) complete.
2020-11-05T17:39:31.853178Z 2 [Note] WSREP: IST received: f2d3cb29-1578-11eb-857b-624f681f446d:338632
2020-11-05T17:39:31.854358Z 0 [Note] WSREP: 1.0 (cluster2-pxc-1): State transfer from 0.0 (cluster2-pxc-0) complete.
2020-11-05T17:39:31.854396Z 0 [Note] WSREP: SST leaving flow control
2020-11-05T17:39:31.854406Z 0 [Note] WSREP: Shifting JOINER -> JOINED (TO: 344195)
2020-11-05T17:40:17.927370Z 0 [Warning] WSREP: Trying to continue unpaused monitor
2020-11-05T17:40:26.972878Z 0 [Note] WSREP: Member 1.0 (cluster2-pxc-1) synced with group.
2020-11-05T17:40:26.972913Z 0 [Note] WSREP: Shifting JOINED -> SYNCED (TO: 407602)
2020-11-05T17:40:27.062892Z 4 [Note] WSREP: Synchronized with group, ready for connections
2020-11-05T17:40:27.062911Z 4 [Note] WSREP: Setting wsrep_ready to true
2020-11-05T17:40:27.062922Z 4 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
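Once the node reports SYNCED, one quick way to confirm that all three nodes are back in the cluster is to check the wsrep status from inside the pod. A sketch, assuming the Operator stores the root credentials in a secret named cluster2-secrets (adjust to your cluster's secret name):

ROOT_PASS=$(kubectl get secret cluster2-secrets -o jsonpath='{.data.root}' | base64 --decode)
kubectl exec -it cluster2-pxc-1 -- mysql -uroot -p"$ROOT_PASS" -e "SHOW GLOBAL STATUS LIKE 'wsrep_cluster_size'"

The expected value is 3, matching the three pxc pods.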
Conclusion
ChaosMesh is a great tool to test the resiliency of a deployment, and in my opinion, it can be useful not only for database clusters but also for testing general applications, to make sure they are able to sustain different failure scenarios.