Wednesday, 27 June 2012

Check Point HA: How to force a failover (ClusterXL/VRRP)

Hi Everyone,

Based on some recent conversations I've had, it seems most people don't know how to force or test a failover with Check Point HA.

There is a single requirement for systems that aren't running SPLAT or GAIA: FW-1 Monitoring State needs to be enabled. If you're running IPSO, you can do this via the VRRP configuration page.

To force a failover, run the following commands on the current cluster master:

This registers a pnote (problem notification) named 'fail' that is already in problem state; -t 0 means it has no timeout:
cphaprob -d fail -s problem -t 0 register
Verify that it's in problem state with:
cphaprob stat
and
cphaprob -i list
(you should see 'fail' in problem state)
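
If you'd rather script that check than eyeball it, here is a minimal sketch that pulls the state of a named pnote out of `cphaprob -i list` output. It assumes each device is printed as a "Device Name:" line followed later by a "Current state:" line, which is the layout pasted in the comments further down; the helper name pnote_state is my own invention, and the exact output may differ between versions.

```shell
#!/bin/sh
# pnote_state NAME: read `cphaprob -i list`-style text on stdin and print
# the "Current state" of the device called NAME (prints nothing if absent).
# Hypothetical helper; assumes "Device Name:" / "Current state:" lines.
pnote_state() {
    awk -v name="$1" '
        /^[ \t]*Device Name:/ {
            # Remember which device block we are currently inside.
            dev = $0
            sub(/^[ \t]*Device Name:[ \t]*/, "", dev)
            sub(/[ \t]+$/, "", dev)
        }
        /^[ \t]*Current state:/ && dev == name {
            state = $0
            sub(/^[ \t]*Current state:[ \t]*/, "", state)
            sub(/[ \t]+$/, "", state)
            print state
            exit
        }
    '
}

# On a cluster member (not run here):
#   cphaprob -i list | pnote_state fail
```

After the register command above, this would be expected to print "problem" for 'fail'.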

Once you've finished your testing, run these two to reset it:
cphaprob -d fail -s ok report
cphaprob -d fail unregister

Make sure to verify that the pnote has been removed correctly before you log off.
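
For anyone who does this often, the whole procedure can be wrapped in a few shell functions. This is only a sketch: the CPHAPROB and PNOTE variables and the function names are hypothetical conveniences I've added, the cphaprob arguments are exactly the ones from this post, and the removal check assumes the list output contains a "Device Name:" line for each registered pnote.

```shell
#!/bin/sh
# Sketch of the procedure above. CPHAPROB and PNOTE are hypothetical knobs
# added for illustration; the cphaprob arguments are the ones from the post.
CPHAPROB="${CPHAPROB:-cphaprob}"
PNOTE="${PNOTE:-fail}"

# Register a pnote that is already in problem state (run on the active member).
force_failover() {
    "$CPHAPROB" -d "$PNOTE" -s problem -t 0 register
}

# Report the pnote as ok, then unregister it.
reset_failover() {
    "$CPHAPROB" -d "$PNOTE" -s ok report &&
    "$CPHAPROB" -d "$PNOTE" unregister
}

# Succeeds only if the pnote no longer appears in the device list.
# Assumes a "Device Name: <name>" line per registered pnote.
pnote_removed() {
    ! "$CPHAPROB" -i list | grep -q "Device Name: *$PNOTE\$"
}
```

Run force_failover on the active member, check cphaprob stat, do your testing, then run reset_failover followed by pnote_removed before you log off.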

That's it!

14 comments:

  1. There is the clusterXL_admin up/down command, which does all of that automatically.

  2. Hi Max,

    There are a few problems with that.
    1) It doesn't work on anything not running ClusterXL (GAIA/IPSO VRRP etc.)
    2) It simulates an unrealistic failure. Having a failed pnote actually happens, whereas CXL downing itself is extremely rare.

    Since this works on all platforms and is probably the most 'realistic' way of simulating a CP-induced failover, it's what most people standardize on.

    Cheers,

    Replies
    1. Well, I'll agree on the first item.
      Regarding the second, AFAIK clusterXL_admin doesn't bring down CXL; it just registers a new pnote (just as in your example) and puts it into problem state.

  3. Hey, well look at that. You're right :)

    Device Name: admin_down
    Registration number: 4
    Timeout: none
    Current state: problem
    Time since last report: 13.7 sec

    Very cool - thanks Maxim

  4. Max +1, a pnote is a pnote :) If the clustering mechanism monitors pnotes, then it will fail over.

  5. The clusterXL_admin up/down command works fine, but be careful: running it in multi-context mode (VSX) will force all of your active VSs to fail over to the standby node. You can also fail over ClusterXL through the dashboard by changing the priority and pushing the policy to the cluster. As for VRRP, as already discussed, change the VRRP priority through the web UI on Nokia. There's a gotcha: if you forget to enable Firewall monitoring and do a "cpstop" on the active node, you will not have a stateful failover.

  6. Max +2
    Very nice explanation and example :)

  7. There's an sk that goes into it in some more detail:

    http://supportcontent.checkpoint.com/solutions?id=sk55081

  8. Thanks! This was extremely helpful :)

  9. I agree. I'm running VSX R67.10 and was looking for exactly this kind of explanation of VRRP vs ClusterXL failover. Thanks mate!

  10. This comment has been removed by the author.

  11. Why make things so complicated? Anyone who wants to force a failover obviously wants to test that HA works well. The easiest way to do this is to check which gateway is active and shut down its internal/external interface by accessing the gateway at the CLI and typing: set interface eth0 state off. This assumes eth0 is the interface that is monitored for failover. That's it! Check your HA status again and then bring the interface back up by typing: set interface eth0 state on.

    Replies
    1. Because this is how it is done *correctly*. Downing an interface does not necessarily trigger a failure condition on many clusters, nor is it necessarily what you want to be testing when you introduce an issue via pnote.
