Wednesday, 27 June 2012

Check Point HA: How to force a failover (ClusterXL/VRRP)

Hi Everyone,

Based on some recent conversations I've had, it seems most people don't know how to force or test a failover with Check Point HA.

There is a single requirement for non-SPLAT/GAIA systems: FW-1 Monitoring State needs to be enabled. If you're running IPSO, you can enable it via the VRRP configuration page.

To force a failover, run the following commands on the current cluster master:

This creates a pnote (problem notification) that is in problem state:
cphaprob -d fail -s problem -t 0 register
Verify it's in problem state with
cphaprob stat
and
cphaprob -i list
(you should see 'fail' in problem state)
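If you'd rather not eyeball the list output, here is a minimal sketch of a helper that pulls out the state of one pnote. `pnote_state` is a made-up name, not a Check Point tool, and it assumes the output uses the "Device Name:" / "Current state:" layout; check that against your own version before relying on it.

```shell
#!/bin/sh
# pnote_state NAME (hypothetical helper, not part of Check Point's tooling)
# Reads `cphaprob -i list`-style text on stdin and prints the "Current state"
# of the device called NAME. Prints nothing if the device is not listed.
# Assumption: entries use "Device Name: ..." and "Current state: ..." lines.
pnote_state() {
    awk -v want="$1" '
        /^[ \t]*Device Name:/ {
            # Remember the most recent device name, trimmed of whitespace.
            name = $0
            sub(/^[ \t]*Device Name:[ \t]*/, "", name)
            sub(/[ \t]+$/, "", name)
        }
        /^[ \t]*Current state:/ && name == want {
            # First state line after the wanted device: print it and stop.
            state = $0
            sub(/^[ \t]*Current state:[ \t]*/, "", state)
            sub(/[ \t]+$/, "", state)
            print state
            exit
        }
    '
}
```

On a cluster member you would pipe the real output into it, e.g. `cphaprob -i list | pnote_state fail`, and expect to see `problem` while the test pnote is registered.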

Once you've finished your testing, run these two to reset it:
cphaprob -d fail -s ok report
cphaprob -d fail unregister

Make sure to verify that the pnote has been removed correctly before you log off.
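If you do this often, the steps above can be wrapped in a couple of shell functions. This is only a sketch: the `cphaprob` invocations are the ones from this post, but `run`, `force_failover`, `reset_failover`, `PNOTE` and `DRY_RUN` are names I've made up for illustration. It defaults to a dry run that just prints the commands, so nothing happens on the cluster unless you set `DRY_RUN=0`.

```shell
#!/bin/sh
# Hypothetical wrapper around the commands from the post.
# None of these function or variable names come from Check Point.
PNOTE="${PNOTE:-fail}"     # pnote device name used in the post
DRY_RUN="${DRY_RUN:-1}"    # 1 = only print the commands, 0 = actually run them

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

force_failover() {
    # Register a pnote that is already in problem state.
    run cphaprob -d "$PNOTE" -s problem -t 0 register || return 1
    # Show the cluster state and the pnote list for verification.
    run cphaprob stat
    run cphaprob -i list
}

reset_failover() {
    # Report the pnote as healthy, remove it, then list pnotes to confirm.
    run cphaprob -d "$PNOTE" -s ok report
    run cphaprob -d "$PNOTE" unregister
    run cphaprob -i list
}
```

Source the file on the active member, call `force_failover`, do your testing, then call `reset_failover` and check that 'fail' no longer appears in the list.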

That's it!

11 comments:

  1. There is clusterXL_admin up/down command, that makes all that automatically.

  2. Hi Max,

    There are a few problems with that.
    1) It doesn't work on anything not running ClusterXL (GAIA/IPSO VRRP etc.)
    2) It simulates an unrealistic failure. Having a failed pnote actually happens, whereas CXL downing itself is extremely rare.

    Since this works on all platforms and is probably the most 'realistic' way of simulating a CP-induced failover, it's what most people standardize on.

    Cheers,

    Replies
    1. Well, I'll agree on the first item.
      Regarding the second, AFAIK clusterXL_admin doesn't bring down CXL; it just registers a new pnote (just as in your example) and puts it into problem state.

  3. Hey, well look at that. You're right :)

    Device Name: admin_down
    Registration number: 4
    Timeout: none
    Current state: problem
    Time since last report: 13.7 sec

    Very cool - thanks Maxim

  4. Max +1, a pnote is a pnote :) If the clustering mechanism monitors pnotes, then it will fail over.

  5. The clusterXL_admin up/down command works fine, but be careful: doing this in multi-context mode (VSX) will force all of your active VSs to fail over to the standby node. There is also a way to fail over ClusterXL through the dashboard by changing the priority and pushing the policy to the cluster. As for VRRP, as already discussed, change the VRRP priority through the web UI on Nokia. There's a gotcha: if you forget to enable firewall monitoring and do a "cpstop" on the active node, you will not have stateful failover.

  6. Max +2
    Very nice explanation and example :)

  7. There's an sk that goes into it in some more detail:

    http://supportcontent.checkpoint.com/solutions?id=sk55081

  8. Thanks! This was extremely helpful :)

  9. I agree. I'm running VSX R67.10 and was looking for exactly this kind of explanation of VRRP vs. ClusterXL failover. Thanks mate!
