Exercise #6: Creating a callback

In this exercise you create a GPFS callback that is triggered whenever GPFS is shut down or fails on any node in the cluster. The callback script appends an entry to a log file each time the GPFS daemon goes down on a node, recording some data about the event.

Objectives:

  • Create a callback that notifies when a node goes down
  • Test the callback using a simple log script

Requirements:

  1. Complete Exercise 1: Installing the cluster
  2. At least 2 nodes in the cluster

Step 1: Creating the callback script

  1. Create a script directory. By default, callbacks run on all nodes in the cluster, so create a local directory on each node. Do not store the callback script in a GPFS file system, so that it remains available even when GPFS is down.
    mkdir /callback
  2. Determine the information required to create the callback, starting with the event that triggers it. Use nodeLeave, because that event is triggered whenever a GPFS node leaves the cluster for any reason.
  3. Event data: GPFS provides state data to the script when the event occurs. In this case, collect the name of the node that left (%eventNode) and the list of quorum nodes (%quorumNodes), so you can see whether the failed node was a quorum node.
    You can pass as many parameters as you need, but these two are sufficient for this example.
  4. The parameter values are passed to the script, separated by white space, in the order they are listed in the mmaddcallback command. For example, if node1 fails and node1 and node2 are the quorum nodes, the script is called as follows:
    /callback/nodedown.ksh node1 node1,node2
  5. Create the script /callback/nodedown.ksh. This simple Korn shell script appends each event to the log file /callback/nodedown.log:
    #!/bin/ksh
    echo "Logging a node leave event at: `date` " >> /callback/nodedown.log
    echo "  The event occurred on node: " $1 >> /callback/nodedown.log
    echo "  The quorum nodes are: " $2 >> /callback/nodedown.log
  6. Copy the script to the same path on every other node in the cluster, for example:
    scp /callback/nodedown.ksh node2:/callback/nodedown.ksh
  7. Make sure the script is executable on all of the nodes.
    chmod +x /callback/nodedown.ksh
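
The script above records the quorum list but leaves it to you to spot whether the failed node was a quorum node. The following is a minimal sketch of an extended nodedown script that makes that check itself. It is written in portable POSIX sh, which ksh also accepts. It assumes, as in the sample log in Step 3, that GPFS passes %quorumNodes as a single comma-separated word and reports %eventNode in the same name format as the quorum list. The NODEDOWN_LOG override and the function names are hypothetical additions for illustration.

```shell
#!/bin/sh
# Sketch: an extended /callback/nodedown.ksh (POSIX sh, also valid ksh).
# NODEDOWN_LOG is a hypothetical override so the script can be tried elsewhere.
LOG="${NODEDOWN_LOG:-/callback/nodedown.log}"

# is_quorum_node NODE LIST
# Succeeds if NODE is one of the entries in the comma-separated LIST.
# Wrapping both sides in commas avoids false matches such as node1 vs node10.
is_quorum_node() {
    case ",$2," in
        *",$1,"*) return 0 ;;
        *) return 1 ;;
    esac
}

# log_node_leave EVENTNODE QUORUMNODES
# Appends one entry to $LOG and flags the event if a quorum node left.
log_node_leave() {
    {
        echo "Logging a node leave event at: $(date)"
        echo "  The event occurred on node: $1"
        echo "  The quorum nodes are: $2"
        if is_quorum_node "$1" "$2"; then
            echo "  WARNING: $1 is a quorum node"
        fi
    } >> "$LOG"
}

# Assumed from the mmaddcallback definition: $1 is %eventNode, $2 is %quorumNodes.
if [ "$#" -ge 2 ]; then
    log_node_leave "$1" "$2"
fi
```

Because it takes the same two arguments, this version could replace /callback/nodedown.ksh without changing the callback definition; calling it as /callback/nodedown.ksh node1 node1,node2 would add a WARNING line to the entry.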

Step 2: Creating the callback

  1. Using the information gathered in Step 1, create the callback:
    mmaddcallback NodeDownCallback --command /callback/nodedown.ksh --event nodeLeave \
       --parms %eventNode --parms %quorumNodes
  2. Use the mmlscallback command to view the callback definition.
    # mmlscallback
    NodeDownCallback
            command       = /callback/nodedown.ksh
            event         = nodeLeave
            parms         = %eventNode %quorumNodes

Step 3: Test the callback

  1. Now you are ready to test the callback. Shut down GPFS on one of the nodes:
    mmshutdown -N node2
  2. To verify that the event was generated, look at the log file on one of the nodes where GPFS is still running:
    cat /callback/nodedown.log

    It should look similar to the following, with the node you shut down reported as the event node:

    # cat /callback/nodedown.log
    Logging a node leave event at: Thu Aug  5 16:45:21 PDT 2010
      The event occurred on node:  node1.test.net
      The quorum nodes are:  node2.test.net,node3.test.net
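
Because the callback runs on every node by default, each node that kept running GPFS appends to its own local /callback/nodedown.log, and the file grows with every test. As a minimal sketch, the hypothetical helper below counts the logged events per node; it relies only on the "The event occurred on node:" line written by the script in Step 1.

```shell
#!/bin/sh
# Sketch: count nodeLeave entries per node in a log written by nodedown.ksh.
# summarize_node_leaves is a hypothetical helper; the default path matches Step 1.
summarize_node_leaves() {
    awk -F': ' '
        /The event occurred on node:/ {
            node = $2
            gsub(/^ +| +$/, "", node)   # trim the padding echo leaves around the name
            count[node]++
        }
        END { for (n in count) print n, count[n] }
    ' "${1:-/callback/nodedown.log}" | sort
}
```

After defining the function in a shell on any node, run summarize_node_leaves /callback/nodedown.log to get one "node count" line per node that has left the cluster.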

You can now modify the script to fit your environment; for example, it could send an email when a node fails. Keep two implementation details in mind when writing callback scripts:

  • By default the callback runs on every node in the cluster, so your script must account for that: you probably don't want 20 emails from 20 nodes for each event. You can limit which nodes run the callback with the -N option of the mmaddcallback command.
  • Some events are triggered repeatedly. For example, lowDiskSpace is triggered every two minutes until utilization falls below the policy threshold.
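
The -N option addresses the first point; the second can be handled in the script itself. The following is a minimal sketch of a throttle: a hypothetical should_alert function records when it last let an alert through for a given key and suppresses further alerts for that key within a time window. STATE_DIR, WINDOW_SECONDS, and the key names are illustrative assumptions; the state directory should be node-local, like /callback, and date +%s must print seconds since the epoch (GNU date on Linux does).

```shell
#!/bin/sh
# Sketch: suppress repeated alerts for the same key within a time window.
# Assumptions: STATE_DIR is a node-local directory (not in GPFS) and
# `date +%s` prints seconds since the epoch (true of GNU date).
STATE_DIR="${STATE_DIR:-/callback/state}"
WINDOW_SECONDS="${WINDOW_SECONDS:-3600}"

# should_alert KEY
# Succeeds, and records the current time, if no alert for KEY was let through
# in the last WINDOW_SECONDS seconds; otherwise fails. KEY must be usable as a
# file name, for example "lowDiskSpace-fs1".
should_alert() {
    stamp="$STATE_DIR/$1.last"
    now=$(date +%s)
    # If the state directory cannot be created, alert rather than stay silent.
    mkdir -p "$STATE_DIR" 2>/dev/null || return 0
    last=$(cat "$stamp" 2>/dev/null)
    case "$last" in
        ''|*[!0-9]*) last=0 ;;   # missing or corrupt stamp: treat as never alerted
    esac
    if [ $((now - last)) -lt "$WINDOW_SECONDS" ]; then
        return 1
    fi
    echo "$now" > "$stamp"
    return 0
}
```

A callback script would call it just before notifying, for example by sending the email only if should_alert "nodeLeave-$1" succeeds. Including the node name in the key means a different failed node still produces its own alert.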