Corrupted Node Removal

Standard Red5 Pro Stream Manager node monitoring relies on each node reporting to the Stream Manager via RTMP. Optionally, you can also have the Stream Manager monitor each node's HTTP response. This is specifically recommended if you have heavy WebRTC traffic, which relies on HTTP and Tomcat for WebSockets.

NODE CONTROLLER CONFIGURATION SECTION

To enable HTTP node monitoring, modify the following lines under NODE CONTROLLER CONFIGURATION SECTION - MILLISECONDS:

instancecontroller.checkCorruptedNodes=false
instancecontroller.corruptedNodeCheckInterval=300000
instancecontroller.corruptedNodesEndPoint=live
instancecontroller.httptimeout=30000

  • Change instancecontroller.checkCorruptedNodes=false to instancecontroller.checkCorruptedNodes=true
  • The default check interval (instancecontroller.corruptedNodeCheckInterval) is set to 300,000 milliseconds (5 minutes). You can make this more or less aggressive, keeping in mind that the more nodes you have active, the more load a shorter interval will place on your Stream Manager.
  • instancecontroller.corruptedNodesEndPoint is the webapp that the check targets. The default is the live webapp, but if you have a custom webapp or are not using the default live webapp, you can change this to target a different webapp.
  • instancecontroller.httptimeout is set to 30,000 milliseconds (30 seconds) by default. This means that each time a node is checked, it has 30 seconds to respond to the HTTP request. If the Stream Manager doesn't get any response, or gets an error response, the node will be terminated and replaced. As with instancecontroller.corruptedNodeCheckInterval, you can make this more or less aggressive as you wish.
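
The check described above can be approximated from any machine that can reach a node, which may help when tuning the timeout and interval. The following is only a rough sketch, not the Stream Manager's actual implementation: it assumes the probe is a plain HTTP GET against http://<host>:<port>/<endpoint>, and that the node serves HTTP on port 5080. The function names node_is_healthy and probes_per_minute are hypothetical, and the load estimate assumes each active node is probed once per interval.

```python
import http.client
import urllib.request

# Values mirror the defaults listed above.
HTTP_TIMEOUT_MS = 30000      # instancecontroller.httptimeout
CHECK_INTERVAL_MS = 300000   # instancecontroller.corruptedNodeCheckInterval
ENDPOINT = "live"            # instancecontroller.corruptedNodesEndPoint


def node_is_healthy(host, port=5080, endpoint=ENDPOINT, timeout_ms=HTTP_TIMEOUT_MS):
    """Return True if the webapp answers without an error inside the timeout.

    Assumption: a plain GET to http://<host>:<port>/<endpoint> stands in for
    whatever request the Stream Manager really sends.
    """
    url = f"http://{host}:{port}/{endpoint}"
    try:
        with urllib.request.urlopen(url, timeout=timeout_ms / 1000) as response:
            return 200 <= response.status < 400
    except (OSError, http.client.HTTPException):
        # Covers timeouts, refused connections, and 4xx/5xx responses
        # (urllib raises HTTPError, a subclass of OSError, for those).
        return False


def probes_per_minute(node_count, interval_ms=CHECK_INTERVAL_MS):
    """Average probe rate, assuming one probe per active node per interval."""
    return node_count * 60000 / interval_ms
```

With the default interval, 50 active nodes work out to roughly 10 probes per minute under that assumption. Halving the interval doubles the rate.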

Testing

You can test this functionality by connecting to a node over SSH and removing the following directories: {red5pro}/work/red5Engine/0.0.0.0/live and {red5pro}/webapps/live. At the next scheduled check, you should see something like the following in the red5.log file:

2020-02-07 20:51:34,658 [pool-19-thread-7] WARN c.r.s.s.n.t.CorruptedNodeChecker - HOST: Node [id=9, host=77.79.71.160, info=NodeInfo [id=9, nodeId=9, clientCount=0, publisherCount=0, restreamerCount=0, origins=[159.203.71.0, 165.227.127.59], edges=[], connectionCapacity=300, extendedClientCount=0, lastTrafficTime=1581108660089, lastPing=1581108687980], state=INSERVICE, type=EDGE, availabilityZone=nyc3, name=qanode-nyc3-1581027653741] http/websocket service is down. Removing from cluster.
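
When testing against several nodes, it can be easier to pull the removed nodes out of the log than to read each entry by eye. The sketch below is one possible approach. Its regular expression is derived only from the sample entry above, so other releases may format the message differently, and find_removed_nodes is a hypothetical helper name.

```python
import re

# Pattern built from the single sample log entry shown above (an assumption
# about the general format, not a documented log schema).
REMOVED_NODE = re.compile(
    r"CorruptedNodeChecker - HOST: Node \[id=(?P<id>\d+), host=(?P<host>[^,\]]+),"
    r".*?type=(?P<type>\w+),.*?name=(?P<name>[^\]]+)\]"
    r" http/websocket service is down"
)


def find_removed_nodes(lines):
    """Return one dict (id, host, type, name) per matching log line."""
    removed = []
    for line in lines:
        match = REMOVED_NODE.search(line)
        if match:
            removed.append(match.groupdict())
    return removed
```

Passing an open red5.log file object (or any iterable of lines) returns the id, host, type, and name of every node that the checker reported as down.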