
Automatic failover is here!

July 18, 2017 / 0 comments / in General  / by Peter Franklin

We’ve been hard at work here to provide you with the redundancy features that you’ve been wanting, and we’re happy to announce that in the new 8.5.1 release we have added support for automatic failover. You no longer need to manually initiate a failover when a catastrophe occurs. If this sounds interesting to you, read on!

Before we dive into the details, you’ll need to understand some of the basics of CygNet redundancy. If you aren’t familiar with it, you might want to first check out these past blog posts:

A look at redundancy for the 8.5 release

CygNet Redundancy Webinar Summary

How automatic failover works

Let’s get some terminology out of the way first. We use the term “active domain” to indicate the domain containing the services that clients are connected to, and “standby domain” to indicate the domain containing the services that are being replicated from the active domain, waiting to replace the active domain in the event of a failure.

Automatic failover is initiated when the Remote Service Manager (RSM) in a standby domain detects that the RSM in the active domain is stopped or no longer responding. When failing over, the standby RSM restarts its services as the active domain, becoming the active RSM itself and taking over responsibility for managing the services in the active domain.

One important concept to understand is that automatic failover is all about RSMs. It doesn’t monitor the other individual services running on the active domain. That is the job of the RSM on the active domain itself: the Automatic Service Recovery (ASR) feature exists to recover the individual services that an RSM monitors locally. So, the active RSM monitors the active services via ASR, and the standby RSM monitors the active RSM via automatic failover. I’ll have more to say about configuring ASR later on in this post.

Configuring automatic failover

The configuration for automatic failover is conveniently located in the redundancy editor along with all of your other redundancy configuration. Let’s take a look.

In the RSM view in CygNet Explorer, right-click and select “Configure RSM Redundancy”. In this dialog you’ll notice a new tab called “Auto-failover”. This is where you’ll define the conditions under which you want a failover to start.

[Screenshot: the Auto-failover tab]

The important items to note here are the Test mode, Delay, and Period options.

  • Test mode – indicates that this trigger will not actually initiate a failover, but only log that it was triggered. This is for testing and ensuring that your system is set up correctly without actually causing a failover to occur. Once everything seems okay, you can deselect this and go “live” with it.
  • Delay – How long do you want to wait after detecting a problem until starting a failover? Maybe the network just had a hiccup and is going to recover in a few seconds or a minute, and in that case you don’t want to trigger the failover. Specify how long you are willing to wait before deciding to start the failover. If the problem is resolved within the specified delay, then no failover will be initiated.
  • Period – How often do you want the standby RSM to ask the active RSM if everything is okay? The default value of 0.5 seconds should work fine for most customers.
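To make the interplay between Delay, Period, and Test mode concrete, here is a minimal Python sketch of the behavior described above. It is not CygNet code: the `FailoverTrigger` class and `evaluate` function are hypothetical names invented for illustration, and the sketch assumes polls arrive exactly one period apart.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass
class FailoverTrigger:
    """Hypothetical stand-in for a failover trigger; not a CygNet type."""
    delay_s: float           # how long the problem must persist before acting
    period_s: float = 0.5    # how often the standby RSM polls the active RSM
    test_mode: bool = True   # log only, never actually fail over


def evaluate(trigger: FailoverTrigger,
             poll_results: Iterable[bool]) -> List[Tuple[float, str]]:
    """Replay poll results (True means the active RSM answered).

    Returns at most one (time, event) pair. Assumes polls happen exactly
    one period apart, which a real system would not guarantee.
    """
    events: List[Tuple[float, str]] = []
    down_since = None
    for i, ok in enumerate(poll_results):
        now = i * trigger.period_s
        if ok:
            down_since = None  # problem cleared within the delay: start over
            continue
        if down_since is None:
            down_since = now   # first failed poll of this outage
        if now - down_since >= trigger.delay_s:
            event = "logged only" if trigger.test_mode else "failover"
            events.append((now, event))
            break
    return events
```

With a 2-second delay, a one-second hiccup such as `[True, False, False, True]` produces no event at all, while an outage that lasts the full delay produces a single "failover" event (or "logged only" while Test mode is still selected).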

Once you’ve configured your trigger, you can then assign it to a standby RSM on the Domain tab by selecting the desired domain, then choosing the trigger from the “Failover trigger” drop-down.

[Screenshot: the Domain tab]

As you can see here, Trigger1 is assigned to the local standby domain 27002. You can only assign failover triggers to standby domains. The editor won’t allow you to assign a trigger to an active domain, because only standby domains monitor other domains.
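The standby-only rule can be pictured as a simple validation step. The following Python sketch is purely illustrative: `assign_trigger` and the role dictionary are hypothetical, and the active domain number 27001 is made up for the example (only 27002 and Trigger1 come from the screenshot).

```python
from typing import Dict


def assign_trigger(roles: Dict[int, str],
                   assignments: Dict[int, str],
                   domain: int,
                   trigger: str) -> None:
    """Assign a failover trigger, rejecting anything but a standby domain.

    `roles` maps a domain number to "active" or "standby". This mirrors the
    rule that only standby domains monitor other domains.
    """
    if roles.get(domain) != "standby":
        raise ValueError(
            f"domain {domain} is not a standby domain; "
            "failover triggers can only be assigned to standby domains"
        )
    assignments[domain] = trigger


# 27001 is a hypothetical active domain; 27002 is the standby from the post.
roles = {27001: "active", 27002: "standby"}
assignments: Dict[int, str] = {}
assign_trigger(roles, assignments, 27002, "Trigger1")
```

Calling `assign_trigger(roles, assignments, 27001, "Trigger1")` in this sketch raises a `ValueError`, which is the equivalent of the editor refusing the assignment.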

Once you’ve applied these settings you should be ready to auto-failover. You can test it by stopping your active RSM. After the configured delay you should see your standby RSM stop its services and restart them as the active domain. You can observe this in CygNet Explorer in the RSM view for your standby domain.

Automatic Service Recovery

So, you might be asking, what about when an individual service that isn’t an RSM stops or fails for some reason? This won’t cause an auto-failover, because the active RSM is still up and running, telling the standby RSM that everything is fine and dandy. As I mentioned earlier, this is what the Automatic Service Recovery (ASR) feature is for.

ASR allows you to choose what actions the RSM should take if it detects that one of the services it is managing has stopped unexpectedly. This feature is not new in 8.5.1, but it has received some enhancements. Prior to 8.5.1, to configure ASR, you would go to the RSM view of CygNet Explorer, right-click an individual service, select “Properties”, and click on the “Automatic Service Recovery” tab. There, you could enable ASR on that service and configure what actions to take if the service goes down.

If you have 200 services to configure, this could be very tedious and time-consuming. Luckily, CygNet 8.5.1 gives you a new option to configure all of the services that are part of a redundancy set at once.

On the Auto-failover tab of the CygNet Redundancy Editor, click the button labeled “Automatic recovery options…”.

You’ll see a list of all of the redundant services managed by all of the RSMs that are part of the redundancy configuration, as well as information about what ASR options are currently configured for each service.

[Screenshot: the automatic recovery options dialog]

The list of services contains the following information:

  • Enabled – Does this service currently have any ASR options enabled?
  • Include restart action – Is the restart action currently enabled for the service? The restart action means that if the service is down unexpectedly, the RSM will try to restart it.
  • Include failover action – Is the failover action currently enabled for the service? The failover action means that if the service still doesn’t start successfully after the restart action (if configured), the RSM will perform a failover on that service, so that the standby RSM will restart it and become its active RSM.
  • Min runtime – After attempting the restart action, how long should the service run before the restart is considered successful? If it stops again before this amount of time has passed, then the failover action will be executed (if configured).
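Here is one way to picture how the restart action, the failover action, and Min runtime fit together. This is a hedged Python sketch of the decision as described in the list above, not the RSM’s actual logic. `AsrOptions` and `on_unexpected_stop` are hypothetical names, the 60-second default is invented, and the sketch assumes that a failed restart with no failover action configured results in no further action.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AsrOptions:
    """Hypothetical model of one service's ASR settings."""
    restart: bool = False         # "Include restart action"
    failover: bool = False        # "Include failover action"
    min_runtime_s: float = 60.0   # "Min runtime" (default is invented)

    @property
    def enabled(self) -> bool:
        # "Enabled" is read here as: any ASR option is turned on.
        return self.restart or self.failover


def on_unexpected_stop(opts: AsrOptions,
                       ran_after_restart_s: Optional[float] = None) -> str:
    """Pick an action when a managed service is found stopped.

    ran_after_restart_s is None if no restart has been attempted yet,
    otherwise how long the service ran after the last restart attempt
    (0 if it never came up).
    """
    if not opts.enabled:
        return "none"
    restart_failed = (ran_after_restart_s is not None
                      and ran_after_restart_s < opts.min_runtime_s)
    if opts.restart and not restart_failed:
        return "restart"   # first attempt, or the last restart counted as good
    if opts.failover:
        return "failover"  # restart failed or was never configured
    return "none"          # assumption: nothing left to try
```

For example, with both actions enabled and a 30-second Min runtime, the first unexpected stop yields "restart". If the service then dies again after 5 seconds, the result is "failover"; if it ran for 120 seconds first, the restart counted as successful and the next stop is handled with another "restart".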

To modify the ASR settings for any of the services, select them using the checkboxes on the left, or click the checkbox at the top of the column to select all services if you want to apply the same settings to all services. Then in the settings under the list of services, configure the desired ASR behavior you want, and click the “Apply” button.

There are a few items to be aware of when using these ASR settings. Clicking “Apply” overwrites any existing ASR settings on the selected services. Also, if the “Enable service recovery” checkbox is deselected, clicking “Apply” removes all existing ASR settings from the selected services. A prompt warns you about both of these when you click “Apply”, to make sure you are aware of the changes you are making.
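The overwrite and removal behavior can be sketched in a few lines of Python. Again, this is only an illustration of the semantics described above: `apply_asr`, the settings dictionaries, and the service names are all hypothetical and do not correspond to any CygNet API.

```python
from typing import Dict, Iterable

Settings = Dict[str, object]


def apply_asr(services: Dict[str, Settings],
              selected: Iterable[str],
              enable_recovery: bool,
              new_settings: Settings) -> Dict[str, Settings]:
    """Return a copy of `services` with the bulk change applied.

    Selected services have their settings replaced wholesale (not merged).
    With enable_recovery=False, their settings are removed entirely.
    Unselected services are left untouched.
    """
    updated = {name: dict(cfg) for name, cfg in services.items()}
    for name in selected:
        updated[name] = dict(new_settings) if enable_recovery else {}
    return updated


# Hypothetical service names and settings, for illustration only.
before = {
    "SERVICE_A": {"restart": True, "min_runtime_s": 120},
    "SERVICE_B": {"failover": True},
    "SERVICE_C": {},
}
after = apply_asr(before, ["SERVICE_A", "SERVICE_B"], True,
                  {"restart": True, "failover": True, "min_runtime_s": 30})
cleared = apply_asr(before, ["SERVICE_A"], False, {})
```

Note that `SERVICE_A` loses its old 120-second Min runtime in `after`, because the new settings replace the old ones rather than merging with them, and in `cleared` it ends up with no ASR settings at all.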

Well, that covers it for today. I hope that this has been helpful. Get out there and start configuring your redundant services for automatic failover!

See the CygNet help topic, Redundancy, for more information about this and other related topics.
