When it comes to your environment, you cannot simply hope that your disaster recovery (DR) plan will work when you need it. You need to test the plan regularly to make sure that you are truly covered. This does not need to be a weekly or monthly burden on you or your staff, but you should be doing at least one of the following each quarter on a scheduled basis.
Audit Backup Coverage
Take a look at the production systems in your environment and make sure that the file folders, databases, etc. are actually making it to offsite media at their prescribed intervals. Systems change over time and it is very easy to complete an upgrade or other systems change and forget to update the backup job to cover any additional folders or servers. In larger environments you may need to involve the application owners to validate what is needed for restoring their applications.
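A coverage audit like this can be partly automated. The following is a minimal sketch, assuming you can export two lists: the folders the application owners say they need, and the folders your backup job is configured to copy. The function name `find_uncovered` and all paths are hypothetical, and real backup products expose their job definitions in their own ways.

```python
from pathlib import PurePosixPath


def find_uncovered(required_paths, backed_up_roots):
    """Return the required paths that no backed-up folder covers."""
    roots = [PurePosixPath(r) for r in backed_up_roots]
    uncovered = []
    for raw in required_paths:
        path = PurePosixPath(raw)
        # A path is covered if it is a backup root or sits beneath one.
        covered = any(path == root or root in path.parents for root in roots)
        if not covered:
            uncovered.append(raw)
    return uncovered


# Hypothetical inventory supplied by the application owners.
required = ["/srv/app/data", "/srv/app/uploads", "/var/lib/db"]
# Hypothetical list of folders the backup job currently copies.
backed_up = ["/srv/app/data", "/var/lib"]

print(find_uncovered(required, backed_up))  # prints ['/srv/app/uploads']
```

Here `/srv/app/uploads` stands in for a folder added during an upgrade that never made it into the backup job.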
Test Restore of the Backup System Itself
In many environments, it is not that important to test the restore process of individual files, folders, or databases, as the user community's requests typically take care of that task for you. What many organizations fail to do, though, is test restoring the backup system itself. This is often a nontrivial task because these systems typically have a "catalog" of backups that needs to be restored before any production data can be restored. Make sure that your DR kit includes these instructions as well as a plan for replacing the primary media device used by your backups.
Test Fail-Over Systems
Many organizations, even in the Small and Medium Business space, are now using redundancy technologies such as clustering, load-balancing, secondary sites, etc. The viability of these secondary systems should not simply be assumed; these systems also need to be tested. The techniques involved do not need to be complicated or even executed during primary business hours. A test can be as simple as rolling your cluster to its secondary node, shutting down one of the servers in your NLB group, or temporarily changing firewall rules to prevent access to a primary system. When doing these tests, do not forget to test the physical infrastructure pieces as well, such as backup generators, backup cooling plans, physical space access, etc.
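After rolling a cluster or shutting down a node, you still need to confirm that the service answers. The sketch below is one simple way to do that, assuming the service listens on a TCP port. The helper names `is_reachable` and `check_endpoints` and the hostname in the comment are hypothetical, and a successful connection only shows that something is listening, not that the application behind it is healthy.

```python
import socket


def is_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


def check_endpoints(endpoints):
    """Map each (host, port) pair to True or False depending on reachability."""
    return {endpoint: is_reachable(*endpoint) for endpoint in endpoints}


# Example call, using a hypothetical cluster address:
#   check_endpoints([("cluster.example.internal", 443)])
```

Running the check once before the fail-over and once after gives you a simple before-and-after record for the test report.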
Test Communications and Staffing Plans
In the "lessons learned" stage of nearly every disaster recovery event I have ever been a part of, whether it was a test or an actual emergency, this is the one portion of the plan that usually got the most scrutiny. First responders were often confused about who needed to be contacted and when, how operations updates should be provided, and who the secondary contacts were for the various areas. This confusion was often the single biggest factor contributing to slower recovery times and, as a result, lost productivity for the organization as a whole. Contact sheet updates are often overlooked as staffing changes occur. Additionally, many organizations' plans for updating the staff at large rely heavily on internal electronic systems that may or may not be available in times of emergency.
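One way to keep contact sheets from going stale is to record when each entry was last confirmed and flag the old ones during the quarterly review. This is a minimal sketch under that assumption. The field names, the sample entries, and the 90-day threshold are all illustrative choices, not part of any standard.

```python
from datetime import date


def stale_contacts(contacts, today, max_age_days=90):
    """Return names whose entry was last verified more than max_age_days ago."""
    return [
        entry["name"]
        for entry in contacts
        if (today - entry["last_verified"]).days > max_age_days
    ]


# Hypothetical contact sheet entries.
contacts = [
    {"name": "Primary DBA", "last_verified": date(2024, 1, 10)},
    {"name": "Network on-call", "last_verified": date(2024, 5, 2)},
]

print(stale_contacts(contacts, today=date(2024, 6, 1)))  # prints ['Primary DBA']
```

Keeping a printed copy of the verified sheet in the DR kit avoids depending on internal systems that may be down during an emergency.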
Test DR Kit Access
With many organizations relying on media couriers such as Iron Mountain for the offsite storage of critical DR materials such as their DR kits and backup media, it is prudent to test these vendors' ability to bring you the needed materials in the timeframe that they have promised.
For those organizations not using a courier service, but rather relying on either bank safety deposit boxes or employees taking materials home, it is prudent to confirm that the "first responder" staff know whom to contact and how to gain access to these critical DR materials.
Summary
While none of what I have posted is particularly revolutionary, these are critical items that are often overlooked by many organizations in the hustle and bustle of their daily operations. Working with your DR plan is NEVER exciting; it is simply stressful. In fact, it is like your first aid kit: something you know you need to have and hope to never actually use.
Thomas A. Owen has over 15 years of consulting and direct employment experience at organizations ranging from the single-site "Mom and Pop" to the multinational Fortune 100. In that time, he has had to face and correct many of the common demons that plague IT Managers/Administrators every day. For other topics relevant to the Small and Medium-sized business IT department, please see his daily blog, http://smartit4smb.blogspot.com, where a different topic is covered each week.
Article Source: http://EzineArticles.com/?expert=Thomas_A_Owen