EUP

Enterprise Agile: Operations and Support




NOTE: This article will soon be updated to reflect the evolution of EUP to be based on Disciplined Agile Delivery (DAD). Please stay tuned.

The Enterprise Unified Process™ (EUP) extends iterative/agile processes such as Disciplined Agile Delivery (DAD), Extreme Programming (XP), or Scrum with an Operations and Support discipline, which reflects many of the best practices described within the Information Technology Infrastructure Library (ITIL) for IT Service Management (ITSM). Like any phase or discipline within the EUP, your organization will apply the activities contained within this discipline in a manner that reflects your environment. Organizations that develop and deploy systems in-house do more operational support than companies that produce shrink-wrapped software. The latter may spend more time and effort on support to keep their diversified base of paying customers satisfied, and may therefore have no operational staff, while the former may have an entire team dedicated just to operations.

Table of Contents

  1. Overview
  2. Plan operations and support deployment
  3. Support users
  4. Operate systems
  5. Prepare for disaster
  6. Recover from disaster
  7. Translations

1. Overview

The high-level workflow for the Operations and Support discipline is depicted in Figure 1 and the detailed amalgamated workflow in Figure 2. The primary goal of the Operations and Support discipline is to operate and support your software in a production environment. The focus of operations is to ensure that software is running properly, that the network is available and monitored, and that the appropriate data is backed up and restored as needed. Disaster plans are created, and in the event a disaster occurs, they are executed to restore primary systems. The focus of support is to assist end users by answering their questions, analyzing the problems that they are encountering with production systems, recording requests for new functionality, and making and applying fixes. Furthermore, an important message of this discipline is that in order for it to succeed your organization must be as agile as possible: it is possible for enterprise-level professionals (including operations and support staff) to work in an agile manner, but they must choose to do so and be allowed to do so.

Figure 1. The Operations and Support discipline workflow.

Figure 2. The amalgamated workflow of the Operations and Support discipline.

2. Plan Operations and Support Deployment

A critical success factor within this discipline is planning the deployment of a system into your production environment. This effort augments the Deployment discipline to include planning how a system will be operated and supported after it is deployed. The support manager must define the support plan prior to deploying a system into production. The system support plan should address:

  1. How support will be provided. There are several methods you can use to implement support, such as email, online chat, and a live call center. You should consider how support will be paid for by customers (either internal or external): will you bill to a specific cost center or to credit cards? You must also define how different types, or tiers, of support for paying and non-paying customers will be implemented.
  2. System contact personnel. It is crucial to keep contact information up-to-date because without a current list, the calls at 2 a.m. may go unanswered, leaving you in a bind. Contact lists should include: who should be contacted, the times that a person should and should not be contacted, the circumstances under which the person should be contacted, and how the person should be contacted (telephone, pager, ...).
  3. Defect reporting and enhancement request strategy. Support staff are the ones who take the initial call and work with the end user throughout the process. Enhancement requests and low severity defects are submitted to the Enterprise Change Control Board (ECCB), which is a part of your Portfolio Management discipline efforts.
  4. The support Service-Level Agreement (SLA). The support SLA addresses the support issues for a system within a production environment.
  5. Defect prioritization and resolution time periods. As the severity levels for defects (or enhancements) are already defined in the support plan, the organization must establish what constraints (time, money, and other resources) are put into each severity level to meet SLA requirements.
  6. Defect escalation criteria. Define the proper escalation path based on the severity and agreed-upon resolution time periods.
  7. How to deliver fixes into production outside the scope of an official release. For each system, you must determine the following: the release schedule (defined date or on a per-customer basis), deployment approach (via e-mail, web, physical media, or a combination thereof), and the priority order (if any) for clients to receive fixes.
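Items 5 and 6 above can be made concrete with a small sketch. The following Python is illustrative only: the severity names, the resolution time periods, the escalation path, and the rule "escalate one level for each missed resolution period" are all assumptions made for the example, not values prescribed by the EUP. Your own support SLA would supply the real ones.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from enum import IntEnum


class Severity(IntEnum):
    # Hypothetical severity levels; use the ones defined in your support plan.
    CRITICAL = 1
    HIGH = 2
    MEDIUM = 3
    LOW = 4


# Assumed resolution time periods per severity level (taken from the SLA in practice).
RESOLUTION_TARGETS = {
    Severity.CRITICAL: timedelta(hours=4),
    Severity.HIGH: timedelta(hours=24),
    Severity.MEDIUM: timedelta(days=5),
    Severity.LOW: timedelta(days=30),
}

# Assumed escalation path, ordered from the first responder upward.
ESCALATION_PATH = ["support engineer", "support manager", "operations manager"]


@dataclass
class Defect:
    summary: str
    severity: Severity
    reported_at: datetime


def escalation_level(defect: Defect, now: datetime) -> int:
    """Return an index into ESCALATION_PATH for an unresolved defect.

    Assumed rule: move up one level each time a full resolution period
    elapses without a fix, capped at the top of the path.
    """
    elapsed = now - defect.reported_at
    periods_missed = max(0, elapsed // RESOLUTION_TARGETS[defect.severity])
    return min(periods_missed, len(ESCALATION_PATH) - 1)


def current_owner(defect: Defect, now: datetime) -> str:
    """Name the role that should currently own the defect."""
    return ESCALATION_PATH[escalation_level(defect, now)]
```

For example, under these assumed values a critical defect reported at 2 a.m. stays with the support engineer until 6 a.m., moves to the support manager after that, and reaches the operations manager once two resolution periods have been missed.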

Similarly, the operations manager, working closely with the project teams, creates the operations plan. The system operations plan defines how the system will be operated while it is in production.

3. Support Users

There are two basic strategies for delivering support: an escalation strategy and a touch-and-hold strategy. The escalation strategy is based on the idea that most support requests are fairly basic and therefore can be handled quickly, whereas a small minority of requests are complicated and must be assigned or escalated to more knowledgeable staff members. This approach scales well, although hand-offs between support staff can prove frustrating for the person being supported. With the touch-and-hold strategy, the person who initially took the support request follows it through to the end, although this person may have to work with other people to fulfill the request. The touch-and-hold strategy typically results in greater customer satisfaction because the support requester only needs to deal with one support engineer. However, this approach requires highly skilled support people and is difficult to scale because it can be hard to hire and retain such staff. More information about incident management, problem reporting, and service desks can be found in their associated ITIL Fact Sheets.
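The difference between the two strategies can be sketched in a few lines of Python. Everything here is hypothetical: the `Request` type, the numeric complexity score, and the tier names exist only to illustrate who the requester ends up dealing with under each strategy.

```python
from dataclasses import dataclass, field


@dataclass
class Request:
    description: str
    complexity: int  # hypothetical score: 1 (basic) to 3 (complicated)
    # People the requester has had to deal with directly.
    handlers: list = field(default_factory=list)


def handle_with_escalation(request, tiers):
    """Escalation: pass the request up tier by tier until one can resolve it.

    `tiers` is a list of (name, max_complexity) pairs ordered from the
    least to the most knowledgeable staff.
    """
    for name, max_complexity in tiers:
        request.handlers.append(name)  # each hand-off is visible to the requester
        if request.complexity <= max_complexity:
            return name
    raise ValueError("no tier can resolve this request")


def handle_with_touch_and_hold(request, tiers):
    """Touch-and-hold: the first contact keeps the request to the end.

    Returns the owner and the colleagues consulted behind the scenes.
    """
    owner, owner_skill = tiers[0]
    request.handlers.append(owner)  # the requester only ever deals with the owner
    consulted = []
    if request.complexity > owner_skill:
        for name, max_complexity in tiers[1:]:
            consulted.append(name)
            if request.complexity <= max_complexity:
                break
    return owner, consulted
```

With three tiers and a complicated request, the escalation version leaves three names in `request.handlers`, while the touch-and-hold version leaves one. That is the customer-satisfaction trade-off described above, paid for by requiring the first contact to coordinate the extra help.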

4. Operate Systems

The goal of this activity, depicted in Figure 3, is to operate systems in a production environment. Two main roles are associated with this activity:

  1. Operator. This person is responsible for keeping systems running, backing up and restoring data based on the operations plan and requirements of the system, managing any problems, performing periodic cleanup, performing fine tuning and any system reconfigurations, monitoring systems, and redeploying systems as necessary. Because operations support is usually a 24/7 activity, a hand-off protocol should be defined and followed to address the process of handing off a problem if it occurs between shifts, along with a definition of what each team is responsible for completing both prior to and at the completion of a shift.
  2. Support Developer. This person is responsible for applying maintenance fixes to the system via hot fixes (also known as service packs or patches). This is often a member of the development team. As with your development environment, always test any fine tuning or reconfiguration of programs or systems in a test area prior to deploying it to the production environment, and have your back-out plan ready to put into place if serious system problems occur.
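The shift hand-off protocol mentioned under the Operator role could be captured as a simple record that is checked before the incoming shift accepts responsibility. This is a minimal sketch under assumed rules (every end-of-shift task must be done, and every open problem must have a recorded next step); the class and field names are hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class OpenProblem:
    system: str
    description: str
    next_step: str = ""  # what the incoming shift should do first


@dataclass
class ShiftHandoff:
    outgoing: str
    incoming: str
    at: datetime
    open_problems: list = field(default_factory=list)
    # Maps each end-of-shift task to whether it has been completed.
    end_of_shift_tasks: dict = field(default_factory=dict)

    def blockers(self):
        """List what must be fixed before the hand-off can be accepted."""
        issues = [
            f"task not completed: {task}"
            for task, done in self.end_of_shift_tasks.items()
            if not done
        ]
        issues += [
            f"no next step recorded for: {problem.description}"
            for problem in self.open_problems
            if not problem.next_step
        ]
        return issues

    def ready(self):
        """True when the incoming shift can take over without open questions."""
        return not self.blockers()
```

A hand-off with an unfinished log rotation and an undocumented open problem would report two blockers; once both are addressed, `ready()` returns `True`.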

Figure 3. Operate systems workflow details.

5. Prepare for Disaster

Disaster recovery defines the steps you will follow to get your critical systems back up and running in case something catastrophic happens. Disasters in this context include natural disasters, such as a hurricane or tornado that destroys your entire Network Operations Center (NOC), and man-made disasters, such as the electrical blackout that struck the northeastern United States and parts of Canada in August 2003. More information about IT Service Continuity Management can be found in the associated ITIL Fact Sheet.

6. Recover from Disaster

Your organization may ultimately have to execute the disaster recovery plan. Be sure to keep hard copies of the disaster recovery plan with multiple people in multiple locations: when a disaster occurs you may not have access to the electronic versions. The operations manager is responsible for executing the disaster recovery plan, and will work with the various project teams as necessary to recover the systems. The final output of a successful disaster recovery is to have your systems running, according to the plan, on a contingency platform. The operations manager is also responsible for reviewing the recovery effort to identify and then act on the lessons learned.

7. Translations