NOTE: This article will soon be updated to reflect the evolution of EUP to be based on the
Disciplined Agile (DA) tool kit. Please stay tuned.
The Enterprise Unified Process™ (EUP) extends iterative/agile processes such as
Disciplined Agile Delivery (DAD), Extreme Programming (XP), or Scrum with an
Operations and Support discipline, which reflects many of the strategies
described within the Information Technology Infrastructure Library (ITIL) for
IT Service Management (ITSM).
Like any phase or discipline within the EUP,
your organization will apply the activities contained within this discipline in
a manner that reflects your environment. Organizations that develop and deploy
systems in-house do more operational work than companies that produce
shrink-wrapped software. The latter may spend more time and effort on support
to keep their diversified base of paying customers satisfied, and may have no
operational staff at all, while the former may have an entire team dedicated
just to operations.
Table of Contents
- Plan operations and support deployment
- Support users
- Operate systems
- Prepare for disaster
- Recover from disaster
The high-level workflow for the Operations and Support discipline is depicted in
Figure 1 and the detailed
workflow in Figure 2. The primary goal of the Operations and Support discipline is to operate and
support your software in a
production environment. The focus of operations is to ensure that software
is running properly, that the network is available and monitored, and that the
appropriate data is backed up and restored as needed. Disaster plans are
created, and in the event a disaster occurs, they are executed to restore
primary systems. The focus of support is to assist end users by answering their
questions, analyzing the problems that they are encountering with production
systems, recording requests for new functionality, and making and applying
fixes. Furthermore, an important message of this discipline is that in
order for it to succeed your organization must be as agile as possible: it is
possible for enterprise-level professionals (including operations and support
staff) to work in an agile manner, but they must choose to do so and be
allowed to do so.
Figure 1. The Operations and
Support discipline workflow.
Figure 2. The amalgamated workflow of the
Operations and Support discipline.
A critical success factor within this discipline is planning the deployment
of a system into your production environment. This effort augments the
Deployment discipline to include planning how a system will be operated and
supported after it is deployed. The support manager must define the
support plan prior to deploying a system into production. The system support
plan should address:
- How support will be provided. There are several methods you can
use to implement support, such as email, online chat, and a live call
center. You should consider how support will be paid for by customers
(either internal or external): will you bill to a specific cost center or to
credit cards? You must also define how different types, or tiers, of support
for paying and non-paying customers will be implemented.
- System contact personnel. It is crucial to keep contact
information up to date: without a current list, the calls at 2 a.m.
may go unanswered, leaving you in a bind. Contact lists should include who
should be contacted, the times that a person should and should not be
contacted, the circumstances under which the person should be contacted, and
how the person should be contacted (telephone, pager, ...).
- Defect reporting and enhancement request strategy. Support
staff are the ones who take the initial call and work with the end user
throughout the process. Enhancement requests and low severity defects are
submitted to the Enterprise Change Control Board (ECCB), which is a part of
Portfolio Management discipline efforts.
- The support Service-Level Agreement (SLA). The support SLA
addresses the support issues for a system within a production environment.
- Defect prioritization and resolution time periods. As the
severity levels for defects (or enhancements) are already defined in the
support plan, the organization must establish what constraints (time, money,
and other resources) apply to each severity level to meet the SLA.
- Defect escalation criteria. Define the proper escalation
path based on the severity and agreed-upon resolution time periods.
- How to deliver fixes into production outside the scope of an official
release. For each system, you must determine the following: the
release schedule (defined date or on a per-customer basis), deployment
approach (via e-mail, web, physical media, or a combination thereof), and
the priority order (if any) for clients to receive fixes.
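The resolution time periods and escalation criteria in the list above can be sketched in code. This is a minimal illustration only: the severity levels, time periods, role names, and the rule that a defect moves one step up the path each time a full resolution period elapses are all assumptions made for the example, not values prescribed by the EUP.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Hypothetical severity levels and resolution periods; real values come from
# your own support plan and SLA.
RESOLUTION_PERIOD = {
    1: timedelta(hours=4),   # critical
    2: timedelta(hours=24),  # major
    3: timedelta(days=5),    # minor
}

# Hypothetical escalation path, lowest level first.
ESCALATION_PATH = ["support engineer", "support manager", "operations manager"]


@dataclass
class Defect:
    summary: str
    severity: int
    reported_at: datetime

    def due_at(self) -> datetime:
        """Time by which the SLA says this defect should be resolved."""
        return self.reported_at + RESOLUTION_PERIOD[self.severity]


def escalation_contact(defect: Defect, now: datetime) -> str:
    """Return the role that should own the defect at time `now`.

    Assumed rule: the defect moves one step up the escalation path each time
    a full resolution period passes without a fix.
    """
    elapsed = now - defect.reported_at
    steps = max(0, elapsed // RESOLUTION_PERIOD[defect.severity])
    return ESCALATION_PATH[min(steps, len(ESCALATION_PATH) - 1)]
```

Keeping the periods and the path in plain tables like these makes it easy to review them against the written SLA whenever either one changes.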
Similarly, the operations manager, working closely with the project teams,
creates the operations plan. The system operations plan defines how the system
will be operated while it is in production.
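The contact list called for in the support plan lends itself to a small data structure. The sketch below is one possible shape, assuming each contact has a single daily availability window and a set of circumstance tags; the field names and tags are hypothetical.

```python
from dataclasses import dataclass
from datetime import time


@dataclass
class Contact:
    name: str
    method: str            # how to reach the person, e.g. "telephone" or "pager"
    available_from: time   # start of the window in which they may be contacted
    available_until: time  # end of that window (exclusive)
    circumstances: set     # hypothetical tags, e.g. {"outage", "data-loss"}

    def is_available(self, at: time) -> bool:
        if self.available_from <= self.available_until:
            return self.available_from <= at < self.available_until
        # The window wraps past midnight, e.g. 22:00 to 06:00.
        return at >= self.available_from or at < self.available_until


def who_to_contact(contacts, circumstance, at):
    """Return the contacts who cover `circumstance` and may be reached at `at`."""
    return [
        c for c in contacts
        if circumstance in c.circumstances and c.is_available(at)
    ]
```

A lookup such as `who_to_contact(contacts, "outage", time(2, 0))` then answers the 2 a.m. question directly, provided the list itself is kept current.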
There are two basic strategies for delivering support: an escalation strategy
and a touch-and-hold strategy. The escalation strategy is based on the idea that
most support requests are fairly basic and therefore can be handled quickly,
whereas a small minority of requests are complicated and must be assigned or
escalated to more knowledgeable staff members. This approach scales well,
although hand-offs between support staff can prove to be frustrating for the
person being supported. With the touch-and-hold strategy, the initial person who
took the support request follows it through to the end, although this person may
have to work with other people to fulfill the request. The touch-and-hold
strategy typically results in greater customer satisfaction because the support
requester only needs to deal with one support engineer. However, this approach
requires highly skilled support people and is difficult to scale because it can
be hard to hire and retain such staff. More information about incident
management, problem reporting, and service desks can be found in their
associated ITIL Fact Sheets.
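The difference between the two strategies comes down to who the requester deals with over the life of a request. The following sketch models that difference; the class, function, and role names are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class SupportRequest:
    description: str
    contact: str  # the person the requester currently deals with
    contact_history: list = field(default_factory=list)
    helpers: list = field(default_factory=list)

    def __post_init__(self):
        # The person who took the request is the first point of contact.
        self.contact_history.append(self.contact)


def escalate(request: SupportRequest, specialist: str) -> None:
    """Escalation strategy: hand the request off to more knowledgeable staff."""
    request.contact = specialist
    request.contact_history.append(specialist)


def consult(request: SupportRequest, specialist: str) -> None:
    """Touch-and-hold strategy: the original engineer keeps the request and
    works with others behind the scenes."""
    request.helpers.append(specialist)


escalated = SupportRequest("report export fails", contact="tier-1 engineer")
escalate(escalated, "tier-2 engineer")

held = SupportRequest("report export fails", contact="tier-1 engineer")
consult(held, "database specialist")
```

After these calls, `escalated.contact_history` holds two people while `held.contact_history` still holds one, which mirrors the hand-off frustration and the single point of contact described above.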
The goal of this activity, depicted in Figure 3, is to operate systems in a
production environment.
Figure 3. Operate systems workflow
Two main roles are associated with this activity:
- Operator. This person is responsible for
keeping systems running, backing up and restoring data based on the
operations plan and requirements of the system, managing any problems,
performing periodic cleanup, performing fine tuning and any system
reconfigurations, monitoring systems, and redeploying systems as necessary.
Because operations support is usually a 24/7 activity, a hand-off protocol
should be defined and followed to address the process of handing off a
problem if it occurs between shifts, along with a definition of what each
team is responsible for completing both prior to and at the completion of a
shift.
- Support Developer. This person is
responsible for applying maintenance fixes to the system via hot fixes (also
known as service packs or patches). This is often a member of the development
team. As with your development environment, always test any fine tuning or
reconfiguration of programs or systems in a test area prior to deploying it
to the production environment and have your back-out plan ready to put into
place if serious system problems occur.
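The advice to test first and keep a back-out plan ready can be expressed as a simple control flow. This is a sketch under stated assumptions: the five callables are hypothetical hooks that you would supply for your own environments, and the returned strings are illustrative status labels.

```python
from typing import Callable


def deploy_with_back_out(
    apply_to_test: Callable[[], None],
    test_is_healthy: Callable[[], bool],
    apply_to_production: Callable[[], None],
    production_is_healthy: Callable[[], bool],
    back_out: Callable[[], None],
) -> str:
    """Apply a fix or reconfiguration to the test area first, then to
    production, backing out if production shows serious problems."""
    apply_to_test()
    if not test_is_healthy():
        # The change never reaches production.
        return "rejected in test"

    apply_to_production()
    if not production_is_healthy():
        back_out()
        return "backed out"

    return "deployed"
```

Passing the steps in as functions keeps the sequence itself easy to review and lets each team plug in whatever tooling it already uses for deployment and health checks.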
Disaster recovery defines the steps you will follow to get your critical
systems back up and running in case something catastrophic happens. A disaster
in this context could be natural, such as a hurricane or a tornado that
destroys your entire Network Operations Center (NOC), or man-made, like the
electrical blackout that struck the American northeast in August 2003. More
information about IT Service Continuity Management can
be found in the associated ITIL Fact Sheet.
Your organization may ultimately have to execute the
disaster recovery plan. Be sure to have the disaster recovery plan in hard-copy
form kept with multiple people in multiple locations: when a disaster occurs you
may not have access to the electronic versions. The operations manager is
responsible for executing the disaster recovery plan, and that person will work
with the various project teams when necessary in order to recover the systems.
The final output of a successful disaster recovery is to have your systems
running, according to the plan, on a contingency platform. The operations
manager is also responsible for reviewing the recovery effort to identify and
then act on the lessons learned.