
The incident was resolved by replacing the hardware. Should the incident be kept open to manage the repair process?

By ITIL® from Experience© with contribution by John Gabel

Here is a scenario:

  • A piece of hardware caused an incident [1]
  • To resolve the incident the hardware is replaced with a spare from storage
  • To manage the repair of the defective hardware…


1. The incident is kept open but a stop-the-clock action is taken, or

2. The incident is closed and:
a) A Problem Record is logged, or
b) A Request For Change (RFC) is logged, or
c) A Service Request is logged

Let us examine each option to determine which one is best.

1. The incident is kept open but a stop-the-clock action is taken
Although this avoids creating multiple records for the same Event [2], the Incident Management process has actually completed its work, since its responsibility is to “…ensure that normal service operation is restored as quickly as possible and the business impact is minimized.” [3] Keeping the incident open also adversely affects Incident Management metrics such as total duration, even though the incident would not breach its SLA because the SLA clock has been stopped.

Also, in practical terms, not all ITSM tools can launch a workflow from an incident record, especially after the incident has been resolved. In addition, if the supplier is involved in conducting the repair, the ITSM tool may not be able to add a new SLA to a resolved incident in order to account for the supplier’s agreement.
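The effect of option 1 on metrics can be illustrated with a little arithmetic. The sketch below is only an illustration: the function name, the dates and the two-week repair time are hypothetical, and it assumes the stop-the-clock intervals do not overlap and fall between the open and close times.

```python
from datetime import datetime, timedelta

def incident_durations(opened, closed, pauses):
    """Return (total_duration, sla_elapsed) for one incident.

    pauses is a list of (start, end) stop-the-clock intervals,
    assumed non-overlapping and inside [opened, closed].
    """
    total = closed - opened
    paused = sum((end - start for start, end in pauses), timedelta())
    return total, total - paused

# Hypothetical timeline: service restored with a spare after 2 hours,
# incident kept open for 14 more days while the defective unit is repaired.
opened = datetime(2013, 3, 1, 9, 0)
restored = datetime(2013, 3, 1, 11, 0)
repaired = datetime(2013, 3, 15, 11, 0)

total, sla_elapsed = incident_durations(opened, repaired, [(restored, repaired)])
print(total)        # duration reported against the incident
print(sla_elapsed)  # time counted against the SLA
```

With these assumed dates the SLA clock shows only the two hours it took to restore service, while the total duration of the incident includes the whole repair period.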

2a) The incident is closed and a Problem Record is opened
The incident management metrics are not affected since the incident is closed, but Problem [4] Management is primarily an investigative process. It identifies problems, determines their root cause, and either prepares known issues for the knowledge base or fixes the cause by initiating a change. In this scenario, the objective is not to determine the cause of the incident but to manage the repair of the defective hardware. In addition, it is not feasible for most organizations to log a problem for every unexplained incident.

If a lot of incidents are generated by the failure of this type of hardware, a problem may be opened to investigate the cause of these failures. This is discussed further, following the analysis of the options.

2b) The incident is closed and an RFC is logged
The incident management metrics are not affected since the incident is closed, but in this case repairing the hardware does not change the infrastructure. The repair may not lead to a change immediately, since the repaired hardware would be returned to storage until it is ready to be deployed.

Also, depending on the scope of the change management process, this Configuration Item (CI) may not be under the control of change management, yet the repair still needs to be managed.

2c) The incident is closed and a Service Request is logged
The incident management metrics are not affected since the incident is closed, and the Request Fulfillment process can be used to manage the repair, with a supplier SLA attached if a supplier carries out the repair. A workflow may also be available. If the defective hardware cannot be repaired, an acquisition process could be launched to replace the hardware and ensure that a spare is available in storage to address a future failure.

The recommendation is that the Service Request process (2c) is the most appropriate way to handle the repair of the replaced hardware. There is no need to open a Problem Record or an RFC, although the Service Request logged in 2c may lead to an RFC to put the repaired hardware back in service.
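The recommended flow can be sketched as a small data model. This is a minimal sketch, not how any particular ITSM tool works: the `Record` class, the `RecordType` values and the `resolve_with_spare` function are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field
from enum import Enum

class RecordType(Enum):
    INCIDENT = "incident"
    SERVICE_REQUEST = "service request"
    RFC = "request for change"
    PROBLEM = "problem"

@dataclass
class Record:
    type: RecordType
    summary: str
    status: str = "open"
    related: list = field(default_factory=list)

def resolve_with_spare(incident: Record, hardware: str) -> Record:
    """Close the incident and log a Service Request for the repair (option 2c)."""
    incident.status = "closed"  # service is restored, so incident management is done
    return Record(
        RecordType.SERVICE_REQUEST,
        f"Repair defective {hardware}",
        related=[incident],  # keep a link back to the originating incident
    )

incident = Record(RecordType.INCIDENT, "Router failure at branch office")
repair = resolve_with_spare(incident, "router")
print(incident.status, "/", repair.type.value, "/", repair.status)
```

The repair is tracked on its own record, so a supplier SLA or a workflow can be attached to it without touching the closed incident.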

Let’s continue the story to understand when a problem would be logged…

  • The repaired or new hardware comes in and is tested but fails during testing. An incident would not be logged for this Event, since the hardware was simply being tested and did not have time to generate an incident. In this case a problem or a change would not be logged either. The hardware would simply be returned to the supplier, using the original service request or a new one to take advantage of a workflow and/or a supplier SLA.

Or…

  • The repaired or new hardware is functional and deployed (via an RFC) but fails the following month. A problem would be logged if this hardware is important enough to investigate why there have been many failures. If this hardware is not mission critical or highly visible, trend analysis by Problem Management may still identify the situation, in keeping with its role, which “proactively prevents incidents from happening and minimizes the impact of incidents that cannot be prevented.” [5]
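The two endings of the story can be summarized as a small decision function. This is a rough sketch with hypothetical names and labels. It also assumes that a failure after deployment is logged as an incident, which the story implies but does not state.

```python
def records_for_failure(stage: str, important: bool) -> list:
    """Suggest which records to log when repaired or new hardware fails.

    stage is "testing" or "production" (hypothetical labels).
    important means mission critical or highly visible.
    """
    if stage == "testing":
        # Never in service: return it to the supplier under a service request.
        return ["service request"]
    if stage == "production":
        records = ["incident"]  # assumption: a failure in service is an incident
        if important:
            records.append("problem")  # investigate the repeated failures
        # Otherwise, trend analysis may still surface the pattern later.
        return records
    raise ValueError(f"unknown stage: {stage!r}")

print(records_for_failure("testing", important=True))
print(records_for_failure("production", important=True))
print(records_for_failure("production", important=False))
```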




1. To simplify reading, the article refers to “hardware” as a generic term. It can be substituted by anything to make it more relevant to your situation such as a PC, router, air conditioning unit in the data center, etc.
2. Event: (ITIL® Service Operation) A change of state that has significance for the management of an IT service or other configuration item. The term is also used to mean an alert or notification created by any IT service, configuration item or monitoring tool. Events typically require IT operations personnel to take actions, and often lead to incidents being logged.

Source: ITIL® glossary and abbreviations, English, 2011 www.itil-officialsite.com/InternationalActivities/TranslatedGlossaries.aspx
3. Source: Ibid
4. Problem: (ITIL® Service Operation) A cause of one or more incidents. The cause is not usually known at the time a problem record is created, and the problem management process is responsible for further investigation. Source: Ibid
5. Problem Management: (ITIL® Service Operation) proactively prevents incidents from happening and minimizes the impact of incidents that cannot be prevented. Source: Ibid






Copyright 2013 - ITIL® from Experience - D.Matte