Processing unit status remains INTACT while the ESM requests a new machine after failover

Description

We have a running application, and one of its machines was destroyed.
The ESM detected this and tried to recover by starting a new machine.
However, the processing unit status remains INTACT until a GSC is actually started on the new machine and the GSM tries to deploy the USM instance. Only then does the status change to SCHEDULED, which is too late in the lifecycle to serve as an indication of the failure.

I propose that the status be driven by the ESM state of the processing unit.
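A minimal sketch of the proposed rule, assuming SCHEDULED is the status to report while the ESM is provisioning a replacement machine. Everything in it is hypothetical: `PuStatus`, `derive_status`, and its inputs (planned instances, running instances, machines the ESM has requested but not yet started) are illustrative names, not the Cloudify or GigaSpaces API. The point is only that a pending ESM machine request is enough to move the status away from INTACT, without waiting for a GSC to start on the new machine.

```python
from enum import Enum


class PuStatus(Enum):
    # Hypothetical subset of statuses, matching the values seen in the REST output.
    INTACT = "INTACT"
    SCHEDULED = "SCHEDULED"
    BROKEN = "BROKEN"


def derive_status(planned_instances: int,
                  running_instances: int,
                  esm_pending_machines: int) -> PuStatus:
    """Hypothetical mapping from ESM state to the reported processing unit status.

    planned_instances: instances the ESM intends to run
    running_instances: instances currently deployed and alive
    esm_pending_machines: machines the ESM has requested but not yet started
    """
    if running_instances >= planned_instances and esm_pending_machines == 0:
        # Everything planned is running and the ESM has nothing in flight.
        return PuStatus.INTACT
    if esm_pending_machines > 0:
        # The ESM is already compensating: report it as soon as the machine
        # request is issued instead of waiting for a GSC on the new machine.
        return PuStatus.SCHEDULED
    # Instances are missing and the ESM has not reacted yet.
    return PuStatus.BROKEN
```

Checking pending machine requests before falling back to BROKEN means a lost instance is reported as BROKEN only until the ESM reacts, after which the status reflects the recovery in progress.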

Activity

Meron Avigdor
November 12, 2012, 9:16 AM

REST call output for the status of one of the services whose machine we are destroying:

Sun Nov 11 14:49:15 IST 2012 :"INTACT"} - requested to destroy machine at 14:49:15
Sun Nov 11 14:49:16 IST 2012 :"INTACT"}
Sun Nov 11 14:49:17 IST 2012 :"INTACT"}
...
Sun Nov 11 14:50:27 IST 2012 :"INTACT"}
Sun Nov 11 14:50:28 IST 2012 :"INTACT"}
Sun Nov 11 14:50:29 IST 2012 :"INTACT"}
Sun Nov 11 14:50:30 IST 2012 :"INTACT"}
Sun Nov 11 14:50:31 IST 2012 :"INTACT"}
...
Sun Nov 11 14:50:32 IST 2012 :"BROKEN"} - application identified as broken (1 min 17 s)
Sun Nov 11 14:50:33 IST 2012 :"BROKEN"}
Sun Nov 11 14:50:35 IST 2012 :"BROKEN"}
Sun Nov 11 14:50:36 IST 2012 :"BROKEN"}
...
Sun Nov 11 14:50:58 IST 2012 :"BROKEN"}
Sun Nov 11 14:50:59 IST 2012 :"BROKEN"}
...
Sun Nov 11 14:51:15 IST 2012 :"BROKEN"}
Sun Nov 11 14:51:16 IST 2012 :"BROKEN"}
Sun Nov 11 14:51:17 IST 2012 :"BROKEN"}
Sun Nov 11 14:51:18 IST 2012 :"BROKEN"}
Sun Nov 11 14:51:19 IST 2012 :"BROKEN"}
Sun Nov 11 14:51:20 IST 2012 :"BROKEN"}
...
Sun Nov 11 14:52:16 IST 2012 :"BROKEN"}
Sun Nov 11 14:52:17 IST 2012 :"BROKEN"}
Sun Nov 11 14:52:18 IST 2012 :"BROKEN"}
Sun Nov 11 14:52:19 IST 2012 :"BROKEN"}
Sun Nov 11 14:52:20 IST 2012 :"BROKEN"}
Sun Nov 11 14:52:21 IST 2012 :"SCHEDULED"} - machine is up and CF agent is running; attempting to repair the application (3 min 6 s)
Sun Nov 11 14:52:22 IST 2012 :"SCHEDULED"}
Sun Nov 11 14:52:23 IST 2012 :"SCHEDULED"}
Sun Nov 11 14:52:24 IST 2012 :"SCHEDULED"}
...
Sun Nov 11 14:53:30 IST 2012 :"SCHEDULED"}
Sun Nov 11 14:53:31 IST 2012 :"INTACT"} - application is running (4 min 16 s)
Sun Nov 11 14:53:32 IST 2012 :"INTACT"}
Sun Nov 11 14:53:33 IST 2012 :"INTACT"}
Sun Nov 11 14:53:34 IST 2012 :"INTACT"}
Sun Nov 11 14:53:35 IST 2012 :"INTACT"}

We noticed that it takes about 1-1.5 min to detect that the machine was destroyed and change the status from INTACT to BROKEN.
When I opened the bug, I stated that it takes a long time until the status moves to SCHEDULED, which happens only after about 3 min.
I am lowering the priority of this bug, since Alcatel-Lucent is satisfied as long as the status stops reporting INTACT, and that already happens within 1-1.5 min.
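The annotated durations in the log are minutes and seconds elapsed since the destroy request at 14:49:15. A small sketch that recomputes them from the timestamps of the first BROKEN, SCHEDULED and INTACT lines; the names `DESTROY_REQUESTED`, `TRANSITIONS` and `elapsed` are illustrative only.

```python
from datetime import datetime

# Destroy request time and first occurrence of each status, taken from the log above.
DESTROY_REQUESTED = "14:49:15"
TRANSITIONS = [
    ("14:50:32", "BROKEN"),
    ("14:52:21", "SCHEDULED"),
    ("14:53:31", "INTACT"),
]


def elapsed(start: str, end: str) -> str:
    """Return the time between two same-day HH:MM:SS stamps as 'M min S s'."""
    fmt = "%H:%M:%S"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    minutes, seconds = divmod(int(delta.total_seconds()), 60)
    return f"{minutes} min {seconds} s"


for stamp, status in TRANSITIONS:
    print(f"{status:<9} after {elapsed(DESTROY_REQUESTED, stamp)}")
```

This prints 1 min 17 s for BROKEN, 3 min 6 s for SCHEDULED and 4 min 16 s for INTACT. The first figure (about 1.28 min) is the 1-1.5 min detection window mentioned above.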

Assignee

Unassigned

Reporter

Meron Avigdor

Labels

Priority

Medium