Wayne D. Roesner


15331 W 49th Ave    ◆    Golden, CO 80403    ◆    (563) 505-3489    ◆    wayne@wayneroesner.com

 
 

Stress Tolerance
Maintaining stable performance under pressure or opposition (such as time pressure or job ambiguity); relieving stress in a manner that is acceptable to the person, others, and the organization.

“Tell me about a situation (personal, academic or professional) in which you were responsible for planning and organizing an event.”

Hint: How did you get the assignment? How did you approach the task? How did you keep track of things? What tools did you use (to-do list, organizer, etc.) to help you? What was the first thing you did? What steps followed? How did you feel when the event took place?


08/11/2009 Worst Catastrophic Failure (Down three days)


Situation:

On Tuesday, August 11, 2009, at 3:49pm, I was away on vacation when I received a call from employee #1, who said, “ALL SYSTEMS ARE DOWN. Employee #3 went home right after it happened, and employee #2 is trying to get their server up and running. What do you want me to do?”

Task:

Immediately notify upper management, evaluate the situation, resolve the issues, and make sure everything was running again.

Action:

I cancelled my vacation plans, took my family home, and made it to work by 4am the following morning. Emp #1 and emp #2 were still trying to work out what had happened. I reviewed the situation with them, and as a team we found that the virtual disk containing the virtual drives for all the servers had been removed. I contacted emp #3 and asked them to come in right away to help. I also contacted our support vendor for the Compellent, requesting help with a possible restore. It did not look promising.

In the meantime, I created a new virtual disk and drives so we could start rebuilding the servers if the restore did not work. Emp #1 and emp #2 were then able to start creating new servers so I could restore from backups. The Compellent restore did not work; the virtual disk had been deleted. Emp #3 came in and said, “I did not do anything, the consultant told me to do it.” I stated that I was not looking for blame: we needed to get the issues resolved and get production and the company back up and running ASAP, and we would review the whole situation after things were back up.

By 4pm Wednesday the servers were ready for restore. All of them restored properly except one: Fourth Shift, the Enterprise Resource Planning (ERP) system. I worked with emp #2 to find the problem. The server application had become corrupted, but the data was OK. Emp #3 copied the ERP CD so the application could be reinstalled (a 6-hour process). I notified upper management of the status and said we should be up by Thursday morning. Since everything else was running, I sent emp #1 and emp #3 home. I stayed with emp #2 to offer any support I could during the reinstall and to recover the data once the application was reinstalled.

The system was up and running by 6am, but when emp #2 started testing, strange things were happening. Emp #2 contacted the ERP vendor, who had never seen the issue. After another full day of being down, we realized that the copy of the CD was incomplete and was missing some .dll files. We made a new copy, reinstalled, recovered the data, and tested. The server was running again by 10am Friday. I sent emp #2 home. I then had an overview meeting with upper management and explained that I would document all the issues that had happened and what I would do to make sure they did not happen again. I left at 1pm.

Result:

The GOOD NEWS: the backup and recovery process worked as planned. Other departments tested their disaster recovery plans in case the backups did not work; they found flaws in their processes and resolved them. All departments also had time to clean up their areas. One thing we had not been aware of: the ERP vendor stated that the application has to be installed, since it looks at machine IDs and MAC addresses when installing.

The BAD NEWS: the company was down in ALL forms, with a loss of at least $1M from the downtime. Updated procedures were put in place that require at least two people to be aware of any upgrade and the steps involved, and require the vendors to validate the steps before they are carried out.