One of the big things I had to get done by mid-September was a data migration project: decommissioning an EMC CX4-960 array after moving all of its data to a VNX 7500 system. This took many months for several reasons:
- At the beginning, we wanted to run a full round of testing on different VNX drive configurations, since the VNX gave us more options than were available on the old CX4 (and we wanted to be quite certain of performance before committing).
- It took some time to move the data as non-intrusively as possible. Only once were several production server reboots necessary (to upgrade HBA drivers and multipathing software) before those servers could connect to the VNX.
- A whole lotta little daily projects and firefights came up along the way.
Fortunately, I work with a great bunch of people on the data migration, and once a date was set for the CX4 to be powered down, we worked together to make it happen. It was a great feeling to turn that sucker off… well, after the lurking, nagging doubts subsided (“Wait, wait, am I absolutely SURE there’s nothing else connected to the array?!?”). I learned a lot about EMC array-based migration tools like SAN Copy and MirrorView, and especially about how to automate them with NaviCLI scripting! That knowledge will come in handy soon: I’m currently working on the second part of this VNX migration project, migrating a second older CX4 to a second new VNX 7500. Fun times!
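To give a flavor of the NaviCLI scripting, here is a minimal dry-run sketch that only *prints* the `naviseccli` commands it would run, so nothing touches an array. The SP address and session names are placeholders I made up, and the exact `sancopy` subcommand syntax is an assumption on my part; check `naviseccli sancopy -help` on your FLARE/VNX OE release before running anything like this for real.

```shell
#!/bin/sh
# Dry-run sketch: build and print the naviseccli commands that would kick off
# a batch of incremental SAN Copy sessions. Nothing is executed against an
# array -- the commands are only echoed for review.
# SP_IP and the session names below are hypothetical placeholders, and the
# sancopy subcommand syntax is an assumption; verify with `naviseccli sancopy -help`.

SP_IP="192.0.2.10"                # storage processor address (placeholder)
SESSIONS="db_lun_04 app_lun_07"   # hypothetical incremental SAN Copy session names

CMDS=""
for s in $SESSIONS; do
    # Mark the source first so the incremental copy has a consistent
    # point-in-time, then start the session.
    CMDS="$CMDS
naviseccli -h $SP_IP sancopy -mark -name $s
naviseccli -h $SP_IP sancopy -start -name $s"
done

# Review the batch before ever pasting any of it into a real session.
printf '%s\n' "$CMDS"
```

Building the command list first and printing it, rather than executing directly, was the kind of paranoia that paid off on a project like this: you get to eyeball every session name before anything moves.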
---
The other huge project in September involved removing two big UPS systems from our computer room and replacing them with newer ones. It started with a meeting between the electricians, the UPS company rep, and several of us from the company. Since the outage was going to take several days, we decided early on to do it over a weekend to minimize impact, and we scheduled it a few months out. Because of how our computer room is powered, we were going to lose power to half of the hardware in the room twice… once while the power feed was cut over to our generator, then a second time to bring power back over to the UPS lines. My group was responsible for identifying the systems that would have to be gracefully powered down before their power was cut. Wow, that took some time, because we haven’t been the best at, er, keeping internal documentation up-to-date and the power sockets all labeled (but it’s much better now! :D).
Once the hardware was identified, all the other groups at work that used those systems had to go over the list and decide the order in which they would be brought down. To date, this was probably our biggest internal change affecting the largest number of people in our office (save the times we’ve physically moved from one office space to another). Our Change Management group took on this part of the project and did a great job organizing the sequence of events and all the inter-dependencies. It took MANY meetings and much explanation and clarification along the way to be sure everyone was fully aware of and understood the extent of this outage.
All of that organization was worth it. Starting Friday morning, non-essential systems were taken offline. This progressed through the day with orderly shutdowns of the affected components, and all power circuits for the UPSs were shut off by 7pm. The electricians moved those circuits over to generator power, and we got everything started up again in reverse order. Of course there were a few minor hiccups, but overall it went very smoothly. One thing kept going through my mind, over and over: “No surprises!” If I had missed something major early in the project, like misidentifying what hardware was on which circuit, we would have had an UNplanned outage… so I was breathing a lot easier by this point.
By Sunday morning the electricians and UPS engineers were ready for us to cut over our power from generator to the new UPS power. They were about eight or so hours ahead of schedule so people got called in early and the whole shutdown/startup process happened again. Again, probably a few hiccups, but it seemed to go swimmingly.
It was quite a feeling of accomplishment to be involved with so many people working together in such coordination to make a huge project come together. There were some who put in a lot more hours than I did on this, and I hope they got the credit they deserved. I feel that sense of accomplishment every time we work on things like this… but this one felt an order of magnitude larger. I’m certain there will be more like it on the horizon, but it’s sure nice that they don’t happen very frequently!