Monday, December 1, 2014

Upgrading Dell PowerEdge R710 firmware without an OS installed (how hard could it be?!)

I've been automating the firmware update process for the Dell PowerEdge R710-series of servers. The intent of this automation is to ensure that all servers in the data centre have the exact same firmware levels, and to ensure that the automated installation of VMware ESXi on the servers will successfully complete without human intervention.
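One way to check the "exact same firmware levels" goal is to compare firmware inventories across servers and flag any component that isn't at the same version everywhere. Here's a minimal Python sketch. It assumes you already have each server's inventory as a dictionary mapping component name to version string. The inventory format and the `find_drift` name are my own and are not a Dell tool or API. Collecting the inventory is left out.

```python
from collections import defaultdict

# Hypothetical inventory shape: {server_name: {component_name: version_string}}.
Inventory = dict[str, dict[str, str]]


def find_drift(inventory: Inventory) -> dict[str, dict[str, list[str]]]:
    """Return {component: {version: [servers]}} for components not uniform across the fleet.

    A server that lacks a component is recorded under the version "<missing>".
    """
    all_components = {c for components in inventory.values() for c in components}
    by_component: dict[str, dict[str, list[str]]] = defaultdict(lambda: defaultdict(list))
    for server in sorted(inventory):
        for component in all_components:
            version = inventory[server].get(component, "<missing>")
            by_component[component][version].append(server)
    # Keep only components where more than one distinct version was seen.
    return {
        component: dict(versions)
        for component, versions in sorted(by_component.items())
        if len(versions) > 1
    }


# Illustrative values only.
fleet = {
    "r710-01": {"BIOS": "1.2.3", "iDRAC6": "1.98"},
    "r710-02": {"BIOS": "1.2.4", "iDRAC6": "1.98"},
}
print(find_drift(fleet))  # {'BIOS': {'1.2.3': ['r710-01'], '1.2.4': ['r710-02']}}
```

An empty result means every server reports the same version for every component.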

Before automating this process, I first had to understand how the manual Dell firmware update process was performed. I was disappointed to find that the firmware update process for Dell servers was poorly documented, not reliably reproducible (the anathema of scripting and process automation) and simply downright buggy.

The process was not as straightforward as I thought it would be: how hard could it be to update the firmware of a commodity Dell server? Well, it turns out that many Dell R710s ship with an expired Lifecycle Manager certificate, which prevents the application of Dell updates signed after a certain date! The process involved:

1) Updating the iDRAC firmware
2) Updating the expired Lifecycle Manager certificate using a Lifecycle Manager Repair Package
3) Updating other firmware within the server

There are bugs in the installation of Dell Update Packages (DUPs). If at first the DUP doesn't apply, just try again! I've pointed out where this occurs to help you script around it. It's fairly disappointing from Dell: after eleven generations of servers, Dell still haven't figured out how to streamline the firmware update process. Oh well. If Nutanix eats your lunch, don't act surprised.

To proceed, you'll need to have made an update repository using the Dell Repository Manager.

Step 1. Download the latest iDRAC6 firmware

If you go to the iDRAC6 page on the Dell TechCenter, you'll have a choice between downloading a monolithic or blade version of the iDRAC firmware. Because you are upgrading firmware on an R710 (a rackmount server), you'll want the monolithic version. Monolithic is Dell's term for a standalone server, as opposed to a blade server.

The latest version of the Dell iDRAC 6 is v1.98 and the filename is firmimg.d6. You can download it here.

Step 2. Download the Lifecycle Manager Repair Package (only for Dell R710)

If you have a Dell PowerEdge R710, the certificates used by the Dell Lifecycle Manager may have expired. Lifecycle Manager is a component on Dell servers that manages the application of firmware updates to the BIOS, motherboard, network adapters, et cetera. If you try to apply any updates without applying the Lifecycle Manager Repair Package, you'll get the error message "The updates you are trying to apply are not Dell-authorized updates."

The latest version of the Dell Repair Package is V 1.5.5, A0 and the filename is BDF_1.5.5_BIN-12.usc. You can download it here.

Step 3. Update the iDRAC firmware

The iDRAC firmware needs to be updated to at least 1.97 so the Lifecycle Manager Repair Package can be applied. Updating iDRAC firmware can be done remotely or via the console (if you feel like freezing to death in your data centre/server closet/broom closet).
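If you're scripting this, it's worth gating the repair step on the iDRAC version rather than assuming the update worked. Here's a minimal Python sketch. It assumes the version is available as a string such as "1.98", possibly followed by extra text like a build number (that format is an assumption). Versions are compared as tuples of integers, because comparing version strings as text can misorder them: as text, "10" sorts before "9". The function names are my own.

```python
import re


def parse_version(text: str) -> tuple[int, ...]:
    """Extract the leading dotted number from text, e.g. "1.98" -> (1, 98)."""
    match = re.match(r"\s*(\d+(?:\.\d+)*)", text)
    if match is None:
        raise ValueError(f"no version number found in {text!r}")
    return tuple(int(part) for part in match.group(1).split("."))


# Minimum iDRAC6 firmware this post says is needed before the repair package.
MIN_IDRAC_FOR_REPAIR = (1, 97)


def idrac_ready_for_repair(current_version: str) -> bool:
    # Tuples compare element by element, so (1, 98) >= (1, 97) is True.
    return parse_version(current_version) >= MIN_IDRAC_FOR_REPAIR


print(idrac_ready_for_repair("1.98"))  # True
print(idrac_ready_for_repair("1.96"))  # False
```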

Step 3.1. Log into the iDRAC

If you don't know the password for your Dell iDRAC, try the default password combination.
Username: root
Password: calvin
I'm not sure who Calvin at Dell is. I might check on LinkedIn later when I'm waiting 40 minutes for a firmware update to complete.

Step 3.2. In the iDRAC, click iDRAC Settings (in the left menu bar)

On this page, verify the iDRAC firmware version.

Step 3.3. Click on the Update tab

For the record, Google Chrome on Mac works for uploading files.

Step 3.4. Select the iDRAC update package.

Click Choose File, and select the iDRAC update package downloaded in step 1.
The latest iDRAC 6 update package should be called firmimg.d6.

Step 3.5. Confirm the old and new version, then click Next

Verify that the New Version is newer than the Current Version, then click Next.

Step 3.6. Wait for the iDRAC Firmware Image to be updated

This typically takes less than 5 minutes. After the iDRAC firmware is updated, the iDRAC will restart and may become unresponsive for a minute. You will need to log in again.
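When scripting this, rather than sleeping for a fixed time, you can poll until the iDRAC answers again. Here's a minimal Python sketch. It assumes that a successful TCP connection to the iDRAC's HTTPS port (443 by default) is a good enough signal that the web interface is back. That's an assumption: a port that accepts connections doesn't guarantee you can log in yet. The `sleep` and `clock` parameters exist only so the loop can be tested without waiting.

```python
import socket
import time
from typing import Callable


def tcp_port_open(host: str, port: int = 443, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, unresolvable host, etc.
        return False


def wait_until(
    probe: Callable[[], bool],
    timeout_s: float,
    interval_s: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Call probe() every interval_s seconds until it returns True or timeout_s passes."""
    deadline = clock() + timeout_s
    while True:
        if probe():
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)
```

Usage would look something like `wait_until(lambda: tcp_port_open("idrac-r710-01.example.local"), timeout_s=300)`, where the host name is made up.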

Step 3.7. Verify the new iDRAC version has been installed

Once the firmware update is complete, log into the iDRAC again and verify that the current iDRAC version matches the new version.

Step 4. Repair the Lifecycle Manager (for R710 only)

Updating the Lifecycle Manager will allow you to apply firmware updates to the rest of the system. You must have an iDRAC firmware version of at least 1.97 to continue.

Step 4.1. Upload the Lifecycle Repair Package

In the iDRAC interface, go to the Firmware Update screen and upload the Lifecycle Repair package. The filename should be BDF_1.5.5_BIN-12.usc.

Step 4.2. Confirm the package name

The package name should be System Services Recovery Image. Click Next to continue.

Step 4.3. Confirm upload

Click OK to proceed with the update.

Step 4.4. Wait for the Lifecycle Manager to update

It is common for the update to be stuck at 10% for approximately 3-4 minutes.

Step 4.5. If the upload fails, restart the iDRAC.

It is common for the update to fail. If it does, try applying the update again: it is not uncommon for the update to take 3-4 attempts. If applying the update still fails, restart the iDRAC and try again. The link to restart the iDRAC is in the Quick Links section on the System Summary page.
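That advice translates into a retry loop with an escalation step. Here's a minimal Python sketch. `apply_update` and `restart_idrac` are hypothetical callables standing in for however you drive the iDRAC (browser automation, a command-line tool, etc.). They are not Dell APIs. The default of four attempts per round is simply taken from the 3-4 attempts mentioned above.

```python
from typing import Callable


def apply_with_retries(
    apply_update: Callable[[], bool],
    restart_idrac: Callable[[], None],
    attempts_per_round: int = 4,
    max_restarts: int = 1,
) -> bool:
    """Retry apply_update(); if a whole round fails, restart the iDRAC and try another round.

    apply_update returns True on success. restart_idrac should not return until
    the iDRAC is reachable again. Both are placeholders, not real Dell calls.
    """
    for round_number in range(max_restarts + 1):
        if round_number > 0:
            restart_idrac()  # escalate only after a full round of failed attempts
        for _ in range(attempts_per_round):
            if apply_update():
                return True
    return False
```

Returning False rather than raising leaves the decision (alert, skip the server, give up) to the caller.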

Step 4.6. Complete update

When the update is complete, leave the iDRAC open. You may need to use it.

Step 5. Update the remainder of the server firmware

Lifecycle Controller allows you to update the other firmware in the server. This includes:
  • Diagnostic utilities
  • Dell Lifecycle Controller
  • BIOS
  • PERC 6/i Integrated (Embedded)
  • Broadcom NetXtreme II Gigabit Ethernet (Embedded)
Here's an image of the typical firmware components that can be upgraded on a Dell server.

Step 5.1. Boot the server to the Unified Server Configurator

When the server is booting, press F10 to boot to the Unified Server Configurator. Dell also labels this as System Services.
If you have pressed F10 in time, you will see the message Entering System Services. To cancel, enter the IDRAC6 Configuration Utility
You can skip the memory test by pressing Esc.

Step 5.2. Wait for Unified Server Configurator to start

This can take several minutes.

Step 5.3. Start the Platform Update

If you see the message "Warning: A system update is recommended since some components are potentially out of date. Please go to Platform Update to view and run available updates.", ignore it. It always appears due to a bug in the way Dell compares version numbers for the PERC 6/i.
At the Unified Server Configurator screen, click Platform Update.

Step 5.4. Launch the Platform Update

On the Platform Update screen, click Launch Platform Update.

Step 5.5. Select the update repository source

If you have a small number of servers (fewer than five), it is easier to update via USB. Updating via an FTP server or network share is possible, but introduces complexity: appropriate network connectivity and credentials need to be configured.

Step 5.6. Select the source

You need to have a repository file or folder that contains all the Dell updates relevant to your server. Repositories are created using Dell Repository Manager.

Step 5.7. Confirm use of the existing catalog file

This error is normal and will appear for any ISO created by the Dell Repository Manager. Click Yes to continue.

Step 5.8. Wait for the image to be verified

This can take up to 2 minutes. They're not lying.

Step 5.9. Review the list of firmware updates to be applied

When you have reviewed the list of firmware updates being applied, click Apply to begin.

Step 5.10. Wait for all Dell Update Packages (DUP) to be copied and verified

Step 5.11. Wait for the updates to be applied

This can take up to 45 minutes. The elapsed time may freeze: this is normal. During this process, there will be multiple reboots. Do not interrupt the reboots. You may press Esc to skip the memory test during the reboots to speed up the process.

Step 5.12. Wait while the server reboots multiple times

During the reboots, the screen may be blank for several minutes. This is normal.

Step 5.13. Wait to be returned to the Unified Server Configurator screen

Wait to be returned to the Unified Server Configurator screen.

Step 5.14. Verify that all updates have been applied

When all updates have been applied, the server will return to the Unified Server Configurator screen. You can verify that updates have been applied by comparing the Current version with the Available version. These should be the same, with the exception of the PERC 6/i Integrated (Embedded). Due to a bug in the way Dell compares the versions, it will appear as requiring an update (the PERC 6/i reports its version as, while the update package has the version 6.3.3-0002, which it thinks is older). A message saying everything is up to date would have been nice, but hey, that'd require a focus on the user experience!
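If you transcribe (or scrape) the Current and Available columns, this check can be scripted. Here's a minimal Python sketch. It assumes each row is a (component, current, available) tuple. The row format and the `pending_updates` name are my own. The PERC 6/i entry is ignored only because of the false positive described above. Strings are compared exactly, mirroring the "these should be the same" check rather than trying to interpret version numbers.

```python
# Hypothetical row shape: (component, current_version, available_version).
Row = tuple[str, str, str]

# Components known to report a spurious pending update (the PERC 6/i, per this post).
KNOWN_FALSE_POSITIVES = frozenset({"PERC 6/i Integrated (Embedded)"})


def pending_updates(rows: list[Row], ignore: frozenset[str] = KNOWN_FALSE_POSITIVES) -> list[str]:
    """Return components whose Current and Available versions still differ."""
    return [
        name
        for name, current, available in rows
        if current.strip() != available.strip() and name not in ignore
    ]


# Illustrative values only.
print(pending_updates([("BIOS", "1.2.3", "1.2.3"), ("NIC", "1.0", "1.1")]))  # ['NIC']
```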

If all the updates have been applied successfully, click the Cancel button.

Step 5.15. Exit the USC

At the Unified Server Configurator screen, click Exit and Reboot to boot the server normally.

Step 5.16. Confirm the exit

Click Yes to exit the USC.

And there you have it: an updated Dell PowerEdge R710 server! Next step: automate it.

Tuesday, March 11, 2014

An irreverent look at VMware's Software-Defined Data Centre (SDDC)

The intent of this blog post is to explain the SDDC in plain language. I get a lot of questions about SDDC so I'll address them here in an irreverent manner and hopefully you'll find it entertaining or educational. Preferably both, but I'll settle for the former.

What the heck is the Software-Defined Data Centre?
The Software-Defined Data Centre is VMware's strategy for delivering data centre services as a set of capabilities implemented in software. In VMware's SDDC vision, compute is delivered with vSphere, networking with NSX, management with vCenter and vCloud, and storage with vSAN. The SDDC is distinctly different from competing data centre architectures where network capabilities (such as VLANs, security, load balancing, etc.) and storage capabilities (VMDK storage, storage replication, storage availability, etc.) are implemented in hardware.

The goal of the SDDC is to deliver a "fully automated, zero-downtime infrastructure for any application, and any hardware, now and in the future". While it is possible to deliver these goals in hardware (using orchestration and integration), VMware believe that software is a more appropriate mechanism and delivers higher levels of flexibility. And I tend to agree.

The SDDC consists of green and blue boxes.
Can I buy the SDDC?
The SDDC is a state your data centre can achieve, rather than a product. Don't worry, you'll be buying VMware licenses as your data centre matures from an SDDC 1.0 "basic virtualization" state to an SDDC 3.0 "Fully Cloud Ready" state. As you progress through your SDDC journey, you'll be buying licenses to unlock the capabilities your data centre requires (whether it be multi-tenancy, chargeback, self-service). If in doubt, just buy the vCloud Enterprise Suite.

Ignore the word "SAP" on the slide. I did and my life improved.
(From VMware Consulting blog article "SDDC + SAP = CapEx/OpEx Savings")
Is the SDDC cheaper?
In most cases, the SDDC will reduce and shift spending. Virtualization of servers and network devices can result in incredible reductions in capital and operational spending. For organisations transitioning to an SDDC model, network and storage infrastructure refresh spending will shift to vendors that support the SDDC. An example is Nutanix customers who have consolidated their storage and compute spending into "converged infrastructure" spending. Another example is Amazon Web Services (AWS) using SDN to slash a $1b Cisco spend to $11m.

Sorry Cisco.
The other benefit of the SDDC is the increased agility of the IT organisation: people can actually get the infrastructure they need, when they need it. A case could be made that the capability and flexibility of AWS is not feasible to implement in hardware.

Where does cloud fit in with the SDDC?
The SDDC is one method of achieving cloud. The NIST definition of cloud computing includes on-demand self-service, broad network access, resource pooling, rapid elasticity and measured service. As long as the service you provide has those qualities, you have a cloud (regardless of the underlying technology). In fact, it's entirely possible to implement "as a service" offerings without any virtualization at all (I'd hate to do it though!). VMware believe the easiest way for enterprises to provide a cloud-like service is to pursue an SDDC architecture.
There's more than one way to implement an SDDC architecture.
Let's not talk about the other ways.

Isn't the SDDC just server virtualization?
Server virtualization is one component of the SDDC and it's neat for delivering more virtual servers with less spending and management overhead. But the delivery chain is only as strong as the weakest link: delivering a server in 10 minutes is of no use if it takes two weeks for firewall changes to be applied to make the server active. Provisioning a VM is just one part of delivering usable infrastructure.

So to deliver the network quicker, we virtualize the network?
Yes. This is known as Software-Defined Networking (SDN). Generally speaking, data centre capabilities which exist purely in software are more flexible, simpler, easier to test and can be integrated more seamlessly than hardware-defined solutions. This is also true with networks: many existing network architectures are device-centric and don't easily provide the provisioning flexibility and ease of integration required to implement on-demand cloud services such as rapid spin-up and teardown of networks.

Because software-defined networking solutions aren't constrained by the physical network topology and are programmable, more flexible, cloud-style approaches to networking become possible. This enables data centres to become less device-centric and more service-centric. VMware's SDN product is called VMware NSX.

But network devices can be orchestrated to provide what I need!
An alternative to SDN is to use an orchestration system to orchestrate VM and network changes (for example, updating the perimeter firewall when a VM is provisioned or deprovisioned, or spinning up a new test network). If the orchestration system is implemented well, you'll get the same result as the SDDC: infrastructure services delivered quickly. If it isn't, you'll have a Rube Goldberg frankencloud. I'm not discounting the completeness or capability of physical network devices compared to SDN; I'm saying that SDN enables organisations to provide network capabilities (such as firewalls, site-to-site VPN and load balancing) in the hypervisor, which is more flexible and cost-effective, rather than in the physical network.

A market-leading orchestration platform.
Why should I virtualize storage?
While vSAN has amazing infrastructure benefits (which I'll outline in another blog post), the strategic importance of vSAN is that storage can be managed with the same flexibility and integration as compute. Storage today is a pain: storage administrators are struggling both to keep up with the amount of storage the data centre needs and to manage it. The introduction of "as a Service" IT models, which enable the business to consume IT more easily, makes this problem worse. Instead of trying to optimise your storage procurement, provisioning and management processes, vSAN allows you to manage storage the same way you manage your compute capacity. When you run out of storage, simply buy another server.

But storage can be orchestrated today using robust interfaces provided by storage vendors!
Yes, it can. The majority of storage vendors have SDKs you can use to integrate with orchestration or monitoring tools. If you already have this level of integration in your environment, you are already experiencing the benefits of the SDDC. If you are struggling with integration, or find that your home-grown integration doesn't deliver the feature completeness of out-of-the-box solutions such as vSAN, it may be worth pursuing another strategy. Implementing technologies (like VAAI and VASA) that bring storage closer to compute isn't as easy as it should be. With the amazing capabilities of SANs, it feels strange that configuring array integration requires reading 30-page guides, deploying vApps, creating service accounts, configuring certificates, etc. You don't need to worry about any of this with vSAN, or any hyper-converged infrastructure. It just works seamlessly.

I followed a 32-page guide, submitted two firewall change requests, one storage change,
and one VMware change so that the VASA provider would provide
a single concatenated string of disk capabilities. I guess it's a start.

Physical SANs are more fully featured than vSAN.
Horses for courses. Tradeoffs are involved with all data centre architectural decisions. In the majority of cases, choosing vSAN over a traditional physical SAN will involve a tradeoff between features and seamless integration. Some customers may consider the lack of a deduplication capability in vSAN to be a glaring omission. Other customers are willing to choose vSAN over a physical SAN for the ease of management. I expect that over time, VMware will make vSAN feature-competitive with physical offerings (as they already have with VMware NSX and physical networks).

How will I know when I achieve the SDDC?
The CEO of VMware will personally hand you a key which will unlock over 600 airport lounges worldwide. SDDC is the journey and delivery of IT as a Service is the destination. Just because a data centre uses an SDDC architecture doesn't mean it's any good; it could be atrocious!

There are all the other usual KPIs for measuring success: number of administrators per VM, current versus historical infrastructure spend, turnaround time on VM/firewall change requests, etc. A good barometer of your ability to deliver IT as a Service is the stress level of project managers whose projects require IT infrastructure. In every organisation I've worked in, project managers are acutely aware of the lead times for delivery of IT infrastructure. Buy them a coffee and ask what they think about delivery of IT infrastructure. Another barometer is whether your developers use Amazon Web Services. Buy them a coffee as well, but understand that they'll likely not admit to using AWS!