Monday 10 May 2010

Know your underlying infrastructure

So, you're looking to deploy and support a Windows HPC Server cluster, but you have a sneaking suspicion that there's more under the covers than you bargained for. What do you know, your instincts are correct, and you're suddenly in a world of Microsoft technologies which you should at least be aware of.
The good news is (and this is a big advantage) that these underlying technologies are common, and it may be that your company/organisation has experience in those areas, for example in a corporate IT team. If those skills can be leveraged for your Windows HPC deployment, then it's all gravy - you can stick to what you do best, eke out tip top performance, and write some killer submission scripts for your cluster users.
But wait, what if you have no such skills in house? Well, here's a quick rundown of some of the things involved...


Windows Server 2008
The base Operating System. Note that Windows HPC Server can run on various editions of the OS (e.g. HPC Edition, Standard Edition, Enterprise Edition), which should show that the HPC Pack is a separate entity to the OS. Licence-wise, it's important to note that 2008 HPC Edition may only be used to run Windows HPC related technologies - no chance of saving a few notes by using it as a base OS for your corporate Exchange system ;)
When thinking about the OS, pay particular attention to driver versions and settings. Use the built in reliability, performance & logging tools to your advantage.


Active Directory
There's no getting round the Active Directory thing. It's at the core of everything Windows HPC Server does, from deployment to running jobs to data authorisation. There are several potential options here depending on your environment. If you have a corporate AD I would strongly suggest that you work with your corp IT guys to integrate the cluster. This type of configuration will oil the wheels of progress significantly. Reusing an existing authentication regime, one in which users already have accounts set up, can save a bunch of user admin overhead. If this is not an option, it's worth spending at least a bit of time pondering your AD architecture. It's nice and easy to simply promote your headnode to a domain controller, but I would suggest that you also run up another, separate Domain Controller, as losing your AD can be a royal PITA to recover from.
Either way, take some time to get this bit right and to learn the basics of AD operation, and you'll likely see payback in future.


SQL Server
I'm planning to dive into SQL Server a bit deeper in another post, but suffice to say it's well worth picking up some knowledge in this area.


Windows Deployment Services
WDS provides the platform for the super slick node deployment mechanism within Windows HPC Server. It's wrapped so nicely that you may never need to poke about under the covers, but definitely pay attention to imagex, diskpart, and the \\<headnode>\reminst share and its contents.


DHCP and DNS
OK, so these are not necessarily Windows specific, but getting your DNS and DHCP knowledge down is very useful. I'm going to write about network configuration in another post, so will try to include some DNS / DHCP tips there too.


RRAS
Routing and Remote Access Services is an umbrella for a bunch of useful Windows features. These include dial-up; VPN (both client-to-server and site-to-site); IP subnet routing; and Network Address Translation (NAT). In the case of Windows HPC Server the NAT part is of interest. It plays an integral part in the operation of those network topologies which rely on compute nodes only having a connection to the private network. In these cases traffic destined for hosts on subnets other than the private network travels via the headnode (as gateway), and is NATed out onto the enterprise network. This may be pertinent e.g. for Windows Updates and the like.
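
To make the routing decision a bit more concrete, here's a rough Python sketch of how a compute node on a private-only topology decides where a packet goes. The subnet (10.0.0.0/24) and headnode address (10.0.0.1) are made up purely for illustration, and the function name is hypothetical; this is not how RRAS itself is configured, just the logic it supports.

```python
import ipaddress

# Hypothetical addressing, chosen only for illustration.
PRIVATE_NET = ipaddress.ip_network("10.0.0.0/24")
HEADNODE_PRIVATE_IP = ipaddress.ip_address("10.0.0.1")


def next_hop(destination: str) -> str:
    """Describe how a compute node on the private network reaches a destination."""
    dest = ipaddress.ip_address(destination)
    if dest in PRIVATE_NET:
        # Same subnet: delivered directly, the headnode is not involved.
        return "direct"
    # Any other subnet: sent to the headnode, which NATs it onto the enterprise network.
    return f"via {HEADNODE_PRIVATE_IP} (NAT)"
```

So a call like next_hop("10.0.0.42") stays on the private network, while anything else (a Windows Update server, say) goes out through the headnode.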


Windows Failover Clustering
If you're going for a High Availability head node solution, take some time to become acquainted with how Windows Failover Clustering works: the behaviour and types of shared disks; which quorum model to use (disk witness, file share witness or node majority); Failover Cluster network configuration; cluster resource DNS registration; and verification and support of cluster components. This is a big subject in itself, and an awareness of how the technology works is important.
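
As a taster of why the quorum choice matters, here's a small Python sketch of the majority arithmetic behind it. It assumes the simple model where every node (and a witness, if you have one) gets one vote and the cluster stays up only while more than half of the votes are present. The function names are my own, not anything from the Failover Clustering tools.

```python
def votes_needed(total_votes: int) -> int:
    """Smallest number of votes that is strictly more than half."""
    return total_votes // 2 + 1


def has_quorum(total_votes: int, votes_online: int) -> bool:
    """True while enough voters are online for the cluster to keep running."""
    return votes_online >= votes_needed(total_votes)


def tolerated_failures(total_votes: int) -> int:
    """How many voters can be lost before quorum is lost."""
    return total_votes - votes_needed(total_votes)
```

Under that assumption, a two node cluster on node majority alone gives tolerated_failures(2) == 0, whereas adding a witness vote gives tolerated_failures(3) == 1, which is the usual argument for a witness in a two head node setup.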
