Thursday 29 April 2010

It's all in the name

I like names; they have a funny way of telling a story in a single word. Take my name for instance. I'm sure you read the word 'Dan' and assume that I'm a strong, intelligent, handsome guy who's no end of fun to hang out with, right? Or maybe it's the other way round, and the person defines your perception of a name. I mean, if all people named Dan are super cool, does that make the name Dan super cool?
Anyway, what I really want to talk about here is how Windows HPC handles name resolution, particularly across private and application networks.
First off, let's think a little about the general Windows host name resolution order, which runs like this:

1. Check the machine's own name.
2. Look in the local DNS cache (you can list entries using ipconfig /displaydns).
3. Check the local hosts file (C:\Windows\System32\Drivers\etc\hosts).
4. Append the DNS search suffix configured on the machine (if the name isn't already fully qualified) and query DNS.
5. WINS (NetBIOS name resolution).
6. Broadcast on the local subnet.
7. Check the local LMHOSTS file (C:\Windows\System32\Drivers\etc\lmhosts).

Pretty thorough I'm sure you'll agree.
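If you fancy poking at steps 2 and 3 yourself, a couple of standard Windows commands will do the trick (nothing HPC-specific here):

# Show the local DNS resolver cache (step 2 above)
ipconfig /displaydns

# Flush the cache if you suspect a stale entry
ipconfig /flushdns

# Take a look at the local hosts file (step 3)
Get-Content C:\Windows\System32\Drivers\etc\hosts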
Now let's look at this with our high performance and management hats on. We want an answer to our name resolution queries as quickly as possible, while ensuring consistency across all nodes in the cluster. We also need to resolve a host name to the appropriate address for the particular cluster network we're after.
Running through the resolution order above: we can ignore 1 for obvious reasons. 2 may be interesting performance-wise, but as the cache may not contain records for all nodes, it's an inconsistent choice. 3, hmmm, the good old local hosts file, sounds kinda antiquated and simplistic, don't you think? But it sits at number 3 on the list, crucially checked before DNS resolution. And maybe it can be managed by one of the HPC services running on all nodes? Oh, this is starting to sound decent. Just to be sure though, let's continue. 4 is our old friend DNS, which sounds like the way to go, but each DNS lookup can take a relatively long time. Good for management, but as good as a cluster-managed solution? Once we get to 5, 6 and 7 things are drifting off into desperation, so let's not say too much about those guys.

Well, what do you know, the hosts file seems to be a very good choice here, and lo and behold, that's how it works in practice! Check out this example hosts file taken from a handy dev cluster...

# Copyright (c) 1993-1999 Microsoft Corp.
# This host file is maintained by the Compute Cluster Configuration
# Management Service. Changes made to the file that match the netbios names
# for existing nodes in the cluster will be removed and replaced by entries
# calculated by the management service.
# Modify the following line to set the property to 'false' to disable this
# behavior. This will prevent the management service from making any
# further modifications to the file
# ManageFile = true

127.0.0.1                localhost
192.168.5.23             HPCDEV-HN02                    #HPC
192.168.100.11           HPCDEV-HN02                    #HPC
192.168.0.11             HPCDEV-CN001                   #HPC
192.168.0.10             HPCDEV-CN002                   #HPC
192.168.0.134            HPCDEV-CN003                   #HPC
192.168.0.1              HPCDEV-HN01                    #HPC
192.168.0.2              HPCDEV-HN02                    #HPC
192.168.0.3              HPCDEV-VHN01                   #HPC
192.168.1.1              HPCDEV-HN01                    #HPC
192.168.1.2              HPCDEV-HN02                    #HPC
192.168.1.3              HPCDEV-VHN01                   #HPC
192.168.5.22             Enterprise.HPCDEV-HN01         #HPC
192.168.5.23             Enterprise.HPCDEV-HN02         #HPC
192.168.5.28             Enterprise.HPCDEV-VHN01        #HPC
192.168.0.11             Private.HPCDEV-CN001           #HPC
192.168.0.10             Private.HPCDEV-CN002           #HPC
192.168.0.134            Private.HPCDEV-CN003           #HPC
192.168.0.1              Private.HPCDEV-HN01            #HPC
192.168.0.2              Private.HPCDEV-HN02            #HPC
192.168.0.3              Private.HPCDEV-VHN01           #HPC
192.168.1.11             Application.HPCDEV-CN001       #HPC
192.168.1.10             Application.HPCDEV-CN002       #HPC
192.168.1.134            Application.HPCDEV-CN003       #HPC
192.168.1.1              Application.HPCDEV-HN01        #HPC
192.168.1.2              Application.HPCDEV-HN02        #HPC
192.168.1.3              Application.HPCDEV-VHN01       #HPC

This file reflects the current addressing of an HA head node cluster configuration with three compute nodes, using network topology 3 (compute nodes isolated on private and application networks). It's interesting to note that for networks other than private and application (in this case the failover cluster heartbeat and enterprise networks), only the active head node's addresses appear in the plain, unprefixed format (HPCDEV-HN02).
Check out those funky Enterprise., Private. and Application. entries. These network-prefixed names allow the cluster service to be very specific in its address resolution requests, assuring it will always get back the address on the appropriate network.
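You can see this in action from any node. A quick sketch using the entries above (the node names are just the ones from this example file):

# Resolves via the hosts file to the private network address (192.168.0.11)
ping Private.HPCDEV-CN001

# Same node, but this time the application network address (192.168.1.11)
ping Application.HPCDEV-CN001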

But what if you want to host non-HPC-Server-managed machines on your private network, therefore requiring hosts to register in DNS (they do not by default)? Well, you can use the awesomeness that is PowerShell...
Set-HpcNetwork -PrivateDnsRegistrationType WithConnectionDnsSuffix
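If you want to sanity-check the setting before and after, something like this should do it (a quick sketch, assuming the HPC PowerShell snap-in is installed on the machine you're running it from):

# Load the HPC cmdlets into the session if they aren't already there
Add-PSSnapin Microsoft.HPC

# List the cluster networks and their current configuration, including
# the private network DNS registration settings
Get-HpcNetwork | Format-List *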

One thing to beware of: check out the warning at the top of the hosts file. If you set ManageFile = false and then manually alter entries previously managed by HPC Server, things may get a little broken.

Monday 26 April 2010

The business is (nearly) always right

I don't know about you, but I enjoy kicking the tyres of new products in my sphere. The only problem is that it's often quite difficult to find time to dedicate to the simple pleasure of learning something new, as real work (and of course family) tend to, um, get in the way! On balance I can see that employers are quite justified in their time demands, and naturally the missus and son are top of the list :) So how to strike that perfect balance and find time to follow up interesting developments?
Well, I have a few manoeuvres here:
1. Report, report, report. Managers like to know what's going on, so show them the money. Graphs, charts and pictures are particularly well received, and can buy you some play time. Speaking of reporting, if you've found some time to look into a new piece of tech, it's good to let the boss know your thoughts on it. Spread the word!
2. Stress the business benefits. Management types don't necessarily know anything about IOPS, or whether more FLOPS is a good or bad thing. What they will understand is that something might increase productivity, or even better save money.
3. Show those guys a good time... I'm talking about the family here of course ;)

Friday 16 April 2010

Digging through the versions

One thing I enjoy about Windows HPC Server (and previously CCS) is a good discussion on versions, naming and compatibility. It's a veritable cauldron of confusion!
Here's my quick naming and compatibility matrix for head nodes / compute nodes, hope it helps more than hinders :)

HPC Pack version first, then which role(s) each OS version can host:

2003 Compute Cluster Pack (+SP1)
- Windows Server 2003 x64 Std / Ent Edition (+SP2; +R2): head node and compute nodes
- Windows Server 2003 x64 HPC Edition (+SP2; +R2): head node and compute nodes
- Windows Server 2008 and 2008 R2 (all editions): not supported

2008 HPC Pack (+SP1)
- Windows Server 2003 x64 (Std / Ent and HPC Editions): not supported
- Windows Server 2008 Std / Ent Edition x64 (+SP2): head node and compute nodes
- Windows Server 2008 HPC Edition x64 (+SP2): head node and compute nodes
- Windows Server 2008 R2 (Std / Ent and HPC Edition BETA): not supported

2008 HPC Pack R2 BETA
- Windows Server 2003 x64 (Std / Ent and HPC Editions): not supported
- Windows Server 2008 Std / Ent Edition x64 (+SP2): compute nodes only
- Windows Server 2008 HPC Edition x64 (+SP2): compute nodes only
- Windows Server 2008 R2 Std / Ent Edition: head node and compute nodes
- Windows Server 2008 R2 HPC Edition BETA: head node and compute nodes

There's more of this type of thing surrounding SDK and Client component versions, I'll post about that later...
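In the meantime, if you're not sure which HPC Pack bits a given node is running, one quick and dirty check is the HPC registry key. This is a sketch based on my own boxes, so treat the key contents as an assumption and explore what you find:

# Dump everything under the HPC key (present on machines with
# CCP / HPC Pack components installed) and look for version values
Get-ItemProperty 'HKLM:\SOFTWARE\Microsoft\HPC' | Format-List *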

Tuesday 13 April 2010

Don't forget the basics

I have a few rules which I try to abide by in my working life, one of which is 'Keep it simple, stupid!'. There is a great deal to be said for cutting out overcomplicated and unnecessary configuration, particularly when it comes to troubleshooting an existing system.
Don't get me wrong, I enjoy the cut and thrust of deep technical development and configuration as much as the next (slightly geeky) guy, but experience has shown that the business will thank you for a well designed, simple yet efficient solution, and that there are diminishing returns but increased risks as complexity increases.
This isn't to say that some services don't need in-depth configuration, more that it's important to know when to stop.

Friday 9 April 2010

Windows HPC Server 2008 R2 Beta 2

It's been out for a short while, but the HPC team blog has just announced the availability of Windows HPC Server 2008 R2 Beta 2. I've been running it on a test rig and first impressions are extremely favourable. I'm particularly liking the additional diagnostic features, and the inclusion of node image capture will save a bit of manual imagex stress. Some of the new scheduling features (particularly the enhanced activation filter) are looking really nice too!
One other thumbs up for R2 is the ability to use a remote SQL Server instance. There are quite a few technical and management reasons why this is a great addition, and the financial benefit when running HA head nodes in an environment where a SQL cluster is already in place is an obvious plus point.

Try to get to SuperComputing.

I was fortunate enough to attend the SuperComputing conference in Portland, OR last year, and can honestly say it was a blast. I've been to several similar large events in my time, and SC09 was right up there amongst the best. I'm not sure if it was the location (I loved Portland, what an awesome city!), the people (hung out with some fun individuals), or the conference itself, but it was fantastic all round.
I'll be hoping to get out to New Orleans for SC10, maybe I'll see you there...

Wednesday 7 April 2010

Do you really need it?

The world of HPC is studded with cutting-edge technology, all of it at a price. The question is: do you need it?
This sounds like a simple issue, but in reality it's often quite a challenge to determine whether something will provide increased performance for your (or your users') jobs.
To cut a long story short, the most useful thing you can do to come up with the answer is to determine the requirements of your problem space. Are your tasks particularly reliant on disk I/O? Do you have a sensitivity to latency? Maybe your cluster is used to run lots of single-processor jobs and doesn't require high MPI performance at all. Once you know this, it's easier to cut through the admittedly shiny technology and focus on what would be of most benefit to your environment.