Scaling out SharePoint using VMware ESX Server
Now here's the thing: I love SharePoint Portal Server, and I love VMware ESX Server. This combination just allowed me to seriously scale out my SPS infrastructure without having to entirely rebuild the server farm. However, there were one or two little gotchas along the way, hence this blog post!
First, the challenge:
I have a 'small farm' SharePoint environment with the web front end, search, index, and job components running on one server and SQL 2000 on another. This works fine until the web front end server dies for any reason, or is in need of patching. So, I wanted to upgrade my small farm topology to a medium-large farm design. I did not, however, want to entirely rebuild my SharePoint infrastructure; instead I wanted to leverage the benefits of virtualisation technology to take a short-cut.
The objective, therefore, is to extend the topology to two web/search servers plus a dedicated index & job server. The database server would remain untouched.
Please note: this guide presumes that you are an expert administrator for Windows Server 2003, SharePoint Portal Server, SQL Server, and VMware ESX 3. If not, gather the right people and resources because you will need help!
The entire farm is to be hosted on my VMware ESX 3.0.2 host.
Well that's easy, I hear you say, just clone off the current web server and create a network load balanced cluster. Very true; that's exactly what I've done, but you do need to watch your step...
- Get the current web server (esx01) into a very clean and stable state. Apply all latest patches and ensure there are no lingering issues in the event logs.
- Fully document your search configuration, including indexes, search scopes, content rules, crawling accounts, content sources, etc. Much of this will be lost, I'm afraid. (Don't blame me, I'm just the messenger!)
- Ensure that you have 'newsid.exe' available on the local disk. (Available from Microsoft SysInternals)
- Shut down this server.
- Take a 'snapshot' of the server using the VMware Virtual Infrastructure client. (This will be our roll-back in the event of catastrophic failure later on.)
- Clone the web server to a new virtual machine. (esx02)
- Use the VI client to add a new NIC to both machines. (NLB requires two NICs)
- Power up esx01 and install the new NIC. Ensure that you change the network settings such that it is bound only to TCP/IP, it has a static IP, and has the 'Register with DNS' check box cleared.
- Follow the Windows Server guidelines for configuring an NLB cluster and attach this first host to the new cluster.
- Change the DNS for your SharePoint portal to point to the new cluster IP instead of the old machine IP address.
- Verify that your portal site still works. If not, debug it now since it will only get harder later.
- Use the VI client to disconnect esx02 from the network. (Both NICs)
- Power up esx02 and log on as local Administrator.
- Install the new NIC as above but do not attempt to add it to the cluster. (After all, you should not be connected to the network!)
- Run newsid.exe to create a new SID and rename the machine.
- Reboot the server (newsid will do this for you)
- Connect the network adapters via the VI client.
- Grab a coffee and take a breath; you should now have two machines with different names and SIDs, and two IP addresses each. If not, walk back up the list and find out where you went wrong.
- Before we connect esx02 to the network we also need to change the SharePoint ServerID. This is a GUID used only by SharePoint to identify machines on the farm. Why they didn't use the SID or machine name escapes me, but this one took a little while to figure out.
- Open SQL Enterprise Manager and locate your configuration database.
- Open the 'Servers' table and note the list of servers and their ServerIDs.
- On esx02 open regedit (all the usual disclaimers apply) and locate the HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\SharePoint Portal Server key.
- Increment the data value of the 'ServerID' parameter. (Check to ensure this value is not listed in your 'Servers' table - it's most unlikely. Also, manually creating this GUID is pretty safe since it is only created and used by SharePoint.)
- Now it's safe to connect this machine to the network.
- Once connected you will need to add the server to the domain. I usually follow this process although there are other ways to do it.
- Open 'Start > My Computer > Properties > Computer Name > Change' and set the membership to 'Workgroup' (I usually name it "Local"). Enter domain admin credentials to complete the change.
- DO NOT reboot as prompted, instead immediately change the membership back to Domain 'yourdomain'. Then reboot.
- After the reboot, log in as a domain user and verify that the machine is properly joined to the domain and has no errors.
- Next, join it to the NLB cluster you created above and verify proper cluster operations as detailed in the NLB configuration guide.
- Then, join the new SharePoint server to the farm using 'Start > All Programs > SharePoint Portal Server > SharePoint Central Administration'.
- Connect the server to an existing configuration database and provide the necessary domain credentials.
- Next, build a fresh SharePoint Portal Server machine (esx03) which will become your Index & Job server.
- Ensure that your default crawling account is added to the local DCOM Users group or else you will not be able to connect to the index server and all crawls will fail.
- Add this machine to the farm and transfer the Index and Job components from esx01.
- Your farm should now have esx01 (Web & Search), esx03 (Index & Job), and esx02 (Unused).
- Edit the config to apply web & search to esx02.
- You should now have a fully operational cluster with a load balanced front-end and separate Index & Job server.
- At this point you will need to recreate your search scopes and index rules then re-crawl your content.
- Pop the top off a well earned beverage of your choice and Enjoy!
Scaling out the front-end even further is just a repeat of the middle section of this list.
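The ServerID step is the one most likely to trip you up when repeating the list, so here is a minimal Python sketch of the "increment the GUID and check it against the 'Servers' table" idea. It is only an illustration: the function name `next_free_server_id` and the sample GUIDs are made up, and it assumes you have copied the values out of regedit and SQL Enterprise Manager by hand. It does not touch the registry or the database, and the exact format SharePoint expects in the registry (braces, letter case) is something you should copy from the existing value rather than from this output.

```python
import uuid


def next_free_server_id(current, used):
    """Return the next GUID after `current` that is not in `used`.

    `current` is the ServerID copied from the cloned machine's registry and
    `used` is every ServerID noted from the 'Servers' table. Both are plain
    strings; braces and letter case are ignored when comparing.
    """
    taken = {uuid.UUID(value).int for value in used}
    candidate = uuid.UUID(current).int
    while True:
        # Add one, wrapping at 128 bits so the result is always a valid GUID.
        candidate = (candidate + 1) % (1 << 128)
        if candidate not in taken:
            return str(uuid.UUID(int=candidate))


if __name__ == "__main__":
    # Made-up sample values, not real ServerIDs.
    servers_table = {
        "{11111111-2222-3333-4444-555555555555}",
        "{11111111-2222-3333-4444-555555555556}",
    }
    clone_id = "{11111111-2222-3333-4444-555555555555}"
    print(next_free_server_id(clone_id, servers_table))
```

With the sample values above, the first increment collides with an entry already in the table, so the sketch skips it and moves on to the next value. In practice a collision is most unlikely, but the check costs nothing.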
Now you have a SharePoint infrastructure that never needs to be unavailable during regular maintenance and patching. (SQL Server aside, but that's another issue.) Plus, you will have significantly increased your capacity and performance, much to the delight of your users. (Yeah right!)
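If you want to sanity-check which servers can be taken down for patching, the final role layout from the list (esx01 and esx02 on Web & Search, esx03 on Index & Job) can be modelled in a few lines of Python. This is a rough sketch under my own assumptions: the `FARM` dictionary and `roles_lost_if_down` are hypothetical names for illustration, and the model only looks at role placement. It knows nothing about NLB health, SharePoint itself, or SQL Server.

```python
# Role layout taken from the final steps of the list above.
# The dictionary shape is just an illustration, not a SharePoint format.
FARM = {
    "esx01": {"web", "search"},
    "esx02": {"web", "search"},
    "esx03": {"index", "job"},
}


def roles_lost_if_down(farm, server):
    """Return the roles that no other server hosts while `server` is offline."""
    still_hosted = set()
    for name, roles in farm.items():
        if name != server:
            still_hosted |= roles
    return farm[server] - still_hosted


if __name__ == "__main__":
    for server in sorted(FARM):
        lost = roles_lost_if_down(FARM, server)
        print(server, "->", sorted(lost) if lost else "nothing lost")
```

Under this model either front-end can go offline without losing a role, while esx03 remains the only home for Index & Job, so presumably crawls and jobs wait while that box is patched even though the portal stays up. Adding another front-end as described above is just one more entry in the dictionary.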
Labels: performance, scalability, SharePoint Portal Server, VMWare

