Happy Halloween 🎃 Tonight I am writing about connecting to a computer cluster for the first time. Whether this is for work or for academia (like me!), the steps should be fairly similar for any simple, bare-metal cluster, so hopefully you can adapt them to your use case.
Some Background
A node is a single computer doing some stuff. A computer cluster is multiple nodes doing some stuff together. Even if each individual node has limited CPU resources, together the nodes can form a high-performance cluster. This is great for anyone who wants to run tasks that need a more powerful machine than a standard home PC, but at a low cost. Workload managers like Slurm are used to manage the cluster and schedule tasks on it. As an aside, there are also a lot of cool tools out there for managing and orchestrating containerised applications on clusters (like Kubernetes), but we're not going to go down that rabbit hole tonight. 🐰
The rest of this article assumes that you aren't setting up the cluster but rather that you are a (privileged) beneficiary of someone else's hard work 😋 On another day, I might write about setting up a cluster for the first time with Slurm, and you're welcome to check out a basic intro to setting up a minikube cluster with Kubernetes over here.
First Time - Connect to the Head Node
The first time you find out there's a cluster you can connect to, you'll need a few things:
- The IP address or host name of the "head node". This is your entry point into the cluster.
- A username and temporary password generated by the cluster admin.
- A basic understanding of SSH.
Once you have those:
- SSH into the head node using the username and host name (or IP address) provided by your administrator:
# With host name
ssh YOUR_USER_NAME@HOST_NAME
# With IP address
ssh YOUR_USER_NAME@IP_ADDRESS
- On your first connection, SSH shows the node's host key fingerprint and asks whether you want to continue connecting; type 'yes'
- Enter your temporary password
- You should now be logged in to the head node
- Proceed to changing your password
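If typing the full YOUR_USER_NAME@HOST_NAME every time gets tedious, OpenSSH can read per-host aliases from ~/.ssh/config using its Host, HostName and User keywords. Below is a minimal sketch that appends such an entry; the helper name add_host_alias and the alias cluster are hypothetical, and HOST_NAME/YOUR_USER_NAME are the values from your administrator.

```shell
#!/usr/bin/env bash
# Append a Host alias for the head node to an OpenSSH client config.
# add_host_alias and the alias name are hypothetical, for illustration only.
add_host_alias() {
  local alias="$1" host="$2" user="$3"
  local config="${4:-$HOME/.ssh/config}"

  # Create the config directory (mode 700) only if it doesn't exist yet
  mkdir -p -m 700 "$(dirname "$config")"
  touch "$config"
  chmod 600 "$config"   # ssh refuses configs that others can write

  # Don't add a second entry for an alias that is already defined
  if grep -qxF "Host $alias" "$config"; then
    echo "Alias '$alias' already in $config"
    return 0
  fi

  cat >> "$config" <<EOF

Host $alias
    HostName $host
    User $user
EOF
  echo "Added alias '$alias' to $config"
}

# Usage (hypothetical alias):
#   add_host_alias cluster HOST_NAME YOUR_USER_NAME
#   ssh cluster
```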
Change Your Password
Another password to remember 😜 But it's better to change that temporary password than to leave it as is.
- To change your password on the node, run:
passwd
- You will be prompted for your current password (the prompt may call it your "LDAP" password, since many clusters manage user accounts through LDAP)
- Once entered, you will be prompted to enter a new password and confirm it
- Log out and test that you can SSH into the node with your new password
SSH Without Password
- Ensure you have an SSH key pair on your own PC (ssh-keygen creates one if you don't)
- Use ssh-copy-id to copy your public SSH key to the node:
ssh-copy-id -i ~/.ssh/id_rsa.pub YOUR_USER_NAME@HOST_NAME
- You might be prompted for your password once more, but after that you should be able to SSH in without being prompted for your password.
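If you're not sure whether you already have a key pair, a small guard avoids accidentally overwriting one. The sketch below uses a hypothetical helper, ensure_keypair, that only runs ssh-keygen when no key exists at the given path. The default ed25519 type and ~/.ssh/id_ed25519 path are my assumptions, not requirements; pass ~/.ssh/id_rsa and rsa instead to match the ssh-copy-id command above. Note that a passphrase-protected key will still ask for its passphrase unless you load it into ssh-agent.

```shell
#!/usr/bin/env bash
# Create an SSH key pair only if one doesn't already exist.
# ensure_keypair is hypothetical; default path and key type are assumptions.
ensure_keypair() {
  local key="${1:-$HOME/.ssh/id_ed25519}"
  local type="${2:-ed25519}"

  if [ -f "$key" ] && [ -f "$key.pub" ]; then
    echo "Using existing key pair: $key"
    return 0
  fi
  # An incomplete pair is left alone rather than overwritten
  if [ -e "$key" ] || [ -e "$key.pub" ]; then
    echo "Found an incomplete key pair at $key; not overwriting" >&2
    return 1
  fi

  mkdir -p -m 700 "$(dirname "$key")"
  # ssh-keygen prompts for a passphrase and writes $key and $key.pub
  ssh-keygen -t "$type" -f "$key"
}

# Usage, then copy the public half to the head node:
#   ensure_keypair && ssh-copy-id -i ~/.ssh/id_ed25519.pub YOUR_USER_NAME@HOST_NAME
```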
SSH Between Nodes
Once you're in the head node, you should be able to SSH to other nodes in the cluster. Get a list of node host names/IP addresses from your administrator. These addresses will likely be private (reachable only from inside the cluster), in which case you will SSH from the head node into a sibling node. The nice thing at this point is that your username will probably be shared between the nodes, so you can SSH with the host name alone. For example:
ssh SIBLING_HOST_NAME
Once again, you will be prompted for a password. Later, running jobs across different nodes with something like MPI will require these connections to be passwordless. To set this up, check that you have an SSH key pair on the head node and that its public key is in your ~/.ssh/authorized_keys file. If not, add it. Because home directories on a cluster are typically shared across nodes, that one file usually covers every node. Once the key is added, you should be able to SSH between nodes without a password.
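The two steps in that paragraph (add your head-node public key to authorized_keys, then confirm you can reach the siblings without a password) can be sketched as two shell functions to run on the head node. This is a sketch under stated assumptions: authorize_own_key and check_passwordless are hypothetical names, the default key path ~/.ssh/id_ed25519.pub is a guess (pass ~/.ssh/id_rsa.pub if that's what you have), and it assumes your home directory is shared by all nodes. The check relies on ssh's BatchMode=yes option, which makes ssh fail instead of prompting for a password.

```shell
#!/usr/bin/env bash
# Run on the head node. Assumes $HOME (and so ~/.ssh) is shared by all
# nodes, which is common on clusters but not guaranteed.

# Append your own public key to authorized_keys if it is not there yet.
authorize_own_key() {
  local pub="${1:-$HOME/.ssh/id_ed25519.pub}"
  local auth="${2:-$HOME/.ssh/authorized_keys}"

  if [ ! -f "$pub" ]; then
    echo "No public key at $pub; create one with ssh-keygen first" >&2
    return 1
  fi

  mkdir -p -m 700 "$(dirname "$auth")"
  touch "$auth"
  # sshd's StrictModes check (on by default) rejects files others can write
  chmod 600 "$auth"

  if grep -qxF "$(cat "$pub")" "$auth"; then
    echo "Key already in $auth"
    return 0
  fi
  # Make sure the new key starts on its own line
  if [ -s "$auth" ] && [ -n "$(tail -c 1 "$auth")" ]; then
    echo >> "$auth"
  fi
  cat "$pub" >> "$auth"
  echo "Added $pub to $auth"
}

# Try each sibling host without allowing a password prompt.
check_passwordless() {
  local host failed=0
  for host in "$@"; do
    # BatchMode=yes fails instead of asking for a password;
    # -n stops ssh from reading this script's stdin
    if ssh -n -o BatchMode=yes -o ConnectTimeout=5 "$host" true 2>/dev/null; then
      echo "OK    $host"
    else
      echo "FAIL  $host"
      failed=1
    fi
  done
  return "$failed"
}

# Usage (host names come from your administrator):
#   authorize_own_key && check_passwordless SIBLING_HOST_NAME
```

check_passwordless returns non-zero if any host still needs a password, so it can gate a job script before launching anything across nodes.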
Conclusion
Hopefully, this has given you a brief intro to connecting to a cluster. Next up, I will be working through scheduling some jobs on a cluster.