Hello, and thank you for this amazing tool.
I have read through all of the Docs pages, but I haven't found much detail on how auto scaling is expected to behave. I understand the basics: if there are jobs in the queue, the cluster is scaled up, and if nodes are idle, they are shut down. Is there any documentation on the specifics of the scaling algorithm?
For example, here are some behaviors I have noticed that I would like to understand better, along with some specific questions:
– My selected compute node type is a c4.8xlarge (36 cores). I queue up 70 jobs (using SLURM), and one compute node is created, so I have 36 running and 34 pending jobs. Since I still have pending jobs (and my max node count is 32), I would expect a second node to be created, but there is consistently only one node. Why does this happen?
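For concreteness, my submission is roughly equivalent to the following job array (the script and binary names here are illustrative, not my actual workload):

```shell
#!/bin/bash
#SBATCH --job-name=scaling-test   # illustrative name
#SBATCH --array=1-70              # 70 independent jobs
#SBATCH --ntasks=1                # each job uses a single core
#SBATCH --cpus-per-task=1

# hypothetical single-threaded workload
srun ./my_job "$SLURM_ARRAY_TASK_ID"
```

After `sbatch` of a script like this, `squeue` shows 36 tasks in state R and 34 in state PD, yet no second node is ever launched.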
– When I first queue up my jobs, it takes a relatively long time for the cluster to react and start allocating nodes: if I start with a cluster that has only a login node, I have to wait ~5 minutes before any nodes are created (not counting the time to actually boot the compute node). Is there any way to shorten this delay?
– If I start my cluster with, e.g., 5 initial compute nodes, will they ever be autoscaled down (removed)? If so, is there a way to force the original nodes to persist? If not, is there a way to force them to scale down once they have been idle for too long?
– Is there any way to mix and match compute node types, or am I limited to one type per cluster? E.g., could I have two c4.8xlarge nodes (36 CPUs each) and one c4.large node (2 CPUs), for 74 CPUs total, if I have 74 single-threaded jobs to run?
– Is there a way to scale across multiple regions to overcome the 20-instance limit for spot pricing?
Thank you for the help.