# Scaling and Scheduling Services
By the end of this exercise, you should be able to:

- Define the desired number of containers running as part of a service via the `deploy` block in a Docker Compose file
- Schedule services in global mode to ensure exactly one replica runs on every node in your swarm
## Scaling up a Service
If we've written our services to be stateless, we might hope for linear performance scaling in the number of replicas of that service. For example, our `worker` service requests a random number from the `rng` service and hands it off to the `hasher` service; the faster we make those requests, the higher our throughput of dockercoins should be, as long as there are no other confounding bottlenecks.
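That request cycle can be sketched as follows. This is a simplified illustration, not the actual `training/dc_worker` code; the endpoint paths and coin-recording mechanism hinted at in the comments are assumptions, so the two services are injected as plain callables here:

```python
# Simplified sketch of one worker cycle: fetch a random number,
# hash it, and record the resulting coin. The real dc_worker talks
# to rng and hasher over HTTP; here those calls are injected as
# callables so the loop stays self-contained and testable.
def work_once(get_random, hash_bytes, record_coin):
    data = get_random()        # e.g. GET from the rng service (assumed endpoint)
    digest = hash_bytes(data)  # e.g. POST to the hasher service (assumed endpoint)
    record_coin(digest)        # e.g. store the coin in a backing store
    return digest

def run(cycles, get_random, hash_bytes, record_coin):
    # Each cycle is independent and stateless, which is why adding
    # worker replicas scales throughput -- until a dependency saturates.
    return [work_once(get_random, hash_bytes, record_coin) for _ in range(cycles)]
```

Because no state is carried between cycles, any replica can run any cycle, and replicas never need to coordinate with each other.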
Modify the `worker` service definition in `stack.yml` to set the number of replicas to create, using the `replicas` key:

```yaml
worker:
  image: training/dc_worker:1.0
  networks:
    - dockercoins
  deploy:
    endpoint_mode: dnsrr
    replicas: 2
```

Update your app by running the same command you used to launch it in the first place, and check to see when your new worker replica is up and running:
```powershell
PS: node-0 orchestration-workshop-net> docker stack deploy -c 'stack.yml' dc
PS: node-0 orchestration-workshop-net> docker service ps dc_worker
```

Once both replicas of the `worker` service are live, check the web frontend; you should see about double the number of hashes per second, as expected.

Scale up even more by changing the `worker` replicas to 10. A small improvement should be visible, but certainly not an additional factor of 5. Something else is bottlenecking dockercoins.
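Why doesn't 10 workers give 10x? A toy capacity model shows the shape of what's happening: throughput is linear in replicas only until some shared dependency saturates. The rates below are invented purely for illustration, not measured from dockercoins:

```python
def throughput(workers, per_worker_rate, dependency_capacity):
    """Aggregate hashes/sec: linear in worker count until a shared
    dependency (e.g. a slow upstream service) saturates, then flat."""
    return min(workers * per_worker_rate, dependency_capacity)

# Invented numbers: each worker can drive 4 hashes/sec, but the
# shared dependency can only serve 10 requests/sec in total.
print(throughput(1, 4, 10))   # 4  -> linear region
print(throughput(2, 4, 10))   # 8  -> still roughly linear (the "doubling" we saw)
print(throughput(10, 4, 10))  # 10 -> capped by the dependency
```

The jump from 1 to 2 replicas lands in the linear region, while the jump from 2 to 10 mostly runs into the cap, which matches the behavior observed above.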
## Scheduling Services
Something other than `worker` is bottlenecking dockercoins' performance; the first place to look is in the services that `worker` directly interacts with.
First, we need to expose ports for the `rng` and `hasher` services on their hosts, so we can probe their latency. Update their definitions in `stack.yml` with a `ports` key:

```yaml
rng:
  image: training/dc_rng:1.0
  networks:
    - dockercoins
  deploy:
    endpoint_mode: dnsrr
  ports:
    - target: 80
      published: 8001
      mode: host

hasher:
  image: training/dc_hasher:1.0
  networks:
    - dockercoins
  deploy:
    endpoint_mode: dnsrr
  ports:
    - target: 80
      published: 8002
      mode: host
```

Update the services by redeploying the stack file:
```powershell
PS: node-0 orchestration-workshop-net> docker stack deploy -c 'stack.yml' dc
```

If this is successful, a `docker stack ps dc` should show `rng` and `hasher` exposed on the appropriate ports.

Check your Dockercoins web frontend. You may find that your mining speed has dropped to zero! When you reconfigured and rescheduled your `rng` and `hasher` services, their containers may have received new IPs. If `worker` doesn't re-check what IPs the DNS entries `rng` and `hasher` are meant to resolve to, it can end up trying to send traffic to dead containers after such a reschedule. Long term, we should update our application logic to be smart about re-polling IPs, but for now we can force a re-poll by scaling our worker service down and up again:

```powershell
PS: node-0 orchestration-workshop-net> docker service scale dc_worker=0
PS: node-0 orchestration-workshop-net> docker service scale dc_worker=10
```

Double-check your web frontend to make sure your mining speed is what it was before you rescheduled `rng` and `hasher`.

With `rng` and `hasher` exposed, we can use `httping` to probe their latency; in both cases, `<public IP>` is the public IP of the nodes exposing `rng` on 8001 and `hasher` on 8002, respectively:

```powershell
PS: node-0 orchestration-workshop-net> httping -c 5 <public IP>:8001
PS: node-0 orchestration-workshop-net> httping -c 5 <public IP>:8002
```

`rng` is much slower to respond, suggesting that it might be the bottleneck. If this random number generator is based on an entropy collector (random voltage microfluctuations in the machine's power supply, for example), it won't be able to generate random numbers beyond a physically limited rate; we need more machines collecting more entropy in order to scale this up. This is a case where it makes sense to run exactly one copy of this service per machine, via `global` scheduling (as opposed to potentially many copies on one machine, or whatever the scheduler decides, as in the default `replicated` scheduling).

Modify the definition of our `rng` service in `stack.yml` to be globally scheduled:

```yaml
rng:
  image: training/dc_rng:1.0
  networks:
    - dockercoins
  deploy:
    endpoint_mode: dnsrr
    mode: global
  ports:
    - target: 80
      published: 8001
      mode: host
```

Scheduling mode can't be changed on the fly, so we need to stop our app and restart it:

```powershell
PS: node-0 orchestration-workshop-net> docker stack rm dc
PS: node-0 orchestration-workshop-net> docker stack deploy -c 'stack.yml' dc
```

Check the web frontend again (note it may be on a different node); you may still not see much improvement in overall performance, depending on how worker traffic is getting distributed across random number generators. Try scaling your worker service down to one replica and then back up to ten, and you should finally see the factor of 10 improvement in performance versus a single worker container, from 3-4 coins per second to around 35.
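If `httping` isn't installed on your workstation, a rough stand-in can time plain HTTP GETs from the standard library. Note this measures full request round-trips including body download, so the numbers won't match `httping` exactly; it's only a sketch for comparing the two services against each other:

```python
import time
import urllib.request

def probe(url, count=5, timeout=5):
    """Return per-request latencies in milliseconds for HTTP GETs to url;
    None marks a request that failed or timed out."""
    latencies = []
    for _ in range(count):
        start = time.perf_counter()
        try:
            urllib.request.urlopen(url, timeout=timeout).read()
        except OSError:
            latencies.append(None)  # unreachable this round
            continue
        latencies.append((time.perf_counter() - start) * 1000.0)
    return latencies

# Usage (substitute your node's actual public IP):
# print(probe("http://<public IP>:8001/", count=5))
# print(probe("http://<public IP>:8002/", count=5))
```

As with `httping`, comparing the two result lists side by side is what reveals which service is slower to respond.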
## Conclusion
In this exercise, you explored the performance gains a distributed application can enjoy by scaling a key service up to have more replicas, and by correctly scheduling a service that needs to be replicated across different hosts. Also, bear in mind the behavior of DNSRR service name resolution; it's up to your application logic to periodically check the list of IPs being returned by the DNS lookup, and rebalance traffic across new instances as those services scale up (or down). Alternatively, rescaling the service originating the request can cause that service's replicas to rebalance their requests across the new replicas of the destination service.
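The re-polling logic described above can be sketched like this. It is a minimal illustration, not the dockercoins worker's actual implementation; in the real app the hostname would be a service name like `rng` resolved over the swarm overlay network:

```python
import socket
import time

def resolve_backends(hostname, port=80):
    """Return the current set of IPs behind a DNSRR service name."""
    infos = socket.getaddrinfo(hostname, port, proto=socket.IPPROTO_TCP)
    return {info[4][0] for info in infos}

def poll_loop(hostname, interval=10.0, rounds=3):
    """Periodically re-resolve the name so traffic can follow
    rescheduled containers instead of sticking to dead IPs."""
    backends = set()
    for _ in range(rounds):
        current = resolve_backends(hostname)
        if current != backends:
            backends = current  # rebalance traffic across the new set here
        time.sleep(interval)
    return backends
```

An application built this way would not need the manual scale-down/scale-up trick used in this exercise to recover from a reschedule.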