

We serve many millions of requests every second and, as you may have already guessed, we use more than a single computer to handle this demand. But even if we did have a supercomputer that was somehow able to handle all these requests (imagine the network connectivity such a configuration would require!), we still wouldn't employ a strategy that relied upon a single point of failure; when you're dealing with large-scale systems, putting all your eggs in one basket is a recipe for disaster.

This chapter focuses on high-level load balancing: how we balance user traffic between datacenters. The following chapter zooms in to explore how we implement load balancing inside a datacenter.

Power Isn't the Answer

For the sake of argument, let's assume we have an unbelievably powerful machine and a network that never fails. Would that configuration be sufficient to meet Google's needs? No. Even this configuration would still be limited by the physical constraints associated with our networking infrastructure. For example, the speed of light is a limiting factor on the communication speeds for fiber optic cable, which creates an upper bound on how quickly we can serve data based upon the distance it has to travel. Even in an ideal world, relying on an infrastructure with a single point of failure is a bad idea.

In reality, Google has thousands of machines and even more users, many of whom issue multiple requests at a time. Traffic load balancing is how we decide which of the many, many machines in our datacenters will serve a particular request. Ideally, traffic is distributed across multiple network links, datacenters, and machines in an "optimal" fashion. But what does "optimal" mean in this context? There's actually no single answer, because the optimal solution depends heavily on a variety of factors:

- The hierarchical level at which we evaluate the problem (global versus local)
- The technical level at which we evaluate the problem (hardware versus software)
- The nature of the traffic we're dealing with

Let's start by reviewing two common traffic scenarios: a basic search request and a video upload request. Users want to get their query results quickly, so the most important variable for the search request is latency. On the other hand, users expect video uploads to take a non-negligible amount of time, but also want such requests to succeed the first time, so the most important variable for the video upload is throughput.

The differing needs of the two requests play a role in how we determine the optimal distribution for each request at the global level:

- The search request is sent to the nearest available datacenter, as measured in round-trip time (RTT), because we want to minimize the latency on the request.
- The video upload stream is routed via a different path, perhaps to a link that is currently underutilized, to maximize the throughput at the expense of latency.

But on the local level, inside a given datacenter, we often assume that all machines within the building are equally distant to the user and connected to the same network. Therefore, optimal distribution of load focuses on optimal resource utilization and protecting a single server from overloading.
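The global-level policy described above (latency-sensitive requests to the lowest-RTT datacenter, throughput-sensitive requests to the most underutilized link) can be sketched as a toy routing function. Everything here is illustrative: the `Datacenter` type, the datacenter names, and the RTT and utilization figures are invented for this example and are not part of any real system.

```python
from dataclasses import dataclass

@dataclass
class Datacenter:
    name: str
    rtt_ms: float        # measured round-trip time from this user (hypothetical)
    utilization: float   # fraction of link capacity currently in use, 0.0-1.0

def route(request_kind: str, datacenters: list[Datacenter]) -> Datacenter:
    """Toy global-level routing decision for the two scenarios above."""
    if request_kind == "search":
        # Minimize latency: send the request to the nearest datacenter by RTT.
        return min(datacenters, key=lambda dc: dc.rtt_ms)
    if request_kind == "video_upload":
        # Maximize throughput: prefer the most underutilized link,
        # even at the expense of a longer round trip.
        return min(datacenters, key=lambda dc: dc.utilization)
    raise ValueError(f"unknown request kind: {request_kind}")

dcs = [
    Datacenter("dc-near", rtt_ms=10.0, utilization=0.9),
    Datacenter("dc-far", rtt_ms=80.0, utilization=0.2),
]
print(route("search", dcs).name)        # nearest datacenter: dc-near
print(route("video_upload", dcs).name)  # least-loaded link: dc-far
```

Note how the two request kinds pick opposite datacenters from the same candidate list, which is the point of the prose above: "optimal" depends on the nature of the traffic.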
