Distributed Computing in Java 9
上QQ阅读APP看书,第一时间看更新

Amdahl's law

Amdahl's law is frequently considered in parallel computing to forecast the improvement in process speedup when increasing the use of multiple system processors. Amdahl’s Law is named after the famous computer scientist Gene Amdahl; it was submitted at the American Federation of Information Processing Societies (AFIPS) during the Spring Joint Computer Conference in the year 1967.

The standard formula for Amdahl’s Law is as follows:

where:

  • Slatency is the calculated improvement of the latency (execution) of the complete task.
  • s is the improvement in execution of the part of the task that benefits from the improved system resources.
  • p is the proportion of the execution time that the part benefiting from improved resources actually occupies.

Let's consider an example of a single task that can be further partitioned into four subtasks: each of their execution time percentages are p1 = 0.11, p2 = 0.18, p3 = 0.23, and p4 = 0.48, respectively. Then, it is observed that the first subtask is improved in speed, so s1 = 1. The second subtask is observed to be improved in speed by five times, so s2 = 5. The third subtask is observed to be improved in speed by 20 times, so s3 = 20. Finally, the fourth subtask is improved in speed by 1.6 times, so s4 = 1.6.

By using Amdahl's law, the overall speedup is follows:

Notice how the 20 times and 5 times speedup on the second and third parts, respectively, don't have much effect on the overall speedup when the fourth part (48% of the execution time) is sped up only 1.6 times.

The following formula demonstrates that the theoretical speedup of the entire program execution improves with the increase of the number/capacity of resources in the system and that, regardless with the magnitude of the improvement, the calculated improvement of the entire program is always expected to be limited by that particular task that cannot benefit from the resource improvement.

Consider if a program is expected to need about 20 hours to complete the processing with the help of a single processor. A specific sub task of the entire program that is expected to consume an hour to execute cannot be executed in parallel, while the remaining program of about 19 hours processing (p = 0.95) of the total execution time can be executed in parallel. In such scenarios, regardless of how many additional processors are dedicated to be executed in parallel of such program, the execution time of the program cannot be reduced to anytime less than that minimum 1 hour. Obviously, the expected calculated improvement of the execution speed is limited to, at most, 20 times (calculated as 1/(1 − p) = 20). Hence, parallel computing is applicable only for those processors that have more scope for having the capability of splitting them into subtasks/parallel programs as observed in the diagram below.

However, Amdahl's law is applicable only to scenarios where the program is of a fixed size. In general, on larger problems (larger datasets), more computing resources tend to get used if they are available, and the overall processing time in the parallel part usually improves much faster than the by default serial parts.