Comment 33 for bug 513848

Chase Douglas (chasedouglas) wrote :

@tgabi:

I found that the calc_load_tasks counter is updated in two places: once every 5 seconds, before the load avg is calculated, and every time a cpu enters the idle task. The latter happens very frequently when the system is lightly loaded, which seems to be the case for your server. When a cpu enters the idle task it almost always decrements the count by at least one, because it is going from executing a running task to having no running task available for scheduling (unless the task goes to the uninterruptible state, but that doesn't occur very often). Thus, the counter is almost always incremented only at the update that happens once every 5 seconds.
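
To make the two update paths concrete, here is a rough Python model of the bookkeeping as I understand it. It is only a sketch, not the kernel code: the class names, the `reported` field and the `fold` method are made up for illustration, and the assumption is that each cpu adds the change in its own active count (running plus uninterruptible) to the global counter.

```python
class Cpu:
    """One cpu's run queue, reduced to the fields that matter here (illustrative)."""
    def __init__(self):
        self.nr_running = 0          # tasks currently runnable on this cpu
        self.nr_uninterruptible = 0  # tasks in uninterruptible sleep
        self.reported = 0            # active count last folded into the global counter


class LoadCounter:
    """Stand-in for the global calc_load_tasks counter."""
    def __init__(self):
        self.calc_load_tasks = 0

    def fold(self, cpu):
        # Assumed behaviour: add only the change since this cpu's last report.
        nr_active = cpu.nr_running + cpu.nr_uninterruptible
        delta = nr_active - cpu.reported
        cpu.reported = nr_active
        self.calc_load_tasks += delta
        return delta


counter = LoadCounter()
cpu0 = Cpu()

# 5-second update while one task is running: the counter goes up.
cpu0.nr_running = 1
up = counter.fold(cpu0)
after_update = counter.calc_load_tasks

# The task sleeps and the cpu enters the idle task: the counter goes back down.
cpu0.nr_running = 0
down = counter.fold(cpu0)
after_idle = counter.calc_load_tasks
```

In this model the only case where entering idle leaves the counter unchanged is when the task moved to the uninterruptible state, since it still counts as active.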

So every 5 seconds each cpu updates the global calc_load_tasks counter. However, the load avg itself is not calculated until 10 ticks later. I'm guessing the reason for the delay is that each cpu has its own APIC timer which is used for scheduling ticks. If the calculation were done at the same time the calc_load_tasks counter was updated on one of the cpus, the other cpus might not have updated the global counter yet. The 10 tick delay ensures that all the processors have had a chance to update the global counter before the new load avg is calculated.
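
The timing argument can be sketched like this. Again this is only a model under my own assumptions: the tick rate `HZ`, the per-cpu offsets and the function names are hypothetical, and the reason for the delay is my guess rather than something I have confirmed.

```python
HZ = 250             # assumed tick rate; pick whatever your kernel is built with
LOAD_FREQ = 5 * HZ   # ticks between counter updates (roughly 5 seconds)
CALC_DELAY = 10      # ticks between the counter update and the load avg calculation


def may_calculate(now, update_tick):
    """The load avg is only calculated once the delay has passed."""
    return now >= update_tick + CALC_DELAY


def all_cpus_folded(cpu_offsets, now, update_tick):
    """Each cpu's timer is skewed by its own offset (in ticks) from the update tick."""
    return all(now >= update_tick + offset for offset in cpu_offsets)


update_tick = LOAD_FREQ
offsets = [0, 3, 7]  # hypothetical skew between the per-cpu timers

# Right at the 5-second mark only the first cpu has updated the counter.
folded_at_mark = all_cpus_folded(offsets, update_tick, update_tick)

# Ten ticks later every cpu has, and the calculation is allowed to run.
folded_at_calc = all_cpus_folded(offsets, update_tick + CALC_DELAY, update_tick)
calc_allowed = may_calculate(update_tick + CALC_DELAY, update_tick)
```

As long as the skew between cpus stays under the delay, the calculation sees every cpu's contribution.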

Here's where I think things are going "wrong" for you: the 5 second interval expires and the cpus update the global counter, likely incrementing it to the number of tasks currently runnable. Between that point and the time the load avg is calculated 10 ticks later, each running task sleeps and all the processors pass through the idle task at least once. The calc_load_tasks counter is decremented each time until it hits 0. Now the 10 ticks expire and we calculate the load avg, but the value of calc_load_tasks is 0.
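
Here is a small simulation of that sequence, to show why the load avg would stay at 0 even though tasks do run. It is a simplified model, not the kernel code: `run_window` and `load_after` are hypothetical helpers, and the fixed-point constants and averaging formula are modeled on the kernel's 1-minute average as far as I remember them.

```python
FSHIFT = 11             # bits of fixed-point precision
FIXED_1 = 1 << FSHIFT   # 1.0 in fixed point
EXP_1 = 1884            # decay factor for the 1-minute average, in fixed point


def calc_load(load, exp, active):
    """One exponential-decay step of the load avg in fixed point."""
    return (load * exp + active * (FIXED_1 - exp)) >> FSHIFT


def run_window(runnable_per_cpu, idle_before_calc):
    """Return the counter value seen when the load avg is calculated."""
    counter = 0
    reported = [0] * len(runnable_per_cpu)
    # 5-second mark: every cpu adds its change in active tasks.
    for i, n in enumerate(runnable_per_cpu):
        counter += n - reported[i]
        reported[i] = n
    # Within the next 10 ticks: tasks sleep and each cpu enters the idle task.
    if idle_before_calc:
        for i in range(len(reported)):
            counter -= reported[i]
            reported[i] = 0
    return counter


def load_after(windows, sampled):
    """Feed the same sampled counter value into the average for several windows."""
    load = 0
    for _ in range(windows):
        load = calc_load(load, EXP_1, sampled * FIXED_1)
    return load


seen_racy = run_window([1, 1], idle_before_calc=True)    # both cpus idle before the calculation
seen_busy = run_window([1, 1], idle_before_calc=False)   # tasks still running at the calculation

load_racy = load_after(12, seen_racy)
load_busy = load_after(12, seen_busy)
```

With short-lived tasks that always sleep inside the 10 tick gap, every window samples 0 and the average never moves, while the same tasks still running at calculation time would push it up.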

This is most likely to occur in a situation where you have only a few tasks that run, but they sleep often. I'm going to build a new kernel with some extra printouts that will tell us what the calc_load_tasks value is at the 5 second mark and again 10 ticks later.