This explain the implications of HT (Hyper-Threading Technology) to the OS. Following is a summary of enhancements recommended in the OS.
Detection of HT – The OS needs to detect both the logical and processor packages if HT is available for that processor(s).
hlt at idle loop – The IA-32 Intel Architecture has an instruction call hlt (halt) that stops processor execution and normally allows the processor to go into a lower-power mode. On a processor with HT, executing hlt transitions from a multi-task mode to a single-task mode, giving the other logical processor full use of all processor execution resources.
pause instruction at spin-waits – The OS typically uses synchronization primitives, such as spin locks in multiprocessor systems. The pause is equivalent to “rep; nop” for all known Intel architecture prior to Pentium4 or Intel Xeon processors. The instruction in spin-waits can avoid severe penalty generated when a processor is spinning on a synchronization variable at full speed.
Special handling for shared physical resources – MTRRs (Memory Type Range Registers) and the microcode are shared by the logical processors on a processor package. The OS needs to ensure the up-date to those registers is synchronized between the logical processors and it happens just once per processor package, as opposed to once per logical processor, if required by the spec.
Preventing excessive eviction in first-level data cache – Cached data in the first-level data cache are tagged and indexed by virtual addresses. This means two processes running on a different logical processors on a processor package can cause repeated evictions and allocations of cache lines when they are accessing the same virtual address or near in a competing fashion (e.g. user stack).
The original Linux kernel, for example, sets the same value to the initial user stack pointer in every user process. In our enhancement, we offset the stack pointer simply by a multiple of 128 bytes using the mod 64, i.e. ((pid%64) << 7) of the unique process ID to resolve this issue.
Scalability issues – The current Linux, for example, is scalable in most cases, at least up to 8 CPUs. How-ever, enabling HT means doubling the number of processors in the system, thus it can expose scalability issues, or it does not show performance enhancements when HT is enabled.
Linux (2.4.17 or higher) supports HT, and it has all the above changes in it. We developed and identified the essential code (about just 1000 lines code) for those changes (except scalability issues) based on performance measurements, and then improved the code with Linux community.