Customizable Computing. Yu-Ting Chen
3.4 CUSTOMIZED INSTRUCTION SET EXTENSIONS
In a conventional, general-purpose processor design, every instruction that is executed must pass through a number of stages of the processor pipeline. Each of these stages incurs a cost, which depends on the type of processor. Figure 3.1 showed the energy consumed in the various stages of the processor pipeline. In terms of the core computational requirement of an application, only the energy spent in the execute stage goes toward productive compute work; everything else (i.e., instruction fetch, renaming, instruction window allocation, wakeup and select logic) is overhead required to support and accelerate general-purpose instruction processing for a particular architecture. Execution accounts for such a small portion of the energy consumed because most instructions each perform only a small amount of work.
Extending the instruction set of an otherwise conventional compute core to increase the amount of work done per instruction is one way of improving both performance and energy efficiency for particular tasks. This is accomplished by merging the work that would otherwise have been performed by multiple instructions into a single instruction. The merged instruction is valuable because it still requires only a single pass through the fetch, decode, and commit phases, and thus requires less bookkeeping to perform the same task. In addition to reducing the overhead associated with processing an instruction, ISA extensions give access to custom compute engines that implement these composite operations more efficiently than would otherwise be possible.
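The overhead argument above can be made concrete with a small back-of-the-envelope model. The sketch below is illustrative only: the constants `OVERHEAD_PJ` and `EXECUTE_PJ` are hypothetical values chosen for the example, not measurements from Figure 3.1, and the model conservatively assumes that merging leaves the execute-stage work unchanged.

```python
# Toy energy model. All numbers are hypothetical and chosen only for
# illustration; they are not taken from Figure 3.1 or any real processor.
OVERHEAD_PJ = 9.0  # assumed per-instruction cost of fetch, decode, rename, commit
EXECUTE_PJ = 1.0   # assumed execute-stage cost of one simple operation


def energy_separate(num_ops: int) -> float:
    """Energy when each operation is issued as its own instruction."""
    return num_ops * (OVERHEAD_PJ + EXECUTE_PJ)


def energy_merged(num_ops: int) -> float:
    """Energy when all operations are merged into one composite instruction.

    The pipeline overhead is paid once; the execute work is unchanged.
    """
    return OVERHEAD_PJ + num_ops * EXECUTE_PJ


if __name__ == "__main__":
    for n in (1, 2, 4, 8):
        sep, mer = energy_separate(n), energy_merged(n)
        print(f"{n} ops: separate={sep:.1f} pJ, merged={mer:.1f} pJ, "
              f"saving={1 - mer / sep:.0%}")
```

Under these assumed values, four operations cost 40 pJ when issued separately and 13 pJ when merged. A custom compute engine may also reduce the execute-stage term itself, which this sketch does not model.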
Strategies for instruction set customization range from very simple (e.g., [6, 95, 111]) to complex (e.g., [63, 66]). Simple but effective instruction set extensions are now common in commodity processors in the form of specialized vector instructions, such as the SSE and AVX instructions. Section 3.4.1 discusses vector instructions, which allow simple operations, mostly floating-point operations, to be packed into a single instruction that operates over a large volume of data, potentially simultaneously. Although these vector instructions are restricted to regular, compute-dense code, they offer a large enough performance advantage that processor manufacturers continue to push toward more feature-rich vector extensions [55].
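As a rough illustration of how packing reduces the number of instructions issued, the following sketch emulates an element-wise add in plain Python and counts issued instructions. The function names and the counting scheme are assumptions made for this example; the default of four lanes corresponds to four 32-bit floats in a 128-bit SSE register. Real vector hardware performs the lane operations in parallel, which this software emulation does not capture.

```python
def scalar_add(a, b):
    """Element-wise add, modeled as one add instruction per element.

    Returns (result, number of instructions issued).
    """
    out = []
    issued = 0
    for x, y in zip(a, b):
        out.append(x + y)
        issued += 1
    return out, issued


def vector_add(a, b, lanes=4):
    """Element-wise add, modeled as one packed add per group of `lanes` elements.

    A final partial group still costs one instruction.
    Returns (result, number of instructions issued).
    """
    out = []
    issued = 0
    for i in range(0, len(a), lanes):
        out.extend(x + y for x, y in zip(a[i:i + lanes], b[i:i + lanes]))
        issued += 1
    return out, issued


if __name__ == "__main__":
    a = [float(i) for i in range(10)]
    b = [2.0 * i for i in range(10)]
    s_out, s_count = scalar_add(a, b)
    v_out, v_count = vector_add(a, b)
    print("results match:", s_out == v_out)
    print("scalar instructions:", s_count, "vector instructions:", v_count)
```

For ten elements and four lanes, the scalar version issues ten instructions and the packed version issues three, while both produce the same result.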
In addition to vector instructions, both industry [95] and academia [63] have proposed designs that tie multiple operations together into a single compute engine that operates over a single element of data. These custom compute engines are discussed in Section 3.4.2. They differ from vector instructions in that they describe a group of operations over a small set of data, rather than a single operation over a large set of data. Thus, they can be tied more tightly into the critical path of a conventional core [136].
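The contrast with vector instructions can be sketched in the same style. The example below models a hypothetical composite instruction that chains a multiply, an add, and a shift over a single set of operands. The operation chain and both function names are invented for illustration and do not correspond to the designs in [63] or [95].

```python
def scalar_sequence(a: int, b: int, c: int):
    """Three dependent conventional instructions over one set of operands.

    Returns (result, number of instructions issued).
    """
    issued = 0
    t = a * b   # multiply
    issued += 1
    t = t + c   # add
    issued += 1
    t = t >> 2  # arithmetic shift right by 2
    issued += 1
    return t, issued


def custom_instruction(a: int, b: int, c: int):
    """Hypothetical composite instruction: same dataflow, issued once.

    Returns (result, number of instructions issued).
    """
    return ((a * b) + c) >> 2, 1


if __name__ == "__main__":
    print(scalar_sequence(3, 5, 1))
    print(custom_instruction(3, 5, 1))
```

Both functions compute the same value, but the composite form passes through the pipeline once instead of three times. Here the saving comes from chaining several operations over few operands, whereas a vector instruction applies one operation across many operands.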