Author here for your 8087 questions. I find adders and ALUs interesting because they are key to the performance of a system and every system implements them differently.
Do you know roughly how many transistors are needed to implement the adder (or the FPU as a whole)? And how does that scale with the width of the numbers (16-bit, 32-bit, etc.)?
I've been curious about transistor counts for floating point units for a while, but it's hard to find information about them.
I count approximately 2014 transistors (including pull-ups) for the 69-bit adder. Each block of four bits takes approximately 117 transistors.
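A back-of-the-envelope check on those two figures, as a rough Python sketch. The per-bit average and the linear extrapolation to other widths are assumptions made here for illustration (the helper `estimate_adder_transistors` is hypothetical), not counts taken from the die.

```python
# Figures quoted in the comment above (both are approximate).
TOTAL_TRANSISTORS = 2014   # whole 69-bit adder, including pull-ups
ADDER_BITS = 69
PER_BLOCK = 117            # one block of four bits
BLOCK_BITS = 4

# 69 bits is 17 full four-bit blocks plus one leftover bit.
full_blocks, leftover_bits = divmod(ADDER_BITS, BLOCK_BITS)
in_full_blocks = full_blocks * PER_BLOCK
remainder = TOTAL_TRANSISTORS - in_full_blocks  # left for the odd bit and extras

# Average cost per bit inside a block.
per_bit = PER_BLOCK / BLOCK_BITS


def estimate_adder_transistors(width):
    """Naive estimate: ASSUMES the count grows linearly with width."""
    return round(width * per_bit)


if __name__ == "__main__":
    print(full_blocks, leftover_bits, in_full_blocks, remainder, per_bit)
    for width in (16, 32, 64, 69):
        print(width, estimate_adder_transistors(width))
```

The linear estimate for 69 bits lands within a few transistors of the quoted total, which suggests (but does not prove) that this style of adder grows roughly in proportion to its width.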
Any idea how much adder designs changed on modern CPUs compared to back then? I mean there's only so much you can optimize in those, I think...
Even by the time of the Pentium, they had moved to much more complicated adders like Kogge-Stone. I wrote about it here: https://www.righto.com/2025/01/pentium-carry-lookahead-rever...
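For readers who haven't met Kogge-Stone, here is a minimal Python sketch of the general parallel-prefix idea: generate and propagate signals are combined over doubling distances, so every carry is known after about log2(width) steps instead of rippling bit by bit. This is a software illustration of the textbook technique with no carry-in, not a model of the Pentium's actual circuit.

```python
def kogge_stone_add(a, b, width):
    """Add two width-bit integers with Kogge-Stone style prefix steps.

    Returns (sum modulo 2**width, carry_out). Bit vectors are held in ints.
    """
    mask = (1 << width) - 1
    a &= mask
    b &= mask

    propagate0 = a ^ b   # bit i passes an incoming carry along
    g = a & b            # bit i generates a carry by itself
    p = propagate0

    dist = 1
    while dist < width:
        # Merge each group with the group ending dist bits below it.
        # g must be updated with the old p, so it goes first.
        g = (g | (p & (g << dist))) & mask
        p = (p & (p << dist)) & mask
        dist *= 2

    # After the loop, bit i of g is the carry out of bit i,
    # so the carry into bit i is bit i-1 of g.
    carries_in = (g << 1) & mask
    total = propagate0 ^ carries_in
    carry_out = (g >> (width - 1)) & 1
    return total, carry_out


if __name__ == "__main__":
    print(kogge_stone_add(200, 100, 8))  # (44, 1), since 300 = 256 + 44
```

For a 69-bit adder the loop runs seven times (distances 1 through 64), which is the attraction of the design; the cost is far more wiring and logic than a simple carry chain.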
No immediate questions, but happy to have some great weekend reading. A quick pass through finds one of the best and clearest explainers I've seen. Thanks for this and all the materials you produce.
> take two clock cycles to complete an addition.
How does the clocking work exactly? The circuit is fed A and B, the clock goes up, down, up, down, and then the output appears? How does the consumer (circuit) know when to read the result? Is there a "result is ready" flag? How long does the result stay stable? One full clock cycle? So many questions...
The adder is not clocked. You can see from the diagrams that there are no clock inputs. The clock cycles comment is more an expression of the length of time that it takes before all of the carry rippling and whatnot settles down.
In more detail, the microcode engine normally executes one micro-instruction per cycle. For addition, the engine is blocked for one extra cycle to give the result time to percolate through the adder.
There is some complicated timing within a clock cycle with slightly delayed clocks and whatnot, for instance, to precharge the carry lines at the beginning of the operation. The 8087 is mostly synchronous with the clock, but they "cheat" in many places.
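To make the "settling" idea concrete, here is a small Python sketch that uses a plain ripple-carry model. That model is an assumption for illustration only; the real 8087 adder has more structure than this. The function reports the longest distance any carry travels, which is what decides how long an unclocked adder takes to settle, and therefore how long a fixed wait has to be.

```python
def ripple_add(a, b, width):
    """Bit-by-bit add. Returns (sum, carry_out, longest_carry_run).

    longest_carry_run counts how many consecutive bit positions a single
    carry travels: 1 where it is generated, plus 1 for every bit it then
    propagates through.
    """
    carry = 0
    total = 0
    run = 0
    longest = 0
    for i in range(width):
        x = (a >> i) & 1
        y = (b >> i) & 1
        total |= (x ^ y ^ carry) << i
        if x & y:
            run = 1        # a fresh carry is generated at this bit
        elif (x ^ y) and carry:
            run += 1       # the incoming carry is passed along
        else:
            run = 0        # the carry chain stops here
        carry = (x & y) | (carry & (x ^ y))
        longest = max(longest, run)
    return total, carry, longest


if __name__ == "__main__":
    print(ripple_add(0b0101, 0b1010, 4))  # (15, 0, 0): no carries at all
    print(ripple_add(0b0001, 0b0111, 4))  # (8, 0, 3): carry crosses 3 bits
    print(ripple_add(0xFF, 0x01, 8))      # (0, 1, 8): worst case for 8 bits
```

There is no "result is ready" flag in this picture: the consumer simply waits long enough for the worst-case run, which matches the description above of the microcode engine being held for one extra cycle.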
It is interesting that over the years people have produced synthesizable RTL for the 8086/8088 and later, with varying degrees of fidelity, but nobody seems to have produced anything similar for the 8087.
AIUI, the 8087 was at the extreme cutting edge of what the technology of the time could produce, and even Intel was largely treating it as a likely-to-fail project.
The ROM used different sized transistors to store two bits per transistor. That's pure analog territory, which most HDLs don't touch.
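A digital-level sketch of what "two bits per transistor" implies, in Python. Everything numeric here is hypothetical: the four current levels, the three thresholds, and the mapping from level to bit pair are made up for illustration and are not taken from the 8087. The point is only that four distinguishable transistor sizes give four levels, and four levels encode two bits.

```python
import bisect

# HYPOTHETICAL sense thresholds separating four current levels
# (arbitrary units). A real chip does this comparison with analog circuitry.
THRESHOLDS = [0.5, 1.5, 2.5]


def decode_cell(current):
    """Map a sensed cell current to a pair of bits (high, low).

    The level-to-bits assignment is an arbitrary choice for this sketch.
    """
    level = bisect.bisect_right(THRESHOLDS, current)  # 0, 1, 2 or 3
    return (level >> 1) & 1, level & 1


if __name__ == "__main__":
    for current in (0.0, 1.0, 2.0, 3.0):
        print(current, decode_cell(current))
```

An HDL model for an FPGA workalike would skip all of this and store the decoded bits directly, which is why the analog trick matters for reproducing the original silicon but not for reproducing its behavior.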
And largely irrelevant if the goal is synthesizing something that can be put onto an FPGA as a workalike for the 8087. It's almost never about synthesizing the exact original hardware to be fabricated. See CPU86 and Zet, for example:
* https://github.com/nsauzede/cpu86/wiki
* https://github.com/marmolejo/zet
/* It's a bummer that there is addition but no vipition. */
I knew a guy who bred snakes but could never really get much out of his adders.
Turns out what he needed to do was saw up some tree trunks to make rough platforms for them, and they bred like crazy.
Adders can multiply really efficiently with log tables.
slow clap
Do you have any insights on how power was delivered to these circuits? Maybe it's done in the metal layers that were dissolved? Also, is it correct that there is no on-die capacitance surrounding these circuits?
Thanks for the great article.