yoy.be Idea: assembly that flags when to release virtual registers

yoy.be "Why-o-Why"

Idea: assembly that flags when to release virtual registers

2017-05-31 23:52 asmvrr [permalink]

I just had a fragment of an idea. I want to write it down, just to let it go for now as I've got other things to do, and to be sure I can pick it up later exactly where I left off.

First situating what it's about: I have been reading up on WebAssembly, and to my surprise the intermediate representation is stack based (just like Java's JVM and .Net's CIL). I'm not sure why because it feels to me this makes registry assigning when constructing the effective platform-dependent instructions harder, but I may be wrong. Finding out objectively is a project on it's own, but sits on the pile 'lots of work, little gain'.

I also went through the great set of MIT 6.004 lectures by Chris Terman which really gives you a good view of 'the other side' of real assembly since it's actually born out of designing these processing units built out of silicon circuits. It prompted me to make this play thing, but again pushing that through with a real binary encoding of the instructions made it a 'lots of work, little gain' project, and I really don't have access to any kind of community that routinely handles circuit design, so it stalled there.

Before that, I read something about hyper-threading, and what it's really about. It turns out modern CPU cores actually handle two streams of incoming instructions, have a set of instruction decoding logic for each stream (and perhaps branch prediction), but share a lot of the other stuff, like the L1 cache, and especially a set of virtual registers that the logical registers the instruction stream thinks it's using is mapped on to. Mapping used registers freely over physical slots makes sense when you're making two (or more?) streams of instructions work, but it's important to know when the value in the register is no longer needed. Also if the register is only needed for just a few instructions, pipelining comes in to play and could speed up processing a great deal. But for now the CPU has to guess about all this.

When playing around with a virtual machine of my own, I instinctively made the stack grow up, since you request just another block of memory, plenty of those, and start filling it from index 0. It shows I haven't really done much effective assembler myself, as most systems have the stack grow down. What's everybody seems to have forgotten is that this is an ugly trick from old days, where you would have (very!) limited memory and use (end of) the same block for the stack, and with more work going on the stack could potentially grow into your data, or even worse your code, producing garbled output or even crashing the system. (Pac-man kill screen comes to mind, although that's technically a range overflow.) Modern systems still have stack growing down, but virtually allocate a bit of the address-space at the start of that stack-data-block to invalid memory, so stack-overflows cause a hardware exception and have the system intervene. It's a great trick for operating system (and compilers alike) to have checks and balances happen at zero cost to performance.

The consensus nowadays is that nobody writes assembler any more. It's important to know about it, it's important to have access to it, but there is so much of it, it's best left to compilers to write it for you. In the best case it may find optimizations for you you didn't even think about yourself. But this works both ways. Someone writes the compiler(s), and need to teach it about all the possible optimizations. I can imagine the CPU's instruction set manual comes in handy, but that's written by someone also, right? I hope these people talk to eachother. Somewhere. Someday. But I guess they do as with x86-64 they've kind of agreed on a single ABI... and they've also added some registers. Knowing about the virtual registers allocation going on behind the scenes, it could be that that was just raising an arbitrarily imposed limit.

So this is where I noticed a gap. When performing all kinds of optimizations and static analysis on the code when compiling, and especially with register allocation, it's already known when a register's value is no longer relevant to future instructions. What if the compiler could encode this into the instruction bits? If I were ever to pick up where I left, and have a try at a binary encoding for a hypothetical processing core, the instruction set would have bits flagging when the data in registers becomes obsolete. Since this would be a new instruction set, and I guess it's more common to need the value in a register only once, I might make it the default that a value in a register becomes obsolete by default, and you'd use a suffix in assembler to denote you want to use the value for something extra later as well.

→