
Binary Translation Using Peephole Superoptimizers

Exophase

Emulator Developer
I stumbled upon this paper today. It's an extended application of the "superoptimization" work that the same person did a couple of years prior as part of his dissertation.

http://theory.stanford.edu/~sbansal/pubs/osdi08_html/index.html

Personally I'm a little surprised it works as well as it does - not sure if this is a testament to how powerful the technique is or a statement of inadequacy for the competition. Worth bearing in mind is that QEMU is now using a different translation strategy, so it might perform a bit better than the version listed.

Either way, it shows that getting 50+% native performance in user mode only emulation isn't totally unrealistic. Would love to see how it runs on games, if indeed it can manage any - necessary OS support aside, this would entail much more complicated programs, which brings about more avenues for failure. But if it can work well then imagine if, say, x86->ARM performed at a similar level. With user mode Linux emulation and WINE, maybe emulating something that runs on Windows and needs a Pentium 2 won't be as unrealistic as everyone believes it is, on the highest end ARM SoCs.

The source code is going to be up soon (or so his website says - maybe he changed his mind since he's starting his own business). I really want to see the translation rules (only 750!) for myself. What I've seen so far does not suggest anything the compiler figured out that is too difficult to optimize towards using propagation techniques. I also think that much faster register allocation can be done without sacrificing a lot in the result. Of course, for something like x86->ARM you can possibly get away with very simplistic register allocation.
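To make the "translation rules" idea concrete, here is a rough sketch of how a rule table might be applied, assuming a fixed register mapping (eax->r0, ebx->r1, ecx->r2), which is the "very simplistic register allocation" case. The rules, the `RULES` table and the `translate` function are all made up for illustration and are not taken from the paper; condition flags are ignored entirely.

```python
# Hypothetical rule table: x86-like source sequences -> ARM-like target sequences.
# Invented for illustration only; flags and memory operands are ignored.
RULES = {
    ("mov eax, ebx", "add eax, ecx"): ["add r0, r1, r2"],
    ("mov eax, ebx",): ["mov r0, r1"],
    ("add eax, ecx",): ["add r0, r0, r2"],
    ("inc eax",): ["add r0, r0, #1"],
}

# Longest source pattern in the table, so we know how far to look ahead.
MAX_RULE_LEN = max(len(pattern) for pattern in RULES)


def translate(source):
    """Translate a list of source instructions, preferring the longest rule."""
    out = []
    i = 0
    while i < len(source):
        # Try the widest window first so multi-instruction rules win.
        for length in range(min(MAX_RULE_LEN, len(source) - i), 0, -1):
            window = tuple(source[i:i + length])
            if window in RULES:
                out.extend(RULES[window])
                i += length
                break
        else:
            # No rule of any length matched at this position.
            raise ValueError(f"no rule covers {source[i]!r}")
    return out


if __name__ == "__main__":
    print(translate(["mov eax, ebx", "add eax, ecx", "inc eax"]))
```

The two-instruction rule folds the mov/add pair into a single three-operand add, which is the kind of saving a longer rule buys over translating one instruction at a time.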
 

Cyberman

Moderator
I stumbled upon this paper today. It's an extended application of the "superoptimization" work that the same person did a couple of years prior as part of his dissertation.

http://theory.stanford.edu/~sbansal/pubs/osdi08_html/index.html

Personally I'm a little surprised it works as well as it does - not sure if this is a testament to how powerful the technique is or a statement of inadequacy for the competition. Worth bearing in mind is that QEMU is now using a different translation strategy, so it might perform a bit better than the version listed.
I use QEMU to run Linux on my Windows box (LOL) until my Linux box is set up again. Works fine; DOSBox and Bochs also have reasonable performance.

Either way, it shows that getting 50+% native performance in user mode only emulation isn't totally unrealistic. Would love to see how it runs on games, if indeed it can manage any - necessary OS support aside, this would entail much more complicated programs, which brings about more avenues for failure. But if it can work well then imagine if, say, x86->ARM performed at a similar level. With user mode Linux emulation and WINE, maybe emulating something that runs on Windows and needs a Pentium 2 won't be as unrealistic as everyone believes it is, on the highest end ARM SoCs.
The BeagleBoard runs the Dreamcast and QEMU for Windows and DOS emulation. Right now I am just trying to figure out how to change the display resolution at runtime; unfortunately X under Angstrom is an issue. Having read the Xorg mailing list, X has gone downhill in terms of maintenance. I believe the idiot legacy of the XF86 people is to blame there. Anyway, this is the biggest issue I have right now. I had a discussion about doing dynamic rec by partitioning the code and running generic optimizer passes on it prior to generating native code. Basically it would end up being larger blocks of dynamic rec code instead of the finer grained approaches. Add to this a bit of HLE and things could get fast.

The source code is going to be up soon (or so his website says - maybe he changed his mind since he's starting his own business). I really want to see the translation rules (only 750!) for myself. What I've seen so far does not suggest anything the compiler figured out that is too difficult to optimize towards using propagation techniques. I also think that much faster register allocation can be done without sacrificing a lot in the result. Of course, for something like x86->ARM you can possibly get away with very simplistic register allocation.
Wine, by the way, clearly states "Wine Is Not an Emulator"; it's an API interface to the underlying OS for Win32 apps. You would have to create the emulator to take the PE file and drec it for use with WINE on an ARM system. I suppose that would be HLE? :)

Cyb
 

Exophase

Emulator Developer
I use QEMU to run Linux on my Windows box (LOL) until my Linux box is set up again. Works fine; DOSBox and Bochs also have reasonable performance.

"Reasonable?" That's a very subjective trait. That anyone finds Bochs to perform acceptably in just about anything is a bit stunning. QEMU might pull 20% native, DOSBox for ARM significantly worse than that right now. Bochs far, far below that. You're speaking as if there's no value in performing better. I didn't say that there aren't emulators for DOS and Windows for ARM, but that right now people are seeing 50+% native speed marks as being ridiculous.

The BeagleBoard runs the Dreamcast and QEMU for Windows and DOS emulation.

.. Dreamcast? You mean DOSBox? But again, I said specifically Pentium 2 level performance, and there's no way that either of those could reach that level on a Beagle Board.

I had a discussion about doing dynamic rec by partitioning the code and running generic optimizer passes on it prior to generating native code. Basically it would end up being larger blocks of dynamic rec code instead of the finer grained approaches. Add to this a bit of HLE and things could get fast.

It's well and good to say "do a dynarec that also optimizes", but unless you have particular optimizations in mind then it's useless. That is, unless you're doing this crazy brute force technique, but I doubt many others have considered it for translation. Still, this is not an easy thing to write either.
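For anyone unfamiliar with what "brute force" means here, a minimal sketch on a toy machine: enumerate every instruction sequence from shortest to longest and keep the first one that behaves identically to the reference. Everything below (the 8-bit single-register machine, the `OPS` table, `superoptimize`) is invented for illustration and is not the paper's implementation. The equivalence check simply tries all 256 inputs, which only works because the toy machine is so small; a real ISA needs something far smarter.

```python
from itertools import product

MASK = 0xFF  # toy 8-bit machine with a single register

# Hypothetical instruction set: each op maps the register value to a new value.
OPS = {
    "not": lambda x: ~x & MASK,
    "neg": lambda x: -x & MASK,
    "inc": lambda x: (x + 1) & MASK,
    "dec": lambda x: (x - 1) & MASK,
    "shl": lambda x: (x << 1) & MASK,
}


def run(seq, x):
    """Execute a sequence of op names on an initial register value."""
    for op in seq:
        x = OPS[op](x)
    return x


def behaviour(seq):
    """Outputs of seq for every possible 8-bit input (exhaustive check)."""
    return tuple(run(seq, x) for x in range(MASK + 1))


def superoptimize(reference, max_len=3):
    """Return the first shortest sequence equivalent to reference, or None."""
    wanted = behaviour(reference)
    for length in range(max_len + 1):
        # Enumerate every sequence of this length, shortest lengths first.
        for candidate in product(OPS, repeat=length):
            if behaviour(candidate) == wanted:
                return list(candidate)
    return None


if __name__ == "__main__":
    # Two's complement negation written the long way collapses to one op.
    print(superoptimize(["not", "inc"]))
```

With five ops the search space is only 5^n sequences of length n, which is why this is instant here and why it gets painful as soon as the instruction set and operand choices look anything like a real CPU.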

Wine, by the way, clearly states "Wine Is Not an Emulator"; it's an API interface to the underlying OS for Win32 apps. You would have to create the emulator to take the PE file and drec it for use with WINE on an ARM system. I suppose that would be HLE? :)

Yes, I'm aware of what WINE does. What I said is to use user mode only recompilation (which, by the way, is a far stretch from being a system emulator) to convert from x86 Linux to ARM Linux. When run with WINE it would be from x86 Windows to x86 Linux to ARM Linux. It may not be that straightforward, but it's doable - see the Darwine project.
 

Cyberman

Moderator
"Reasonable?" That's a very subjective trait. That anyone finds Bochs to perform acceptably in just about anything is a bit stunning. QEMU might pull 20% native, DOSBox for ARM significantly worse than that right now. Bochs far, far below that. You're speaking as if there's no value in performing better. I didn't say that there aren't emulators for DOS and Windows for ARM, but that right now people are seeing 50+% native speed marks as being ridiculous.
You must be trying to run things other than old MS-DOS games (which is mostly what I use it for, to be honest). Linux runs OK enough for me to run TinyFugue, for example, which is the application I most commonly use under Linux for chatting.

.. Dreamcast? You mean DOSBox? But again, I said specifically Pentium 2 level performance, and there's no way that either of those could reach that level on a Beagle Board.
The Dreamcast emulator on the BeagleBoards. DOS isn't too bad, but if you want to run a Windows app and Windows, uhhh, you'll get major issues (namely the rev B board only has 128M of RAM, which makes things nasty). Rev C might be better (256M RAM), but it does seem to run DOS scale things reasonably. Again, you do not need fast in that case. Are you attempting DX8 games or something?


It's well and good to say "do a dynarec that also optimizes", but unless you have particular optimizations in mind then it's useless. That is, unless you're doing this crazy brute force technique, but I doubt many others have considered it for translation. Still, this is not an easy thing to write either.
If it were easy everyone would be doing it. LOL Nothing is easy. I've seen some rather interesting discussions on that back end stage of compilers. Especially with SDCC (Small Device C Compiler) it's interesting how they are able to generically optimize some code. As you said, it's not easy.


Yes, I'm aware of what WINE does. What I said is to use user mode only recompilation (which, by the way, is a far stretch from being a system emulator) to convert from x86 Linux to ARM Linux. When run with WINE it would be from x86 Windows to x86 Linux to ARM Linux. It may not be that straightforward, but it's doable - see the Darwine project.
I'll take a look at it.

I think the biggest or, I should say, most radical changes take small incremental steps, which sounds backwards; however, if you wish to retain functionality you have to make each part work correctly. Simply put, a working slower system is better than a broken fast one, which is what I have with my tweaks to PCSX. I at least recognize when "Oops, I screwed up" happens instead of wading through a huge pile of code I screwed up. :D I plan to revert to the original 1.5 code base and get that compiling, then create patches for each minor revision to the 1.6 code base I have (which is partially broken). Then add in the tweaks I had made. That should make it work. However, this project is going to take several months of documenting some of PCSX. Fun, I know.

Back to the subject, I do appreciate you sharing the link by the way.

Cyb
 

Exophase

Emulator Developer
You must be trying to run things other than old MS-DOS games (which is mostly what I use it for, to be honest). Linux runs OK enough for me to run TinyFugue, for example, which is the application I most commonly use under Linux for chatting.

The Dreamcast emulator on the BeagleBoards. DOS isn't too bad, but if you want to run a Windows app and Windows, uhhh, you'll get major issues (namely the rev B board only has 128M of RAM, which makes things nasty). Rev C might be better (256M RAM), but it does seem to run DOS scale things reasonably. Again, you do not need fast in that case. Are you attempting DX8 games or something?

I'm not personally interested in running anything. I can, however, easily speak for many people who have shown interest - and "old" DOS games is pretty subjective. A BeagleBoard running DOSBox in its current incarnation will certainly not be able to run every DOS game at correct speed. Nonetheless, I am specifically referring to running Linux and Windows games, not DOS games. I hope you understand that a lot of time passed between the late 80s/early 90s games that DOSBox may be capable of running well on a BeagleBoard, and when DX8 games became common. In particular there are a lot of desirable games released in the mid to late 90s that could possibly be run with user mode emulation and very aggressive translation.

And yes, I'm familiar with the nullDC effort (which doesn't represent anything you can actually use right now) but I don't see how that ties into this topic.

I think the biggest or, I should say, most radical changes take small incremental steps, which sounds backwards; however, if you wish to retain functionality you have to make each part work correctly. Simply put, a working slower system is better than a broken fast one, which is what I have with my tweaks to PCSX. I at least recognize when "Oops, I screwed up" happens instead of wading through a huge pile of code I screwed up. :D I plan to revert to the original 1.5 code base and get that compiling, then create patches for each minor revision to the 1.6 code base I have (which is partially broken). Then add in the tweaks I had made. That should make it work. However, this project is going to take several months of documenting some of PCSX. Fun, I know.

I don't really know what you're getting at (or why you're getting at it at least) :/
 
