|
Post by sykadelik on Dec 25, 2005 23:04:02 GMT 1
All the games I try to run are fairly slow. If I turn off every option I can think of, including sound, and put the clock speed up to 333 MHz, I can get some parts to run at normal speed, but in games like Mario RPG it slows down when the star appears every time you hit someone. Does anyone have optimized settings they could share with me that work fairly well, or is it not yet able to run games at normal speed?
|
|
|
Post by craig588 on Dec 26, 2005 1:22:29 GMT 1
It will never run SA1 games correctly. A 22MHz 65c816 is just too much for the PSP to handle.
|
|
|
Post by laxer3a on Dec 26, 2005 4:56:05 GMT 1
Yes, many people ask to run games that use a DSP or another coprocessor :-) Needless to say, correct audio, graphics and CPU emulation is already a lot of work for a PSP.
Of course, the PSP helps with its graphics hardware and the ME. Still, using them brings a lot of other problems.
|
|
greekslover
Junior Member
Everyone would like more lives.
Posts: 81
|
Post by greekslover on Apr 24, 2006 12:21:14 GMT 1
Yes, we pay $250 for that stupid PSP and it can't even fully emulate the SNES!!!
|
|
|
Post by laxer3a on Apr 24, 2006 13:46:16 GMT 1
I think this is a subject that should be stickied, but let's see...
The PSP has a 333 MHz CPU and a very capable graphics processor (GPU). The GPU can draw roughly 4 pixels per clock cycle when the pipeline is kept fully busy.
Most games (even on PC) never reach the peak performance of the graphics pipeline, because they need to change the "render state" inside the pipeline, which makes it flush or stall.
Now when it comes to emulation, you need to EMULATE a CPU, which means simulating in memory the whole state of the processor: not only the registers, but also the status register and the formats and quirks of the processor.
To take an example: when you add two numbers with the CPU, you have stuff like this:
registerA = registerB + registerC
if (A == 0) update Z flag
if (A < 0) update N flag
etc...
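To make the add example concrete, here is a minimal sketch of what the emulator has to do for one emulated instruction. The `ToyCpu` class and its field names are made up for illustration (this is not Snes9x code), and a 16-bit register width is assumed.

```python
from dataclasses import dataclass


@dataclass
class ToyCpu:
    """Hypothetical CPU state: three 16-bit registers and three flags."""
    a: int = 0
    b: int = 0
    c: int = 0
    flag_z: bool = False  # result was zero
    flag_n: bool = False  # result is negative (top bit set)
    flag_c: bool = False  # carry out of bit 15


def emulated_add(cpu: ToyCpu) -> None:
    """registerA = registerB + registerC, plus all the flag bookkeeping."""
    raw = cpu.b + cpu.c
    cpu.a = raw & 0xFFFF                # keep only 16 bits
    cpu.flag_c = raw > 0xFFFF           # did the add overflow 16 bits?
    cpu.flag_z = cpu.a == 0             # update Z flag
    cpu.flag_n = bool(cpu.a & 0x8000)   # update N flag
```

One guest "add" becomes an add, a mask, three comparisons and four stores on the host, before any memory access or cycle counting is taken into account.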
Basically, emulating one instruction costs a LOT of instructions on the target CPU. Now the main SNES CPU runs at 1.7 MHz in normal mode and 2.7 MHz in fast mode, if I am correct (and who cares if not, you can see the range of power). Which means that we need to emulate 1.7 to 2.7 million CPU cycles per second with a 333 MHz CPU.
Now the good point is that old CPUs are not pipelined, which means that most instructions take between 3 and 7 cycles to execute.
Instructions also take 3 to 7 cycles on current modern CPUs, but because they are pipelined, they look like they take only 1 or 2 cycles.
So basically we have 330 million PSP CPU cycles to emulate 2 million SNES CPU cycles. (I will talk about the other chips later...)
But there are actually more problems, like simulating memory access: the SNES CPU is a 16-bit CPU, which means that memory is paged. Each memory access of the SNES must be filtered to see whether it goes outside the page... This is very costly.
Basically, for each byte of memory read or written we need to:
- Check where it is: RAM / ROM / chipset?
- If chipset, check whether we need to modify the rendering or the emulation state of another chip.
- Sometimes synchronize between chips.
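The per-byte checks listed above can be sketched roughly as follows. This is a toy address decoder, not the real SNES memory map: the `ToyBus` class and its address ranges are simplified assumptions chosen only to show the RAM / ROM / chipset dispatch.

```python
class ToyBus:
    """Toy address decoder; the address ranges are simplified assumptions."""

    RAM_END = 0x2000
    CHIP_START, CHIP_END = 0x2100, 0x4400
    ROM_START = 0x8000

    def __init__(self, rom: bytes) -> None:
        self.ram = bytearray(self.RAM_END)
        self.rom = rom
        self.chip_regs = {}    # last value written to each chip register
        self.chip_writes = []  # (address, value) log a chip emulator would consume

    def read(self, addr: int) -> int:
        if addr < self.RAM_END:                       # plain RAM
            return self.ram[addr]
        if self.CHIP_START <= addr < self.CHIP_END:   # chipset register
            return self.chip_regs.get(addr, 0)
        if addr >= self.ROM_START:                    # cartridge ROM
            return self.rom[(addr - self.ROM_START) % len(self.rom)]
        return 0                                      # unmapped in this toy

    def write(self, addr: int, value: int) -> None:
        value &= 0xFF
        if addr < self.RAM_END:
            self.ram[addr] = value
        elif self.CHIP_START <= addr < self.CHIP_END:
            # A register write may change rendering or another chip's state,
            # so it has to be recorded and acted on, not just stored.
            self.chip_regs[addr] = value
            self.chip_writes.append((addr, value))
        # Writes to ROM or unmapped space are ignored.
```

Every single load or store of the emulated program walks through this chain of comparisons, which is why memory access weighs so much in the total.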
You start to see that emulation is pretty costly: simulate the complete state of the CPU, simulate the way the bus is accessed (not at the 0/1 level of electrical pins, just the upper level), and check whether any read or write starts an "action" inside the emulator.
Still, I don't believe the CPU is taking most of the time (I had a discussion with yoyo about it; we now need to find where the real bottleneck in the emu is).
But CPU + getter/setter (memory access) most likely takes, I would say, between 30 and 40% of the main CPU time.
The audio chip of the SNES works very closely with the SNES CPU, so we need to emulate ACCURATELY how things are going. For that we need to count the number of cycles for each simulated instruction on the APU and CPU. This has a cost too.
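As an illustration of the cycle counting, here is one simple way to keep two emulated chips in step: after every CPU instruction, the audio chip is run until its own cycle counter catches up. This is only a sketch under assumptions; the `ToyChip` class, the fixed cost per instruction and the 1:2 clock ratio are all invented for the example and do not describe the real hardware or the emulator's actual scheduler.

```python
class ToyChip:
    """Hypothetical chip whose every instruction costs a fixed cycle count."""

    def __init__(self, cycles_per_step: int) -> None:
        self.cycles_per_step = cycles_per_step
        self.cycles = 0
        self.steps = 0

    def step(self) -> None:
        self.cycles += self.cycles_per_step
        self.steps += 1


def run_in_lockstep(cpu: ToyChip, apu: ToyChip, cpu_budget: int,
                    ratio_num: int = 1, ratio_den: int = 2) -> None:
    """Run the CPU for cpu_budget cycles, dragging the APU along.

    ratio_num / ratio_den is the assumed APU-to-CPU clock ratio.
    """
    while cpu.cycles < cpu_budget:
        cpu.step()
        apu_target = cpu.cycles * ratio_num // ratio_den
        while apu.cycles < apu_target:
            apu.step()
```

The finer the interleaving, the more accurate the timing between the two chips, and the more host time is spent on bookkeeping instead of emulation.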
On average, I would not be surprised to see that a single SNES instruction takes around 100 or 150 cycles or more to be fully emulated (most instructions have a memory access). As there are 2 million cycles and we have 3 to 5 cycles per instruction, it is roughly 0.5 to 0.7M instructions to emulate. --> 50 to 105 MHz of the PSP CPU is needed just to emulate that part (0.5 * 100 cycles best case, 0.7 * 150 worst case).
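The estimate above can be reproduced with a few lines of arithmetic. The function name is made up, and the inputs are the rough figures from the post (2 million SNES cycles per second, 100 to 150 PSP cycles per emulated instruction). Taking 4 SNES cycles per instruction gives the 0.5M best case; 3 cycles per instruction gives about 0.67M, which the post rounds up to 0.7M.

```python
SNES_CYCLES_PER_SECOND = 2_000_000  # rough figure used in the post


def psp_mhz_needed(snes_cycles_per_instr: float,
                   psp_cycles_per_instr: float) -> float:
    """PSP clock (in MHz) consumed by CPU emulation alone."""
    instr_per_second = SNES_CYCLES_PER_SECOND / snes_cycles_per_instr
    return instr_per_second * psp_cycles_per_instr / 1e6


best_case = psp_mhz_needed(4, 100)    # 0.5M instructions * 100 cycles
worst_case = psp_mhz_needed(3, 150)   # ~0.67M instructions * 150 cycles
```

So somewhere between a sixth and a third of the 333 MHz budget goes to the main CPU core, which is in line with the 30 to 40% guess once memory access is included.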
Now there is another issue: the emulation of the graphics chipset of the SNES. We need to support the original VRAM format of the SNES (because a program could read the values back). At the same time, we translate the SNES VRAM format into a "texture cache" so that we can use the PSP GPU to draw the SNES tiles directly. This conversion has a cost, and cache maintenance has a cost. The SNES chipset is actually very complex in terms of graphical operations and is not really friendly for a normal 3D chipset to help with the emulation.
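To give an idea of the conversion work, here is a sketch that decodes one 8x8 tile from the SNES planar 4-bit format into one palette index per pixel, the kind of linear layout a GPU texture wants. The function name is hypothetical and this is not the emulator's actual cache code; it only shows the bit shuffling involved.

```python
from typing import List


def decode_4bpp_tile(tile: bytes) -> List[List[int]]:
    """Convert a 32-byte planar 4bpp tile into 8 rows of 8 palette indices.

    Layout assumed: for row y, bitplanes 0 and 1 sit at bytes 2*y and 2*y+1,
    bitplanes 2 and 3 at bytes 16+2*y and 16+2*y+1; bit 7 is the leftmost pixel.
    """
    if len(tile) != 32:
        raise ValueError("a 4bpp 8x8 tile is exactly 32 bytes")
    rows = []
    for y in range(8):
        planes = (tile[2 * y], tile[2 * y + 1],
                  tile[16 + 2 * y], tile[16 + 2 * y + 1])
        row = []
        for x in range(8):
            bit = 7 - x
            pixel = 0
            for plane_index, plane in enumerate(planes):
                pixel |= ((plane >> bit) & 1) << plane_index
            row.append(pixel)
        rows.append(row)
    return rows
```

That is 64 pixels times 4 shift/mask/or steps for a single tile, and it has to be redone whenever the game rewrites that part of VRAM.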
For that we do "multi-pass" rendering to correctly simulate the "Z buffer" of the SNES. (Yes, the SNES has a kind of Z buffer for the current pixel being drawn.)
In a normal PSP game, you prepare your data so that you do not waste CPU; your data is "ready to execute". In an emulator you can't do that kind of caching, because then you would have a very costly process of detecting changes to rebuild the cache... And maybe there would be so many changes that your cache would be useless.
So there is no choice except to do some of the work inside the inner loop, without being able to compute it "outside".
Now the SNES has the following abilities:
- 4 BG planes, scrollable.
- Each tile inside a BG plane can switch palette and use H or V flip.
- A BG plane can be in 8x8 or 16x16 tile mode.
- Each plane can be "clipped" line by line by the window registers.
- Each plane can have "mosaic" enabled / disabled.
- Sometimes layers can be blended with an additive/subtractive effect.
- It is possible to remove some BGs and switch to mode 7.
All of this state CAN BE CHANGED LINE BY LINE inside the SNES. It makes the emulation quite difficult.
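One common way to cope with per-line state changes, shown here purely as an assumption and not as what this emulator does, is to snapshot the relevant registers for every scanline and then merge consecutive lines with identical state into spans that can be drawn in one go. The function name and the tuple used as "state" are hypothetical.

```python
from typing import Hashable, List, Sequence, Tuple


def group_scanlines(states: Sequence[Hashable]) -> List[Tuple[int, int, Hashable]]:
    """Merge consecutive scanlines sharing the same register snapshot.

    Returns (first_line, last_line, state) spans, both ends inclusive.
    """
    spans = []
    start = 0
    for line in range(1, len(states) + 1):
        if line == len(states) or states[line] != states[start]:
            spans.append((start, line - 1, states[start]))
            start = line
    return spans


# Example snapshot: (scroll_x, mosaic_enabled) for six scanlines.
frame_states = [(0, False), (0, False), (8, False), (8, False), (8, False), (0, True)]
spans = group_scanlines(frame_states)
```

When a game changes something on every line (as mode 7 racing games do), this degenerates into one span per scanline and the batching buys nothing.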
It has a memory for tiles, a memory for maps, a memory for palettes, and a memory for the list of sprites and how they are combined.
Because it is completely tuned in hardware, in software we need to do a lot of "and/or/mask/shift" to simulate all these data structures. This is very costly too. As I said before, we can "precompute" more directly usable data, but then we pay a cost to check whether things changed... We think we made the best trade-off available, short of rewriting the emu from scratch.
OK... end of chapter 1 --- Need a rest.
|
|
|
Post by greekslover on Apr 24, 2006 14:25:08 GMT 1
Is it possible to transcode snes games to PSP games so CPU does not have to simulate snes and lose speed? And if it is possible they won't need an emulator to run?
|
|
|
Post by laxer3a on Apr 24, 2006 14:39:49 GMT 1
The problem is that you cannot convert a binary program to another binary directly... The only way to do this is to EXECUTE IT, analyze it, and generate "chunks" of PSP code that do the same thing... Basically, instead of translating the whole application, you do it little block by little block... It is called JIT (Just In Time compilation), also named dynarec (dynamic recompilation).
In this case, yes, it is possible to speed up the CPU emulation by a factor of 5 to 10. Basically the concept is that instead of DECODING the SNES INSTRUCTIONS, we generate PSP code which does the SAME THING. It costs more to generate the code at first, but each block of code is then faster to execute... So when the program LOOPS over the same block again and again, you gain back the time you lost creating these blocks. The emulator is no longer purely emulating; it becomes a mix between the game and the emulator itself.
Bleem was the first game emulator to do this, I believe (but the PsOne CPU is easier to JIT). Apple also did this when moving from the Motorola 680x0 architecture to the IBM PowerPC, to keep their software compatible. They did the same thing again lately when they switched from PowerPC to Intel CPUs. Java also uses that kind of technique, but Java is a weird beast, because the class file format keeps the integrity (= you know what is code and what is data... it is like having the source code, so the JIT is very efficient and close to real compiler performance).
But there are drawbacks with a SNES JIT:
- It should be forbidden to use JIT with code in RAM, because programs on old CPUs used a technique called "self-modifying code". Basically your JIT "caches" a piece of code, saying: if you jump to this SNES code, please use this native code instead. But if a SNES program modifies itself, your cache isn't up to date anymore, which means we would have to check, each time there is a write to RAM, whether it hits an area which is "cached in the JIT". This is very expensive to check...
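Here is a toy sketch of the block-translation idea and of the self-modifying code problem. Everything in it is an assumption made for illustration: the three-instruction guest language (`add`, `xor`, `ret`), the `ToyJit` class, and the use of generated Python source as a stand-in for generated PSP machine code.

```python
from typing import Callable, Dict, List, Tuple

Instr = Tuple[str, int]


def translate_block(program: List[Instr], start: int) -> Callable[[int], int]:
    """Generate host code once for the guest block starting at `start`."""
    lines = ["def block(acc):"]
    addr = start
    while program[addr][0] != "ret":
        op, arg = program[addr]
        if op == "add":
            lines.append(f"    acc = (acc + {arg}) & 0xFFFF")
        elif op == "xor":
            lines.append(f"    acc = acc ^ {arg}")
        else:
            raise ValueError(f"unknown guest opcode {op!r}")
        addr += 1
    lines.append("    return acc")
    namespace: dict = {}
    exec("\n".join(lines), namespace)  # "compile" the generated source
    return namespace["block"]


class ToyJit:
    def __init__(self, program: List[Instr]) -> None:
        self.program = program
        self.cache: Dict[int, Callable[[int], int]] = {}
        self.translations = 0

    def run(self, start: int, acc: int) -> int:
        block = self.cache.get(start)
        if block is None:               # first visit: pay the translation cost
            block = translate_block(self.program, start)
            self.cache[start] = block
            self.translations += 1
        return block(acc)               # later visits: run the cached code

    def write_code(self, addr: int, instr: Instr) -> None:
        """Guest code changed: cached translations may be stale."""
        self.program[addr] = instr
        self.cache.clear()              # crude invalidation of everything
```

The `write_code` hook is the expensive part in a real emulator: it would have to sit on every RAM write, which is the cost described in the drawback above.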
So in the case of the SNES, only ROM code should be jitted.
Moreover, the techniques needed to efficiently optimize jitted code are close to a full compiler. And jitted code chunks are very small (like 6 or 7 SNES instructions), so you have a puzzle of thousands of small blocks of code that you have to put together again (this has a cost too). Basically, the more efficient the JIT is, the more CPU it uses :-) So it becomes a trade-off between the global performance you want and the local performance you want.
But let's face it: such a technique would require writing quite a huge amount of code to do something super efficient. So a "middle quality" JIT seems the only realistic option for an emulator. It is something that we have thought about since the inception of this PSP version of Snes9x.
There are also less advanced techniques, like rewriting only the CPU emulation in assembly, or using something called "threaded code", which is a mix between JIT and emulation.
Anyway... The CPU emulation takes 30% to 40% of the PSP CPU time. Is it worth spending a man-year of development time (1600 to 3000 hours) to bring it down to 10% ~ 20%? That's something I am not really in a position to answer!
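To put numbers on that question, here is a small worked calculation. It assumes that the "10% ~ 20%" figure is measured against the original frame time (that reading is my assumption), and the function name is invented for the example.

```python
def frame_speedup(share_before: float, share_after: float) -> float:
    """Overall speedup when one part of the frame time shrinks.

    Both shares are fractions of the ORIGINAL frame time.
    """
    new_frame_time = 1.0 - share_before + share_after
    return 1.0 / new_frame_time


optimistic = frame_speedup(0.40, 0.10)   # CPU emulation goes from 40% to 10%
pessimistic = frame_speedup(0.30, 0.20)  # CPU emulation goes from 30% to 20%
```

Under these assumptions the whole emulator gets between about 1.1x and 1.4x faster, so a game stuck at 30 fps would land somewhere around 33 to 43 fps for the man-year invested.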
|
|
|
Post by laxer3a on Apr 24, 2006 14:48:33 GMT 1
I also need to talk about the "coprocessor" later in this thread...
|
|
|
Post by tsurumaru on Apr 24, 2006 16:41:53 GMT 1
Thank you Laxer3a, your series of posts is very interesting. You should definitely sticky it so that anyone interested in the actual processes behind the emulation can understand them.
|
|
|
Post by greekslover on Apr 24, 2006 16:53:52 GMT 1
Thanks Laxer, your posts are very good. I hope you talk about coprocessors soon!
|
|
|
Post by laxer3a on Apr 24, 2006 17:50:54 GMT 1
I also forgot to talk about the "translation" of binary files...
Actually, the problem with "automatically" translating an application is that when it is a binary file, you don't know what is data and what is code. It is just a series of bytes... When the CPU executes it, the code itself says: jump here, jump there... This is why Java is good: the binary file keeps the code clean, so the JIT is efficient.
Still, you can do something called "static analysis", which basically means loading a binary file and translating as much as you can by looking at the program, if you know where it starts (like a DLL)... like:
A
B
C
D
Jump to A#
...
... TERRA INCOGNITA (Code or data? Who cares for now...)
...
A# Here
F
G
H
When all the jumps (conditional or not) are STATIC, it is very easy to analyse a whole program without running it. You just parse the instructions and build a huge tree of jumps.
Now the problem is the following:
A
B
C
Jump RegisterR0*4 bytes from here... how do you solve that?
There is no way to know this statically... Maybe the program guarantees beforehand that R0 is going to be between 0 and 7, or >10. This is why your static analysis gets stuck. Any "switch case" will make your analysis tool stop or go berserk.
As you can see, the job is difficult enough, right? Well, it is even MORE FUN on the SNES CPU, because it can work as an 8-bit OR a 16-bit CPU...
Ex.: the sequence of bytes A B C D in a file could be read in 8-bit mode as:
instruction 0 : A
instruction 1 : B
instruction 2 : C
instruction 3 : D
or in 16-bit mode as:
instruction 0 : AB
instruction 1 : CD
So you NEED TO KNOW THE STATE OF THE CPU at runtime to be able to parse the instructions. It was a real pain when I wrote my own SNES disassembler 12 years ago, because my tool had to "try" to guess the mode of the CPU for a given piece of code to be disassembled: changing mode when invalid instructions were found, rolling back and such...
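The "tree of jumps" idea can be sketched as a small worklist traversal. The guest program format here is invented for the example (a dict from address to an `(opcode, target)` pair, one address per instruction); the point is only to show where the analysis follows static jumps and where it has to give up.

```python
from typing import Dict, Optional, Set, Tuple

Program = Dict[int, Tuple[str, Optional[int]]]


def trace_code(program: Program, entry: int) -> Tuple[Set[int], Set[int]]:
    """Follow static control flow from `entry`.

    Returns (addresses proven to be code, addresses of unresolved jumps).
    """
    code: Set[int] = set()
    unresolved: Set[int] = set()
    work = [entry]
    while work:
        addr = work.pop()
        while addr in program and addr not in code:
            code.add(addr)
            op, target = program[addr]
            if op == "jmp":                 # static jump: just follow it
                addr = target
            elif op == "branch":            # conditional: explore both paths
                work.append(target)
                addr += 1
            elif op == "jmp_indirect":      # "Jump R0*4 bytes from here": stuck
                unresolved.add(addr)
                break
            elif op == "ret":
                break
            else:                           # ordinary instruction
                addr += 1
    return code, unresolved


toy_program: Program = {
    0: ("op", None), 1: ("branch", 10), 2: ("jmp", 20),
    3: ("op", None),            # terra incognita: code or data?
    10: ("jmp_indirect", None), 11: ("op", None),
    20: ("ret", None),
}
found_code, stuck_at = trace_code(toy_program, 0)
```

Addresses 3 and 11 are never reached: the tool cannot tell whether they are code or data, which is exactly the terra incognita problem.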
Now, maybe it could be fun to give HINTS, using a static tool on PC, to help the emulator run faster on smaller platforms... Like having hints about the whole cartridge ROM data: Code Start - End - State of CPU... for the whole cartridge. Having a list of this kind of data.
I will talk about the copro tomorrow. Good night.
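A hint list of that kind could look like the sketch below. The `CodeHint` record and `find_hint` lookup are hypothetical (no such file format exists in the emulator as far as this thread says). The second half shows why the CPU state matters for parsing: on the 65816, the immediate load opcode 0xA9 is followed by one operand byte when the accumulator is 8-bit and by two when it is 16-bit.

```python
import bisect
from typing import List, NamedTuple, Optional


class CodeHint(NamedTuple):
    start: int               # first ROM address of the code range
    end: int                 # one past the last address (exclusive)
    wide_accumulator: bool   # True if the range runs with a 16-bit accumulator


def find_hint(hints: List[CodeHint], addr: int) -> Optional[CodeHint]:
    """Look up the hint covering `addr`; `hints` must be sorted by start."""
    starts = [hint.start for hint in hints]
    index = bisect.bisect_right(starts, addr) - 1
    if index >= 0 and addr < hints[index].end:
        return hints[index]
    return None


LDA_IMMEDIATE = 0xA9


def lda_immediate_length(wide_accumulator: bool) -> int:
    """Total instruction length in bytes: opcode plus 1 or 2 operand bytes."""
    return 3 if wide_accumulator else 2


hints = [CodeHint(0x8000, 0x8100, False), CodeHint(0x9000, 0x9040, True)]
```

With such a table, a translator running on the PSP would not have to guess the mode and roll back the way the disassembler described above did.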
|
|
|
Post by mrgibby on Apr 24, 2006 21:33:25 GMT 1
Hi,
The one thing that has been bothering me since I downgraded my psp for homebrew... Why is it that neocd psp is so fast compared to the snes9x? Is it because all of the special chips found in snes or what?
At the moment my favourite emulator is actually Rin, but neocd psp is close second. NeoCD psp is definitely more impressive but those gb zelda games are oh so sweet.
I would love to play the snes rpg's and super metroid but I am a bit demanding on the framerate and such. At least ff3 and metroid are a bit too slow for my taste to actually enjoy them.
|
|
|
Post by laxer3a on Apr 25, 2006 4:07:59 GMT 1
In FF6, well, all the maps inside villages and houses run at 60 fps+ at FS0, with no degradation of the graphic quality if your setup is correct (when I played it on an older version of the emu). I have tuned the mode 7, so you should also get a solid 50 fps+ on the world map with the walking character (no rotation). (I got 87 fps with no sound and switched to approximate mode.)
The problem comes in combat, where you definitely drop to something like 30/40 fps, or when using the airship...
While I agree it is not perfect, it seems very playable to me. About Metroid, I don't know. Never tried it...
I would say the problem with SNES emulation is definitely the nasty architecture it has. Basically, a Megadrive or a Neo Geo has a more "honest" architecture (i.e., put the sprites, draw the map).
I personally think that the SNES is the nastiest one to emulate of the 16-bit console era.
PsOne and later architectures use 32-bit pipelined CPUs and 3D chipsets... Most likely easier to translate into PSP GPU calls. If one succeeds in efficiently mapping 3D drawing from one platform to another, such emus are possible even though the emulated platform is more powerful. The biggest problem then is CPU emulation. You have a 33 MHz CPU in the PsOne, an 80 MHz CPU in the N64 and so on... A 333 MHz CPU, even with JIT techniques, can definitely struggle to emulate these CPU speeds (even if translation is easier to do than on the SNES in most cases).
To come back to the SNES topic: the architects of the SNES GPU did a very good job of making it "powerful" for cheap, with a lot of hardware tweaking. But this implies a lot of TESTS at runtime to emulate the various modes of the chip in software. Moreover, games really use it well :-) Basically they added transistor logic to do nasty/cool stuff in their chip, but emulating it is just a big pain. We are actually quite lucky that many of the graphic modes of the SNES can still be emulated with the PSP GPU. It could have been worse. Still, some stuff cannot be emulated easily (mode 7).
I will probably update this thread with technical links later... Anyway, need to go back to work. See ya.
|
|
ar15
Junior Member
Posts: 53
|
Post by ar15 on Apr 25, 2006 6:45:51 GMT 1
metroid is playable with the right settings, same with rpgs
|
|
dagas
New Member
Posts: 3
|
Post by dagas on Apr 25, 2006 7:40:57 GMT 1
I was having huge problems with games running slow on my 2.0, but then I downgraded (which went super smooth, no problems at all) and now even Tales of Phantasia runs great with sound and everything. I strongly recommend downgrading. Being able to run snes9xTYL ME instead of S makes all the difference; no tweaking was required at all.
|
|
|
Post by greekslover on Apr 25, 2006 8:11:11 GMT 1
When I said transcoding I meant translating all the game code in advance and converting it to PSP format, so the game would no longer be a SNES game but a PSP game like those we find in UMD format. I had this idea because of the Mac. Now that Macs are released with Intel processors instead of PowerPC, all Mac programs are being optimized for the Intel architecture. Can the same be done with SNES games and the PSP processor architecture?
|
|
|
Post by laxer3a on Apr 25, 2006 8:39:23 GMT 1
1/ This is what I described in my post as "global" translation. Basically, converting the complete binary into another binary...
2/ If you read in detail you will understand that it is not possible: you MUST DECODE (= run the game) at runtime to be able to do the "translation". Then you can only translate by "chunks". And then play puzzle...
3/ Take an old Mac application and run it on a new Mac with an Intel CPU:
- All the OS / DLLs ARE IN INTEL CODE. A program spends 50% of its time running code other than its own... So automatically, 50% of the old Mac application's time will be native without doing anything.
So the remaining 50% can be emulated... But for a real computer application, that is not enough. So Apple does "dynamic code translation"... But it is not a CONVERSION of the application. It is a runtime conversion of the application, emulated chunk by chunk.
I didn't check their latest technology; maybe they have a way to analyze code at an earlier stage (it depends on the structure of PowerPC binaries and the instruction set).
Still, for the SNES, some stuff isn't possible anyway... Please read in detail. Everything is written in my previous posts.
4/ When people port their applications from PowerPC to the new Intel platform, they don't convert the binary either... They have the source code and recompile it. So it is not a problem for them. A SNES ROM is just a pile of bytes. There is no way to extract "what" and "how".
Moreover, the old Mac application and the new Mac application call the same functions. In a PSP, there is NO SNES chip. How do you convert? You simply can't... you still need to emulate somehow, at some level...
|
|
|
Post by greekslover on Apr 25, 2006 8:48:30 GMT 1
You're right Laxer. I've read them more carefully and understood what you mean.
|
|
|
Post by pikamus on Apr 25, 2006 13:43:44 GMT 1
Very cool laxer, ^_^ i'm sure lots of people are reading this... just want to say it's very interesting and keep it up.. i'm quite interested in why mode 7 on the gpu isn't possible.
also in chrono trigger.. the FPS is great.. but at ANY battle.. it drops to ~7 fps take a look if u get the chance
|
|
|
Post by laxer3a on Apr 25, 2006 14:23:14 GMT 1
OK, let's go with the mode 7 issue. It is quite a fun and interesting subject...
1/ Mode 7 is a transformation IN 2D, not in 3D. Meaning you take a piece of paper in front of you, and you can rotate it around its center, zoom it...
2/ At this point you can do it with the PSP hardware:
A/ Draw all the tiles (= a BG, the map) onto a texture.
B/ Draw it on the screen with only 2 polygons, mapping the UV coordinates.
Issue 1: mode 7 has attributes like:
- use a tile if it goes outside
- loop if it goes outside
- use a color if it goes outside.
--> Problem to handle with the PSP GPU.
Issue 2: having enough RAM in the PSP to handle a HUGE BITMAP (a map is 64x64 tiles of 8x8).
256 KB as an 8-bit texture, 512 KB as a 16-bit texture. (There is only 2 MB of video memory in the PSP, and we also need to cache the tiles, and need 2 screen buffers, a SNES screen buffer, memory for the drawing list, palette, etc...)
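The texture sizes quoted above follow directly from the map dimensions given in the post (64x64 tiles of 8x8 pixels). The helper below is just that arithmetic written out; its name is made up.

```python
def texture_bytes(tiles_wide: int, tiles_high: int,
                  tile_pixels: int, bits_per_pixel: int) -> int:
    """Size in bytes of a bitmap holding the whole tile map."""
    width = tiles_wide * tile_pixels
    height = tiles_high * tile_pixels
    return width * height * bits_per_pixel // 8


size_8bit = texture_bytes(64, 64, 8, 8)     # 512x512 pixels, 1 byte each
size_16bit = texture_bytes(64, 64, 8, 16)   # 512x512 pixels, 2 bytes each
```

Against a 2 MB budget that also has to hold the screen buffers and the tile cache, a quarter to a half megabyte for a single layer is a big bite.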
Issue 3: the cost of updating the texture when the map changes.
Issue 4: don't forget the case where the palette is changed at each line... (multipass rendering here again)
This kind of technique could work for simple games like... Cameltry, for those who know it.
But... there is another issue.
The SNES 2D registers are CHANGED AT EACH SCANLINE to make a 3D effect, like in F-Zero. Basically you would have to draw a polygon for each line, drawing the map texture... (hoping that the texture does not go outside of the SNES map).
Basically, for the moment, the fastest way we found is to draw it in software. There is also an issue with the clipping windows (I am not sure if they are active in mode 7).
Anyway, make no mistake: mode 7 looks like 3D, but it is NOT 3D, which makes it very difficult to map onto a 3D engine.
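A software renderer for this kind of effect can be sketched as a 2D affine lookup done one scanline at a time. This is a simplified illustration, not the emulator's renderer: the function name, the tiny square texture and the 8.8 fixed-point matrix convention (256 = 1.0) are assumptions for the example, and only two of the three "outside" behaviours (loop, fill colour) are shown.

```python
from typing import List, Sequence, Tuple


def render_mode7_line(texture: Sequence[Sequence[int]], size: int,
                      y: int, width: int,
                      matrix: Tuple[int, int, int, int],
                      origin: Tuple[int, int],
                      wrap: bool = True, fill: int = 0) -> List[int]:
    """Render one scanline through a 2D affine transform.

    matrix = (a, b, c, d) in 8.8 fixed point; origin = (ox, oy) in texels.
    The caller may pass a different matrix for every scanline.
    """
    a, b, c, d = matrix
    ox, oy = origin
    line = []
    for x in range(width):
        u = ((a * x + b * y) >> 8) + ox
        v = ((c * x + d * y) >> 8) + oy
        if wrap:                                   # "loop if it goes outside"
            u %= size
            v %= size
        elif not (0 <= u < size and 0 <= v < size):
            line.append(fill)                      # "use a color if it goes outside"
            continue
        line.append(texture[v][u])
    return line


tiny_map = [[row * 4 + col for col in range(4)] for row in range(4)]
identity = (256, 0, 0, 256)
zoom_2x = (128, 0, 0, 128)
```

Feeding a different `matrix` for each `y` (stronger zoom near the top of the screen, weaker near the bottom) is what produces the fake perspective: it is still a flat 2D lookup per line, which is why it does not map cleanly onto two textured polygons.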
|
|
|
Post by greekslover on May 1, 2006 8:52:13 GMT 1
Laxer, can you talk about each coprocessor separately and tell us which coprocessor each game uses? I hope I am not asking for too much!
|
|
|
Post by ajallstar2112 on May 1, 2006 23:56:45 GMT 1
Laxer, in laymen's terms please. I do not have the attention span to read all of that! Just answer me this: Can you make the games run smoother?
|
|
|
Post by craig588 on May 2, 2006 0:48:36 GMT 1
Yes.
He's actually making it very simple already.
|
|
|
Post by laxer3a on May 2, 2006 11:34:32 GMT 1
Well, I try. But it is not easy to make things simple while keeping the detail and the essence and not telling lies. There could be "simpler" images to illustrate what I want to say, but then there would be a loss in terms of technical knowledge. Especially as I am not writing a book here :-) If I had 300 pages, it would be very different.
|
|
|
Post by swordchucks on May 2, 2006 14:36:55 GMT 1
Super Mario RPG ran horribly for me when I was using the user mode version of 0.3. However, I eventually bit the bullet and started using the ME version and it's been faultless ever since (though I really miss sleep mode). I run with the frameskip set to 0, 11kHz sound, and full processor speed. Absolutely no problems. Setting the frameskip to AUTO doesn't work so well for me in that game.
The regular version, however, locks up on me at random intervals and doesn't move so smoothly.
|
|
|
Post by greekslover on May 2, 2006 17:43:15 GMT 1
I'm really interested in learning all about SNES emulation. I don't want just a plain answer like "We can optimize this game" or "we can't optimize it". I want to learn the true reason why the games can or can't be optimized, and I want to get a taste of these wonderful people's knowledge.
|
|
|
Post by patters on May 8, 2006 11:38:43 GMT 1
Hi Laxer,
This is a very interesting read. Are you thinking of moving the CPU core to assembler (like ZSNES did a long long time ago IIRC, and like SNESAdvance on the GBA)? Or is this what you meant when you talked of one man-year of effort?
|
|
|
Post by laxer3a on May 8, 2006 12:43:57 GMT 1
Nope... I had a discussion with yoyo about it. We did that a long time ago on the GP32 and didn't gain that much compared to the effort.
I was talking more about a "dynamic code generator", but it is really a pain to develop fully. We are not thinking about making it... unless some motivated programmers want to join :-)
|
|
|
Post by laxer3a on May 8, 2006 18:20:19 GMT 1
I promised quite a long time ago to talk about coprocessors.
Off the top of my head (I may be mistaken), there are the following coprocessors for the SNES:
- SA1
- DSP1
- DSP2
- SuperFX
Of course, for cost reasons, Nintendo didn't release cartridges with multiple copros inside.
SuperFX is the hardest to emulate (StarFox, StarOcean ?, Donkey Kong Country ?)
SuperFX is composed of two custom RISC CPUs running at 11 MHz (equivalent to ONE CPU at 22 MHz); moreover, it is linked to the graphics rendering (see StarFox).
DSP1: Pilot Wings, Super Mario Kart
It would be nice if somebody who knows a lot about which games use what would make a FULL LIST of the games using each coprocessor. That would show how much effort needs to go into tuning. (i.e. I think SuperFX is used in only 3 games, if I am correct)
Now I agree that the good games are the ones that have a copro (seems logical, as they have better hardware).
There are a few possibilities I foresee to tune the emulator:
- Rewrite the DSP stuff, optimized to use the PSP FPU (VFPU ?).
- Try to use the ME to emulate a second chip (so AUDIO + ...).
- Use a JIT/dynarec to optimize.
I am not sure either yoyo or I want to go that far in tuning. If any good coder is interested in joining in... It may actually be worth throwing a bottle into the sea, as the source matches the latest binary. Still, as yoyo and I continue to develop, we would like to keep the project in sync and avoid multiple versions everywhere...
For the next release, there are SMALL details that we are looking at; if we are lucky we could grab some more FPS again, but it does not involve implementing new features.
|
|
|
Post by greekslover on May 9, 2006 5:20:58 GMT 1
Since Donkey Kong Country uses SuperFX why is it so fast?
|
|