Hmm, a guy I know who owns an iPAQ 3970 and I have been running some interesting tests with simple code. We're both running PPC2002; his is the 400 MHz PXA250, mine's the 400 MHz PXA255.
All the code does is copy some memory to the screen and display the framerate at which it manages to do so.
The _exact_ same code runs TWICE as fast on his machine as it does on mine, and the bottleneck seems to be writing to screen memory (through GAPI).
General memory speed doesn't seem to have anything to do with it: with NO screen access, the test actually runs faster on mine than on his.
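For anyone who wants to try the same comparison, here is a rough sketch of that kind of timing loop. It is not the actual test code, just a minimal illustration under a few assumptions: the function name measure_copy_fps is made up, the frame is treated as one flat block of bytes (real GAPI displays have their own pitch values, so a single memcpy is a simplification), and standard C clock() stands in for whatever timer you'd use on the device (GetTickCount, for example).

```c
#include <stddef.h>
#include <string.h>
#include <time.h>

/*
 * Hypothetical helper: copies frame_bytes from src to dst, frames times,
 * and returns the measured rate in frames per second.
 * Returns 0.0 if the elapsed time was too short for clock() to measure.
 */
double measure_copy_fps(void *dst, const void *src, size_t frame_bytes, int frames)
{
    clock_t start = clock();

    for (int i = 0; i < frames; ++i) {
        /* One "frame": a plain block copy into the destination buffer. */
        memcpy(dst, src, frame_bytes);
    }

    clock_t end = clock();
    double seconds = (double)(end - start) / CLOCKS_PER_SEC;

    return seconds > 0.0 ? frames / seconds : 0.0;
}
```

To compare the two cases, call it once with dst pointing at an ordinary buffer in main memory, and once with dst being the pointer you get back from GXBeginDraw() (followed by GXEndDraw()). If only the second number drops, the slowdown is in the screen writes and not in memory in general.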
I also found this page on the net, which might be interesting:
http://www.spraguetech.com/results.html
Any ideas? I'm going to contact Dell tech support and see if I can get a technical answer from them...