If you're looking for 3D performance, I think the best way to get it is with the Intel Integrated Performance Primitives (IPP). I've used them, and they are fast (because they are written in assembly and use fixed-point math, which ARM chips love :) )
Link:
http://www.intel.com/design/pca/appl...sup/IPPv30.htm
I'm not sure what the license is like though :-\
I just received them with this XScale dev kit.
IIRC, it has optimized 3D/4D (homogeneous) vector/matrix routines, MPEG decoding routines and some graphics drawing stuff too :P
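To give an idea of what a fixed-point homogeneous transform looks like, here is a rough sketch in plain C. This is NOT the IPP API: the names (fx16, fx_mul, fx_mat4_mul_vec4) and the 16.16 format are just my assumptions for illustration, and the real library routines are hand-tuned assembly.

```c
#include <stdint.h>

/* 16.16 fixed point: upper 16 bits integer part, lower 16 bits fraction.
   All names here are hypothetical and are not taken from IPP. */
typedef int32_t fx16;
#define FX_ONE (1 << 16)

/* Convert a small integer to 16.16 fixed point. */
static fx16 fx_from_int(int v)
{
    return (fx16)(v * FX_ONE);
}

/* Multiply two 16.16 values. The product is formed in 64 bits so it does
   not overflow, then shifted back down to 16.16. Assumes >> on a negative
   value is an arithmetic shift, which holds on common ARM compilers. */
static fx16 fx_mul(fx16 a, fx16 b)
{
    return (fx16)(((int64_t)a * b) >> 16);
}

/* out = m * v for a row-major 4x4 matrix and a 4D homogeneous vector.
   Products are accumulated at full precision and shifted once per row. */
static void fx_mat4_mul_vec4(const fx16 m[16], const fx16 v[4], fx16 out[4])
{
    for (int r = 0; r < 4; r++) {
        int64_t acc = 0;
        for (int c = 0; c < 4; c++)
            acc += (int64_t)m[r * 4 + c] * v[c];
        out[r] = (fx16)(acc >> 16);
    }
}
```

Everything above is integer multiplies, adds and shifts, so nothing has to go through software floating-point emulation on a chip with no FPU.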
BTW: OpenGL on any system without an FPU or hardware acceleration will be unbearably slow.
--jbit