Forum archive

optimizing code / inline asm

Return to forum index

0xC0DE at 2008-07-29 07:43
Dear Ken (and others),

I would like to optimize the following routine : http://rafb.net/p/Ls2zoe64.html

Yes, I know some things can be a little more efficient, such as the *4 (can be replaced with a bitshift).
And some things can be placed else were. But I’m mostly concerned about the following stuff :
```
		sx1 = (px1 + (slabs[(slabindex * 4) + 1] * tx1));
		sy1 = (py1 + (slabs[(slabindex * 4) + 1] * ty1));

		sx2 = (px2 + (slabs[(slabindex * 4) + 1] * tx2));
		sy2 = (py2 + (slabs[(slabindex * 4) + 1] * ty2));
		
		lenghtx = (sx2 - sx1) / SIZE_BITMAP_WIDTH;
		lenghty = (sy2 - sy1) / SIZE_BITMAP_HEIGHT;
```
And…
```
		sz = (intp1*64) - (intp1 * slabs[(slabindex * 4) + 1]);
		ez = (intp2*64) - (intp2 * slabs[(slabindex * 4) + 1]);
 
```
It seems there’s no good way to get rid of all the heavy * and / math. My final option would be to do everything in inline ASM. I don’t know any ASM tricks so I’m left to use mul and div instead of * and / operators.
Would that still help ? or is there a better way to get things fast ?
Spacerat at 2008-07-29 10:57
Re: optimizing code / inline asm
The multiplications can be done in parallel with mmx, then / can be replaced by >> if you use 2^x sized textures. If you change to asm you should care of pipeline optimization - thats maybe more important than some multiplications..
Another thing you can toy around is self modifying code in asm - it clears the intruction cache but is sometimes faster than the common way
Edited by Spacerat at 2008-07-29 11:01
Awesoken at 2008-07-29 11:27
I'm guessing you are writing this for a mobile device, which makes talk of MMX not applicable. I would attack the code at the algorithmic level before taking it to assembler. Your code looks like some kind of like bilinear interpolation, but I'm not sure what you are interpolating. Perhaps if you explain your algorithm briefly, with a screenshot or two (attachments finally work for me : ), I could advise a faster inner loop. Are you using orthographic or perspective projection? Carefully sorted back-to-front rendering or sloppy Z-buffer? Also, a description of your slabs[] structure would be very helpful.
0xC0DE at 2008-07-29 19:33
I'm sorry, it was very late (and I was tired) when I wrote the last message. So I didn't provide all info :) anyway :

Indeed, this stuff is for mobile/handheld devices.

The algorithm basically draws a slice of the world (if the world is 64*64*64 then it has 64 slices of 64*64). To do this I interpolate between 4 vectors of the slice. And that's were all the math comes in.

For sorting, I first sort the slices from front to back. After that I use a zbuffer (so some math routines are for that (mainly the sz and ez stuff). I need a zbuffer because I draw sprites and other objects in my voxel world aswell.

my slab[] has

x = x position
y = y position
w = width of the slab
side = visible side

NOTE: side is used to determited from which side the slab is visible, I use that for coloring so you don't need to worry about that.

Hope this info is enough :)
Awesoken at 2008-07-30 02:15
I assume by 'lenght', you mean 'length', and by 'width of slab', you mean 'height of slab'.

It looks like your inner loop is using orthographic (parallel) projection. Orthographic projection, while not correct, is a good approximation and probably a good choice on a mobile device where speed is more important than quality.

If your 'projected' array values are also orthographic, then your variables {tx1, ty1, tx2, ty2, intp1, intp2} should be the same for all planes/columns of your model. In this case, you have 2 ways to avoid multiplies:
- If you visit every index, then you can pre-calculate a step size and add that to your endpoints as you visit each plane/column.
- If you must skip around, then you can pre-calculate a 64-entry lookup table (or whatever your model size is) once per model.
0xC0DE at 2008-07-30 06:40
I'm sorry, your assumption is wrong. I really mean width instead of height.

Like said before, my voxel model consist out of "X" slices of X*X (where in this case X is 64).
When you imagine one of those slices as a 64*64 bitmap my slabs would run from left to right. Starting at the top (that is, the highest slab is drawn first,
but they don't always start at 0,0).

I transformed and projected my vertex (in this case 4 per slice) like I would in a polygon engine. the model can rotate in true 3d (over the x,y and z axe).

About your 2 points :
If you visit every index, then you can pre-calculate a step size and add that to your endpoints as you visit each plane/column.
I don't visit every index (I pressume you mean a position in X*X with index) I start at slabX (which can be any position between 0 and X on the X axe) at
slabY which can be any position between 0 and Y on the Y axe.
If you must skip around, then you can pre-calculate a 64-entry lookup table (or whatever your model size is) once per model.
I don't understand this. If I would like to stop skip around on every slice the lookup table should be alot bigger (at least 64*64 since we have 64 for which we need the info),
not to metion the problems I would have when there are multiple slabs on the same X axe.

Also I don't see how I can precalc them ? I mean, the stepsize changes all the time (since we can rotate in 3D -> x,y,z).

Maybe I still didn't gave enough information. I skipped some stuff because I thought that would make it easier for you. I was probably wrong.
I uploaded the 'complete' routine this time : http://rafb.net/p/B6lZ0o83.html
please note it's not very clean, since I'm still experimenting and thus my code is getting dirty :)
Edited by Hugo Smits at 2008-07-30 06:43
Spacerat at 2008-07-31 18:25
For detailed optimization I just came over this page : http://agner.org/optimize/
There is a great documentation about code optimization.
TX at 2008-07-31 20:26
Spacerat said at 2008-07-31 18:25
For detailed optimization I just came over this page : http://agner.org/optimize/
There is a great documentation about code optimization.
Heh, yeah, optmizing_cpp.pdf is probably my favorite optimization-related document ever. It's really rather invaluable to anyone doing any kind of C or C++ optimization, whether you know a lot or a little.
0xC0DE at 2008-08-01 00:08
It was a nice read. But it wasn't very usefull. Mainly because I'm not coding for PC (win32/linux or even Mac) but for mobile/handheld.

Anyone any suggestions on the above posted code ? thanks in advance!

Forum archive

optimizing code / inline asm

Re: optimizing code / inline asm