I recently got interested in using voxels and photon mapping to make detailed scenes with correct lighting. Does anyone here have thoughts about using cached sprites to speed things up?
What I'm thinking of is rendering the relatively static parts of a scene to normal-mapped sprites. These could then be updated at a rate somewhat less than once per frame. It seems to me that this technique should allow me to pull some of the load out of the main rendering loop. Specifically, objects that are far away won't change much either in viewing angle or apparent size as the character moves. Normal maps should allow the lighting to be updated without triggering a re-render.
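A minimal sketch of the relighting half of this idea. It assumes each cached sprite texel stores an albedo and a unit normal captured when the sprite was ray cast, and uses plain Lambertian shading. `SpriteTexel` and `relight_sprite` are hypothetical names, not from any engine:

```python
import math
from dataclasses import dataclass
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class SpriteTexel:
    albedo: Vec3  # base colour captured when the sprite was ray cast
    normal: Vec3  # unit surface normal captured at the same time


def normalize(v: Vec3) -> Vec3:
    length = math.sqrt(v[0] ** 2 + v[1] ** 2 + v[2] ** 2)
    return (v[0] / length, v[1] / length, v[2] / length)


def relight_sprite(texels: List[SpriteTexel], light_dir: Vec3,
                   light_rgb: Vec3 = (1.0, 1.0, 1.0),
                   ambient: float = 0.1) -> List[Vec3]:
    """Re-shade a cached sprite with Lambertian lighting.

    Only the stored normals and albedos are read, so a moving light costs one
    dot product per texel instead of a re-render of the voxel object.
    """
    l = normalize(light_dir)  # direction from the surface toward the light
    shaded = []
    for t in texels:
        n_dot_l = max(0.0, sum(n * c for n, c in zip(t.normal, l)))
        shaded.append(tuple(a * (ambient + n_dot_l * c)
                            for a, c in zip(t.albedo, light_rgb)))
    return shaded
```

Under this scheme the sprite only has to be re-cast when the view changes enough; a light change goes through the cheap path above.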
Background:
My test app raycasts a 256 x 256 x 256 scene at about 1 FPS (roughly 200k rays per second on a 2.4 GHz Core 2). It's going to need to be an awful lot faster to get to real time (30 FPS) and handle the extra load of lighting.
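A quick budget check makes the gap concrete. This assumes one primary ray per pixel and treats ray throughput as the only bottleneck; `required_speedup` is just an illustrative helper, and `rays_per_pixel` is a rough, hypothetical stand-in for extra lighting rays:

```python
def required_speedup(rays_per_sec: float, width: int, height: int,
                     target_fps: float, rays_per_pixel: float = 1.0) -> float:
    """Factor by which ray throughput must grow to reach target_fps."""
    rays_needed = width * height * target_fps * rays_per_pixel
    return rays_needed / rays_per_sec


# Primary rays only, at 640x360 and 30 FPS.
print(round(required_speedup(200_000, 640, 360, 30), 2))  # 34.56
```

So even at 640x360 with primary rays only, throughput would have to grow by roughly 35x before any lighting rays are added.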
The current design uses an oct tree to do voxel intersection checks. For each pixel of the output buffer I generate a ray and throw it into the oct tree, rendering the color of the terminating voxel. Things I currently do to speed things up some:
- I have a simple heuristic that tries to traverse each oct's children in the correct order based on the entry angle of the ray.
- Before drawing a pixel, I check intersection with the previous pixel's terminating voxel. If I still hit it, I reuse that color.
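For reference, here is a minimal recursive sketch of an octree ray cast of this kind. It uses one well-known ordering trick (XOR-ing the child index with the ray's direction sign bits), which may or may not match the heuristic described above. `Node`, `insert`, and `cast` are hypothetical names, and leaves are assumed to be unit voxels:

```python
import math
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class Node:
    lo: Vec3                       # minimum corner of this cube
    size: float                    # edge length
    color: Optional[tuple] = None  # set only on solid leaves
    children: List[Optional["Node"]] = field(default_factory=lambda: [None] * 8)


def ray_box(origin: Vec3, direction: Vec3, lo: Vec3, size: float) -> Optional[float]:
    """Slab test against an axis-aligned cube; returns the entry t or None."""
    t_near, t_far = 0.0, math.inf
    for a in range(3):
        hi = lo[a] + size
        if direction[a] == 0.0:
            if not lo[a] <= origin[a] <= hi:
                return None  # parallel to this slab and outside it
            continue
        t0 = (lo[a] - origin[a]) / direction[a]
        t1 = (hi - origin[a]) / direction[a]
        t_near = max(t_near, min(t0, t1))
        t_far = min(t_far, max(t0, t1))
        if t_near > t_far:
            return None
    return t_near


def insert(root: Node, x: int, y: int, z: int, color: tuple) -> None:
    """Create the path from the root down to the unit voxel at (x, y, z)."""
    node = root
    while node.size > 1.0:
        h = node.size / 2
        i = (x >= node.lo[0] + h) | ((y >= node.lo[1] + h) << 1) | ((z >= node.lo[2] + h) << 2)
        if node.children[i] is None:
            lo = (node.lo[0] + h * (i & 1), node.lo[1] + h * ((i >> 1) & 1),
                  node.lo[2] + h * ((i >> 2) & 1))
            node.children[i] = Node(lo, h)
        node = node.children[i]
    node.color = color


def cast(node: Optional[Node], origin: Vec3, direction: Vec3) -> Optional[tuple]:
    """Colour of the nearest solid voxel along the ray, or None."""
    if node is None or ray_box(origin, direction, node.lo, node.size) is None:
        return None
    if node.color is not None:
        return node.color
    # Child index bits are (x, y, z). Flipping the bits of every axis the ray
    # travels along negatively makes i = 0..7 a valid front-to-back order.
    mask = sum(1 << a for a in range(3) if direction[a] < 0)
    for i in range(8):
        hit = cast(node.children[i ^ mask], origin, direction)
        if hit is not None:
            return hit  # earlier children are nearer, so stop here
    return None
```

Because the children a ray actually enters are visited nearest first, the first solid leaf found is the visible one, which is what makes early termination safe.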
Next things I plan to do:
- Cache each pixel ray's hit from frame to frame and use that to accelerate intersections.
- Add Bresenham-style traversal after the first hit so I can trace through translucent voxels quickly.
- Add level-of-detail support.
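In 3D, "Bresenham-style" stepping is usually done with a grid DDA such as Amanatides and Woo's voxel traversal. Below is a sketch over a flat dictionary of voxels rather than the oct tree (an assumption for brevity), with front-to-back compositing for the translucent case. The function names and the RGBA-tuple voxel format are hypothetical:

```python
import math
from typing import Dict, Iterator, Tuple

Voxel = Tuple[int, int, int]


def dda_voxels(origin, direction, grid_size: int, max_steps: int = 100_000) -> Iterator[Voxel]:
    """Yield the voxels a ray pierces, nearest first (Amanatides & Woo-style 3D DDA).

    Assumes `origin` is inside [0, grid_size)^3 and `direction` is non-zero.
    """
    voxel = [int(math.floor(c)) for c in origin]
    step = [0, 0, 0]
    t_max = [math.inf] * 3    # ray t at which the next boundary is crossed, per axis
    t_delta = [math.inf] * 3  # ray t needed to cross one whole voxel, per axis
    for a in range(3):
        if direction[a] > 0:
            step[a], t_delta[a] = 1, 1.0 / direction[a]
            t_max[a] = (voxel[a] + 1 - origin[a]) / direction[a]
        elif direction[a] < 0:
            step[a], t_delta[a] = -1, -1.0 / direction[a]
            t_max[a] = (voxel[a] - origin[a]) / direction[a]
    for _ in range(max_steps):
        if not all(0 <= v < grid_size for v in voxel):
            return
        yield (voxel[0], voxel[1], voxel[2])
        a = t_max.index(min(t_max))  # axis whose boundary comes first
        voxel[a] += step[a]
        t_max[a] += t_delta[a]


def trace_translucent(grid: Dict[Voxel, Tuple[float, float, float, float]],
                      origin, direction, grid_size: int, opaque: float = 0.99):
    """Front-to-back alpha compositing along the DDA walk, with early exit."""
    color, alpha = [0.0, 0.0, 0.0], 0.0
    for v in dda_voxels(origin, direction, grid_size):
        sample = grid.get(v)  # (r, g, b, a) or None for empty space
        if sample is None:
            continue
        weight = (1.0 - alpha) * sample[3]
        color = [c + weight * s for c, s in zip(color, sample[:3])]
        alpha += weight
        if alpha >= opaque:
            break  # nothing behind this point can contribute visibly
    return tuple(color), alpha
```

Each step costs one comparison and two additions, with no box tests, which is why this kind of walk is attractive once the first hit has been found.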
After that there are some implementation-level things to do, like removing stack-based recursion, converting the data structures away from floats and unpacked pointers, adding multithreading, and starting to use SSE. I think after that I will probably still not be fast enough, so I'm looking for more ways to save.
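As one example of moving away from unpacked pointers, an interior node can be squeezed into a single 32-bit word holding a child mask and the index of its first child in a flat array. This layout is an assumption for illustration (similar in spirit to published sparse voxel octree layouts), not the design described here; `pack_node` and `child_slot` are hypothetical names:

```python
# Hypothetical 32-bit interior-node word, replacing per-child pointers:
#   bits 0-7  : child mask (bit i set means child i exists)
#   bits 8-31 : index of this node's first child in one flat node array
# Existing children are stored contiguously in child-index order.


def pack_node(child_mask: int, first_child: int) -> int:
    if not (0 <= child_mask < 1 << 8 and 0 <= first_child < 1 << 24):
        raise ValueError("field out of range")
    return child_mask | (first_child << 8)


def child_slot(word: int, i: int):
    """Flat-array index of child i, or None if octant i is empty."""
    mask = word & 0xFF
    if not (mask >> i) & 1:
        return None
    # Children are packed, so the offset is the number of present siblings
    # with a lower index, i.e. a popcount of the lower mask bits.
    offset = bin(mask & ((1 << i) - 1)).count("1")
    return (word >> 8) + offset
```

Empty octants take no space at all, and a node is 4 bytes instead of eight pointers.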
I confess I don't really understand the Voxlap engine's technique yet; it is not too easy to understand from the source.
Zelex at
Re: Voxel Engine Performance
You should be able to render a 1280x720 screen in about 0.7 seconds with a single thread on a 3.0 GHz Core 2. At least that's what my implementation does. Getting to that point takes some work, though.
hedgehog at
Interesting, how complex is your scene? What technique do you use for casting rays? Right now mine draws the 256 ^ 3 scene into a 1280x720 frame in 2.7 seconds. I'm working on getting performance up to real-time at 640x360 before I work up to higher resolutions but even there I have a ways to go. I estimate that I need to go about 100x faster than I do now but I think I can squeeze that out of the current design without too much work.
Zelex at
hedgehog said at
Interesting, how complex is your scene? What technique do you use for casting rays? Right now mine draws the 256 ^ 3 scene into a 1280x720 frame in 2.7 seconds. I'm working on getting performance up to real-time at 640x360 before I work up to higher resolutions but even there I have a ways to go. I estimate that I need to go about 100x faster than I do now but I think I can squeeze that out of the current design without too much work.
> Interesting, how complex is your scene?
The complexity of the scene is irrelevant. It's ray casting :) I have about 1 voxel per pixel and 10 levels in the tree, if I remember correctly.
> What technique do you use for casting rays?
I currently use a bottom-up traversal approach, as it works out best for GPUs. However, I've seen top-down approaches work successfully too. Check out Physically Based Rendering by Matt Pharr for a good description of how to traverse a KD tree. The algorithm can be easily adapted to oct-trees.
Zelex at
has anybody tried voxlap at 720p yet?
ConsistentCallsign at
Zelex said at
has anybody tried voxlap at 720p yet?
320x240 is all you need :D I love low-res <3
Zelex at
Surprisingly I just tried voxelstein at 1024x768 and the slowest it ran was 15 FPS. Impressive.
That means that it computes the value of each pixel in less than 256 cycles on average.
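The 256-cycle figure only holds under an assumed clock speed, which the post doesn't state. Taking a single 3.0 GHz core as an assumption (the figure given earlier in the thread for a Core 2), the arithmetic works out as follows; `cycles_per_pixel` is an illustrative helper:

```python
def cycles_per_pixel(clock_hz: float, width: int, height: int,
                     fps: float, cores: int = 1) -> float:
    """Average CPU cycles available per pixel at a given resolution and frame rate."""
    return clock_hz * cores / (width * height * fps)


# 1024x768 at 15 FPS is about 11.8 million pixels per second.
print(round(cycles_per_pixel(3.0e9, 1024, 768, 15)))  # 254
```

With more cores or a faster clock the per-pixel budget rises proportionally, so the figure is an upper bound only for the single-core case.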
ConsistentCallsign at
Zelex said at
Surprisingly I just tried voxelstein at 1024x768 and the slowest it ran was 15 FPS. Impressive.
/vxlmip and /scandist are set to low by default in vxlst. Try using 1024x768 in the original Voxlap test game with /vxlmip and /scandist set to max, and your FPS will probably be like 1 :P
Maren at
ConsistentCallsign said at
Zelex said at
has anybody tried voxlap at 720p yet?
320x240 is all you need :D I love low-res <3
I don't really see the benefit in using 320x240 over 320x200, but yeah, lowres has its charms 8)
ConsistentCallsign at
Maren said at
ConsistentCallsign said at
Zelex said at
has anybody tried voxlap at 720p yet?
320x240 is all you need :D I love low-res <3
I don't really see the benefit in using 320x240 over 320x200, but yeah, lowres has its charms 8)
Cool! :o I will continue to use 320x200 then! :D 320x200 for the win! :D
ConsistentCallsign at
You cannot do real-time raytracing at a decent framerate with today's and tomorrow's hardware if the resolution is higher than 320x200. 320x200 renders 720p obsolete because you cannot play a raster-raytraced voxel game with photorealistic graphics and physics at a decent framerate if the resolution is >320x200 :D:D:D I do NOT want to play a 720p game because then I would automatically get shitty, unrealistic graphics with unrealistic physics.
Do you want photorealistic graphics and physics at low resolution, or do you want unrealistic graphics and physics at high resolution?
//rant mode on
Imagine, a raster-raytraced, photorealistic voxel game that has so many volumetric clouds in the sky, that the light from the sun gets dimmed! :o And transparent water voxel rain drops are falling down from the clouds, and the thunderbolts are hitting trees and DESTROYING them in all photorealistically raytrace-beautifulness! :D:D :o :o
It's easy to fill the sky with hundreds of big volumetric clouds if the voxel volume is small because if the voxel volume is small, like 1x1x2, you only need 1 single cloud voxel to dim the light from the sun :D But in a high-res voxel world, you would need billions.. :(
Take that, polygonists! >:( >:( >:( :P
//rant mode off
Zelex at
ConsistentCallsign said at
You cannot do real-time raytracing at a decent framerate with today's and tomorrow's hardware if the resolution is higher than 320x200. 320x200 renders 720p obsolete because you cannot play a raster-raytraced voxel game with photorealistic graphics and physics at a decent framerate if the resolution is >320x200 :D:D:D I do NOT want to play a 720p game because then I would automatically get shitty, unrealistic graphics with unrealistic physics.
Do you want photorealistic graphics and physics at low resolution, or do you want unrealistic graphics and physics at high resolution?
//rant mode on
Imagine, a raster-raytraced, photorealistic voxel game that has so many volumetric clouds in the sky, that the light from the sun gets dimmed! :o And transparent water voxel rain drops are falling down from the clouds, and the thunderbolts are hitting trees and DESTROYING THEM IN PHOTOREALISTIC raytrace-beautifulness! :D:D :o :o
True, ray tracing at 720p and a decent framerate is currently not possible. At least not using any algorithm I know of. However, ray casting (primary rays only) with many times the amount of equivalent triangles is possible for static scenes. Dynamic scenes should also be possible, but I haven't done a whole lot of experimenting in that arena.
<edit> I should add that the previous statement is true when using GPU acceleration via CUDA or whatever
ConsistentCallsign at
Zelex said at
Dynamic scenes should also be possible
Yes, of course, but it will be slower because the polygons are much, much more sensitive to scene and object complexity than voxels are! :D
An object scanned to a voxel data set with 1024x1024x1024 voxels (1 GVoxel) can be rendered with interactive performance on any Pentium 4 CPU. The same object represented by a polygonal mesh that still does not reach the level of detail of the voxel object easily exceeds several tens of millions of triangles. The rendering performance for such a huge mesh on today's most powerful graphics boards is much lower than that of our voxel graphics technology.
Also:
In conventional ray tracing, computation time grows with the number of objects [5], because in crowded scenes a ray may pierce a substantial number of objects and there is a considerable probability that a cell contains more than one object. Moreover, conventional ray tracers are extremely sensitive to the type of objects comprising the scene; intersection calculation between a ray and a parametric surface is significantly more complex than intersecting the ray with a sphere or a polygon. In contrast, RRT completely eliminates the computationally expensive ray-object intersection calculation, and instead relies solely on a fast discrete ray traversal mechanism and a single simple type of object – the voxel. Consequently, RRT is in practice independent of the number of objects in the scene or the object's complexity or type. RRT performance, however, is sensitive to the resolution of the 3D raster. Therefore, for a given resolution, ray tracing time is nearly constant and can even decrease as the number of objects in the scene increases, as less stepping is necessary before an object is encountered.
Even if using polygons instead of voxels will be faster (when cheating by using a GPU), it's easier to interact with voxels; boolean operations are easier to perform with voxels :D
hedgehog at
Zelex said at
The complexity of the scene is irrelevant. It's ray casting :) I have about 1 voxel per pixel and 10 levels in the tree if I remember correctly.
Actually, for the oct tree approach with early termination, the "noisiness" of the scene can change the performance quite a lot. I plan to modify my implementation to optimize away box intersection checks for octs that have only one child (go straight to the child), but that will probably only help that situation by about 30%. That being said, it sounds like your implementation is quite a lot faster than mine.
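One way to implement the single-child optimization is a post-pass that relinks each parent past any chain of one-child nodes. This is a sketch assuming each node stores its own bounds; `Node`, `collapse`, and `count_nodes` are hypothetical names:

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Node:
    lo: Tuple[float, float, float]  # each node keeps its own bounds, so
    size: float                     # a parent may point at any descendant
    color: Optional[tuple] = None   # set only on solid leaves
    children: List[Optional["Node"]] = field(default_factory=lambda: [None] * 8)


def collapse(node: Optional[Node]) -> Optional[Node]:
    """Return the node a parent should link to, skipping single-child chains.

    If an interior node has exactly one child, every ray hit inside it lies
    inside that child, so its own box test is redundant and can be skipped.
    """
    if node is None:
        return None
    node.children = [collapse(c) for c in node.children]
    kids = [c for c in node.children if c is not None]
    if node.color is None and len(kids) == 1:
        return kids[0]  # already collapsed, so it is not a chain itself
    return node


def count_nodes(node: Optional[Node]) -> int:
    """Number of nodes reachable from `node`, i.e. potential box tests."""
    return 0 if node is None else 1 + sum(count_nodes(c) for c in node.children)
```

The descendant still lies inside the octant its ancestor occupied, so ordering siblings by octant index keeps working after the relink.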
Zelex said at
> What technique do you use for casting rays? I currently use a bottom up traversal approach as it works out the best for GPUs. However I've seen top-down approaches also work successfully. Check out Physically Based Rendering by Matt Pharr for a good description on how to traverse a KD tree. The algorithm can be easily adapted to oct-trees.
I have actually read that paper; it is interesting. Another interesting paper which gives me hope is "A Visualization Framework for Time Dependent Metal Casting Simulations" by Gross, Lojewski and Hagen.
Zelex said at
True, ray tracing at 720p and a decent framerate is currently not possible. At least not using any algorithm I know of. However, ray casting (primary rays only) with many times the amount of equivalent triangles is possible for static scenes. Dynamic scenes should also be possible, but I haven't done a whole lot of experimenting in that arena.
I suspect it can be done, or at least practically the same effect. Some things I will try once the basics are taken care of:
- Render objects in the scene to sprites and re-render only when an object changes appearance or the viewpoint has changed enough to require it. Still ray cast for sprites, but special-case the rays that intersect these objects' bounding boxes.
- Inter-frame caching of ray hits.
- LOD reduction and motion blur for fast-moving objects. (Motion blur will probably have to be done as a post-processing effect that takes the depth channel of the framebuffer into account, but it will look correct and hide the reduced LOD.)
- Early termination based on luminance. (The iris clamps down quite a bit if there is a bright light source in your field of vision, so darker regions of the scene can terminate early.)
- Simulated cone-casting instead of casting extra rays to get antialiasing.
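For the cone-casting item, one simple interpretation is to stop descending the tree once a node is no wider than the pixel's cone at that distance, and to use a pre-filtered average colour stored on that node. The pre-filtered colours and the helper names below are assumptions, not an existing API:

```python
import math


def pixel_half_angle(fov_y: float, image_height: int) -> float:
    """Approximate half-angle (radians) of the cone through one pixel."""
    return 0.5 * fov_y / image_height


def cone_diameter(distance: float, half_angle: float) -> float:
    """Width of the pixel cone at a given distance along the ray."""
    return 2.0 * distance * math.tan(half_angle)


def lod_level(footprint: float, root_size: float, max_depth: int) -> int:
    """Coarsest octree level whose voxels are no wider than the cone.

    Level L has voxels of size root_size / 2**L; traversal can stop there
    and use that node's (assumed) pre-filtered average colour.
    """
    if footprint <= 0.0:
        return max_depth
    level = math.ceil(math.log2(root_size / footprint))
    return max(0, min(max_depth, level))
```

Distant geometry then terminates several levels early, which both saves traversal steps and gives a filtered result without extra rays.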
As they say, the proof of the pudding is in the eating.
Spacerat at
For comparison, raycasting a 1024x1024x1024 volume on a GF8800GTS at a resolution of 1024x768 runs at about 20 FPS.
Zelex at
hedgehog said at
Actually for the oct tree approach with early termination the "noisiness" of the scene can change the performance quite a lot. I plan to modify my implementation to optimize away box intersection checks for octs that have only one child (go straight to the child) but that will probably only help that situation by 30%. That being said it sounds like your implementation is quite a lot faster than mine.
Yeah, so the worst-case scenario would be looking diagonally down a checkerboard grid. However, I don't think that happens very often in practice. In a more realistic scenario the noise is at the surface level, which only really affects the last few steps in the oct-tree traversal. It would have an impact, but hopefully it won't halve your framerate.
hedgehog at
Zelex said at
Yeah, so the worst case scenario would be looking diagonally down a checkerboard grid. However, I don't think that happens very often in practice. The noisiness of the image in a more realistic scenario is noise at the surface level, which only really affects the last few steps in the oct-tree traversal. It would have an impact, but hopefully it won't halve your framerate.
Ah yes, good point. All of my testing so far is with a model that came from a CT scanner, where I remove all of the voxels with opacity < 0.15, but maybe I won't have that problem as much on other scenes.
Zelex at
An update on oct-tree tracing performance.
I got it down to 550 ms per 720p frame. I think I can get this twice as fast as well... stay tuned.
straaljager at
Hi all, this is my first post here. I've been interested in voxel based rendering after reading the interview with John Carmack on sparse voxel octrees, but my interest recently went through the roof when I discovered that the latest Ruby demo from AMD was also rendered with voxels and even did some raytracing for reflections. After a bit of searching, I stumbled upon this thread, which I think is very interesting because of the inventive tricks proposed by hedgehog to speed up the rendering, like using normal mapped sprites. Hedgehog also refers to this paper "A Visualization Framework for Time Dependent Metal Casting Simulations" by Gross, Lojewski and Hagen. Does anyone happen to have a link to this paper? I found out that it was presented at RT'07, but the only site that has it is IEEEXplore and unfortunately I don't have access to that. I think it might contain some valuable techniques. Many thanks in advance.
hedgehog at
I can e-mail the papers to anyone interested. Pretty much nothing got done over the summer (I was traveling) but getting back to my project I've got a few thoughts which might be interesting to someone else working on a renderer:
- Rendering voxel models to bump-mapped sprites helps, but the voxel renderer still needs to be quite fast to accommodate dynamic objects.
- The memory cost of holding complete oct trees in memory will become prohibitive for any kind of game-like environment.
- Hit-checking so many boxes is a drag, especially when descending from the top of the tree for each ray. Even if this were fast, it would get bad with the addition of translucent voxels. Adding the ability to traverse the tree upwards (say from the last hit; I think of this as "hinted traversal") as well as downwards should help quite a lot (assuming your hints are good), but will introduce artifacts in some circumstances.
- The place where I am really worried about performance is lighting (it currently takes a lot more rays than I shoot from the eye to draw the screen).
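A small sketch of the "hinted traversal" idea. It uses a hash-keyed octree (an assumption; it makes climbing trivial because a parent key is just the child key shifted right by one, so no parent pointers are needed). `HashOctree`, `descend`, and `hinted_cast` are hypothetical names, and falling back to a full search when the hinted subtree yields nothing is one possible policy, not the only one:

```python
import math
from typing import Dict, Optional, Tuple

Key = Tuple[int, int, int, int]  # (level, x, y, z); level 0 is the root cell


def ray_box(origin, direction, lo, size) -> Optional[float]:
    """Slab test against an axis-aligned cube; returns entry t or None."""
    t_near, t_far = 0.0, math.inf
    for a in range(3):
        if direction[a] == 0.0:
            if not lo[a] <= origin[a] <= lo[a] + size:
                return None
            continue
        t0 = (lo[a] - origin[a]) / direction[a]
        t1 = (lo[a] + size - origin[a]) / direction[a]
        t_near, t_far = max(t_near, min(t0, t1)), min(t_far, max(t0, t1))
        if t_near > t_far:
            return None
    return t_near


class HashOctree:
    """Sparse octree stored in a dict, keyed by (level, x, y, z)."""

    def __init__(self, depth: int):
        self.depth = depth
        self.cells: Dict[Key, object] = {}  # leaf -> colour, interior -> True

    def insert(self, x: int, y: int, z: int, color) -> None:
        self.cells[(self.depth, x, y, z)] = color
        for level in range(self.depth - 1, -1, -1):
            s = self.depth - level
            self.cells.setdefault((level, x >> s, y >> s, z >> s), True)

    def box(self, key: Key):
        level, x, y, z = key
        size = 1 << (self.depth - level)
        return (x * size, y * size, z * size), size

    def descend(self, key: Key, origin, direction) -> Optional[Key]:
        """Ordinary top-down, front-to-back search below `key`."""
        if key not in self.cells or ray_box(origin, direction, *self.box(key)) is None:
            return None
        level, x, y, z = key
        if level == self.depth:
            return key
        mask = sum(1 << a for a in range(3) if direction[a] < 0)
        for i in range(8):
            c = i ^ mask
            child = (level + 1, 2 * x + (c & 1), 2 * y + (c >> 1 & 1), 2 * z + (c >> 2 & 1))
            hit = self.descend(child, origin, direction)
            if hit is not None:
                return hit
        return None

    def hinted_cast(self, hint: Key, origin, direction) -> Optional[Key]:
        """Climb from last frame's hit until the ray pierces an ancestor, then
        descend from there. Approximate: a nearer voxel outside that ancestor
        is missed, which is where the artifacts come from."""
        key = hint
        while key[0] > 0 and ray_box(origin, direction, *self.box(key)) is None:
            level, x, y, z = key
            key = (level - 1, x >> 1, y >> 1, z >> 1)
        hit = self.descend(key, origin, direction)
        return hit if hit is not None else self.descend((0, 0, 0, 0), origin, direction)
```

A stale hint on a voxel the ray still pierces is accepted even when a nearer voxel exists, which is exactly the kind of artifact mentioned above; good hints keep most rays from ever touching the upper levels of the tree.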
So I guess I understand why Bresenham tracing makes sense, and why the slice/skew/composite technique from SGI would be attractive.
I'm a little bit stuck in figuring out where to dig up more throughput. Has anyone done any work related to reusing rendering computations from frame to frame?