The main goal of this work is to find a hardware-accelerated parallel
algorithm for interactive volume rendering. A parallel algorithm
is desired because of the large amount of computation required in
the volume rendering process. There are two approaches to creating
a 2D projected image from 3D volume data: forward mapping
and backward mapping.
Forward mapping directly maps the volume data onto the image
plane. For each voxel, the renderer maps the point onto the image
plane and then adds its contribution to the accumulating image.
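The forward-mapping loop can be sketched in a few lines of Python. This is a minimal illustration, not the implementation used in this work: it assumes an orthographic projection along the z axis, a volume stored as nested lists indexed volume[z][y][x], and plain additive accumulation. The function name forward_project is hypothetical.

```python
def forward_project(volume):
    """Forward mapping: every voxel adds its value to the pixel it lands on.

    Assumption: orthographic projection along z, so voxel (x, y, z)
    maps to pixel (x, y). volume is indexed volume[z][y][x].
    """
    depth, height, width = len(volume), len(volume[0]), len(volume[0][0])
    image = [[0.0] * width for _ in range(height)]
    for z in range(depth):
        for y in range(height):
            for x in range(width):
                # Map the voxel onto the image plane and accumulate.
                image[y][x] += volume[z][y][x]
    return image
```

The outer loops run over voxels, not pixels, which is the defining property of forward mapping.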
Backward mapping, commonly called ray casting, instead maps the
image plane into the data. For each pixel in the final image, the
renderer shoots a ray from the pixel into the volume data and
samples the data along that ray until either the ray exits the
volume or the accumulated opacity is high enough for the ray to be
treated as opaque. In backward mapping, input values rarely fall
exactly along a ray. Consequently, an approximation of the volume
data is generated using an interpolation scheme. More sophisticated
methods may relieve some of this replication but will not eliminate
it.
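The ray-casting loop described above, including interpolation between data points and early termination once the ray is effectively opaque, can be sketched as follows. This is an illustrative sketch under stated assumptions: the ray runs along one column of voxels, sample values lie in [0, 1], each interpolated value serves as both intensity and opacity, and the names cast_ray, step and opacity_threshold are hypothetical.

```python
def lerp(a, b, t):
    """Linear interpolation between a and b for t in [0, 1]."""
    return a + (b - a) * t


def cast_ray(column, step=0.5, opacity_threshold=0.95):
    """Composite one ray front to back; return (intensity, opacity).

    column: voxel values in [0, 1] at integer depths along the ray
            (at least two entries). Assumption: each interpolated value
            is used as both the sample's intensity and its opacity.
    """
    intensity, opacity = 0.0, 0.0
    last = len(column) - 1
    k = 0
    while k * step <= last:            # stop when the ray exits the volume
        t = k * step
        i = min(int(t), last - 1)      # lower grid neighbour of the sample
        value = lerp(column[i], column[i + 1], t - i)
        # Front-to-back compositing: weight by the remaining transparency.
        intensity += (1.0 - opacity) * value * value
        opacity += (1.0 - opacity) * value
        if opacity >= opacity_threshold:
            break                      # ray is effectively opaque
        k += 1
    return intensity, opacity
```

With step set to a fraction of the voxel spacing, most samples fall between data points, which is why the interpolation step is needed.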
Texture mapping was introduced in [Catmull 74] as a method of
adding to the visual richness of a computer-generated image without
adding geometry. This process includes a mapping from the space
of the object to be textured to the texture space.
Bilinear/trilinear interpolation is usually used to sample data
in 2D/3D textures. To accelerate texture mapping, traditional
graphics hardware stores a complete copy of all texture images
at each parallel computation node so that all nodes can operate
in parallel.
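For reference, trilinear interpolation of a 3D texture can be written as a sequence of linear interpolations along x, then y, then z. The sketch below is a plain software version, not the hardware path; it assumes a volume stored as nested lists indexed volume[z][y][x], coordinates inside the grid, and at least two voxels along every axis. The function name trilinear is hypothetical.

```python
def trilinear(volume, x, y, z):
    """Sample volume[z][y][x] at a fractional position.

    Assumes 0 <= x <= width - 1 (likewise for y and z) and at least
    two voxels along every axis.
    """
    depth, height, width = len(volume), len(volume[0]), len(volume[0][0])
    # Lower corner of the enclosing cell, clamped so the upper corner exists.
    x0 = min(int(x), width - 2)
    y0 = min(int(y), height - 2)
    z0 = min(int(z), depth - 2)
    fx, fy, fz = x - x0, y - y0, z - z0

    def lerp(a, b, t):
        return a + (b - a) * t

    # Interpolate along x on four cell edges, then along y, then along z.
    c00 = lerp(volume[z0][y0][x0], volume[z0][y0][x0 + 1], fx)
    c01 = lerp(volume[z0][y0 + 1][x0], volume[z0][y0 + 1][x0 + 1], fx)
    c10 = lerp(volume[z0 + 1][y0][x0], volume[z0 + 1][y0][x0 + 1], fx)
    c11 = lerp(volume[z0 + 1][y0 + 1][x0], volume[z0 + 1][y0 + 1][x0 + 1], fx)
    return lerp(lerp(c00, c01, fy), lerp(c10, c11, fy), fz)
```

Bilinear interpolation is the same construction with the z step removed.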
A new way of rendering volume data using texture mapping is
introduced in this presentation. Texture mapping, which maps from
object space to texture space, is used for backward mapping in
volume rendering, which maps from image space to volume space.
An innovative parallelization scheme subdivides the entire volume
into subvolumes and avoids the need to store the whole volume at
each computation node.
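The idea of giving each node only a subvolume can be illustrated with maximum intensity projection, where partial results combine with a per-pixel maximum. The source does not state how the subvolumes are shaped or how partial images are merged, so the slab split along z and the max merge below are assumptions made purely for illustration, and all function names are hypothetical.

```python
def split_into_slabs(volume, node_count):
    """Split volume[z][y][x] into node_count contiguous slabs along z."""
    base, extra = divmod(len(volume), node_count)
    slabs, start = [], 0
    for node in range(node_count):
        size = base + (1 if node < extra else 0)
        slabs.append(volume[start:start + size])
        start += size
    return slabs


def mip_along_z(slab, height, width):
    """Maximum intensity projection of one slab along z."""
    image = [[float("-inf")] * width for _ in range(height)]
    for plane in slab:
        for y in range(height):
            for x in range(width):
                image[y][x] = max(image[y][x], plane[y][x])
    return image


def merge_max(images):
    """Combine partial projections with a per-pixel maximum."""
    return [[max(pixels) for pixels in zip(*rows)] for rows in zip(*images)]


# Each (simulated) node sees only its own slab; the merged image
# equals the projection of the whole volume.
volume = [[[1, 9]], [[5, 2]], [[3, 3]], [[8, 0]], [[4, 7]]]
partial = [mip_along_z(s, 1, 2) for s in split_into_slabs(volume, 3)]
merged = merge_max(partial)
```

The merge works here because the maximum is associative; other rendering modes would need a different way of combining partial results.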
To handle volume data, parallel computations are performed on the
Denali graphics accelerator manufactured by Kubota. Denali is a
general-purpose graphics accelerator with capabilities and functionality
found only in very high-end graphics systems. By using the
innovative parallelization scheme, we have extended it into an
imaging workstation that can perform versatile volume rendering.
The Denali architecture has parallelism at two levels. A Denali
system can have up to 6 Transformation and Rasterization Modules
(TRMs) which, being general purpose processors (AMD 29050s), can
perform arbitrarily complex computations.
In addition, a system can have up to 20 Frame Buffer Modules
(FBMs), each of which can perform a predefined set of per-pixel
operations, such as image/volume resampling and pixel blending. Image/volume
sampling operations can be performed in the FBMs, where large parallel
speedups (up to 20x) are possible. Some image operations that
cannot be performed on the FBMs can be done in parallel on the
TRMs, with a resulting parallel speedup of up to 6x. Complex volumetric
rendering operations (consisting of multiple suboperations) may
be carried out using both the FBMs and the TRMs to best advantage.
In summary, the 3D texture mapping traditionally used in 3D graphics
is applied to backward projection in volume rendering. A very high
degree of parallelism (up to 20-fold) is achieved by storing a
different subvolume in each of the FBMs. This differs significantly
from traditional approaches that store the entire volume data
set at each computational node. Overcoming this limitation permits
volume sizes up to 256x256x253 (or 512x512x64) to be rendered
in hardware. Progressive refinement schemes can be used to reduce
the sampling rate so that interactive rendering speeds can be realized.
Currently, maximum intensity projection, multiplanar reformatting,
volume resampling, ray sum, and surface rendering are implemented
on Denali.
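The progressive refinement mentioned in the summary can be sketched as rendering first at a coarse pixel stride and then at successively finer strides. The source does not describe the actual refinement scheme, so the halving schedule, the block-fill of skipped pixels, and the names refinement_strides, render_at_stride and render_pixel are all assumptions for illustration.

```python
def refinement_strides(coarsest=4):
    """Yield pixel strides from coarse to fine, halving each pass."""
    stride = coarsest
    while stride >= 1:
        yield stride
        stride //= 2


def render_at_stride(render_pixel, width, height, stride):
    """Evaluate render_pixel on every stride-th pixel and fill each block.

    render_pixel(x, y) stands in for any per-pixel renderer, for example
    one ray cast per pixel.
    """
    image = [[0.0] * width for _ in range(height)]
    for y0 in range(0, height, stride):
        for x0 in range(0, width, stride):
            value = render_pixel(x0, y0)
            # Reuse the coarse sample for the whole stride x stride block.
            for y in range(y0, min(y0 + stride, height)):
                for x in range(x0, min(x0 + stride, width)):
                    image[y][x] = value
    return image
```

A viewer would display the coarse passes while the user is interacting and let the stride fall to 1 once the view is stable.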