Download sources at https://github.com/fcaruso/GLSLParametricCurve.git

Drawing parametric curves is a common issue for OpenGL developers, because there is no graphics primitive that can rasterize them directly. Moreover, it seems that future releases of the OpenGL API will not enrich the set of primitives, since the Khronos Group strives to keep the OpenGL API as minimal as possible: as a matter of fact, primitives like quads and polygons have been removed from the modern core profile. If you want to draw a curve using OpenGL, you basically have to draw a poly-line or, more properly, a polygonal chain. This technique requires sampling a set of points of the curve, which is a two-step process:

- Tessellation, i.e. define the resolution of the sampling and the curvilinear abscissa of the points
- Evaluation, i.e. evaluate the chosen points on the curve

Uniform sampling is the most widely used and straightforward technique, but more effective techniques obviously exist. In the earlier versions of OpenGL, built-in tessellators and evaluators unburdened the developer from sampling curve points properly but, although they were easy to use, they were not very flexible. Then, with OpenGL 3.2, Geometry Shaders entered the core profile. They offer a lot of functionality, but they do not fit curve rasterization well, even though they can also be used for this task. **Tessellation Control Shaders** and **Tessellation Evaluation Shaders**, introduced into the OpenGL core functionality in version 4.0, fit curve rasterization tasks really well. Moreover, using tessellation shaders, it is possible to raise the abstraction level of the OpenGL graphics primitives: potentially, you can build your own library of primitives. Using such high-level primitives also simplifies the software development process, preventing the coding of ad-hoc data structures to handle geometry data. Further, they also provide geometry compression. In the following lines I will describe my technique based on tessellation shaders.

The picture on the right describes the OpenGL 4 rendering pipeline. The tessellation stage sits between the Vertex Shader and the Geometry Shader. The Tessellation Control Shader (TCS) and the Tessellation Evaluation Shader (TES) can be programmed and customized. The tessellator, instead, is a fixed-function stage, responsible for creating a set of new primitives from the input patch. This stage is executed only if a TES is active in the current program or program pipeline: the TES is mandatory, while the TCS can be omitted.

From a developer's point of view, writing a tessellation shader for a particular curve can be very effective. It allows the developer to create a set of "**custom graphics primitives**" which require only mathematical parameters as input, letting the GPU do the rasterization just like for a normal OpenGL primitive. This can be effective for several reasons:

- it is a kind of geometric compression, because just the curve control points are sent to the GPU, instead of a high resolution poly-line.
- the OpenGL application handles the curves at higher abstraction level
- a shader can be very easily imported in several applications

The following lines will briefly describe tessellation shaders. Surely you can find more exhaustive explanation in the OpenGL Red Book or in the OpenGL Tessellation Wiki, but the aim of this post is just to give an insight of how tessellation shaders work.

The tessellation process is applied to the new "patch" primitive (added in OpenGL 4.0). A patch contains just the vertices sufficient to define all the other vertices of the geometry to be tessellated. It can be thought of as a set of **control vertices** constraining the final geometry. It is important to notice that the number of vertices provided for the patch can be considerably lower than the number of vertices of the final geometry. As mentioned before, this is a sort of geometric compression: the number of vertices sent through the pipeline is lower than the number of vertices of the geometry that will be displayed.

It is possible to define a patch using the **GL_PATCHES** primitive. When developing against the **modern architecture**, it is necessary to bind a Vertex Array Object with the vertices of the patch, and then render it using a *glDraw** function:

```c
glPatchParameteri( GL_PATCH_VERTICES, number_of_vertices_of_a_single_patch );
glDrawArrays( GL_PATCHES, first_vertex_index, number_of_vertices );
```

But it is also possible to use patches in immediate mode within a compatibility context:

```c
glPatchParameteri( GL_PATCH_VERTICES, number_of_vertices_of_a_single_patch );
glBegin( GL_PATCHES );
glVertex3fv( v0 );
glVertex3fv( v1 );
...
glEnd();
```

It is important to notice that before defining the vertices, it is necessary to specify the number of vertices of each patch using the **glPatchParameteri** function. It is also relevant that a patch does not specify a topology, and the order of the vertices is up to the developer.

The role of the Tessellation Control Shader (TCS) is to generate the **tessellation output patch vertices** and the **tessellation coordinates**. Output patch vertices must not be confused with the vertices of the final geometry: the final geometry is made up of the vertices evaluated at the tessellation coordinates, with the output patch vertices acting as control points of the overall geometry.

The TCS processes the vertices specified in the patch, the **input-patch vertices**, already transformed by the vertex shader. The built-in data structures involved are **gl_in**, the output of the vertex shader, and **gl_out**:

```glsl
in gl_PerVertex {
    vec4  gl_Position;
    float gl_PointSize;
    float gl_ClipDistance[];
} gl_in[gl_MaxPatchVertices];

out gl_PerVertex {
    vec4  gl_Position;
    float gl_PointSize;
    float gl_ClipDistance[];
} gl_out[];
```

Tessellation Control Shaders can perform complex operations, and can add vertices to or remove vertices from the input-patch vertices. However, the most common usage is just to pass the vertices through, with code like the following:

```glsl
layout (vertices = 4) out;

void main()
{
    gl_out[gl_InvocationID].gl_Position = gl_in[gl_InvocationID].gl_Position;
}
```

The number of output patch vertices is set using the **layout** construct. It also determines the number of TCS invocations used to compute the patch data, since the shader is executed once for each vertex in the output patch.

The other function of a tessellation control shader is to specify how much to tessellate the output patch or, in other words, how many **tessellation coordinates** have to be generated. They can be thought of as the domain of the evaluation function of the evaluation shader: the vertices of the final geometry, in fact, will be the values of the evaluation function at the tessellation coordinates.

Tessellation coordinates can be thought of as coordinates local to the patch. When tessellating quads or isolines, they are represented by a 2D value (u,v), quite similar to texture coordinates, while when tessellating triangles, they are barycentric coordinates.

The amount of tessellation is defined by writing the variables **gl_TessLevelInner** and **gl_TessLevelOuter**, which control respectively how the interior and the perimeter of the domain are subdivided.

For example, to tessellate the line between two points into 16 segments, a possible solution could be:

```glsl
layout (vertices = 2) out;

void main()
{
    gl_TessLevelOuter[0] = 1.0;   // number of isolines
    gl_TessLevelOuter[1] = 16.0;  // segments per isoline
    gl_out[gl_InvocationID].gl_Position = gl_in[gl_InvocationID].gl_Position;
}
```

The final stage of the tessellation pipeline is the Tessellation Evaluation Shader (TES), which is responsible for computing the actual values of the vertices. It can be thought of as a kind of vertex shader, since it operates on each vertex of the patch. Unlike the TCS, which is optional, the TES is a mandatory part of tessellation: if it is not present, tessellation does not happen. The lines below describe a possible implementation of a TES. Note that the model-view and projection matrices are applied at this final stage; in this sense the TES can be thought of as the final stage of vertex processing.

```glsl
void main()
{
    float u = gl_TessCoord.x;
    float v = gl_TessCoord.y;
    vec4 pos = function( u, v );
    gl_Position = matProjection * matModelView * pos;
}
```

It is possible to define a plane curve in 3D space through a second-degree polynomial vector equation:

p(u) = a*u^2 + b*u + c

From a geometric point of view, we can consider the plane curve as a curve passing through three points:

p(0) = p0,   p(1/2) = p1,   p(1) = p2

Combining the two previous equations, we obtain the parametric form of the plane curve as a single vector equation:

p(u) = (2u^2 - 3u + 1)*p0 + (-4u^2 + 4u)*p1 + (2u^2 - u)*p2

The three points p0, p1 and p2 are the vertices of the OpenGL patch to be tessellated; the tessellation control shader controls the amount of *u* tessellation coordinates, and the tessellation evaluation shader evaluates the parametric function of the curve at each u coordinate.

```glsl
vec4 plane_curve(float u)
{
    vec4 p = (2*u*u - 3*u + 1) * gl_in[0].gl_Position;
    p += (-4*u*u + 4*u) * gl_in[1].gl_Position;
    p += (2*u*u - u) * gl_in[2].gl_Position;
    return p;
}

void main()
{
    float u = gl_TessCoord.x;
    float v = gl_TessCoord.y;
    vec4 pos = plane_curve( u );
    gl_Position = matProjection * matModelView * pos;
}
```

Hermite curves in 3D space are defined by cubic parametric polynomials. Therefore, they can be expressed using a single vector equation like this:

p(u) = a*u^3 + b*u^2 + c*u + d

The four vectors needed to define the four coefficients can be 4 points in space, or 2 points and 2 tangent vectors. In the latter form, the equation becomes:

p(u) = (2u^3 - 3u^2 + 1)*p0 + (-2u^3 + 3u^2)*p1 + (u^3 - 2u^2 + u)*t0 + (u^3 - u^2)*t1

The approach used for the evaluation function is similar to the one adopted for the plane curve: compute the vertex position using the *u* tessellation coordinate as the parameter of the evaluation function. However, in this case, the two tangent vectors are set using uniform variables within the evaluation shader, and are not control points provided by GL_PATCHES. Here is a snippet of the evaluation shader:

```glsl
uniform vec3 vTan0;
uniform vec3 vTan1;

///////////////////////////////////////////////////
// function to calculate the hermite function value
vec3 hermite(float u, vec3 p0, vec3 p1, vec3 t0, vec3 t1)
{
    float F1 = 2.*u*u*u - 3.*u*u + 1.;
    float F2 = -2.*u*u*u + 3.*u*u;
    float F3 = u*u*u - 2.*u*u + u;
    float F4 = u*u*u - u*u;
    vec3 p = F1*p0 + F2*p1 + F3*t0 + F4*t1;
    return p;
}

void main()
{
    float u = gl_TessCoord.x;
    float v = gl_TessCoord.y;
    vec3 vPos0 = vec3( gl_in[0].gl_Position );
    vec3 vPos1 = vec3( gl_in[1].gl_Position );
    vec3 v3pos = hermite( u, vPos0, vPos1, vTan0, vTan1 );
    vec4 pos = vec4( v3pos, 1. );
    gl_Position = matProjection * matModelView * pos;
}
```

As a further (and final) example, a tessellation shader to rasterize Bezier curves has been implemented.

The implementation is quite simple, and it is limited to third-degree Bezier curves.

The parametric equation of a third-degree Bezier curve is the following:

p(u) = (1-u)^3*p0 + 3u(1-u)^2*p1 + 3u^2(1-u)*p2 + u^3*p3

The evaluation shader can be written in a way that highlights the Bernstein polynomials, the *basis functions* of the Bezier curve:

```glsl
///////////////////////////////////////////////////
// function to evaluate the Bezier curve
vec3 bezier(float u, vec3 p0, vec3 p1, vec3 p2, vec3 p3)
{
    float B0 = (1.-u)*(1.-u)*(1.-u);
    float B1 = 3.*u*(1.-u)*(1.-u);
    float B2 = 3.*u*u*(1.-u);
    float B3 = u*u*u;
    vec3 p = B0*p0 + B1*p1 + B2*p2 + B3*p3;
    return p;
}
```

Since the curve is third-degree, the control polygon is made up of four points; therefore, the OpenGL patch will be made up of four control points too.


Image convolution is a well-known topic for computer graphics applications. There are a lot of implementations available for download, and also many examples within the CUDA SDK.

One of the main issues for convolution on the GPU is accessing the data of the pixels around the active pixel being computed. In the CUDA SDK there is an optimized example that tackles this issue using 2 separable filters for the horizontal and vertical directions. You can find further explanation here and here.

The implementations provided with the SDK are highly optimized, but they are hard to integrate into a real application. For example, it is not so easy to change the kernel radius, because you have to recompile all the GPU code with different parameters.

I developed another CUDA implementation, used for radial blurring, which is very efficient since it makes heavy use of shared memory.

Simple benchmarks show that **my implementation is fast enough**: for a 512×512 grayscale image, my implementation takes less than 0.7 ms, while the separable convolution takes more than 0.4 ms.

Further, shared memory is sized via a **template parameter of the kernel function** according to the kernel radius. This is a useful trick, because it makes it possible to easily change the kernel radius at run time without recompiling the kernel.

This implementation works for single-channel images, but you can easily extend it to multi-channel images. It is very simple and easy to integrate into an existing project.

I developed two versions of it: in the first, the image is loaded into global memory; in the second, the image is loaded as a texture object.

Sources and wiki are available on GitHub.

Feel free to download and improve it.
