
AR By Hand – Part 5 – Video and OpenGL

Process video and use OpenGL for rendering.

Welcome to the last chapter of this AR series. The previous chapter ended with drawing a cube model on top of the marker paper. This chapter closes the whole series by showing you how to connect the whole system with OpenGL.

To start, let me say a few words about OpenGL. Although the end goal of this project is to render a simple cube, even that is not trivial in OpenGL. You still need knowledge of the graphics pipeline, shaders, and various buffer objects. Covering just that would be enough for a tutorial series of its own. Luckily, there is a great book called “Computer Graphics Programming in OpenGL with Java, Second Edition” by V. Scott Gordon. I would encourage you to read it if you want to learn OpenGL. I personally used this book to create this project. The only notable difference is that the JOGL library is now available in the Maven repository, which makes installation super easy.

The source code for this chapter starts in the CameraPoseJoglTestApp executable class.

Video Processing

This is the easy part. There is a Java library called xuggle-video from OpenIMAJ. Processing a video works like this: you open a reader, register listeners, and then read packets, handling the emitted events until the end of the stream.
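The read-loop pattern can be sketched in a self-contained way as follows. Frame and play are illustrative names; the real code wraps xuggle-video's reader and listeners, which are not shown here.

```java
import java.util.List;
import java.util.function.Consumer;

// Self-contained sketch of the pattern described above: open a source,
// register a per-frame handler, and consume "packets" until the end of
// the stream. Frame and play are illustrative stand-ins, not the real API.
public class VideoLoopSketch {

    /** Stands in for one decoded video frame. */
    public static class Frame {
        public final long timestampMs;
        public Frame(long timestampMs) { this.timestampMs = timestampMs; }
    }

    /** Feeds every frame of the stream to the handler; returns the frame count. */
    public static int play(List<Frame> stream, Consumer<Frame> handler) {
        int processed = 0;
        for (Frame f : stream) {   // stands in for "read packets until end of stream"
            handler.accept(f);     // fires the listener callback
            processed++;
        }
        return processed;
    }
}
```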

As for the source code, there is a VideoUtils class which allows processing videos synchronously using lambda functions; I used it to produce the videos in the previous chapters. In addition, there is a VideoPlayer class which plays the video in a separate thread and lets you process the frames in a callback. This is the class used in the OpenGL application.


I am going to cover only how to fit the previously created matrices into the OpenGL ones. The general usage of OpenGL for drawing shapes is not discussed here.

In the previous chapter, you learned about the 3×3 matrix \(K\) and the 3×4 matrix \(V\). When you start working with OpenGL, you will see that all the matrices are 4×4. What the hell?

Don’t worry, they are all related. A detailed article series covering exactly this point, although a bit difficult to digest, was published by Kyle Simek back in 2012.

Be aware that it takes some effort to make things display correctly. There are many conventions along the way, and a mistake in just one sign will produce a weird image, or nothing displayed at all.

To recap: the result of the previous chapter was the internal matrix \(K\) and the external matrix \(V\), which you could multiply as \(P=KV\) to get the projection matrix. The full matrices are these:

K=\begin{bmatrix} f_x & s & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{bmatrix} \\
V=[R|T]=\begin{bmatrix} r_{11} & r_{12} & r_{13} & t_x \\ r_{21} & r_{22} & r_{23} & t_y \\ r_{31} & r_{32} & r_{33} & t_z \end{bmatrix}

Projection and Internal

The OpenGL projection matrix is related to the internal matrix \(K\). OpenGL Projection Matrix is a nice article which explains how the whole projection and clipping works. To give you a brief idea, let’s look at the following image (taken from that article).

The left image shows the space which is displayed at the end of the graphics pipeline. Anything outside that cut pyramid is not displayed. The camera is at the origin, oriented towards negative Z. There are 2 parallel planes forming the top and bottom of the pyramid. They are called near and far and are defined by scalar values. The near plane is also where the pixels are projected. These are OpenGL parameters introduced for practical reasons, and you need to choose them.

The right image shows the mapping of the volume into normalized device coordinates (NDC). This allows things like clipping or depth buffers.

In summary, the OpenGL projection matrix does 2 things – perspective projection and mapping to NDC. This can be expressed as the matrix multiplication \(P_{GL}=P_{NDC}P_{Persp}\).
Therefore, you need to create these 2 matrices, having the following parameters.

  • Values of the matrix \(K\) (\(f_x,\ f_y,\ s,\ c_x,\ c_y\))
  • near and far values – I have chosen 0.1 and 1000, respectively
  • width and height – the dimensions in pixels of the original input image

With all the parameters, it’s possible to write down the matrices right away.

P_{Persp}= \begin{bmatrix}
f_x & s & -c_x & 0 \\
0 & f_y & -c_y & 0 \\
0 & 0 & near + far & near * far \\
0 & 0 & -1 & 0
\end{bmatrix} \\
P_{NDC}= \begin{bmatrix}
\frac{-2}{width} & 0 & 0 & 1 \\
0 & \frac{2}{height} & 0 & -1 \\
0 & 0 & \frac{2}{near-far} & \frac{far+near}{near-far} \\
0 & 0 & 0 & 1
\end{bmatrix} \\

Code for this is in the class Ar, method toGlProj.
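For illustration, the conversion can be written down as plain Java. This is a minimal sketch built from the two matrices above; the names and the row-major array layout are my choices here, and the project's actual Ar.toGlProj may look different.

```java
// Sketch of turning the intrinsic parameters into a 4x4 OpenGL projection
// matrix via P_GL = P_NDC * P_Persp. Matrices are row-major double[4][4];
// GlProjSketch and toGlProj are illustrative names, not the project's API.
public class GlProjSketch {

    /** Multiplies two 4x4 row-major matrices. */
    static double[][] mul(double[][] a, double[][] b) {
        double[][] c = new double[4][4];
        for (int i = 0; i < 4; i++)
            for (int j = 0; j < 4; j++)
                for (int k = 0; k < 4; k++)
                    c[i][j] += a[i][k] * b[k][j];
        return c;
    }

    /** Builds P_GL from K's values, the image size, and the chosen near/far planes. */
    static double[][] toGlProj(double fx, double fy, double s, double cx, double cy,
                               double width, double height, double near, double far) {
        double[][] persp = {
            { fx,  s, -cx,        0          },
            {  0, fy, -cy,        0          },
            {  0,  0, near + far, near * far },
            {  0,  0, -1,         0          }
        };
        double[][] ndc = {
            { -2 / width, 0,          0,                 1                           },
            { 0,          2 / height, 0,                -1                           },
            { 0,          0,          2 / (near - far), (far + near) / (near - far)  },
            { 0,          0,          0,                 1                           }
        };
        return mul(ndc, persp);
    }
}
```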

View and External

The OpenGL view matrix \(V_{GL}\) is easy to construct: take the \(V\) matrix and add the \([0,0,0,1]\) vector as the 4th row. Like this.
V_{GL}= \begin{bmatrix}
r_{11} & r_{12} & r_{13} & t_x \\
r_{21} & r_{22} & r_{23} & t_y \\
r_{31} & r_{32} & r_{33} & t_z \\
0 & 0 & 0 & 1
\end{bmatrix}
Code for this is in the class Ar, method toGlV.
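As a sketch, appending the extra row is a few lines of Java. GlViewSketch and toGlV here are illustrative names working on row-major arrays; the project's actual Ar.toGlV may differ.

```java
// Sketch of extending the 3x4 external matrix V = [R|T] into the 4x4 OpenGL
// view matrix by appending the row [0, 0, 0, 1]. Row-major double arrays;
// illustrative names, not the project's actual code.
public class GlViewSketch {

    /** v is the 3x4 matrix [R|T]; returns the 4x4 OpenGL view matrix. */
    static double[][] toGlV(double[][] v) {
        double[][] glV = new double[4][4];
        for (int r = 0; r < 3; r++)
            System.arraycopy(v[r], 0, glV[r], 0, 4);  // copy [R|T] rows
        glV[3][3] = 1;                                // appended row [0, 0, 0, 1]
        return glV;
    }
}
```

Note that OpenGL expects matrices in column-major order, so depending on how you upload them you may also need a transpose.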

Everything Together

Finally, you can see and run everything through the CameraPoseJoglTestApp executable class. A few things to mention:

  • Video is rendered into a texture, which is then drawn to the screen as 2 triangles.
  • Video processing and the OpenGL loop need synchronization. Otherwise, it won’t work.
  • There are 2 sets of shader programs. One for the background video, one for the 3D world.
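One simple way to do the synchronization mentioned above is to hand frames from the video thread to the GL loop through an AtomicReference, so the renderer always picks up the newest frame. This is just an illustration of the idea, not the project's actual code.

```java
import java.util.concurrent.atomic.AtomicReference;

// Minimal sketch of video-thread / GL-loop synchronization: the decoder
// publishes the latest frame, the render loop consumes it at its own pace.
// Illustrative code, not the actual CameraPoseJoglTestApp implementation.
public class FrameExchange {
    private final AtomicReference<int[]> latest = new AtomicReference<>();

    /** Called from the video thread for every decoded frame (pixel buffer). */
    public void publish(int[] pixels) {
        latest.set(pixels);   // older unconsumed frames are simply dropped
    }

    /** Called from the GL loop; returns the newest frame once, or null if none. */
    public int[] poll() {
        return latest.getAndSet(null);
    }
}
```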

That’s it. Here is the video with the result.


And this is the end. Although there is a s*^^*t lot of space for improvement, I hope you have enjoyed this series and learned something new. I would be more than happy if you posted your comments, questions, suggestions for improvements, or ideas for other AR-related projects. Or just another topic you are struggling with. I would love to be helpful.

See you in something else (;
