## Personal Mini Toolchains

It’s all about repetition. If you find yourself doing the same thing again and again, chances are you want to simplify it and eventually automate it (unless it is sex). Even a small task is worth automating if it is repeated often enough. Let me give you an example.

Let’s imagine you write a script to synchronize files between projects. It took you 8 hours (480 minutes) to create that script, and each run saves 10 minutes of your time. This means you need to run the script 48 times to “pay off” your initial investment. Anything beyond that is your “profit”.
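
The same break-even arithmetic can be written down as a tiny helper. This is only a sketch; the class and method names are made up for illustration and are not part of the toolchain.

```java
final class BreakEven {

    /**
     * Returns how many runs are needed before the time saved covers
     * the time invested (rounded up to a whole run).
     */
    static long runsToBreakEven(long investedMinutes, long savedMinutesPerRun) {
        if (savedMinutesPerRun <= 0) {
            throw new IllegalArgumentException("each run must save some time");
        }
        return (investedMinutes + savedMinutesPerRun - 1) / savedMinutesPerRun;
    }

    public static void main(String[] args) {
        // 8 hours invested, 10 minutes saved per run
        System.out.println(runsToBreakEven(8 * 60, 10)); // prints 48
    }
}
```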

Saving 10 minutes here and there might not seem like a big deal. But if you know how to identify and effectively automate more small tasks like this, the savings add up very quickly and free up a lot of time to do anything you like. For example, to make comic t-shirts (;

There are many ways to automate things. This time, I will give you a real example of creating a Personal Mini Toolchain in Java. In general, a toolchain is a program that connects several tools together to perform a complex task. I call it a Personal Mini Toolchain because it exists to save your personal time, and you can start in a very minimalistic way (and grow it from there). The great news is that once you understand the principles, you can apply them in any area or industry, using any technology you like.

## Step 1 - Figure Out

The most important step is to figure out what is worthwhile and realistic for the toolchain to handle. Only then does it make sense to build it. Worthwhile is about return on investment. Realistic means that the set of actions needs to be simple enough and flow from start to end without much decision logic. Let’s imagine you put the actions into a graph like the one in the image below. You have a better chance of automating the one on the left side than the one on the right side.

Let’s look at a concrete example. As a hobby, I am creating my own Tyracorn game engine (youtube playlist is here). I started the whole project as a standard Java desktop application using the Maven build system. At a certain point, I decided to run it on Android as well. So I opened up the Android developer portal and started to study. I learned how to create an Android project, how to make an OpenGL context, and eventually made it work.

Now I had 2 projects, so whenever I made changes in the main project and wanted to test them on my phone, I had to do the following.

• Open Android Studio
• Copy and paste the source files
• Build project
• Install project on my phone

After I did this about 50 times, I got bored and started to figure out how a computer could do it for me. It is important that I did it so many times. It validated that automation makes sense, and the manual exercise gave me enough insight to do the job.

Now, what is the lesson here? Start your automation by doing everything manually again and again. This will confirm that the effort is worth investing, and it will give you the chance to simplify and deeply understand what you are doing. And remember, you don’t need to cover everything right from the beginning. You can start with a single small portion; it will grow naturally over time.

## Step 2 - Create

Now, let’s focus on my concrete example. Because of the manual experience, I figured out that my mini toolchain would be helpful if it did the following.

• Generate the Android project. Then I can open Android Studio and look at it.
• Build and install the project on my device, all without the need to touch Android Studio.
• Have the ability to add more platforms and tasks later on.

I started by creating a new Maven project configured to produce an executable fat jar. The code takes the first two arguments from the command line, uses reflection to look up the class and method they name, and calls that method, passing the rest of the arguments as a list. It looks like this.

    /**
     * Main entry to the application.
     *
     * @param args arguments
     */
    public static void main(String[] args) {
        List<String> as = Arrays.asList(args);
        if (as.size() < 2) {
            printHelp();
            System.exit(1);
        }

        // 1st argument selects the command class, 2nd the method, the rest is passed through
        runCommand(as.get(0), as.get(1), as.subList(2, as.size()));

        System.out.println("");
        System.out.println("-----------------");
        System.out.println("Job Done!");
        System.out.println("-----------------");
    }

    /**
     * Runs the command.
     *
     * @param command command
     * @param method method
     * @param args arguments
     */
    public static void runCommand(String command, String method, List<String> args) {
        try {
            // e.g. "android" resolves to com.tyracorn.toolchain.AndroidCommand
            String className = "com.tyracorn.toolchain." + StringUtils.capitalize(command) + "Command";
            Class<?> commandClass = Class.forName(className);
            Method m = commandClass.getMethod(method, List.class);
            // command methods are static, therefore the target object is null
            m.invoke(null, args);
        } catch (ClassNotFoundException | NoSuchMethodException | SecurityException | IllegalAccessException |
                IllegalArgumentException | InvocationTargetException e) {
            throw new RuntimeException(e);
        }
    }


Then I started to study the Android documentation to see if I could build and install the project on my phone just from the command line. It turned out that Google did a great job by preparing Gradle tasks to do all of the above. And, as a bonus, I can also automatically sign the application with my secret keys before uploading it to the Play Store.

I used a plain Android project as a base. Then I replaced certain values in build.gradle with placeholders (e.g. applicationId is replaced by ‘\$tyracorn.application.id’), which are filled in later by the toolchain. Then I packed the whole directory into a zip file and placed it into the toolchain project’s resource directory. In addition, in the project to be converted, I created a specific directory and placed a configuration file and additional resources (e.g. icons in various formats) there. The workflow is then as follows.

• Clean up target directory
• Unzip the template
• Copy over the source code and assets
• Copy over the additional Android specific resources
• Replace the placeholders from the template with the actual values
• Build project
• Install on the phone
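
The placeholder step in the workflow above can be sketched as a plain string substitution. This is a minimal illustration, not the toolchain’s own template code: the class `PlaceholderSketch` and its `fill` method are hypothetical, and the sketch assumes placeholders are written as a dollar sign followed by the property name, as in the build.gradle example.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

final class PlaceholderSketch {

    /**
     * Replaces every "$key" in the template with the matching value.
     * Longer keys are applied first, so a key which is a prefix of
     * another key cannot clobber the longer placeholder.
     */
    static String fill(String template, Map<String, String> vars) {
        List<String> keys = vars.keySet().stream()
                .sorted(Comparator.comparingInt(String::length).reversed())
                .collect(Collectors.toList());
        String result = template;
        for (String key : keys) {
            // String.replace works on literal text, so "$" and "." need no escaping
            result = result.replace("$" + key, vars.get(key));
        }
        return result;
    }
}
```

Plain literal replacement is enough here because the template is under your control. A general-purpose template engine would be overkill for a personal tool.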

If you are interested, here is the source code of the Android command.

    /**
     * Generates the android project.
     *
     * @param args arguments
     */
    public static void generate(List<String> args) {
        try {
            // prepare directories
            System.out.println("preparing directories");

            // the last argument is always the project directory
            File projDir = new File(args.get(args.size() - 1)).getCanonicalFile();
            Guard.beTrue(projDir.isDirectory(), "%s is not a directory", projDir.getAbsolutePath());
            File targetDir = new File(projDir, "target");
            if (!targetDir.isDirectory()) {
                targetDir.mkdirs();
                Guard.beTrue(targetDir.isDirectory(), "unable to create a target directory");
            }
            File androidProjectDir = new File(targetDir, "android-project");
            if (androidProjectDir.isDirectory()) {
                FileUtils.deleteDirectory(androidProjectDir);
            }
            Guard.beFalse(androidProjectDir.exists(), "android-project directory must not exist at this point");

            // prepare properties
            Map<String, String> properties = new HashMap<>();
            for (Object key : System.getProperties().keySet()) {
                properties.put((String) key, System.getProperty((String) key));
            }
            // optional "-c <dir>" arguments point to an external config directory (e.g. signing keys)
            for (int i = 0; i < args.size() - 1; ++i) {
                String arg = args.get(i);
                if (arg.equals("-c")) {
                    Guard.beTrue(i < args.size() - 2, "config directory must be specified after -c argument");
                    File cdir = new File(args.get(i + 1));
                    Guard.beTrue(cdir.isDirectory(), "config directory must be an existing directory: %s", cdir);
                    System.out.println("applying config: " + cdir.getAbsolutePath());
                    Map<String, String> signingProps = Dut.copyMap(Props.load(new File(cdir, "signing.properties")));
                    String storeFilePath = new File(cdir, signingProps.get("tyracorn.signing.storeFile")).getAbsolutePath().replaceAll("\\\\", "/");
                    signingProps.put("tyracorn.signing.storeFile", storeFilePath);
                    properties.putAll(signingProps);
                    i = i + 1;
                }
            }

            Config config = Config.load(new File(projDir, "src/main/platforms/android/config.json"));
            Pom pom = Pom.load(new File(projDir, "pom.xml"));

            // unpack template
            System.out.println("unpacking template");
            unzipResource("android-project.zip", targetDir);

            // copy files from main project
            System.out.println("copying source files from the main project");
            File srcSrcDir = new File(projDir, "src/main/java");
            File srcTargetDir = new File(targetDir, "android-project/app/src/main/java");
            FileUtils.copyDirectory(srcSrcDir, srcTargetDir);

            // remove classes which must not be part of the Android build
            List<String> excludeClasses = config.getStringList("excludedClasses");
            for (String ec : excludeClasses) {
                File cfile = new File(srcTargetDir, ec.replaceAll("\\.", "/") + ".java");
                if (cfile.isFile()) {
                    Guard.beTrue(cfile.delete(), "unable to delete file %s", cfile);
                }
            }

            System.out.println("copying asset files from the main project");
            File assetsSrcDir = new File(projDir, "src/main/assets");
            File assetsTargetDir = new File(targetDir, "android-project/app/src/main/assets/external");
            FileUtils.copyDirectory(assetsSrcDir, assetsTargetDir);

            System.out.println("copying app dir from the android configuration");
            File srcAppDir = new File(projDir, "src/main/platforms/android/app");
            if (srcAppDir.isDirectory()) {
                File targetAppDir = new File(targetDir, "android-project/app");
                FileUtils.copyDirectory(srcAppDir, targetAppDir);
            }

            System.out.println("merging templates");
            // application id = group id + last part of the artifact id
            String gid = pom.getGroupId();
            String aid = pom.getArtifactId();
            if (aid.contains("-")) {
                String[] parts = aid.split("\\-");
                aid = parts[parts.length - 1];
            }
            String appId = gid + "." + aid;
            String appVersion = pom.getVersion().replace("-SNAPSHOT", "");
            Guard.beTrue(StringUtils.isNumeric(appVersion), "only numeric version is supported, please check the pom file: %s", appVersion);

            String signingStoreFile = properties.getOrDefault("tyracorn.signing.storeFile", "tyracorn-dev.jks");
            String signingKeyAlias = properties.getOrDefault("tyracorn.signing.keyAlias", "tyracorn-dev");

            Map<String, String> vars = Dut.map(
                    "tyracorn.application.id", appId,
                    "tyracorn.application.version", appVersion,
                    "tyracorn.signing.storeFile", signingStoreFile,
                    "tyracorn.signing.keyAlias", signingKeyAlias,
                    "startScreen", config.getString("startScreen"));

            System.out.println("merging build properties");

            System.out.println("merging launch screens");
            File mainActivity = new File(targetDir, "android-project/app/src/main/java/com/tyracorn/android/MainActivity.java");
            Templates.merge(mainActivity, vars);

        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }

    /**
     * Builds the android project.
     *
     * @param args arguments
     */
    public static void build(List<String> args) {
        // building always starts from a freshly generated project
        generate(args);
        try {
            File projDir = new File(args.get(args.size() - 1)).getCanonicalFile();
            File targetDir = new File(projDir, "target");
            File androidProjectDir = new File(targetDir, "android-project");

            String gradlePath = androidProjectDir.getCanonicalPath() + File.separator + "gradlew.bat";

            System.out.println("building the project");
            String buildRes = Cmds.executeSimple(androidProjectDir, gradlePath, "build");
            System.out.println(buildRes);
        } catch (IOException e) {
            throw new RuntimeException(e);
        }

    }

    /**
     * Installs the project to the device.
     *
     * @param args arguments
     */
    public static void install(List<String> args) {
        // installing always starts from a fresh build
        build(args);
        try {
            File projDir = new File(args.get(args.size() - 1)).getCanonicalFile();
            File targetDir = new File(projDir, "target");
            File androidProjectDir = new File(targetDir, "android-project");

            String gradlePath = androidProjectDir.getCanonicalPath() + File.separator + "gradlew.bat";

            System.out.println("installing the project to the device");
            String installRes = Cmds.executeSimple(androidProjectDir, gradlePath, "installRelease");
            System.out.println(installRes);
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }


There is also a part which allows me to specify a directory with production signing keys. This lets me use the same tool when making a package for the Play Store while keeping the secrets out of the application’s version control system. I added this later on.

Now, I know this is far from perfect, but it does the job. It saves me time. So I decided it was good enough for now and moved on to the next thing. Knowing when to stop is an important part of the programming job. Not everything needs to be super generic and perfect. The main point is that it serves you.

## Step 3 - Use and Maintain

With the compiled jar file, usage is very simple. Open the command line and call `java -jar jar-file-path.jar android install path\to\project\dir`. The toolchain then generates the Android project, builds it, and installs it on the connected phone. Alternatively, you can replace `install` with `build` or `generate` to stop earlier in the process. In addition, adding `-c path\to\config\dir` after `install` applies an external configuration (e.g. signs the apk with the production keys).

Maintenance is very simple. Unless Google decides to change the build process (which doesn’t happen that often), it’s only about using the toolchain and adding new capabilities when they become useful. And that’s very easy to do because the code is not trying to be generic, configurable, and scalable to millions of users. This is the type of code that is focused only on you.

If you have read this far, you can download my Tyracorn Showpark application, which was fully built by this toolchain. It is a showcase application for my game engine. I know the graphics aren’t nice at the time of writing this article, but the system works well. Eventually, I will hire a freelancer, using the system I described in the Mastering Freelancers guide, to make it pretty.

## Summary

Now you have seen how to quickly build a toolchain. The most important thing is to figure out the piece to automate and simplify it as much as possible. Then start by creating a simple toolchain for that one purpose. Don’t try to cover everything. That will come over time. And finally, use it a lot to free up more time for whatever you love.

## OpenGL – PC and Android

I started my OpenGL journey by implementing simple applications. Here you can enjoy my first demo (;

That was pretty good for a start. If you would like something similar, there are plenty of good educational materials. Some are free, others cost a few protein shakes. I like these two.

You might prefer a different one, depending on your language and particular library.

After getting the basics done, I decided to try to run this on my Android phone. It’s Java, “write once, run everywhere”, right? My #1 wish was that this stays true for the application code. Let’s have a look at how well it went.

Eventually, it worked out, mostly. And I was able to satisfy my #1 wish of having a single source code base across all platforms. But it took me more time than I originally thought.

Note that it was much harder to find tutorials and good reading. The most useful were these two sources.

Interesting issues appeared during the journey. I named them Solvables and Careables. Solvables are the ones which can be solved by the system architecture. Careables, on the other hand, can’t be entirely solved. You need to take care of them well enough that your user won’t notice them.

## Solvables

JOGL vs. Android OpenGL ES. These are two Java OpenGL wrappers. They are very similar, but not exactly the same. So I ended up abstracting them and creating separate Tyracorn application containers. An application can then be launched in any container. This allows me to support Nintendo (make it fun for my lovely kids (; ) and other platforms later on.

The container pattern also turned out to be a great solution for unified handling of the application lifecycle, various input sources, and assets. And the best part is that I can easily add drivers for various sensors, cameras, or robotic pizza restaurants.

There is an interesting difference in the shader language. OpenGL ES doesn’t allow you to define a sampler array and then index its elements by anything which isn’t a constant. This effectively means that it’s not possible to write a loop to apply multiple textures. This led to a number of if statements, where each branch has the same code with a different texture.

Working across different screens. The main thing is that the aspect ratio changes dramatically. If you tune everything for landscape orientation, then portrait won’t work. I implemented a way for things like the camera or UI components to decide their properties based on the display size. And everything can change at runtime. This is not difficult to do, but it eats a piece of your time cake.

Finally, let me mention the nuances of Java. As of today, Android doesn’t support all Java 8 features out of the box. There are two ways to deal with that. The first is to introduce additional tools that pre-process your Java 8+ code to work on Android. The second is to simply accept the Android limitations and write code accordingly. I picked the second way because I don’t like messing around with tools. At the same time, I really love easy-to-read, standardized code. Therefore, I spent a lot of time finding the code form I like best.

## Careables

There are two major careables – precision and performance.

Precision is related to the z-buffer and floating-point numbers. When I first ran my application, I discovered that my mobile phone uses lower precision than my PC. Interesting. The result is more noticeable z-fighting and shadow artifacts. You might see other artifacts as well whenever you render objects close to each other. Typical advice on how to deal with these is about offsets and bias values. More advanced techniques fake a higher precision. For example, split the scene into parts and render each of them with a separate z-buffer.

When I first tried to run the shadow light example, I was shocked to see something like 3 fps while rendering a simple cube with a shadow. It took me a few days to discover that it was caused by two problems. The first problem was that I couldn’t specify an empty color attachment during framebuffer creation in OpenGL ES. This didn’t make the program fail, but it made it run terribly slowly and kept logging an error. The second problem was related to mobile phone architecture. Apparently, battery life is important, so phones use a different architecture to save power. This led to extra shuffling between memories, which took time. The solution was to explicitly clear the shadow buffer right after binding, so the copying didn’t need to take place.

Even with all this in place, the performance on mobile is still much lower than on PC. For example, 3 shadow lights already cause noticeable slowness on mobile. And when I implemented PCF to make smooth shadows, it was a disaster. So I decided to keep shadows ugly on mobile phones.

As a lesson, I plan to do these two things in the future.

• Create separate versions of assets for different devices
• Adjust scene funkiness based on the device

## Conclusion

Hope you enjoyed this one. If you would like to get the source code for all this and more, then please join me on Patreon.

I am already thinking about the next step. If you have anything you would like to see, then just let me know.

See you next time!

## AR By Hand – Part 5 – Video and OpenGL

Welcome to the last chapter of this AR series. The previous chapter ended by drawing a cube model on top of the marker paper. This chapter will close the whole series by showing you how to connect the whole system with OpenGL.

To start, let me say a few words about OpenGL. Although the end goal of this project is to render a simple cube, it’s not that trivial to do in OpenGL. You will still need knowledge of the graphics pipeline, shaders, and various buffer objects. Covering just this is already enough for its own tutorial series. Luckily, there is a great book called “Computer Graphics Programming in OpenGL with JAVA Second Edition” written by V. Scott Gordon. I would encourage you to read it if you want to learn about OpenGL. I personally used this book to create this project. The only notable difference is that the JOGL library is now available in the Maven repository, which makes installation super easy.

The source code for this chapter starts at CameraPoseJoglTestApp executable class.

## Video Processing

This is the easy part. There is a Java library called xuggle-video from OpenIMAJ. Processing video works like this: you open the reader, register listeners, and read packets, processing the events as they come, until the end of the stream.

As for the source code, there is a VideoUtils class which allows processing videos synchronously using lambda functions. I used this one to produce the videos in the previous chapters. In addition, there is a VideoPlayer class which plays the video in a separate thread and lets you process the frames in a callback. This is the class used in the OpenGL application.

## OpenGL

I am going to cover only how to fit the previously created matrices into the OpenGL ones. The general usage of OpenGL to draw shapes is not discussed here.

In the previous chapter, you learned about the 3×3 matrix $$K$$ and the 3×4 matrix $$V$$. When you start working with OpenGL, you will see that all the matrices are 4×4. What the hell?

Don’t worry, they are all related. A detailed article series covering this point as well, although a bit difficult to digest, was published by Kyle Simek back in 2012.

Be aware that it requires some effort to get things displayed correctly. There are many conventions along the way, and a mistake in just one sign will produce a weird result, or nothing will be displayed at all.

To recap, the result of the previous chapter was the internal matrix $$K$$ and the external matrix $$V$$. You could multiply them as $$P=KV$$ to get the projection matrix. The full matrices are these.

$K=\begin{bmatrix} f_x & s & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{bmatrix} \\ V=[R|T]=\begin{bmatrix} r_{11} & r_{12} & r_{13} & t_x \\ r_{21} & r_{22} & r_{23} & t_y \\ r_{31} & r_{32} & r_{33} & t_z \end{bmatrix}$

### Projection and Internal

The OpenGL projection matrix is related to the internal matrix $$K$$. OpenGL Projection Matrix is a nice article which explains how the whole projection and clipping works. To give you a brief idea, let’s look at the following image (taken from that article).

The left image shows the space which is displayed at the end of the graphics pipeline. Anything outside of that cut pyramid is not displayed. The camera is at the origin, oriented towards negative Z. There are 2 parallel planes forming the top and bottom of the cut pyramid. They are called near and far and are defined by scalar values. The near plane is also where the pixels are projected. These are OpenGL parameters introduced for practical reasons, and you need to choose them.

The right image shows the mapping of the volume into normalized device coordinates (NDC). This allows things like clipping or depth buffers.

In summary, the OpenGL projection matrix does 2 things: perspective projection and mapping to NDC. This can be expressed as the following matrix multiplication.
$P_{GL}=P_{NDC}P_{Pers}$
Therefore, you need to create these 2 matrices, having the following parameters.

• Values of the matrix $$K$$ ($$f_x,\ f_y,\ s,\ c_x,\ c_y$$)
• near and far values – I have chosen 0.1 and 1000 respectively
• width and height are the width and height in pixels of the original input image

With all the parameters, it’s possible to write down the matrices right away.

$P_{Pers}=\begin{bmatrix} f_x & s & -c_x & 0 \\ 0 & f_y & -c_y & 0 \\ 0 & 0 & near + far & near * far \\ 0 & 0 & -1 & 0 \end{bmatrix} \\ P_{NDC}= \begin{bmatrix} \frac{-2}{width} & 0 & 0 & 1 \\ 0 & \frac{2}{height} & 0 & -1 \\ 0 & 0 & \frac{2}{near-far} & \frac{far+near}{near-far} \\ 0 & 0 & 0 & 1 \end{bmatrix} \\ P_{GL}=P_{NDC}P_{Pers}$

The code for this is in the class Ar, method toGlProj.
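
The two matrices above translate to code almost literally. The following is only a sketch under the article’s conventions, not the actual Ar.toGlProj implementation: the class name, the method names, and the plain `double[][]` row-major representation are assumptions made for illustration.

```java
final class GlProjectionSketch {

    /** Multiplies two 4x4 matrices stored as row-major arrays. */
    static double[][] mul(double[][] a, double[][] b) {
        double[][] r = new double[4][4];
        for (int i = 0; i < 4; i++) {
            for (int j = 0; j < 4; j++) {
                for (int k = 0; k < 4; k++) {
                    r[i][j] += a[i][k] * b[k][j];
                }
            }
        }
        return r;
    }

    /** Builds P_GL = P_NDC * P_Pers from the elements of K and the chosen planes. */
    static double[][] glProjection(double fx, double fy, double s, double cx, double cy,
            double width, double height, double near, double far) {
        double[][] pers = {
            {fx, s, -cx, 0},
            {0, fy, -cy, 0},
            {0, 0, near + far, near * far},
            {0, 0, -1, 0}
        };
        double[][] ndc = {
            {-2 / width, 0, 0, 1},
            {0, 2 / height, 0, -1},
            {0, 0, 2 / (near - far), (far + near) / (near - far)},
            {0, 0, 0, 1}
        };
        return mul(ndc, pers);
    }
}
```

Keep in mind that OpenGL expects matrices in column-major order when they are uploaded without transposition, so a row-major array like this one has to be flattened column by column.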

### View and External

The OpenGL view matrix $$V_{GL}$$ is easy to construct by taking the $$V$$ matrix and adding the $$[0,0,0,1]$$ vector as the 4th row. Like this.
$V_{GL}=\begin{bmatrix} r_{11} & r_{12} & r_{13} & t_x \\ r_{21} & r_{22} & r_{23} & t_y \\ r_{31} & r_{32} & r_{33} & t_z \\ 0 & 0 & 0 & 1 \end{bmatrix}$
The code for this is in the class Ar, method toGlV.
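
Extending $$V$$ by the 4th row is a one-liner in most matrix libraries. As a plain-array sketch (hypothetical class and method names, not the actual Ar.toGlV code), it could look like this.

```java
final class GlViewSketch {

    /** Turns a 3x4 matrix V = [R|T] into the 4x4 matrix with [0, 0, 0, 1] as the last row. */
    static double[][] glView(double[][] v) {
        if (v.length != 3) {
            throw new IllegalArgumentException("V must have 3 rows");
        }
        double[][] r = new double[4][];
        for (int i = 0; i < 3; i++) {
            if (v[i].length != 4) {
                throw new IllegalArgumentException("V must have 4 columns");
            }
            r[i] = v[i].clone();
        }
        r[3] = new double[] {0, 0, 0, 1};
        return r;
    }
}
```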

## Everything Together

Finally, you can see and run everything through the CameraPoseJoglTestApp executable class. A few things to mention.

• Video is rendered to the texture, which is then drawn to the screen as 2 triangles.
• Video processing and OpenGL loop need synchronization. Otherwise, it won’t work.
• There are 2 sets of the shader programs. One for the background video, one for the 3D world.

That’s it. Here you have the video with the result.

## Summary

And this is the end. Although there is a lot of room for improvement, I hope you have enjoyed this series and learned something new. I would be more than happy if you sent me your comments, questions, suggestions for improvements, or ideas for other AR-related projects. Or just another topic you are struggling with. I would love to be helpful.

See you in something else (;

## AR By Hand – Part 4 – Camera Pose

Welcome to part 4 of this AR series. In the previous chapter, you saw how homography makes it possible to draw onto a projected planar surface. This chapter will extend the previously calculated homography into a form which allows drawing 3D objects into the scene.

The program related to this chapter is CameraPoseVideoTestApp. You can download the whole project right here.

The structure here is the same as in the previous chapter. First, you will see the equations and then a practical example at the end. Don’t be stressed by the number of parameters and variables. It’s not that difficult once it comes to coding.

## Camera and Homography

A camera is a device which projects points from 3D space onto a 2D plane. For this project, I have chosen to use the classical pinhole camera model, without worrying about lens distortions. This model makes point projection as simple as a matrix-vector multiplication in homogeneous coordinates (arrows on top of the lowercase letters symbolize vectors).

$\vec{p_{2D}}=P\vec{p_{3D}} \\ \begin{bmatrix} wx_{2D} \\ wy_{2D} \\ w \end{bmatrix} = \begin{bmatrix} p_{11} & p_{12} & p_{13} & p_{14} \\ p_{21} & p_{22} & p_{23} & p_{24} \\ p_{31} & p_{32} & p_{33} & p_{34} \end{bmatrix} \begin{bmatrix} x_{3D} \\ y_{3D} \\ z_{3D} \\ 1 \end{bmatrix}$

$$P$$ is called a projection matrix and has 3 rows and 4 columns. Multiplying by it realizes the dimension drop onto the projection plane.
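
In code, the projection is a 3×4 matrix times a 4-element vector, followed by a division by $$w$$. The sketch below uses plain arrays; the class and method names are made up for illustration and do not come from the project.

```java
final class PinholeProjectionSketch {

    /**
     * Projects the 3D point (x, y, z) with the 3x4 projection matrix p
     * and returns the 2D point {x2d, y2d} after dividing by w.
     */
    static double[] project(double[][] p, double x, double y, double z) {
        double[] h = {x, y, z, 1}; // homogeneous 3D point
        double[] r = new double[3]; // holds {w * x2d, w * y2d, w}
        for (int i = 0; i < 3; i++) {
            for (int j = 0; j < 4; j++) {
                r[i] += p[i][j] * h[j];
            }
        }
        if (r[2] == 0) {
            throw new ArithmeticException("point projects to infinity (w = 0)");
        }
        return new double[] {r[0] / r[2], r[1] / r[2]};
    }
}
```

For example, with focal lengths of 800, the image center at (320, 240), and a camera with no rotation or translation, the point (1, 1, 4) lands on the pixel (520, 440).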

A camera is a product with certain properties, most notably its focal length. What matters is that these properties are constant for a given camera (assuming you are not zooming). This is the internal set of properties. Then there is an external set of properties: the position and direction of the camera.

In matrix language, this is reflected by decomposing the matrix $$P$$ into a 3×3 calibration matrix $$K$$ (internal matrix with camera properties) and a 3×4 view matrix $$V$$ (external matrix with camera position and rotation). These matrices are sometimes called intrinsic and extrinsic. You can expand them into the following form.

$P=KV=K[R|T]=K[R_1|R_2|R_3|T]= \begin{bmatrix} f_x & s & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} r_{11} & r_{12} & r_{13} & t_x \\ r_{21} & r_{22} & r_{23} & t_y \\ r_{31} & r_{32} & r_{33} & t_z \end{bmatrix}$

• $$f_x$$ and $$f_y$$ are focal lengths in the respective axes.
• $$s$$ is a skew factor.
• $$c_x$$ and $$c_y$$ are the coordinates of the principal point of the camera.
• $$R$$ is a camera rotation matrix. $$R_1,R_2,R_3$$ are columns of the rotation matrix, and $$r_{ab}$$ are the elements. The rotation matrix is orthonormal (its columns are unit vectors and orthogonal to each other). Remember this one, because it will be discussed later.
• $$T$$ is a camera translation vector with elements $$t_x,t_y,t_z$$.

### Calibration Matrix

All the elements of matrix $$K$$ are properties of the camera. One way to get them is to perform a proper measurement. If you want to do that, OpenCV offers plenty of material on the topic. I just picked the values manually as follows.

• $$f_x,f_y=400\ or\ 800$$
• $$s=0$$
• $$c_x,c_y=$$ center of the input image (for 640×480 image, these will be 320 and 240)

### Relation with Homography

To show you how camera pose and homography are related, let’s start with writing down the equations for point projection.
$\begin{bmatrix} wx_{2D} \\ wy_{2D} \\ w \end{bmatrix} = \begin{bmatrix} p_{11} & p_{12} & p_{13} & p_{14} \\ p_{21} & p_{22} & p_{23} & p_{24} \\ p_{31} & p_{32} & p_{33} & p_{34} \end{bmatrix} \begin{bmatrix} x_{3D} \\ y_{3D} \\ z_{3D} \\ 1 \end{bmatrix} = K[R|T]\begin{bmatrix} x_{3D} \\ y_{3D} \\ z_{3D} \\ 1 \end{bmatrix} = \\ = K[R_1|R_2|R_3|T]\begin{bmatrix} x_{3D} \\ y_{3D} \\ z_{3D} \\ 1 \end{bmatrix} = \\ = \begin{bmatrix} f_x & s & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} r_{11} & r_{12} & r_{13} & t_x \\ r_{21} & r_{22} & r_{23} & t_y \\ r_{31} & r_{32} & r_{33} & t_z \end{bmatrix} \begin{bmatrix} x_{3D} \\ y_{3D} \\ z_{3D} \\ 1 \end{bmatrix}$

If $$z_{3D}=0$$, then the equations look like this.
$\begin{bmatrix} wx_{2D} \\ wy_{2D} \\ w \end{bmatrix} = \begin{bmatrix} p_{11} & p_{12} & p_{13} & p_{14} \\ p_{21} & p_{22} & p_{23} & p_{24} \\ p_{31} & p_{32} & p_{33} & p_{34} \end{bmatrix} \begin{bmatrix} x_{3D} \\ y_{3D} \\ 0 \\ 1 \end{bmatrix} = K[R|T]\begin{bmatrix} x_{3D} \\ y_{3D} \\ 0 \\ 1 \end{bmatrix} = \\ = K[R_1|R_2|R_3|T]\begin{bmatrix} x_{3D} \\ y_{3D} \\ 0 \\ 1 \end{bmatrix} = \\ = \begin{bmatrix} f_x & s & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} r_{11} & r_{12} & r_{13} & t_x \\ r_{21} & r_{22} & r_{23} & t_y \\ r_{31} & r_{32} & r_{33} & t_z \end{bmatrix} \begin{bmatrix} x_{3D} \\ y_{3D} \\ 0 \\ 1 \end{bmatrix}$

If you carry out the matrix multiplication, you will find that you can drop the third column of the rotation matrix and the z coordinate of the 3D point and still get the same results (reminder: you can do this only if $$z_{3D}=0$$, otherwise it won’t work). This will give you the following.
$\begin{bmatrix} wx_{2D} \\ wy_{2D} \\ w \end{bmatrix} = \begin{bmatrix} p_{11} & p_{12} & p_{14} \\ p_{21} & p_{22} & p_{24} \\ p_{31} & p_{32} & p_{34} \end{bmatrix} \begin{bmatrix} x_{3D} \\ y_{3D} \\ 1 \end{bmatrix} = K[R_1|R_2|T]\begin{bmatrix} x_{3D} \\ y_{3D} \\ 1 \end{bmatrix} = \\ = \begin{bmatrix} f_x & s & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} r_{11} & r_{12} & t_x \\ r_{21} & r_{22} & t_y \\ r_{31} & r_{32} & t_z \end{bmatrix} \begin{bmatrix} x_{3D} \\ y_{3D} \\ 1 \end{bmatrix}$

Now note that $$[R_1|R_2|T]$$ is a 3×3 matrix and at the same time, you can consider that $$K[R_1|R_2|T]=H$$ from the previous chapter. That’s how homography is related to the camera projection. And that’s also why you can project points on the $$z=0$$ plane, without worrying about the camera internal parameters at all.
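
If you want to convince yourself numerically, the following sketch drops the third column of a 3×4 projection matrix and lets you compare both products for a point with $$z_{3D}=0$$. The class and its helpers are hypothetical and exist only for this illustration.

```java
final class HomographyFromPoseSketch {

    /** Keeps columns 1, 2 and 4 of a 3x4 matrix, i.e. turns K[R1|R2|R3|T] into K[R1|R2|T]. */
    static double[][] dropThirdColumn(double[][] p) {
        double[][] h = new double[3][3];
        for (int i = 0; i < 3; i++) {
            h[i][0] = p[i][0];
            h[i][1] = p[i][1];
            h[i][2] = p[i][3];
        }
        return h;
    }

    /** Multiplies a matrix by a vector. */
    static double[] mulVec(double[][] m, double[] v) {
        double[] r = new double[m.length];
        for (int i = 0; i < m.length; i++) {
            for (int j = 0; j < v.length; j++) {
                r[i] += m[i][j] * v[j];
            }
        }
        return r;
    }
}
```

Calling `mulVec(p, {x, y, 0, 1})` and `mulVec(dropThirdColumn(p), {x, y, 1})` gives the same homogeneous 2D point, because the dropped column is only ever multiplied by zero.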

## Extending Homography

Now for the full camera pose. It seems the easiest way is to calculate $$[R_1|R_2 |T]=K^{-1}H$$, then set $$R_3=R_1\times R_2$$ and have the full $$[R|T]$$ matrix.

Unfortunately, this doesn’t work. Remember, a little bit above I mentioned that matrix $$R$$ is orthonormal? $$K$$ and $$H$$ are already coming out of estimations, carrying errors, so it’s not guaranteed that $$R_1$$ and $$R_2$$ obtained in this way are orthonormal. That would make the final image look weird. Therefore let’s make them orthonormal.

The implementation of the following text is available inside the Ar class, method estimateMvMatrix. Here I would like to refer you to the “Augmented Reality with Python and OpenCV” article written by Juan Gallostra. This is where I first discovered the method I am about to describe.

Let’s start by constructing $$[G_1|G_2|G_3 ]=K^{-1}H$$. In the implementation, you will also see that I am negating the homography matrix before plugging it into the equation. That’s because the real pinhole camera would project the flipped image, but there is no flipping here.

Now $$[G_1|G_2|G_3]$$ is close to the desired $$[R_1|R_2|T]$$, but not equal to it, because it’s still an estimate. Therefore $$G_1$$ and $$G_2$$ are, up to a common scale, only nearly orthonormal. Then you can write the following.

$l=\sqrt{\| G_1 \| \| G_2 \|} ,\ \ G_1'=\frac{G_1}{l} ,\ \ G_2'=\frac{G_2}{l} ,\ \ G_3'=\frac{G_3}{l} \\ \vec{c}=G_1' + G_2' ,\ \ \vec{p}=G_1' \times G_2' ,\ \ \vec{d}=\vec{c} \times \vec{p} \\ R_1=\frac{1}{\sqrt{2}}\left( \frac{\vec{c}}{\| \vec{c} \| } + \frac{\vec{d}}{\| \vec{d} \| } \right) ,\ \ R_2=\frac{1}{\sqrt{2}}\left( \frac{\vec{c}}{\| \vec{c} \| } - \frac{\vec{d}}{\| \vec{d} \| } \right) \\ R_3=R_1 \times R_2 ,\ \ T=G_3'$

Then you can stack vectors into columns to get the final $$V=[R_1|R_2|R_3|T]$$ 3×4 matrix. Finally, compute $$P=KV$$ and start projecting points.
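The formulas above can be sketched in a few lines of Java. Please take this as an illustration written on plain arrays, not as the actual code of estimateMvMatrix; the class name `PoseFromHomography` and all helper methods are my own naming for this example. The input is assumed to be the already computed 3×3 matrix $$[G_1|G_2|G_3]$$.

```java
/** Sketch of the orthonormalization step on plain arrays (not the project's Ar class). */
final class PoseFromHomography {

    static double[] column(double[][] m, int c) {
        return new double[] {m[0][c], m[1][c], m[2][c]};
    }

    static double norm(double[] v) {
        return Math.sqrt(v[0] * v[0] + v[1] * v[1] + v[2] * v[2]);
    }

    static double[] scale(double[] v, double s) {
        return new double[] {v[0] * s, v[1] * s, v[2] * s};
    }

    static double[] add(double[] a, double[] b) {
        return new double[] {a[0] + b[0], a[1] + b[1], a[2] + b[2]};
    }

    static double[] cross(double[] a, double[] b) {
        return new double[] {
            a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]
        };
    }

    /**
     * @param g 3x3 matrix [G1|G2|G3] = K^-1 H
     * @return 3x4 matrix V = [R1|R2|R3|T]
     */
    static double[][] estimate(double[][] g) {
        double[] g1 = column(g, 0), g2 = column(g, 1), g3 = column(g, 2);
        // common scale l, applied to all three columns
        double l = Math.sqrt(norm(g1) * norm(g2));
        g1 = scale(g1, 1 / l);
        g2 = scale(g2, 1 / l);
        double[] t = scale(g3, 1 / l);

        // c bisects G1' and G2', d is perpendicular to c within their plane
        double[] c = add(g1, g2);
        double[] p = cross(g1, g2);
        double[] d = cross(c, p);
        double[] cu = scale(c, 1 / norm(c));
        double[] du = scale(d, 1 / norm(d));

        double k = 1 / Math.sqrt(2);
        double[] r1 = scale(add(cu, du), k);
        double[] r2 = scale(add(cu, scale(du, -1)), k);
        double[] r3 = cross(r1, r2);

        // stack the vectors into columns
        double[][] v = new double[3][4];
        for (int i = 0; i < 3; ++i) {
            v[i][0] = r1[i];
            v[i][1] = r2[i];
            v[i][2] = r3[i];
            v[i][3] = t[i];
        }
        return v;
    }
}
```

Because $$\vec{c}$$ and $$\vec{d}$$ are perpendicular by construction, $$R_1$$ and $$R_2$$ come out as unit vectors perpendicular to each other, no matter how noisy the input is (as long as $$G_1$$ and $$G_2$$ are not parallel).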

## Summary

Now you know how to draw 3D objects into the scene. So far, all the drawing is done through simple image operations, which is useful only for basic demos. In the last chapter, you will discover how to hook up the whole thing with video and OpenGL to make more funky stuff.

## AR By Hand – Part 3 – Homography

Welcome to part 3 of this AR series. In the previous chapter, you could read about how to detect and track a white A4 paper. The result was 4 points in the image corresponding to the corners. This chapter will use these points to build a homography. That’s the next step towards the AR experience.
This article goes first through the mathematics behind homography, and then shows the use case relevant to this project.

## Mathematics

Here, I will briefly review the terms and then derive the equation system for homography. If you follow this, you will see why you need to detect at least 4 points, no 3 of which lie on a single line.

### Terms

Correspondence. Imagine you have 2 photos of the same object taken from slightly different positions. Then the points $$p_1$$ and $$p_2$$ on the respective images correspond if they are projections of the same physical point. The sign for correspondence is $$p_1\ \hat{=}\ p_2$$.

Homogeneous coordinates. This is a coordinate system used in projective geometry and will be used here from now on as well. 2D vector $$\begin{bmatrix} x\\ y \end{bmatrix}$$ in cartesian coordinates is expressed as 3D vector $$\begin{bmatrix} wx\\ wy\\ w \end{bmatrix}, \forall w\neq 0$$ in homogeneous coordinates. Similarly, 3D vectors in cartesian coordinates are 4D vectors in homogeneous coordinates. Also, note that $$\begin{bmatrix} w_{1}x\\ w_{1}y\\ w_1 \end{bmatrix}=\begin{bmatrix} w_{2}x\\ w_{2}y\\ w_2 \end{bmatrix}, \forall w_1\neq 0,\ w_2\neq 0$$ in homogeneous coordinates.

Matrices are used to represent certain geometric transformations in the homogeneous coordinates. Transformation of the point $$p$$ is realized by a simple multiplication so that $$p'=Mp$$. In addition, transformations can be merged into a single one by the standard matrix multiplication.
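To make these terms more tangible, here is a small sketch for the 2D case. The class `HomogeneousDemo` and its methods are hypothetical helpers written just for this article. It shows the conversion in both directions, a transformation by matrix multiplication, and the merging of 2 transformations into one.

```java
/** Sketch: 2D points in homogeneous coordinates and 3x3 transformation matrices. */
final class HomogeneousDemo {

    /** Cartesian (x, y) to homogeneous (wx, wy, w), for any w != 0. */
    static double[] toHomogeneous(double x, double y, double w) {
        return new double[] {w * x, w * y, w};
    }

    /** Homogeneous (wx, wy, w) back to Cartesian (x, y). */
    static double[] toCartesian(double[] p) {
        return new double[] {p[0] / p[2], p[1] / p[2]};
    }

    /** Computes p' = M p. */
    static double[] transform(double[][] m, double[] p) {
        double[] res = new double[3];
        for (int i = 0; i < 3; ++i) {
            for (int j = 0; j < 3; ++j) {
                res[i] += m[i][j] * p[j];
            }
        }
        return res;
    }

    /** Merges two transformations; the result applies b first and a second. */
    static double[][] multiply(double[][] a, double[][] b) {
        double[][] res = new double[3][3];
        for (int i = 0; i < 3; ++i) {
            for (int j = 0; j < 3; ++j) {
                for (int s = 0; s < 3; ++s) {
                    res[i][j] += a[i][s] * b[s][j];
                }
            }
        }
        return res;
    }

    static double[][] translation(double tx, double ty) {
        return new double[][] {{1, 0, tx}, {0, 1, ty}, {0, 0, 1}};
    }

    static double[][] scaling(double s) {
        return new double[][] {{s, 0, 0}, {0, s, 0}, {0, 0, 1}};
    }
}
```

For example, `multiply(scaling(2), translation(1, 0))` first moves the point by 1 along the x axis and then doubles its coordinates. The Cartesian result is the same regardless of which non-zero $$w$$ you pass to `toHomogeneous`.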

### Homography Equations

Mr. Wikipedia says that any two images of the same planar surface in space are related by a homography (assuming a pinhole camera model).

In other words, if $$I$$ and $$I'$$ are 2 images containing the same planar surface, then there exists a 3×3 matrix $$H$$ which maps points $$p$$ into corresponding points $$p'$$, such that $$p'=Hp$$. Remember that these points must be on that plane.

Let’s write down the equations in more detail.

$H=\begin{bmatrix} h_{11} & h_{12} & h_{13} \\ h_{21} & h_{22} & h_{23} \\ h_{31} & h_{32} & h_{33} \end{bmatrix} \\ \begin{bmatrix} w'x' \\ w'y' \\ w' \end{bmatrix}=\begin{bmatrix} h_{11} & h_{12} & h_{13} \\ h_{21} & h_{22} & h_{23} \\ h_{31} & h_{32} & h_{33} \end{bmatrix}\begin{bmatrix} wx \\ wy \\ w \end{bmatrix} \\ \begin{bmatrix} w'x' \\ w'y' \\ w' \end{bmatrix}=\begin{bmatrix} h_{11}wx + h_{12}wy + h_{13}w \\ h_{21}wx + h_{22}wy + h_{23}w \\ h_{31}wx + h_{32}wy + h_{33}w \end{bmatrix}$

The goal is to figure out 9 elements of matrix $$H$$. Without losing any generality, you can assume that $$w = 1$$ and switch to the cartesian coordinates by division. This will make the following equation.

$\begin{bmatrix} x' \\ y' \end{bmatrix}=\begin{bmatrix} \frac{h_{11}x + h_{12}y + h_{13}}{h_{31}x + h_{32}y + h_{33}} \\ \frac{h_{21}x + h_{22}y + h_{23}}{h_{31}x + h_{32}y + h_{33}} \end{bmatrix}$

This equation system has 9 degrees of freedom. Luckily, you can multiply all elements of $$H$$ by a non-zero $$k$$ without affecting the solution at all. This removes 1 degree of freedom and opens 2 possible ways to a solution.

The first way is to set $$h_{33} = 1$$. You can do this as long as $$h_{33}\neq 0$$. The second, more general way is to impose a unit vector constraint such that $$h_{11}^2+h_{12}^2+h_{13}^2+h_{21}^2+h_{22}^2+h_{23}^2+h_{31}^2+h_{32}^2+h_{33}^2=1$$. Here I will use the first way because it seems to be more intuitive and better supported by the numerical libraries.
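The scale invariance is easy to check in code. The following sketch uses a hypothetical class `HomographyMapping` (not part of the accompanying project). It maps a Cartesian point through $$H$$ by the division shown above, and normalizes $$H$$ so that $$h_{33}=1$$.

```java
/** Sketch: mapping a point through a homography and the effect of scaling H. */
final class HomographyMapping {

    /** Maps Cartesian (x, y) to Cartesian (x', y') through the 3x3 matrix h. */
    static double[] map(double[][] h, double x, double y) {
        double w = h[2][0] * x + h[2][1] * y + h[2][2];
        return new double[] {
            (h[0][0] * x + h[0][1] * y + h[0][2]) / w,
            (h[1][0] * x + h[1][1] * y + h[1][2]) / w
        };
    }

    /** Multiplies all elements of h by k. */
    static double[][] scaled(double[][] h, double k) {
        double[][] res = new double[3][3];
        for (int i = 0; i < 3; ++i) {
            for (int j = 0; j < 3; ++j) {
                res[i][j] = h[i][j] * k;
            }
        }
        return res;
    }

    /** Rescales h so that h33 = 1; this is possible only if h33 != 0. */
    static double[][] normalized(double[][] h) {
        if (h[2][2] == 0) {
            throw new IllegalArgumentException("h33 is zero, cannot normalize");
        }
        return scaled(h, 1 / h[2][2]);
    }
}
```

Calling `map` with `h`, with `scaled(h, k)` for any non-zero `k`, or with `normalized(h)` should give the same point up to rounding errors, because the factor cancels out in the division.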

### Homography Solution

Setting $$h_{33}=1$$ will give the following.

$\begin{bmatrix} x' \\ y' \end{bmatrix}=\begin{bmatrix} \frac{h_{11}x + h_{12}y + h_{13}}{h_{31}x + h_{32}y + 1} \\ \frac{h_{21}x + h_{22}y + h_{23}}{h_{31}x + h_{32}y + 1} \end{bmatrix}$

After separating to the components, multiplying, and reorganizing you will get these 2 equations.

$x'=h_{11}x + h_{12}y + h_{13} - h_{31}xx' - h_{32}yx' \\ y'=h_{21}x + h_{22}y + h_{23} - h_{31}xy' - h_{32}yy'$

These are linear equations with 8 unknowns. Each pair of corresponding points contributes 2 of them. Therefore, in theory, it is required to have 8 equations, which means 4 point pairs (with certain preconditions to make sure the system is not degenerate), to be able to figure out the unknowns.

In practice, we have 4 estimated corner points of the marker paper. Although they carry some errors from the image processing part, no 3 of them lie on a single line. Therefore it is possible to plug them into the equations and use numerical methods to get an approximate solution with minimal error. This is what the equations look like.

$\begin{bmatrix} x_1 & y_1 & 1 & 0 & 0 & 0 & -x_1x_1' & -y_1x_1' \\ 0 & 0 & 0 &x_1 & y_1 & 1 & -x_1y_1' & -y_1y_1' \\ x_2 & y_2 & 1 & 0 & 0 & 0 & -x_2x_2' & -y_2x_2' \\ 0 & 0 & 0 &x_2 & y_2 & 1 & -x_2y_2' & -y_2y_2' \\ \cdots & \cdots & \cdots & \cdots & \cdots & \cdots & \cdots & \cdots \end{bmatrix} \begin{bmatrix} h_{11} \\ h_{12} \\ h_{13} \\ h_{21} \\ h_{22} \\ h_{23} \\ h_{31} \\ h_{32} \end{bmatrix} = \begin{bmatrix} x_1' \\ y_1' \\ x_2' \\ y_2' \\ \cdots \end{bmatrix}$

I won’t go into numerics here. I just use the solver provided by the mathematics library to get a solution like this one.

$H=\begin{bmatrix} h_{11} & h_{12} & h_{13} \\ h_{21} & h_{22} & h_{23} \\ h_{31} & h_{32} & 1 \end{bmatrix}$

The source code for homography estimation is located inside the Ar class, method estimateHomography.
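If you don’t have a mathematics library at hand, the exactly determined case with 4 point pairs can be sketched with plain Gaussian elimination. Note that this is my simplified illustration and not the code of estimateHomography: the class `HomographySolver` is a made-up name, it accepts exactly 4 pairs, and it doesn’t do any least-squares fitting for more points.

```java
/** Sketch: homography from exactly 4 point pairs with h33 fixed to 1. */
final class HomographySolver {

    /** src[i] = {x, y} maps to dst[i] = {x', y'}; no 3 source points may be collinear. */
    static double[][] estimate(double[][] src, double[][] dst) {
        double[][] a = new double[8][]; // augmented matrix [A|b], 2 rows per point pair
        for (int i = 0; i < 4; ++i) {
            double x = src[i][0], y = src[i][1];
            double xp = dst[i][0], yp = dst[i][1];
            a[2 * i] = new double[] {x, y, 1, 0, 0, 0, -x * xp, -y * xp, xp};
            a[2 * i + 1] = new double[] {0, 0, 0, x, y, 1, -x * yp, -y * yp, yp};
        }
        double[] h = solve(a);
        return new double[][] {{h[0], h[1], h[2]}, {h[3], h[4], h[5]}, {h[6], h[7], 1}};
    }

    /** Gaussian elimination with partial pivoting on an n x (n + 1) augmented matrix. */
    static double[] solve(double[][] a) {
        int n = a.length;
        for (int col = 0; col < n; ++col) {
            int pivot = col;
            for (int r = col + 1; r < n; ++r) {
                if (Math.abs(a[r][col]) > Math.abs(a[pivot][col])) {
                    pivot = r;
                }
            }
            if (Math.abs(a[pivot][col]) < 1e-12) {
                throw new IllegalArgumentException("degenerate point configuration");
            }
            double[] tmp = a[col];
            a[col] = a[pivot];
            a[pivot] = tmp;
            for (int r = col + 1; r < n; ++r) {
                double f = a[r][col] / a[col][col];
                for (int c = col; c <= n; ++c) {
                    a[r][c] -= f * a[col][c];
                }
            }
        }
        // back substitution
        double[] x = new double[n];
        for (int r = n - 1; r >= 0; --r) {
            double s = a[r][n];
            for (int c = r + 1; c < n; ++c) {
                s -= a[r][c] * x[c];
            }
            x[r] = s / a[r][r];
        }
        return x;
    }
}
```

The rows of the augmented matrix follow the equation system above one to one. With noisy input and more than 4 points you would rather want a least-squares solver from a numerical library.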

## Use Case

Homography has several use cases, and you can easily find them on the internet. Here is just the one relevant to this project. Let’s estimate the homography in such a way that the detected rectangle corresponds to a fixed rectangle. Then draw a blue square into the fixed rectangle and the corresponding square into the original image. The result is right below.

## Summary

This chapter covered homography. This allows you to draw into the planar surfaces of the image. In the next chapter, you will learn how to extend homography to get the projection matrix and be able to draw 3D objects lying on top of that plane.

## AR By Hand – Part 2 – Plane Detection

Welcome back to the AR series. The previous chapter introduced the whole project. This chapter will cover the first topic on the list, plane detection and tracking. At the end of this article, you will be able to identify the corner points of the A4 paper in the input image, in the right order, so you can draw a rectangle contour in there. An example outcome of this chapter is in the following image.

The full implementation is available in the accompanying project. The download button is right below this paragraph. The main class for this article is WhiteMarkerTracker. I would encourage you to look into the source code while reading the article. Now let’s get into it.

## Pre-processing

Pre-processing is the first phase in image processing. The goal of the pre-processing is to clean up the image and extract only the information usable in the further phase. Since, in many cases, this phase has to go through every pixel of the image, only relatively simple operations can be performed. The result is a list of “features” (you will see what this word means in a little bit) useful for more detailed processing. And many times, it’s desirable to have a much smaller number of features than the number of pixels.

In our case, the goal is to identify the white rectangle. A good way to start is by identifying the pixels lying on the edge of the white area. Those pixels are the “features” in this particular context. They can be extracted by the following process.

1. Threshold pixels by color (white color must have red, green and blue components high enough).
2. Identify all the connected areas (blobs).
3. Pick the biggest blob (assuming the marker paper is the dominant white object in the image) and throw away all the others. This cleans up certain artifacts.
4. If the biggest blob isn’t big enough (measured by the number of pixels), then exit.
5. Identify the contour points of the blob. These are the white pixels next to a black pixel.

The result is a set of contour points, illustrated in the image below. To give you rough numbers: the input image has 640×480 pixels, which is slightly over 300,000 pixels in total. Pre-processing keeps, given reasonable input, fewer than 3,000 pixels. This reduces the amount of data by a factor of about 100.
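The five steps above can be sketched as follows. This is a simplified illustration, not the code of WhiteMarkerTracker. The class `ContourExtractor`, the packed RGB `int[]` input, the single `threshold` for all three color components, 4-connectivity, and treating the image border as black are all my assumptions for this example.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;

/** Sketch: threshold, find the biggest white blob and return its contour pixels. */
final class ContourExtractor {

    /** Returns contour pixels {x, y}, or an empty list if the biggest blob has fewer than minArea pixels. */
    static List<int[]> extract(int[] rgb, int w, int h, int threshold, int minArea) {
        // 1. threshold by color
        boolean[] white = new boolean[w * h];
        for (int i = 0; i < w * h; ++i) {
            int r = (rgb[i] >> 16) & 0xff, g = (rgb[i] >> 8) & 0xff, b = rgb[i] & 0xff;
            white[i] = r >= threshold && g >= threshold && b >= threshold;
        }
        // 2. + 3. label the blobs by flood fill and remember the biggest one
        int[] label = new int[w * h]; // 0 means "no blob"
        int next = 0, bestLabel = 0, bestArea = 0;
        ArrayDeque<Integer> queue = new ArrayDeque<>();
        for (int start = 0; start < w * h; ++start) {
            if (!white[start] || label[start] != 0) {
                continue;
            }
            int area = 0;
            label[start] = ++next;
            queue.add(start);
            while (!queue.isEmpty()) {
                int p = queue.poll();
                ++area;
                int x = p % w, y = p / w;
                int[][] neighbors = {{x - 1, y}, {x + 1, y}, {x, y - 1}, {x, y + 1}};
                for (int[] n : neighbors) {
                    if (n[0] < 0 || n[0] >= w || n[1] < 0 || n[1] >= h) {
                        continue;
                    }
                    int q = n[1] * w + n[0];
                    if (white[q] && label[q] == 0) {
                        label[q] = next;
                        queue.add(q);
                    }
                }
            }
            if (area > bestArea) {
                bestArea = area;
                bestLabel = next;
            }
        }
        // 4. exit if there is no blob or the biggest one is too small
        List<int[]> contour = new ArrayList<>();
        if (bestLabel == 0 || bestArea < minArea) {
            return contour;
        }
        // 5. contour = blob pixels with at least one neighbor outside of the blob
        for (int y = 0; y < h; ++y) {
            for (int x = 0; x < w; ++x) {
                if (!inBlob(label, w, h, x, y, bestLabel)) {
                    continue;
                }
                if (!inBlob(label, w, h, x - 1, y, bestLabel) || !inBlob(label, w, h, x + 1, y, bestLabel)
                        || !inBlob(label, w, h, x, y - 1, bestLabel) || !inBlob(label, w, h, x, y + 1, bestLabel)) {
                    contour.add(new int[] {x, y});
                }
            }
        }
        return contour;
    }

    private static boolean inBlob(int[] label, int w, int h, int x, int y, int id) {
        return x >= 0 && x < w && y >= 0 && y < h && label[y * w + x] == id;
    }
}
```

The flood fill visits every pixel only a constant number of times, so the whole pre-processing stays linear in the number of pixels.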

Let me add one more note. Depending on how you acquire the input image, you might need to apply additional operation(s) like Gaussian smoothing to get a reasonable contour. I have used an image from a video sequence, where the compression algorithm already did a similar job, so it wasn’t necessary in my case.

Once contour pixels are selected, then the next phase can start.

## Detection

Now you have a set of contour pixels. So what to do with them? Although you can see they mostly form the edges of the marker paper, there are still some errors. In addition, they are just pixels, which don’t tell you the positions of the corner points. And in some cases, the corner point locations might not be that well defined, like in the following image.

The detection algorithm first identifies 4 edge lines and then calculates the corner positions from their intersections. Note that there are several points along the way where the algorithm can fail to detect something. In such a case, it reports that nothing was detected.

### Edge Lines

A good method for identifying edge lines, while having errors and outliers in there, is RANSAC (Random Sample Consensus). The general RANSAC workflow is as follows.

1. Repeat N times (N is up to you)
    1. Randomly choose a minimal number of points you need to build the model
    2. Build the model from chosen points
    3. Compare the model with other sample points and calculate the number of good fits (inliers, or points which are close enough to the expected positions)
    4. If the number of inliers is high enough, then accept the model and terminate the cycle
2. You either have a model, or there is a high probability that the model doesn’t exist

Now let’s be more concrete about the edges of the marker paper. The main difference is that we want to find 4 lines lying as much as possible over the contour points. For this, it is necessary to choose the maximal number of iterations we are willing to take (N), the minimum number of points lying “close enough” to the line in order to accept the line (minAccept; a good choice is a percentage of the total number of sample points), and the distance from the line which is considered “close enough” (dstThres). The full algorithm is in the class Fitting, method linesRansac. Here is just a brief description.

1. Start with an empty result list
2. Repeat max N times, stop if the result list has the desired number of lines (4 in this case)
    1. Pick 2 different random points from the sample set
    2. Build the line
    3. Calculate the number of inliers (max distance from the line is dstThres)
    4. If the number of inliers is greater than or equal to the minAccept parameter, then
        1. Add the line to the result list
        2. Remove the inliers from the sample set (to prevent the same line from being detected again)
3. Return the result list
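The steps above can be sketched like this. It is a simplified version written for this article, not the code of Fitting.linesRansac. The class name `LinesRansacSketch`, the line representation $$ax+by+c=0$$ with unit normal $$(a,b)$$, and `minAccept` given as an absolute number of points (rather than a percentage) are my assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

/** Sketch: RANSAC detection of several lines in a set of 2D points. */
final class LinesRansacSketch {

    /** Line a*x + b*y + c = 0 through p and q, with (a, b) normalized to unit length. */
    static double[] lineThrough(double[] p, double[] q) {
        double a = q[1] - p[1], b = p[0] - q[0];
        double len = Math.hypot(a, b);
        a /= len;
        b /= len;
        return new double[] {a, b, -(a * p[0] + b * p[1])};
    }

    /** Distance of the point from the line (valid because the normal has unit length). */
    static double distance(double[] line, double[] p) {
        return Math.abs(line[0] * p[0] + line[1] * p[1] + line[2]);
    }

    static List<double[]> linesRansac(List<double[]> points, int numLines, int maxIter,
                                      int minAccept, double dstThres, Random rnd) {
        List<double[]> sample = new ArrayList<>(points);
        List<double[]> result = new ArrayList<>();
        for (int it = 0; it < maxIter && result.size() < numLines && sample.size() >= 2; ++it) {
            // pick 2 different random points
            int i = rnd.nextInt(sample.size());
            int j = rnd.nextInt(sample.size() - 1);
            if (j >= i) {
                ++j;
            }
            double[] p = sample.get(i), q = sample.get(j);
            if (p[0] == q[0] && p[1] == q[1]) {
                continue; // duplicated coordinates don't define a line
            }
            double[] line = lineThrough(p, q);
            // split the sample set into inliers and the rest
            List<double[]> rest = new ArrayList<>();
            for (double[] s : sample) {
                if (distance(line, s) > dstThres) {
                    rest.add(s);
                }
            }
            int inliers = sample.size() - rest.size();
            if (inliers >= minAccept) {
                result.add(line);
                sample = rest; // removing inliers prevents detecting the same line again
            }
        }
        return result;
    }
}
```

Passing the `Random` instance from outside makes the runs repeatable, which is handy for experiments. If the returned list has fewer than `numLines` elements, the detection failed.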

If you run this algorithm, then you will “ideally” (I will get back to the word “ideally” later in this chapter) end up with lines like in this image.

You see, RANSAC is tolerant of various forms of distraction. All you need is a sufficient number of sample points being “close enough”. Now once the edge lines are known, the final shape can be extracted.

## Paper Shape

Going from edges to the corner points is a matter of identifying which lines are perpendicular, calculating intersections and ordering them counterclockwise. The full implementation is in the WhiteMarkerTracker class, method extractPoints.

Identifying the perpendicular lines is possible by examining the angles, because we know that the rectangle has 2 pairs of parallel lines. If you select any edge line, then its parallel line will always be the one with the smallest angle to it. The 2 remaining lines are the perpendicular ones. The angle between two lines can be calculated from the line equations, and so can the intersection. Ordering the points just requires a little bit of vector algebra and messing around.
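Here is a sketch of the geometry involved. It is not the code of extractPoints; the class `LineGeometry` is a hypothetical helper, lines are assumed to be given as coefficients of $$ax+by+c=0$$, and the ordering assumes a coordinate system with the y axis pointing up (with the usual image coordinates, where y points down, the same order appears clockwise on the screen).

```java
import java.util.Arrays;
import java.util.Comparator;

/** Sketch: angle, intersection and point ordering for lines a*x + b*y + c = 0. */
final class LineGeometry {

    /** Angle between two lines in radians, in the range [0, PI/2]. */
    static double angle(double[] l1, double[] l2) {
        double dot = Math.abs(l1[0] * l2[0] + l1[1] * l2[1]);
        double norms = Math.hypot(l1[0], l1[1]) * Math.hypot(l2[0], l2[1]);
        return Math.acos(Math.min(1.0, dot / norms));
    }

    /** Index of the line with the smallest angle to lines[idx]. */
    static int mostParallel(double[][] lines, int idx) {
        int best = -1;
        for (int i = 0; i < lines.length; ++i) {
            if (i != idx && (best < 0 || angle(lines[idx], lines[i]) < angle(lines[idx], lines[best]))) {
                best = i;
            }
        }
        return best;
    }

    /** Intersection point {x, y} by Cramer's rule, or null if the lines are (nearly) parallel. */
    static double[] intersection(double[] l1, double[] l2) {
        double det = l1[0] * l2[1] - l2[0] * l1[1];
        if (Math.abs(det) < 1e-9) {
            return null;
        }
        double x = (l1[1] * l2[2] - l2[1] * l1[2]) / det;
        double y = (l2[0] * l1[2] - l1[0] * l2[2]) / det;
        return new double[] {x, y};
    }

    /** Returns the points sorted by their angle around the centroid (counterclockwise for y up). */
    static double[][] orderCounterClockwise(double[][] pts) {
        double cx = 0, cy = 0;
        for (double[] p : pts) {
            cx += p[0] / pts.length;
            cy += p[1] / pts.length;
        }
        final double fx = cx, fy = cy;
        double[][] res = pts.clone();
        Arrays.sort(res, Comparator.comparingDouble(p -> Math.atan2(p[1] - fy, p[0] - fx)));
        return res;
    }
}
```

Sorting by the angle around the centroid works for any convex quadrilateral, which is what the projected paper always is.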

If everything is done, then you should be able to see the image like this one.

So, are we done? Not so fast…

## RANSAC Problems

Remember when I used the word “ideally” before? This part is all about that.

Let’s make a little experiment. Let’s take the example image from this chapter and make a 200-frame video out of it. In every frame, let’s perform the plane detection as described so far, follow up by estimating the AR parameters, and draw a 3D cube on top of the plane (don’t worry if you don’t know how to do this yet). This is what the result would look like.

The cube is shaking, although it should stay still. In addition, there are several frames which are completely wrong. This is caused by the nature of the RANSAC method. RANSAC is a probabilistic method which randomly selects points to create a model. This has two consequences.

1. Every estimation is slightly different, although most of them are reasonably good. This is the reason for shaking, especially because the errors add up along the way.
2. There is a small chance that some model is wrong yet still fits enough points to be accepted. This is the reason for several frames being totally wrong.

To be more illustrative, let’s see what the current line detection looks like.

In this video, you can clearly see that 2 RANSAC detections of the same line might both be reasonably good, yet slightly different. This is the root cause of the shaking cube in the previous video. Also, from time to time you can see a misdetection where a single edge is detected twice. This is the root cause of the cube being rendered in the wrong position.

How to improve that?

## Stabilization With Tracking

Although the simple RANSAC method isn’t good enough to produce a good result, it’s still a reasonable start. Therefore, let’s use the initial detection and enhance it.

There are 2 enhancements (both are implemented inside WhiteMarkerTracker class, method trackMarker).

1. Track previously detected shape
2. Smooth the estimation by averaging

First, let’s discuss the tracking. Tracking is done frame by frame. In each frame, the algorithm knows the parameters of the old object (4 corner points in this case) and the new observations (contour points in this case). The result is either an updated parameter set or a report that the object has been lost.

This implementation works by tracking the edges one by one and then putting them together, assuming that an edge doesn’t move significantly between 2 frames. The corner points are used to define an area where the edge is expected to be. This reduces the number of contour points and therefore allows requiring a higher percentage of inliers for the RANSAC estimation, like in the image below.
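One simple way to define such an area is to keep only the contour points within a certain distance from the edge segment of the previous frame. The following sketch shows that idea. It is an assumption on my side and not necessarily how trackMarker selects the points; the class `EdgeRegion` and the parameter `maxDst` are made up for this example.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch: select the contour points lying close to the previously detected edge. */
final class EdgeRegion {

    /** Distance from the point p to the segment between the corner points a and b. */
    static double distanceToSegment(double[] p, double[] a, double[] b) {
        double dx = b[0] - a[0], dy = b[1] - a[1];
        double len2 = dx * dx + dy * dy;
        // parameter of the closest point on the line, clamped to the segment
        double t = len2 == 0 ? 0 : ((p[0] - a[0]) * dx + (p[1] - a[1]) * dy) / len2;
        t = Math.max(0, Math.min(1, t));
        return Math.hypot(p[0] - (a[0] + t * dx), p[1] - (a[1] + t * dy));
    }

    /** Returns the contour points which are at most maxDst away from the old edge a-b. */
    static List<double[]> pointsNearEdge(List<double[]> contour, double[] a, double[] b, double maxDst) {
        List<double[]> res = new ArrayList<>();
        for (double[] p : contour) {
            if (distanceToSegment(p, a, b) <= maxDst) {
                res.add(p);
            }
        }
        return res;
    }
}
```

The selected points can then go to the RANSAC line estimation for that single edge. If too few points remain, it is reasonable to report the object as lost.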

Now regarding the smoothing. Smoothing is normally achieved by averaging. Therefore, for every tracked line, let RANSAC estimate M good fits rather than just one. Then take the set of points where each point is close enough to at least one of these good fits. Finally, compute the final line as a least-squares fit of that set of points.

When you put everything together, the result looks like this video.
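The last two steps can be sketched as follows. The text doesn’t say which kind of least-squares fit is meant, so this example assumes the orthogonal one (minimizing the perpendicular distances), which also handles vertical lines. The class `LineSmoothing` and its methods are hypothetical, and lines are again $$ax+by+c=0$$ with unit normal.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch: collect the points close to any of the good fits and fit one final line. */
final class LineSmoothing {

    /** Points with distance of at most dstThres to at least one of the lines. */
    static List<double[]> closeToAny(List<double[]> pts, List<double[]> lines, double dstThres) {
        List<double[]> res = new ArrayList<>();
        for (double[] p : pts) {
            for (double[] l : lines) {
                if (Math.abs(l[0] * p[0] + l[1] * p[1] + l[2]) <= dstThres) {
                    res.add(p);
                    break;
                }
            }
        }
        return res;
    }

    /** Orthogonal least-squares fit; returns {a, b, c} with a*a + b*b = 1. */
    static double[] fit(List<double[]> pts) {
        if (pts.size() < 2) {
            throw new IllegalArgumentException("at least 2 points are needed");
        }
        double cx = 0, cy = 0;
        for (double[] p : pts) {
            cx += p[0];
            cy += p[1];
        }
        cx /= pts.size();
        cy /= pts.size();
        double sxx = 0, sxy = 0, syy = 0;
        for (double[] p : pts) {
            double dx = p[0] - cx, dy = p[1] - cy;
            sxx += dx * dx;
            sxy += dx * dy;
            syy += dy * dy;
        }
        // direction of the largest spread (principal axis of the scatter matrix)
        double theta = 0.5 * Math.atan2(2 * sxy, sxx - syy);
        // the normal is perpendicular to that direction; the line passes through the centroid
        double a = -Math.sin(theta), b = Math.cos(theta);
        return new double[] {a, b, -(a * cx + b * cy)};
    }
}
```

Because the final line is computed from many points, the random choice of the 2 points inside RANSAC has a much smaller effect on it, and that is what calms down the shaking.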

## Summary

This chapter explained to you how to track A4 marker paper in the video. Next chapter will use the outcome to build up a homography.

## AR By Hand – Part 1 – Introduction

Chances are, you have already at least heard the term augmented reality. The very generic definition says that augmented reality is the real world enhanced by computer-generated information. This mini-series will be about enhancing a captured image or video by adding a 3D model into it. Like in this video.

If you want to know how this works, then keep reading. I will walk you through all the steps. The following image is a rough summary of the steps.

Now in simple words. The final goal is to display a cube on top of the white A4 paper. The process starts by thresholding the pixels to find the white ones. This produces several blobs. The A4 paper is assumed to be the biggest of them, and the others are suppressed. The next step is to identify the contour, edge lines and corners of the paper. Then the planar transformation is computed. The planar transformation is used to compute a projection matrix which allows drawing 3D objects into the original image. Finally, the projection matrix is transformed into an OpenGL compatible form. Then you can do anything with that.

Sounds trivial, right? Maybe. Still, it takes some effort to go through the details, therefore I have prepared the following chapters.

In addition, there is an example project accompanying this series. You can download it right here.

The code is written in Java (+ a few OpenGL shaders) and built with Maven. As long as you understand these, you should be able to build the project and run the test applications. Test applications are executable classes within the test sources. The main classes are CameraPoseVideoTestApp and CameraPoseJoglTestApp.

Regarding the expected level of knowledge. It will be very helpful if you have some knowledge about linear algebra, homogeneous coordinates, RGB image representation, the pinhole camera model and perspective projection. Although I will try to keep the required level to the minimum, it is too much to explain every little thing in detail.

Now let me make a note about the quality of the result. There are 2 main factors which affect quality: implementation and environment. I will cover one type of implementation and let you judge how good it is. Please leave me comments, especially if you have a concrete idea for improvement. The second factor which matters is the environment. This includes everything from camera quality, noise, distractions in the scene, lighting and occlusion to the time you can spend on processing each frame. Even today’s state of the art algorithms will fail in a crappy environment. Please keep this in mind when you do your own experiments.

## Summary

This chapter gave you an overall idea of the project. Next chapter will tell you how to track the plane.