Personal Mini Toolchains

It’s all about repetition. If you find yourself doing the same thing again and again, then chances are you want to simplify it and eventually automate it (unless it is sex). Even a small task makes sense to automate if the amount of repetition is high enough. Let me give you an example.

Let’s imagine you write a script to synchronize files between projects. It took you 8 hours to create that script, and each run saves 10 minutes of your time. This means you need to run the script 48 times to “pay off” your initial investment. Anything beyond that is your “profit”.
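
In general, the break-even point is simply the cost of building the automation divided by the time saved per run.

\[
\text{break-even runs}=\frac{\text{automation cost}}{\text{time saved per run}}=\frac{8 \times 60\ \text{min}}{10\ \text{min}}=48
\]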

Saving 10 minutes here and there might not seem like a big deal. But if you learn how to identify and effectively automate more small tasks like this, the savings scale up very quickly and free up a lot of time for anything you like. For example, making comic t-shirts (;

There are many ways of automating things. This time, I will give you a real example of creating a Personal Mini Toolchain in Java. In general, a toolchain is a program that connects several tools together to perform a complex task. I call it a Personal Mini Toolchain because it exists to save your personal time, and you can start in a very minimalistic way (and grow it from there). The great news is that once you understand the principles, you can apply them in any area, in any industry, and with any technology you like.

Step 1 - Figure Out

The most important step is to figure out what is worth automating and realistic for a toolchain to handle. Only then does it make sense to invest in it. Worth is about return on investment. Realistic means that the set of actions is simple enough and flows from start to end without much decision logic. Imagine you put the actions into a graph like the one in the image below. You have a better chance of automating the flow on the left side than the one on the right side.

Let’s look at a concrete example. As a hobby, I am creating my own Tyracorn game engine (the YouTube playlist is here). I started the whole project as a standard Java desktop application using the Maven build system. At a certain point, I decided to run it also on Android. So I opened up the Android developer portal and started to study. I learned how to create an Android project, how to set up an OpenGL context, and eventually made it work.

Now I had 2 projects, so whenever I made changes in the main project and wanted to test them on my phone, I had to do the following.

  • Open Android Studio
  • Copy and paste the source files
  • Build the project
  • Install the project on my phone

After I did this about 50 times, I got bored and started to figure out how a computer could do this for me. It is important that I did it so many times. It validated that the automation made sense, and the manual exercise gave me enough insight to do the job.

Now, what is the lesson of this? Start your automation by doing everything manually, again and again. This confirms that the effort is worth investing, and gives you the chance to simplify and deeply understand what you are doing. And remember, you don’t need to cover everything right from the beginning. You can start with even a single little portion; it will grow naturally over time.

Step 2 - Create

Now, let’s focus on my concrete example. Thanks to the manual experience, I figured out that my mini toolchain would be helpful if it could do the following.

  • Generate the Android project. Then I can open the Android Studio and look at it.
  • Build and install the project on my device. All this without the need to touch the Android Studio.
  • Have the ability to add more platforms and tasks later on.

I started by creating a new Maven project configured to produce an executable fat jar. The code takes the first two arguments from the command line, uses reflection to look up a class and method by those names, and calls it, passing the rest of the arguments as a list. It looks like this.

/**
 * Main entry to the application.
 *
 * @param args arguments
 */
public static void main(String[] args) {
    List<String> as = Arrays.asList(args);
    if (as.size() < 2) {
        printHelp();
        System.exit(1);
    }

    runCommand(as.get(0), as.get(1), as.subList(2, as.size()));

    System.out.println("");
    System.out.println("-----------------");
    System.out.println("Job Done!");
    System.out.println("-----------------");
}

/**
 * Runs the command.
 *
 * @param command command
 * @param method method
 * @param args arguments
 */
public static void runCommand(String command, String method, List<String> args) {
    try {
        String className = "com.tyracorn.toolchain." + StringUtils.capitalize(command) + "Command";
        Class<?> commandClass = Class.forName(className);
        Method m = commandClass.getMethod(method, List.class);
        m.invoke(null, args);
    } catch (ClassNotFoundException | NoSuchMethodException | SecurityException | IllegalAccessException |
            IllegalArgumentException | InvocationTargetException e) {
        throw new RuntimeException(e);
    }
}
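
For example, a call like java -jar toolchain.jar android install path\to\project\dir (the jar name here is a placeholder; real usage is covered in Step 3) makes runCommand load the class com.tyracorn.toolchain.AndroidCommand and invoke its static install method, with the remaining arguments passed as a list. Adding a new platform or task is then just a matter of adding a new *Command class or method.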

Then I started to study the Android documentation to see if I could build and install the project on my phone purely from the command line. It turned out that Google did a great job by preparing Gradle tasks for all of the above. And, as a bonus, I can also automatically sign the application with my secret keys before uploading it to the Play Store.

I used a plain Android project as a base. Then I replaced certain values in build.gradle with placeholders (e.g. applicationId is replaced by ‘$tyracorn.application.id’), which are filled in later by the toolchain. Then I packed the whole directory into a zip file and placed it into the toolchain project’s resource directory. In addition, in the project to be converted, I created a specific directory and placed a configuration file and additional resources (e.g. icons in various formats) in there. The workflow is then the following (a sketch of the placeholder merge follows the list).

  • Clean up the target directory
  • Unzip the template
  • Copy over the source code and assets
  • Copy over the additional Android-specific resources
  • Replace the placeholders in the template with the actual values
  • Build the project
  • Install it on the phone
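
To give an idea of the placeholder step, here is a minimal sketch of what such a merge utility can look like (the real Templates.merge used below may differ; this version assumes placeholders of the form $key and uses java.nio.file.Files).

/**
 * Replaces each $key placeholder in the file with its value. Sketch only.
 *
 * @param file template file to merge in place
 * @param vars placeholder values
 */
public static void merge(File file, Map<String, String> vars) throws IOException {
    String content = new String(Files.readAllBytes(file.toPath()), StandardCharsets.UTF_8);
    for (Map.Entry<String, String> var : vars.entrySet()) {
        content = content.replace("$" + var.getKey(), var.getValue());
    }
    Files.write(file.toPath(), content.getBytes(StandardCharsets.UTF_8));
}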

If you are interested, then here is the source code.

/**
 * Generates the android project.
 *
 * @param args arguments
 */
public static void generate(List<String> args) {
    try {
        // prepare directories
        System.out.println("preparing directories");

        File projDir = new File(args.get(args.size() - 1)).getCanonicalFile();
        Guard.beTrue(projDir.isDirectory(), "%s is not a directory", projDir.getAbsolutePath());
        File targetDir = new File(projDir, "target");
        if (!targetDir.isDirectory()) {
            targetDir.mkdirs();
            Guard.beTrue(targetDir.isDirectory(), "unable to create a target directory");
        }
        File androidProjectDir = new File(targetDir, "android-project");
        if (androidProjectDir.isDirectory()) {
            FileUtils.deleteDirectory(androidProjectDir);
        }
        Guard.beFalse(androidProjectDir.exists(), "android-project directory cannot exist at this point");

        // prepare properties
        Map<String, String> properties = new HashMap<>();
        for (Object key : System.getProperties().keySet()) {
            properties.put((String) key, System.getProperty((String) key));
        }
        for (int i = 0; i < args.size() - 1; ++i) {
            String arg = args.get(i);
            if (arg.equals("-c")) {
                Guard.beTrue(i < args.size() - 2, "config directory must be specified after -c argument");
                File cdir = new File(args.get(i + 1));
                Guard.beTrue(cdir.isDirectory(), "config directory must be an existing directory: %s", cdir);
                System.out.println("applying config: " + cdir.getAbsolutePath());
                Map<String, String> signingProps = Dut.copyMap(Props.load(new File(cdir, "signing.properties")));
                String storeFilePath = new File(cdir, signingProps.get("tyracorn.signing.storeFile")).getAbsolutePath().replaceAll("\\\\", "/");
                signingProps.put("tyracorn.signing.storeFile", storeFilePath);
                properties.putAll(signingProps);
                i = i + 1;
            }
        }

        // loading config files
        Config config = Config.load(new File(projDir, "src/main/platforms/android/config.json"));
        Pom pom = Pom.load(new File(projDir, "pom.xml"));

        // unpack template
        System.out.println("unpacking template");
        unzipResource("android-project.zip", targetDir);

        // copy files from main project
        System.out.println("copying source files from the main project");
        File srcSrcDir = new File(projDir, "src/main/java");
        File srcTargetDir = new File(targetDir, "android-project/app/src/main/java");
        FileUtils.copyDirectory(srcSrcDir, srcTargetDir);

        List<String> excludeClasses = config.getStringList("excludedClasses");
        for (String ec : excludeClasses) {
            File cfile = new File(srcTargetDir, ec.replaceAll("\\.", "/") + ".java");
            if (cfile.isFile()) {
                Guard.beTrue(cfile.delete(), "unable to delete file %s", cfile);
            }
        }

        System.out.println("copying asset files from the main project");
        File assetsSrcDir = new File(projDir, "src/main/assets");
        File assetsTargetDir = new File(targetDir, "android-project/app/src/main/assets/external");
        FileUtils.copyDirectory(assetsSrcDir, assetsTargetDir);

        System.out.println("copying app dir from the android configuration");
        File srcAppDir = new File(projDir, "src/main/platforms/android/app");
        if (srcAppDir.isDirectory()) {
            File targetAppDir = new File(targetDir, "android-project/app");
            FileUtils.copyDirectory(srcAppDir, targetAppDir);
        }

        // adjust template
        System.out.println("merging templates");
        String gid = pom.getGroupId();
        String aid = pom.getArtifactId();
        if (aid.contains("-")) {
            String[] parts = aid.split("\\-");
            aid = parts[parts.length - 1];
        }
        String appId = gid + "." + aid;
        String appVersion = pom.getVersion().replace("-SNAPSHOT", "");
        Guard.beTrue(StringUtils.isNumeric(appVersion), "only numeric version is supported, please look at the pom file: %s", appVersion);

        String signingStoreFile = properties.getOrDefault("tyracorn.signing.storeFile", "tyracorn-dev.jks");
        String signingStorePassword = properties.getOrDefault("tyracorn.signing.storePassword", "Password1");
        String signingKeyAlias = properties.getOrDefault("tyracorn.signing.keyAlias", "tyracorn-dev");
        String signingKeyPassword = properties.getOrDefault("tyracorn.signing.keyPassword", "Password1");

        Map<String, String> vars = Dut.map(
                "tyracorn.application.id", appId,
                "tyracorn.application.version", appVersion,
                "tyracorn.signing.storeFile", signingStoreFile,
                "tyracorn.signing.storePassword", signingStorePassword,
                "tyracorn.signing.keyAlias", signingKeyAlias,
                "tyracorn.signing.keyPassword", signingKeyPassword,
                "loadingScreen", config.getString("loadingScreen"),
                "startScreen", config.getString("startScreen"));

        System.out.println("merging build properties");
        File appGradleBuild = new File(targetDir, "android-project/app/build.gradle");
        Templates.merge(appGradleBuild, vars);

        System.out.println("merging launch screens");
        File mainActivity = new File(targetDir, "android-project/app/src/main/java/com/tyracorn/android/MainActivity.java");
        Templates.merge(mainActivity, vars);

    } catch (IOException e) {
        throw new RuntimeException(e);
    }
}

/**
 * Builds the android project.
 *
 * @param args arguments
 */
public static void build(List<String> args) {
    generate(args);
    try {
        File projDir = new File(args.get(args.size() - 1)).getCanonicalFile();
        File targetDir = new File(projDir, "target");
        File androidProjectDir = new File(targetDir, "android-project");

        String gradlePath = androidProjectDir.getCanonicalPath() + File.separator + "gradlew.bat";

        System.out.println("building the project");
        String buildRes = Cmds.executeSimple(androidProjectDir, gradlePath, "build");
        System.out.println(buildRes);
    } catch (IOException e) {
        throw new RuntimeException(e);
    }
}

/**
 * Installs the project to the device.
 *
 * @param args arguments
 */
public static void install(List<String> args) {
    build(args);
    try {
        File projDir = new File(args.get(args.size() - 1)).getCanonicalFile();
        File targetDir = new File(projDir, "target");
        File androidProjectDir = new File(targetDir, "android-project");

        String gradlePath = androidProjectDir.getCanonicalPath() + File.separator + "gradlew.bat";

        System.out.println("installing the project to the device");
        String installRes = Cmds.executeSimple(androidProjectDir, gradlePath, "installRelease");
        System.out.println(installRes);
    } catch (IOException e) {
        throw new RuntimeException(e);
    }
}

There is also a part which allows me to specify a directory with production signing keys. This allows me to use the same tool when making a package for the Play Store while keeping the secrets out of the application’s version control system. I added this later on.

Now, I know this is far from perfect, but it does the job. It saves me time. So I decided it is good enough for now and moved on to the next thing. Knowing when to stop is an important part of the programming job. Not everything needs to be super generic and perfect. The main point is that it serves you.

Step 3 - Use and Maintain

Having the compiled jar file, usage is very simple. Open the command line and call java -jar jar-file-path.jar android install path\to\project\dir. The toolchain then generates the Android project, builds it, and installs it on the connected phone. Alternatively, it’s possible to replace install with build or generate to stop earlier in the process. In addition, adding -c path\to\config\dir after the install applies external configuration (e.g. signing the apk with the production keys).
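
To make it concrete, the full set of invocations looks like this (the jar file name is a placeholder).

java -jar toolchain.jar android generate path\to\project\dir
java -jar toolchain.jar android build path\to\project\dir
java -jar toolchain.jar android install path\to\project\dir
java -jar toolchain.jar android install -c path\to\config\dir path\to\project\dir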

Maintenance is very simple. Unless Google decides to change the build process (which doesn’t happen that often), it’s only about using the toolchain and adding new capabilities when they become useful. And that’s very easy to do because the code is not trying to be generic, configurable, and scalable to millions of users. This is the type of code that is focused only on you.

If you have read all the way up to here, then you can download my Tyracorn Showpark application, which was fully built by this toolchain. It is a showcase application for my game engine. I know the graphics aren’t nice at the time of writing this article, but the system works well. Eventually, I will hire a freelancer, using the system described in the Mastering Freelancers guide, to make it pretty.

Get it on Google Play

Summary

Now you have seen how to quickly build a toolchain. The most important thing is to figure out the piece to automate and simplify it as much as possible. Then start by creating a simple toolchain for that one purpose. Don’t try to cover everything; that will come over time. And finally, use it a lot to create more time for whatever you love.

OpenGL – PC and Android

I started my OpenGL journey by implementing simple applications. Here you can enjoy my first demo (;

That was pretty good for a start. If you would like to build something similar, then there are plenty of good educational materials. Some are free, others cost a few protein shakes. I like these two.

You might prefer a different one, depending on your language and particular library.

After having the basics done, I decided to try to run this on my Android phone. It’s Java, “write once, run everywhere”, right? My #1 wish was that this stays true for the application code. Let’s have a look at how well it went.

Eventually, it worked out, mostly. And I was able to satisfy my #1 wish of having a single source code across all platforms. But it took me more time than I originally thought.

Note that it was much harder to find tutorials and good reading. The most useful were these two sources.

Interesting issues appeared during the journey. I named them Solvables and Careables. Solvables are the ones which can be solved by the system architecture. On the other hand, careables can’t be entirely solved. You need to care about them well enough so your user won’t notice them.

Solvables

JOGL vs. Android OpenGL ES. These are two Java OpenGL wrappers. They are very similar, but not exactly the same. So I ended up abstracting them and creating separate Tyracorn application containers. An application can then be launched in any container. This allows me to support Nintendo (make it fun for my lovely kids (; ) and other platforms later on.

The container pattern also turned out to be a great solution for unified handling of the application lifecycle, various input sources, and assets. And the best part is that I can easily add drivers for various sensors, cameras, or robotic pizza restaurants.

There is an interesting difference inside the shader language. OpenGL ES doesn’t allow you to define a sampler array and then refer to its elements by anything which isn’t a constant. This effectively means that it’s not possible to write a loop to apply multiple textures. This led to a number of if statements, where each branch has the same code with a different texture.

Working across different screens. The main thing is that the aspect ratio changes dramatically. If you tune everything for the landscape orientation, then the portrait one won’t work. I implemented a way for things like the camera or UI components to decide their properties based on the display size. And everything can change during runtime. This is not difficult to do, but it eats a piece of your time cake.
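
To sketch the idea (all names here are hypothetical, the engine’s actual API differs), a component can simply be notified about display changes and decide its own properties.

public interface DisplaySizeAware {
    /** Called whenever the display size or orientation changes at runtime. */
    void onDisplaySize(int widthPx, int heightPx);
}

public class HudPanel implements DisplaySizeAware {
    private boolean landscape;

    @Override
    public void onDisplaySize(int widthPx, int heightPx) {
        // decide layout properties from the current aspect ratio
        landscape = widthPx >= heightPx;
    }
}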

Finally, let me mention the nuances of Java. As of today, Android doesn’t support all Java 8 features out of the box. There are two ways to deal with that. The first is to introduce additional tools that pre-process your Java 8+ code to work on Android. The second is to simply accept the Android limitations and write code accordingly. I picked the second way because I don’t like messing around with tools. At the same time, I really love easy-to-read, standardized code. Therefore, I spent a lot of time finding the code form I like best.

Careables

There are two major careables – precision and performance.

Precision is related to the z-buffer and floating-point numbers. When I first ran my application, I discovered that my mobile phone uses lower precision than my PC. Interesting. The result is more noticeable z-fighting and shadow artifacts. You might see other artifacts as well whenever you render objects close to each other. Typical advice on how to deal with these involves offsets and bias values. More advanced advice goes into ways of faking a higher precision. For example, split the scene into parts and render each of them with a separate z-buffer.

When I first tried to run the shadow light example, I was shocked to see something like 3 fps while rendering a simple cube with a shadow. It took me a few days to discover that it was caused by two problems. The first problem was that I couldn’t specify an empty color attachment during framebuffer creation in OpenGL ES. This didn’t make the program fail, but it made it run terribly slowly and keep showing an error in the log. The second problem was related to the mobile phone architecture. Apparently battery life is important, so they decided to use a different architecture to save some power. This led to extra shuffling between memories, which took time. The solution was to explicitly clear the shadow buffer right after binding, so the copying didn’t need to take place.

Even with all this in place, the performance on mobile is still much lower than on PC. For example, 3 shadow lights already cause noticeable slowness on mobile. And when I implemented PCF to make smooth shadows, it was a disaster. So I decided to keep shadows ugly on mobile phones.

As a lesson, I plan to do these two things in the future.

  • Create separate versions of assets for different devices
  • Adjust scene funkiness based on the device

Conclusion

Hope you enjoyed this one. If you would like to get the source code for all this and more, then please join me on Patreon.

I am already thinking about the next step. If you have anything you would like to see, then just let me know.

See you next time!

Mastering Freelancers

Imagine you are creating something. It might be your personal project or a product in a small startup company. Then you run into a situation where you just can’t do a certain task, or you want to spend your time differently. So you decide to give some of your hard-earned cash to someone who can do that particular thing for you. And that’s a freelancer.

Freelancers can do any online task for you. For example, building websites, design work, proofreading, coding, data organization, and much more. And if you are not an expert in that particular area, then many times they can do a faster and better job than you.

There is one catch. If you manage them badly, then you might end up disappointed, without any useful work done, and short of money. This article explains in detail how to prepare the work, choose the right person, and manage the project so both of you end up as winners.

Preparation

There are 2 main things to prepare – the budget and the scope of work (SOW).

Regarding the budget: choose a number you are willing to spend. Remove the sales tax (typically 10%). Then remove 20% from the rest. This is to cover fees, possible upsells, and other costs you are not aware of yet. Convert the number to USD (which is what most sites use), and the result is your budget.

For example, suppose you are willing to spend $500. Remove 10% to get $450. Remove another 20% to get $360. Since it’s already in USD, $360 is your final budget.

Now, let me give you a rough idea of what you can expect at certain price levels. This is coming out of my experience.

Less than $100 – simple, fast, and imperfect work. People at this level are usually trying to be as cheap as possible. They don’t have time to do the work well, because they need to rush to another gig. You can ask for simple tasks like photo correction, a little translation, quick sketches, or a few WordPress pages with a lot of placeholders. I usually don’t ask for any work at this level.

Lower hundreds of dollars, roughly up to $500 – decent work under a standardized process. Here I found a lot of people specializing in a particular area. For example, creating fashion products, 3D models, building websites, proofreading, translations, designing logos, and so on. They can do a great job for you because they have a process in place and have probably already done the same thing many times. You just have to make sure that your requirements are aligned with their framework. I like this level because it’s good, easy to reach, and I am not too worried about making mistakes.

Higher hundreds of dollars, roughly up to $1000 – the previous level + creativity or tedious work on top of that. For example, you can get a new blog + migration of the posts from the old one. Or get unique graphics created just for you, which can be pretty complex. Or a sophisticated piece of code.

A thousand and more – the sky is the limit. Here you can expect very high quality, custom-made stuff, or a lot of work to be done.

Unfortunately, more players are appearing in this game: consulting companies and speculators. Many times they try to offer you significantly overpriced solutions. And sometimes they are able to give you a 60% discount if you point that out. I have never taken that (; Maybe they can do a great job as well, but I prefer to pay a fair price to a guy who is a real freelancer and honest from the beginning rather than sponsor such people.

Finally, if you are working with thousands of dollars (and have appropriate work for that), then it might make sense to start thinking differently. You might split the work into smaller pieces and hire more people to do the job. And if your budget goes well beyond thousands, then freelancers might not be the right choice at all.

Scope of Work

Now you have the money and a rough idea of what is realistic to ask for. The next piece is to prepare the scope of work (SOW). I will describe a SOW for a fixed-price project, which is commonly used in the freelancer world. You specify what you want, then negotiate a fixed price tag to get it. Here you can download an example of such a SOW.

This is a real SOW I used to launch my other site – Tyracorn. You can use it as a template. Honestly, some training is required to create these. I personally did a terrible job a number of times. The biggest disaster was when I just had a single call and said something like ‘basically, eee, I want something like…’. I take that as a necessary lesson I had to pay for. I want you to be better than me right from the start. Please write down at least some SOW. Anything in writing is better than nothing. Here are a few guidelines to make it easier.

Study the topic. You need to have overall knowledge of the topic. For example, if you want to build a webshop, then you need to decide what platform to use, what payment gateway to integrate with, know how the checkout process works, have an idea of how you would like to manage your products, and so on. These are decisions that only you can make. So start by researching, open test accounts in various services, and play with them. Don’t worry that you can’t make it look pretty or fully working; that’s the part a freelancer can help with. Learn what options, constraints, and conditions you have.

Write everything down in plain language. If it’s not in writing, it doesn’t exist. Your next freelancer might be from Indonesia, Egypt, or Mexico. These people don’t have the chance to speak English on a daily basis. Therefore, help them by using simple and clear language.

Prepare a list of requirements. Ideally a long list of small and simple requirements. The idea is that you can look at the work submitted by the freelancer, run it against this list, and easily say which points are satisfied and which are not. Then during reviews, you can either adjust problematic points or deal with them in some other way.

Prepare supportive content. Anything like images, sketches, real data, content, and so on is good. Put it into a shared place. This gives reality to the result. For example, freelancers normally won’t create the actual content for your website. You either provide it, or you end up with a website full of ‘Lorem ipsum dolor sit…’. Happy cleaning then.

Mention paid products and services. I found this interesting. Many freelancers are shy about recommending a paid product or service. Make the decision you want and write the note into the SOW.

Write down what you don’t want. For example, I don’t like it when someone puts hacks into standard frameworks to satisfy 100% of the requirements. Therefore, my SOW usually has a point saying I am open to negotiating requirements down if they can’t be reached in a clean way.

When you should avoid freelancers

Freelancers are not always the right choice. Here are examples of when to avoid them.

  • If you are a cheap ass and want to have everything for free
  • If you can’t write what you want on a piece of paper
  • If it takes you longer to write down what you want than to actually do the job yourself
  • If you are under a contract which prevents you from doing so (e.g. an NDA, or employment)
  • If you have secrets required for the job (e.g. you can’t show medical data to third parties)
  • If your project is too big

Selection

With the budget and SOW ready, it is easy to start looking for people. You can find them on specialized portals; there are plenty of them, and each is a little bit specific. Let me share the ones I have personally used.

  • 99designs.com – this site is for graphical work. The good point is that you post your project, and designers post their sketches. You can then talk to them and choose whoever you like best.
  • guru.com – here you can post any type of project. Then freelancers bid to do the work. So far, I have had a good experience with this site. You just need to be careful. Since there are no limits in terms of work standards and prices, this is exactly the site where price speculators and consulting companies are very active. So take some time to review and compare the offers.
  • Elementor experts – focused on work with WordPress and Elementor. If you are dealing with these technologies, then this is a perfect place. I was able to look through various portfolios, choose the guy I liked, contact him directly, and get the job done in better quality than I would ever be able to achieve by myself.
  • Scribendi.com – focused on proofreading and editing jobs. You don’t choose a particular person here. Just submit the text and describe what you want to do with it. They give you the price and find an editor for you. I had a great experience with them while editing my Robust Java Standards book.

Register on the site which matches your project and post it there. You might need to do some copy-pasting from your SOW, attach only thumbnails instead of high-res graphics, and so on. Also, don’t share your budget unless the site requires you to. It’s better when people come up with their numbers first. Once you post the project, get away from the computer and go for a workout, or a date, or whatever.

Once you are done with your workout and date, the next morning you can look at the offers. There should be plenty of them. If not, then you have posted something terribly wrong, so go back to the preparation and try again. Now it’s time to do the first screening. This is about answering three basic questions. Is this a scam? Did he really read my post? Can he do a great job? And these are the indicators to look at.

  • Personal portfolio. Even if the person is just starting, having something to show is a must. It has to be related to whatever job you are posting. If you can’t find it, or you find something broken, then move on.
  • Proof that he really read the SOW. Some people just put a generic bid on anything which appears on the portal. Luckily, it’s easy to recognize. If someone truly reads the SOW, then he usually reacts to it somehow. Many times, they write into the bid how they would approach the task. Sometimes they even make a little proof of concept without being asked. Or they constructively complain about a particular point in your SOW. Anything like that is a good sign.
  • Easy communication. The freelancer must be responsive to messages (please be aware of time differences). In addition, his language (usually English) must be good enough to understand. Willingness to make a video call also counts, but is not always necessary. A good test is to send a few messages and see.

The process above helps you select a handful of people with potential. Pretty much any of them can do the job. Now it’s about choosing the one and making the deal. Simply order them starting from who you think is the best and try to make a deal one by one, until it actually happens.

One important note here. Make sure everything is in writing, and if the deal is made through some freelancer site, then use whatever chat is there. The most important points to have confirmed in writing are the SOW and the price for 100% delivery. Even if you use a video call to agree on everything, make a written follow-up and ask for a written confirmation.

This is the process to make a deal.

  • Agree on the SOW. Provide a copy, attach it to the system, and ask the candidate to read it first. Be open to questions, or a video call to explain the details. Answer all questions. Sometimes this might lead to a small change, which is ok as long as it’s recorded. If there is too much friction, then it’s better to walk away to the next candidate. If you walk away 3-4 times, then something is likely wrong with your SOW.
  • Agree on the total price. This is what you are going to pay once 100% is done. Every freelancer should be able to give you the number based on the agreed SOW. At this point, the SOW is fixed and you are not going to change it. You can negotiate as you like, or always walk away to the next candidate.
  • Agree on the payment and delivery schedule. This is usually pretty straightforward. Many times it ends up as a deposit paid upfront and the rest on delivery. The schedule is usually measured in weeks, with the expectation that once a week you can see updates (some people even provide daily updates, but I don’t like to micro-manage).

Once you have completed the agreement with the required follow-ups, the work starts.

Management

I found this part to be the easiest. All the hard work has already been done, and all you need to do is sit back and review whatever your freelancer gives you.

How to do a review? Take the delivered work and compare it with the requirements in the SOW. Write down a list of the things that don’t match or are missing. If you are working with things like design, then it’s completely ok to also include points which you just don’t like. People in these areas are used to pivoting a little bit, so no one takes it badly. If you can include, for example, screenshots or guidance on how you would imagine it done correctly, then even better. Send that back to the freelancer and wait.

Next comes the second review. Here you revisit all the points from the list to see whether they are resolved. If a point is resolved, then ok. If not, it stays open for the next round. Then make sure that there are no new points. The goal is to resolve all the points on the list (sometimes it’s ok to relax on some of them). From experience, it usually takes me 2-3 reviews to get it right.

Once your list is good enough, then congratulations!!! Your project is done.

Failures

Unfortunately, not everything is easy in life. Some projects fail. It happened to me several times, and there is a chance that some of my future projects will fail as well. In nearly all cases, I can track the failure back to a point where I skipped or simplified something in the preparation. Then it hit me back later. Let me share some of the stories with you.

What to do when a project is done? So I made the SOW and hired the guy. He did an amazing job. I thought it was cool. But… I didn’t have any idea what to do next. If you are hiring a freelancer, make sure to see beyond his work.

Poor quality job. One time I hired a guy to do a particular coding job. I have to admit that I skipped verifying his ability to deliver, and didn’t set up a proper review schedule. That was my mistake. He took forever, always busy with other projects. Then he delivered something which didn’t work, and even the quality of the code was very bad. So I pointed this out and wanted it fixed. A few weeks later, same problems. After about 2 months of back and forth, I had to cut my losses and do it myself. And the deal wasn’t made under any freelancer platform, so the money was gone for good.

These were valuable lessons. The first point I learned is that you can always walk away and find someone else. Don’t be shy to walk away. The second point is that the money you send is almost always gone. It doesn’t matter whether the transaction is done privately or through some platform. So don’t put too much at risk until you see the results.

Conclusion

Freelancers can save you time and supply the skills you don’t have. You just need to give them money and manage them well to have a win-win deal.

As a bonus, working with them gives you a real boss experience. If you pay them from your own pocket, then each mistake hurts you directly. This is a great point on your resume for any leadership position.

So happy freelancing (^_^)

AR By Hand – Part 5 – Video and OpenGL

Welcome to the last chapter of this AR series. The previous chapter ended by drawing a cube model on top of the marker paper. This chapter will close the whole series by showing you how to connect the whole system with OpenGL.

To start, let me say a few words about OpenGL. Although the end goal for this project is to render a simple cube, it’s not that trivial to do in OpenGL. You will still need knowledge about the graphics pipeline, shaders, and various buffer objects. Covering just this is already enough for its own tutorial series. Luckily, there is a great book called “Computer Graphics Programming in OpenGL with Java, Second Edition”, written by V. Scott Gordon. I would encourage you to read it if you want to learn about OpenGL. I personally used this book to create this project. The only notable difference is that the JOGL library is now available in the Maven repository, which makes installation super easy.

The source code for this chapter starts in the CameraPoseJoglTestApp executable class.

Video Processing

This is an easy part. There is a Java library called xuggle-video from OpenIMAJ. Processing a video works in such a way that you open a reader, register listeners, and read packets, processing the emitted events until the end of the stream.

As for the source code, there is a VideoUtils class which allows processing videos synchronously using lambda functions. I used it to produce the videos in the previous chapters. In addition, there is a VideoPlayer class which plays the video in a separate thread and lets you process the frames in a callback. This is the class used in the OpenGL application.
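
To illustrate the reader-listener pattern, here is a minimal sketch using the xuggler mediatool API (a sketch under my assumptions, the file name is a placeholder; the project’s VideoUtils and VideoPlayer wrap this in their own way).

// imports: com.xuggle.mediatool.*, com.xuggle.mediatool.event.IVideoPictureEvent, java.awt.image.BufferedImage
IMediaReader reader = ToolFactory.makeReader("video.mp4");
reader.setBufferedImageTypeToGenerate(BufferedImage.TYPE_3BYTE_BGR);
reader.addListener(new MediaListenerAdapter() {
    @Override
    public void onVideoPicture(IVideoPictureEvent event) {
        BufferedImage frame = event.getImage();
        // process the decoded frame here
    }
});
// read packets until the end of the stream; decoded frames arrive in the listener
while (reader.readPacket() == null) {
}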

OpenGL

I am going to cover only how to fit the previously created matrices into the OpenGL ones. The general usage of OpenGL to draw shapes is not discussed here.

In the previous chapter, you learned about the 3×3 matrix \(K\) and the 3×4 matrix \(V\). When you start working with OpenGL, you will see that all the matrices are 4×4. What the hell?

Don’t worry, they are all related. A detailed article series covering this point, although a bit difficult to digest, was published by Kyle Simek back in 2012.

Be aware that it requires some effort to make things display correctly. There are many conventions along the way, and a mistake in just one sign will result in a weird image, or nothing displayed at all.

To recap: the result of the previous chapter was the internal matrix \(K\) and the external matrix \(V\). You could multiply them as \(P=KV\) to get the projection matrix. The full matrices are these.

\[
K=\begin{bmatrix} f_x & s & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{bmatrix} \\
V=[R|T]=\begin{bmatrix} r_{11} & r_{12} & r_{13} & t_x \\ r_{21} & r_{22} & r_{23} & t_y \\ r_{31} & r_{32} & r_{33} & t_z \end{bmatrix}
\]

Projection and Internal

The OpenGL projection matrix is related to the internal matrix \(K\). OpenGL Projection Matrix is a nice article which explains how the whole projection and clipping works. To give you a brief idea, let’s look at the following image (taken from that article).

The left image shows the space which is displayed at the end of the graphics pipeline. Anything outside of that cut pyramid is not displayed. The camera is at the origin, oriented towards negative Z. There are 2 parallel planes forming the top and bottom of the pyramid. They are called near and far and are defined by scalar values. The near plane is also the place where the pixels are projected. These are OpenGL parameters introduced for practical reasons, and you need to choose them.

The right image shows the mapping of this volume into normalized device coordinates (NDC). This allows things like clipping or depth buffering.

In summary, the OpenGL projection matrix does 2 things – perspective projection and mapping to NDC. This can be expressed as the following matrix multiplication.
\[
P_{GL}=P_{NDC}P_{Pers}
\]
Therefore, you need to create these 2 matrices, given the following parameters.

  • Values of the matrix \(K\) (\(f_x,\ f_y,\ s,\ c_x,\ c_y\))
  • near and far values – I have chosen 0.1 and 1000 respectively
  • width and height are the width and height in pixels of the original input image

With all the parameters, it’s possible to write down the matrices right away.

\[
P_{Pers}=\begin{bmatrix}
f_x & s & -c_x & 0 \\
0 & f_y & -c_y & 0 \\
0 & 0 & near + far & near * far \\
0 & 0 & -1 & 0
\end{bmatrix} \\
P_{NDC}= \begin{bmatrix}
\frac{-2}{width} & 0 & 0 & 1 \\
0 & \frac{2}{height} & 0 & -1 \\
0 & 0 & \frac{2}{near-far} & \frac{far+near}{near-far} \\
0 & 0 & 0 & 1
\end{bmatrix} \\
P_{GL}=P_{NDC}P_{Pers}
\]

Code for this is in the class Ar, method toGlProj.
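
If you prefer code to formulas, here is a sketch of the construction (the actual Ar.toGlProj may differ; matrices here are row-major double[4][4], and note that OpenGL expects column-major data when uploading, so a transpose may be needed).

/**
 * Builds the OpenGL projection matrix from the camera parameters. Sketch only.
 */
public static double[][] toGlProj(double fx, double fy, double s, double cx, double cy,
        double near, double far, double width, double height) {
    double[][] pers = {
            {fx, s, -cx, 0},
            {0, fy, -cy, 0},
            {0, 0, near + far, near * far},
            {0, 0, -1, 0}};
    double[][] ndc = {
            {-2 / width, 0, 0, 1},
            {0, 2 / height, 0, -1},
            {0, 0, 2 / (near - far), (far + near) / (near - far)},
            {0, 0, 0, 1}};
    // multiply P_NDC * P_Pers
    double[][] res = new double[4][4];
    for (int i = 0; i < 4; ++i) {
        for (int j = 0; j < 4; ++j) {
            for (int k = 0; k < 4; ++k) {
                res[i][j] += ndc[i][k] * pers[k][j];
            }
        }
    }
    return res;
}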

View and External

The OpenGL view matrix \(V_{GL}\) is easy to construct by taking the \(V\) matrix and adding \([0,0,0,1]\) as the 4th row. Like this.
\[
V_{GL}=\begin{bmatrix}
r_{11} & r_{12} & r_{13} & t_x \\
r_{21} & r_{22} & r_{23} & t_y \\
r_{31} & r_{32} & r_{33} & t_z \\
0 & 0 & 0 & 1
\end{bmatrix}
\]
Code for this is in the class Ar, method toGlV.

Everything Together

Finally, you can see and run everything through the CameraPoseJoglTestApp executable class. A few things to mention.

  • The video is rendered to a texture, which is then drawn to the screen as 2 triangles.
  • Video processing and the OpenGL loop need synchronization. Otherwise, it won’t work.
  • There are 2 sets of shader programs. One for the background video, one for the 3D world.

That’s it. Here is the video with the result.

Summary

And this is the end. Although there is a s*^^*t lot of space for improvements, I hope you have enjoyed this series and learned something new. I would be more than happy if you post your comments, questions, suggestions for improvements, or ideas for other AR-related projects. Or just another topic you are struggling with. I would love to be helpful.

See you in something else (;

AR By Hand – Part 4 – Camera Pose

Welcome to part 4 of this AR series. In the previous chapter, you saw how homography makes it possible to draw onto a projected planar surface. This chapter will extend the previously calculated homography into a form which allows drawing 3D objects into the scene.

The program related to this chapter is CameraPoseVideoTestApp. You can download the whole project right here.

The structure here is the same as in the previous chapter. First you will see the equations, and then a practical example at the end. Don’t be stressed by the number of parameters and variables. It’s not that difficult once it comes to coding.

Camera and Homography

A camera is a device which projects points from 3D space onto a 2D plane. For this project, I have chosen the classical pinhole camera model, without worrying about perspective distortions. This model makes point projection as simple as a matrix-vector multiplication in homogeneous coordinates (arrows on top of the lowercase letters symbolize vectors).

\[
\vec{p_{2D}}=P\vec{p_{3D}} \\
\begin{bmatrix} wx_{2D} \\ wy_{2D} \\ w \end{bmatrix}
=
\begin{bmatrix} p_{11} & p_{12} & p_{13} & p_{14} \\ p_{21} & p_{22} & p_{23} & p_{24} \\ p_{31} & p_{32} & p_{33} & p_{34} \end{bmatrix}
\begin{bmatrix} x_{3D} \\ y_{3D} \\ z_{3D} \\ 1 \end{bmatrix}
\]

\(P\) is called a projection matrix and has 3 rows and 4 columns. This realizes the dimension drop onto the projection plane.

A camera is a product which has some properties, most notably its focal length. The important thing is that these properties are constant for a given camera (assuming you are not zooming). This is the internal set of properties. Then there is an external set of properties, which is the position and direction of the camera.

This can be reflected in matrix language by decomposing the matrix \(P\) into a 3×3 calibration matrix \(K\) (internal matrix with camera properties) and a 3×4 view matrix \(V\) (external matrix with camera position and rotation). These matrices are sometimes called intrinsic and extrinsic. You can drill them down into the following form.

\[
P=KV=K[R|T]=K[R_1|R_2|R_3|T]=
\begin{bmatrix} f_x & s & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} r_{11} & r_{12} & r_{13} & t_x \\ r_{21} & r_{22} & r_{23} & t_y \\ r_{31} & r_{32} & r_{33} & t_z \end{bmatrix}
\]

  • \(f_x\) and \(f_y\) are focal lengths in the respective axes.
  • \(s\) is a skew factor.
  • \(c_x\) and \(c_y\) are the coordinates of the principal point of the camera.
  • \(R\) is a camera rotation matrix. \(R_1,R_2,R_3\) are columns of the rotation matrix, and \(r_{ab}\) are the elements. The rotation matrix is orthonormal (unit vectors, and orthogonal to each other). Remember this one, because it will be discussed later.
  • \(T\) is a camera translations vector with elements \(t_x,t_y,t_z\).

Calibration Matrix

All the elements of matrix \(K\) are properties of the camera. One way to get them is to make a proper measurement. If you want to do that, then OpenCV contains a lot of materials on the topic. I just picked the values manually as follows.

  • \(f_x,f_y=400\ or\ 800\)
  • \(s=0\)
  • \(c_x,c_y=\) center of the input image (for 640×480 image, these will be 320 and 240)
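
For example, for a 640×480 input image with \(f_x=f_y=800\), this gives the following matrix.

\[
K=\begin{bmatrix} 800 & 0 & 320 \\ 0 & 800 & 240 \\ 0 & 0 & 1 \end{bmatrix}
\]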

Relation with Homography

To show you how the camera pose and homography are related, let’s start by writing down the equations for point projection.
\[
\begin{bmatrix} wx_{2D} \\ wy_{2D} \\ w \end{bmatrix}
=
\begin{bmatrix} p_{11} & p_{12} & p_{13} & p_{14} \\ p_{21} & p_{22} & p_{23} & p_{24} \\ p_{31} & p_{32} & p_{33} & p_{34} \end{bmatrix}
\begin{bmatrix} x_{3D} \\ y_{3D} \\ z_{3D} \\ 1 \end{bmatrix}
=
K[R|T]\begin{bmatrix} x_{3D} \\ y_{3D} \\ z_{3D} \\ 1 \end{bmatrix} = \\
=
K[R_1|R_2|R_3|T]\begin{bmatrix} x_{3D} \\ y_{3D} \\ z_{3D} \\ 1 \end{bmatrix} = \\
=
\begin{bmatrix} f_x & s & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} r_{11} & r_{12} & r_{13} & t_x \\ r_{21} & r_{22} & r_{23} & t_y \\ r_{31} & r_{32} & r_{33} & t_z \end{bmatrix}
\begin{bmatrix} x_{3D} \\ y_{3D} \\ z_{3D} \\ 1 \end{bmatrix}
\]

If \(z_{3D}=0\), then the equations look like this.
\[
\begin{bmatrix} wx_{2D} \\ wy_{2D} \\ w \end{bmatrix}
=
\begin{bmatrix} p_{11} & p_{12} & p_{13} & p_{14} \\ p_{21} & p_{22} & p_{23} & p_{24} \\ p_{31} & p_{32} & p_{33} & p_{34} \end{bmatrix}
\begin{bmatrix} x_{3D} \\ y_{3D} \\ 0 \\ 1 \end{bmatrix}
=
K[R|T]\begin{bmatrix} x_{3D} \\ y_{3D} \\ 0 \\ 1 \end{bmatrix} = \\
=
K[R_1|R_2|R_3|T]\begin{bmatrix} x_{3D} \\ y_{3D} \\ 0 \\ 1 \end{bmatrix} = \\
=
\begin{bmatrix} f_x & s & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} r_{11} & r_{12} & r_{13} & t_x \\ r_{21} & r_{22} & r_{23} & t_y \\ r_{31} & r_{32} & r_{33} & t_z \end{bmatrix}
\begin{bmatrix} x_{3D} \\ y_{3D} \\ 0 \\ 1 \end{bmatrix}
\]

Then you can carry out the matrix multiplication to figure out that you can drop the third column of the rotation matrix and the z coordinate of the 3D point and still get the same results (reminder: you can do this only if \(z_{3D}=0\), otherwise it won’t work). This gives you the following.
\[
\begin{bmatrix} wx_{2D} \\ wy_{2D} \\ w \end{bmatrix}
=
\begin{bmatrix} p_{11} & p_{12} & p_{14} \\ p_{21} & p_{22} & p_{24} \\ p_{31} & p_{32} & p_{34} \end{bmatrix}
\begin{bmatrix} x_{3D} \\ y_{3D} \\ 1 \end{bmatrix}
=
K[R_1|R_2|T]\begin{bmatrix} x_{3D} \\ y_{3D} \\ 1 \end{bmatrix} = \\
=
\begin{bmatrix} f_x & s & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} r_{11} & r_{12} & t_x \\ r_{21} & r_{22} & t_y \\ r_{31} & r_{32} & t_z \end{bmatrix}
\begin{bmatrix} x_{3D} \\ y_{3D} \\ 1 \end{bmatrix}
\]

Now note that \([R_1|R_2|T]\) is a 3×3 matrix, and at the same time, you can consider that \(K[R_1|R_2|T]=H\) from the previous chapter. That’s how homography is related to the camera projection. And that’s also why you can project points on the \(z=0\) plane without worrying about the camera’s internal parameters at all.

Extending Homography

Now on to the full camera pose. It seems the easiest way is to calculate \([R_1|R_2|T]=K^{-1}H\), then set \(R_3=R_1\times R_2\), and have the full \([R|T]\) matrix.

Unfortunately, this doesn’t work. Remember, a little bit above I mentioned that matrix \(R\) is orthonormal? \(K\) and \(H\) are themselves coming out of estimations and carry errors, so it’s not guaranteed that \(R_1\) and \(R_2\) obtained this way are orthonormal. That would make the final image look weird. Therefore, let’s make them orthonormal.

The implementation of the following text is available inside the Ar class, method estimateMvMatrix. And here I would like to refer you to the “Augmented Reality with Python and OpenCV” article written by Juan Gallostra. This is where I first discovered the method I am about to describe.

Let’s start by constructing \([G_1|G_2|G_3]=K^{-1}H\). In the implementation, you will also see that I negate the homography matrix before plugging it into the equation. That’s because a real pinhole camera would project a flipped image, but there is no flipping here.

Now \([G_1|G_2|G_3]\) is close to the desired \([R_1|R_2|T]\), but it is still an estimation. Therefore \([G_1|G_2|G_3]\) is only nearly orthonormal. Then you can write the following.

\[
l=\sqrt{\| G_1 \| \| G_2 \|} ,\ \
G_1'=\frac{G_1}{l} ,\ \
G_2'=\frac{G_2}{l} ,\ \
G_3'=\frac{G_3}{l} \\
\vec{c}=G_1' + G_2' ,\ \
\vec{p}=G_1' \times G_2' ,\ \
\vec{d}=\vec{c} \times \vec{p} \\
R_1=\frac{1}{\sqrt{2}}\left( \frac{\vec{c}}{\| \vec{c} \|} + \frac{\vec{d}}{\| \vec{d} \|} \right) ,\ \
R_2=\frac{1}{\sqrt{2}}\left( \frac{\vec{c}}{\| \vec{c} \|} - \frac{\vec{d}}{\| \vec{d} \|} \right) \\
R_3=R_1 \times R_2 ,\ \
T=G_3'
\]

Then you can stack vectors into columns to get the final \(V=[R_1|R_2|R_3|T]\) 3×4 matrix. Finally, compute \(P=KV\) and start projecting points.
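
For illustration, here is a sketch of the same computation using Apache Commons Math vectors (the project’s Ar.estimateMvMatrix may differ in details; g1, g2, g3 are the columns of the negated \(K^{-1}H\)).

// import org.apache.commons.math3.geometry.euclidean.threed.Vector3D;
public static Vector3D[] toCameraPose(Vector3D g1, Vector3D g2, Vector3D g3) {
    double l = Math.sqrt(g1.getNorm() * g2.getNorm());
    Vector3D g1n = g1.scalarMultiply(1 / l);
    Vector3D g2n = g2.scalarMultiply(1 / l);
    Vector3D t = g3.scalarMultiply(1 / l);
    Vector3D c = g1n.add(g2n);
    Vector3D p = Vector3D.crossProduct(g1n, g2n);
    Vector3D d = Vector3D.crossProduct(c, p);
    Vector3D r1 = c.normalize().add(d.normalize()).scalarMultiply(1 / Math.sqrt(2));
    Vector3D r2 = c.normalize().subtract(d.normalize()).scalarMultiply(1 / Math.sqrt(2));
    Vector3D r3 = Vector3D.crossProduct(r1, r2);
    return new Vector3D[]{r1, r2, r3, t}; // {R1, R2, R3, T}
}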

Summary

Now you know how to draw 3D objects into the scene. So far, all the drawing is done through simple image operations, which is useful only for basic demos. In the last chapter, you will discover how to hook the whole thing up with video and OpenGL to make more funky stuff.

AR By Hand – Part 3 – Homography

Welcome to part 3 of this AR series. In the previous chapter, you could read about how to detect and track a white A4 paper. The result was 4 points in the image corresponding to the corners. This chapter will use these points to build a homography. That’s the next step towards the AR experience.

This article first goes through the mathematics behind homography, and then shows the use case relevant to this project.

Mathematics

Here, I will briefly review the terms and then derive the equation system for homography. If you follow this, you will see why you need to detect at least 4 points which don’t lie on a single line.

Terms

Correspondence. Imagine you have 2 photos of the same object from slightly different positions. Then points \(p_1\) and \(p_2\) on the respective images are corresponding if they project the same physical point. The sign for correspondence is \(p_1\ \hat{=}\ p_2\).

Homogeneous coordinates. This is a coordinate system used in projective geometry and will be used here from now on as well. 2D vector \(\begin{bmatrix} x\\ y \end{bmatrix}\) in cartesian coordinates is expressed as 3D vector \(\begin{bmatrix} wx\\ wy\\ w \end{bmatrix}, \forall w\neq 0\) in homogeneous coordinates. Similarly, 3D vectors in cartesian coordinates are 4D vectors in homogeneous coordinates. Also, note that \(\begin{bmatrix} w_{1}x\\ w_{1}y\\ w_1 \end{bmatrix}=\begin{bmatrix} w_{2}x\\ w_{2}y\\ w_2 \end{bmatrix}, \forall w_1\neq 0,\ w_2\neq 0\) in homogeneous coordinates.

Matrices are used to represent certain geometric transformations in homogeneous coordinates. The transformation of a point \(p\) is realized by a simple multiplication, so that \(p'=Mp\). In addition, transformations can be merged into a single one by standard matrix multiplication.

Homography Equations

Mr. Wikipedia says that any two images of the same planar surface in space are related by a homography (assuming a pinhole camera model).

In other words, if \(I\) and \(I'\) are 2 images containing the same planar surface, then there exists a 3×3 matrix \(H\) which maps points \(p\) onto corresponding points \(p'\), such that \(p'=Hp\). Remember that these points must lie on that plane.

Let’s write down the equations in more detail.

\[
H=\begin{bmatrix} h_{11} & h_{12} & h_{13} \\ h_{21} & h_{22} & h_{23} \\ h_{31} & h_{32} & h_{33} \end{bmatrix} \\
\begin{bmatrix} w'x' \\ w'y' \\ w' \end{bmatrix}=\begin{bmatrix} h_{11} & h_{12} & h_{13} \\ h_{21} & h_{22} & h_{23} \\ h_{31} & h_{32} & h_{33} \end{bmatrix}\begin{bmatrix} wx \\ wy \\ w \end{bmatrix} \\
\begin{bmatrix} w'x' \\ w'y' \\ w' \end{bmatrix}=\begin{bmatrix} h_{11}wx + h_{12}wy + h_{13}w \\ h_{21}wx + h_{22}wy + h_{23}w \\ h_{31}wx + h_{32}wy + h_{33}w \end{bmatrix}
\]

The goal is to figure out the 9 elements of matrix \(H\). Without losing any generality, you can assume that \(w = 1\) and switch to cartesian coordinates by division. This gives the following equation.

\[
\begin{bmatrix} x' \\ y' \end{bmatrix}=\begin{bmatrix} \frac{h_{11}x + h_{12}y + h_{13}}{h_{31}x + h_{32}y + h_{33}} \\ \frac{h_{21}x + h_{22}y + h_{23}}{h_{31}x + h_{32}y + h_{33}} \end{bmatrix}
\]

This equation system has 9 degrees of freedom. Luckily, you can multiply all elements of \(H\) by a non-zero \(k\) without affecting the solution at all. This removes 1 degree of freedom and opens 2 possible ways to a solution.

The first way is to set \(h_{33} = 1\). You can do this as long as \(h_{33}\neq 0\). The second, more general, way is to impose a unit vector constraint such that \(h_{11}^2+h_{12}^2+h_{13}^2+h_{21}^2+h_{22}^2+h_{23}^2+h_{31}^2+h_{32}^2+h_{33}^2=1\). Here I will use the first way because it seems more intuitive and better supported by the numerical libraries.

Homography Solution

Setting \(h_{33}=1\) will give the following.

\[
\begin{bmatrix} x' \\ y' \end{bmatrix}=\begin{bmatrix} \frac{h_{11}x + h_{12}y + h_{13}}{h_{31}x + h_{32}y + 1} \\ \frac{h_{21}x + h_{22}y + h_{23}}{h_{31}x + h_{32}y + 1} \end{bmatrix}
\]

After separating the components, multiplying, and reorganizing, you get these 2 equations.

\[
x'=h_{11}x + h_{12}y + h_{13} - h_{31}xx' - h_{32}yx' \\
y'=h_{21}x + h_{22}y + h_{23} - h_{31}xy' - h_{32}yy'
\]

These are linear equations with 8 unknowns. Therefore, in theory, 8 equations are required (with certain preconditions to make sure the system is not degenerate) to be able to figure out the unknowns.

In practice, we have the 4 estimated corner points of the marker paper. Although they carry some errors from the image processing part, these points do not lie on a single line. Therefore, it is possible to plug them into the equations and use numerical methods to get an approximate solution with minimal error. This is how the equations look.

\[
\begin{bmatrix}
x_1 & y_1 & 1 & 0 & 0 & 0 & -x_1x_1' & -y_1x_1' \\
0 & 0 & 0 & x_1 & y_1 & 1 & -x_1y_1' & -y_1y_1' \\
x_2 & y_2 & 1 & 0 & 0 & 0 & -x_2x_2' & -y_2x_2' \\
0 & 0 & 0 & x_2 & y_2 & 1 & -x_2y_2' & -y_2y_2' \\
\cdots & \cdots & \cdots & \cdots & \cdots & \cdots & \cdots & \cdots
\end{bmatrix}
\begin{bmatrix}
h_{11} \\ h_{12} \\ h_{13} \\ h_{21} \\ h_{22} \\ h_{23} \\ h_{31} \\ h_{32}
\end{bmatrix}
=
\begin{bmatrix}
x_1' \\ y_1' \\ x_2' \\ y_2' \\ \cdots
\end{bmatrix}
\]

I won’t go into numerics here. I just use the solver provided by the mathematics library to get a solution like this one.

\[
H=\begin{bmatrix} h_{11} & h_{12} & h_{13} \\ h_{21} & h_{22} & h_{23} \\ h_{31} & h_{32} & 1 \end{bmatrix} \\
\]

The source code for homography estimation is located inside the Ar class, method estimateHomography.
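
If you are curious about the shape of such a solver, here is a hedged sketch using Apache Commons Math (the actual Ar.estimateHomography may be implemented differently).

// imports: org.apache.commons.math3.linear.*
/**
 * Estimates the homography from n >= 4 correspondences by least squares. Sketch only.
 * src and dst are arrays of {x, y} points, src[i] corresponding to dst[i].
 */
public static RealMatrix estimateHomography(double[][] src, double[][] dst) {
    int n = src.length;
    double[][] a = new double[2 * n][];
    double[] b = new double[2 * n];
    for (int i = 0; i < n; ++i) {
        double x = src[i][0], y = src[i][1];
        double xp = dst[i][0], yp = dst[i][1];
        a[2 * i] = new double[]{x, y, 1, 0, 0, 0, -x * xp, -y * xp};
        b[2 * i] = xp;
        a[2 * i + 1] = new double[]{0, 0, 0, x, y, 1, -x * yp, -y * yp};
        b[2 * i + 1] = yp;
    }
    // QR decomposition yields the least squares solution of the overdetermined system
    RealVector h = new QRDecomposition(MatrixUtils.createRealMatrix(a))
            .getSolver().solve(MatrixUtils.createRealVector(b));
    return MatrixUtils.createRealMatrix(new double[][]{
            {h.getEntry(0), h.getEntry(1), h.getEntry(2)},
            {h.getEntry(3), h.getEntry(4), h.getEntry(5)},
            {h.getEntry(6), h.getEntry(7), 1}});
}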

Use Case

Homography has several use cases; you can easily find them on the internet. Here is just the one relevant to this project. Let’s estimate the homography in such a way that the detected rectangle corresponds to a fixed rectangle. Then draw a blue square into the fixed rectangle and the corresponding square into the original image. The result is right below.

Summary

This chapter covered homography, which allows you to draw onto the planar surfaces of the image. In the next chapter, you will learn how to extend the homography to get the projection matrix and be able to draw 3D objects lying on top of that plane.

AR By Hand – Part 2 – Plane Detection

Welcome back to the AR series. The previous chapter introduced the whole project. This chapter will cover the first topic on the list: plane detection and tracking. At the end of this article, you will be able to identify the corner points of the A4 paper from the input image, in the right order, so you can draw a rectangle contour in there. An example outcome of this chapter is in the following image.

The full implementation is available in the accompanying project. The download button is right below this paragraph. The main class for this article is WhiteMarkerTracker. I would encourage you to look into the source code while reading the article. Now let’s get into it.

Pre-processing

Pre-processing is the first phase in image processing. The goal of pre-processing is to clean up the image and extract only the information usable in the further phases. Since, in many cases, this phase has to go through every pixel of the image, only relatively simple operations can be performed. The result is a list of “features” (you will see what this word means in a little bit) useful for more detailed processing. And many times, it’s desirable to have a much smaller number of features than pixels.

In our case, the goal is to identify the white rectangle. A good way to start is by identifying the pixels lying on the edge of the white area. Those pixels are the “features” in this particular context. They can be extracted by the following process (a small code sketch of steps 1 and 5 follows the list).

  1. Threshold pixels by color (a white pixel must have its red, green, and blue components high enough).
  2. Identify all the connected areas (blobs).
  3. Pick the biggest blob (assuming the marker paper is the dominant white object in the image) and throw away all the others. This cleans up certain artifacts.
  4. If the biggest blob doesn’t have enough volume (meaning the number of pixels), then exit.
  5. Identify the contour points of the blob. These are the white pixels next to a black pixel.
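To make steps 1 and 5 more concrete, here is a minimal sketch of the thresholding and the contour point extraction, assuming the image arrives as an ARGB pixel array. The blob labeling (steps 2–4) is omitted for brevity, and the class and method names are illustrative, not taken from the accompanying project.

import java.util.ArrayList;
import java.util.List;

public final class ContourSketch {

    // step 1: a pixel counts as "white" when all RGB components are high enough
    static boolean isWhite(int argb, int threshold) {
        int r = (argb >> 16) & 0xff;
        int g = (argb >> 8) & 0xff;
        int b = argb & 0xff;
        return r >= threshold && g >= threshold && b >= threshold;
    }

    // step 5: contour points are white pixels with at least one non-white
    // 4-neighbour; border pixels are skipped to keep the bounds checks simple
    static List<int[]> contourPoints(int[] pixels, int width, int height, int threshold) {
        boolean[] mask = new boolean[pixels.length];
        for (int i = 0; i < pixels.length; ++i) {
            mask[i] = isWhite(pixels[i], threshold);
        }
        List<int[]> contour = new ArrayList<>();
        for (int y = 1; y < height - 1; ++y) {
            for (int x = 1; x < width - 1; ++x) {
                int i = y * width + x;
                if (mask[i] && (!mask[i - 1] || !mask[i + 1]
                        || !mask[i - width] || !mask[i + width])) {
                    contour.add(new int[]{x, y});
                }
            }
        }
        return contour;
    }
}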

The result is a set of contour points, illustrated in the image below. To give you rough numbers: the input image has 640×480 pixels, which is slightly over 300,000 pixels in total. Given reasonable input, pre-processing chooses fewer than 3,000 of them. This reduces the amount of data by a factor of about 100.

Let me add one more note. Depending on how you acquire the input image, you might need to apply additional operations like Gaussian smoothing to get a reasonable contour. I used an image from a video sequence, where the compression algorithm had already done a similar job, therefore it wasn’t necessary in my case.

Once the contour pixels are selected, the next phase can start.

Detection

Now you have a set of contour pixels. So what to do with them? Although you can see they mostly form the edges of the marker paper, there are still some errors. In addition, they are just pixels, which doesn’t tell you the positions of the corner points. And in some cases, the corner point locations might not be that well defined. Like in the following image.

The detection algorithm first identifies the 4 edge lines and then calculates the corner positions from their intersections. Note that there are several points along the way where the algorithm can fail to detect something. In such a case, it reports that nothing was detected.

Edge Lines

A good method for identifying edge lines, while having errors and outliers in the data, is RANSAC (Random Sample Consensus). The general RANSAC workflow is as follows.

  1. Repeat N times (N is up to you)
    1. Randomly choose the minimal number of points you need to build the model
    2. Build the model from the chosen points
    3. Compare the model with the other sample points and calculate the number of good fits (inliers, or points which are close enough to the expected positions)
    4. If the number of inliers is high enough, then accept the model and terminate the cycle
  2. You either have a model, or there is a high probability that the model doesn’t exist

Now, more concretely, for the edges of the marker paper. The main difference is that we want to find 4 lines lying as much as possible over the contour points. For this, it is necessary to choose the maximal number of iterations we are willing to take (N), the minimum number of points lying “close enough” to the line in order to accept it (minAccept – a good choice is a percentage of the total number of sample points), and the distance from the line which is considered “close enough” (dstThres). The full algorithm is in the Fitting class, method linesRansac. Here is just a brief description; a condensed code sketch follows the list.

  1. Start with an empty result list
  2. Repeat at most N times, stopping when the result list has the desired number of lines (4 in this case)
    1. Pick 2 different random points from the sample set
    2. Build the line through them
    3. Calculate the number of inliers (max distance from the line is dstThres)
    4. If the number of inliers is greater than or equal to the minAccept parameter, then
      1. Add the line to the result list
      2. Remove the inliers from the sample set (to prevent the same line from being detected again)
  3. Return the result list
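For illustration, here is the condensed sketch of that loop. It is not the actual Fitting.linesRansac code – the line representation (normalized \(ax + by + c = 0\), so the point–line distance is \(|ax + by + c|\)) and all names are my assumptions.

import java.util.ArrayList;
import java.util.List;
import java.util.Random;

public final class LinesRansacSketch {

    // line ax + by + c = 0 with a*a + b*b = 1
    static final class Line {
        final double a, b, c;
        Line(double a, double b, double c) { this.a = a; this.b = b; this.c = c; }
        double distance(double[] p) { return Math.abs(a * p[0] + b * p[1] + c); }
    }

    // normalized line through two distinct points
    static Line through(double[] p, double[] q) {
        double a = q[1] - p[1];
        double b = p[0] - q[0];
        double norm = Math.hypot(a, b);
        a /= norm;
        b /= norm;
        return new Line(a, b, -(a * p[0] + b * p[1]));
    }

    static List<Line> linesRansac(List<double[]> points, int numLines,
            int maxIterations, int minAccept, double dstThres) {
        List<double[]> samples = new ArrayList<>(points);
        List<Line> result = new ArrayList<>();
        Random rnd = new Random();
        for (int i = 0; i < maxIterations && result.size() < numLines; ++i) {
            if (samples.size() < 2) {
                break; // not enough points left to build a line
            }
            // pick 2 different random points and build the line through them
            double[] p = samples.get(rnd.nextInt(samples.size()));
            double[] q = samples.get(rnd.nextInt(samples.size()));
            if (p[0] == q[0] && p[1] == q[1]) {
                continue;
            }
            Line line = through(p, q);
            // count the inliers (points with distance at most dstThres)
            List<double[]> inliers = new ArrayList<>();
            for (double[] s : samples) {
                if (line.distance(s) <= dstThres) {
                    inliers.add(s);
                }
            }
            // accept the line and remove its inliers from the sample set
            if (inliers.size() >= minAccept) {
                result.add(line);
                samples.removeAll(inliers);
            }
        }
        return result;
    }
}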

If you run this algorithm, then you will “ideally” (I will get back to the word “ideally” later in this chapter) end up with lines like in this image.

You see, RANSAC is tolerant of various forms of distraction. All you need is a sufficient number of sample points that are “close enough”. Now that the edge lines are known, the final shape can be extracted.

Paper Shape

Going from edges to corner points is a matter of identifying which lines are perpendicular, calculating the intersections, and ordering them counterclockwise. The full implementation is in the WhiteMarkerTracker class, method extractPoints.

Identifying the perpendicular lines is possible by examining angles, because we know that the rectangle has 2 sets of 2 parallel lines. If you select any edge line, then its parallel line will always have the smallest angle in between. The 2 perpendicular lines are the remaining lines which are not the parallel one. The angle between lines can be calculated from the line equations, and the same goes for the intersection. Ordering the points just requires a little bit of vector algebra and messing around.
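For reference, here is a small sketch of those two computations, again assuming lines in the normalized form \(ax + by + c = 0\) with \(a^2 + b^2 = 1\); the names are illustrative.

public final class LineGeometrySketch {

    // angle between two lines with unit normals (a, b); result is in [0, PI/2]
    static double angle(double a1, double b1, double a2, double b2) {
        double cos = Math.abs(a1 * a2 + b1 * b2);
        return Math.acos(Math.min(1.0, cos));
    }

    // intersection of a1x + b1y + c1 = 0 and a2x + b2y + c2 = 0 (Cramer's rule)
    static double[] intersection(double a1, double b1, double c1,
            double a2, double b2, double c2) {
        double det = a1 * b2 - a2 * b1;
        if (Math.abs(det) < 1e-12) {
            return null; // lines are (nearly) parallel
        }
        return new double[]{(b1 * c2 - b2 * c1) / det, (a2 * c1 - a1 * c2) / det};
    }
}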

If everything is done, then you should be able to see an image like this one.

So, are we done? Not so fast…

RANSAC Problems

Remember when I used the word “ideally” before? This part is all about that.

Let’s make a little experiment. Take the example image from this chapter and make a 200-frame video out of it. In every frame, perform the plane detection as described so far, follow up by estimating the AR parameters, and draw a 3D cube on top of the plane (don’t worry if you don’t know how to do this yet). This is how the result looks.

The cube is shaking, although it should stay still. In addition, there are several frames which are completely wrong. This is caused by the nature of the RANSAC method. RANSAC is a probabilistic method which randomly selects points to create a model. This has two consequences.

  1. Every estimation is slightly different, although most of them are reasonably good. This is the reason for the shaking, especially because the errors accumulate along the way.
  2. There is a small chance that some model is wrong yet still fits enough points to be accepted. This is the reason for several frames being totally wrong.

To be more illustrative, let’s see how the current line detection looks.

In this video, you can clearly see that 2 RANSAC detections of the same line might both be reasonably good, yet slightly different. This is the root cause of the shaking cube in the previous video. Also, from time to time you can see a mis-detection causing a single edge to be detected twice. This is the root cause of the cube being rendered in the wrong position.

How to improve that?

Stabilization With Tracking

Although the simple RANSAC method isn’t good enough to produce a good result, it’s still a reasonable start. Therefore, let’s use the initial detection and enhance it.

There are 2 enhancements (both are implemented inside WhiteMarkerTracker class, method trackMarker).

  1. Track previously detected shape
  2. Smooth the estimation by averaging

First, let’s discuss the tracking. Tracking is done frame by frame. In each frame, the algorithm knows the parameters of the old object (4 corner points in this case) and the new observations (contour points in this case). The result is either an updated parameter set or a report that the object has been lost.

This implementation works by tracking the edges one by one and then putting them together, assuming that the edges don’t move significantly between 2 frames. The corner points are used to define an area where each edge is expected to be. This reduces the number of contour points and therefore allows requiring a higher percentage of inliers for the RANSAC estimation. Like in the image below.

Now regarding the smoothing. Smoothing is normally achieved by averaging. Therefore, for every tracked line, let RANSAC estimate M good fits rather than just one. Then take the set of points where each is close enough to at least one of these good fits, and make the final line a least-squares fit over that set of points.
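Here is a minimal sketch of such a least-squares fit. Because edge lines can be vertical, it fits the line by orthogonal (total) least squares through the centroid rather than by an ordinary regression of y on x. It illustrates the idea; it is not the project’s actual code.

import java.util.List;

public final class LineFitSketch {

    // returns {a, b, c} of the line ax + by + c = 0 minimizing the
    // sum of squared point-line distances over the given points
    static double[] fitLine(List<double[]> points) {
        double mx = 0, my = 0;
        for (double[] p : points) {
            mx += p[0];
            my += p[1];
        }
        mx /= points.size();
        my /= points.size();
        // covariance of the centered points
        double sxx = 0, sxy = 0, syy = 0;
        for (double[] p : points) {
            double dx = p[0] - mx, dy = p[1] - my;
            sxx += dx * dx;
            sxy += dx * dy;
            syy += dy * dy;
        }
        // the line direction is the principal axis of the covariance;
        // the normal (a, b) is perpendicular to it, and the line goes
        // through the centroid
        double theta = 0.5 * Math.atan2(2 * sxy, sxx - syy);
        double a = -Math.sin(theta), b = Math.cos(theta);
        return new double[]{a, b, -(a * mx + b * my)};
    }
}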

When you put everything together, the result looks like this video.

Summary

This chapter explained how to track the A4 marker paper in a video. The next chapter will use the outcome to build up a homography.

AR By Hand – Part 1 – Introduction

Chances are, you have already at least heard the term augmented reality. The very generic definition says that augmented reality is the real world enhanced by computer-generated information. This mini-series will be about enhancing a captured image or video by adding a 3D model into it. Like in this video.

If you want to know how this works, then keep reading. I will walk you through all the steps. The following image is a rough summarization of the steps.

Now in simple words. The final goal is to display a cube on top of the white A4 paper. The process starts by thresholding the pixels to find the white ones. This produces several blobs. Assuming the A4 paper is the biggest of them, the others are suppressed. The next step is to identify the contour, edge lines, and corners of the paper. Then the planar transformation is computed. The planar transformation is used to compute a projection matrix which allows drawing 3D objects into the original image. Finally, the projection matrix is transformed into an OpenGL-compatible form. Then you can do anything with that.

Sounds trivial, right? Maybe. Still, it takes some effort to go through the details, therefore I have prepared the following chapters.

In addition, there is an example project accompanying this series. You can download it right here.

The code is written in Java (+ a few OpenGL shaders) and built with Maven. As long as you understand these, you should be able to build the project and run the test applications. Test applications are executable classes within the test sources. The main classes are CameraPoseVideoTestApp and CameraPoseJoglTestApp.

Regarding the expected level of knowledge: it will be very helpful if you have some knowledge of linear algebra, homogeneous coordinates, RGB image representation, the pinhole camera model, and perspective projection. Although I will try to keep the required level to a minimum, it is too much to explain every little thing in detail.

Now let me make a note about the quality of the result. There are 2 main factors which affect quality – implementation and environment. I will cover one type of implementation and let you judge how good it is. Please leave me comments, especially if you have a concrete idea for improvement. The second factor which matters is the environment. This includes everything from camera quality, noise, distractions in the scene, lighting, and occlusion, to the time you can spend on processing each frame. Even today’s state-of-the-art algorithms will fail in a crappy environment. Please keep this in mind when you do your own experiments.

Summary

This chapter gave you an overall idea of the project. The next chapter will tell you how to track the plane.

Brittle POJO

POJO, or JavaBean, is a popular pattern in Java for data holders. These are simple objects whose main purpose is to hold data in memory. Many frameworks use them. And they can be created very fast; every IDE has a feature to generate getters and setters. Unfortunately, such an easy pattern is designed to produce brittle code. This will cause you more and more trouble as the project grows and gets more complicated. This article is going to demonstrate several issues that come with the POJO pattern.

Note: A precondition for further reading is a basic understanding of Java and JUnit.

Example 1

Let’s define a Rectangle class like this.

public class Rectangle {

    private double width;
    
    private double height;

    public double getWidth() {
        return width;
    }

    public void setWidth(double width) {
        this.width = width;
    }

    public double getHeight() {
        return height;
    }

    public void setHeight(double height) {
        this.height = height;
    }
    
}

Then, what about the equality of 2 rectangles? Naturally, I would expect 2 rectangles to be equal as soon as they have the same width and height. The following is the actual behavior of the Rectangle class definition. Comments on the right show the console output.

Rectangle rect1 = new Rectangle();
rect1.setWidth(1);
rect1.setHeight(2);
Rectangle rect2 = new Rectangle();
rect2.setWidth(1);
rect2.setHeight(2);

System.out.println(rect1.equals(rect1));      // true
System.out.println(rect2.equals(rect2));      // true
System.out.println(rect1.equals(rect2));      // false
System.out.println(rect2.equals(rect1));      // false

As you can see, different instances of Rectangle are not considered equal, although they have the same width and height. This, for example, affects the behavior of collections.

List<Rectangle> list = new ArrayList<>();
list.add(rect1);
System.out.println(list.contains(rect1));     // true
System.out.println(list.contains(rect2));     // false

Set<Rectangle> set = new HashSet<>();
set.add(rect1);
System.out.println(set.contains(rect1));     // true
System.out.println(set.contains(rect2));     // false

Map<Rectangle, Integer> map = new HashMap<>();
map.put(rect1, 2);
System.out.println(map.containsKey(rect1));     // true
System.out.println(map.containsKey(rect2));     // false

Here the contains method always returns false unless the tested object is the exact instance which was added. Same for the map keys. That means you can’t test whether 2 collections are equal unless they hold the exact same object instances. This creates huge complications in unit tests. For example, let’s define a RectangleParser interface.

public interface RectangleParser {
    public List<Rectangle> parse(String input);
}

Next, imagine there is an implementation called SuperParser which needs to be tested. Because there is no way to test the equality of 2 rectangle objects, all properties have to be compared manually. Something like this.

RectangleParser parser = new SuperParser();
List<Rectangle> rects = parser.parse("rectangle[1,2],rectangle[2,3]");
assertEquals(2, rects.size());
assertEquals(1d, rects.get(0).getWidth(), 0d);
assertEquals(2d, rects.get(0).getHeight(), 0d);
assertEquals(2d, rects.get(1).getWidth(), 0d);
assertEquals(3d, rects.get(1).getHeight(), 0d);

And this is very often the reason why many developers don’t write unit tests at all. Very typical excuses are:

  • We don’t have time for unit tests
  • Business logic in our project is too complicated
  • I am a developer, not a tester

Many others just write unit tests for the simplest units, or end up comparing, for example, only list sizes without having any clue about the objects inside. Such tests are just good for showing a green bar to non-tech managers and don’t bring any real value to the project. What brings real value are unit tests of the most complex units with deep comparison. Unfortunately, the lack of an equals method in data-holding objects makes it impossible to create them. Therefore, let’s improve that.

Example 2

Let’s add an equals method. This method comes together with the hashCode method. For those who haven’t done this yet, I would recommend spending 5 minutes reading about them in the Javadoc. Here is the next version of the Rectangle class.

public class Rectangle {

    // ... same properties with getters and setters as before

    @Override
    public int hashCode() {
        return (int) width + 13 * (int) height;
    }

    @Override
    public boolean equals(Object obj) {
        if (obj == null) {
            return false;
        }
        if (!(obj instanceof Rectangle)) {
            return false;
        }
        Rectangle other = (Rectangle) obj;
        return other.width == width && other.height == height;
    }

}

The outcome of the previous code after adding the equals and hashCode methods.

Rectangle rect1 = new Rectangle();
rect1.setWidth(1);
rect1.setHeight(2);
Rectangle rect2 = new Rectangle();
rect2.setWidth(1);
rect2.setHeight(2);

System.out.println(rect1.equals(rect1));      // true
System.out.println(rect2.equals(rect2));      // true
System.out.println(rect1.equals(rect2));      // true
System.out.println(rect2.equals(rect1));      // true

List<Rectangle> list = new ArrayList<>();
list.add(rect1);
System.out.println(list.contains(rect1));     // true
System.out.println(list.contains(rect2));     // true

Set<Rectangle> set = new HashSet<>();
set.add(rect1);
System.out.println(set.contains(rect1));     // true
System.out.println(set.contains(rect2));     // true

Map<Rectangle, Integer> map = new HashMap<>();
map.put(rect1, 2);
System.out.println(map.containsKey(rect1));     // true
System.out.println(map.containsKey(rect2));     // true

You see, the rectangles are considered equal and it is possible to test whether a collection contains a specific one. The test case for RectangleParser can then be rewritten in this way.

RectangleParser parser = new SuperParser();
List<Rectangle> expected = Arrays.asList(... insert the rectangles...);
List<Rectangle> rects = parser.parse("rectangle[1,2],rectangle[2,3]");
assertEquals(expected, rects);

This is much better, because the objects are deeply compared. Such tests are much more robust than the previous ones, so developers can seamlessly catch and fix the (side) effects of code changes. Seems like the problem is solved. Unfortunately, this approach brings another issue. Look at this.

Rectangle rect1 = new Rectangle();
rect1.setWidth(1);
rect1.setHeight(2);
Rectangle rect2 = new Rectangle();
rect2.setWidth(1);
rect2.setHeight(2);

Set<Rectangle> set = new HashSet<>();
set.add(rect1);
System.out.println(set.contains(rect1));     // true
System.out.println(set.contains(rect2));     // true

rect1.setWidth(5);

System.out.println(set.contains(rect1));     // false
System.out.println(set.contains(rect2));     // false

The rectangle was inserted into the set. Since both rectangles are equal, the set returns true when calling the contains method. Next, the original rectangle was changed. That means its hash code changed as well. But the set isn’t aware of this change, so it keeps the object in the wrong bucket. Therefore it looks like the rectangle disappeared from the set. In this example it is easy to spot, but it’s very hard to find when the same situation happens in a large system. This means that an invocation of a public method can easily break completely different portions of the application.

This is the problem of all mutable patterns. You might solve it by a convention saying that no one calls a setter after the object is constructed. It might or might not work out for you. I have personally chosen not to rely on such a convention.

Example 3

What about the popular inheritance? Let’s extend the Rectangle class and add a color in there.

public class ColorRectangle extends Rectangle {

    private int color;

    public int getColor() {
        return color;
    }

    public void setColor(int color) {
        this.color = color;
    }

    @Override
    public int hashCode() {
        return (int) getWidth() + 13 * (int) getHeight() + 169 * color;
    }

    @Override
    public boolean equals(Object obj) {
        if (obj == null) {
            return false;
        }
        if (!(obj instanceof ColorRectangle)) {
            return false;
        }
        ColorRectangle other = (ColorRectangle) obj;
        return other.getWidth() == getWidth() && other.getHeight() == getHeight() && other.color == color;
    }

}

And again a small test.

Rectangle rect1 = new Rectangle();
rect1.setWidth(1);
rect1.setHeight(2);
ColorRectangle rect2 = new ColorRectangle();
rect2.setWidth(1);
rect2.setHeight(2);
rect2.setColor(0x00ffff00);

System.out.println(rect1.equals(rect2));     // true
System.out.println(rect2.equals(rect1));     // false

The result is that rect1 is equal to rect2, but rect2 is not equal to rect1. That means the symmetry requirement of the equals contract is broken. And it has been proven that if you extend a class and add an extra property to the child, there is no way to make the equals method work according to the contract written in the Javadoc, unless the parent class is aware of the child. This can easily cause weird behavior which is hard to uncover.

Other Issues

Regarding consistency: POJO objects are not guaranteed to be consistent. Properties are set one by one after construction. Objects might be in an invalid state, and validation has to be invoked externally somehow. This means another responsibility for users. In addition, any later call of a setter might put the object into an invalid state again.

Regarding thread safety: POJO objects are not thread-safe by definition. This brings another limitation for users.

Conclusion

In this article I have demonstrated several issues with the POJO pattern. For these reasons, I have decided to use purely immutable objects, with inheritance prohibited, as the main data holders. These objects might be constructed, for example, by the builder pattern or a static factory method. Like this.

Rectangle rect1 = new Rectangle.Builder().
        setWidth(1).
        setHeight(2).
        build();
Rectangle rect2 = Rectangle.create(1, 2);
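For completeness, here is a minimal sketch of what such an immutable Rectangle might look like with a static factory method (illustrative only; the builder variant is analogous).

public final class Rectangle {           // final: no subclass can break equals

    private final double width;          // final fields: state fixed at construction
    private final double height;

    private Rectangle(double width, double height) {
        if (width < 0 || height < 0) {
            throw new IllegalArgumentException("dimensions must be non-negative");
        }
        this.width = width;
        this.height = height;
    }

    // static factory method; the object is valid from the moment it exists
    public static Rectangle create(double width, double height) {
        return new Rectangle(width, height);
    }

    public double getWidth() {
        return width;
    }

    public double getHeight() {
        return height;
    }

    @Override
    public int hashCode() {
        return Double.hashCode(width) + 13 * Double.hashCode(height);
    }

    @Override
    public boolean equals(Object obj) {
        if (!(obj instanceof Rectangle)) {
            return false;
        }
        Rectangle other = (Rectangle) obj;
        return other.width == width && other.height == height;
    }
}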

The important thing is that every object is guaranteed to be in a valid state and immutable for its whole lifetime. Therefore it is safe to use such objects in collections and multi-threaded environments. The issues mentioned above simply don’t exist. POJOs are still good as a bridge to various frameworks, as long as they are used purely inside that integration layer and never leak into the core application code. If you would like more details about this topic, Effective Java by Joshua Bloch is a great resource.

Bitcoin and JUnit

Bitcoin and blockchain are hot topics today. Many related projects are already out there, and many more are being developed right now. If you are a developer in this area, then you know how important and tricky it is to have bulletproof testing in place. In this article I will briefly describe the options for testing Bitcoin applications and then go into more detail on the one which you can run easily and offline. Please note that this article focuses purely on Bitcoin.

Testing Options

There are 3 possible modes or stages for testing bitcoin applications.

  1. Regtest mode
  2. Test on the testnet
  3. Test on the mainnet

Regtest mode is what I would recommend starting with. It allows you to run a node on your local computer just as a sandbox, completely isolated from the internet. And it has one very useful feature: you are fully in control of the block generation, and new blocks can be ‘mined’ just by calling a simple command. This removes the need to wait for blocks to be mined. Also, you have an unlimited number of coins to play with, because all the mining rewards go to the account on your local node.

Testnet, as the name suggests, behaves almost the same as the real Bitcoin network; it is a fully functional network. This includes real mining, the necessary waiting times, and a need to account for the activity of other people. The differences are that the coins don’t have any value, anyone can get a small amount of free test coins, and the whole network gets nuked from time to time. It’s a good next step after having a working system on regtest.

Finally, mainnet is the network where the real transactions happen. Bugs can become very expensive here, so it’s better to leave this for the final test, after being confident that everything is working as planned.

I believe that during application development you should walk through all 3 stages. The rest of this article will show how to connect regtest mode with JUnit. First, let’s get a little bit familiar with bitcoind, a program which implements the Bitcoin protocol and can act as a node.

Running Local Node

Start with the software installation. I prefer Bitcoin Unlimited. Download the latest version of the Official Bitcoin (BTC) Release for your platform and unzip / install it. Open a console and navigate to the folder with the binaries. A local node in regtest mode can be started with the following command.

  • bitcoind -regtest -txindex=1 -server -rpcallowip=127.0.0.1 -rpcport=18332 -rpcuser=user -rpcpassword=Password1

There are many parameters you can put into the command; the reference is here. The important ones are those which specify regtest mode, the data directory, and the open JSON-RPC port for clients. The next step is to open a second console, navigate to the same directory, and use the JSON-RPC client to perform actions. The following sequence generates blocks, sends coins to the specified address, and generates the next set of blocks to simulate transaction progress in the chain.

  • bitcoin-cli -regtest -rpcport=18332 -rpcuser=user -rpcpassword=Password1 generate 101
  • bitcoin-cli -regtest -rpcport=18332 -rpcuser=user -rpcpassword=Password1 getbalance
  • bitcoin-cli -regtest -rpcport=18332 -rpcuser=user -rpcpassword=Password1 sendtoaddress put_your_address_here 10
  • bitcoin-cli -regtest -rpcport=18332 -rpcuser=user -rpcpassword=Password1 generate 7

If you have made it up to here, then you should be able to test your application and restart the chain from the beginning (by deleting the data directory) whenever you want. For the moment, it’s all done manually. The next chapter will show you how to automate this.

Regtest and JUnit

The example project is a standard Java Maven project which contains a single unit test and can be invoked from the command line just by running mvn test. Bitcoin binaries for Windows are included. If you are using a different platform, then please download and replace the binaries.

Coding is pretty straightforward and can be summarized in the following points.

  • Clean up the data directory and start the bitcoind process in the test setup
  • The tested client connects to the node during the test case as needed
  • New blocks are generated on demand (there is a method for that)
  • The bitcoind process is stopped during the test tear-down

Note: Depending on your environment, you might need to deal with 2 issues – permissions and firewall.

Here is how you start a new bitcoind process.

ProcessBuilder processBuilder = new ProcessBuilder(BIN_DIR_PATH + "/bitcoind.exe", "-regtest",
        "-datadir=" + dataDir.getAbsolutePath(), "-txindex=1", "-server",
        "-rpcallowip=127.0.0.1", "-rpcport=18332", "-rpcuser=user", "-rpcpassword=Password1");
processBuilder.directory(new File("src/test/resources/bitcoinUnlimited-1.0.3/bin"));
try {
    bcProcess = processBuilder.start();
    Thread.sleep(5000);
} catch (IOException e) {
    throw new RuntimeException("error during process start", e);
} catch (InterruptedException e) {
    throw new RuntimeException(e);
}

The variable bcProcess is defined in the test class and is used in the tear-down method to close the process. The 5-second thread sleep is trickier. The processBuilder.start() method returns immediately once the process is started. Unfortunately, bitcoind is not initialized at that point, so a connection would fail. In this particular case, a reasonable waiting time just does the job.

Next, how to stop the process.

try {
    bcProcess.getInputStream().close();
    bcProcess.getErrorStream().close();
    bcProcess.getOutputStream().close();
    bcProcess.destroy();
    bcProcess = null;
    Thread.sleep(1000);
} catch (IOException e) {
    throw new RuntimeException(e);
} catch (InterruptedException e) {
    throw new RuntimeException(e);
}

That’s the whole “magic”. The rest of the code just cleans up the working directory before the test and invokes the actual test. For more details, please look at the source code.
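To illustrate the “new blocks are generated on demand” point, here is a hypothetical helper which calls the node’s JSON-RPC interface directly over HTTP with basic authentication, matching the user, password, and port used above. The actual method in the example project may look different.

import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public final class RegtestRpcSketch {

    // asks the regtest node to mine the given number of blocks
    static String generateBlocks(int count) throws IOException {
        String payload = "{\"jsonrpc\":\"1.0\",\"id\":\"junit\","
                + "\"method\":\"generate\",\"params\":[" + count + "]}";
        HttpURLConnection conn = (HttpURLConnection)
                new URL("http://127.0.0.1:18332/").openConnection();
        String auth = Base64.getEncoder()
                .encodeToString("user:Password1".getBytes(StandardCharsets.UTF_8));
        conn.setRequestMethod("POST");
        conn.setRequestProperty("Authorization", "Basic " + auth);
        conn.setDoOutput(true);
        try (OutputStream os = conn.getOutputStream()) {
            os.write(payload.getBytes(StandardCharsets.UTF_8));
        }
        try (InputStream is = conn.getInputStream()) {
            return new String(is.readAllBytes(), StandardCharsets.UTF_8);
        }
    }
}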

Summary

As you can see, the Java code to run a Bitcoin node and interact with it is easy, and it works. You can write test cases which run relatively fast and have a guaranteed environment. The part I don’t really like is the dependency on the native program, especially when imagining handling multiple chains on multiple systems. Then the amount of necessary binaries can grow significantly. I have a couple of ways in mind to resolve that. Please let me know if you have an elegant solution.