Personal Mini Toolchains

It’s all about repetition. If you find yourself doing the same thing again and again, then chances are you want to simplify it and eventually automate it (unless it is sex). Even a small task makes sense to automate if the amount of repetition is high enough. Let me give you an example.

Let’s imagine you write a script to synchronize files between projects. It took you 8 hours to create that script, and each run saves 10 minutes of your time. This means you need to run the script 48 times to “pay off” your initial investment. Anything beyond that is your “profit”.
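The break-even arithmetic can be sketched in a few lines (the numbers are the ones from the example above):

```java
public class BreakEvenDemo {
    public static void main(String[] args) {
        int investedMinutes = 8 * 60;   // 8 hours to write the script
        int savedPerRun = 10;           // each run saves 10 minutes
        int breakEvenRuns = investedMinutes / savedPerRun;
        System.out.println(breakEvenRuns + " runs to pay off");  // prints "48 runs to pay off"
    }
}
```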

Saving 10 minutes here and there might not seem like a big deal. But if you know how to identify and effectively automate more small tasks like this, the savings scale up very quickly and free up a lot of time for anything you like. For example, making comic t-shirts (;

There are many ways to automate things. This time, I will give you a real example of creating a Personal Mini Toolchain in Java. In general, a toolchain is a program that connects several tools together to perform a complex task. I call it a Personal Mini Toolchain because it exists to save your personal time and you can start in a very minimalistic way (and grow it from there). The great news is that once you understand the principles, you can apply them in any area or industry, using any technology you like.

Step 1 - Figure Out

The most important step is to figure out what is worthwhile and realistic for a toolchain to handle. Only then does it make sense to worry about building it. Worthwhile is about return on investment. Realistic means that the set of actions is simple enough and flows from start to end without much decision logic. Imagine you put the actions into a graph like the one in the image below. You have a better chance of automating the flow on the left side than the one on the right side.

Let’s look at a concrete example. As a hobby, I am creating my own Tyracorn game engine (the YouTube playlist is here). I started the whole project as a standard Java desktop application using the Maven build system. At a certain point, I decided to run it also on Android. So I opened up the Android developer portal and started to study. I learned how to create an Android project, how to set up an OpenGL context, and eventually made it work.

Now I had 2 projects, so whenever I made changes in the main project and wanted to test them on my phone, I had to do the following.

  • Open Android Studio
  • Copy and paste the source files
  • Build project
  • Install project on my phone

After I did this about 50 times, I got bored and started to figure out how a computer could do this for me. It is important that I did it so many times. It validated that the automation makes sense, and the manual exercise gave me enough insight to do the job.

Now, what is the lesson here? Start your automation by doing everything manually, again and again. This confirms that the effort is worth investing, and gives you the chance to simplify and deeply understand what you are doing. And remember, you don’t need to cover everything right from the beginning. You can start with a single little portion; it will grow naturally over time.

Step 2 - Create

Now, let’s focus on my concrete example. Because of the manual experience, I figured out that my mini toolchain will be helpful if it does the following.

  • Generate the Android project. Then I can open the Android Studio and look at it.
  • Build and install the project on my device. All this without the need to touch the Android Studio.
  • Have the ability to add more platforms and tasks later on.

I started by creating a new Maven project configured to produce an executable fat jar. The code takes the first two arguments from the command line, uses reflection to look up a class and method by them, and calls that method, passing the rest of the arguments as a list. It looks like this.

/**
 * Main entry to the application.
 *
 * @param args arguments
 */
public static void main(String[] args) {
    List<String> as = Arrays.asList(args);
    if (as.size() < 2) {
        printHelp();
        System.exit(1);
    }

    runCommand(as.get(0), as.get(1), as.subList(2, as.size()));

    System.out.println("");
    System.out.println("-----------------");
    System.out.println("Job Done!");
    System.out.println("-----------------");
}

/**
 * Runs the command.
 *
 * @param command command
 * @param method method
 * @param args arguments
 */
public static void runCommand(String command, String method, List<String> args) {
    try {
        String className = "com.tyracorn.toolchain." + StringUtils.capitalize(command) + "Command";
        Class<?> commandClass = Class.forName(className);
        Method m = commandClass.getMethod(method, List.class);
        m.invoke(null, args);
    } catch (ClassNotFoundException | NoSuchMethodException | SecurityException | IllegalAccessException |
            IllegalArgumentException | InvocationTargetException e) {
        throw new RuntimeException(e);
    }
}
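To make the dispatch concrete, here is a self-contained sketch of the same idea. The HelloCommand class and its greet method are hypothetical, used only to show how a command plugs in; the real toolchain looks classes up in the com.tyracorn.toolchain package.

```java
import java.lang.reflect.Method;
import java.util.Arrays;
import java.util.List;

public class CommandDispatchDemo {

    // Hypothetical command class; the naming convention mirrors the toolchain's
    // "<Capitalized command>Command" lookup.
    public static class HelloCommand {
        public static void greet(List<String> args) {
            System.out.println("Hello, " + String.join(" ", args) + "!");
        }
    }

    // Simplified lookup: capitalize the command name, append "Command",
    // resolve the class, and invoke the requested static method.
    public static void runCommand(String command, String method, List<String> args) throws Exception {
        String className = CommandDispatchDemo.class.getName() + "$"
                + command.substring(0, 1).toUpperCase() + command.substring(1) + "Command";
        Class<?> commandClass = Class.forName(className);
        Method m = commandClass.getMethod(method, List.class);
        m.invoke(null, args);
    }

    public static void main(String[] args) throws Exception {
        // Equivalent of calling: <tool> hello greet world
        runCommand("hello", "greet", Arrays.asList("world"));  // prints "Hello, world!"
    }
}
```

Adding a new command is then just a matter of creating a new class with static methods; no dispatch table has to be maintained.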

Then I studied the Android documentation to see if I could build and install the project on my phone just from the command line. It turned out that Google did a great job preparing Gradle tasks to do all of the above. And, as a bonus, I can also automatically sign the application with my secret keys before uploading it to the Play Store.

I used a plain Android project as a base. Then I replaced certain values in build.gradle with placeholders (e.g. applicationId is replaced by ‘$tyracorn.application.id’), which are filled in later by the toolchain. Then I packed the whole directory into a zip file and placed it into the toolchain project’s resource directory. In addition, in the project to be converted, I created a specific directory and placed a configuration file and additional resources (e.g. icons in various formats) there. The workflow is then as follows.

  • Clean up target directory
  • Unzip the template
  • Copy over the source code and assets
  • Copy over the additional Android specific resources
  • Override placeholders from the template by the actual values
  • Build project
  • Install on the phone
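The placeholder step can be illustrated with a simplified, string-based version of the merge. The toolchain’s Templates.merge works on files; this sketch only shows the substitution idea, and the values here are made up.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class TemplateMergeDemo {

    // Replaces each '$key' occurrence in the template with its value;
    // a simplified stand-in for the file-based Templates.merge.
    public static String merge(String template, Map<String, String> vars) {
        String result = template;
        for (Map.Entry<String, String> e : vars.entrySet()) {
            result = result.replace("$" + e.getKey(), e.getValue());
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> vars = new LinkedHashMap<>();
        vars.put("tyracorn.application.id", "com.example.showpark");  // illustrative value

        String gradleLine = "applicationId '$tyracorn.application.id'";
        System.out.println(merge(gradleLine, vars));  // prints "applicationId 'com.example.showpark'"
    }
}
```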

If you are interested, then here is the source code.

/**
 * Generates the android project.
 *
 * @param args arguments
 */
public static void generate(List<String> args) {
    try {
        // prepare directories
        System.out.println("preparing directories");

        File projDir = new File(args.get(args.size() - 1)).getCanonicalFile();
        Guard.beTrue(projDir.isDirectory(), "%s is not a directory", projDir.getAbsolutePath());
        File targetDir = new File(projDir, "target");
        if (!targetDir.isDirectory()) {
            targetDir.mkdirs();
            Guard.beTrue(targetDir.isDirectory(), "unable to create a target directory");
        }
        File androidProjectDir = new File(targetDir, "android-project");
        if (androidProjectDir.isDirectory()) {
            FileUtils.deleteDirectory(androidProjectDir);
        }
        Guard.beFalse(androidProjectDir.exists(), "android-project directory cannot exist at this point");

        // prepare properties
        Map<String, String> properties = new HashMap<>();
        for (Object key : System.getProperties().keySet()) {
            properties.put((String) key, System.getProperty((String) key));
        }
        for (int i = 0; i < args.size() - 1; ++i) {
            String arg = args.get(i);
            if (arg.equals("-c")) {
                Guard.beTrue(i < args.size() - 2, "config directory must be specified after -c argument");
                File cdir = new File(args.get(i + 1));
                Guard.beTrue(cdir.isDirectory(), "config directory must be an existing directory: %s", cdir);
                System.out.println("applying config: " + cdir.getAbsolutePath());
                Map<String, String> signingProps = Dut.copyMap(Props.load(new File(cdir, "signing.properties")));
                String storeFilePath = new File(cdir, signingProps.get("tyracorn.signing.storeFile")).getAbsolutePath().replaceAll("\\\\", "/");
                signingProps.put("tyracorn.signing.storeFile", storeFilePath);
                properties.putAll(signingProps);
                i = i + 1;
            }
        }

        // loading config files
        Config config = Config.load(new File(projDir, "src/main/platforms/android/config.json"));
        Pom pom = Pom.load(new File(projDir, "pom.xml"));

        // unpack template
        System.out.println("unpacking template");
        unzipResource("android-project.zip", targetDir);

        // copy files from main project
        System.out.println("copying source files from the main project");
        File srcSrcDir = new File(projDir, "src/main/java");
        File srcTargetDir = new File(targetDir, "android-project/app/src/main/java");
        FileUtils.copyDirectory(srcSrcDir, srcTargetDir);

        List<String> excludeClasses = config.getStringList("excludedClasses");
        for (String ec : excludeClasses) {
            File cfile = new File(srcTargetDir, ec.replaceAll("\\.", "/") + ".java");
            if (cfile.isFile()) {
                Guard.beTrue(cfile.delete(), "unable to delete file %s", cfile);
            }
        }

        System.out.println("copying asset files from the main project");
        File assetsSrcDir = new File(projDir, "src/main/assets");
        File assetsTargetDir = new File(targetDir, "android-project/app/src/main/assets/external");
        FileUtils.copyDirectory(assetsSrcDir, assetsTargetDir);

        System.out.println("copying app dir from the android configuration");
        File srcAppDir = new File(projDir, "src/main/platforms/android/app");
        if (srcAppDir.isDirectory()) {
            File targetAppDir = new File(targetDir, "android-project/app");
            FileUtils.copyDirectory(srcAppDir, targetAppDir);
        }

        // adjust template
        System.out.println("merging templates");
        String gid = pom.getGroupId();
        String aid = pom.getArtifactId();
        if (aid.contains("-")) {
            String[] parts = aid.split("\\-");
            aid = parts[parts.length - 1];
        }
        String appId = gid + "." + aid;
        String appVersion = pom.getVersion().replace("-SNAPSHOT", "");
        Guard.beTrue(StringUtils.isNumeric(appVersion), "only a numeric version is supported, please look at the pom file: %s", appVersion);

        String signingStoreFile = properties.getOrDefault("tyracorn.signing.storeFile", "tyracorn-dev.jks");
        String signingStorePassword = properties.getOrDefault("tyracorn.signing.storePassword", "Password1");
        String signingKeyAlias = properties.getOrDefault("tyracorn.signing.keyAlias", "tyracorn-dev");
        String signingKeyPassword = properties.getOrDefault("tyracorn.signing.keyPassword", "Password1");

        Map<String, String> vars = Dut.map(
                "tyracorn.application.id", appId,
                "tyracorn.application.version", appVersion,
                "tyracorn.signing.storeFile", signingStoreFile,
                "tyracorn.signing.storePassword", signingStorePassword,
                "tyracorn.signing.keyAlias", signingKeyAlias,
                "tyracorn.signing.keyPassword", signingKeyPassword,
                "loadingScreen", config.getString("loadingScreen"),
                "startScreen", config.getString("startScreen"));

        System.out.println("merging build properties");
        File appGradleBuild = new File(targetDir, "android-project/app/build.gradle");
        Templates.merge(appGradleBuild, vars);

        System.out.println("merging launch screens");
        File mainActivity = new File(targetDir, "android-project/app/src/main/java/com/tyracorn/android/MainActivity.java");
        Templates.merge(mainActivity, vars);

    } catch (IOException e) {
        throw new RuntimeException(e);
    }
}

/**
 * Builds the android project.
 *
 * @param args arguments
 */
public static void build(List<String> args) {
    generate(args);
    try {
        File projDir = new File(args.get(args.size() - 1)).getCanonicalFile();
        File targetDir = new File(projDir, "target");
        File androidProjectDir = new File(targetDir, "android-project");

        // Windows Gradle wrapper; on Unix-like systems this would be "gradlew"
        String gradlePath = androidProjectDir.getCanonicalPath() + File.separator + "gradlew.bat";

        System.out.println("building the project");
        String buildRes = Cmds.executeSimple(androidProjectDir, gradlePath, "build");
        System.out.println(buildRes);
    } catch (IOException e) {
        throw new RuntimeException(e);
    }

}

/**
 * Installs the project to the device.
 *
 * @param args arguments
 */
public static void install(List<String> args) {
    build(args);
    try {
        File projDir = new File(args.get(args.size() - 1)).getCanonicalFile();
        File targetDir = new File(projDir, "target");
        File androidProjectDir = new File(targetDir, "android-project");

        // Windows Gradle wrapper; on Unix-like systems this would be "gradlew"
        String gradlePath = androidProjectDir.getCanonicalPath() + File.separator + "gradlew.bat";

        System.out.println("installing the project to the device");
        String installRes = Cmds.executeSimple(androidProjectDir, gradlePath, "installRelease");
        System.out.println(installRes);
    } catch (IOException e) {
        throw new RuntimeException(e);
    }
}
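Cmds.executeSimple is a helper whose source is not shown here. As an assumption about its behavior, a minimal equivalent built on ProcessBuilder might look like this; the real helper may differ.

```java
import java.io.File;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

public class CmdsSketch {

    // Runs a command in the given working directory and returns its combined
    // stdout/stderr output as a string.
    public static String executeSimple(File workDir, String... command)
            throws IOException, InterruptedException {
        ProcessBuilder pb = new ProcessBuilder(command);
        pb.directory(workDir);
        pb.redirectErrorStream(true);  // merge stderr into stdout
        Process p = pb.start();
        String output = new String(p.getInputStream().readAllBytes(), StandardCharsets.UTF_8);
        p.waitFor();
        return output;
    }

    public static void main(String[] args) throws Exception {
        // Example invocation; "echo" exists on most Unix-like systems.
        System.out.print(executeSimple(new File("."), "echo", "hello"));
    }
}
```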

There is also a part which allows me to specify a directory with production signing keys. This lets me use the same tool when making a package for the Play Store while keeping the secrets out of the application’s version control system. I added this later on.

Now, I know this is far from perfect, but it does the job. It saves me time. So I decided it is good enough for now and moved on to the next thing. Knowing when to stop is an important part of the programming job. Not everything needs to be super generic and perfect. The main point is that it serves you.

Step 3 - Use and Maintain

With the compiled jar file, usage is very simple. Open the command line and call java -jar jar-file-path.jar android install path\to\project\dir. The toolchain then generates the Android project, builds it, and installs it on the connected phone. Alternatively, you can replace install with build or generate to stop earlier in the process. In addition, adding -c path\to\config\dir after the install applies an external configuration (e.g. signs the apk with the production keys).

Maintenance is very simple. Unless Google decides to change the build process (which does not happen that often), it’s only about using the toolchain and adding new capabilities when they become useful. And that’s very easy to do because the code is not trying to be generic, configurable, and scalable to millions of users. This is the type of code which is focused only on you.

If you have read up to here, then you can download my Tyracorn Showpark application, which was fully built by this toolchain. It is a showcase application for my game engine. I know the graphics aren’t nice at the time of writing this article, but the system works well. Eventually, I will hire a freelancer, using the system described in the Mastering Freelancers guide, to make it pretty.

Get it on Google Play

Summary

Now you have seen how to quickly build a toolchain. The most important thing is to figure out which piece to automate and to simplify it as much as possible. Then start by creating a simple toolchain for that one purpose. Don’t try to cover everything; that will come over time. And finally, use it a lot to create more time for whatever you love.

How to successfully apply coding standards

Many teams and companies are terrible at following coding standards. The code is written in a messy way, without meaningful tests, and nobody even thinks about the documentation. The excuses are always the same: tight deadlines, our case is too special, or bad management. The results are delays, overtime, recurring bugs, stress, and failed projects.

Luckily, there are teams which do pretty well. They write pretty code with a lot of bulletproof unit tests and awesome documentation. And they can do that only because every team member works in the same style. They follow coding standards. I know this doesn’t sound sexy. But it works.

Therefore, in this article, I am going to explain how to prepare, actually use, and maintain coding standards. This text is mainly intended for senior developers and team leaders. It might take you several months to go through all the phases.

Preparation

This is hard work and at the same time necessary for success. The good news is that once you finish the preparation, you have a personal asset which you can reuse for years. I have been successfully using the same standards, with updates, again and again for more than 6 years.

The main goals for preparation are the following.

  • Make your standards great.
  • Remove every possible excuse to not follow them.
  • Get ready to defend them.

Write it down

If it’s not in writing, then it doesn’t exist.

Start with writing down everything you have in your head. The form doesn’t matter. You can start in Word, Google Docs, or even a plain text editor. I personally prefer Notepad++ and use Pandoc markdown syntax for this type of text. If you have never done this before, you will discover that it takes time to tidy things up. Let me give you some tips for writing.

Start with these 3 main topics.

  • How to code
  • How to test
  • How to document

Be imperative. You are defining standards and expecting other people to follow them. Say it straight and in a clear way. One way to do this is with boxes containing rules, like this one.

Add extra rules for negotiation. This is especially useful if you want to apply the standards to an existing team. Chances are that someone will start complaining. Then you can offer such rules as concessions during negotiation.

Include a lot of examples. When you make an example, make sure it always complies with your text. For example, this one.

Keep it short, under 100 pages. If your standards have 300 pages, most people won’t even be able to read them. And then it’s impossible to ask anyone to understand and follow such a text.

Focus on a priority. Choose one, and structure the rules around it. Everything else just needs to be reasonably good. One example of a priority is safe code with a guaranteed low number of bugs; speed, memory optimization, and code compactness are not priorities (as long as they are good enough). This is suitable for projects with very complicated logic, where the other factors can be solved by scaling up the environment. Another example is super low latency code, where direct array access has higher priority than code safety. This is suitable for real-time processing on embedded devices, where every microsecond counts.

Don’t waste real estate on things which can be automated, for example code formatting. There is no need to write down the details about every bracket. This can be done automatically by the IDE. Therefore, in your standards, make a single rule specifying what is considered “proper” formatting and make an export of such rules. Then attach it to the standards so other people can just import it into their IDE. An example of a formatting rule follows.

Use minimal language syntax to reach your goal. Coding standards are not a textbook for teaching a programming language. The fewer language features you need to cover, the easier it is for the reader to understand and follow. In fact, it is good to prohibit some patterns or language features by rule.

Make it easy to read. This is easy to say and hard to do. Just realize that you are not writing a novel. The purpose is to make sure everyone can understand and follow the text. You don’t want to entertain, be super polite, or sexually neutral. If you have some money to spend, it’s worth letting a professional editor help you.

Finally, if you are working with Java, or just want some inspiration, then Robust Java Standards might be a good fit for you. I have put serious effort and many years of coding, testing, and team leading experience into this book. And these are the actual standards I am using on a daily basis.

Testing

You have written everything. This means you are ahead of most other people.

The next step is to verify that you can follow your standards easily and that they cover everything you need. You need to master your own standards.

And one more important point. The whole point of the standards is to create a long-term benefit for the whole team, compared to the situation without them. When you finish testing, you have to be truly sure that this is the case. And you have to be able to explain why. If you can’t do that, then go back to the writing phase and improve.

The best way to start testing is by writing a private side project. It’s all about writing a big volume of various code exactly according to the standards. More is better. The bare minimum should be approximately the equivalent of one month of full-time work.

While doing this, watch for these points.

  • Do you need to break any rule to reach a goal?
  • Is there any unacceptable hit caused by the standards (e.g. performance, memory, or logic)?
  • Is there any rule which is already bothering you?

If you answer yes to any of these questions, then go back to the writing phase, tweak the standards, and test again. Seriously, it would be a miracle if you made everything perfect on the first shot. Feel free to make a number of iterations here. In my case, it took roughly 4 months to produce a useful version.

If you can clearly answer no to all these questions, then congratulations!

Now you have the following.

  • Standards in written, tested form.
  • Certainty that the team as a whole will benefit from them, and the ability to clearly explain why.

Therefore, you can be confident using the standards in real projects and standing behind them. Once someone tries to put them down (this will happen sooner or later), you will be able to defend them.

Provide early access

Readers need to have easy access. Otherwise, the lack of access becomes a good excuse. The simplest way is to share a link within the company system or via tools like Google Docs.

Once you are done with all these, then you are ready for the next phase.

Applying

This phase really depends on the situation you are in. The best case is that you are starting a new project as the leader. Here, I will present several strategies. You will need to adjust and combine them to fit your situation. If you are creative and hardworking enough, then sooner or later you will be able to put your standards in place.

Strategy 1 – enforce from the beginning. This strategy works if you are in a leading position and the project is at its beginning. Set up a meeting, walk through the standards, and ask everybody in the team to follow them. Be ready for questions. Strongly shut down any attempt to break a priority point. If you don’t know what to answer, the universal answer is that you are the boss and you carry the responsibility for the decision; don’t use this too often. In the end, ask everyone for commitment.

Strategy 2 – follow for new code and update on touch. This strategy is helpful when you are the team leader of a running project. Call a meeting and walk through the standards. Then ask everyone to write new code complying with them and to bring any portion they touch up to the standards. Same as in strategy 1: be ready for questions, and ask everyone for commitment in the end.

Strategy 3 – grow it up. This strategy is suitable when you are a regular team member. And in this particular case, it’s actually even better when the project is struggling. Understandably, that’s a bad situation, but it’s an opportunity to step out of the crowd. By the way, that’s how I applied my own standards for the first time. Start by finding an isolated part of the code and taking exclusive ownership. A smaller portion is better. Then make the change. This should, after the initial investment, save a portion of your time. Then just do it again. And again. Until other people start recognizing you. Then you have an open door to scale up.

Strategy 4 – rewrite everything. This is suitable when you are the leader, or when your standards have been recognized as a better way of coding. Here you take the whole project and transform it into a form which complies with the standards. Many times this is the final step after starting with strategy 2 or 3.

Strategy 5 – don’t ask for permission. This involves some risk, so be ready for the case it doesn’t work out. The advantage is that you can do this anytime, unless there is a gatekeeper who constantly reviews your work. Simply write your code according to the standards, without asking anyone for permission, and take it as the normal way you work. Do it until you get into a conflict with someone else. That is the time to put what you have on the table and prove that it’s for the benefit of the whole team. From there, it’s about conflict resolution.

Strategy 6 – wait for another opportunity. Hopefully, the reason is that the project is going really well and there is no point in changing anything. In that case, instead of introducing brand new standards, you can improve the existing ones and grow with them. That’s pretty good, because you can build up your track record and use the knowledge for your next big thing. The other extreme might be that leadership shuts down any effort for improvement while the project is struggling. I know this sucks, regardless of the reason behind it. In such a case, it’s better to minimize your effort on this ship and look for another opportunity.

Regardless of the strategy you choose, you can consider this phase successful as soon as some reasonable piece of the product is written according to your rules. And other people must respect that. You don’t necessarily need to be their boss, but you need to have the definite word about that concrete part.

If you have that, then you are ready for further expansion. And it makes sense for you to go for the next phase.

Long term maintenance

All the work up to now is useful only if you can benefit from it for a long time. To do this, there are 2 things to care about: make sure the rules are not fading out, and make sure they stay current.

Code reviews

Code reviews are the primary instrument for making sure the rules are being followed. Naturally, this requires more effort at the beginning, which gradually decreases as people learn.

Some background. The common project layout is that people commit to a repository (to understand the following text, you will need some knowledge of Mercurial or Git). These repositories support branches, where people typically make their commits. Later, branches are merged into the main branch as a set of commits which implement a particular request.

I prefer to separate people into 3 groups, based on the way the merge is done.

  1. People who are verified and doing great. They can merge to the main branch without asking. Typically these are people I have worked with for a long time and with whom I have mutual trust.
  2. People who need guidance. They need to ask for approval before merging. I review the code and give them permission, or a list of things to change. The process repeats as long as necessary. Eventually, these people move to group #1. This setup is typical for people who are new in the team.
  3. Always review. This is the group which requires review all the time. I use a repository setup which prevents these people from merging. Then I do the same review as for #2 and merge on my own (or ask someone from the previous groups). This setup is handy for vendor-type partners, where I may not even know who is doing the actual work. And these people might switch very often.

How to write the list of things to change: make it simple, just location and comments. For example, this way.

  • UserService – line 54 – input parameter XYZ cannot be used in the interface
  • Account, Car – consistency validation is missing
  • Account – line 123 – the getter returns an unsafe object which might cause side effects; make a defensive copy or return it with an unmodifiable wrapper
  • … other points
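The defensive-copy point from the list above can be shown with a small, hypothetical Account class (the names here are illustrative, not taken from any real review):

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class Account {

    private final List<String> transactions = new ArrayList<>();

    public void addTransaction(String tx) {
        transactions.add(tx);
    }

    // The unsafe variant would be: return transactions;
    // Callers could then mutate internal state as a side effect.
    public List<String> getTransactions() {
        // Safe: expose the internal list through an unmodifiable wrapper.
        return Collections.unmodifiableList(transactions);
    }

    public static void main(String[] args) {
        Account account = new Account();
        account.addTransaction("deposit 100");
        try {
            account.getTransactions().add("withdraw 9999");
        } catch (UnsupportedOperationException e) {
            System.out.println("external mutation rejected");  // prints "external mutation rejected"
        }
    }
}
```

Making a defensive copy (new ArrayList<>(transactions)) is the alternative when callers need a list they can modify independently.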

If the list is long, or some priority points are on it, then it’s good practice to walk through it with the person who did the work. In some cases it makes sense to do this formally, so you can track the difference in code quality between reviews. Over time, this list should shrink significantly and shouldn’t contain any priority points.

Finally, it’s good to know when to stop. Wasting too much time on a minor stylistic issue which has nothing to do with the actual quality of the work is not good. So tolerate that 1% (these are the extra rules you prepared up front). People are not robots.

Making exceptions

From time to time you might get into a situation where you have to decide whether or not to make an exception. And here I mean the type of exception which breaks one of the important rules. Here is how to handle it.

  1. Take time to think about it. Never allow an exception right away.
  2. Try on your own to figure out how to satisfy the goal while keeping the standards in place.
  3. If you succeed, then your answer is no, and you need to support it with your solution.
  4. If you can’t, then figure out the minimal possible break of the standards and allow just that in this particular case. Nothing more.

And remember, making an exception should happen very rarely. If you find yourself considering an exception once a month or so, then there is something wrong with the standards.

Hiring process

This is the best option. If you can, give the candidate your standards to read and then let him write a little bit of code following them. You will quickly know whether he is capable and willing to code according to them. If yes, then you can invite him for an interview. During the interview, cover the topic of the company standards and ask him to respect them. If he says yes, then you have just made a great deal.

Updates and improvements

Updates and improvements are in the nature of every living product, and coding standards are no different. Therefore, I recommend constantly looking for ways to make things better.

Tip: Good candidates are changes which reduce the amount of code while keeping the quality.

Before you make an update, realize that you already have a solid base. Don’t rush it.

Test first. Every update can be tested in a separate project, or at least in a limited scope. If the update turns out to be bad, then it’s easy to roll back. On the other hand, if the update turns out to be good, then you are ready to spread it across everything.

Making the update. First, write it down. Then announce it. During the announcement, demonstrate the update and ask everyone to write new code according to it, and to update the old code when a particular part is touched. You might do this by email or in person. In the end, ask everyone to confirm the message.

Conclusion

Having quality coding standards in place is really a great investment. You do the hard work at the beginning, and then you can live off it for years.

While protecting all the rules, please stay human and be grateful to the people who are working hard. Saying thank you, giving reasonable praise, and occasionally surprising your teammates in a nice way all help to create a pleasant environment.

Just do it, and enjoy your life!

Brittle POJO

POJO, or JavaBean, is a popular pattern in Java for data holders. These are simple objects whose main purpose is to hold data in memory. Many frameworks use them. And they can be created very fast; every IDE has a feature to generate getters and setters. Unfortunately, such an easy pattern is designed to produce brittle code. This will cause you more and more trouble as the project grows and gets more complicated. This article demonstrates several issues that come with the POJO pattern.

Note: Precondition for further reading is a basic understanding of Java and JUnit.

Example 1

Let’s define a Rectangle class like this.

public class Rectangle {

    private double width;
    
    private double height;

    public double getWidth() {
        return width;
    }

    public void setWidth(double width) {
        this.width = width;
    }

    public double getHeight() {
        return height;
    }

    public void setHeight(double height) {
        this.height = height;
    }
    
}

Then, what about the equality of 2 rectangles? Naturally, I would expect 2 rectangles to be equal as soon as they are the same in width and height. The following is the actual behavior of the Rectangle class defined above. Comments on the right show the console output.

Rectangle rect1 = new Rectangle();
rect1.setWidth(1);
rect1.setHeight(2);
Rectangle rect2 = new Rectangle();
rect2.setWidth(1);
rect2.setHeight(2);

System.out.println(rect1.equals(rect1));      // true
System.out.println(rect2.equals(rect2));      // true
System.out.println(rect1.equals(rect2));      // false
System.out.println(rect2.equals(rect1));      // false

As you see, the different instances of the Rectangle are not considered equal, although they are same in width and height. This for example affects the behavior of collections.

List<Rectangle> list = new ArrayList<>();
list.add(rect1);
System.out.println(list.contains(rect1));     // true
System.out.println(list.contains(rect2));     // false

Set<Rectangle> set = new HashSet<>();
set.add(rect1);
System.out.println(set.contains(rect1));     // true
System.out.println(set.contains(rect2));     // false

Map<Rectangle, Integer> map = new HashMap<>();
map.put(rect1, 2);
System.out.println(map.containsKey(rect1));     // true
System.out.println(map.containsKey(rect2));     // false

Here the contains method always returns false, unless the tested object is the very instance which was added in. Same for the map keys. That means you can’t test whether 2 collections are equal unless they hold the exact same object instances. This creates huge complications in unit tests. For example, let’s define a RectangleParser interface.

public interface RectangleParser {
    public List<Rectangle> parse(String input);
}

Next, imagine there is an implementation called SuperParser which needs to be tested. Because there is no way to test the equality of 2 rectangle objects, all the properties have to be compared manually. Something like this.

RectangleParser parser = new SuperParser();
List<Rectangle> rects = parser.parse("rectangle[1,2],rectangle[2,3]");
assertEquals(2, rects.size());
assertEquals(1d, rects.get(0).getWidth(), 0d);
assertEquals(2d, rects.get(0).getHeight(), 0d);
assertEquals(2d, rects.get(1).getWidth(), 0d);
assertEquals(3d, rects.get(1).getHeight(), 0d);

And this is very often the reason why many developers don’t write unit tests at all. Very typical excuses are:

  • We don’t have time for unit tests
  • Business logic in our project is too complicated
  • I am a developer, not a tester

Many others just write unit tests for the simplest units, or end up comparing, for example, only list sizes without having any clue about the objects inside. Such tests are just good for showing a green bar to non-tech managers and don’t bring any real value to the project. What brings real value to the project are unit tests of the most complex units with deep comparison. Unfortunately, the lack of an equals method in data holding objects makes it impossible to create them. Therefore let’s improve that.

Example 2

Adding the equals method. This method comes together with the hashCode method. For those who haven’t done this yet, I would recommend spending 5 minutes and reading about them in the Javadoc. Here is the next version of the Rectangle class.

public class Rectangle {

    // ... same properties with getters and setters as before

    @Override
    public int hashCode() {
        return (int) width + 13 * (int) height;
    }

    @Override
    public boolean equals(Object obj) {
        if (obj == null) {
            return false;
        }
        if (!(obj instanceof Rectangle)) {
            return false;
        }
        Rectangle other = (Rectangle) obj;
        return other.width == width && other.height == height;
    }

}

The outcome of the previous code after adding the equals and hashCode methods.

Rectangle rect1 = new Rectangle();
rect1.setWidth(1);
rect1.setHeight(2);
Rectangle rect2 = new Rectangle();
rect2.setWidth(1);
rect2.setHeight(2);

System.out.println(rect1.equals(rect1));      // true
System.out.println(rect2.equals(rect2));      // true
System.out.println(rect1.equals(rect2));      // true
System.out.println(rect2.equals(rect1));      // true

List<Rectangle> list = new ArrayList<>();
list.add(rect1);
System.out.println(list.contains(rect1));     // true
System.out.println(list.contains(rect2));     // true

Set<Rectangle> set = new HashSet<>();
set.add(rect1);
System.out.println(set.contains(rect1));     // true
System.out.println(set.contains(rect2));     // true

Map<Rectangle, Integer> map = new HashMap<>();
map.put(rect1, 2);
System.out.println(map.containsKey(rect1));     // true
System.out.println(map.containsKey(rect2));     // true

You see, the rectangles are considered to be equal and it is possible to test whether a collection contains a specific one. The test case for RectangleParser can then be rewritten this way.

RectangleParser parser = new SuperParser();
List<Rectangle> expected = Arrays.asList(... insert the rectangles...);
List<Rectangle> rects = parser.parse("rectangle[1,2],rectangle[2,3]");
assertEquals(expected, rects);

This is much better, because the objects are deeply compared. Such tests are much more robust than the previous ones, so developers can seamlessly catch and fix the (side) effects of code changes. Seems like the problem is solved. Unfortunately, this way brings another issue. Look at this.

Rectangle rect1 = new Rectangle();
rect1.setWidth(1);
rect1.setHeight(2);
Rectangle rect2 = new Rectangle();
rect2.setWidth(1);
rect2.setHeight(2);

Set<Rectangle> set = new HashSet<>();
set.add(rect1);
System.out.println(set.contains(rect1));     // true
System.out.println(set.contains(rect2));     // true

rect1.setWidth(5);

System.out.println(set.contains(rect1));     // false
System.out.println(set.contains(rect2));     // false

Now, a rectangle was inserted into the set. Since both rectangles are equal, the set returns true when the contains method is called. Next, the original rectangle was changed. That means its hash code value changed as well. But the set isn’t aware of this change. That means it keeps the object in the wrong bucket. And therefore it looks like the rectangle disappeared from the set. In this example it is easy to spot, but it’s very hard to find when the same situation happens in a large system. This means that an invocation of a public method can easily break a completely different portion of the application.

This is the problem of all mutable patterns. You might solve it by a convention saying that no one is going to call a setter after the object is constructed. It might or might not work out for you. I have personally chosen not to rely on such a convention.

Example 3

What about the popular inheritance? Let’s extend the Rectangle class and add a color in there.

public class ColorRectangle extends Rectangle {

    private int color;

    public int getColor() {
        return color;
    }

    public void setColor(int color) {
        this.color = color;
    }

    @Override
    public int hashCode() {
        return (int) getWidth() + 13 * (int) getHeight() + 169 * color;
    }

    @Override
    public boolean equals(Object obj) {
        if (obj == null) {
            return false;
        }
        if (!(obj instanceof ColorRectangle)) {
            return false;
        }
        ColorRectangle other = (ColorRectangle) obj;
        return other.getWidth() == getWidth() && other.getHeight() == getHeight() && other.color == color;
    }

}

And again a small test.

Rectangle rect1 = new Rectangle();
rect1.setWidth(1);
rect1.setHeight(2);
ColorRectangle rect2 = new ColorRectangle();
rect2.setWidth(1);
rect2.setHeight(2);
rect2.setColor(0x00ffff00);

System.out.println(rect1.equals(rect2));     // true
System.out.println(rect2.equals(rect1));     // false

The result is that rect1 is equal to rect2, and rect2 is not equal to rect1. That means the symmetric relation for the equals method is broken. And it is proven that if you extend a class and add an extra property to the child, then there is no way to make the equals method work according to the contract written in the Javadoc, unless the parent class is aware of the child. This can easily cause weird behavior which is hard to uncover.

Other Issues

Regarding consistency. POJO objects are not guaranteed to be consistent. Properties are set one by one after construction. An object might be in an invalid state, and validation has to be invoked somehow externally. This means another responsibility for the users. In addition, any later call of a setter might put the object into an invalid state again.

Regarding thread safety. POJO objects are not thread safe by definition. This brings another limitation to the users.

Conclusion

In this article I have demonstrated several issues with the POJO pattern. For those reasons I have decided to use purely immutable objects, with inheritance prohibited, as the main data holders. These objects might be constructed, for example, by the builder pattern or a static factory method. Like this.

Rectangle rect1 = new Rectangle.Builder().
        setWidth(1).
        setHeight(2).
        build();
Rectangle rect2 = Rectangle.create(1, 2);

The important point is that every object is guaranteed to be in a valid state and immutable for its whole lifetime. Therefore it is safe to use such objects in collections and in multi-threaded environments. The mentioned issues just don’t exist. POJOs are still good as a bridge to various frameworks, as long as they are used purely inside that integration layer and never ever leak into the core application code. If you would like to get more details about this topic, then Effective Java by Joshua Bloch is a great resource.
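As a sketch of that direction, here is a minimal immutable rectangle built through a static factory method. The class name and the validation rule are mine, for illustration only; a real implementation may differ.

```java
// Minimal immutable data holder sketch: final class, final fields, no setters.
// Values are validated once in the factory method and can never change afterwards,
// so equals/hashCode stay stable for the whole lifetime of the object.
final class ImmutableRectangle {

    private final double width;
    private final double height;

    private ImmutableRectangle(double width, double height) {
        this.width = width;
        this.height = height;
    }

    public double getWidth() {
        return width;
    }

    public double getHeight() {
        return height;
    }

    @Override
    public int hashCode() {
        return Double.hashCode(width) + 13 * Double.hashCode(height);
    }

    @Override
    public boolean equals(Object obj) {
        if (!(obj instanceof ImmutableRectangle)) {
            return false;
        }
        ImmutableRectangle other = (ImmutableRectangle) obj;
        return other.width == width && other.height == height;
    }

    // Static factory method: the object is guaranteed valid before anyone sees it.
    public static ImmutableRectangle create(double width, double height) {
        if (width < 0 || height < 0) {
            throw new IllegalArgumentException("dimensions must be non-negative");
        }
        return new ImmutableRectangle(width, height);
    }
}
```

Because no setter exists, the HashSet problem from Example 2 cannot happen: a stored key never changes its hash bucket.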

Bitcoin and JUnit

Bitcoin and blockchain are hot topics today. Many related projects are already out there and many more are being developed right now. If you are a developer in this area, then you know how important and tricky it is to have bulletproof testing in place. In this article I will briefly describe the options for testing bitcoin applications and then go into more detail on the one which you can run easily and offline. Please note that this article focuses purely on Bitcoin.

Testing Options

There are 3 possible modes or stages for testing bitcoin applications.

  1. Regtest mode
  2. Test on the testnet
  3. Test on the mainnet

Regtest mode is what I would recommend to start with. It allows you to run the node on your local computer just as a sandbox, completely isolated from the internet. And it has one very useful feature: you are fully in control of the block generation, and new blocks can be ‘mined’ just by calling a simple command. This removes the need to wait for blocks to be mined. Also, you have an unlimited number of coins to play with, because all the mining rewards go to the account on your local node. Testnet, as the name suggests, is a fully functional network which behaves almost the same as the real Bitcoin network. This includes real mining, the necessary waiting time, and a need to account for the activity of other people. The differences are that the coins don’t have any value, anyone can get a small amount of free test coins, and the whole network gets nuked from time to time. It’s a good next step after having a working system on regtest. Finally, mainnet is the network where the real transactions happen. Bugs can become very expensive here, so it’s better to leave this for the final test, after being confident that everything is working as planned. I believe that during application development you should walk through all 3 stages. The rest of this article will show how to connect regtest mode with JUnit. First, let’s get a little bit familiar with bitcoind, a program which implements the Bitcoin protocol and can act as a node.

Running Local Node

Start with the software installation. I prefer Bitcoin Unlimited. Download the latest version of the Official Bitcoin (BTC) Release for your platform and unzip / install it. Open a console and navigate to the folder with the binaries. A local node in regtest mode can be started with the following command.

  • bitcoind -regtest -txindex=1 -server -rpcallowip=127.0.0.1 -rpcport=18332 -rpcuser=user -rpcpassword=Password1

There are many parameters you can put into the command; the reference is here. The important ones are those which specify regtest mode, the data directory, and open the JSON-RPC port for clients. The next step is to open a second console, navigate to the same directory, and use the JSON-RPC client to perform actions. The following sequence generates blocks, sends coins to the specified address, and generates the next set of blocks to simulate transaction progress in the chain.

  • bitcoin-cli -regtest -rpcport=18332 -rpcuser=user -rpcpassword=Password1 generate 101
  • bitcoin-cli -regtest -rpcport=18332 -rpcuser=user -rpcpassword=Password1 getbalance
  • bitcoin-cli -regtest -rpcport=18332 -rpcuser=user -rpcpassword=Password1 sendtoaddress put_your_address_here 10
  • bitcoin-cli -regtest -rpcport=18332 -rpcuser=user -rpcpassword=Password1 generate 7

If you have made it up to here, then you should be able to test your application and restart the chain from the beginning (by deleting the data directory) whenever you want. For the moment, it is all done manually. The next chapter will show you how to automate this.

Regtest and JUnit

The example project is a standard Java Maven project which contains a single unit test and can be invoked from the command line just by mvn test. Bitcoin binaries for Windows are included. If you are using a different platform, then please download and replace the binaries.

Coding is pretty straightforward and can be summarized in the following points.

  • Clean up the data directory and start bitcoind process inside the test setup
  • Tested client connects to the node during the test case as needed
  • New blocks are generated on demand (there is a method for that)
  • Bitcoind process is stopped during the test tear down

Note: Depending on your environment you might need to deal with 2 issues – permissions and firewall.

Here is how you start a new bitcoind process.

ProcessBuilder processBuilder = new ProcessBuilder(BIN_DIR_PATH + "/bitcoind.exe", "-regtest",
        "-datadir=" + dataDir.getAbsolutePath(), "-txindex=1", "-server",
        "-rpcallowip=127.0.0.1", "-rpcport=18332", "-rpcuser=user", "-rpcpassword=Password1");
processBuilder.directory(new File("src/test/resources/bitcoinUnlimited-1.0.3/bin"));
try {
    bcProcess = processBuilder.start();
    Thread.sleep(5000);
} catch (IOException e) {
    throw new RuntimeException("error during process start", e);
} catch (InterruptedException e) {
    throw new RuntimeException(e);
}

The variable bcProcess is defined in the test class and is used in the tear down method to close the process. The 5 second thread sleep is trickier. The processBuilder.start() method returns immediately when the process is started. Unfortunately, bitcoind is not initialized at that point, so a connection would fail. In this particular case, a reasonable waiting time just does the job.
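If the fixed sleep turns out to be flaky on slower machines, one alternative is to poll the RPC port until it accepts a TCP connection. The helper below is my own sketch, using the same host and port as above; note that it only proves the socket is open, so the first RPC call may still need a retry.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

final class PortWaiter {

    // Polls the given TCP port until it accepts a connection or the timeout passes.
    // Returns true when the port became reachable, false on timeout.
    static boolean waitForPort(String host, int port, long timeoutMs) throws InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMs;
        while (System.currentTimeMillis() < deadline) {
            try (Socket socket = new Socket()) {
                socket.connect(new InetSocketAddress(host, port), 500);
                return true;
            } catch (IOException e) {
                Thread.sleep(200); // not up yet, retry shortly
            }
        }
        return false;
    }
}
```

In the setup, a call like PortWaiter.waitForPort("127.0.0.1", 18332, 30000) would then replace the fixed Thread.sleep(5000).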

Next, how to stop the process.

try {
    bcProcess.getInputStream().close();
    bcProcess.getErrorStream().close();
    bcProcess.getOutputStream().close();
    bcProcess.destroy();
    bcProcess = null;
    Thread.sleep(1000);
} catch (IOException e) {
    throw new RuntimeException(e);
} catch (InterruptedException e) {
    throw new RuntimeException(e);
}

That’s the whole “magic”. The rest of the code just cleans up the working directory before the test and invokes the actual test. For more details, please look at the source code.

Summary

As you can see, the Java code to run a bitcoin node and interact with it is easy, and it works. You can write test cases which run relatively fast and have a guaranteed environment. The portion which I don’t really like is the dependency on the native program, especially when imagining handling multiple chains on multiple systems. Then the amount of necessary binaries can grow significantly. I have a couple of ways in my mind to resolve that. Please let me know if you have an elegant solution for it.

Anomaly detection using the Bag-of-words model

I am going to show in detail one unsupervised learning method. The major use case is behavioral-based anomaly detection, so let’s start with that. Imagine you are collecting daily activity from people. In this example there are 6 people, \(S_1 - S_6\). When all the data is sorted and pre-processed, the result might look like this list.

  • \(S_1 =\) eat, read book, ride bicycle, eat, play computer games, write homework, read book, eat, brush teeth, sleep
  • \(S_2 =\) read book, eat, walk, eat, play tennis, go shopping, eat snack, write homework, eat, brush teeth, sleep
  • \(S_3 =\) wake up, walk, eat, sleep, read book, eat, write homework, wash bicycle, eat, listen music, brush teeth, sleep
  • \(S_4 =\) eat, ride bicycle, read book, eat, play piano, write homework, eat, exercise, sleep
  • \(S_5 =\) wake up, eat, walk, read book, eat, write homework, watch television, eat, dance, brush teeth, sleep
  • \(S_6 =\) eat, hang out, date girl, skating, use mother’s CC, steal clothes, talk, cheating on taxes, fighting, sleep

\(S_1\) is the set of daily activities of the first person, \(S_2\) of the second one, and so on. If you look at this list, then you can pretty easily recognize that the activity of \(S_6\) is somehow different from the others. That’s because there are only 6 people. What if there were 6 thousand? Or 6 million? There is no way you could recognize the anomalies. And that’s what machines can do. Once a machine can solve such a problem on a small scale, then it can usually handle the large scale relatively easily. Therefore the goal here is to build an unsupervised learning model which will identify \(S_6\) as an anomaly.

What is this good for? Let me give you 2 examples.

The first example is traditional audit log analysis for the purpose of suspicious activity detection. Let’s look at e-mail. Almost everyone has his own usage pattern on a day-to-day basis. If this pattern suddenly changes, then this is considered “suspicious”. It might mean that someone has stolen your credentials. But it can also mean that you just changed your habits. Machines can’t know the underlying reason. What machines can do is analyze millions of accounts and pick out only the suspicious ones, which is typically a very small number. Then an operator can manually call these people and discover what is going on.

Or imagine you are doing pre-sales research. You employ an agency to make a country-wide survey. And there is a question like ‘Please give us 40-50 words of feedback’. Let’s say you have got 30,000 responses which satisfy the length. Now you want to choose the responses which are somehow special, because they might be extremely good, extremely bad, or just interesting. All of these give you valuable insight and possibly direction for the future. Since the overall amount is relatively high, any human would certainly fail in this job. For machines, this is just a piece of cake.

Now let’s look at how to teach the machine to do the job.

The example project (the button for download is right above this paragraph) is a standard Java Maven project. Unpack it into any folder, compile it with ‘mvn package‘, and run it by executing ‘java -jar target/anomalybagofwords-1.0.jar 0.5 sample-data-small.txt‘. If you run the program this way, it will execute the described process over the cooked data set and identify \(S_6\) as an anomaly. If you want to drill down into the code, then start with the ‘BagofwordsAnomalyDetectorApp‘ class.

Terminology

Let’s briefly establish useful terminology.

Bag of words is a set of unique words within a text, where each word is paired with the number of its occurrences. One specific point is that the order of words is ignored by this structure. If a word is not present in the text, then its occurrence count is considered to be \(0\). For example, the bag of words for ‘eat, read book, ride bicycle, eat, play computer games, write homework, read book, eat, brush teeth, sleep‘ can be written as the following table.

Word     | Number of occurrences
---------|----------------------
eat      | 3
read     | 2
book     | 2
ride     | 1
bicycle  | 1
play     | 1
computer | 1
games    | 1
write    | 1
homework | 1
brush    | 1
teeth    | 1
sleep    | 1

Sometimes you can find the visualization as a histogram. For example this one.

The notation \(B(x)\) will be used for a bag of words. Following is the example for \(S_1\).

\(B(S_1) = \left( \begin{array}{cc}
eat & 3 \\
read & 2 \\
book & 2 \\
ride & 1 \\
bicycle & 1 \\
play & 1 \\
computer & 1 \\
games & 1 \\
write & 1 \\
homework & 1 \\
brush & 1 \\
teeth & 1 \\
sleep & 1 \end{array} \right)\)

The next term is the distance between 2 bags of words. The distance will be written as \(|B(x) - B(y)|\) and is calculated as the sum of the absolute values of the differences for all words appearing in either bag. Here is an example.

\(|B(read\ article\ and\ book) - B(write\ book\ and\ book)| = \\
= \left|
\left( \begin{array}{cc}
read & 1 \\
write & 0 \\
article & 1 \\
and & 1 \\
book & 1 \end{array} \right) -
\left( \begin{array}{cc}
read & 0 \\
write & 1 \\
article & 0 \\
and & 1 \\
book & 2 \end{array} \right) \right| = \textbf{4}\)

Applying this definition, you can calculate the distance between all the example sequences. For example, \(|B(S_1) - B(S_2)| = \textbf{12}\) and \(|B(S_1) - B(S_6)| = \textbf{30}\). The latter is higher, because \(S_1\) and \(S_6\) differ in more words than \(S_1\) and \(S_2\) do. This is an analogy to the distance between 2 points in space.
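The two definitions above can be sketched in a few lines of Java; the class and method names are mine, not taken from the example project.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

final class Bags {

    // Builds the bag of words: each word mapped to its number of occurrences.
    static Map<String, Integer> bag(String text) {
        Map<String, Integer> res = new HashMap<>();
        for (String word : text.split("\\s+")) {
            res.merge(word, 1, Integer::sum);
        }
        return res;
    }

    // Distance |B(x) - B(y)|: sum of absolute differences of occurrence
    // counts over all words appearing in either bag.
    static int distance(Map<String, Integer> b1, Map<String, Integer> b2) {
        Set<String> words = new HashSet<>(b1.keySet());
        words.addAll(b2.keySet());
        int res = 0;
        for (String word : words) {
            res += Math.abs(b1.getOrDefault(word, 0) - b2.getOrDefault(word, 0));
        }
        return res;
    }
}
```

For the worked example, Bags.distance(Bags.bag("read article and book"), Bags.bag("write book and book")) returns 4, matching the calculation above.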

The last term is the probability density function. A probability density function is a continuous function defined over the whole real number space which is greater than or equal to zero for every input and whose integral over the whole space is 1. The notation \(P(x)\) will be used. More formally, this means the following.

\(P(x) \ge 0 \quad \forall x \in \mathbb{R} \\
\int_{\mathbb{R}}P(x)\,dx = 1\)

A typical example of a probability density function is the normal distribution. The example source code uses a more complex one, called a normal distribution mixture. The parameter \(x\) is called a random variable. In a very simplistic way, the higher \(P(x)\) is, the more “likely” the variable \(x\) is. If \(P(x)\) is low, then the variable \(x\) falls away from the standard. This will be used when setting up the threshold value. Finally, let’s make a note about how to create a probability density from a finite number of random variables. If \([x_1, …, x_N]\) is a set of N random variables (or samples you can collect), then there is a process called estimation which transforms this finite set of numbers into a continuous probability density function \(P\). An explanation of this process is out of scope for this article; just remember there is such a thing. In particular, the attached example uses a variation of the EM algorithm.

Normal distribution mixture estimated from 5000 samples.

Process

Now it’s time to explain the process. The whole process can be separated into 2 phases called training and prediction. Training is the phase where all the data is iterated through and a relatively small model is produced. This is usually the most time consuming operation, and the outcome is sometimes called a predictive model. Once the model is prepared, the prediction phase comes into place. In this phase an unknown data record is examined by the model. Next, let’s drill down into the details.

Training phase

There are 2 required inputs for the training phase.

  • Set of activities \([S_1, …, S_N]\). This might be the example set from the beginning.
  • Sensitivity factor \(\alpha\), which is just a number initially picked by a human such that \(\alpha \geq 0\). More on this one later.

The whole process is pretty straightforward and you can find the implementation in the source code, class BagofwordsAnomalyDetector, method performTraining.

  1. For each activity, calculate a bag of words. The result of this step is \(N\) bags of words \([B(S_1), …, B(S_N)]\)
  2. Calculate the random variables. One random variable is calculated for each bag of words. The result of this step is \(N\) random variables \([x_1, …, x_N]\). The formula for the calculation is the following
    \(x_i = \frac{\sum_{j=1}^N |B(S_i) - B(S_j)|}{N} \quad \forall i = 1..N\)
  3. Estimate the probability density function \(P\). This process takes the random variables \([x_1, …, x_N]\) and produces the probability density function \(P\). A variation of the EM algorithm is used in the example program.
  4. Calculate the threshold value \(\theta\). The value is calculated according to the following formula.
\(\theta = \frac{\sum_{i=1}^N P(x_i)}{N} * \alpha\)

Regarding the sensitivity factor \(\alpha\). The higher \(\alpha\) is, the more activities will be identified as anomalies. The problem with an unsupervised learning model is that the data is not labeled, and therefore there is no way to know what the correct answers are and how to set up the optimal \(\alpha\). Therefore some rules of thumb are used instead. For example, set up \(\alpha\) to report a reasonable percentage of activities as anomalies. Typically it is required that the amount of identified anomalies must be manageable by the human investigators. In a bigger system there is usually a feedback loop which incrementally adjusts \(\alpha\) until the optimal value is reached. This is then called reinforcement learning. For this small example, \(\alpha\) was picked manually by trial and error just to reach the goal.

  5. Store all the bags of words, \(P\), and \(\theta\) for later usage.

When the training phase finishes, the model is ready to be used in the prediction phase.
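Steps 2 and 4 can be sketched as follows, assuming the pairwise bag distances have already been computed and the density estimate is supplied from outside (the estimation itself, an EM variant in the project, is out of scope here; all names are illustrative).

```java
import java.util.function.DoubleUnaryOperator;

final class TrainingSketch {

    // Step 2: x_i = average distance from bag i to all bags,
    // computed here from a precomputed pairwise distance matrix
    // where dist[i][j] = |B(S_i) - B(S_j)|.
    static double[] randomVariables(int[][] dist) {
        int n = dist.length;
        double[] xs = new double[n];
        for (int i = 0; i < n; ++i) {
            double sum = 0;
            for (int j = 0; j < n; ++j) {
                sum += dist[i][j];
            }
            xs[i] = sum / n;
        }
        return xs;
    }

    // Step 4: theta = alpha * mean of P(x_i) over the training variables.
    static double threshold(double[] xs, DoubleUnaryOperator density, double alpha) {
        double sum = 0;
        for (double x : xs) {
            sum += density.applyAsDouble(x);
        }
        return sum / xs.length * alpha;
    }
}
```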

Prediction phase

This is the phase when potentially unseen activities are tested by the model. The model evaluates them and returns whether the activities are considered an anomaly or not.

The whole process works on each activity \(S_U\) separately (U stands for “unknown”), and it can be summarized by these points.

  1. Calculate the bag of words \(B(S_U)\)
  2. Calculate the random variable \(x_U\) as
    \(x_U = \frac{\sum_{i=1}^N |B(S_i) - B(S_U)|}{N}\)
  3. If \(P(x_U) \le \theta\), then the activity \(S_U\) is considered an anomaly. Otherwise the activity is considered normal.

Summary

You have learned about a relatively simple model for identifying unusual sequences within a bulk of them. Now you can play with the source code, try different variations, and see how this affects the result. Here are a few ideas to start with.

  • Normalize the bags of words. In other words, don’t count the absolute numbers, just the relative frequencies.
  • Use chunks of more than one word. This is then called an n-gram model.
  • Try to implement different ways to measure the distance between items, for example sequence alignment.

Key takeaways

  • There is no knowledge about what the correct outcome is at the beginning of unsupervised learning. Therefore a best guess and possibly a feedback loop are implemented.
  • Predictive models are usually built in the training phase and then used to classify the unknown data in the prediction phase.
  • In order to be able to find the outliers, abstract features like sentences or actions need to be transformed into a measurable form. After that, probability and statistics are used to establish the baselines and find the outliers.

Unit-Level Performance tuning in Java

When it comes to performance testing, I hear a lot about having a dedicated environment, funky tools like JMeter or Apica, and complicated scenarios. These take a lot of effort to set up and maintain. Therefore, I like to first make sure that the most critical units are well-optimized without any of these tools. One way to do this is through unit-level performance test apps. What’s great about these apps is that there is no need for any special tool, they can be ready to go within minutes, and they are proven to save a lot of time, money, and calls from angry customers.

In this article, I am going to share an example of such a test app. You can do the same in your projects.

Technology stack:

  • Netbeans IDE
  • Java
  • Maven

Testable Unit

In order to be able to run performance tests for a single unit, there is a need to have well-defined and testable units. Let’s work with an example (I have cooked this one up, but you will get the idea).

This is a module for message broadcasting. The core method accepts a pipe-delimited String as an input, extracts the parameters, finds the appropriate username, and broadcasts the message. Here is the implementation.

package com.enterprisemath.articles.unitperformance;

/**
 * Provider for user related data.
 * 
 * @author radek.hecl
 */
public interface UserProvider {

 /**
  * Returns user name.
  * 
  * @param userId user id
  * @return user name
  */
 public String getUserName(String userId);

}

// -----------------------------------------------------

package com.enterprisemath.articles.unitperformance;

import java.util.Date;

/**
 * Service for message broadcast.
 * 
 * @author radek.hecl
 */
public interface BroadcastService {

 /**
  * Broadcasts message.
  * 
  * @param userName user name
  * @param timestamp timestamp
  * @param message message
  */
 public void broadcastMessage(String userName, Date timestamp, String message);

}

// -----------------------------------------------------

package com.enterprisemath.articles.unitperformance;

import com.enterprisemath.utils.Dates;
import com.enterprisemath.utils.ValidationUtils;
import java.util.Date;
import org.apache.commons.lang3.builder.ToStringBuilder;

/**
 * Module for message broadcasting.
 *
 * @author radek.hecl
 */
public class MessageBroadcastModule {

 /**
  * Provider for user data.
  */
 private UserProvider userProvider;

 /**
  * Service for broadcast.
  */
 private BroadcastService broadcastService;

 /**
  * Creates new instance.
  */
 private MessageBroadcastModule() {}

 /**
  * Guards this object to be consistent. Throws exception if this is not the case.
  */
 private void guardInvariants() {
  ValidationUtils.guardNotNull(userProvider, "userProvider cannot be null");
  ValidationUtils.guardNotNull(broadcastService, "broadcastService cannot be null");
 }

 /**
  * Processes message.
  *
  * @param message message
  */
 public void processMessage(String message) {
  String[] parts = message.split("\\|");
  String userName = null;
  String txt = null;
  Date now = null;
  for (String part: parts) {
   String[] subs = part.split("=", 2);
   if (subs[0].equals("userId")) {
    userName = userProvider.getUserName(subs[1]);
   }
   else if (subs[0].equals("timestamp")) {
    now = Dates.parse(subs[1], "yyyy/MM/dd HH:mm:ss");
   }
   else if (subs[0].equals("message")) {
    txt = subs[1];
   }
  }

  ValidationUtils.guardNotNull(userName, "wrong input message, user is missing");
  ValidationUtils.guardNotNull(txt, "wrong input message, text is missing");
  ValidationUtils.guardNotNull(now, "wrong input message, timestamp is missing");
  broadcastService.broadcastMessage(userName, now, txt);

 }

 @Override
 public String toString() {
  return ToStringBuilder.reflectionToString(this);
 }

 /**
  * Creates new instance.
  *
  * @param userProvider provider for user data
  * @param broadcastService broadcast service
  * @return created object
  */
 public static MessageBroadcastModule create(UserProvider userProvider, BroadcastService broadcastService) {
  MessageBroadcastModule res = new MessageBroadcastModule();
  res.userProvider = userProvider;
  res.broadcastService = broadcastService;
  res.guardInvariants();
  return res;
 }

}

If you want to compile this, here is a POM file with the dependencies:

<project
    xmlns="http://maven.apache.org/POM/4.0.0"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <groupId>com.enterprisemath.articles</groupId>
    <artifactId>unitperformance</artifactId>
    <packaging>jar</packaging>
    <version>1.0-SNAPSHOT</version>
    <name>unitperformance</name>
    <url>http://maven.apache.org</url>
    <dependencies>
        <dependency>
            <groupId>com.enterprisemath</groupId>
            <artifactId>em-utils</artifactId>
            <version>2.4.0</version>
        </dependency>
        <dependency>
            <groupId>junit</groupId>
            <artifactId>junit</artifactId>
            <version>4.10</version>
            <scope>test</scope>
        </dependency>
        <dependency>
            <groupId>org.mockito</groupId>
            <artifactId>mockito-core</artifactId>
            <version>1.9.5</version>
            <scope>test</scope>
        </dependency>
    </dependencies>
</project>

The core method is processMessage. In more detail, this method does the following:

  1. Splits input into parts.
  2. Maps relevant parameters and finds the username.
  3. Validates that all mandatory parameters for this module are presented.
  4. Broadcasts the message.

If you want to test this method properly, then the test case has to cover the following:

  1. The positive case, when the message is broadcast.
  2. The negative cases, when validation fails.
  3. Making sure that no harmful side effects are caused.

And here is an example implementation.

package com.enterprisemath.articles.unitperformance;

import com.enterprisemath.utils.Dates;
import com.enterprisemath.utils.Month;
import org.apache.commons.lang3.builder.ToStringBuilder;
import static org.junit.Assert.assertTrue;
import static org.junit.Assert.fail;
import org.junit.Before;
import org.junit.Test;
import static org.mockito.Mockito.mock;
import static org.mockito.Mockito.verify;
import static org.mockito.Mockito.verifyNoMoreInteractions;
import static org.mockito.Mockito.when;

/**
 * Test case for message broadcast module.
 *
 * @author radek.hecl
 */
public class MessageBroadcastModuleTest {

 /**
  * Provider for user data.
  */
 private UserProvider userProvider;

 /**
  * Service for broadcast.
  */
 private BroadcastService broadcastService;

 /**
  * Tested module.
  */
 private MessageBroadcastModule module;

 /**
  * Creates new instance.
  */
 public MessageBroadcastModuleTest() {}

 /**
  * Sets up test environment.
  */
 @Before
 public void setUp() {
  userProvider = mock(UserProvider.class);
  broadcastService = mock(BroadcastService.class);
  module = MessageBroadcastModule.create(userProvider, broadcastService);
 }

 /**
  * Tests message processing.
  */
 @Test
 public void testProcessMessage() {

  when(userProvider.getUserName("1")).thenReturn("John Seaman");
  module.processMessage("userId=1|message=Hello world|timestamp=2017/01/01 12:00:00|ip=127.0.0.1|city=Brno|age=12|occupation=student");

  try {
   module.processMessage("message=Hello world|timestamp=2017/01/01 12:00:00|ip=127.0.0.1|city=Brno|age=12|occupation=student");
   fail("exception expected");
  } catch (RuntimeException e) {
   assertTrue(e.getMessage(), e.getMessage().contains("wrong input message, user is missing"));
  }

  try {
   module.processMessage("userId=1|message=Hello world0|ip=127.0.0.1|city=Brno|age=12|occupation=student");
   fail("exception expected");
  } catch (RuntimeException e) {
   assertTrue(e.getMessage(), e.getMessage().contains("wrong input message, timestamp is missing"));
  }

  try {
   module.processMessage("userId=1|timestamp=2017/01/01 12:00:00|ip=127.0.0.1|city=Brno|age=12|occupation=student");
   fail("exception expected");
  } catch (RuntimeException e) {
   assertTrue(e.getMessage(), e.getMessage().contains("wrong input message, text is missing"));
  }

  verify(broadcastService).broadcastMessage("John Seaman", Dates.createTime(2017, Month.JANUARY, 1, 12, 0, 0), "Hello world");

  verifyNoMoreInteractions(broadcastService);
 }

 @Override
 public String toString() {
  return ToStringBuilder.reflectionToString(this);
 }
}

In the unit test you can see one happy case and three cases where validation fails. In addition, there is an invocation of verifyNoMoreInteractions to check that broadcastService didn't broadcast any unwanted messages; that would be a harmful side effect. On the other hand, userProvider doesn't need this protection, because it performs a read-only operation (assuming it is truly read only and repeated reads cause no harm). Regardless of your coding style, it is important to do similar work and make sure the code is logically correct before starting any optimization.

Test Application and Profiler

Now, when you have the code separated into an isolated unit and well-defined unit tests in place, you are ready to start optimizing the performance. Before writing anything, ask yourself a question: is this a critical part of the application? If the answer is no, then it's better not to optimize. Examples of critical parts are:

  • Methods called hundreds of millions of times.
  • Methods that process a lot of records.
  • Methods aggregating data from third parties in (near) real-time.

If you evaluated your method as a critical part of the application, then it is time to write the test application. Typically, you can place the test application right next to the unit tests. Here is an example.

package com.enterprisemath.articles.unitperformance;

import com.enterprisemath.utils.Dates;
import com.enterprisemath.utils.Month;
import java.util.ArrayList;
import java.util.Date;
import java.util.List;
import org.apache.commons.lang3.time.DateUtils;
import static org.mockito.Matchers.any;
import static org.mockito.Mockito.mock;
import static org.mockito.Mockito.when;
import org.mockito.invocation.InvocationOnMock;
import org.mockito.stubbing.Answer;

/**
 * Performance test application for message broadcast module.
 *
 * @author radek.hecl
 */
public class MessageBroadcastModulePerformanceTestApp {

 /**
  * Prevents construction.
  */
 private MessageBroadcastModulePerformanceTestApp() {}

 /**
  * Main method.
  *
  * @param args arguments
  */
 public static void main(String[] args) {

  //
  // set up
  System.out.println("Setting up and generating test data");

  UserProvider userProvider = mock(UserProvider.class);

  BroadcastService broadcastService = new BroadcastService() {
   public void broadcastMessage(String userName, Date timestamp, String message) {}
  };

  MessageBroadcastModule module = MessageBroadcastModule.create(userProvider, broadcastService);

  System.out.println("Generating test data");

  when(userProvider.getUserName(any(String.class))).thenAnswer(new Answer<String>() {
   public String answer(InvocationOnMock invocation) throws Throwable {
    //Thread.sleep(10);
    String id = (String) invocation.getArguments()[0];
    return "user " + id;
   }
  });

  Date ts = Dates.createDate(2017, Month.JANUARY, 1);
  List<String> messages = new ArrayList<String>(1000000);
  for (int i = 0; i < 1000000; ++i) {
   int usid = i % 50;
   ts = DateUtils.addMilliseconds(ts, 1);
   messages.add("userId=" + usid + "|message=Hello world|timestamp=" + Dates.format(ts, "yyyy/MM/dd HH:mm:ss") +
    "|ip=127.0.0.1|city=Brno|age=12|occupation=student");
  }

  System.out.println("Set up completed and data generated");

  //
  // wait to give user chance to connect profiler
  System.out.println("Timeout to allow attach profiler");

  try {
   for (int i = 0; i < 20; ++i) {
    Thread.sleep(1000);
    System.out.print(".");
   }
   System.out.println("");
  } catch (InterruptedException e) {
   throw new RuntimeException(e);
  }
  System.out.println("Finished waiting for profiler");

  //
  // test
  System.out.println("Started performance test");

  long startTime = System.currentTimeMillis();
  for (String msg: messages) {
   module.processMessage(msg);
  }
  long endTime = System.currentTimeMillis();

  System.out.println("Performance test finished");

  //
  // dump the result
  long duration = endTime - startTime;
  System.out.println("Num messages = " + messages.size());
  System.out.println("Duration = " + duration + "; referenceDuration = 14882");
  System.out.println("Duration / message = " + ((double) duration / messages.size()));
  System.out.println("JOB DONE!!!");
 }

}

Usually, this is just a small application containing four parts:

  1. Environment setup and data generation (it is desirable to exclude this from the measurement).
  2. A waiting period to allow the user to connect a profiler.
  3. Test execution.
  4. Result presentation.

As you can see, I have used Mockito to mock userProvider, while broadcastService is implemented inline. Both approaches allow you to create a unit performance test without even having the real implementations of the dependent services. The difference between them, for this purpose, is that the mocked version carries additional per-call overhead. The right choice depends on the particular use case. That's all about the setup.
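To make the overhead point concrete, here is a minimal sketch (not from the original project) of timing repeated calls against a hand-written inline stub, using only the JDK. The `UserProvider` interface below is a hypothetical stand-in for the article's dependency; swapping the inline stub for a Mockito mock answering the same calls would typically show a noticeably higher per-call cost.

```java
public class StubOverheadSketch {

    /**
     * Hypothetical stand-in for the UserProvider dependency.
     */
    public interface UserProvider {
        String getUserName(String id);
    }

    /**
     * Times n calls against the given provider and returns elapsed milliseconds.
     */
    public static long timeCalls(UserProvider provider, int n) {
        long start = System.nanoTime();
        for (int i = 0; i < n; ++i) {
            provider.getUserName(Integer.toString(i % 50));
        }
        return (System.nanoTime() - start) / 1000000L;
    }

    public static void main(String[] args) {
        // Inline stub, same style as broadcastService in the test application.
        UserProvider inlineStub = new UserProvider() {
            public String getUserName(String id) {
                return "user " + id;
            }
        };
        System.out.println("1M inline stub calls: " + timeCalls(inlineStub, 1000000) + " ms");
    }
}
```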

Starting Profiler

When you have your application ready, you can run it and attach the profiler during the prepared waiting period; attaching during that window is important for getting good results. In NetBeans, it is pretty easy. The application can be run by right-clicking it and choosing the Run File option. The profiler is attached from the top menu bar via Profile > Attach Profiler; then choose CPU and click Attach. Finally, choose your application and click OK. For illustration, please see the images below.

Analyzing Results

If everything is done correctly, then after the application finishes, the console should look similar to the following dump. Note the profiler attachment inside the waiting period (in the middle of the dots):

cd C:\projects\enterprisemath.com\articles\unitperformance; "JAVA_HOME=C:\\Program Files\\Java\\jdk1.8.0_25" cmd /c "\"\"C:\\Program Files\\NetBeans 8.0.1\\java\\maven\\bin\\mvn.bat\" -Dexec.args=\"-classpath %classpath com.enterprisemath.articles.unitperformance.MessageBroadcastModulePerformanceTestApp\" -Dexec.executable=\"C:\\Program Files\\Java\\jdk1.8.0_25\\bin\\java.exe\" -Dexec.classpathScope=test -Dmaven.ext.class.path=C:\\Users\\radek.hecl\\AppData\\Roaming\\NetBeans\\8.0.1\\maven-nblib\\netbeans-eventspy.jar org.codehaus.mojo:exec-maven-plugin:1.2.1:exec\""
Running NetBeans Compile On Save execution. Phase execution is skipped and output directories of dependency projects (with Compile on Save turned on) will be used instead of their jar artifacts.
Scanning for projects...

------------------------------------------------------------------------
Building unitperformance 1.0-SNAPSHOT
------------------------------------------------------------------------

--- exec-maven-plugin:1.2.1:exec (default-cli) @ unitperformance ---
Setting up and generating test data
Generating test data
Set up completed and data generated
Timeout to allow attach profiler
..............Profiler Agent: Waiting for connection on port 5140 (Protocol version: 15)
.Profiler Agent: Established connection with the tool
Profiler Agent: Local accelerated session
.....
Finished waiting for profiler
Started performance test
Performance test finished
Num messages = 1000000
Duration = 15530; referenceDuration = 14882
Duration / message = 0.01553
JOB DONE!!!
Profiler Agent: Connection with agent closed
Profiler Agent: Connection with agent closed
Profiler Agent: JNI OnLoad Initializing...
Profiler Agent: JNI OnLoad Initialized successfully
Profiler Agent: 250 classes cached.
Profiler Agent: 250 classes cached.
------------------------------------------------------------------------
BUILD SUCCESS
------------------------------------------------------------------------
Total time: 42.375s
Finished at: Wed May 10 00:32:11 JST 2017
Final Memory: 5M/123M
------------------------------------------------------------------------

From the console output you can read that the test took roughly 15 seconds. In addition to the console output, there is the profiler result, which looks similar to the following.

It is possible to drill down within the profiler result and see how much time the program spent in each method. The point of this test is to see the details of the processMessage method. It is very clear that the majority of the time is taken by the getUserName method. In this case, it is caused by the call to the mock class. For simplicity, let's assume that it would look similar if the underlying implementation made a call to a database (in such a case, SQL would need to be sent to the database, and the database would need to parse it, pull the data, and return the result over some protocol, which would definitely take some time). So let's treat the getUserName method as the bottleneck to deal with.

Bottleneck Optimization

As you probably know, the typical way to avoid expensive queries is some form of caching. Let’s try the most primitive one: using HashMap. Here’s how the optimized processMessage method looks:

...

/**
 * Cache for users.
 */
private Map<String, String> usersCache = new HashMap<String, String>();

...

public void processMessage(String message) {
 String[] parts = message.split("\\|");
 String userName = null;
 String txt = null;
 Date now = null;
 for (String part: parts) {
  String[] subs = part.split("=", 2);
  if (subs[0].equals("userId")) {
   if (usersCache.containsKey(subs[1])) {
    userName = usersCache.get(subs[1]);
   }
   else {
    userName = userProvider.getUserName(subs[1]);
    usersCache.put(subs[1], userName);
   }
  }
  else if (subs[0].equals("timestamp")) {
   now = Dates.parse(subs[1], "yyyy/MM/dd HH:mm:ss");
  }
  else if (subs[0].equals("message")) {
   txt = subs[1];
  }
 }
 ValidationUtils.guardNotNull(userName, "wrong input message, user is missing");
 ValidationUtils.guardNotNull(txt, "wrong input message, text is missing");
 ValidationUtils.guardNotNull(now, "wrong input message, timestamp is missing");
 broadcastService.broadcastMessage(userName, now, txt);
}

When you run the performance test program with this adjustment, the whole run takes around 4.4 seconds instead of the original 15 seconds (on the same machine). The profiler result looks like the following:

Now the bottleneck becomes the function that parses dates from strings. That would be the next step for optimization, if required. Before closing, let me add a few notes.
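If date parsing were the next target, one common fix is to avoid constructing a parser on every call, since building a format object is relatively expensive. The sketch below is an assumption about how one might do this with the JDK's SimpleDateFormat (the article's Dates.parse from em-utils may already behave differently internally); note that SimpleDateFormat is not thread-safe, so this only works for a single-threaded processing loop.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;

public class TimestampParser {

    /**
     * Reused parser instance. SimpleDateFormat is not thread-safe, so this
     * sketch assumes messages are processed from a single thread.
     */
    private final SimpleDateFormat format = new SimpleDateFormat("yyyy/MM/dd HH:mm:ss");

    /**
     * Parses a timestamp in the message format, wrapping the checked exception
     * to match the module's RuntimeException-based validation style.
     */
    public Date parse(String value) {
        try {
            return format.parse(value);
        } catch (ParseException e) {
            throw new RuntimeException("wrong input message, bad timestamp: " + value, e);
        }
    }
}
```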

  • Using hash maps for caching is probably not what you want in most real cases (an unbounded map, for example, grows forever).
  • The optimization introduced a new branch of code which is not covered by the current unit test. Good practice is to revisit the unit test and get this case properly covered.
  • Optimization generally makes code more complex and less readable. Therefore, focus only on the parts of your application which are critical, use a profiler to find the real bottleneck within those units, and stop optimizing when the performance is good enough for your case.
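On the first note above: if unbounded growth is the concern, a size-limited LRU cache can be built on top of the JDK's LinkedHashMap in a few lines. This is a sketch of one alternative, not part of the original module:

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Minimal LRU cache: keeps at most `capacity` entries, evicting the
 * least recently accessed one when the limit is exceeded.
 */
public class LruCache<K, V> extends LinkedHashMap<K, V> {

    private final int capacity;

    public LruCache(int capacity) {
        // accessOrder = true makes iteration order reflect recency of access
        super(16, 0.75f, true);
        this.capacity = capacity;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > capacity;
    }
}
```

Usage mirrors a plain map: replacing the usersCache field with `new LruCache<String, String>(maxUsers)` would bound the memory use without changing the processMessage logic.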

Summary

This article shows one way of performance tuning at the unit level. This type of optimization has the advantage that anyone can do it with just a laptop and a few basic tools, and everything can be set up within minutes. Therefore, this is a great first layer of performance tuning, one that will save you a lot of time and money during the later stages.