What kind of tests should I write as a software developer?

Over my 16-year career as a software developer I have written thousands of tests: tests in C, C++, PHP, Python, Java, Ruby, and not so many in JavaScript ;-)

As you would expect, some projects were similar and some were different: small, large, complex, simple; APIs, libraries for customers, web sites, and so on.

Team sizes differed too, from one-person projects up to over one hundred people working on the same code base, with people located in one place or spread around the globe.

Over that time I developed my own approach to testing, which I'm going to share with you.

But first I need to start with the iron triangle of software development:

Read about it here and here.

Now my philosophy is simple and driven by the triangle.

Producing any piece of software costs money, and every project balances the same constraints: cost, scope, and the time it takes to develop. Quality is meeting all three in some sweet spot.

This is the context you operate in, and what you do should follow from it - both the code you create and the tests you write.

Let's start with JIRA - a perfect example of what you would call good developer practice. JIRA has thousands of tests at every level: unit tests, functional tests, integration tests, front-end tests. Those tests run on different OSes, databases, LDAP servers, etc. If you ran them all on one machine it would probably take a week (or more) to finish.

Is it justified? I would say in most cases it is, but not all. JIRA is complex, powerful software that can be customized heavily. It also has a lot of integration points that need to keep working. And it is crucial to customers - some of them have built their businesses on JIRA. If anything goes wrong, you upset customers. That's why you want to make sure that every release still delivers.

JIRA is also on-premises software, so if anything goes wrong you need to release a new version and ask customers to upgrade, which is a pain for them. And you cannot monitor JIRA on customer premises - you have no metrics, no logs, no database access, etc.

On the other hand, a lot of the tests are too low-level, so refactoring takes time and (sometimes) makes developers' lives miserable.

Now let's imagine you're building an internal product operating in a SaaS model. You own the infrastructure; you even "own" the customers, because they work in the same office. You have all the monitoring, logging, and tracking tools you want. If anything breaks, you get instant feedback. Your pipeline is simple and you can deploy a fix in minutes, or hours if it's a complex one. Would you want to follow the same testing regimen as JIRA?

Well, maybe yes - if it were a system operating on real-time data with monetary value (imagine day trading), or if 10,000 people would get stuck if the system broke.

In other cases? Probably not.

What if you work in a startup that just got another round of funding and is still ahead of the "stabilising" phase? Would you care about tests?

In that case I would say some smoke tests would be enough.

Or what if you were creating a new product that you're unsure of? Would you rather spend time writing tests or looking for customers?

I bet tests would be the last thing you would want to spend time on (except for a few smoke tests).

Now you see the point: it all depends on the context, and you cannot be dogmatic about it. If you work in a team or with a product owner, discuss what the quality expectations are and what you can do to deliver on them (write tests or set up monitoring). You need to know and understand those constraints.

You also need to be aware that people are different: some are risk-averse (many developers) and some are risk-aware (me, for example). That's part of the context too.

One thing is for sure - always write regression tests. What I mean by that: if you released something, it was working fine, and then some change broke it, the code is fragile in that place.

In that case you need to cover the scenario that made the system fail, to make sure the problem you faced will not repeat. Well, actually, you want to decrease the risk - you can never be 100% sure it will not repeat.

It is for your sake, but mostly for your customers' sake. It's really bad when your software breaks the same way a few times in a row - that makes your customers lose trust in you. And trust is everything.

Also, for all the TDD fundamentalists - TDD is dead. I don't see value in it, and I don't know many people who do.

As for the tests you should run: I see most of the value in functional tests.

Unit tests are great when something is logically complex and you want to make sure the algorithm works. But most of the time code is simple and you just want to make sure everything works together. So moving up and testing the system through its different layers is what you want.

So my favorites are functional tests, or BDD as some call them - for a web project this means a browser emulating the user doing all the things they would do. I love them. They are usually easy to write (if you have a way to establish a state for the application - which you should have, by the way) and provide good value for your time.

They usually represent a real-life scenario supported by business needs, so every product owner understands their value.
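To make that concrete, here is a minimal, self-contained sketch (all class names and prices are made up for illustration): instead of unit-testing each class separately, a functional-style test drives a tiny two-layer "system" end to end through its public API.

```ruby
# A tiny two-layer "system": a pricing layer and a cart on top of it.
class PriceList
  PRICES = { "book" => 20, "pen" => 2 }.freeze

  def price_of(item)
    PRICES.fetch(item)
  end
end

class Cart
  def initialize(price_list)
    @price_list = price_list
    @items = []
  end

  def add(item)
    @items << item
  end

  # Orders over 50 get a 10% discount.
  def total
    sum = @items.sum { |item| @price_list.price_of(item) }
    sum > 50 ? (sum * 0.9).round : sum
  end
end

# A functional-style test: exercise the system through Cart's public API,
# letting the call flow through both layers, instead of testing PriceList
# and the discount rule in isolation.
cart = Cart.new(PriceList.new)
3.times { cart.add("book") } # 60 before discount
raise "discount not applied" unless cart.total == 54
```

The same idea scales up: for a web app the "public API" is the browser (or the HTTP endpoints), and the layers underneath are controllers, services, and the database.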

There's another aspect to thinking about tests: the cost of maintenance. Every test you have means a slower dev loop and more headache when you are revamping how the system functions - changing the way it works, not just refactoring.

The other thing: what if your tests always pass? Do they bring any value? Initially you would say yes, because they make you feel confident. But think twice.

If you're making changes and the tests always pass, isn't that a sign that you have useless tests that don't catch anything? You feel more confident, but you didn't actually make anything safer. It's an illusion.

There's a great article, "Why Most Unit Testing is Waste" by James O. Coplien, which I found while writing this one. I strongly suggest you read it from top to bottom as an extension of this article. I focused mostly on business value and understanding the context; James goes into information theory and a detailed evaluation of which tests are good or bad.

Despite the title, James talks about all kinds of tests. We both agree that functional tests are the best. One thing he advocates is recycling tests; I haven't tried it, but I already see the point - the test base in long-lived projects can grow so big that just running it becomes a job in itself.

I'm copying his best practices here as I agree with them:

  • Keep regression tests around for up to a year — but most of those will be system-level tests rather than unit tests.
  • Keep unit tests that test key algorithms for which there is a broad, formal, independent oracle of correctness, and for which there is ascribable business value.
  • Except for the preceding case, if X has business value and you can test X with either a system test or a unit test, use a system test — context is everything.
  • Design a test with more care than you design the code.
  • Turn most unit tests into assertions.
  • Throw away tests that haven’t failed in a year.
  • Testing can’t replace good development: a high test failure rate suggests you should shorten development intervals, perhaps radically, and make sure your architecture and design regimens have teeth.
  • If you find that individual functions being tested are trivial, double-check the way you incentivize developers’ performance. Rewarding coverage or other meaningless metrics can lead to rapid architecture decay.
  • Be humble about what tests can achieve. Tests don’t improve quality: developers do.

I would only expand on "turn most unit tests into assertions". James writes about it, but in case you skip it: having assertions in your application is a good thing! I rarely see developers add assertions, but they are really useful. Not only do they protect the code and set the context right, they are also great documentation. People used to say that unit tests are great because they document what your code does. Assertions are even better, because they do it exactly in the place you look first!

You can try out solid_assert for that!
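For illustration, here is a minimal sketch of the idea (the assert helper and the transfer example are hypothetical; solid_assert provides a more complete version of the same pattern that you can enable or disable globally):

```ruby
class AssertionError < StandardError; end

# A tiny assert helper. Gems like solid_assert let you switch assertions
# on and off globally (e.g. off in production, on in tests and staging).
def assert(condition, message = "assertion failed")
  raise AssertionError, message unless condition
end

# Hypothetical example: the assertions document and enforce the
# assumptions right where the logic lives.
def transfer(from, to, amount)
  assert amount > 0, "amount must be positive"
  assert from[:balance] >= amount, "insufficient funds"

  from[:balance] -= amount
  to[:balance] += amount
end

a = { balance: 100 }
b = { balance: 0 }
transfer(a, b, 30)
# a[:balance] == 70, b[:balance] == 30
```

Anyone reading `transfer` immediately sees its preconditions - no need to go hunting through a test suite.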

Monday Link Pack

Your files, my files or just files? How to design user interface

Running rails migrations automatically on Heroku

How to write a custom RSpec matcher

Your photos can tell if you are depressed!

This is an awesome bot!

How to call services asynchronously in Java?

Imagine you are building an aggregation service of some sort. It could be a stock exchange monitor, or a news website that needs to read data from different sources.

I guess there are a couple of things you would want it to do. First, you want it to deliver results as soon as possible. You also don't want it to get blocked when one of the sources is unusually slow.

I think this is a good example to show how you can write code that executes in parallel. So let's write it!

Let me start with Java, as I already have working code. In upcoming posts I'm going to follow up and show similar solutions in other languages.

Of course, our application needs to be a REST service. For the sake of the presentation, the services being called will be mocked.

You can jump right to the code, or read about the most interesting part (executing and monitoring the tasks):

package async.aggregate;

// imports omitted

@Slf4j
@Service
public class AggregationService implements DisposableBean, InitializingBean {
    private static final int MAX_WAIT_TIME = 5;

    @Autowired
    private List<ServiceClient> serviceClients;

    private ThreadPoolExecutor executor;

    public ResultsDto getResults() throws InterruptedException {
        List<Future<Either<Exception, String>>> results = executor.invokeAll(serviceClients
                .stream()
                .map(serviceClient -> {
                            Callable<Either<Exception, String>> callable = () ->
                                    serviceClient.getData();
                            return callable;
                        }
                ).collect(toList()), MAX_WAIT_TIME, TimeUnit.SECONDS);

        // keep only the successful results
        Iterable<String> connectionsForEachProvider = Eithers.filterRight(results
                .stream()
                .filter(Future::isDone)
                .map(future -> {
                    try {
                        return future.get();
                    } catch (Exception e) {
                        return Either.<Exception, String>left(e);
                    }
                })
                .collect(toList()));

        return ResultsDto.builder()
                .results(copyOf(concat(connectionsForEachProvider)))
                .build();
    }

    @Override
    public void destroy() throws Exception {
        executor.shutdownNow();
    }

    @Override
    public void afterPropertiesSet() throws Exception {
        executor = new ThreadPoolExecutor(serviceClients.size(), Integer.MAX_VALUE,
                60L, TimeUnit.SECONDS,
                new SynchronousQueue<>());
        // it will immediately create threads
        executor.prestartAllCoreThreads();
    }
}

It may look unfamiliar at first, but when you look closely I bet you can understand it.

serviceClients is a list of clients we are going to use to retrieve data from remote services.

ThreadPoolExecutor is a great class that allows you to spin up and manage a pool of threads that can do work for you.

afterPropertiesSet is called when the bean gets created by Spring, so that's where we spin up our ThreadPoolExecutor. It creates n threads based on the size of serviceClients and keeps them running even when there's no job for them.

destroy will be called when the bean gets removed from Spring.

executor.invokeAll executes all the requests and waits for the results for up to MAX_WAIT_TIME seconds. So if a service is slow, we simply skip it.

List<Future<Either<Exception, String>>> is a list of future results, each of which will hold either an Exception (in case the call fails) or a String (the actual result).

Now let's test it with:

time http http://localhost:8080

HTTP/1.1 200 OK
Content-Type: application/json;charset=UTF-8
Date: Thu, 11 Aug 2016 18:27:26 GMT
Server: Apache-Coyote/1.1
Transfer-Encoding: chunked

{
    "results": [
        "4000", 
        "1000"
    ]
}

http http://localhost:8080  0.20s user 0.05s system 4% cpu 5.281 total
HTTP/1.1 200 OK
Content-Type: application/json;charset=UTF-8
Date: Thu, 11 Aug 2016 19:05:24 GMT
Server: Apache-Coyote/1.1
Transfer-Encoding: chunked

{
    "results": []
}

http http://localhost:8080  0.20s user 0.05s system 4% cpu 5.269 total


PS: Grab httpie if you don't already have it.

How to generate non-predictable alphanumeric ids in Rails?

In many applications you generate ids that are visible to customers or used in links. By default, Rails uses sequential integers for that, which is fine most of the time. Being integers, they are fast, short, and look good ;-)

But they have one disadvantage: if they are visible externally, someone can learn a lot about your business. For example, if you're running a shop, someone can guess how many orders you process, how many items you offer, or how many clients you have. You probably want to avoid that.

You can fix that easily by switching to UUIDs - something Rails makes really easy to do.
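With PostgreSQL, for example, this is roughly a matter of enabling a UUID extension and creating tables with a UUID primary key - a sketch with a made-up table name (on newer Rails/PostgreSQL you may prefer the pgcrypto extension and gen_random_uuid() instead):

```ruby
class CreateBooks < ActiveRecord::Migration[5.0]
  def change
    # Provides the uuid_generate_v4() function used for default ids
    enable_extension "uuid-ossp"

    create_table :books, id: :uuid do |t|
      t.string :title
      t.timestamps
    end
  end
end
```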

You can have nice, non-guessable identifiers like these:

id: 898f73bc-290c-4427-b75a-68f34464e188, title: The Raven
id: dd126f47-de45-4cbe-aa1c-8b052693498e, title: My Side of the Mountain
id: 479af9a8-c096-42e2-8a29-4a321cdd5f7c, title: The Giver

The only downside is that they are long and ugly; computers don't care, but humans do. You probably don't want to show them to users.

Is there an alternative?

Yes, there is. As I need non-predictable, human-readable ids in our application, I did some research, and here are a few gems that I found interesting.

Hashids

The first one that drew my attention is a project called hashids. It's a set of libraries implementing the same algorithm in many languages.

You can get a lib for Ruby, Python, Java, Swift, or whatever else you like :-)

There's hashid-rails as well for simple integration with Rails.

Simply update your Gemfile with:

gem 'hashid-rails', github: 'akinomaeni/hashid-rails'

Create config/initializers/hashids.rb:

Hashid::Rails.configure do |config|
  config.secret = Rails.application.secrets.secret_key_base # secret used as the hashid salt
  config.length = 6 # length of the generated ids
end

And update your model to return the hashid instead of the sequential id (here for the JSON representation):

class HashidExample < ApplicationRecord
    def as_json(options = {})
        super(options).merge(id: to_param)
    end
end

So let's see how this works:

[
    {
        "created_at": "2016-07-15T20:34:17.722Z", 
        "id": "KpzRp6", 
        "title": "test", 
        "updated_at": "2016-07-15T20:34:17.722Z"
    }, 
    {
        "created_at": "2016-07-15T20:48:15.876Z", 
        "id": "XmXMp3", 
        "title": "another", 
        "updated_at": "2016-07-15T20:48:15.876Z"
    }
]

What's really nice about the library is that you can still refer to objects by their old id, so there's an easy migration path.

I have mixed feelings about one thing though → hashids are not stored, so once you change the settings (for example, to make the ids longer) you will break the existing ones. So think carefully about how large your database can get.

Other than that, I like the gem. You can also encode multiple ids into one (in case you have complex keys and associations that you want to link to).

Uniqify

An alternative solution is to add a unique token to each model and store it in the database. There's a simple gem for that as well - uniquify.

To add it to your project, update your Gemfile:

gem 'uniquify', github: 'Openbay/uniquify'

Prepare a migration:

add_column :uniqify_examples, :token, :string, null: false
add_index :uniqify_examples, :token, unique: true

In your model:

class UniqifyExample < ApplicationRecord
    uniquify :token, :length => 6

    def as_json(options = {})
        super(options).reject{|k,v| k == "id"} # to hide id from JSON representation
    end
end

What's nice about this library is that you can have multiple tokens in the same model (in case you want that):

uniquify :token, :another, :length => 6

You can specify the length and the allowed characters. The token gets persisted, so you can change the format as you go.

random_unique_id

There's another very similar gem called random_unique_id. I tested it out but didn't like it.

There are two limitations. It doesn't work out of the box with the model hierarchy introduced by Rails (all models subclassing ApplicationRecord by default) - you need to change your model to extend ActiveRecord::Base directly.

Also, you can have only one unique field per model, which is fine most of the time - but we are going to use multiple tokens for some models.

Or just use math (update 2016-07-19)

Kari Ikonen was kind enough to mention that there is another way - a mathematical one!

You can use multiplicative inverses to create obfuscated integers.

It's really cheap and easy to do, and you can read more about it on Eric Lippert's blog.
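Here is a minimal sketch of the technique (the modulus and multiplier below are arbitrary example values; any multiplier coprime to the modulus works): multiplying the id modulo the modulus obfuscates it, and multiplying by the modular inverse recovers the original.

```ruby
MODULUS    = 1_000_000_007 # example: a prime, so any multiplier below it works
MULTIPLIER = 387_420_489   # example multiplier, coprime to MODULUS

# Extended Euclidean algorithm: returns [g, x, y] with a*x + b*y == g
def extended_gcd(a, b)
  return [b, 0, 1] if a == 0
  g, x, y = extended_gcd(b % a, a)
  [g, y - (b / a) * x, x]
end

# Modular inverse of a mod m (exists when gcd(a, m) == 1)
def mod_inverse(a, m)
  g, x, = extended_gcd(a, m)
  raise "not invertible" unless g == 1
  x % m
end

INVERSE = mod_inverse(MULTIPLIER, MODULUS)

def obfuscate(id)
  (id * MULTIPLIER) % MODULUS
end

def deobfuscate(obfuscated)
  (obfuscated * INVERSE) % MODULUS
end
```

Keep in mind this is obfuscation, not encryption - anyone who recovers the multiplier can reverse the mapping.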

How long should the token be?

That depends on the character set you will use. Generally, libraries use something like 62 possibilities for each character:

0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz

5 chars in base 62 will give you 62^5 unique IDs = 916,132,832 (~1 billion) At 10k IDs per day you will be ok for 91k+ days

6 chars in base 62 will give you 62^6 unique IDs = 56,800,235,584 (56+ billion) At 10k IDs per day you will be ok for 5+ million days

-- from StackOverflow
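The arithmetic from the quote is easy to verify, and Ruby's standard library can generate such base-62 tokens directly (SecureRandom.alphanumeric is available from Ruby 2.5):

```ruby
require "securerandom"

# 62 possible characters per position: 0-9, A-Z, a-z
def unique_ids(length)
  62**length
end

unique_ids(5) # => 916132832   (~1 billion)
unique_ids(6) # => 56800235584 (56+ billion)

# A random 6-character base-62 token:
token = SecureRandom.alphanumeric(6)
```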

Other approaches?

Instagram came up with an interesting approach that helps them generate unique ids and shard data at the same time. If you're going to be huge like them, it's worth considering. I'd love to have problems like that ;-)

Final thoughts

I think uniquify and hashids are both interesting gems you can try. I'm not sure yet which one we're going to choose; I will update the article once we have a decision.

I also prepared a small project you can play with. Run:

brew install httpie

bundle install

rails s

Then you can play with it:

http http://localhost:3000/hashids title=test

http http://localhost:3000/hashids

http http://localhost:3000/hashids/1

http http://localhost:3000/rids title=another

http http://localhost:3000/rids

http http://localhost:3000/uniqifies title=hurra

http http://localhost:3000/uniqifies

How to display Active Record validation errors according to Atlassian AUI?

Active Record validations are a wonderful thing. Combined with Rails form helpers, they make it easy to create forms for modifying models.

Unfortunately, the default error rendering is not compatible with Atlassian AUI. Fields get wrapped with <div class="field_with_errors">#{html_tag}</div>, which doesn't look good:

There's an easy way to fix it though.

Create a new file config/initializers/active_view_base_field_error_proc.rb and put this into it:

ActionView::Base.field_error_proc = Proc.new do |html_tag, instance|
  if html_tag =~ /label/
    html_tag.html_safe
  else
    errors = instance.error_message
                     .map(&:capitalize)
                     .map { |em| "<div class=\"error\">#{em}</div>" }
                     .join
    html_tag + errors.html_safe
  end
end

field_error_proc is called for every field in the form, so you want to skip it for labels. There's also instance.object, which represents the model, in case you want to grab some properties from it.

That looks much better!

PS

I also wrote a series on writing a BitBucket Cloud add-on.