What's wrong with coding tests and how to do them right

Posted to: DEV; Hacker News

Live coding tests are extremely common when hiring software engineers. And online coding tests with tight timeframes are just as common.

I’m far from the first person to mention how bad high-pressure coding tests are (e.g. 1, 2, 3, 4, 5, 6, 7). But much like open-plan offices, high-stress coding tests seem to stubbornly persist despite clear research findings warning against them:

Interviewers may be filtering out qualified candidates by confounding assessment of problem-solving ability with unnecessary stress. … Our study raises key questions about the validity and equity of a core procedure used for making hiring decisions across the software industry.

And it’s not just job candidates who aren’t happy. Software developer jobs may be objectively the hardest roles to fill, with hiring managers also bemoaning how poorly many experienced developers perform on coding tests.

Ideas for improving coding tests

For the last couple of years in my previous role, there was a significant focus on hiring inside the company. I had the opportunity to research and think deeply about effective ways to hire good developers, and I have some thoughts.

Reducing pressure

The most obvious flaw with coding tests is that many of them put candidates under a lot of pressure. Different people will respond differently to different sorts of pressure, but there are two things that we can be pretty sure will cause some anxiety in many people:

Testing a candidate’s performance under a particular type of pressure could be perfectly valid if that was a clear requirement for the job. I would argue that for almost all development teams, this is not a key requirement. And if it is a requirement, it probably shouldn’t be. A high-pressure working environment should be an undesirable temporary state for a software team that they should be working to resolve as quickly as possible, and hence, there’s very little value to making this a key requirement for permanent employees.

If you think about the actual process of programming, very little of it is high-stress. It’s usually a solitary activity where you have and need a lot of time to consider the shape of a problem and try out many different solutions. The sort of person who is good at this type of problem solving may very likely be the exact opposite to the sort of person who performs well under stress.

Studies show that anxiety hugely affects performance. I believe this is a key reason why many senior developers often fail coding tests, and therefore why companies find “good candidates” so hard to find.

Therefore our goal should be to try to reduce any possible source of anxiety when testing software engineers. To achieve this, I would recommend only using at-home technical tests that developers can do in their own time without a tight timeframe.

This doesn’t completely negate the use of standard coding test tools like HackerRank or DevSkiller, but since both require setting a time limit (I think), I would suggest making this time limit something extra-generous like 1 day or more, instead of the much more common hour-or-two time limit.

Targeting the right skills

The next obvious question is what sort of exercises should we be setting to find the candidates we really need?

An extremely common practice is to set challenging algorithmic problems, like the “balance a binary tree” cliche. These sorts of tests are also the most hated.

It’s incredibly rare that an engineer will need to solve complex and novel algorithmic problems in their day-to-day job. This means that these assignments will likely also cause a lot of anxiety, as candidates will immediately know that this isn’t something they have much experience doing.

A tricky algorithmic brain-teaser is very unlikely to be the best way to find the engineers we really need. So what is?

Many tests will check for deep programming knowledge - for example, a Python test might test someone’s knowledge of **kwargs or the GIL. However, I would still argue that even deep testing of programming ability shouldn’t be our primary driver, especially in more senior developers.

As developers get more senior, their job tends to involve more and more things that aren’t pure coding: mentoring, code reviews, stakeholder communication, architectural discussions, culture building etc. Therefore a developer will likely be sharpest at actual programming when they’re somewhere between mid-weight and senior, and from then upwards their programming skills will take a back seat.

This isn’t to say that we shouldn’t be testing for technical knowledge. On the contrary, it means we should think very carefully about hiring for really valuable technical experience as opposed to esoterica that can be easily googled when needed.

These actually useful skills will look different for every team, but they may include:

It’s also worth considering the skills you’re testing around the actual challenge. As there are many soft skills that are important in software engineering, if we can incorporate these into the test then so much the better. For example, asking them to submit a pull request will test not only the code, but also:

Making efficient use of time

Another key reason why people hate coding tests is that they ask engineers to invest a lot of time for a job they likely won’t get anyway.

I certainly don’t have a silver bullet for this one. It’s natural that employers want to do as much vetting up-front as possible to avoid a costly bad hire, and the more of that effort they can push onto the candidate the better for them, so they can spend their limited resources on pushing more candidates through the pipeline. And it’s natural for the candidates to not want to do that work for free.

It would be great if employers paid candidates for their time, and some do, but most employers are unlikely to ever do this unless they have to. Ultimately this is just a power struggle that depends on the job market and who is more desperate.

But given that it’s such a sore point, there is a real risk that great candidates will be put off by how expensive your hiring process is. So we should do all the up-front work we possibly can to reduce the time cost on both sides for each candidate.

This means thinking deeply about the most efficient way to test the skills we really care about. We should also consider where in the hiring process we need to test these skills. It’s clearly going to be more palatable to candidates if they’re asked to do a lot of work once they’re further through the hiring process and therefore more likely to actually get the job.

We should also do anything we can to reduce any set-up costs for the assignment.

Recommendations

Taking all these points into account, my preferred type of technical test would be centred around PRs into an open-source GitHub repository. For example, I really liked the assignment I did a while back for Deepset AI.

It’s also important that the work you get them to do is something very close to the work they would actually be doing in the job.

By doing a test within an actual project on a code hosting platform, you get to test the engineer’s skills within that platform, in the version control technology, and in how they communicate when submitting code. If you like, you can have some discussions about the code right there in the platform.

This is about the closest you can possibly get to watching a prospective developer doing their actual job.

If you don’t want to take up too much of their time, instead of having them create a PR themselves, you could have them review an existing one. This should be pretty time efficient, but still tests a lot of their knowledge. You could even do a code review as an early stage in your hiring process, and then have them submit their own PR at a later stage in the process.

The big picture

Okay, now that I’ve made my recommendations, I’d like to explore the broader attitudes and philosophies behind all this a bit. If you’ve made it down this far, perhaps you have an appetite for a little bit more exploration.

How did this happen?

A big looming question over all this is: How did we get here? Why are we in this state where so many hiring processes are so chronically broken?

I believe this all stems from a sort of tech exceptionalism, an idea that the point of hiring developers is to find the true geniuses, the worthy ones, in a sort of initiation ceremony.

Today’s tech leaders were the bullied, four-eyed, scrawny geeks of the 80s and 90s. Back in the 90s and early 00s, there was a real feeling that these hackers and tech wizards were changing the world for the better by rejecting all the old precepts of the past. Mixed with a slight feeling of revenge against their cooler, tech-illiterate schoolmates, this led to this idea that software engineers were a superior breed, extra intelligent, and of course this worldview carried its own mythology.

All this landed into a western culture that was already fairly obsessed with IQ. That is, obsessed with the idea that some people were born with an innate superiority, and discovering which people were superior was a worthy goal.

To me, this is the only way in which these high-pressure algorithmic brain-teasers make sense in a hiring process - if you believe that you’re searching for superior people who will be good at brain teasers and therefore good at everything else.

The antidote

Of course this worldview isn’t at all accurate. The IQ test is not a very good predictor of most other skills. There are many many different sorts of “intelligence”, and using generalised tests as a stand-in for actually assessing these skills directly is not only ineffective, but it will introduce a lot of bias.

The best way to assess people for a job will always be to get as close as you can to letting them actually do the job. Coding tests, as with all sorts of hiring, will be most effective if they try their best to test for what’s actually needed in the day-to-day job.

By @nottrobin