I don’t often post controversial opinions, but I’ve recently seen a fair few people espousing the “if you don’t deploy of a Friday then your deployment process is broken” line. I’d like to offer an alternate opinion.
Let’s start by defining some terms and concepts that are relevant here. It doesn’t matter whether anyone accepts these definitions in general (they are not mine, in any case), only that they are accepted for the purpose of this post.
Deployment Versus Release
Deployment is installing software on an environment: i.e., you are changing the physical bits on the target machine.
Release is where functionality is made available to the user. The use of techniques such as feature flags and AB testing means that this can happen at a different time than the deployment.
Taken from this Octopus Deploy Article.
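To make the distinction concrete, here is a minimal sketch of a feature flag in Python. The flag store, the flag name `new_checkout` and the `checkout` function are all hypothetical, invented purely for illustration; a real system would more likely read its flags from configuration or a flag service.

```python
# Hypothetical in-memory flag store (an assumption for this sketch).
# In practice, flags usually live somewhere that can change without a deployment.
FLAGS = {"new_checkout": False}


def is_enabled(flag: str) -> bool:
    """Return True if the named flag is switched on; unknown flags are off."""
    return FLAGS.get(flag, False)


def checkout(basket_total: float) -> str:
    # Both code paths are deployed; the flag decides which one users see.
    if is_enabled("new_checkout"):
        return f"new checkout: {basket_total:.2f}"
    return f"old checkout: {basket_total:.2f}"


# Deployed but not released: the new code is on the machine, but users get the old path.
before_release = checkout(10)

# Released: flipping the flag exposes the functionality without changing the deployed bits.
FLAGS["new_checkout"] = True
after_release = checkout(10)
```

The point of the sketch is only that the deployment (shipping `checkout` with both paths) and the release (flipping the flag) are separate events that can happen at different times.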
A deployment pipeline is a repeatable mechanism for deploying software to a target machine.
I’m referring to testing generically here to include anything from unit to end-to-end and manual tests, and everything in-between.
It may seem strange to define Friday, but I do so because I don’t literally mean Friday: I mean any time where one or more of the following are true:
1. Your service is (or is shortly likely to be) under unusual load or stress.
2. The number of people that you have that are able to deal with an issue is reduced (for example on a weekend, or at times when lots of people typically take holidays).
3. The consequence of an outage would be higher than at other times.
To elaborate on this: if you’re programming software for a nuclear reactor, then number 3 is always so high that it might as well be ignored. If you’re writing software for a florist, then the week leading up to February 14th is probably a Friday. If you’re the only person writing a system, then number 2 is irrelevant, and can be ignored.
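This definition of "Friday" can be written down as a simple predicate. The following is only a sketch: the `Conditions` class and its field names are made up for illustration, and deciding whether each condition holds is left to whoever knows the system.

```python
from dataclasses import dataclass


@dataclass
class Conditions:
    unusual_load: bool   # 1. the service is, or is shortly likely to be, under unusual load
    reduced_cover: bool  # 2. fewer people than usual are able to deal with an issue
    costly_outage: bool  # 3. an outage would cost more than at other times


def is_friday(c: Conditions) -> bool:
    """A 'Friday' is any time at which one or more of the three conditions is true."""
    return c.unusual_load or c.reduced_cover or c.costly_outage


# The florist in the week leading up to February 14th: busy, and an outage is expensive.
florist_before_valentines = Conditions(unusual_load=True, reduced_cover=False, costly_outage=True)

# An ordinary mid-week morning with the whole team in.
quiet_tuesday = Conditions(unusual_load=False, reduced_cover=False, costly_outage=False)

# An actual weekend for a team with nobody on call.
weekend = Conditions(unusual_load=False, reduced_cover=True, costly_outage=False)
```

Note that the calendar never appears in the predicate: the day of the week only matters to the extent that it makes one of the three conditions true.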
The Counter Argument
Before I make my case, let’s explore the alternative (for which I have a lot of sympathy).
If you have confidence in your tests, and your deployment pipeline, then deployment is de-risked. If we accept this then deploying software at 4:55pm on a Friday carries the same risk as deploying it at 10am on a Monday morning.
If there are issues during the deployment, then the fault is with the pipeline; and, if there are issues with the software after it is deployed or released, then there are problems with your tests.
Back to my Point
Okay - so, everything I’ve just said is correct (or at least I believe it to be).
Let’s start with some precepts (I’ll qualify each):
- Change (of any type) carries risk. Every industry knows this, and so they all have checks and balances, but only to deal with change; for example, building inspectors don’t turn up at a house that has been standing for 20 years and start an inspection: they turn up when the house is built, or if something happens to it.
- Software is complex. Not only is the software that you write complex, but the software that builds and runs the software that you write is complex.
- People are fallible. We’ve known this for some time in the software industry - that’s why we have automated deployment pipelines and tests. People get things wrong.
- Tests are fallible. Referring back to the previous two points with relation to automatic tests: tests are written by people, and people are fallible. Tests are also software, and software is complex.
Now, let’s talk about risk. Risk is a fact of life: every time you leave your house, you take a risk; in the UK, almost 1700 people were killed in road traffic accidents in 2022 - so there is a chance that you will never return when you leave your house in the morning. You, therefore, have the option to not leave the house, and you remove that risk: but (for most people) the price of this is too high, and so every day you take that risk.
BASE jumping involves jumping from high places and falling to the ground. Very few people do it, because the risk is much higher than that of leaving the house, and because not BASE jumping is unlikely to negatively affect your life.
Bringing it back, let’s talk about how that relates to software deployment and release.
Deployment and Release
It’s 4:55pm on a Friday before a bank holiday weekend. The company that you work for does a lot of trade over the weekend, especially on bank holidays. You have a change ready to go - it’s been tested, it’s passed your comprehensive suite of tests, you’re confident that it works. Do you release it?
My main point in this post is that the answer should not be an unconditional yes. Like everything in software, it should depend: the following questions could be asked…
- What is the benefit to the deployment / release?
- What is the worst thing that could happen, should the deployment / release go wrong?
Some cases are obvious: you’ve got a critical bug that stops the use of your system: nothing to lose - release; you’ve got a cosmetic change: nothing to gain - wait. The challenge comes when you have a low-level bug that affects, say, one user in 100: do you release the fix?
Given that there is risk in releasing the fix, you should consider who will support it: if something unexpected happens, do they even know what you’ve changed? Do you have an acceptable rollback plan; or, do you have the time and capacity to fix and roll forward if it doesn’t work?
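The questions above could be turned into a rough checklist. The sketch below is one possible encoding, not a rule to follow: the `Change` class, its field names and, in particular, the way the grey area is resolved (release only if support is informed and there is a way to recover) are assumptions made for the sake of the example.

```python
from dataclasses import dataclass


@dataclass
class Change:
    fixes_outage: bool              # a critical bug currently stops the use of the system
    cosmetic_only: bool             # nothing to gain by shipping right now
    support_informed: bool          # whoever supports the change knows what is going out
    can_roll_back_or_forward: bool  # acceptable rollback plan, or capacity to fix and roll forward


def release_on_a_friday(change: Change) -> bool:
    """Decide whether to release during a 'Friday', as defined earlier in the post."""
    if change.fixes_outage:
        return True   # nothing to lose: release
    if change.cosmetic_only:
        return False  # nothing to gain: wait
    # Grey area (assumed rule): only go ahead if the change can be supported and recovered.
    return change.support_informed and change.can_roll_back_or_forward


critical_fix = Change(fixes_outage=True, cosmetic_only=False,
                      support_informed=False, can_roll_back_or_forward=False)
cosmetic_tweak = Change(fixes_outage=False, cosmetic_only=True,
                        support_informed=True, can_roll_back_or_forward=True)
minor_fix_nobody_told = Change(fixes_outage=False, cosmetic_only=False,
                               support_informed=False, can_roll_back_or_forward=True)
minor_fix_prepared = Change(fixes_outage=False, cosmetic_only=False,
                            support_informed=True, can_roll_back_or_forward=True)
```

A real decision will rarely reduce to four booleans, but writing the questions down like this makes it obvious that the answer depends on the inputs rather than being an unconditional yes.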
Caveats & Summary
Obviously, I’m not trying to say that urgent bug fixes should wait until Monday; nor am I saying that Friday is some kind of sacred day when software can’t be released (see above). I am suggesting that software deployment and release carries risk, and that risk should be considered when deploying or releasing software.