- Adam Piskorek
Why Do We Need Static Code Analysis
How many times we have heared the word "automation"? Probably so many times that it became natural to us. The process of removing the "human factor" (or to fire a collegue - speaking in plain english) is getting faster. Now the ultimate goal is to make code write itself and there are three reasons for it:
- It sounds fun and we humans feel incredibly accomplished when we control the machine or even make it pretend to think for us.
- It is cheaper - thoughts are extremly difficult to obtain (education takes a long time)
- Final and probably the most relevant factor - it is not natural to write, let alone to write code. In order to write, one needs to know how to read and although that skill has gotten more popular over the years:
still, it's not a skill aquired by overwhelming majority (like e.g. gossiping).
Jokes aside, machine language is even less natural to write with than human language syntax.
Therefore we need machines to help us.
What is Static Code Analysis
Just to get a general grasp of the cocnept: imagine a recipe for goulash. Let's say it consist of 50 phrases. You glimpse at the page and check the syntax, make sure the ingredients have correct measurement units, the sentences are understandable and the text has some paragraphs, title etc. You don't actually perform the tasks, you don't prepare the goulash and you don't know how it tastes.
Same thing happens with static code analysis - it does not check whether the code will run or not. It just gives hints regarding syntax, good practices, potential errors and other things. It tells you how to structure, write and correct the code so it has fewer errors and it more readable. Static Analysis is unquestionably required in software engineering and is very helpful in software testing.
Here, we focus on two particular software that I find almost free and super helpful when combined together. There are of course other tools e.g. linters of all kinds but that's for another time.
How Can Static Analysis Help You
Simple. The goal is to write a good code. It does just that. How?
let's start with SonarQube
It's a software that is available in Community Edition (free of charge with some limitations) that can be used by anybody. It is especially useful in less experiences teams, in solopreneurs projects and in fresh, bigger projects. It helps by identyfying the errors and can be used to maintain the levels of quality.
SonarQube uses the idea of gates which are percentage or numeric levels of all that could go wrong. Those gates can be a hint for a manager or a blocker for jenkins (e.g. during deploy). That's why the stage of implementing the tool is important - the sooner the better. The bigger the project is when implementing, the more cumbersome it is to learn it and fit into the development lifecycle.
It analyses the code and divides the findings into couple of categories:
- Bugs - finds code that can potentially cause bugs. For example: a field that is nullable or potentially failing request that is not caught in an exception. Bugs count should be as low as possible, especially if you are working in a team. SQ (SonarQube) often finds things, that are just a matter of well written code that should be required for daily pull requests.
Vulnerabilities - Security holes. Imagine that apart of developing features, they tell you to be a great white-hat guy. Impossible. SonarQube uses database of all vulnerabilities and raises a red light in such a case.
Hotspots - potentially dangerous code that should be reviewed.
Code Smells - baddly written code. It's a great place to find code that is just too complicated and does not follow DRY principle.
Coverage - the percentage of unit tests coverage.
Duplications - it is just a reminder that we should not copy/paste tests and other chunks of code. It's the least important measurement cause algorithms of assessing it can be often times wrong e.g. when checking translating, locale jsons or other data.
What I really love SonarQube for is it's teaching potential. Any one-person code project is tremendously difficult to create and maintain. Mostly due to lack of peer reviews (here I wrote more on that) and companionship. Static Analysis can help with the first one by providing guidelines, solutions and hints on fixing the problems. It's an actual gold mine for just about anybody that wants to learn during a solo-coding adventure.
Often times the complexity of this tool is undermined or overlooked. SQ is in fact huge. In addition to those above we get support for various languages, setting up gates, issues, history graph, tons of knowledge and access to SDLC (creating and assigning tasks to fix the problems found during scans).
That can be of couse a disadvantage. Someone who is just starting his adventure might have difficulties setting it all up but again - I encourage you to implement it, because the advantages are overwhelming.
It might be setup just to measure the quality and help developers increase their skills. It can be both a learning tool and tool that helps with quality of your software.
The community version (free) offers free scans of code and various CI integrations (GitLab, GitHub, jenkins, azure, you name it). You can of course test your code on local machine with docker but you might as well use cheap hosting and fire up a docker online to offer the analysis for the team. Implementing some reporting might also be beneficial and although PDF reports are available on Enterprise tier, you can use plugins to have it with Community one.
This very blog is being tested by SonarQube. I am hosting a SQ instance on Vultr. They give 250 dollars to new users and have very cheap machines starting from 2.5 $/month. That's 100 months of code scan (!).
Disadvantages of using SonarQube
- It might be difficult at the beginning. The tool itself is quite old and the experience of using it might not be top knotch.
- It is cumbersome and difficult to set up. Try running it locally first and then using premade scripts for actions and CI tools e.g. GitHub actions.
- It might be not useful due to it's findings and size. Usually great senior programmer knows most of what it does and other findings can be found by a simple linter. The gates can slow down the development and are different and arbitrary to setup in each case.
It's "just" an AI addon to GitHub. My recent finding and I like it quite a lot. It works only on pull requests so I am using it as an addition to SonarQube (which can scan on any step and gather the findings in its database).
It writes comments on pull requests, adds sum ups that are very well written and consice:
It can also reply to your comments:
Even taking into account that - as a mindless algorithm - it will cause false positives, it's a very helpful tool. Again - super helpful for solo developers, solopreneurs or those who do not do peer reviewing in general. It's a free tool that you might try out without a hassle. Just a simple .yml file in your GitHub workflow.
Performance of CodeRabbit
It writes comments, ok, we get it but are they any good?
That's a good question. It is based on ChatGPT so in order to check the quality of advices, you might want to read more about AI code interpreters.
I was testing it extensively on my other project and I have a couple of findings. You can use different Open AI models to get the replies (version 3.5 and 4) and the tool itself costs you only the price of API itself (they have other tiers too).
I paid 5 bucks to get an access (otherwise you get an error while setting it up) and a pull request was costing me on average 10cents with version 3.5. A bigger PR with version 4 has costed me over 1.10$ (!).
I think that ChatGPT 3.5 might be better than ChatGPT 4 in many ways regarding the code analysis aspect.
The latter, v.4 was generating redundant comments:
was wrong many times in different places and unnecessarily expensive. I am using ChatGPT 3.5 turbo and I am happy with it, mostly because it offers interesting hints. Super useful for people who are not code wizards.
The disadvantage would be to tell you about simple errors but again - That should have been caught using other tools, so it's probably developer's fault:
For the most part, during reviewing the tool in a couple of pull reviews, I was using ChatGPT 3.5 turbo and it was offering me some very nice hints.
Disadvantages of using CodeRabbit
- It costs money (although arguable very little).
- It can generate false positives.
- It can be wrong - AI can make redundant comments and confuse instead of helping.
- It runs only on GitHub.
The setup I find very useful and that can cheaply and effectively increase the code quality is having SonarQube scans on every branch and every pull request. It has mostly an informative value. Optionally, very rigid gates can be set to stop bugs and security hostspots. If the time is pressing, those can always be reviewed in the SQ itself and marked as "to resolve". This allows a build to pass and unblocks the developer in case of time pressure.
In addition, setting up CodeRabbit can be a great addon, especially for younger developers, as it suggests changes on a deeper cognitive level. It is not a good tool for beginners as general knowledge of code it required in order to omit the false positives.
We have a tool that uses an AI API which can give an explanation or answer via the comments. Without a need to waste time on stack overflow (which was probably the source for the AI models anyway).
If you have not used or heard about those tools before, you should give them a try if you value code quality!