Opinions on Good Software Engineering Practices in Academia and Industry
Also a rant on hiring practices
Recently a colleague commented that software engineering and high performance scientific computing are incompatible. He claimed that maximising the performance of a numerical computation requires sacrificing core principles of good software architecture, such as portability and extensibility. Hence, according to my colleague, all software used and produced in a scientific context will always remain a pile of hacks glued together by incomprehensible spaghetti code. My colleague cited LAPACK and John Burkardt’s page as examples. Other people seem to share similar opinions, even though they cite different reasons.
I hold the opposing point of view. In my opinion good software engineering practices are essential for efficient scientific computations. Modern research projects are getting more and more complex. This complexity raises the expectations and requirements on the quality and performance of the software that does the computations and simulations. Academic software is no better or worse than software produced in industry. You will find good and bad projects in both camps. Also, both camps work hard to improve the current situation. Many universities require their students to take at least some programming courses if they want to graduate in a science-related subject, and clean code is a concept that many managers are aware of by now.
The underlying issue is rather that certain prejudices and stigmas persist and are hard to get rid of.
I’ve spent almost ten years in academia and worked on a handful of software packages. Some of them were one-person projects, others were team projects that have been maintained for years. Among all these projects there were a few that would probably qualify as a tangled mess of spaghetti code, quite often because the code was a proof of concept for a research project that was still ongoing. The reason was rarely, as my colleague falsely claimed, an aggressive optimization of performance at the cost of readability. I’ve also seen research teams that had a really nice CI/CD setup with rather thorough test coverage that would run on every pushed commit. One of these teams developed a finite element solver which was considered to be one of the fastest and most accurate solvers available at that time. Their project would probably qualify as a paragon of a good software engineering setup.
Software engineering is a subject that is taught at universities, and people working at computer science departments (or any research lab with a close connection to computer science) usually know how to program well enough to produce code of decent quality.
There is simply no reason why software written at a university should be of lower quality than software produced by a startup in Silicon Valley.
John Burkardt’s page is a treasure trove of code examples that are very valuable for educational purposes.
I would consider many of his code snippets to be proofs of concept and nothing that you should use in a production environment.
I admit that documentation is sometimes sparse and that there’s no apparent version control or other modern tooling that helps you maintain high code quality.
Yet, the source files that you can download from John Burkardt’s page are neatly organized and very readable!
Citing it as an example of bad coding practices is, in my opinion, a bit unfair.
LAPACK is hosted on GitHub, is pretty well documented and comes with an extensive set of tests.
Sure, the function dnrm2 inside the LAPACK repository might look daunting, but that’s because avoiding underflows and overflows is a tricky task in numerical computations. A naive implementation of the Euclidean norm would be significantly less robust: its intermediate squares can overflow or underflow even when the norm itself is perfectly representable.
And actually, I find the code well-structured and quite readable. (I guess “beauty is in the eye of the beholder” applies here.)
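To make the overflow and underflow issue concrete, here is a minimal Python sketch. It is not the algorithm used by dnrm2; it only illustrates the classic trick of scaling by the largest magnitude before squaring. The function names `naive_norm` and `scaled_norm` are made up for this illustration, and infinities or NaNs in the input are not handled.

```python
import math


def naive_norm(x):
    """Textbook formula: square root of the sum of squares."""
    return math.sqrt(sum(v * v for v in x))


def scaled_norm(x):
    """Divide by the largest magnitude first, so every ratio is at most 1
    and its square cannot overflow."""
    scale = max((abs(v) for v in x), default=0.0)
    if scale == 0.0:
        return 0.0
    total = 0.0
    for v in x:
        r = v / scale
        total += r * r
    return scale * math.sqrt(total)


big = [1e200, 1e200]
tiny = [1e-200, 1e-200]
print(naive_norm(big), scaled_norm(big))    # the naive result overflows to inf
print(naive_norm(tiny), scaled_norm(tiny))  # the naive result underflows to 0.0
```

The scaled version costs an extra pass over the data and one division per element, which hints at why a production-grade routine that wants robustness without giving up speed ends up looking more involved than the textbook formula.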
Since I left academia I’ve worked for several companies in the automotive industry. Here I’ve encountered situations similar to those at the universities I worked at before. I’ve worked on well-engineered code bases with state-of-the-art tools as well as software packages where tools like SonarQube or Wily would probably recommend that you just give up and start from scratch. The issues that I encountered range from non-existent documentation and/or version control to really bad coding practices such as an abundant use of global variables.
I think the reasons why such bad projects exist are the same in academia and industry. You can also find several arguments here and here
Common reasons I saw why a software package is in a dire state include:
- Insufficient budget: in academia you have to make ends meet with the money (and time) from your research grant. Quite often this budget does not provide the necessary funding for long-term maintenance of your software. If you manage to focus your research on a topic such that you can reuse your software, then it gets much easier to ensure good code quality. You have the same issue in industry. Project leaders usually get a fixed budget and a limited timeframe to complete a project. Sometimes the given estimates are unrealistic and people end up in task forces where crunch time becomes part of their daily routine. There will be a point where code quality doesn’t matter any more as long as everything compiles and a minimum number of tests seem to pass.
- Lack of experience in the development team: this happens everywhere. It just seems that academia cannot hide this issue as well. A chemistry student who works alone on the code for a Master’s thesis won’t produce high quality code without any support. The same is true in industry. If you outsource your software development to an underpaid team in a low-cost country, then you won’t get outstanding software quality.
- Lack of clear requirements or vision: research software often starts as a proof of concept that is then iterated on. It’s rather unlikely that you know upfront which numerical approach will work best or which experiments a reviewer would like to see. This implies that there will be lots of lines of code where you are not quite sure whether you still need them. Will you design your stopping criterion as a fixed number of iterations, or should you rather consider the error decrease? Maybe you need both because some reviewer would like to have the error decrease plotted as a function of the number of iterations. This can have an impact on your software design. If you need to account for all possible circumstances, your command line tool is likely to have a messy interface. The same happens in industry. I’ve worked on projects where the focus went from a product that was meant to be a small library targeted at embedded systems to a full SaaS solution running on off-the-shelf computer hardware. These focus shifts would have required a complete restructuring of the underlying software architecture. Due to budget and time constraints this was not possible. Hence, we delivered a solution which probably has a rather quirky design for the task at hand.
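The stopping criterion question from the last point can be sketched in a few lines of Python. This is only an illustration under assumptions: `iterate`, `step`, `max_iter` and `tol` are hypothetical names, and the change between two consecutive iterates is used as a cheap stand-in for the true error, which is usually unknown.

```python
import math


def iterate(step, x0, max_iter=100, tol=None):
    """Apply x <- step(x) until max_iter is reached or, if tol is given,
    until the change between consecutive iterates drops below tol.

    Returns the final iterate and the per-iteration changes, so the
    decrease can be plotted against the iteration count afterwards.
    """
    x = x0
    history = []
    for _ in range(max_iter):
        x_new = step(x)
        change = abs(x_new - x)
        history.append(change)
        x = x_new
        if tol is not None and change < tol:
            break
    return x, history


# Fixed number of iterations only.
x_fixed, hist_fixed = iterate(math.cos, 1.0, max_iter=5)

# Both criteria: stop early once the change is small enough.
x_tol, hist_tol = iterate(math.cos, 1.0, max_iter=200, tol=1e-10)
print(len(hist_fixed), len(hist_tol), x_tol)
```

Even in this toy version the interface already carries two parameters and a return value that exists only for plotting, which is how a research tool slowly accumulates options.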
As you can see, none of these reasons is specific to either side. These are issues that software developers on both sides regularly encounter. There seems, however, to be a certain bias in industry about the software engineering quality of academic research projects. I’m not sure if this bias also exists in the other direction. But it is an issue that I encountered personally on several occasions when I transitioned from academia to industry. Quite often all researchers are lumped together, and people from the human resources department will classify your software engineering skills as insufficient (or non-existent) for the job you are applying for just because you have only worked for universities so far. It doesn’t matter that you have a public GitLab or GitHub profile with all your projects or that you may have worked with Jenkins and Docker at your research lab to ensure good code quality. Also, it does not seem to make a difference whether you are a graduate student or an associate professor. You come from a university, and therefore you supposedly have no clue how Scrum works (and it is supposedly such a complex workflow that you will need months to understand it). Many companies simply won’t believe that you’ve got the necessary knowledge to do the job and will offer you a position as a junior developer regardless of your qualifications or past experience.
I think many companies need to rethink their point of view on this topic. The market for good software developers is small and there seems to be a constant shortage. Yet some companies dismiss perfectly fine candidates on unjustified premises. This is frustrating for both parties. Everybody should get a fair chance to prove their knowledge, regardless of who their past employer was.
Notes
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License