By Florian Pichlmueller (University of Auckland) and Christina Straub (ESR)
GPL, MIT, BSD, CC: no, they’re not a secret vocabulary. These are in fact abbreviations for open licenses that allow code to be freely used, modified and shared.
And there are many more: take a look at the Open Source Initiative to see a whole new world. Believe it or not, there’s even a license called WTFPL, which stands for “Do What The F*** You Want To Public License.”
Yet when we researchers make our own code public, software licensing is a topic that receives little attention.
But with the ever-increasing amount of computational analysis of ‘omics data, and the push for reproducibility and code sharing in scientific publishing, you should include an appropriate open-source license with your published code. A license lets others properly reuse and adapt your code, and it helps ensure that you get credit for your work.
Although our comments are aimed at the scientific community, the same holds true for anyone uploading a piece of code, whether it is an exciting school project, a cool little game or a clever script that automatically downloads your favorite free podcasts or radio streams.
Let us start with a truism: every good bioinformatics project begins with an exciting idea, followed by experimental planning, data collection and an analysis that follows best-practice principles so that the results can be reproduced.
Then the time comes to publish, and suddenly we are confronted with making our code public (never mind the horror of having to tidy up and comment it, a nod towards best practices in data science research).
But what are the implications of uploading our code into the ether?
What exactly does it mean to “license” a piece of code or other work? The term generally describes any arrangement in which the creator and/or owner of the intellectual property (IP) grants another party the right to use that IP, usually in exchange for some form of acknowledgement. For open-source software in science, this is typically recognition of the source (let’s skip monetizing work this time).
If you decide not to attach a license to your code, that does not make it free for anybody to use. By default, any work is under the exclusive copyright of its creator. This means that nobody else can copy, distribute or modify your work without, at least in theory, risking liability for copyright infringement. Leaving code unlicensed therefore works against the principles of reproducibility in science.
Furthermore, by publishing your code in a public repository (e.g., GitHub, GitLab or Bitbucket), you accept the platform’s terms of service, which allow others to view and copy (i.e., fork) your repository. This also needs to be considered before deciding to skip the licensing step. That is why we should invest a little extra time in learning how to govern the use and redistribution of our code by choosing a license.
More about open-source licenses
Let’s look at the two most common and popular open-source licenses used for bioinformatics projects in the life sciences. As mentioned, there are plenty to choose from; for example, GitHub’s documentation page on licensing a repository lists the 34 most common licenses.
GNU General Public License v3.0 (GNU GPLv3)
“Permissions of this strong copyleft license are conditioned on making available complete source code of licensed works and modifications, which include larger works using a licensed work, under the same license. Copyright and license notices must be preserved. Contributors provide an express grant of patent rights.”
A copyleft (i.e., share-alike) license means that if you distribute any work based on the original code, you must make its complete source code available under the same license, and the original copyright and license notices must be preserved. This prevents others from incorporating your open-source code into closed, proprietary software. It does not forbid commercial use, but anyone who distributes a derived work must share its source code under the same terms.
MIT License
“This is a short and simple permissive license with conditions only requiring preservation of copyright and license notices. Licensed works, modifications, and larger works may be distributed under different terms and without source code.”
The MIT License is probably one of the shortest licenses, and it essentially says “do whatever you want with this, but do not hold me legally accountable for my code”. That means private and commercial use, distribution and modification are all allowed, but without any liability or warranty. The only condition is that the license and copyright notice are included. The main argument for MIT over the GNU GPL is that, as a permissive license with almost no restrictions on reuse, it may encourage more people to use your code.
So, next time you start a new project on your favorite code-sharing platform, attach a README with a short description and pick the most suitable license for your project.
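If you would like to automate that last step, here is a minimal Python sketch, assuming you have settled on the MIT License. The helper name scaffold_repo and its parameters are hypothetical, made up for illustration. The license text is the standard MIT template as published on choosealicense.com, but please compare it against that source before relying on it. The script refuses to overwrite an existing README or LICENSE unless told to.

```python
from datetime import date
from pathlib import Path

# Standard MIT License template; {year} and {holder} are filled in below.
MIT_TEMPLATE = """MIT License

Copyright (c) {year} {holder}

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
"""


def scaffold_repo(root, project, description, holder, year=None, overwrite=False):
    """Hypothetical helper: write README.md and an MIT LICENSE file into `root`."""
    root = Path(root)
    root.mkdir(parents=True, exist_ok=True)
    if year is None:
        year = date.today().year

    readme_path = root / "README.md"
    license_path = root / "LICENSE"

    # Do not silently replace files the author may already have written.
    for path in (readme_path, license_path):
        if path.exists() and not overwrite:
            raise FileExistsError(path)

    # A short README: project name, one-line description, and a license pointer.
    readme = (
        f"# {project}\n\n"
        f"{description}\n\n"
        "## License\n\n"
        "Released under the MIT License; see the LICENSE file.\n"
    )
    readme_path.write_text(readme, encoding="utf-8")
    license_path.write_text(MIT_TEMPLATE.format(year=year, holder=holder), encoding="utf-8")
    return readme_path, license_path
```

For example, scaffold_repo("my-pipeline", "my-pipeline", "Scripts for our RNA-seq analysis.", "Jane Doe") would create both files in a my-pipeline folder (the project name, description and holder here are placeholders). Whoever holds the copyright, which may be your employer rather than you, is what belongs in the holder field.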
Happy coding!
*Disclaimer: this commentary was written by biologists working in the field of computational genomics, so it is by no means a comprehensive overview of intellectual property (IP) and software licenses. The focus here was on free and open-source licenses; proprietary licensing was not covered.
Here are some further resources if you are interested:
https://choosealicense.com/
https://docs.github.com/en/github/creating-cloning-and-archiving-repositories/licensing-a-repository
https://opensource.guide/legal/#which-open-source-license-is-appropriate-for-my-project