Science is benefitting from data and code sharing: join the tribe!
By Valerio Giuffrida, 17 June 2020
Most research projects conducted recently are interdisciplinary, and this is actively encouraged by funding bodies. For my PhD work, I had a great opportunity to be part of a project spanning several scientific areas: computer science, plant biology, and a bit of engineering. In my view, newly enrolled PhD students and early-career researchers should have the same splendid opportunities as I had. What are the steps to facilitate these sorts of interactions? One way is to find suitable partnerships that allow a brilliant researcher to access new challenges. But is there another way? I believe so. No matter who you speak to, research projects are based on the analysis of some sort of data. Even if you are a plant biologist, at some point you have had to learn certain data science skills to analyse the data you obtained from your experiments.
Then, what would you do with these skills? Probably you would assess whether the results prove (or disprove) your original hypothesis and collect your evidence in a paper to be submitted to a high-quality peer-reviewed journal. At this point, a lot of researchers would move on to the next challenge. But what happens to the vast amounts of data that you have generated in your research? Most of the time, they are kept locked in a drawer, stored on an external drive. Is this the fate they deserve? You have probably spent hours and hours extracting, organising, and analysing the data to which you have given (without exaggerating) your blood, sweat and tears!
In computer science, researchers are collecting, annotating, and releasing plenty of different kinds of datasets, to allow other researchers to explore, play, and propose new methodologies to tackle a particular challenge. Data sharing is important for scientific progress and I believe that everyone should make an effort to be more open to this. Nowadays, there are open-access repositories, such as Zenodo, where you can share your data with the scientific community. You might be asking now, what are the benefits? I will answer this question with three main points:
(i) academic benefit: you get citations (and I don’t need to explain why this is important to you)
(ii) professional benefit: your CV will look enriched when you are applying to jobs or funding opportunities
(iii) personal benefit: I personally feel rewarded when someone downloads something (not only data, but also software) from my website - I think “someone is using my stuff for their project - wow!”.
Software development is the other point I would like to discuss, and the same argument applies to it. Does the code that you have developed deserve to stay in the same drawer? There are many software packages, tools, and applications in the plant community, but most of the time authors do not release the code. Some people think that releasing the code publicly under an open-source licence will somehow hinder future possibilities. For example, one of the biggest fears I have heard of is that code sharing will prevent the potential commercialisation of your software. I completely disagree with that and a good answer can be found here. You own the piece of work and you can basically do anything you like, including making a profit in some way. This article discusses several possibilities on how to make a profit from open-source software. The second biggest fear is that someone might steal your idea or hard work and make a profit instead of you. It is important here to realise that your efforts cannot be “stolen”, as you still own all the intellectual property when you release your code under an open-source licence. If you still question the benefit of sharing your code, please reread the benefits I have outlined above.
Finally, don’t be fearful. Join the tribe of scientists sharing their data and code - this will make a real contribution to a wider audience, shaping young researchers’ careers and setting them on a path towards interdisciplinary projects. I suggest that you visit the OpenAIRE website to get an overview of the open-access repositories currently available to researchers.
Lecturer in Data Science
Edinburgh Napier University.