Before venturing further into the methodology and results of the replication, it is worthwhile to ask why open source is worth investigating. Many questions persist across the industry about how to safely use open source languages within a clinical submission. Some of these questions reflect a misunderstanding of the open source ecosystem, while others are truly difficult to answer. Many groups and initiatives across the industry have set out to address these problems. Regardless, it is worth discussing why these problems are worth solving in the first place.
Open source has numerous benefits. The public nature of the code welcomes review and feedback from the community to discover and resolve problems. Availability of the software is not tied to the success of any specific corporation. Users can modify or extend the software to solve unsolved problems and contribute their work back to the community.
This last point is particularly relevant. The R programming language is quite powerful. The base language itself is full of analytical tools to solve many different challenges – but there is much more to R than the base language alone. The massive supply of community-contributed packages introduces more capabilities into the R programming language every day. By contributing a package to CRAN, you tell the R community “I have solved this problem so that you do not have to.”
Before CDISC, data standards existed on a company-by-company basis. This placed the burden of establishing these standards on each company, and the burden of interpreting the data on the reviewing agency. With the establishment of CDISC, data standards were unified. The industry could then speak one language – not only with the reviewing agency, but across companies. Onboarding new team members into a company no longer required learning company-specific data standards. CDISC solved the problem so that we do not have to.
The steps in preparing data and analysis results for a submission involve quite a bit of redundancy. For this reason, many companies maintain macros to make the process more efficient. Companies largely treat this code as proprietary. The problem is that, in many cases, the same problems are being solved separately across organizations. By embracing the open source community, the pharmaceutical industry stands to gain a unified approach to programming in the same way that CDISC gave us a unified approach to data standards. If others solve the problem, then why do we have to?