the main character: rio package
the ancillary character: a clever setup
the data step
the coding step
the reporting step
Where will your report take the needed data from? From the data location, of course. And this completes our workflow.
But now let’s have a look at a real example:
A real example: linking together R and SAS
We start our analysis by munging some heavy data within the SAS environment. We read our data from a txt/csv file into SAS, creating two sas7bdat tables which we need to merge together in order to associate dimensional data with some kind of customer list. So we run our two or three data steps and end up with one complete table. This table, in SAS language, will be referenced as data.complete_data. We now want to apply some cool and trending statistical models to this data, just to show we are real data scientists, and where should we do this if not in R? So here we go, creating a new R script and loading our data in. How do we do it? By leveraging rio! We just have to run the following lines of code, leaving all the hard work to the package:
library(rio)
complete_data <- import('data/complete_data.sas7bdat')
This code will create a data.frame object ready to be analysed with our favourite language. And now it’s time for the last step: reporting the analysis results. This is actually an easy step, since we could even merge it with the R script step: once we can read our SAS-munged data into R, why not read it directly within the Rmarkdown file? In any case, we can handle the last step in different, complementary ways:
- producing all final results within a separate R script and saving them to an RData object to be loaded within the Rmarkdown file
- producing pieces of results in SAS and R and joining them only within the Rmarkdown file through the import() function
- performing the whole analysis from the munged data within the Rmarkdown file.
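The first option can be sketched in a few lines. The file name and the lm() call below are hypothetical stand-ins for your real project layout and analysis:

```r
# Standalone R script: compute the final results and save them for the report.
# The lm() call is a stand-in for whatever analysis you actually run.
results <- list(
  model_coefs = coef(lm(mpg ~ wt, data = mtcars)),
  n_obs       = nrow(mtcars)
)

# In a real project this could be something like "deliverable/results.RData"
out_file <- file.path(tempdir(), "results.RData")
save(results, file = out_file)

# Inside the Rmarkdown file, a setup chunk would then simply run:
load(out_file)
results$n_obs  # the report works with light, pre-computed objects
```

This keeps the heavy computation out of the report: knitting the Rmarkdown file only loads a few small objects.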
The trade-off you have to consider when choosing within this continuum is between weight and flexibility: loading final results into Rmarkdown may result in lighter objects than raw datasets, but it will also generally reduce the possibility of letting the user interact with your data, for instance through interactive components like the ones from the shiny framework. As is often the case, experience will teach you how to move from one side to the other.
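To make the weight side of the trade-off concrete, compare a raw dataset with the final object a report actually needs (mtcars stands in for a real, much heavier dataset):

```r
raw_data <- mtcars                                   # the full munged dataset
final_result <- coef(lm(mpg ~ wt, data = raw_data))  # just the numbers the report shows

object.size(raw_data)      # the whole dataset
object.size(final_result)  # a small named numeric vector, far lighter
```

With a real multi-gigabyte table the gap becomes dramatic, which is why saving only final results often pays off.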
Ok, so everything seems finalised, doesn’t it? Not quite, since we have a bonus track: the workfloweR experiment.
bonus track: the workfloweR experiment
Since I often use more than one programming language when performing my job, and since I am a lazy programmer, I decided to automate a bit the whole workflow described in the previous paragraph: that is where workfloweR comes from.
What it does is ask you (through a simple shiny app) to define a path where you would like to initialize the analysis workspace and to select the languages you are going to use for your analyses. Once the selection is done and the usual initialize button is pressed, workfloweR takes care of creating, within the submitted path, all the needed data folders and language scripts. It moreover adds proper references to the data folder within the different scripts, for instance employing the libname statement to create a SAS library within the SAS script. Finally, workfloweR creates an Rmarkdown report ready to be employed to share the results of your wonderful work.
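As an illustration of what such an initialization amounts to, here is a minimal sketch in plain R. This is an illustrative reimplementation under assumed folder and file names (data, deliverable, analysis.R, analysis.sas), not workfloweR’s actual code:

```r
# Sketch of a workspace initializer: create folders, language scripts with
# references to the shared data folder, and an Rmarkdown skeleton.
# Folder and file names here are assumptions, not workfloweR's real layout.
init_workspace <- function(path, languages = c("R", "sas")) {
  dir.create(file.path(path, "data"), recursive = TRUE, showWarnings = FALSE)
  dir.create(file.path(path, "deliverable"), showWarnings = FALSE)
  if ("R" %in% languages) {
    writeLines(c("# analysis script", "data_path <- 'data'"),
               file.path(path, "analysis.R"))
  }
  if ("sas" %in% languages) {
    # the libname statement points SAS at the shared data folder
    writeLines("libname data 'data';", file.path(path, "analysis.sas"))
  }
  writeLines(c("---", "title: 'analysis report'", "output: html_document", "---"),
             file.path(path, "deliverable", "report.Rmd"))
  invisible(path)
}

ws <- init_workspace(file.path(tempdir(), "my_analysis"))
```

The real package wraps this kind of logic behind a shiny front end, so the user only picks a path and ticks the languages.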
how to try workfloweR
An updated version of workfloweR is currently hosted on GitHub and you can try it following these steps:
clone or download the repository from https://github.com/AndreaCirilloAC/workfloweR
specify where to initialize your analysis workspace
specify the languages you are going to use for your analyses
initialize the workspace
Just open up your pre-initialised R or SAS scripts and start performing your analyses. When you are done with your work, an Rmarkdown file will be waiting for you within the deliverable folder.
may I ask for your help?
If you enjoy the idea of workfloweR, would you mind giving me a hand to expand it further? As mentioned, you can find the project on GitHub: feel free to clone it and work on it. I can see two main development paths:
- move the whole thing into an R package, which will make it really easy to use directly from GitHub via devtools
- add more languages besides R and SAS.
Language integration is a powerful way to let the best features of each one shine; that’s why I thought it was a good idea to share this pragmatic and simple approach to exchanging data and summing up results from analyses performed with R and other languages. Let me know if you find it useful or if you have any other similar tricks.
About the featured image: the Tower of Babel, by Pieter Bruegel the Elder, c. 1563. The rationale? We are trying to join together different languages, aren’t we?