
Connections for the STEM Classroom

GVSU faculty and area experts provide engaging ideas on current topics in research and education

Data, Data Everywhere: R You Ready?

By John Gabrosek and David Zeitler, GVSU, Dept. of Statistics

As computing power has increased over the last 15 years, the ability to collect, store, and transmit massive data sets has led to an explosion of Big Data. Parallel developments in statistical software packages have made analysis of these large data sets feasible. Until 15 years ago the main players in the statistics software market were SPSS, SAS, and Minitab. Each of these software packages (and many others) has its uses; we certainly would never denigrate any of them. However, one stumbling block to using these software packages is price. The cost of a fully loaded version of any of them can run into the thousands or tens of thousands of dollars for a business. Fortunately, a solution is available.

The open source software R has progressed to the point that it is a viable alternative to a pricey statistics software package for many users.  R feels more like a programming language than do many of the other software packages available.  If you date yourself to the late 1970s as we do, R feels a bit like Fortran 77.  You can write code in R to create objects, do statistical analysis (including numerical and graphical summaries), generate reports, and do just about anything any other programming language or statistical software can do.  Of course, there is an investment in time to learn R coding, but, heck, programming is fun!  Once you go down the rabbit-hole of R programming, you may never return. 
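To give a flavor of what that looks like, here is a minimal sketch in base R that creates an object, computes a few numerical summaries, and draws a graphical summary. The exam scores, the section labels, and the object names are made up for illustration; they do not come from any real data set.

```r
# Create an object: a small data frame of made-up exam scores
# (illustrative values only, not real data).
scores <- data.frame(
  section = c("A", "A", "A", "B", "B", "B"),
  score   = c(72, 85, 90, 65, 78, 88)
)

# Numerical summaries: the overall mean, the mean for each section,
# and the five-number summary plus mean.
overall_mean  <- mean(scores$score)
section_means <- tapply(scores$score, scores$section, mean)
print(summary(scores$score))
print(section_means)

# Graphical summary: side-by-side boxplots of score by section.
boxplot(score ~ section, data = scores,
        main = "Exam scores by section (made-up data)",
        xlab = "Section", ylab = "Score")
```

Everything here ships with a standard R installation, so no extra packages are needed to try it.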

Two features of R that are particularly appealing are that it is open source and free! Because R is open source, the statistics and data science communities have quickly become invested in it. New programs to do particular tasks are being written every day. The R community collects programs that do a particular task (such as analyzing time-to-event data from clinical trials of new drugs) in what are called packages. Some packages make using R easier for first-time users. Examples are the mosaic package, which provides simplified commands for first-time statistics users, and Rcmdr, which provides a menu-driven environment similar to those provided with SPSS, Minitab, or SAS. These packages are also free to download. The experienced R programmer leverages many packages to do the specific tasks needed.

Okay, this is a lot to chew on. Another great thing about R is that you’re never alone in the journey. Check out the West Michigan R Users Group, where R users from complete newbies to long-time users get together to discuss R. If you want to explore on your own, you can check out the R Development Team website, where you can find out how to download your own copy of R, or you can follow online blogs like R-bloggers or the Microsoft R folks. You will also want to check out the RStudio website. Along with a very user-friendly interface for R, it has a lot of great learning aids, cheat sheets, and webinars to help out. Finally, if you really want to start programming in R, you might consider taking a Coursera course or enrolling in Grand Valley’s STA 418/518 Statistical Computing and Graphics with R.