
TOPIC: POTM-05 (October): Overplot with SWMPr

  • Marcus Beck
Good morning! October's 'Plot of the Month' is a follow-up to the June post by Kim Cressman that described plotting multiple variables on the same y-axis. I have mixed feelings about these 'overplots' (see the end of this post), but this is a highly requested plotting feature that wasn't available in SWMPr until now. This post describes the new 'overplot' function that was included in a recent release of SWMPr. You'll need to install the latest version to use the function.
## install and load SWMPr
## (assumes the release on CRAN includes 'overplot')
install.packages('SWMPr')
library(SWMPr)
The first task is to look at the help file for 'overplot' to see the required data and arguments.
## look at the help file
?overplot
You should see something like this:

I usually skip to the examples section at the bottom of the help file when I'm using a function for the first time. The examples should almost always include a 'minimum working example' with code and data that shows how to use the function. In this case, the examples for 'overplot' create two plots using a sample dataset included with SWMPr.
## import data
dat <- qaqc(apacpwq)
## first plot
overplot(dat)

You'll notice that the function automatically plotted temperature and conductivity for the entire time series. The colors and line types were also chosen automatically. A common feature of functions in SWMPr (and most R functions) is that they are designed to work with minimal input from the user. You'll see from the help file that 'overplot' has 15 arguments. Can you imagine having to input a value for each one to create the plot? Instead, many of the arguments have default values that you only change as needed. The two arguments you will most often want to change are the parameters and the date range to plot. Maybe we want to see how dissolved oxygen and turbidity change with the tidal cycle over a one-week period, using depth as an indicator of the tide.
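## depth tracks the tide; the dates must fall within the record being plotted
## (the sample dataset is assumed here to cover 2012 through 2013)
overplot(dat, select = c('depth', 'do_mgl', 'turb'),
 subset = c('2013-07-01 0:0', '2013-07-08 0:0'),
 cols = c('red', 'green', 'blue'))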
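
The idea of default arguments can be sketched with a toy function. The function 'toy_overplot' and its arguments below are hypothetical and are not part of SWMPr. The only point is that arguments with defaults can be left out entirely and then overridden one at a time.

## hypothetical stand-in for a plotting function with defaults (not SWMPr code)
toy_overplot <- function(dat, select = names(dat)[1:2], cols = NULL, lwd = 1) {
  ## fill in colors automatically when none are supplied
  if (is.null(cols)) cols <- seq_along(select)
  ## return the settings that would be used instead of drawing anything
  list(select = select, cols = cols, lwd = lwd)
}

## made-up data with three parameters
toy_dat <- data.frame(temp = 1:3, spcond = 4:6, do_mgl = 7:9)

## minimal call, every optional argument takes its default
defaults <- toy_overplot(toy_dat)

## override only the arguments of interest
custom <- toy_overplot(toy_dat, select = c('temp', 'do_mgl'), cols = c('red', 'blue'))

## list the argument names of any function
arg_names <- names(formals(toy_overplot))

The same 'formals' call can be pointed at a real function to see which arguments it accepts before reading the full help file.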

The true value of this function is that all the hard work to create the actual plot is taken care of in the background. I'm sure you can ask Kim how long it took to create the code for her post. In fact, most of the code used in the 'overplot' function was adapted from her examples.

Another useful feature of this function is that it works with arbitrary datasets, in addition to SWMP data. This is possible because the R language lets a function have different 'methods' for different types of data objects. The help file will always show which types of data objects can be used with a function. For example, the 'overplot' help file shows an 'S3 method for class swmpr' and a 'default S3 method'. This simply means that the function works with 'swmpr' data objects (which are created automatically if you use any of the import functions in SWMPr) and falls back to the default method for any other data structure. The default method usually works best with a 'data.frame' since this is the most common data structure in R. This example shows how you might use the 'overplot' function with an arbitrary dataset. We'll use some fake data for the example, but you can see how this is comparable to data you might import from a csv or text file.
## some fake data
## dates and number of observations
rng <- as.POSIXct(c('2021-09-01 0:0', '2021-09-30 0:0'))
dates <- seq(rng[1], rng[2], by = 'hour')
nobs <- length(dates)
## variables to plot, random but correlated through time
x1 <- diffinv(rnorm(nobs - 1))
x2 <- x1 + rnorm(nobs, 0, 3)
dat <- data.frame(dates, x1, x2)  
## plot
overplot(dat, 'dates', select = c('x1', 'x2'))

The fake data is a data.frame with a date column and two additional columns for the x1 and x2 variables. The x1 variable was generated by drawing values from a random normal distribution and summing them cumulatively ('diffinv' with its default settings returns a running sum that starts at zero) to simulate a serially-correlated error structure for time series data. The x2 variable is the same as x1 but with an additional error component to simulate additional noise. The example shows how 'overplot' can be used with the arbitrary dataset. Both the name of the date column and the variables to plot have to be supplied for the default method. Additional arguments are often required for default methods so R knows how to handle data from a non-specific data class (e.g., not a swmpr object).
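
To make the idea of methods more concrete, here is a minimal sketch of S3 dispatch in base R. The generic 'describe' and the class 'swmpr_toy' are made up for illustration and have nothing to do with the actual SWMPr code. The sketch only shows why a class-specific method can get by with fewer arguments than a default method.

## hypothetical generic, not part of SWMPr
describe <- function(x, ...) UseMethod('describe')

## method for class 'swmpr_toy': the object carries its own metadata
describe.swmpr_toy <- function(x, ...) {
  paste('swmpr_toy method, date column is', attr(x, 'date_var'))
}

## default method: the caller has to say which column holds the dates
describe.default <- function(x, date_var, ...) {
  stopifnot(date_var %in% names(x))
  paste('default method, date column is', date_var)
}

## an ordinary data.frame
plain <- data.frame(dates = Sys.Date() + 0:2, x1 = rnorm(3))

## the same data tagged with a class and the metadata its method relies on
tagged <- structure(plain, class = c('swmpr_toy', 'data.frame'), date_var = 'dates')

## dispatches to describe.swmpr_toy, no extra argument needed
res_tagged <- describe(tagged)

## no method for data.frame exists, so this falls through to describe.default
res_plain <- describe(plain, 'dates')

Calling 'describe(plain)' without the date column name fails, which is the same reason the default method of 'overplot' needs the name of the date column.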

Finally, I mentioned at the beginning that using multiple y-axes can be a problematic way to visualize trends in data. I'm kind of on the fence with the issue, but to illustrate the problem, have a look at the response to the question here. A more quantitative justification for not using these plots is described here. The general consensus is that comparing variables on an arbitrary scale can lead to inaccurate conclusions, and that plots can be easily manipulated to imply causation. That being said, I don't think they are completely inappropriate, but be cautious in using them to understand covariance between variables.
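
One way to act on that caution is to pair an overplot with a number that does not depend on how the axes are drawn. The sketch below uses made-up data and base R only. It rescales and shifts one series, which is roughly what picking a different range for a second y-axis does to the picture, and shows that the correlation between the two series is unchanged.

## made-up series for illustration, built the same way as the fake data above
set.seed(123)
n <- 200
a <- diffinv(rnorm(n - 1))
b <- a + rnorm(n, 0, 3)

## mimic a different y-axis range by stretching and shifting one series
b_rescaled <- 10 * b + 50

## correlation before and after rescaling
r_original <- cor(a, b)
r_rescaled <- cor(a, b_rescaled)

## TRUE if the two values agree up to floating point error
same_r <- isTRUE(all.equal(r_original, r_rescaled))

The plot of 'a' against 'b_rescaled' could be made to look very different from the plot of 'a' against 'b', but the strength of the linear relationship is the same in both.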

Until next time...

Last Edit: 8 months 3 weeks ago by Todd.OBrien. Reason: This is POTM-05. POTM-04 was last month.