Welcome, Guest

TOPIC: POTM-06 (December): Colorful Multi-Bar Charts

POTM-06 (December): Colorful Multi-Bar Charts 7 months 3 days ago #53

Kari_Pohl
Offline
Moderator
Posts: 9
Thank you received: 2
Karma: 2

This Plot of the Month will show you how to make a grouped and stacked bar chart using 11 years of precipitation data from the CBNERR site at Jug Bay, Maryland.

But first, let’s talk about getting the data ready for our plots.

Part of my work involves looking at how many days each year had a certain amount of precipitation fall. For example, how many days in 2005 did more than 20 mm of liquid precipitation fall?

We can use SWMPr and a simple loop to get our data from 15 minute millimeter data to days per year.

#Step 1: Load in your data
library(SWMPr)
mypath <- 'C:/Users/Kari/Documents/RData/OPC'
#Import Meteorological Data previously downloaded from Jug Bay 
mydataMET <- import_local(mypath, 'CBMJBMET', trace = T)
#Step 2: Aggregate total precipitation into daily sum
Daily_PRCP<-aggreswmp(mydataMET, 'days', function(x) sum(x, na.rm = T), 
params = 'totprcp')

For our plots, we will look at 3 types of operationally defined rainy days.
Wet Days are when more than 10 mm of liquid precipitation fell.
Very Wet Days is more than 20 mm.
Extremely Wet Days is more than 30 mm.

The code below is a very simple loop and filter.

#Step 3: Define our rainy days
#creates a new empty vector for our new data
Wet_Days = NULL
Very_Wet_Days = NULL
Extremely_Wet_Days = NULL
#format date
DATE<-as.Date(Daily_PRCP$datetimestamp, format="%Y-%m-%d %H:%M:%S", tz="UTC")
tmpyear <- as.numeric(format((DATE),"%Y"))
#begin loop
#note: change the iyear range to suit your time series 
i=1
for (iyear in 2004:2014 ) {
#sets start of each year
Wet_Days[i] <- length(Daily_PRCP$totprcp[tmpyear==iyear & Daily_PRCP$totprcp > 10])
Very_Wet_Days[i] <- length(Daily_PRCP$totprcp[tmpyear==iyear & Daily_PRCP$totprcp > 20])
Extremely_Wet_Days[i] <- length(Daily_PRCP$totprcp[tmpyear==iyear & Daily_PRCP$totprcp > 30])
i = i+1
}

You can use this script to define your own thresholds of daily precipitation or another variable. Maybe you only care about huge 100 mm rain events. Or maybe you only care about water temperatures above an organism's thermal tolerance. Endless possibilities!

#Step 4: Finally the Plot(s)
#First create a matrix
#we want it to have 3 columns, each with 11 rows (11 years of data)
Rainy_Days <-matrix(c(Wet_Days, Very_Wet_Days,Extremely_Wet_Days),
   nrow=11,  ncol=3)
#creates labels for your x-axis
Years<-c(2004:2014)
#Grouped Multi-bar chart
barplot(t(Rainy_Days), #this transposes the matrix
main="Rainy Days", ylab= "Days", #label you title and y-axis
beside=TRUE, col=heat.colors(3), #pick your colors!
names.arg=(Years)) #label your x-axis
legend("topright", c("10mm","20mm","30mm"), 
cex=1.5, bty="n", fill=heat.colors(3))
#Gray Scale Version (Cheaper for Printing!)
#In R, 0 is black and 1 is white, so fractions are different grays
barplot(t(Rainy_Days), main="Rainy Days", ylab= "Days",
   beside=TRUE, col=gray.colors(3, start = 0.2, end = 0.8),
names.arg=(Years))
legend("topright", c("10mm","20mm","30mm"), cex=1.5,
   bty="n", fill=gray.colors(3, start = 0.2, end = 0.8))
#Stacked Bar Chart
barplot((Rainy_Days), #no longer a transposed matrix
main="Rainy Days", ylab= "Days", 
names.arg=c("10mm", "20mm", "30mm"), #labels each stacked histogram
   col=topo.colors(11), #the color palette of 11 repeating colors
ylim=c(0,325)) #sets bounds of y-axis
legend("topright", c("2004","2005","2006","2007","2008","2009",
"2010","2011","2012","2013","2014"), #your legend
   bty="n", fill=topo.colors(11),horiz=F, ncol=3,cex=1.25)

I picked these color palettes as examples since I find them pretty, but also to avoid using red/green color combinations, which make data interpretation difficult for anyone with colorblindness.

Lastly, as a technical note, those pesky NA’s can affect our analysis. In this example, all NA’s will be interpreted as days without rainfall, which could bias the results. For my work, I only use years with <10% of days missing (standard for this analysis).

The code below will allow you to check how many days each year are missing and whether you should consider throwing out a year!

#Step 5: A note about NAs
#This time when we aggregate data, do not pass NA's (na.rm=F)
Daily_PRCP<-aggreswmp(mydataMET, 'days', function(x) sum(x, na.rm = F), 
params = 'totprcp')
#Are there any NA's. Spoiler, yes!
sum(is.na(Daily_PRCP))
#where are these NAs?
 Daily_PRCP$datetimestamp[is.na(Daily_PRCP$totprcp)]
#format our date
DATE<-as.Date(Daily_PRCP$datetimestamp, format="%Y-%m-%d %H:%M:%S", tz="UTC")
tmpyear <- as.numeric(format((DATE),"%Y"))
#loop through each year to determine what percentage of the year is missing
#to make it easy, I used 325.25 as a way to "avoid" leap years
#create a new empty vector for our missing days
Missing_Days = NULL
#begin loop 
i=1
for (iyear in 2004:2014 ) {
#sets start of each year
Missing_Days[i] <- sum(is.na(Daily_PRCP)[tmpyear==iyear])/365.25
i = i+1
}
print(Missing_Days)
Missing<-zoo((Missing_Days*100),(c(2004:2014)))
> Missing
      2004       2005       2006       2007       2008       2009       2010 
25.4620123 14.2368241  2.4640657  7.6659822  1.9164956  0.0000000  0.2737851 
      2011       2012       2013       2014 
 4.1067762  0.0000000  0.0000000  0.2737851

It looks like 2004 and 2005 had some missing data, which could affect our analysis! I would either throw these years out or fill them by extrapolation.

This may not be the best or most efficient way to look at precipitation exceedances but practice makes perfect! These bar charts can be adapted to your own liking!

Hope it helps!
-Kari

The administrator has disabled public write access.

POTM-06 (December): Colorful Multi-Bar Charts 7 months 2 days ago #54

Marcus Beck
Offline
Administrator
Posts: 32
Thank you received: 5
Karma: 4

Very cool, thanks for sharing Kari! As always, excellent use of SWMPr to import and organize the data. This is a perfect example of how the package bridges the gap from CDMO to analysis.

I couldn't help but think of an existing R function that might be helpful. The 'cut' function can categorize a continuous variable, similar to how you've separated the cumulative daily rainfall data. The advantage is that it lets you get around use of the loop. Looping can be slow for very large datasets so I like to avoid them as a general rule of thumb.

Here's an alternative approach using cut with a sample dataset from SWMPr.

library(SWMPr)
library(ggplot2)
# preprocess data as before
met <- qaqc(apaebmet)
met <- aggreswmp(met, by = 'days', function(x) sum(x, na.rm = T), 
    params = 'totprcp')
# create new variable w/ cut
met$rain <- cut(met$totprcp,
  breaks = c(10, 20, 30, Inf), 
  labels = c('Wet', 'Very Wet', 'Extremely Wet')
  )
met <- na.omit(met) # remove 'dry' days 
# create a year variable
met$year <- strftime(met$datetimestamp, '%Y')

It's very easy to create the barplots with ggplot.

# rainfall by year
ggplot(met, aes(x = year, fill = rain)) + 
  geom_histogram(position = 'dodge')
# years by rainfall
ggplot(met, aes(x = rain, fill = year)) + 
  geom_histogram(position = 'stack')

So, there's more than one way to skin a cat. R is nice because there's never a single way to get to the same end. It's more important to pick a method that makes sense and works for your needs. I just wanted to share this approach as an alternative.

-Marcus

The administrator has disabled public write access.

POTM-06 (December): Colorful Multi-Bar Charts 7 months 2 days ago #55

Kari_Pohl Offline Moderator Posts: 9 Thank you received: 2 Karma: 2	Thank you Marcus, This is great! Thanks for the code alternatives. I knew there had to be an easier way to avoid a loop and cut the data. The ggplot package is next on my list to explore! The fun part of R is that you can learn something new every day! -Kari
	The administrator has disabled public write access. The following user(s) said Thank You: Marcus Beck

POTM-06 (December): Colorful Multi-Bar Charts 6 months 4 weeks ago #56

Kim_Cressman Offline Moderator Posts: 19 Karma: 2	Thank you for this, Kari! And Marcus for the alternative info! One of my colleagues recently used SWMP data to see how many days out of a year salinity and DO were below oysters' tolerances. This seems like a good way to look at that sort of thing over multiple years - so like you said, endless possibilities!
	The administrator has disabled public write access.

Forum

The Plot of the Month

POTM: Single Variable Exploration

POTM-06 (December): Colorful Multi-Bar Charts

SWMPrats.net

TOPIC: POTM-06 (December): Colorful Multi-Bar Charts

POTM-06 (December): Colorful Multi-Bar Charts 7 months 3 days ago #53

POTM-06 (December): Colorful Multi-Bar Charts 7 months 2 days ago #54

POTM-06 (December): Colorful Multi-Bar Charts 7 months 2 days ago #55

POTM-06 (December): Colorful Multi-Bar Charts 6 months 4 weeks ago #56