June POTM: Using FFTs for time series analysis

1 year 3 months ago - 1 year 3 months ago #132
In Marcus Beck's Plot of the Month (POTM) for April, 2016 (
https://fawda123.github.io/swmp_rats/POTM/April_POTM.
, he did a great job showing how tides are composed of various spectral components, i.e., sine and cosine waves. In this month's POTM, we'll start to look at how you can examine a time series to find out the frequencies at which your signals vary the most.

To begin, we'll need some data. For this introduction, let's make our own out of some sine and cosine waves. I'm doing this for simplicity and so we can check the answers. I'm also going to keep this very straightforward and try to avoid obscuring what is happening by using some of the more R-specific features like data frames. That may make the code a little clunky, perhaps, for R-purists, but hopefully make the process clear for those who may use other languages more frequently.

Remember, the general form for a sine wave is y(t) = Amplitude * sin (2*pi*Frequency(t)). For the artifical data below, I've made a couple of semidiurnal tidal components, a diurnal component, a seasonal component, a constant bias and an overall trend. There is also an error term, which will add random noise into the signal.
```# Time Series Spectral Decomposition Demonstration
#
#  Define parameters to create a time series:
numYears = 4.0  # number of years for the time series
T_Start = 0.    #  Start time:
T_End = T_Start + numYears*365 # Converts into days, which will be our basic unit of time (ignoring leap year, I know)
#  Time step = sampling interval.  Let's assume 60 minute data initially, converted in fraction of day:
delta_T = 60. / (60 * 24) # 60 minute data, converted into fraction of a day: min/(minutes/hour * hours/day)
#delta_T = 15. / (60 * 24) # for 15 minute data
#  Create an array of times for each data point:
Time <- seq(T_Start,T_End,by=delta_T)
#  CREATE OUR FREQUENCY ARRAYS
Step <- seq(from=0, to=length(Time)-1,by=1)  # Number of steps
f <- Step/(T_End-T_Start)                    # Frequency in per days
fYearly <- Step*365./(T_End-T_Start)         # Frequency in per days
# Any consistent offset for our artificial data:
bias = 6.5
#And any linear trend for the data:
slope = 10./(T_End-T_Start)  # End 10 units higher than began
# Create the time series from adding different frequency sine and/or cosine waves:
var_M2      <- 1.2*sin(2*pi*Time*1.9)           # Frequency of just under twice per day
var_SNK2    <- 0.55*sin(2*pi*Time*2.0)          # Frequency of exactly twice per day
var_KOP1    <- 1.2 * sin(2*pi*Time*1.025)       # Frequency of just over per day
var_Seasonal <- 2.0 * cos(2*pi*Time/182.5)      # Frequency of about twice per year
var_Annual  <- 4.0* (-1*cos(2*pi*Time/365))     #  Freq. of 1/year. Starts negative, goes positive
var_slope   <- (Time * slope) # From 0 to max slope
var_err     <- 0.0 * runif(length(Time),min=-1.0,max = 1.0) # No error at first
# Now plot the various components to see what we are adding together.
par(mai=c(0.7,0.6,0.15,0.15),mfrow=c(3,1),mgp=c(2.2,.8,0))
plot(Time[1:50],var_M2[1:50],'l',xlab="Time (Days)",ylab='Daily Signals')
lines(Time[1:50],var_SNK2[1:50],'l',col='red')
lines(Time[1:50],var_KOP1[1:50],'l',col="green")
lines(Time[1:50],var_err[1:50],'l',col='blue')
plot(Time,var_Annual,'l',xlab="Time (Days)",ylab="Longer & Trend/2")
lines(Time,var_Seasonal,'l',col="blue")
lines(Time,var_slope/2,'l',col='red')```
Now we have the various components that will make up our test variable var. Note how we made arrays for the frequency in terms of both per days and per years. That is helpful because when we derive the components of the spectra directly using a Fourier transform method, there is no inherent frequency information produced by the transformation. Instead, the results are will be in arrays that are related to units of 1 over the sample number. The f calculations above take those somewhat arbitrary units and translate them into more physically useful units. If you are confused, don't worry, it will hopefully become clear(er).

Now we can add the components together and take a look at the sum. We'll go through the steps I'd use for a new, totally unknown time series. Plotting it out to look at it is the very first. Let's also use R's capabilities to calculate and plot the mean value as well.
```var <- var_M2 +
var_SNK2 +
var_KOP1 +
var_Seasonal +
var_Annual +
var_slope +
bias +
var_err
#The James J. O'Brien trained way (as best I can remember):
plot(Time,var,'l',col='black',xlab="Time (Days)",ylab="Total Signal")
# BECAUSE YOU ALWAYS LOOK AT THE DATA FIRST!!!
ts_avg = mean(var)
abline(h = ts_avg, col='blue')  # Add a line for the mean```

Uh oh. There is clearly a trend (of course, we added it, but we are pretending this is new data.) To do a meaningful Fourier decomposition, we need to have stationary data, meaning that the mean and variance of the data are more or less the same across the time series. It looks like the variance of the data is pretty consistent, but any overall trend indicates that the mean is changing. The easiest way to correct for this is just to do a linear regression and subtract out the trendline and any offset, as well. This may seem odd for a cyclical data set, but it will work. We are trying to remove the overall trend. We'll find the cyclical "trends" soon.
```# Hmm, clearly a trend, so lets find and plot the linear portion:
regr_var <- lm(var ~ Time)
regr_var
abline(regr_var,'l',col='red')
# Now, calculate that trend and remove it:
T_trend = predict.lm(regr_var)
#par(mai=c(0.9,0.9,0.2,0.2),mfrow=c(1,1),mgp=c(2.,0.75,0))
#plot(Time,T_trend,xlim=c(T_Start,T_End)) # Plot this to check that it is correct
#abline(regr_var,'l',col='red',xlim=c(T_Start,T_End)) #compare to slope & intercept
# And it looks correct, so remove it:
detrend_var = var - T_trend
par(mai=c(0.9,0.9,0.2,0.2),mfrow=c(2,1),mgp=c(2.,0.75,0))
plot(Time,detrend_var,'l',xlim=c(T_Start,T_End),ylab="Detrended")
plot(Time[1:length(f)/50],detrend_var[1:length(f)/50],'l',xlim=c(T_Start,T_End)/50,
xlab="Time (Days)",ylab="Detrended")```

Check that out! The top line is the total data set, but now there is no overall trend. There are clearly four big signals and some other stuff going on, but the variability seems pretty constant overall. Looking more closely at the first 30 days of the time series, you can see the spring-neap tidal dynamics are coming through even with this very simple model.

Alright, are you ready for the magic? Let's see if we can find out what frequencies are driving this signal. This is the time to do it and it only takes 3 lines of code:
```# Now the data are STATIONARY, i.e., the MEAN and variability are ~constant thru time.
# So lets find what frequencies are dominating the signal!
#FOURIER TRANSFORM WORK
Y <- fft(detrend_var)
mag <- sqrt(Re(Y)^2+Im(Y)^2)*2/length(detrend_var)
phase <- atan(Im(Y)/Re(Y))```

Hmm, OK, so what did we just do? We first created a new variable array, Y, which is the Fast-Fourier Transformation (FFT) of our data set. Y is a complex variable, where its magnitude, mag gives the relative power in the spectral domain of our periodic time series var/i]. We scale that by 2 over the length of the time series and that puts mag into the same units and scale that our initial data were in. Therefore, since this is artificial data, the high values in Y, should be very close to what we know their amplitudes should be. They should also correspond to the frequencies we gave them. The phase of Y is given by the arctangent of the ratio of the imaginary to real components. We won't be worrying about it any more than to say that.

OK, so what did those three lines of code get us? Let's take a look:
```#Plot the frequency spectra results
plot(f[1:length(f)/10],mag[1:length(f)/10],'l',
xlab="Freq., day^-1",ylab="Magnitude")
abline(h=1,col='red')
abline(h=2,col='red')
abline(h=4,col='red')
plot(fYearly[1:length(f)/500],mag[1:length(f)/500],'l',
xlab="Freq., Year^-1",ylab="Magnitude")
abline(h=2,col='red')
abline(h=4,col='red')```

Notice that their are four spikes in the upper frequency plot, which is plotted against frequency in per day. One is almost at zero, one near 1.0, and two more near 2.0. Recall our diurnal signal had a frequency of just over once daily (1.025/day actually), and our diurnal frequencies were at exactly two per day and a bit less at 1.95 per day. Remember, these are the frequencies of our signals, not the periods, that we are talking about! That can be confusing. Also note that the signals magnitudes are about what we would expect from our initial data.

If you look at the bottom graph, which is plotting only a small fraction (1/500th) of the length of the time series against frequency in units of per year, we see that the initial spike is really two distinct spikes, one at 1/year and another at 2/year. Low and behold! We've recovered the initial periodic functions out of the combined signal and removed an overall trend and a bias in the process.

Note that in this example, I didn't add in any random error. Please do by changing the multiplier in the line:
`var_err     <- 0.0 * runif(length(Time),min=-1.0,max = 1.0) # No error at first`
Just change that 0.0 to something bigger. Note that the error will be scaled to between plus and minus whatever you make that factor. Go for it! I think you will be astounded at how much error (noise in the real world, or other processes) you can have in the system and still recover the underlying periodic signals.

This is just an introduction to what can be done. I'll try for a July POTM to show this with some real SWMP data.
Attachments:
Last edit: 1 year 3 months ago by Dave Eslinger. Reason: fixing it up

1 year 3 months ago #133
Awesome follow-up Dave! I like this approach because it lets the data tell you the main signals. The harmonic regression method I showed in my last post looked for signals with an a priori period, so less flexible.

Also, just wanted to show how this can be reconstructed from a tidal perspective using the oce package. If we create the time series as a 'sealevel' object from the oce package, we can create a plot of the time series that includes a spectral decomposition similar to what Dave showed. The frequency is in cycles per hour and you'll see that the three spikes are identical to those that Dave showed in the second to last plot above, i.e., the M2, S2, and K1 spikes (with some margin of error).
```library(oce)
# convert Time to a date/time object
timestamp <- as.POSIXct(Time * 60 * 60 * 24, origin = '2021-01-01 6:0', tz = 'America/Regina')
# create a sealevel object from the oce package
datsl <- as.sealevel(elevation = detrend_var, time = timestamp)
# plot the tidal time series
plot(datsl)```