## Quantitative Modeling for Algorithmic Traders – Primer

**Quantitative Modeling** techniques enable traders to identify, mathematically, what makes data “tick” (no pun intended 🙂).

They rely heavily on the following core attributes of any sample data under study:

- **Expectation:** The mean or average value of the sample
- **Variance:** The observed spread of the sample
- **Standard Deviation:** The typical deviation of observations from the sample’s mean (the square root of the variance)
- **Covariance:** The linear association of two data samples
- **Correlation:** Covariance rescaled to a dimensionless measure, which solves the dimensionality problem of Covariance

## Why a dedicated primer on Quantitative Modeling?

Understanding how to use the five core attributes listed above in practice will enable you to:

- Construct diversified DARWIN portfolios using Darwinex’ proprietary **Analytical Toolkit**.
- Conduct **mean-variance analysis** for validating your DARWIN portfolio’s composition.
- Build a solid foundation for implementing more sophisticated quantitative modeling techniques.
- Potentially improve the **robustness** of trading strategies deployed across multiple assets.

Hence, this post defines these core attributes, with practical examples in R (a statistical computing language), and should serve as reference material to accompany existing and future posts.

### Why R?

- It facilitates fast analysis of large price datasets.
- Calculations that would require many lines of code in other languages can often be done in far fewer, because R has a mature base of libraries for many quantitative finance applications.
- It’s free to download.

*Sample data (EUR/USD and GBP/USD End-of-Day Adjusted Close Prices) used in this post was obtained from Yahoo Finance, where it is freely available to the public.*

### Before progressing any further, we need to download EUR/USD and GBP/USD sample data from Yahoo Finance (time period: January 01 to March 31, 2017)

In R, this can be achieved with the following code:

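```r
library(quantmod)

# Download daily EUR and GBP quotes from Yahoo Finance (January 01 to March 31, 2017)
getSymbols("EUR=X", src = "yahoo", from = "2017-01-01", to = "2017-03-31")
getSymbols("GBP=X", src = "yahoo", from = "2017-01-01", to = "2017-03-31")
```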

**Note:** The “EUR=X” and “GBP=X” series provided by Yahoo are quoted as units of foreign currency per US Dollar, i.e. the data represents USD/EUR and USD/GBP respectively. Hence, we first need to invert each quote to obtain EUR/USD and GBP/USD.

To achieve this, we will first extract the **Adjusted Close Price** from each dataset, invert it to convert the base currency, and merge both into a new data frame for later use:

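```r
# Extract the Adjusted Close Price column from each downloaded xts object
eurAdj <- unclass(`EUR=X`$`EUR=X.Adjusted`)
gbpAdj <- unclass(`GBP=X`$`GBP=X.Adjusted`)

# Invert the quotes: USD/EUR -> EUR/USD and USD/GBP -> GBP/USD
eurAdj <- 1 / eurAdj
gbpAdj <- 1 / gbpAdj

# Extract EUR dates for plotting later
eurDates <- index(`EUR=X`)

# Create merged data frame (columns are paired by position, so both
# series must cover exactly the same dates)
eurgbp_merged <- data.frame(eurAdj, gbpAdj)
```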

Finally, we combine the dates and prices into a single data frame for use in the remainder of this post:

```r
eurgbp_merged <- data.frame(eurDates, eurgbp_merged)
colnames(eurgbp_merged) <- c("Dates", "EURUSD", "GBPUSD")
```
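The code above pairs the two series by position, which assumes both downloads contain exactly the same dates. A minimal base-R sketch of inverting the quotes and aligning them by date instead, using hypothetical toy quotes (not the Yahoo data) and a hypothetical result name `eurgbp_toy`:

```r
# Hypothetical quotes in Yahoo's "EUR=X" / "GBP=X" convention:
# units of foreign currency per 1 US Dollar (i.e. USD/EUR and USD/GBP)
usd_eur <- data.frame(Dates  = as.Date(c("2017-01-02", "2017-01-03", "2017-01-04")),
                      USDEUR = c(0.95, 0.96, 0.955))
usd_gbp <- data.frame(Dates  = as.Date(c("2017-01-02", "2017-01-04")),
                      USDGBP = c(0.81, 0.815))   # one date deliberately missing

# Invert each quote to get US Dollars per unit of EUR / GBP
usd_eur$EURUSD <- 1 / usd_eur$USDEUR
usd_gbp$GBPUSD <- 1 / usd_gbp$USDGBP

# Inner join on Dates: only dates present in both series are kept, so each
# EURUSD value is paired with the GBPUSD value from the same day
eurgbp_toy <- merge(usd_eur[, c("Dates", "EURUSD")],
                    usd_gbp[, c("Dates", "GBPUSD")],
                    by = "Dates")
eurgbp_toy
```

Here 2017-01-03 is dropped because it has no GBP quote; positional pairing would instead have failed or silently misaligned the two series.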

### The **mean** *μ* of a price series is its average value.


It is calculated by adding all elements of the series, then dividing this sum by the total number of elements in the series.

Mathematically, the mean *μ* of a price series *P*, with elements *p ∈ P* and *n* elements in total, is expressed as:

**\(μ = E(p) = \frac{1}{n}\sum_{i=1}^{n} p_i = \frac{1}{n}(p_1 + p_2 + p_3 + \dots + p_n)\)**

In R, the **mean of a sample** can be calculated using the **mean()** function.

For example, running the following code on our EUR/USD sample (January 01 to March 31, 2017) returns a mean of 1.065407:

```r
mean(eurgbp_merged$EURUSD)
```

```
[1] 1.065407
```
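As a sanity check on the formula, a minimal sketch that computes the mean of a short hypothetical price vector `p` (toy values, not the Yahoo sample) by summing and dividing by *n*, then compares it with **mean()**:

```r
# Hypothetical toy prices; in the post the input would be eurgbp_merged$EURUSD
p <- c(1.05, 1.06, 1.07, 1.08)

n <- length(p)
mu_manual <- sum(p) / n   # (p_1 + p_2 + ... + p_n) / n

mu_manual
mean(p)                   # same value via R's built-in
```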

Using the **plotly** library in R, here’s the mean overlaid graphically on this EUR/USD sample:

```r
library(plotly)

# Sample mean, repeated once per date so it draws as a horizontal line
eurMean <- mean(eurgbp_merged$EURUSD)

plot_ly(name = "EUR/USD Price", x = eurgbp_merged$Dates, y = as.numeric(eurgbp_merged$EURUSD),
        type = "scatter", mode = "lines") %>%
  add_trace(name = "EUR/USD Mean", y = rep(eurMean, nrow(eurgbp_merged)), mode = "lines")
```

### The **variance** **σ²** of a price series is the mean, or expectation, of the squared deviation of price from its mean.


It characterises the range of movement around the mean, or “spread” of the price series.

Mathematically, the **variance σ²** of a price series *P*, with elements *p ∈ P* and mean *μ*, is expressed as:

**\(σ^2(p) = E[(p - μ)^2]\)**

**Standard Deviation** is simply the square root of variance, expressed as **σ**:

**\(σ = \sqrt{σ^2(p)} = \sqrt{E[(p - μ)^2]}\)**

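In R, the **standard deviation of a sample** can be calculated using the **sd()** function. Note that **sd()**, like **var()**, computes the *sample* statistic, dividing the sum of squared deviations by *n - 1* rather than *n*.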

For example, running the following code on our EUR/USD sample (January 01 to March 31, 2017) returns a standard deviation of 0.00996836:

```r
sd(eurgbp_merged$EURUSD)
```

```
[1] 0.00996836
```
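The expectation formula above divides by *n* (the population form), whereas **var()** and **sd()** divide by *n - 1*. A minimal sketch on hypothetical toy prices (not the Yahoo sample) showing both forms side by side and checking the sample form against R's built-ins:

```r
p  <- c(1.05, 1.06, 1.07, 1.08)   # hypothetical toy prices
n  <- length(p)
mu <- mean(p)

# Population form, matching E[(p - mu)^2]: divide by n
var_pop <- sum((p - mu)^2) / n

# Sample form, as used by var() and sd(): divide by n - 1
var_smp <- sum((p - mu)^2) / (n - 1)

var_pop
var_smp         # equals var(p)
sqrt(var_smp)   # equals sd(p)
```

For the short windows typical of trading analysis the two forms can differ noticeably; for long series the difference shrinks as *n / (n - 1)* approaches 1.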

Using the **plotly** library in R again, we can overlay lines at one (or more) **positive and negative standard deviations** from the mean, as follows:

```r
# Mean and sample standard deviation, repeated once per date for plotting
eurMean <- mean(eurgbp_merged$EURUSD)
eurSD   <- sd(eurgbp_merged$EURUSD)
n       <- nrow(eurgbp_merged)

plot_ly(name = "EUR/USD Price", x = eurgbp_merged$Dates, y = as.numeric(eurgbp_merged$EURUSD),
        type = "scatter", mode = "lines") %>%
  add_trace(name = "+1 S.D.", y = rep(eurMean + eurSD, n), mode = "lines", line = list(dash = "dot")) %>%
  add_trace(name = "-1 S.D.", y = rep(eurMean - eurSD, n), mode = "lines", line = list(dash = "dot")) %>%
  add_trace(name = "EUR/USD Mean", y = rep(eurMean, n), mode = "lines")
```
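The dotted lines mark the mean plus and minus one standard deviation. A minimal sketch on hypothetical toy prices that computes the band levels as full-length vectors (the shape a line trace needs for its y values) and counts how many observations fall inside the band; `k` is an illustrative parameter for the band width, not something from the original code:

```r
p  <- c(1.05, 1.06, 1.07, 1.08)   # hypothetical toy prices
mu <- mean(p)
s  <- sd(p)                        # sample standard deviation (n - 1 denominator)
k  <- 1                            # band width in standard deviations (illustrative)

# One band value per observation
upper <- rep(mu + k * s, length(p))
lower <- rep(mu - k * s, length(p))

# Flag observations lying inside the +/- k s.d. band
inside <- p >= lower & p <= upper
sum(inside)    # number of observations inside the band
mean(inside)   # share of observations inside the band
```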

### The sample **covariance** of two price series, in this case EUR/USD and GBP/USD, describes their linear association, i.e. how they move together over time.

Let’s denote EUR/USD by the variable *e* and GBP/USD by the variable *g*.

These price series then have sample means \(\overline{e}\) and \(\overline{g}\), respectively.

Mathematically, their **sample covariance, Cov(e, g)**, over *n* paired data points \((e_i, g_i)\), can be expressed as:

**\(Cov(e,g) = \frac{1}{n-1}\sum_{i=1}^{n}(e_i - \overline{e})(g_i - \overline{g})\)**

In R, **sample covariance** can be calculated easily using the **cov()** function.

Before we calculate covariance, let’s first use the **plotly** library to draw a scatter plot of EUR/USD and GBP/USD.

To visualize this linear association, we will also fit a **linear regression** of EUR/USD on GBP/USD and draw it as a line of best fit on the scatter plot.

This can be achieved in R using the following code:

```r
# Perform linear regression of EUR/USD on GBP/USD
fit <- lm(EURUSD ~ GBPUSD, data = eurgbp_merged)

# Draw scatter plot with line of best fit
plot_ly(name = "Scatter Plot", data = eurgbp_merged, x = ~GBPUSD, y = ~EURUSD,
        type = "scatter", mode = "markers") %>%
  add_trace(name = "Linear Regression", data = eurgbp_merged, x = ~GBPUSD, y = fitted(fit), mode = "lines")
```

Based on this plot, EUR/USD and GBP/USD have a positive linear association.
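The slope of this line and the covariance always share the same sign: for a simple least-squares regression of EUR/USD on GBP/USD, the slope equals Cov(e, g) divided by the variance of GBP/USD, and a variance is never negative. A minimal sketch checking this on a hypothetical toy data frame `df` with the same column names as `eurgbp_merged`:

```r
# Hypothetical toy data in the same column layout as eurgbp_merged
df <- data.frame(EURUSD = c(1.05, 1.06, 1.07, 1.08),
                 GBPUSD = c(1.22, 1.25, 1.23, 1.26))

fit <- lm(EURUSD ~ GBPUSD, data = df)

# Least-squares slope of EURUSD on GBPUSD ...
slope_lm <- unname(coef(fit)["GBPUSD"])

# ... equals their sample covariance divided by the sample variance of GBPUSD
slope_cov <- cov(df$EURUSD, df$GBPUSD) / var(df$GBPUSD)

slope_lm
slope_cov
```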

To **calculate the sample covariance of EUR/USD and GBP/USD** between January 01 and March 31, 2017, we run the following code, which returns 7.629787e-05:

```r
cov(eurgbp_merged$EURUSD, eurgbp_merged$GBPUSD)
```

```
[1] 7.629787e-05
```
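To see what **cov()** computes, here is a minimal sketch that evaluates the sample covariance formula directly on two short hypothetical series `e` and `g` (toy values, not the Yahoo data) and compares the result with the built-in:

```r
# Hypothetical toy series standing in for EUR/USD (e) and GBP/USD (g)
e <- c(1.05, 1.06, 1.07, 1.08)
g <- c(1.22, 1.25, 1.23, 1.26)
n <- length(e)

# Sum of products of paired deviations from each mean, divided by n - 1
cov_manual <- sum((e - mean(e)) * (g - mean(g))) / (n - 1)

cov_manual
cov(e, g)   # same value via R's built-in
```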

**Problem:** Covariance is *dimensional*: its magnitude depends on the units and scale of each price series, which makes it difficult to compare covariances across price series with significantly different variances.

**Solution:** Calculate **Correlation**, which is *Covariance normalized by the standard deviations of each price series*. This makes it *dimensionless* and a more interpretable measure of linear association between two price series.
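A quick way to see the problem is to rescale one series. In this minimal sketch (hypothetical toy values; multiplying by 100 simply stands in for a change of units), covariance grows with the scale of `e`, while correlation stays the same:

```r
e <- c(1.05, 1.06, 1.07, 1.08)   # hypothetical EUR/USD prices
g <- c(1.22, 1.25, 1.23, 1.26)   # hypothetical GBP/USD prices

e_scaled <- 100 * e              # same series expressed in different units

cov_ratio <- cov(e_scaled, g) / cov(e, g)   # covariance grows 100-fold
cor(e, g)
cor(e_scaled, g)                            # correlation is unchanged
cov_ratio
```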

Mathematically, Correlation ρ(e,g) of EUR/USD and GBP/USD, where \(σ_e\) and \(σ_g\) are their respective standard deviations, can be expressed as:

**\(ρ(e,g) = \frac{Cov(e,g)}{σ_e σ_g} = \frac{\frac{1}{n-1}\sum_{i=1}^{n}(e_i - \overline{e})(g_i - \overline{g})}{σ_e σ_g}\)**

- Correlation = +1 indicates EXACT positive association.
- Correlation = -1 indicates EXACT negative association.
- Correlation = 0 indicates NO linear association.

In R, **correlation** can be calculated easily using the **cor()** function.

For example, running the following code on EUR/USD and GBP/USD data from January 01 to March 31, 2017 returns a correlation of 0.5169411:

```r
cor(eurgbp_merged$EURUSD, eurgbp_merged$GBPUSD)
```

```
[1] 0.5169411
```
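The correlation formula can be checked directly. A minimal sketch on hypothetical toy series, computing ρ as the sample covariance divided by the product of the sample standard deviations, and showing the +1 and -1 extremes for exact linear relationships:

```r
e <- c(1.05, 1.06, 1.07, 1.08)   # hypothetical EUR/USD prices
g <- c(1.22, 1.25, 1.23, 1.26)   # hypothetical GBP/USD prices

# Covariance normalized by both sample standard deviations
rho_manual <- cov(e, g) / (sd(e) * sd(g))

rho_manual
cor(e, g)            # same value via R's built-in

cor(e, 2 * e + 1)    # exact positive linear association: +1
cor(e, -e)           # exact negative linear association: -1
```

For these toy values ρ works out to 1/√2, roughly 0.707.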

A correlation of 0.5169411 indicates a moderate positive linear association between EUR/USD and GBP/USD, consistent with the upward-sloping line of best fit in our earlier scatter plot.

In future blog posts, we will examine how to use these attributes in practice to construct diversified DARWIN Portfolios.

Trade safe,

The Darwinex Team
