In this case study, you will analyse historic data of three major stock indices, the Dow Jones, the DAX, and the Nikkei, and the exchange rates between the US dollar, the Euro, and the Yen. Using this data, you will address several questions.
To address these questions, you will import several data files, while tuning import parameters to match the idiosyncrasies of the data. You will merge the data files into a single data frame, and mutate the data to reflect changes in index price and exchange rate. You will analyze correlations of stock indices among themselves and to exchange rates and create illustrative plots for each of the analyses.
Below you will find several tasks that will guide you through these steps. For the most part these tasks require you to make use of what you have learned in the sessions Data I/O, Data Wrangling, and Statistics. However, they will also go beyond what you have learned, in particular when it comes to plotting. In those cases, the tasks will provide the necessary code and guidance.
File | Rows | Columns | Description |
---|---|---|---|
^DJI.csv | 8364 | 7 | Dow Jones Industrial stock history |
^GDAXI.csv | 7811 | 7 | DAX stock history |
^N225.csv | 13722 | 7 | Nikkei stock history |
euro-dollar.txt | 6545 | 1 | Euro to dollar exchange rates |
yen-dollar.txt | 8754 | 7 | Yen to dollar exchange rates |
Variables of data sets ^DJI.csv
, ^GDAXI.csv
, ^N225.csv
Variable | Description |
---|---|
Date | Current day |
Open | Day’s price when the stock exchange opened |
High | Day’s highest price |
Low | Day’s lowest price |
Close | Day’s closing price |
Adj Close | Adjusted closing price that has been amended to include any distributions and corporate actions that occurred at any time prior to the next day’s open |
Variables of data sets “euro-dollar.txt”, “yen-dollar.txt”
Variable | Description |
---|---|
Date (currently unnamed) | Current day |
Rate (currently unnamed) | Day’s exchange rate in terms of 1 US Dollar. E.g., a value of .75 means that the respective currency is worth a fraction of .75 of 1 US Dollar |
Open your baselrbootcamp
R project. It should already have the folders 1_Data
and 2_Code
. Inside of the 2_Code
folder, you should have all of the datasets listed in the Datasets section above!
Open a new R script and save it as a new file in your R
folder called financial_casestudy.R
. At the top of the script, using comments, write your name and the date. Then, load the tidyverse
package. Here’s how the top of your script should look:
In this practical, you will load three external datasets \^DJI.csv
, \^GDAXI.csv
, and \^N225.csv
. However, two of these data files are a bit messy. Specifically, \^GDAXI.csv
and \^N225.csv
use a specific character string to represent missings in the data. To identify the NA-character string in the data open one of them in a text viewer, (via RStudio or via, e.g., textedit). Do you see the string value that indicates missing data?.
Once you know what string value indicates missing data, us the read_csv()
function to load in the stock index data sets, “^DJI.csv”, “^GDAXI.csv”, and “^N225.csv”, from your 1_Data
folder. In so doing, set an explicit na
-argument to account for the fact that “^GDAXI.csv” “^N225.csv” use a specific character string to represent missings in the data.
Load in the exchange rate data sets, euro-dollar.txt
and yen-dollar.txt
, from the 1_Data
folder as two new objects called eur_usd
and yen_usd
. To do this, use the read_delim()
-function and \t
as the delim
-argument. This tells R that the data is tab-delimited.
Print the eur_usd
and yen_usd
objects. Are all the data types and variable names correct?
Date
and Rate
(and not be taken from the first row). To fix this, load the data again with the following additional arguments:col_names
-argument and assign to it a vector that contains the variable names.Date
-variable is of type date
. To achieve this, use the parse_date
function with the argument format = '%d %b %Y'
. This specifies the exact format the dates are formatted in, and overwrite the existing Date
-variable.Finally you want to make sure that the variable Date
is actually of type date
. Fix this using the parse_date()
function with format = '%d %b %Y'
. This specifies the exact format the dates are formatted in. Overwrite the existing Date
-variable.
Ok your datasets should be correct now! Get a first impression of the them using print()
, typeof()
and str()
. What type are the data objects and what variables do they contain?
Now it’s time to create a single data frame that contains the data from all five datasets including only the variables containing the dates (Date
), the stock index (unadjusted) closing prices (Close
), and the exchange rates. Do this by piping (i.e., using %>%
) together several inner_join()
-functions, joining the data sets one-by-one using the Date
variable. Note, in joining two data sets you can control the naming of overlapping variable names using the suffix
-argument of the inner_join()
-function. Observe that inner_join()
takes care of the fact that different dates are available in each of the data sets by only retaining dates for which all data sets provide data on.
At the end, rename the variables appropriately using the rename()
-function.
Create new variables containing the change in index price and exchange rate for each variable using the mutate
-function(). To compute the change, use the diff()
-function. Since diff()
will return n - 1
change values, add an NA
at the first position of the change variable à la c(NA, diff(my_variable))
.
Create a variable containing only the year of the date variable using mutate()
and year()
from the lubridate
-package (which is part of the tidyverse
).
Currently your data is in the “wide” format. Let’s convert it to the long format, in which variables occupy different rows rather than columns, using the gather()
-function from the tidyr
-package (also part of the tidyverse
). To do this use the command below to create the long_data
data frame. You may have to (or want to) change the object/variable names. The first two arguments (given we used pipes) to the gather
-function specify the names of the new variables, the third and fourth specify the names of the variable containing the dates and years with a leading hyphen. (Check out the example in the ?gather
-help file).
Print your long_data
object. How does it compare to your original data
object?
Using the following code, plot the price development of the three stock indices using the following command (using the long data object). The code uses ggplot2
, which you will be introduced to on the second weekend of the course. For the code to work, you may have to adjust the object and variable names to match the ones in your data frame. Inspecting the plot, do you see a significant drop anywhere?
Calculate the overall stock index price change per year. To do this, ise group_by()
, summarise()
, and sum()
on the stock index change variables. Remember there were NA
’s in two of the stock index price variables. When was the biggest drop in stock index prices?
Now that you know when the biggest drop occured, let’s find out which stock index suffered the biggest loss? To do this, compare the overall losses to the index price on the first trading day of that year. To do this, first identify the first date available for that year and then filter()
the data based on the respective character string to retrieve that day’s closing prices. Then divide the overall stock price change by that years first closing price. Which stock index suffered the greatest relative loss?
One driver behind the results observed for the last two tasks is that modern financial markets are closely intertwined, to the extent that a change in one market can bring about a change in the markets. Evaluate this aspect of financial markets by correlating the stock index change variables among each other using cor()
. Note that cor()
can take a data frame as an argument to produce the full correlation matrix among all variables. Also, don’t forget about the NA
s - there is an argument for cor()
to deal with them. How closely are the stock indices related and which ones are most closely related?
Evaluate the stability of the relationships between financial markets by analyzing the correlations for each year. Note that here have to specify each pairwise correlation separately in order to summarise()
the correlation for each year (rather than computing the entire correlation matrix).
Another important index of the financial markets is the exchange rate to other currencies. Generally, it is assumed that a strong economy translates in both a strong stock index and a strong currency relative to other currencies. First, evaluate whether changes in exchange rates correlate with each other the same way that stock indices do.
Now evaluate whether exchange rates vary as a function of the difference between stock indices. For example, does a large difference between Dow Jones and DAX translate into a strong Dollar relative to the EURO? Based on the above intuition this should be the case. However, economic theory has also produced an alternative hypothesis. That is, foreign investors who benefit from a rise in stock index price may be motivated to sell their holdings and exchange them for their own currency to maintain a neutral position. This would, in effect, depreciate the currency at the same time as the stock index is outperforming, thus producing a negative relationship. What do you think? Ask the data.
Now, evaluate the stability of the above relationship for each year separately? Can you make out a stable pattern? No? I guess it depends.