FTSE100 Historical Constituents

An ongoing project of mine is to build a time series database containing recent and historical members of the FTSE100 index. This can then be used to backtest strategies that pick stocks from the FTSE100 universe. There has also been research into the price performance of stocks in the run-up to their addition to or deletion from an index.

As part of this project, I needed to find historical members; however, after numerous Google searches I could not find a source for this data. Luckily, ftse.com has a PDF available for download containing historical additions and deletions. It turns out that R has some quite powerful tools for extracting data from PDF documents. With the following code, we can retrieve a list of additions to and deletions from the FTSE100.

require(data.table)
require(pdftools)  # provides pdf_text()
require(stringr)

url <- "https://www.ftse.com/products/downloads/FTSE_100_Constituent_history.pdf"

# Read the PDF as one character string per page
dfl <- pdf_text(url)
# Keep only pages 2 through the second-to-last
dfl <- dfl[2:(length(dfl) - 1)]

# Getting rid of the last line in every page
dfl <- gsub("\nFTSE Russell \\| FTSE 100 – Historic Additions and Deletions, November 2018[ ]+?\\d{1,2} of 12\n", "", dfl)

# Splitting not just by \n, but by \n that goes right before a date (positive lookahead)
dfl <- str_split(dfl, pattern = "(\n)(?=\\d{2}-\\w{3}-\\d{2})")

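# For each page...
dfl <- lapply(dfl, function(df) {
  # Split each row into 4 columns on runs of two or more spaces
  df <- str_split_fixed(df, "(\n)*[ ]{2,}", 4)
  # Replace any remaining newline/multi-space runs with a single space
  df <- gsub("(\n)*[ ]{2,}", " ", df)
  colnames(df) <- c("Date", "Added", "Deleted", "Notes")
  df[df == ""] <- NA
  # Drop the first element (the text before the first dated row);
  # drop = FALSE keeps the matrix shape even if only one row remains
  data.frame(df[-1, , drop = FALSE], stringsAsFactors = FALSE)
})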

df <- do.call("rbind", dfl)
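
To see what the two regular expressions in the script do, here is a minimal sketch run on a made-up page. The company names, dates and column spacing below are placeholders invented for illustration, not real FTSE data, and the actual PDF text may be laid out differently.

```r
library(stringr)

# Hypothetical page text: a header line, one row whose Notes cell wraps
# onto a second line, and one row with no Notes at all.
page <- paste(
  "Date         Added          Deleted        Notes",
  "19-Jan-84    Company A      Company B      Merger with",
  "                                           Company C",
  "02-Apr-84    Company D      Company E",
  sep = "\n"
)

# Split only on newlines that are followed by a dd-Mon-yy date, so the
# wrapped Notes line stays attached to the row it belongs to.
rows <- str_split(page, "(\n)(?=\\d{2}-\\w{3}-\\d{2})")[[1]]

# Split each row into at most 4 cells on runs of two or more spaces;
# the fourth cell keeps whatever text is left over.
cells <- str_split_fixed(rows, "(\n)*[ ]{2,}", 4)

# Collapse the newline and padding left inside wrapped cells.
cells <- gsub("(\n)*[ ]{2,}", " ", cells)
colnames(cells) <- c("Date", "Added", "Deleted", "Notes")
cells[cells == ""] <- NA

# Drop the header row, as the main script does with df[-1, ].
parsed <- cells[-1, , drop = FALSE]
print(parsed)
```

The lookahead is what keeps the wrapped "Company C" line from being treated as a row of its own: a plain split on "\n" would have produced four pieces instead of three.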

If the script has run correctly, df should be a data frame with one row per index change and the columns Date, Added, Deleted and Notes.
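
With the additions and deletions in hand, one possible way to get the index members on a past date is to start from a known membership list and undo every later change. The sketch below is only an illustration of that idea: members_on is a hypothetical helper, the toy data is invented, and the dd-Mon-yy date format is assumed from the regular expression used earlier. Matching on company names is fragile, since names change over time, so treat this as a starting point rather than a finished method.

```r
# Hypothetical helper: roll a known membership list back to `as_of`
# by undoing, newest first, every change made after that date.
members_on <- function(current, changes, as_of) {
  # Assumes dd-Mon-yy dates; %b expects English month abbreviations,
  # so this relies on an English (or C) locale.
  dates <- as.Date(as.character(changes$Date), format = "%d-%b-%y")
  later <- which(dates > as_of)
  later <- later[order(dates[later], decreasing = TRUE)]

  members <- current
  for (i in later) {
    added   <- as.character(changes$Added[i])
    deleted <- as.character(changes$Deleted[i])
    # Undo an addition by removing the stock, a deletion by restoring it.
    if (!is.na(added))   members <- setdiff(members, added)
    if (!is.na(deleted)) members <- union(members, deleted)
  }
  sort(members)
}

# Toy data (invented, not real FTSE constituents).
changes <- data.frame(
  Date    = c("19-Jan-84", "02-Apr-84"),
  Added   = c("Company A", "Company D"),
  Deleted = c("Company B", "Company E"),
  stringsAsFactors = FALSE
)
current <- c("Company A", "Company D", "Company F")

feb_members <- members_on(current, changes, as.Date("1984-02-01"))
jan_members <- members_on(current, changes, as.Date("1984-01-01"))
print(feb_members)
print(jan_members)
```

Working backwards from a current list avoids needing the original 1984 membership, at the cost of depending on the change log being complete.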
