
Mapping the past and the future with Leaflet

I have been working on mapping things for a while, and I must say that I really like the Leaflet package from RStudio. It makes it easy and straightforward to make Leaflet maps.

A while back I stumbled upon an interactive graphic from The Times that used census data to compare each US state with the country as a whole, and then showed whether that state resembled the US of the past, present or future (go to the link if this sounds confusing).

So I decided I wanted to do the same with Danish data. Statistics Denmark has three relevant data sets for creating such a map:

  1. FRDK115: Population projections 2015 for the country by ancestry, sex and age
  2. FOLK2: Population 1. January by sex, age, ancestry, country of origin and citizenship
  3. FOLK1: Population at the first day of the quarter by municipality, sex, age, marital status, ancestry, country of origin and citizenship

FRDK115 shows us the future of Denmark, FOLK2 shows us the past, and FOLK1 shows us the present.

These data sets include age and ancestry, so I am able to make two kinds of maps: one that takes the present mean age of each Danish municipality and compares it with the country as a whole from 1980 to 2050, and one that does the same with the present ancestry distribution in each Danish municipality.
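The matching step can be sketched roughly like this. It is only a minimal illustration of the idea, not the exact code behind the maps: the numbers, the municipality names and the helper `closest_year()` are all made up for the example, and I assume that "comparable" simply means "smallest difference in mean age".

```r
# Hypothetical national mean age by year (illustrative numbers, not real data)
national <- data.frame(
  year     = seq(1980, 2050, by = 10),
  mean_age = c(36.5, 37.5, 38.5, 39.5, 41.0, 42.0, 43.0, 43.5)
)

# Hypothetical present-day mean age for three made-up municipalities
municipalities <- data.frame(
  name     = c("Urbanby", "Midtby", "Udkantsby"),
  mean_age = c(36.8, 40.9, 43.4)
)

# Hypothetical helper: return the year whose national mean age is closest to x
closest_year <- function(x, ref) {
  ref$year[which.min(abs(ref$mean_age - x))]
}

# Find the matching year for every municipality
municipalities$matching_year <- vapply(
  municipalities$mean_age, closest_year, numeric(1), ref = national
)

municipalities
```

With real data you would replace the two data frames with the national series built from FOLK2 and FRDK115 and the municipality figures from FOLK1.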

I colored the map to show which year each municipality is indicative of, where yellow indicates the past and red the future. Each municipality has a little pop-up box that, when you click it, explains why it’s colored the way it is.
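A rough sketch of the coloring and the pop-ups with the leaflet package could look like the following. Everything in the data frame is hypothetical (names, coordinates and years are made up), and I use circle markers to keep the example self-contained; a real municipality map would draw the municipality borders with `addPolygons()` on a spatial data set instead.

```r
library(leaflet)

# Hypothetical municipalities with made-up coordinates and matching years
municipalities <- data.frame(
  name          = c("Urbanby", "Midtby", "Udkantsby"),
  lng           = c(12.57, 10.20, 8.45),
  lat           = c(55.68, 56.15, 55.47),
  matching_year = c(1980, 2020, 2050)
)

# Color scale running from yellow (the past) to red (the future)
pal <- colorNumeric(palette = "YlOrRd", domain = c(1980, 2050))

# Text for the pop-up box shown when a marker is clicked
municipalities$popup <- paste0(
  "<b>", municipalities$name, "</b><br>",
  "Resembles Denmark in ", municipalities$matching_year
)

# Build the map: background tiles, colored markers with pop-ups, and a legend
m <- leaflet(municipalities) %>%
  addTiles() %>%
  addCircleMarkers(
    lng = ~lng, lat = ~lat,
    color = ~pal(matching_year),
    fillOpacity = 0.8,
    popup = ~popup
  ) %>%
  addLegend(
    pal = pal, values = ~matching_year, title = "Matching year",
    labFormat = labelFormat(big.mark = "")
  )

m
```

The `big.mark = ""` setting is there so the legend shows years as 1980 rather than 1,980.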

So two patterns emerge:

  1. Urban Denmark looks like Denmark in the past when it comes to age
  2. Urban Denmark looks like Denmark in the future when it comes to ancestry

If we want to see what kind of challenges an aging population will bring Denmark in the future, we can look to some of the deep red municipalities in the outskirts of Denmark on the age map. If we instead want to see what kind of challenges a more diverse ancestry will bring, we can look to some of the deep red municipalities in and around the capital area on the ancestry map.

Here is an example:

[Screenshot of the interactive map: www.56n.dk, 2015-07-30 01-12-50]

You can see the full interactive leaflet maps here:

The Past and Future of Age in Denmark

The Past and Future of Ancestry in Denmark

Now go make your very own cool maps! There is a guide on R-bloggers that gives a good introduction to the Leaflet package.


Where do letters occur in words

A while back I encountered an interesting graphic showing where letters are located in English words (http://www.prooffreader.com/2014/05/graphing-distribution-of-english.html). The other day I decided to do a similar one for letters in Danish words, and for this I used R.

I downloaded all abstracts from the Danish Wikipedia and made my own version, as you can see here:

[Figure: Bogstavsplaceringer (letter positions in Danish words)]

Here is how you can do it:

# First you need to load in some text

library(rvest)

# I'll grab an article from FiveThirtyEight.com as a showcase.
# I did my analysis on all the Danish abstracts from Wikipedia (took a while!)
# When you do your final analysis you'll want as much text as possible too.

# We grab the html data (read_html() replaces the deprecated html() function)
html_data <- read_html("http://fivethirtyeight.com/features/how-to-read-the-mind-of-a-supreme-court-justice/")

# We extract some text
textfile <- html_data %>% html_nodes("p") %>% html_text(trim=TRUE)

# We collapse it into a single string
textfile <- paste(textfile, collapse=" ")

# Then we need to do a little string manipulation

library(stringr)

# We set all text to lower case
textfile <- str_to_lower(textfile)

# We remove all punctuation and all digits
textfile <- str_replace_all(textfile, "[[:punct:]]|[[:digit:]]", "")

# Then we split the string into individual words
words <- unique(unlist(str_split(textfile, " ")))

# And we count the letters in each word
word_length <- nchar(words)

# And we split each word into its individual letters
split_words <- str_split(words, "")

# Then we create a loop to find the position of each letter in each word
# If you have national letters like we do in Denmark you include them like this: for(i in c(letters, "æ", "ø", "å"))
# Note: run rm(letter_place.data) first if you re-run the loop, otherwise the results are appended twice

for(i in letters){ # We loop through all the letters

  # Create empty vector to hold data later
  letter_place.list <- c()

  # We find the position of each letter in the words (that we split apart)
  letter_data <- lapply(split_words, function(x) which(x == i))

  # A nested loop calculates the relative position of the letter in each word
  for(y in seq_along(word_length)){

    # We find the relative position
    letter_place <- unlist(lapply(letter_data[y], function(x) x/word_length[y]))

    # We add that position to a list of positions
    letter_place.list <- c(letter_place.list, letter_place)
  }

  # We create a new list to hold all the data and we then add the results from the loop
  if(!exists("letter_place.data")) letter_place.data <- list(letter_place.list) else letter_place.data <- append(letter_place.data, list(letter_place.list))

  # We make sure to name each list properly
  names(letter_place.data)[length(letter_place.data)] <- i

}

# Now we have a nested list with the data we need, but first we’ll convert it to a long form data frame

# We create an empty data frame to hold the data
letter_place.data.df <- data.frame()

# Then we create a loop to put the data from each letter list into the data frame
for(z in seq_along(letter_place.data)){ # We loop through each nested list

  tryCatch({ # I add the tryCatch so the loop doesn't break if there is an error (can occur if a letter is missing)

    # Here we extract the data from the letter list and create a data frame
    loop_data <- data.frame(letter = names(letter_place.data)[z], value = letter_place.data[[z]], stringsAsFactors = FALSE)

    # We then bind all the data frames together
    letter_place.data.df <- rbind(letter_place.data.df, loop_data)

  }, error=function(e){}) # Ends the tryCatch
}

# We check to see if we have all the letters
unique(letter_place.data.df$letter)

# We change the letters back to upper case for aesthetics in the graphic
letter_place.data.df$letter <- str_to_upper(letter_place.data.df$letter)

library(ggplot2)

# We create a density plot with free y scales to show the distribution, we choose a red fill colour and then we facet wrap it to show each individual letter
p <- ggplot(letter_place.data.df, aes(x=value)) + geom_density(aes(fill="red")) + facet_wrap( ~ letter, scales="free_y")

# We add appropriate text to titles and axes
p <- p + labs(title = "Where do letters typically appear in English words", y = "Appearance", x = "Word length", fill="")

# We set a deeper red, choose the minimal theme, remove axis markers and grid, and remove the legend
p <- p + scale_fill_brewer(palette = "Set1") + theme_minimal() +
  theme(axis.ticks = element_blank(), axis.text.y = element_blank(), axis.text.x = element_blank(),
        legend.position="none", panel.grid.major = element_blank(), panel.grid.minor = element_blank())

# Voila! Here it is
p
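
If the loops above feel heavy, the core idea (the position of a letter divided by the length of its word) can also be sketched in a few lines of base R. This is just an alternative illustration, not the code I used for the graphic; the helper `relative_positions()` and the three example words are made up for the occasion.

```r
# Hypothetical helper: relative position of every letter in a single word
relative_positions <- function(word) {
  chars <- strsplit(word, "")[[1]]
  setNames(seq_along(chars) / length(chars), chars)
}

# In "kat" the k sits at 1/3, the a at 2/3 and the t at 3/3
relative_positions("kat")

# Three made-up example words turned into a long form data frame in one go
example_words <- c("kat", "hund", "fisk")

letter_positions <- do.call(rbind, lapply(example_words, function(w) {
  chars <- strsplit(w, "")[[1]]
  data.frame(
    letter = chars,
    value  = seq_along(chars) / length(chars),
    stringsAsFactors = FALSE
  )
}))

letter_positions
```

The resulting data frame has the same two columns (letter and value) as the one used for the density plot, so it could be passed to the same ggplot call.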


I hope the post inspired you to do one in your own language. If you do, I’d love to see it.

And if you want more inspiration on cool projects to do check out: http://www.r-bloggers.com/