Away Goals Rule

What If Away Goals were always weighted at a Premium?

This is just a quick post that I have been meaning to do for a while on the away goals rule.

What You Need To Know About The Away Goals Rule

In some tournaments, like the Champions League, teams play two legs: one home and one away. The winner is the team that scores the most goals, in aggregate, over the two legs. However, if the aggregate score is tied at the end of the two legs, then the team that scored more away goals wins.

For example, suppose team A plays the first leg at home and that game ends 1-1, then travels for the second leg and that game ends 2-2. Both games ended in a draw and the aggregate score is 3-3, but team A wins because it scored 2 away goals while the other team scored only 1.
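The tie-break described above can be written as a small function. This is only a sketch of the rule as stated here: the function name and argument names are my own, and what happens when away goals are also level (extra time, penalties) is left out.

```r
# Decide a two-legged tie between teams A and B.
# Leg 1 is played at A's ground, leg 2 at B's ground.
two_leg_winner <- function(leg1_a, leg1_b, leg2_b, leg2_a) {
  agg_a <- leg1_a + leg2_a
  agg_b <- leg1_b + leg2_b
  if (agg_a != agg_b) {
    return(if (agg_a > agg_b) "A" else "B")
  }
  # Aggregate is level: compare goals scored away from home.
  away_a <- leg2_a  # A's goals in the leg at B's ground
  away_b <- leg1_b  # B's goals in the leg at A's ground
  if (away_a != away_b) {
    return(if (away_a > away_b) "A" else "B")
  }
  "undecided"  # the rule as described here does not settle this case
}

# The example from the text: 1-1 at A's ground, then 2-2 at B's ground.
two_leg_winner(leg1_a = 1, leg1_b = 1, leg2_b = 2, leg2_a = 2)
# "A"
```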

Basically, this rule places a small premium on goals scored away from home. I have always wondered: if we are going to say that away goals are more difficult than home goals, and that they are worth just a little more, then why not apply this to league play as well as tournament games? The code below imagines just such a scenario for the 2016-2017 English Premier League season.
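The full post's code is not reproduced in this excerpt. As a rough illustration of the idea only, here is a minimal sketch that re-scores league results with a small premium on away goals. The weight of 1.01 is a hypothetical value of mine, not the post's, and the scores are invented; the column names follow the Football Data UK convention (FTHG for full-time home goals, FTAG for full-time away goals).

```r
# Invented results for three placeholder teams.
results <- data.frame(
  HomeTeam = c("A", "B", "C"),
  AwayTeam = c("B", "C", "A"),
  FTHG = c(1, 2, 0),
  FTAG = c(1, 0, 3),
  stringsAsFactors = FALSE
)

away_premium <- 1.01  # hypothetical weight: each away goal counts slightly more

# Compare home goals with premium-weighted away goals.
weighted_away <- results$FTAG * away_premium
results$outcome <- ifelse(results$FTHG > weighted_away, "home win",
                   ifelse(results$FTHG < weighted_away, "away win", "draw"))
results$outcome
# "away win" "home win" "away win"
```

Under this scheme a score draw such as 1-1 tips to the away side, while a scoreless draw stays a draw because there are no away goals to weight.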

R Code Included and Future Plans

Whether or not you care about this hypothetical set of outcomes, the R code is included and may be of interest. There is a decent amount of dplyr to wrangle the raw data and create summary tables. This is the first time that I have used the case_when function, which is great. This is also my first time creating a slope chart, and I used the example from Top 50 ggplot2 Visualizations to make it happen.

This is just for one season. Next, I want to map this same script over as many past seasons as I can and wrap that all up in a Shiny app so anyone interested can select whichever season they want and see a chart similar to the one made below.

Load the Data and Libraries


Burnley Margins

Fine Margins

Let’s start by stating that Burnley are having an incredible Premier League campaign. Clarets boss Sean Dyche is notable for using his post-match interviews to point out that the club’s success has been decided by fine margins, which is to say that many of their victories have come from outscoring the opposing team by a single goal.

Just how fine are the margins?

After hearing this sentiment echoed consistently by the manager, I had the idea to look at a metric for evaluating just how fine the margins have been. While all goals are valuable, the goal that makes a team’s total one greater than the other team’s is the most valuable.

For people who don’t follow this sport, 3 points are awarded for a win and 1 point for a draw. (As an American, I find saying the name of this sport problematic, and both options are sub-optimal.) Goals beyond the one needed to win still have value: they matter in the event of a points tie at the end of the season, and they provide psychological and entertainment value. Hopefully, though, we can agree that the goal needed to win has the most value.

In some ways, the metric we are building treats goals beyond the one needed to win, as well as those scored in a loss, as wasted effort. While that is not entirely true, for many teams it is critical to get as many points as possible from the relatively few goals they are able to score. With all this said, we can now fairly easily calculate the point value of every goal to compare Burnley’s margins, or maybe better yet their efficiency, with the rest of the league.
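One simple reading of "the point value of every goal" is points earned divided by goals scored, per team. The sketch below takes that reading as an assumption (the full post may define the metric differently) and uses invented scores for placeholder teams X, Y and Z, with the Football Data UK column names FTHG and FTAG.

```r
# Invented results for placeholder teams.
results <- data.frame(
  HomeTeam = c("X", "Y", "X", "Z"),
  AwayTeam = c("Y", "Z", "Z", "X"),
  FTHG = c(1, 3, 1, 0),
  FTAG = c(0, 3, 1, 1),
  stringsAsFactors = FALSE
)

# One row per team per match, so home and away games are counted alike.
long <- data.frame(
  team = c(results$HomeTeam, results$AwayTeam),
  goals_for = c(results$FTHG, results$FTAG),
  goals_against = c(results$FTAG, results$FTHG),
  stringsAsFactors = FALSE
)

# 3 points for a win, 1 for a draw, 0 for a loss.
long$points <- ifelse(long$goals_for > long$goals_against, 3,
               ifelse(long$goals_for == long$goals_against, 1, 0))

totals <- aggregate(cbind(goals_for, points) ~ team, data = long, FUN = sum)
totals$points_per_goal <- totals$points / totals$goals_for
totals[order(-totals$points_per_goal), ]
```

Here team X takes 7 points from only 3 goals, while team Y takes 1 point from 3 goals, which is the kind of gap the "fine margins" idea is getting at. A team with no goals at all would need special handling, since the ratio divides by goals scored.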

Load the Data and Libraries

Football Data UK has an impressive amount of freely available data. We will load the results from the current Premier League season.


Figure Skating Medal Distribution

Medals! Who Wants One?

So, I was talking figure skating with a co-worker, as you do, and since I know little about the sport I asked if there was an optimal age for a figure skater. I was told that 20 is the golden age. That stuck in my head and I wondered if anyone had looked at this. Likely someone has; however, a very cursory Google search produced no results, while helpfully leading me to a Wikipedia page with the exact tables that I would need to look at this myself.

So, is 20 the golden age?

First, let’s get the table and extract the age for male figure skaters:
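The scraping code from the full post is not shown in this excerpt. As a stand-in, here is a small sketch of the step that follows it: working out each medalist's age in whole years on the day of the medal and tabulating the result. The skaters, dates and column names are placeholders I made up, not rows from the Wikipedia table.

```r
# Placeholder data: not real skaters or real dates.
skaters <- data.frame(
  skater = c("Skater 1", "Skater 2", "Skater 3"),
  born = as.Date(c("1990-03-15", "1994-12-01", "1988-02-20")),
  medal_date = as.Date(c("2010-02-18", "2014-02-14", "2010-02-18")),
  stringsAsFactors = FALSE
)

# Age in completed years on a given date.
age_in_years <- function(born, on) {
  b <- as.POSIXlt(born)
  o <- as.POSIXlt(on)
  age <- o$year - b$year
  had_birthday <- (o$mon > b$mon) | (o$mon == b$mon & o$mday >= b$mday)
  age - ifelse(had_birthday, 0, 1)
}

skaters$age <- age_in_years(skaters$born, skaters$medal_date)
table(skaters$age)  # how many medals were won at each age
```

With real data, the age with the highest count in that table would be the candidate for the "golden age".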


Tea Time Lyric Match - Part 2

Follow up function wrap for song lyric matching

Following up on this post, here is a quick addition that takes the example from the last post and wraps it in a function. The function looks for trigram, bigram and single-word matches, sans stop words. It then takes any of those three possible matches and puts them all together in one data frame.

Here is the code with an example:
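The original function is not included in this excerpt, so what follows is a base R sketch of the same idea rather than the post's code. It compares two lyrics at a time, uses a tiny hand-written stop word list, and drops stop words only from the single-word matches; all three choices are my assumptions, as are the function names and the invented lyric lines.

```r
# A deliberately tiny, hand-written stop word list (an assumption).
stop_words <- c("the", "a", "an", "and", "of", "in", "to", "for",
                "i", "you", "my", "on", "is")

# Lower-case the text, strip punctuation, split into words.
tokenize <- function(text) {
  words <- strsplit(tolower(gsub("[^A-Za-z' ]", " ", text)), "\\s+")[[1]]
  words[words != ""]
}

# All runs of n consecutive words.
ngrams <- function(words, n) {
  if (length(words) < n) return(character(0))
  starts <- seq_len(length(words) - n + 1)
  vapply(starts, function(i) paste(words[i:(i + n - 1)], collapse = " "),
         character(1))
}

# Trigram, bigram and word matches between two lyrics, in one data frame.
match_lyrics <- function(lyrics_a, lyrics_b) {
  a <- tokenize(lyrics_a)
  b <- tokenize(lyrics_b)
  words <- setdiff(intersect(a, b), stop_words)
  bigrams <- intersect(ngrams(a, 2), ngrams(b, 2))
  trigrams <- intersect(ngrams(a, 3), ngrams(b, 3))
  data.frame(
    type = rep(c("trigram", "bigram", "word"),
               times = c(length(trigrams), length(bigrams), length(words))),
    match = c(trigrams, bigrams, words),
    stringsAsFactors = FALSE
  )
}

# Invented lines, not real lyrics.
match_lyrics("Cold rain on the window tonight", "I hear cold rain on the roof")
```

For these two lines the result has seven rows: two trigrams ("cold rain on", "rain on the"), three bigrams, and the words "cold" and "rain".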


Tea Time Theme Time Term Matching

A quick code snippet to find matching terms between song lyrics

I love BBC 6 Music. On my morning train rides into LA I like to listen to the Radcliffe and Maconie Show. The hosts play a game at the end of their show where they play three tracks and ask listeners to guess what connects all three tracks.

I have yet to win this game, which has prompted me to see if I can gain an advantage by analyzing some data about the themes submitted. I have noticed that since the start of 2018, the theme relies on an exact term or phrase match among the lyrics 17% of the time.


Giving Vehicle Analysis

There was an interesting question on PRSPCT-L a few weeks ago:

Do you know of or have experience with demographic modeling and data appends that help predict a donor’s preferred giving method (i.e., event, direct mail, online, etc.)?

I thought I would take a minute to look into this. I actually couldn’t initially think of which variables to include aside from age. My starting point was looking at those who made a gift by check through the mail, online, over the phone or in person during this fiscal year. I was evaluating the data at the household level, so gifts from married households only count once.

I decided not to use prior giving at all, though I think it is likely the best predictor. That is, those who gave online last year will likely do so again this year. I also think other giving stats like length of giving, average amount and total amount would be helpful in making predictions.

However, my interpretation of the original question was that this should just be an analysis of demographic data and so that is the path that I took.

I eventually chose:

  • age
  • constituent type: company, individual, etc.
  • spouse constituent type: for those married (with an ‘N’ for those not married)
  • a variable called type2 which is maybe not the greatest variable ever
    • type2 = 1 means married
    • type2 = 2 means not married and constituent is male
    • type2 = 3 means not married and constituent is female
    • type2 = 4 means company
    • type2 = 5 means foundation
    • type2 = 6 means missing information or other
  • phone: do they have a phone number in our system or not
  • email: do they have an email in our system or not
  • mail: do they have a good mailing address in our system or not
  • assigned: is this constituent assigned to a gift officer
  • I added a 1 or 0 if they had any of the following no contact flags:
    • no contact at all
    • no contact by phone
    • no contact by email
    • no contact by mail
    • no personal contact
    • no solicitations
  • I also added a code for geographic distance from the university:
    • 1 = in our tri-county area
    • 2 = in our state
    • 3 = everyone else
  • Lastly, I added a count of the number of solicitations they received by type
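To make the type2 coding in the list above concrete, here is a sketch of how such a variable could be derived. The input column names (constituent_type, married, sex) and their value labels are hypothetical; the real data set will name and code these differently.

```r
# Made-up constituents covering each case in the list.
constituents <- data.frame(
  constituent_type = c("individual", "individual", "individual",
                       "company", "foundation", "individual"),
  married = c(TRUE, FALSE, FALSE, NA, NA, NA),
  sex = c("F", "M", "F", NA, NA, NA),
  stringsAsFactors = FALSE
)

derive_type2 <- function(constituent_type, married, sex) {
  type2 <- rep(6L, length(constituent_type))  # 6 = missing information or other
  is_ind <- constituent_type %in% "individual"
  type2[constituent_type %in% "company"] <- 4L
  type2[constituent_type %in% "foundation"] <- 5L
  # %in% treats NA as "no match", so incomplete records stay at 6.
  type2[is_ind & married %in% TRUE] <- 1L
  type2[is_ind & married %in% FALSE & sex %in% "M"] <- 2L
  type2[is_ind & married %in% FALSE & sex %in% "F"] <- 3L
  type2
}

constituents$type2 <- derive_type2(constituents$constituent_type,
                                   constituents$married,
                                   constituents$sex)
constituents$type2
# 1 2 3 4 5 6
```

Since the codes are labels rather than quantities, type2 would normally be passed to a model as a factor, for example with factor(constituents$type2).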

My approach to problems like this is to get a decent sized set of data points together based on intuition like the ones described and then create a really quick model. I will skip the modelling code here for now but it is in my fundraising analytics folder along with the data set that I used.

This next part is just the report out from the model. I cannot seem to hide it so just scroll through this to get to the plots.


Giving By Major

To perform this analysis, you will need a csv file with the unique IDs for your alumni as well as their majors and if they are a donor or not as a binary variable (1 or 0). You can also add in any other variables that you want to include.

First, load the ggplot2 library (use install.packages("ggplot2") if you don’t have this package yet). We will use this for plotting. Also, set the working directory to the folder where this csv file is located:
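The rest of the post is not shown in this excerpt. As a minimal sketch of where the analysis is heading, the code below computes the share of donors within each major. It builds a small made-up data frame in place of the csv file so that it runs on its own; the column names id, major and donor are assumptions about how the file is laid out.

```r
# Made-up stand-in for the csv file described above.
# With a real file this would be something like:
#   alumni <- read.csv("alumni.csv")   # file name is hypothetical
alumni <- data.frame(
  id = 1:6,
  major = c("Biology", "Biology", "History", "History", "History", "Physics"),
  donor = c(1, 0, 1, 1, 0, 0),
  stringsAsFactors = FALSE
)

# The mean of a 1/0 donor flag is the donor rate for that major.
by_major <- aggregate(donor ~ major, data = alumni, FUN = mean)
names(by_major)[2] <- "donor_rate"
by_major[order(-by_major$donor_rate), ]

# A bar chart with ggplot2 could then look like:
#   library(ggplot2)
#   ggplot(by_major, aes(x = major, y = donor_rate)) + geom_col()
```

With real data it is worth also counting alumni per major, since a rate based on a handful of people can be misleading.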

