Improving Actuarial Communication



Jason Ash FSA, CERA, MAAA

Zachary Brown CFA, FRM, PRM



August 28, 2018

About the speaker

Jason Ash is the founder of Ash Analytics, an actuarial consulting business focused on the fintech and insurtech space.

Jason is a former consulting actuary at Milliman with a background in financial risk management. He was an analytics manager at LendingClub, a fintech company that operates a peer-to-peer credit marketplace, and he has worked at an early-stage insurtech startup based in San Francisco.

Jason can be reached at jason@ashanalytics.com, or via www.jtash.com

Agenda

Motivations for visual communication

Defining properties of an effective visualization

Methods for presenting data

Python visualization libraries you can use immediately
Python-powered, reproducible Jupyter notebooks

Presentation

The presentation itself demonstrates visual technology powered by Jupyter notebooks and reveal.js. It combines code, charts, images, video, and an interactive slide format that is best viewed in a web browser.

Motivations for visual communication

"As the knowledge of mankind increases, and transactions multiply, it becomes more and more desirable to abbreviate and facilitate the modes of conveying information..."

"Men of high rank, or active business, can only pay attention to outlines… It is hoped that, with the Assistance of these Charts, such information will be got without the fatigue and trouble of studying the particulars."


--William Playfair

Excel

Alt + n + n = chart

Python

pandas - a widely used data analysis library with built-in plotting (via matplotlib)

In [2]:
import pandas as pd

spx = pd.read_csv('data/spx.csv',index_col=0,parse_dates=[0])
spx.plot(legend=False,ylim=(0,3000));

Motivations for visual communication

Goal: to quickly understand large, complex data

Goal: use effective visual design to communicate useful insights to your audience

Example

Communicating data vs. perspective

Susie Lu, www.susielu.com

Example

"Above all, show the data" --Edward Tufte

New York Times, www.nytimes.com

"Above all, show the data" --Edward Tufte

... but remember

Most data is generated by inherently subjective processes

It can be influenced by societal norms and biases

Sometimes the most important insights are revealed by the absence of data.

Wall Street Journal, www.wsj.com

Pew Research Center, www.pewsocialtrends.org

Motivations for visual communication

Understand your data - emphasis on speed, observing trends, personal consumption

Communicate your data - emphasis on effective design, sharing a perspective, being aware of data limitations, and influencing an audience

Properties of an effective visualization

Explicit - show data clearly, but understand its limitations, label directly when possible

Implicit - intentional design, clear perspective, beware of defaults

Methods for presenting data

Technology

Python and the pandas library: a "supercharged Excel"
Jupyter notebooks: shareable, reproducible analytics
Open source data and code

Design

Emphasizing perspective, context, and clear messaging

"The more sophisticated science becomes, the harder it is to communicate results."

"Scientific results today are as often as not found with the help of computers. That’s because the ideas are complex, dynamic, hard to grab ahold of in your mind’s eye. And yet by far the most popular tool we have for communicating these results is the PDF—literally a simulation of a piece of paper. Maybe we can do better."

--"The Scientific Paper Is Obsolete," The Atlantic

Technology

Jupyter Notebook

Shareable, source-controlled document viewed in a web browser

Open source, free to use, continually developed and improved

Kernels available for dozens of programming languages

Example

Follow your charts

In [6]:
# hedging example
import numpy as np
np.random.seed(42)

# pd.date_range replaces the removed DatetimeIndex(start=..., end=...) form and pd.datetime
d = pd.date_range(start='2015-01-01',end='2018-08-01',freq='W')
unhedged = np.random.normal(0.003,0.05,d.shape[0])
hedged = np.random.normal(0.001,0.005,d.shape[0])
In [7]:
print(d[:12])
DatetimeIndex(['2015-01-04', '2015-01-11', '2015-01-18', '2015-01-25',
               '2015-02-01', '2015-02-08', '2015-02-15', '2015-02-22',
               '2015-03-01', '2015-03-08', '2015-03-15', '2015-03-22'],
              dtype='datetime64[ns]', freq='W-SUN')
In [8]:
print(hedged[:18])
[ 0.00063586 -0.00323397 -0.00657424 -0.00123257  0.00528199  0.00207047
 -0.00522869  0.0018659   0.00292659 -0.00341929  0.00176863  0.00129104
 -0.00471485  0.00278894  0.00380392  0.00641526  0.00626901 -0.00588835]
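
Simulated weekly returns like these usually need one more step before they make a readable chart: compounding them into a growth path. The sketch below is an assumption about how the two series might be prepared for plotting, not a reproduction of the notebook's hidden cells. The names `returns`, `growth`, and `annual_vol` are hypothetical, and the 52-periods-per-year scaling is a convention assumed here.

```python
import numpy as np
import pandas as pd

np.random.seed(42)

# Weekly dates and simulated weekly returns, mirroring the cells above
d = pd.date_range(start='2015-01-01', end='2018-08-01', freq='W')
unhedged = np.random.normal(0.003, 0.05, d.shape[0])
hedged = np.random.normal(0.001, 0.005, d.shape[0])

# Hypothetical container for both strategies, indexed by week
returns = pd.DataFrame({'unhedged': unhedged, 'hedged': hedged}, index=d)

# Growth of one unit invested: compound the weekly returns
growth = (1 + returns).cumprod()

# Annualized volatility, assuming 52 weekly periods per year
annual_vol = returns.std() * np.sqrt(52)

print(growth.tail(3))
print(annual_vol)
```

Calling `growth.plot()` on the result would draw both paths on one chart, which makes the difference in volatility visible without any further arithmetic.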

Follow your charts

Important to document methodology, data sources, anomalies, etc.

Ideal to understand your audience, refine your perspective, set context

How to get started

One package that includes Python, Jupyter notebooks, and useful libraries

Start by importing a .csv and tinker with Excel-like functionality in pandas

Explore online resources: GitHub, Stack Overflow
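
As a starting point, here is a minimal sketch of three familiar Excel tasks in pandas: filtering, a pivot table, and a lookup. The policy data, column names, and region mapping are all hypothetical and exist only to make the example self-contained; a real workflow would pass a file path to `pd.read_csv` instead of the inline text.

```python
import io
import pandas as pd

# Hypothetical policy data standing in for a real .csv file
csv_text = """policy_id,state,product,premium
1,CA,term,1200
2,CA,whole,3400
3,NY,term,900
4,NY,term,1100
5,TX,whole,2800
"""
policies = pd.read_csv(io.StringIO(csv_text))

# Filter rows, similar to an Excel AutoFilter
term_only = policies[policies['product'] == 'term']

# Pivot table: total premium by state and product
pivot = policies.pivot_table(index='state', columns='product',
                             values='premium', aggfunc='sum', fill_value=0)

# Lookup, similar to VLOOKUP: attach a region to each policy by state
regions = pd.DataFrame({'state': ['CA', 'NY', 'TX'],
                        'region': ['West', 'East', 'South']})
with_region = policies.merge(regions, on='state', how='left')

print(pivot)
print(with_region)
```

Because every step is written down, the same analysis can be rerun on next month's file without redoing any manual work.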

Example

Open data from FiveThirtyEight

Hundreds of datasets used in analytic journalism

Understand your data

In [9]:
url = 'https://raw.githubusercontent.com/fivethirtyeight/data/master/daily-show-guests/daily_show_guests.csv'

guests = pd.read_csv(url,parse_dates=['Show'])
guests.columns = ['year','occupation','episode','group','name']
guests.group = guests.group.str.title()
guests.head(10)
Out[9]:
   year  occupation          episode     group     name
0  1999  actor               1999-01-11  Acting    Michael J. Fox
1  1999  Comedian            1999-01-12  Comedy    Sandra Bernhard
2  1999  television actress  1999-01-13  Acting    Tracey Ullman
3  1999  film actress        1999-01-14  Acting    Gillian Anderson
4  1999  actor               1999-01-18  Acting    David Alan Grier
5  1999  actor               1999-01-19  Acting    William Baldwin
6  1999  Singer-lyricist     1999-01-20  Musician  Michael Stipe
7  1999  model               1999-01-21  Media     Carmen Electra
8  1999  actor               1999-01-25  Acting    Matthew Lillard
9  1999  stand-up comedian   1999-01-26  Comedy    David Cross
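
The chart code that follows plots a table named `out`, which is not built in the cells shown. One plausible way to get there is to collapse the raw groups into a few broader categories and compute each category's share of guests per year. The sketch below assumes a hypothetical mapping and uses a small made-up sample (`guests_sample`) rather than the real dataset; the category labels are taken from the chart annotations.

```python
import pandas as pd

# Made-up sample with the same 'year' and 'group' columns as the table above
guests_sample = pd.DataFrame({
    'year':  [1999, 1999, 1999, 1999, 2000, 2000, 2000, 2000],
    'group': ['Acting', 'Comedy', 'Musician', 'Media',
              'Acting', 'Media', 'Politician', 'Government'],
})

# Hypothetical mapping from raw groups to the three categories on the chart
category_map = {
    'Acting': 'Acting, Comedy & Music',
    'Comedy': 'Acting, Comedy & Music',
    'Musician': 'Acting, Comedy & Music',
    'Media': 'Media',
    'Politician': 'Government and Politics',
    'Government': 'Government and Politics',
}
category = guests_sample['group'].map(category_map)

# Share of guests in each category, by year (each row sums to 1)
shares = pd.crosstab(guests_sample['year'], category, normalize='index')
print(shares)
```

A table shaped like `shares` (years as the index, one column per category, values between 0 and 1) is what the percentage axis in the chart expects.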
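import matplotlib.pyplot as plt

# `out` is not defined in the cells shown here; the chart assumes it holds the
# share of guests in each of three occupation categories, indexed by year
plt.style.use('fivethirtyeight')
fig, ax = plt.subplots()
plt.suptitle("Who got to be on 'The Daily Show'?",fontsize=24,ha='left',x=0.02)
plt.title('Occupation of guests, by year',color='0.4',fontsize=18,ha='left',x=-0.065)

# data
out.plot(ax=ax,legend=False,color=['#0099ff','#990099','#ff3333'])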

# annotation
ax.annotate('Acting, Comedy & Music',(2001,0.8),color='#0099ff',fontsize=18,fontweight='bold')
ax.annotate('Media',(2007.5,0.52),color='#990099',fontsize=18,fontweight='bold')
ax.annotate('Government and Politics',(2008.5,0.05),color='#ff3333',fontsize=18,fontweight='bold')

# chart style
ax.axhline(0,color='0.6',lw=2.5)
ax.set_ylim(-0.05,1.05)
ax.set_xlim(1998,2016)
ax.set_xticks([2000,2004,2008,2012])
ax.set_xticklabels(['2000',"'04","'08","'12"])
ax.set_yticks([0,0.25,0.5,0.75,1])
ax.set_yticklabels(['0%','25%','50%','75%','100%'])

plt.show()

Communicate your data

Example

FiveThirtyEight Riddler

Two weekly puzzles based on math, probability, and logic

Thousands of respondents, only a handful of featured solutions

Riddler hot potato

A class of 30 children is playing a game where they all stand in a circle along with their teacher. The teacher is holding two things: a coin and a potato.

The game progresses like this: The teacher tosses the coin. Whoever holds the potato passes it to the left if the coin comes up heads and to the right if the coin comes up tails. The game ends when every child except one has held the potato, and the one who hasn’t is declared the winner.

How do a child’s chances of winning change depending on where they are in the circle? In other words, what is each child’s win probability?

Understand your data

In [12]:
import random

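def model(trials, students=30):
    """Simulate `trials` games and yield each winner's position (1..students)."""

    for _ in range(trials):

        # position 0 is the teacher, who starts with the potato;
        # a 1 marks a child who has not yet held it
        s = [0] + [1]*students
        i = 0

        # pass left or right until only one child has not held the potato;
        # negative values of i wrap around the circle via negative indexing
        while sum(s) > 1:
            i += random.choice([-1,1])
            s[i] = 0

        yield s.index(1)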
In [13]:
trials = 100000
results = np.array(list(model(trials)))
In [14]:
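# density=True replaces the removed `normed` argument; with unit-width bins it gives each child's share of wins
out = np.histogram(results,bins=range(1,32),density=True)[0]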
out
Out[14]:
array([ 0.03313,  0.03346,  0.03447,  0.03304,  0.03361,  0.03251,
        0.03395,  0.0331 ,  0.0334 ,  0.03253,  0.03321,  0.03345,
        0.03331,  0.03452,  0.03215,  0.03176,  0.03364,  0.03297,
        0.03354,  0.03215,  0.03383,  0.03275,  0.03309,  0.033  ,
        0.03379,  0.03376,  0.03397,  0.03432,  0.03446,  0.03313])
In [16]:
fig, ax = plt.subplots()
# children are numbered 1 to 30, so start the x-axis at 1
plt.plot(np.arange(1,out.shape[0]+1),out);
In [17]:
fig, ax = plt.subplots()
ax.set_ylim(0,0.05)
ax.axhline(1/30,ls='--',c='0.8')
# children are numbered 1 to 30, so start the x-axis at 1
plt.plot(np.arange(1,out.shape[0]+1),out);
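
The dashed line at 1/30 raises a natural question: are the wiggles around it real or just simulation noise? A rough check, sketched below, treats each child's estimated win rate as a binomial proportion and assumes the true probability is exactly 1/30 for every child. The variable names are hypothetical; the two extreme values are read from Out[14] above.

```python
import math

trials = 100000
students = 30

# Assumed true win probability for every child
p = 1 / students

# Standard error of an estimated proportion over `trials` independent games
se = math.sqrt(p * (1 - p) / trials)

# Smallest and largest estimates in Out[14]
lowest, highest = 0.03176, 0.03452

# Distance of each extreme from 1/30, measured in standard errors
z_low = (lowest - p) / se
z_high = (highest - p) / se

print('standard error: {:.6f}'.format(se))
print('z-scores of extremes: {:.2f}, {:.2f}'.format(z_low, z_high))
```

With 100,000 trials the standard error is roughly 0.0006, and both extremes fall within about three standard errors of 1/30. Across 30 estimates that is broadly what sampling noise alone would produce, which supports reading the chart as a flat line.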

Communicate your data

Design: Portray game mechanics in addition to the data

Perspective: equality of outcomes is the unexpected result, so highlight it

Audience: emphasize creativity while preserving accuracy

fig, ax = plt.subplots(1,2,figsize=(12,6),subplot_kw=dict(projection='polar'))
plt.tight_layout()

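# plot data
# `p` (win probability per child) and `path` (positions visited in one sample
# game) are assumed to come from cells not shown here
points = np.deg2rad(np.linspace(0,360,31,endpoint=False))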

ax[0].plot(points[1:],p,alpha=0.9)
ax[0].fill_between(points[1:],p,0,alpha=0.1)
ax[1].plot(points[path],np.linspace(1,0.25,path.shape[0]),alpha=0.9)
ax[1].axvline(points[22],0.25,1,color='darkgreen',alpha=0.8)

# domain and range
ax[0].set_ylim(0.031,0.034)
ax[0].set_yticks([round(p.min(),6),round(p.max(),6)])
ax[0].set_yticklabels([])
ax[1].set_ylim(0,1)
ax[1].set_yticks([])

# annotations
kwargs = dict(ha='center',color='0.3',style='italic',size=11)
ax[0].annotate('minimum: {:,.2%}'.format(p.min()),xy=(np.pi,p.min()*0.99),**kwargs)
ax[0].annotate('maximum: {:,.2%}'.format(p.max()),xy=(np.pi,p.max()*1.003),**kwargs)
ax[1].annotate('winner!',xy=(points[22]-0.05,0.9),color='darkgreen',size=14,alpha=0.8)

# style, circumference labels, 'teacher' label at the bottom
for i in range(2):
    ax[i].spines['polar'].set_visible(False)
    ax[i].xaxis.grid(True,color='0.95')
    ax[i].set_xticks(points)
    ax[i].set_xticklabels(['teacher'] + list(range(1,31)),size=14,color='0.4')
    ax[i].set_theta_offset(-np.pi/2)

plt.show()
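
The polar chart above takes `p` and `path` as given. A sketch of how a single game's path might be recorded is shown here; the function name `play_one_game`, its `seed` parameter, and the return format are hypothetical and not taken from the original notebook. Position 0 is the teacher and positions 1 to 30 are the children, matching the circumference labels.

```python
import random
import numpy as np

def play_one_game(students=30, seed=None):
    """Play one game; return the sequence of positions visited and the winner."""
    rng = random.Random(seed)
    n = students + 1              # position 0 is the teacher
    pos = 0
    visited = {0}
    path = [0]

    # The game ends once everyone except one child has held the potato
    while len(visited) < n - 1:
        pos = (pos + rng.choice([-1, 1])) % n   # pass left or right around the circle
        visited.add(pos)
        path.append(pos)

    # The winner is the only position never visited
    winner = (set(range(n)) - visited).pop()
    return np.array(path), winner

path, winner = play_one_game(seed=1)
print(len(path), winner)
```

An array like `path` can index directly into `points` to trace the potato's route, and `winner` gives the position to mark with the vertical line.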

Final Thoughts

Understand your data
Find ways to use new tools for familiar tasks, e.g. Python and Jupyter notebooks instead of Excel. It takes less time than you think.

Communicate your data
For each visual you create, find the opportunity to communicate a perspective or justify an action, rather than simply showing the data