I created this site to write about things that I find interesting: probability & Bayesian inference, data visualization, puzzles & games, finance, and books.
I'm announcing a new personal project, called pyesg - Python Economic Scenario Generator. Economic Scenario Generators, or ESGs, are used to simulate possible future markets, like stock prices, interest rates, or volatilities. Actuaries use ESGs to determine the potential values of insurance portfolios in the future. This helps them ensure that their companies will have enough money to pay claims even under the worst scenarios. Other professionals might use ESGs to understand how business decisions today could affect company value in the future. The Python ecosystem has amazing libraries for data analysis, machine learning, and many other fields, but not for generating economic scenarios. I hope that an ESG library for Python will make this type of analysis easier and more widely adopted.
I'm a sucker for a good maze problem from the Riddler Express. Let's overengineer it using networkx and Python to extend the problem and see what we can learn!
This was a clever version of a classic problem for this week's Riddler Classic. With just two denominations of currency, what is the largest amount we can't create from a combination of bills?
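As a rough illustration of the idea (not necessarily the approach taken in the post), here is a minimal sketch. It assumes the two denominations share no common factor, in which case the largest amount that cannot be made is `a * b - a - b`. The function names and the denominations 3 and 5 are hypothetical, chosen only for the example; a brute-force check is included so the formula can be compared against direct enumeration.

```python
from math import gcd


def largest_unreachable(a: int, b: int) -> int:
    """Closed-form answer for two coprime denominations a and b."""
    if gcd(a, b) != 1:
        # With a common factor, infinitely many amounts are unreachable.
        raise ValueError("denominations must be coprime")
    return a * b - a - b


def largest_unreachable_brute_force(a: int, b: int) -> int:
    """Check every amount up to a * b and return the largest unreachable one."""
    limit = a * b
    reachable = [False] * (limit + 1)
    reachable[0] = True  # zero bills make an amount of zero
    for amount in range(1, limit + 1):
        reachable[amount] = (
            (amount >= a and reachable[amount - a])
            or (amount >= b and reachable[amount - b])
        )
    # default=-1 covers the case where every positive amount is reachable
    return max((n for n in range(limit + 1) if not reachable[n]), default=-1)


# Hypothetical denominations, used only for illustration
print(largest_unreachable(3, 5))
print(largest_unreachable_brute_force(3, 5))
```

Both functions should agree for any coprime pair, which is a handy sanity check before trusting the formula on larger denominations.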
This week's Riddler is a twist on the classic birthday problem. The birthday problem tells us that among a group of just 23 people, we are 50% likely to find at least one pair of matching birthdays. But what if we want to find three matching birthdays instead?
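The 23-person figure for the classic two-person version can be checked with a few lines of Python. This is a minimal sketch that assumes 365 equally likely birthdays and ignores leap years; the function name is hypothetical, and it covers only the pairwise case, not the three-way match the puzzle asks about.

```python
def prob_shared_birthday(people: int, days: int = 365) -> float:
    """Probability that at least two of `people` share a birthday."""
    p_all_unique = 1.0
    for i in range(people):
        # The (i + 1)-th person must avoid the i birthdays already taken.
        p_all_unique *= (days - i) / days
    return 1.0 - p_all_unique


for group_size in (10, 22, 23, 30):
    print(group_size, round(prob_shared_birthday(group_size), 4))
```

The probability first crosses one half at a group size of 23, which is where the familiar number comes from.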
Forgive the pun - this was a fantastic Riddler Express challenging you to calculate your odds of winning a million dollars!
What happens when you create a baseball league from three teams with peculiar specialties? That's the objective of this week's Riddler. We're asked to determine whether it's better to specialize in home runs, doubles, or walks as a strategy to tally the most wins from a season of baseball in the Riddler League.
This week's Riddler was an entertaining blend of probability and one of my favorite sports, cycling. We're asked to choose the ideal pace for a team time trial - trying to balance the rewards of a competitive time with the risks of pushing our riders too hard and having them crack due to the effort. Plus, there's a bonus extra credit problem!
I will be presenting at the Society of Actuaries Predictive Analytics Symposium this week in Philadelphia. The topic is "Bayesian models in insurance". I'll introduce Bayesian concepts, including the prior, likelihood, and posterior; I'll demonstrate probabilistic programming in Python and pymc3, and discuss how these techniques can be applied to the insurance industry. The slides can be found within.
The FiveThirtyEight Riddler this week asks us to make connections between states. Specifically, we want to map the connections between state abbreviations (e.g. CA for California). We've been tasked with finding the longest string of connections where the last letter from one state is the first letter from another, without repeating any states. With 59 state abbreviations to choose from, what is the longest string we can create?
This week's FiveThirtyEight Riddler was created by yours truly! It was the first puzzle I've submitted to the Riddler, and I hope you enjoyed it. This week we attempt to fool a bank with counterfeit hundred-dollar bills.
Based on an actual statistical analysis problem from World War II, this week's Riddler asks us to estimate the population of German tanks given uncertain information about the tanks we've observed. Fortunately, despite the uncertainty in our observations, we can still provide reasonably accurate estimates for the total German tank population. We'll rely heavily on Bayesian analysis to solve this problem.
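To give a flavor of the Bayesian approach, here is a minimal sketch of a simplified version of the problem. It assumes we observe exact serial numbers drawn without replacement from tanks numbered 1 to N, and it places a flat prior on N up to a hypothetical cap, `max_tanks`. The serial numbers are made up for illustration, and the sketch leaves out the uncertainty in the observations that the actual puzzle adds.

```python
from math import comb


def tank_posterior(serials, max_tanks=10_000):
    """Posterior over the total number of tanks N, under a flat prior.

    Assumes serials are sampled without replacement from 1..N, so the
    likelihood of any particular sample is 1 / C(N, k) whenever N is at
    least the largest serial observed, and zero otherwise.
    """
    k = len(serials)
    largest = max(serials)
    weights = {n: 1 / comb(n, k) for n in range(largest, max_tanks + 1)}
    total = sum(weights.values())
    return {n: w / total for n, w in weights.items()}


def posterior_mean(posterior):
    """Expected value of N under the posterior."""
    return sum(n * p for n, p in posterior.items())


# Hypothetical observed serial numbers, for illustration only
observed = [19, 40, 42, 60]
posterior = tank_posterior(observed)
print(round(posterior_mean(posterior), 1))
```

The posterior puts no weight on totals below the largest serial seen and decays quickly above it, which is why a handful of observations can already produce a fairly tight estimate.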
Computers continue to fascinate me. The Riddler this week deals with an explosion of combinations and math that is nearly impossible to grasp without the help of a computer. Specifically, we're interested in crafting an ideal strategy for the "numbers game" from the UK television show Countdown. The numbers game asks contestants to use four mathematical operations (addition, subtraction, multiplication, and division) with six numbers as inputs to solve for a single, three-digit target. Most of the time, this can be quite difficult, especially with a 30-second time limit. However, with the help of a computer, we can solve for every possible combination of input and output to identify the strategy that gives the humans the best chance to win!
This was a colorful Riddler Express. We start with a maze composed of edges of different colors. Our task is to identify the shortest path from start to finish using only edges of certain colors. This was a great opportunity to take Python's networkx library for a spin! We can build the maze as a network, where each edge has a "color" attribute, and use powerful solvers to do the path-finding for us!
This week's Riddler pits the army of the dead vs. the army of the living. As the two armies battle, any fallen soldiers from the living army rise to fight with the dead. How many soldiers would each side need to make it a fair fight?
The Riddler this week asks us about random points on the edge of a circle. Specifically, if we generate $n$ random points around the circumference of a circle, how likely are those points to fall on only one side?
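Here is a minimal sketch of how an analytical answer and a simulation can be compared. It reads "only one side" as "all points fit within some half of the circle", which is my interpretation rather than the puzzle's exact wording. Under that reading the known closed form is $n / 2^{n-1}$. The function names are hypothetical.

```python
import random


def prob_same_half_exact(n: int) -> float:
    """Closed form: n points all fit in some semicircle with probability n / 2^(n-1)."""
    return n / 2 ** (n - 1)


def prob_same_half_simulated(n: int, trials: int = 100_000, seed: int = 0) -> float:
    """Monte Carlo estimate using positions measured as fractions of the circumference."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        points = sorted(rng.random() for _ in range(n))
        # Gaps between neighbors, plus the wrap-around gap from last back to first
        gaps = [b - a for a, b in zip(points, points[1:])]
        gaps.append(1.0 - points[-1] + points[0])
        # All points share a semicircle exactly when some empty arc covers half the circle
        if max(gaps) >= 0.5:
            hits += 1
    return hits / trials


for n in (3, 4, 5):
    print(n, prob_same_half_exact(n), prob_same_half_simulated(n))
```

The largest-gap test avoids checking every possible diameter: if the points leave an empty arc of at least half the circle, the remaining half contains all of them.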
Another weekly Riddler, this time with both an analytical and simulated solution!
I have a distinct memory of participating in my elementary school's spelling bee when I was in second grade. I was the unlikely runner-up, even though I was competing against children in third and fourth grade. What was the secret to my overperformance? Not my natural spelling ability, but rather the rules of the game - a participant is eliminated from the spelling bee after failing to spell a word correctly, which means that going last is an advantage. I was lucky enough to be near the tail end of the participants in my spelling bee, which surely improved my final ranking. This week's Riddler asks us to quantify that advantage.
This week's Riddler asks us to simulate a game of baseball using rolls of dice. To solve this problem, we're going to treat the game of baseball like a Markov chain. Under the simplified dice framework, we identify various states of the game, a set of transition probabilities to subsequent states, and associated payoffs (runs scored) when certain states are reached as a result of game events. Using this paradigm, we can simulate innings probabilistically, count the runs scored by each team, and determine the winner.
The Riddler Express this week asks us about collecting sets of cards. In particular, we're interested in collecting a complete set of 144 unique cards. We purchase one random card at a time for $5 each. How many purchases should we expect to make - and how much money should we expect to spend - in order to collect at least one of every card?
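This is the classic coupon collector's problem, so a minimal sketch of the expected-value calculation is short. It assumes every card is equally likely on each purchase; the function name is hypothetical. Once we hold `k` distinct cards, the chance that the next purchase is new is `(n - k) / n`, so the expected wait for it is `n / (n - k)` purchases. Summing those waits gives `n` times the `n`-th harmonic number.

```python
def expected_purchases(n_cards: int) -> float:
    """Expected purchases to complete a set of n_cards equally likely cards."""
    # Sum of n / (n - k) for k = 0..n-1, rewritten as n * (1 + 1/2 + ... + 1/n)
    return n_cards * sum(1 / k for k in range(1, n_cards + 1))


n_cards = 144        # set size from the puzzle
price_per_card = 5   # dollars per purchase, from the puzzle

purchases = expected_purchases(n_cards)
print(f"Expected purchases: {purchases:.1f}")
print(f"Expected cost: ${purchases * price_per_card:,.2f}")
```

For 144 cards this comes out to roughly 800 purchases, most of which are spent chasing the last few missing cards.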
I've learned that there are many automatic differentiation libraries in the Python ecosystem. Often these libraries are also machine learning libraries, where automatic differentiation serves as a means to an end - for example in optimizing model parameters in a neural network. However, the autograd library might be one of the purest, "simplest" (relatively speaking) options out there. Its goal is to efficiently compute derivatives of numpy code, and its API is as close to numpy as possible. This means it's easy to get started right away if you're comfortable using numpy. In particular, autograd claims to be able to differentiate as many times as one likes, and I thought a great way to test this would be to apply the Taylor Series approximation to some interesting functions.
This week's holiday Riddler is a twist on the classic "birthday problem". The birthday problem asks us to calculate the probability that at least two people at a party have the same birthday. Most people hearing this problem for the first time are surprised at how few people you need - just 23 people give roughly 50% odds of finding at least one pair of matching birthdays! For this problem, we're interested in calculating how likely we are to hear the same song more than once from a shuffled playlist. Moreover, what can we infer about the size of the playlist, given that we hear repeats roughly half the time?
As a follow-up to my prior article on Black-Scholes in PyTorch, I wanted to explore more complex applications of automatic differentiation. As I showed before, automatic differentiation can be used to calculate the sensitivities, or "greeks", of a stock option, even if we use Monte Carlo techniques to calculate the option price. Many exotic options can only be priced using Monte Carlo techniques, so automatic differentiation may be able to provide more accurate sensitivities in less time than traditional methods.
I've been experimenting with several machine learning frameworks lately, including TensorFlow, PyTorch, and Chainer. I'm fascinated by the concept of automatic differentiation. It's incredible to me that these libraries can calculate millions of partial derivatives of virtually any function with only one extra pass through the code. Automatic differentiation is critical for deep learning models, but I wanted to see how it could be applied to value financial derivatives.
I wrote yesterday about tracking my steps with a Garmin watch. Perhaps to keep me motivated and active, Garmin provides a daily step goal that moves up or down based on my activity. I've always been curious about how this algorithm works, but I couldn't find any resources that described it. Let's see if I can reverse engineer it instead.
Without a doubt, getting a puppy changes your life for the better. But I wanted to quantify this somehow. I used Bayesian inference to identify whether I logged more steps in the days since our puppy arrived.