Adrià Garriga's blog

Writing feels like outputting words until you approximate your thoughts.

An Alternative Population Ethics

Posted in Philosophy

Let’s say we believe in consequentialism and utilitarianism. Roughly, we hold that the morality of an action depends only on its consequences, and that the only consequences we care about are changes in the amount of happiness in the world. We wish to increase the amount of happiness as much as possible. Since we are rational, every time we act we will choose the action that we expect to lead to the outcome with maximum happiness. (This is essentially what effective altruists try to do.)

Now, the world we live in has many human (and maybe nonhuman) individuals with moral weight. To make our life easier, let’s say we can quantify the happiness (also called welfare or utility) of an individual. That utility is a real number, positive if the individual’s life is worth living, zero if it is neutral, and negative if the individual is better off not existing. Let’s also assume we know, for each action, the happiness outcome of each individual.1

If none of the actions change the identity or number of the agents that exist, choosing is easy: the best action is the one that maximises the sum2 of happiness.

Population axiologies

However, what if the actions can also change the number of individuals in the population? That happens in many cases: deciding whether to have a child personally, designing the structure of money incentives and tax changes to encourage/discourage children, giving away or selling contraception… For these cases, we need to be able to compare populations with a different number of individuals.

Thus we get to the problem of population axiology: give an “at least as good as” complete, transitive ordering of all possible populations, based solely on people’s utility.

Borrowing notation from Arrhenius (2000): let $A$, $B$, $C$, $B \cup C$, … be populations, or sets of individuals. We may indicate the population size: $A_n$ is a population of $n$ individuals $a_1, a_2, \dots, a_n$. The numerical representation of $a_i$’s welfare is given by the function $v(a_i)$. Since the ordering is transitive, we can safely represent it with a “$\geq$” symbol. The problem can then be written: for any two populations $A_n$ and $B_m$, is $A_n \geq B_m$ or $A_n \leq B_m$?

Let’s take care of the case where $n \neq m$. We could, again, just care about the sum of utility, so $A \geq B$ if and only if $\sum_{a_i \in A} v(a_i) \geq \sum_{b_j \in B} v(b_j)$. But this opens us up to the Repugnant Conclusion (Parfit, 1984) (video explanation by Julia Galef): a population of $10^{100}$ people with welfare 1 each would be much better than a population of 10 billion people with welfare $10^{80}$ each.

And this, to most people, is unpalatable. However, some philosophers think we should accept the Repugnant Conclusion. Tännsjö (2002) argues that when we picture a “life barely worth living” we picture precarious hygienic conditions, lack of food, and depending on the weather for the year’s crop, but that actually it is our materially comfortable first-world lives that are barely worth living. He adds that we dismiss the obvious goodness of the enormous increase in population because of our scope insensitivity bias.

Still, the Repugnant Conclusion created the problem of population axiology and led to the following findings.

Impossibility theorems

Many alternatives to summing utility have been proposed: counting average utility; giving each additional life less weight, so that any sum of bounded utilities converges; or setting a “critical level” of well-being, above that of a life barely worth living, which added lives must exceed. See the introduction by Greaves (2016).

Arrhenius showed, in 2011, the impossibility of a satisfactory population ethics. After his already discouraging results in 2000, these findings challenged population ethics even as a moral framework. Both papers present compelling conditions that a population axiology ought to satisfy, represent them formally, and mathematically prove that they cannot all hold in the same axiology. I will describe the conditions of the 2011 paper:

  • The ordering is complete (no two populations are incomparable) and transitive ($A \geq B$ and $B \geq C$ implies $A \geq C$).

  • Given two populations of the same size and with all the individuals having the same welfare, the one with higher sum3 of welfare is not worse.

  • Let $A_n$ be a population of $n-1$ people with very high welfare and one person with welfare $w$. Let $B_n$ be a population of $n-1$ people with very low positive welfare and one person with welfare $w+1$. Then, for any population $X$, there exists an $n$ sufficiently large such that $A_n \cup X \geq B_n \cup X$. Roughly: changing enough people from low positive to really high welfare in a population compensates for one individual having slightly lower welfare.

  • Let $B_n$ be a population of $n$ individuals with welfare $w$. Let $A_n$ be a population of one individual with welfare $w+1$ and $n-1$ individuals with welfare $c$, where $c < w$. Given a population $X$ with everyone’s welfare between $c$ and $w$, there exists a sufficiently large $n$ such that $B_n \cup X \geq A_n \cup X$. Roughly: making a lot of people better off, at a very small cost to one person’s welfare, such that all the people mentioned end up equally happy, does not make things worse.

  • There is at least one negative welfare level $w<0$, such that adding any number of people with positive welfare $v>0$ is not worse than adding a person with welfare $w$.

  • For any population $X$, there exists a perfectly equal very-high-welfare population $A_n$ and a very-negative-welfare population $B_m$ such that $X \cup A_n \geq X \cup B_m \cup C_l$ for all (arbitrarily large) values of $l$, where population $C_l$ is composed of very low positive lives. This means avoiding a Repugnant Conclusion on steroids: not only is there a vast number of barely-worth-living lives, but there also exist many lives that are much worse than not existing.

Reiterating: it is impossible to have all these properties in the same population axiology.

Side-stepping the issue: dropping the “axiology” requirement

Notice that the desirable properties of the “population axiology” can be framed as transitions between populations. This closely matches (one of) the intended use(s) of an axiology, namely, deciding which is the most moral of any two actions with known4 outcomes. Let’s take that strong hint, and re-frame the choice not as one between any two populations, but between any two populations reachable from the status quo.

I propose ascribing moral weight only to individuals who currently exist. When making a decision, choose the outcome that maximises the sum5 of the utility of the people who are alive at that moment. This way of looking at population ethics is unfortunately not an axiology. An axiology would let us output a single number measuring the utility of the whole population, and use that as a drop-in replacement for single-agent utility in all the game theory (and game practice, also called Artificial Intelligence, AI) that we know. Instead, our criterion is only useful when an agent is making a decision based on a model of the world and on the future outcomes of its current action.

We can illustrate this more easily within the somewhat more restricted framework of AI as online search (Russell and Norvig, 2009, Section 4.5). The following figure shows a decision tree. At each node we face a decision, with one branch for each course of action we can take. We are currently at the bottom-most node, the root, at time instant $t=0$. At each time step, the individuals in our population have a positive or negative experience. At $t=3$ we have the expected return6 7 for each chain of actions. We outline in green the utility of the individuals who are alive when taking the decision, and in purple the utility of those who start existing later. The dotted green-purple line marks the moment where, in the possible future of each path, at least one new individual exists.

Illustration of our decision process for populations.

The decision process can be viewed as an adversarial search (Russell and Norvig, 2009, Section 5) process. In this game we have two players, represented as blue and green edges in the picture. Blue plays in states where at least one new individual has been born, that is, above the dotted line. Green plays below it. Green plays at the root node, which means the agent doing the deciding is Green. Green cares about maximising the return of the people who exist at $t=0$, that is, the return outlined in green. Blue wants to maximise the utility of all the people, that is, the sum of the numbers outlined in green and purple. The edge highlighted in the relevant player’s colour is the edge that player would choose if it were to play from that edge’s parent node. Notice that, in the right-most outcome, Blue never plays, since nobody is born.

In our example, we choose the bubble with (13, -4), because it is the one with the highest Green-utility out of the options Blue leaves available.
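To make this concrete, here is a minimal sketch of the two-player search in C++. All the names (Node, play, the birth flag) are mine, the toy tree only loosely mirrors the figure’s numbers, and returns are assumed to be collected only at the leaves.

#include <utility>
#include <vector>

using namespace std;

struct Node {
    double green, purple;   // returns at a leaf (ignored for inner nodes)
    bool birth;             // does entering this node create a new individual?
    vector<Node> children;  // empty for leaves
};

// Outcome (green, purple) of optimal play from node n. Once a new individual
// has been born on the current path, Blue moves and maximises green + purple;
// before that, Green moves and maximises green alone.
pair<double, double> play(const Node& n, bool anyoneBorn) {
    anyoneBorn = anyoneBorn || n.birth;
    if (n.children.empty()) return {n.green, n.purple};
    pair<double, double> best;
    bool first = true;
    for (const Node& child : n.children) {
        pair<double, double> outcome = play(child, anyoneBorn);
        double value = anyoneBorn ? outcome.first + outcome.second
                                  : outcome.first;
        double bestValue = anyoneBorn ? best.first + best.second : best.first;
        if (first || value > bestValue) { best = outcome; first = false; }
    }
    return best;
}

int main() {
    // Green's choice: a branch where a birth happens, after which Blue picks
    // between (13, -4) and (2, 5), versus a childless branch worth (10, 0).
    Node withBirth{0, 0, true, {Node{13, -4, false, {}}, Node{2, 5, false, {}}}};
    Node childless{10, 0, false, {}};
    Node root{0, 0, false, {withBirth, childless}};
    play(root, false);  // Blue picks (13, -4); Green prefers its 13 over 10
}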

This system behaves as if we decided to increase the utility of the existing people now, knowing that, when new people are born, we will care about them too. It resembles our current society’s treatment of babies: you can use contraception to prevent a baby’s existence all you want, but once you have a baby, if you mistreat it or kill it you shall face severe consequences. Barring contraception accidents, children exist because their parents thought they would be happier raising them.8

Does this avoid the Repugnant Conclusion?

Yes. We start from a population of $n$ individuals with some level of well-being. The Repugnant Conclusion entails sacrificing these individuals’ utility to create many more individuals with low positive welfare, so at some point we have to allow a birth. (If we have no choice over this, it is useless to apply any moral theory.) Thus we can decide, at some point in the middle, whether to bring a new individual into existence. We know that, if we do, we will be morally forced to decrease the others’ well-being for it. Thus, we decide for the good of the existing beings, and do not bring the individual into existence.9

This can fail when the decider lacks sufficient foresight and incorrectly predicts that the new individual will not decrease the others’ well-being. But that is not a problem of our decision criterion; rather, of our prediction mechanism. Consider the following situation of an agent maximising its utility:

Insufficient prediction power causes bad decision.

The agent chooses the right-most node, because it thinks it will lead to a return of 14. But, one time-step beyond the agent’s lookahead, it gets a very low reward, and thus a very low return. This should be alleviated in some way, but that is not in the scope of this post.

So when the prediction mechanism works as intended, and the world does not take such drastic turns, the decision criterion avoids the Repugnant Conclusion.

Does this fail in any other way?

Yes. The criterion as-is needs at least one amendment. Currently, an agent deciding by this criterion will not hesitate to create arbitrarily many lives with negative utility to increase the utility of the people who are alive, even just a little. One might think the commitment to a life once it exists would fix that, but it does not. If Green can restrict its future decisions in some way, it can take all the necessary decisions before anyone is born, and thus completely prevent Blue from playing. In a real-world example, Green becomes president, passes laws mandating the construction of inhuman but cheap factory farms and laws preventing anyone from interfering with those farms, and then removes itself from office before any farm is in operation, for example by committing a crime. Green will become very sad when the farms start working, but its purpose at the moment of decision will be fulfilled, namely, maximum utility for the beings who were alive back then.

We could forbid the creation of any lives with negative welfare entirely. But that would outlaw really positive scenarios, such as one person working a shitty job and having nobody to talk to, in order for $10^{100}$ people to live lives with $10^{100}$ welfare. No, instead, we need to forbid really bad trade-offs.

A possible rule for this would be: when playing as Green, find the Green-best outcome such that no purple life has negative welfare. Subtract its value from the absolute Green-best outcome. The difference is the maximum price, in negative purple welfare, that Green is allowed to pay. All choices outside of that budget are outlawed for Green.

In our previous example, the best Green outcome without negative welfare is the right-most node, with a green-utility of 10. The absolute best Green outcome, and indeed the one it chose previously, is 13. Thus, the maximum price to pay is $13-10=3$. The choice Green made last time has -4 utility for purple people, so it is out of budget. Within budget, the best outcome is again the rightmost node, which gives 10 utility to the existing population, while creating no new individuals.
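In code, continuing the sketch from earlier, the budget rule could look like the following. This is a hedged sketch over a flat set of candidate outcomes: it simplifies “no purple life has negative welfare” to “the purple total is non-negative”, which only coincides with the real rule when each outcome contains at most one purple life, and it assumes at least one candidate has no negative purple welfare.

#include <utility>
#include <vector>

using namespace std;

// Pick among candidate (green, purple) outcomes under the budget rule:
// budget = (best green overall) - (best green with non-negative purple);
// any outcome costing more negative purple-welfare than that is outlawed.
pair<double, double> chooseWithBudget(const vector<pair<double, double>>& outcomes) {
    double bestAny = -1e18, bestSafe = -1e18;
    for (const pair<double, double>& o : outcomes) {
        if (o.first > bestAny) bestAny = o.first;
        if (o.second >= 0 && o.first > bestSafe) bestSafe = o.first;
    }
    double budget = bestAny - bestSafe;  // 13 - 10 = 3 in the example above
    pair<double, double> best{-1e18, 0};
    for (const pair<double, double>& o : outcomes)
        if (o.second >= -budget && o.first > best.first) best = o;
    return best;  // on {(13,-4), (2,5), (10,0)}: (13,-4) is out of budget -> (10,0)
}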

In practice, if we end up having an AI that needs to make this sort of decision, I suspect it will lean towards just not creating any lives, and creating lives with negative welfare should be rare (what for? Humans are slow and inefficient at working, and they decrease utility when unhappy). If the decider is an economist or politician, they can use this rule, but they wouldn’t do anything truly absurd; they are human.

Closing thoughts

Based on insights from AI and a few moral intuitions, I have created a decision criterion for a changing population, when considering possible futures. The criterion avoids the Repugnant Conclusion, behaves like a simple sum of utilities when population doesn’t change, mirrors our current accepted morals for treatment of nonexistent individuals, and doesn’t seem to have any glaring holes.

Of course, such holes may be found. If you find one, please comment, email me, or otherwise let me know.


(I recommend the Google Scholar Button browser add-on for comfortable fetching of papers, but I’ve included links for convenience here.)

[Blog] Armstrong, S. Integral versus differential ethics.

[PDF] Arrhenius, G., 2000, An Impossibility Theorem for Welfarist Axiologies. Economics and Philosophy, 16: 247–266.

[PDF] Arrhenius, G., 2011. The impossibility of a satisfactory population ethics. Descriptive and normative approaches to human behavior.

[PDF] Greaves, H., 2016. Population axiology.

[Book] Parfit, D., 1984. Reasons and persons. OUP Oxford.

[Book] Russell, S. and Norvig P., 2009. Artificial Intelligence: A Modern Approach. 3rd. Prentice Hall Press, Upper Saddle River, NJ, USA. ISBN: 0136042597, 9780136042594.

[PDF] Tännsjö, T., 2002. Why we ought to accept the repugnant conclusion. Utilitas, 14(3), pp.339-359.

  1. I sometimes write “people” in this essay because it reads naturally, but you may substitute “individuals with moral weight” every time. Also, since we quantify utility numerically, a moral weight can really be a numeric weight by which to multiply that individual’s utility. [return]
  2. Or average, if you prefer; it makes no difference. Quick proof: let $A$ be the set of possible actions and $U : I \times A \to \mathbb{R}$ the function giving the utility, a real number, of individual $i$ when taking action $a$. Given the set of individuals $I$ and actions $A$, clearly $$ \forall a, a’ \in A \, \sum_i U(i, a) \geq \sum_i U(i, a’) \iff \frac{\sum_i U(i, a)}{|I|} \geq \frac{\sum_i U(i, a’)}{|I|} $$ since $|I|=|I|$. [return]
  3. Again, or average, see footnote 2. [return]
  4. If you believe in expected value utility, you can extend that to population outcomes with probabilities. While I don’t know of any alternative to it, expected value utility has some problems, namely Pascal’s Mugging. [return]
  5. See footnotes 2 and 3. Aren’t populations with the same number of individuals wonderful? [return]
  6. This term is taken from the artificial intelligence literature, specifically from sequential decision processes, which is what we are facing here. The idea is that, at each time step $i$, the agent receives a numeric reward $r_i$. The agent also has a discount rate $\gamma$, $0 \leq \gamma \leq 1$, which models how much the agent cares about receiving rewards sooner.10 The return at time $t$, $R_t$, is the sum of all discounted rewards up to then, that is: $$ R_t = r_0 + \gamma r_1 + \dots + \gamma^t r_t = \sum_{i=0}^t \gamma^i r_i $$ The chain of actions that has the highest return at time $t$, when the agent’s decisions end ($t$ can be infinite too, if the decisions never end), is the one a rational agent prefers. [return]
  7. Why are we using returns and not utility over the whole lifetime now? We have gone down into the realm of making decisions while the population is changing: individuals experience pain or pleasure, are born, and die. It makes much more sense to think about the pain or pleasure each individual experiences during a time interval, rather than during their whole lifetime. This is the notion of returns, hence our use of them, and why we use the term somewhat interchangeably with “utility” around this passage. [return]
  8. Be it because they enjoy the raising, because their child being happy makes them happy, or to make their own parents happy. But it all reflects back to the parents’ own projected well-being. [return]
  9. This idea very much resembles that of Integral Ethics (Armstrong, 2014), which was inspiration for this post’s approach. [return]
  10. In economic terms, $\gamma$ is the time preference of the agent. Note that the terms “high” and “low” are reversed for time preferences and discount rates. A high $\gamma$ means a low time preference, and vice versa. [return]


Posted in Language

Attention! For those of you who do not understand Catalan, there is an English translation below!

La meva germana i jo vam començar a fer la primera d’aquestes frases. No em semblen especialment difícils de dir, però això potser és perquè els he escrit jo. Vaig gaudir del procés i el resultat, i per això vaig escriure les altres tres. Aquí van! Prova de dir-les!

  • La mangosta Agustina i la llagosta Gustau, vaguistes, gasten en gustos a l’Agost.

  • Manca de xancles, xanques blanques i tanques de branques fan que tanquin encara la banca.

  • La rara barana que el pare i la mare preparen fa cara la vara on para la Sara i s’encara al sarau i gatzara.

  • Fa poc que moc el soc del lloc al foc i toco un roc. Jo sóc un cóc de mocs que xoca al boc.

Estan creades de forma que tenen una seqüència concreta de sons repetida moltes vegades. Excepte la primera, també alternen síl·labes tòniques i àtones, per donar ritme. Noteu també la diferència entre “o”s obertes [IPA ɔ] i tancades [o] en l’última.

Traduir aquesta entrada m’ha fet adonar-me que, sí, les llengües romàniques i l’anglès tenen gairebé la mateixa gramàtica. Que fort.

In English

Title: Tongue Twisters

My sister and I started composing the first of these sentences. They don’t seem hard to say to me, but that may be because I wrote them. I enjoyed the process and results, so I wrote the other three. Here they go! Try to read them out loud!

  • La mangosta Agustina i la llagosta Gustau, vaguistes, gasten en gustos a l’Agost. (Augustine (female) the mongoose and Gustaf the lobster, strikers, spend on pleasures during August.)

  • Manca de xancles, xanques blanques i tanques de branques fan que tanquin encara la banca. (Lack of flipflops, white stilts and fences made of branches make the bank still be closing.)

  • La rara barana que el pare i la mare preparen fa cara la vara on para la Sara i s’encara al sarau i gatzara. (The rare railing that father and mother prepare makes expensive the rod where Sara stops and turns to face the party and rejoicing.)

  • Fa poc que moc el soc del lloc al foc i toco un roc. Jo sóc un cóc de mocs que xoca al boc. (For a short time, I have been moving the stump from the place to the fire and touching a rock. I am a snot pie that shocks the buck.)

They are created to have a certain sequence of sounds repeated many times. Except for the first one, they also alternate stressed and unstressed syllables, to give rhythm. Note also the difference between open [IPA ɔ] and closed [o] “o”s in the last one.

Translating this post made me realize that yes, Latin-derived languages and English have almost the same grammar. Damn.


Frictionless OS X screenshot sharing with Nyazo

Posted in Tutorials

OS X has a nice screenshot feature. It’s fast and easy to use: just press Cmd-Shift-4 and select your shot, which will appear on the desktop. By pressing space, you can switch to a window shot.

The problems come when you want to share your screenshot on, say, IRC. Direct transfer (DCC) is usually too much hassle for that, and some clients do not support it. What you usually do is upload the picture to the web, and link it.

You could just manually upload the file the screenshot feature produces. But I’m spoiled by the use of Nyazo on Linux, and I wanted better.

So in a few minutes you can tweak Nyazo’s Linux script and end up with something like this (substitute your Nyazo server’s upload endpoint for the placeholder URL):

#!/bin/sh
# Capture the screen (pass -i for an interactive selection) to a temp file
screencapture $1 /tmp/nyazo.png
# Upload the image. The URL below is a placeholder: point it at your
# Nyazo server's upload endpoint.
NYAZO_URL="https://nyazo.example/upload"
link=`curl -s -F "id=''" -F "imagedata=@/tmp/nyazo.png" "$NYAZO_URL"`
rm /tmp/nyazo.png
open "$link"
echo "$link" | pbcopy

Save it as something like /usr/local/bin/nyazo and make it executable (chmod +x <file>). Then open Automator and create a new Service. Add the “Run Shell Script” action to it, write /usr/local/bin/nyazo -i in it, and save it.

Click the “Run” button at the top right of the window to test that the service works as intended. A coordinate-selecting cursor should appear. If you select a region, it should be uploaded to Nyazo.

Now open “Keyboard” in “System Preferences”. Go to the “Shortcuts” tab, and select “Services” on the left. At the bottom of the list, you can bind the interactive Nyazo service you just created. I have it bound to Option-Cmd-Shift-4.

Of course, you can also bind the fullscreen-screenshot Nyazo to a keyboard shortcut. Just repeat the same steps, removing the “-i” option from the service, and binding it to something else.

Happy shooting!


Contest writeup: Murcia qualifiers 2015

Posted in Problem Writeup

This is a writeup of the problems my team solved in the Murcia contest, in which we were participating as preparation for SWERC. The teams there are way better than us, but we will not give up!

The problems are given the UVa numbers under which they will soon appear.

12962 Average Reuse Distance

At first read, this problem seems easier than it really is. Still, it’s not a hard problem: it’s a fairly straightforward simulation.

The problem

Given a sequence of memory accesses to different addresses (represented by the characters 'a' to 'z'), give its Average Reuse Distance. The Reuse Distance of a memory access to address $x$ is the number of other addresses accessed since the last access of $x$. First accesses to any given position do not count towards the average.

The solution

The core of this problem is keeping track of the different memory accesses since the last access of each address, and counting them efficiently. We solved this by storing, for each address, a bitmask of which other addresses have been accessed since its last access. You can easily reset a bitmask by setting it to 0, and count the number of ones in it with GCC and Clang’s __builtin_popcount.

The algorithm is the following:

  • Total sum = 0; access count = 0.
  • For each character $c$ read from the input:
      • If $c$ hasn’t been encountered before, mark it as encountered.
      • Otherwise, add the number of ones in $c$’s bitmask to the total sum, and add 1 to the access count (the number we will divide the sum by).
      • For all characters a–z, set the bit corresponding to $c$ in their bitmasks.
      • Reset $c$’s bitmask to 0.

In the end, you output the mean from the sum you gathered, or NaN if the divisor is 0.
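For concreteness, here is a minimal sketch of that simulation, assuming one lowercase access string per line of input (the judge’s exact I/O format may differ):

#include <iostream>
#include <string>

using namespace std;

int main() {
    string s;
    while (cin >> s) {
        unsigned mask[26] = {};  // mask[c]: addresses accessed since c's last access
        bool seen[26] = {};
        long long total = 0, count = 0;
        for (char ch : s) {
            int c = ch - 'a';
            if (!seen[c]) seen[c] = true;
            else { total += __builtin_popcount(mask[c]); ++count; }
            for (int i = 0; i < 26; ++i) mask[i] |= 1u << c;  // record this access
            mask[c] = 0;  // a fresh window starts for c itself
        }
        if (count == 0) cout << "NaN\n";
        else cout << double(total) / count << "\n";
    }
}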

Here is the code.

12963 Astragalus Nitidiflorus

We technically did not solve this problem within the contest, but came very close. The night after the contest, when sitting down to look at the problems again, we found what was wrong in 5 minutes.

The problem

You are given a grid with characters on it. You have to find the horizontal instances of the word “GARBANCILLOS” and the vertical instances of the word “ASTRAGALUS”. Wherever they intersect is a possible location, but each word instance can only be used once. Determine the number of locations.

The grid is at most 150x150.

The solution

You need to do two things here. First, find the strings’ locations and where they intersect. Then, compute the maximum matching of vertical and horizontal strings.

To find the strings’ locations in $\Theta(n^2)$, we used a rolling hash. The longest of the strings is 12 characters and $12 \cdot 5 = 60 < 64$, so a hash with 5 bits per character fits in a 64-bit integer and can be used without collisions. We map all characters outside the range 'A' to 'Z' to the value 0x1f, which matches no letter.
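A sketch of such a rolling hash over a single row; the function name and layout are mine, not our actual submission:

#include <cstdint>
#include <string>
#include <vector>

using namespace std;

// Map 'A'..'Z' to 0..25; anything else to 0x1f, which matches no letter.
uint64_t encode(char ch) {
    return (ch >= 'A' && ch <= 'Z') ? uint64_t(ch - 'A') : 0x1f;
}

// Starting columns of every occurrence of `word` in `row`, via a 5-bit
// rolling hash ("GARBANCILLOS" needs 12 * 5 = 60 bits, which fits in 64).
vector<int> findWord(const string& row, const string& word) {
    const int k = (int)word.size();
    const uint64_t windowMask = (uint64_t(1) << (5 * k)) - 1;
    uint64_t target = 0, h = 0;
    for (char ch : word) target = (target << 5) | encode(ch);
    vector<int> starts;
    for (int i = 0; i < (int)row.size(); ++i) {
        h = ((h << 5) | encode(row[i])) & windowMask;  // slide the window
        if (i >= k - 1 && h == target) starts.push_back(i - k + 1);
    }
    return starts;
}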

First we look at the character matrix horizontally, and store the positions of the found “GARBANCILLOS” in a sorted vector, one for each row. We also store each instance’s future index in the graph of intersecting strings in an std::map. Then we look at the character matrix vertically, and when we find a match we binary search for the possible horizontal matches in each row it crosses. If one is found, the edge is added to the maximum-matching graph.

Finally, we output the maximum matching of the graph we built.

Here is the code.

12965 Angry Bids

Nice pun.

The problem

You have producers of some product, and some consumers. Each of them has an ideal price. A consumer will get angry if the price is higher than their ideal price. A producer will get angry if the price is lower than their ideal price. Find the price such that the number of angry people is minimal, and how many angry people there are.

The prices range from $0$ to $10^9$. There are up to $10^4$ producers and $10^4$ consumers.

The solution

It’s amazing how many problems are solved this way. If we checked every possible price in order (infeasible), we would see more consumers progressively getting angry, and more producers progressively getting happy.

Thus, we assign producers the number 1, and consumers the number -1. We create a pair of <price, assigned number> for each of them, increasing the price in the consumers’ pairs by 1, because that is the price at which the consumer starts being unhappy. We sort this sequence of pairs and then iterate over it, keeping a running sum of the assigned numbers seen so far. The point where this sum is maximum is the optimal price.
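A sketch of that sweep; the function name and the I/O around it are hypothetical. It returns the best price together with the minimum number of angry people:

#include <algorithm>
#include <utility>
#include <vector>

using namespace std;

pair<long long, int> bestPrice(const vector<long long>& producers,
                               const vector<long long>& consumers) {
    // +1: a producer becomes happy at their ideal price.
    // -1: a consumer becomes angry one unit above their ideal price.
    vector<pair<long long, int>> events = {{0, 0}};  // also evaluate price 0
    for (long long p : producers) events.push_back({p, +1});
    for (long long c : consumers) events.push_back({c + 1, -1});
    sort(events.begin(), events.end());
    int happy = (int)consumers.size();  // below every event, all consumers are happy
    int bestHappy = -1;
    long long bestP = 0;
    for (size_t i = 0; i < events.size();) {
        long long price = events[i].first;
        for (; i < events.size() && events[i].first == price; ++i)
            happy += events[i].second;  // apply every event at this price
        if (happy > bestHappy) { bestHappy = happy; bestP = price; }
    }
    int total = (int)(producers.size() + consumers.size());
    return {bestP, total - bestHappy};  // angry = total - happy
}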

Here is the code.

12967 Spay Graphs

The name of the problem in the contest, however, is “Spray Graphs”. And the page’s title is “Spary graphs”. Haha.

The problem

You are given the graph structure in the picture below. Each “level” of the graph has an assigned letter. We can build more levels by branching every node at right angles, as shown in the figure.

Spray graph of size 4 (problem by Facultad de Informática, Universidad de Murcia)

Given a number of levels, compute how many paths there are to get to any of the leaves. The levels range from 1 to 30.

The solution

Notice two things about this graph. First, it has four-way symmetry around the central node, so if we compute the paths in a quarter of the graph and multiply by 4, we get the answer.

Radial symmetry in spray graphs

Second, it is very much like a grid: each node’s paths come only from the node to its left and the node above it.

We can then build a grid with the number of paths for each node. The grid’s top row counts $1, 2, 3, \dots$, since the nodes in the top row of the separated portions in the previous picture have that many paths. The first column is all $1$s, since there is only one straight path down. Then, we can compute each node’s number of paths as the paths coming from the left plus the paths coming from the top.

Here’s the grid for a size 5 graph:

1   2         3         4
1   $2+1=3$   $3+3=6$
1   $3+1=4$
1

Finally, just add up the leaves (the last diagonal), and multiply by 4. Bingo!
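A sketch of this DP, under my reading of the grid above: the quarter-grid is the triangle bounded by the anti-diagonal $r + c = \text{levels}$, and the leaves sit on that diagonal. Degenerate level counts (a single level) are not handled.

#include <cstdint>
#include <iostream>

using namespace std;

// g[1][c] = c, g[r][1] = 1, g[r][c] = g[r-1][c] + g[r][c-1];
// sum the anti-diagonal r + c = levels and multiply by 4.
uint64_t sprayPaths(int levels) {
    static uint64_t g[31][31];
    uint64_t total = 0;
    for (int r = 1; r < levels; ++r)
        for (int c = 1; r + c <= levels; ++c) {
            if (r == 1) g[r][c] = c;
            else if (c == 1) g[r][c] = 1;
            else g[r][c] = g[r - 1][c] + g[r][c - 1];
            if (r + c == levels) total += g[r][c];  // a leaf
        }
    return 4 * total;  // four symmetric quarters
}

int main() {
    cout << sprayPaths(5) << "\n";  // adds up the diagonal 4, 6, 4, 1 of the grid above
}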


Simple way of proving this series converges?

Posted in Short

Doing my university homework, I came across the following.

Prove that $\sum_{i=0}^n r^i = 1 + r + r^2 + … + r^n$ is $O(1)$ if $0 < r < 1$.

With that range of $r$, $r^a > r^b$ iff $a < b$. Since $0 < 1 < … < n$, $1 > r > r^2 > … > r^n$. Thus, $\sum_{i=0}^n r^i = O(1)$

Equivalently, as $n$ approaches infinity, the series converges to a finite value.

Is this proof even correct? Stay tuned to find out!


$\sum_{i=0}^n r^i = O(1)$ is not implied by $1 > r > … > r^n$. The correct proof is the following.

This is the geometric series, so its value (only for $r \neq 1$) is $\frac{1-r^{n+1}}{1 - r}$.

If $0 < r < 1$, then $r^{n+1}$ approaches $0$ as $n$ tends to infinity, and for no $n \geq 0$ is it greater than $r$. Thus the numerator $1 - r^{n+1}$ is bounded above by the constant $1$, while the denominator is the positive constant $1 - r$, so the whole expression has a constant upper bound.

Since there exists a constant $c$, namely $c = \frac{1}{1-r}$, for which $c \geq \frac{1-r^{n+1}}{1 - r}$ for all $n \geq 0$, $\sum_{i=0}^n r^i = O(1)$. Thus it has been proven.
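As a quick sanity check of the bound, take $r = \tfrac{1}{2}$:

$$ \sum_{i=0}^n \left(\tfrac{1}{2}\right)^i = \frac{1-\left(\tfrac{1}{2}\right)^{n+1}}{1-\tfrac{1}{2}} = 2 - \left(\tfrac{1}{2}\right)^n \leq 2 = \frac{1}{1-r} $$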


How to multiply polynomials in Θ(n log n) time

Posted in Algorithms

The information in this post is taken from the sources listed in the Sources section at the end of it. If you don’t care about understanding the algorithm and need an implementation now, skip to the Implementation subsection.


Why would anyone want to multiply polynomials in $\Theta (n \log n)$ instead of the naive $\Theta(n^2)$ time? It sometimes happens. I did when I was confronted with the UVa Online Judge 12879 Golf Bot problem. I knew I needed a discrete convolution, but then I got stuck, because $200000^2$ is way too high and the only way I knew to calculate it was quadratic.

If you mean “Why would you write this post?”, well, talking about an algorithm I recently learned to solve a problem is proving to be an easy way to kick-start my blog.

Gathering building blocks

So how do we do it? We are going to have to talk a bit about the representation of polynomials in computer memory first. (Haha get it?)

Coefficient representation

As far as I know, there are two useful ways of doing that. The first, and most obvious, is to store the coefficients of every term in an array, vector, list, you name it. Let $A(x)$ be a polynomial of degree-bound $n$:

$$A(x) = a_0 + a_1x + a_2x^2 + ... + a_{n-1}x^{n-1}$$

The “degree-bound $n$” is any $n$ strictly greater than the degree of the polynomial. We will always be talking about the smallest $n$ that fulfills this condition, that is, the degree plus one. The polynomial’s coefficients are $a_0, a_1, …, a_{n-1}$, and that’s what we store. For example, the polynomial:

$$3 + 2x^2 + 5x^3$$

Has the representation $3, 0, 2, 5$. Notice there is no “$x = x^1$” term, so the coefficient in position 1 is 0.

There is a straightforward way to evaluate the polynomial’s value for any $x_0$ in $\Theta(n)$: raise $x_0$ to the required power for each term, multiply by the corresponding coefficient, and add everything up. I like Horner’s rule, though:

$$A(x_0) = a_0 + x_0(a_1 + x_0(... + x_0(a_{n-1})))$$
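In code, Horner’s rule is a single loop; a sketch, with the coefficient vector in the representation just described:

#include <vector>

using namespace std;

// Evaluate A(x0) from coefficients a[0..n-1] with Horner's rule, in Theta(n).
double evalPoly(const vector<double>& a, double x0) {
    double result = 0.0;
    for (int i = (int)a.size() - 1; i >= 0; --i)
        result = a[i] + x0 * result;  // peel one level of the nesting
    return result;
}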

Multiplying two polynomials of degree-bound $n$ and $m$ takes $\Theta(nm)$ time. One has to multiply every coefficient of one with every coefficient of the other. The resulting polynomial will be of degree-bound $m + n - 1$. You’ve probably done this by hand countless times when trying to solve or simplify equations.

Point-value representation

To represent a polynomial $A(x)$ of degree-bound $n$, we store $n$ point-value pairs:

$$(x_0, A(x_0)), (x_1, A(x_1)), \; \dots, \; (x_{n-1}, A(x_{n-1}))$$

The algorithm to multiply polynomials in point-value representation is as follows. Let $A(x)$ and $B(x)$ be polynomials of degree-bound $n$ and $m$ respectively. Both are also of degree-bound $n+m-1$. (Proof: $n+m-1 \geq n$ and $n+m-1 \geq m$ if $n, m \geq 1$, which is the case for degree-bounds.) Represent $A(x)$ and $B(x)$ as vectors of $n+m-1$ point-value pairs, using the same points for both. Then multiply the values of each corresponding point-value pair.

Intuitively, because the value at each point of the result is the multiplication of the values at each point of the two polynomials we are multiplying, that is exactly what we do!

Let us do an example. We multiply:

$$A(x) = 3x^2 + x$$
$$B(x) = -2x^3 + 2x^2 - x + 4$$

Their minimum degree-bounds are, respectively, $3$ and $4$. Thus, we need $3+4-1=6$ point-value pairs for each:

$x_0$ $A(x_0)$ $B(x_0)$ $C(x_0) = A(x_0) \cdot B(x_0)$
0 $A(0) = 0$ $B(0) = 4$ $C(0) = 0$
1 $A(1) = 4$ $B(1) = 3$ $C(1) = 12$
2 $A(2) = 14$ $B(2) = -6$ $C(2) = -84$
3 $A(3) = 30$ $B(3) = -35$ $C(3) = -1050$
4 $A(4) = 52$ $B(4) = -96$ $C(4) = -4992$
5 $A(5) = 80$ $B(5) = -201$ $C(5) = -16080$

$C(x)$ on the right is the result of $A(x) \cdot B(x)$, and its point-value representation is $(0, 0), (1, 12), (2, -84), (3, -1050), (4, -4992), (5, -16080)$.

“That’s it!”, you may think. “Problem solved, and it’s $\Theta(n)$!“. Not so fast.

To convert from coefficient to point-value representation we need to evaluate $A(x)$ at $n$ points. The time needed to do that with what we know, Horner’s rule, is of course $\Theta(n^2)$.

Cutting time with FFT (Fast Fourier Transform)

FFT is a very important algorithm in signal analysis. It computes the DFT (Discrete Fourier Transform) of a signal in $\Theta(n \log n)$ time. The DFT maps the time domain to the frequency domain; that is, from a sequence of samples it extracts which frequencies are present and with which intensity. It is used, for example, to measure the pitch of a sound input, in applications such as Learn by Music Technology Group or one of those singing games.

But we are more interested in the mathematical definition of the DFT.

$$X_k = \sum_{n=0}^{N-1} x_n \cdot e^{-j2\pi k n / N}$$

Let’s break this definition down and rearrange it to understand why it is relevant to us. We also change the variables $A_k = X_k$, $a = x$, $i = n$, to reflect the correspondences with the naming conventions we used for polynomials.

$$A_k = \sum_{i=0}^{N-1} a_i \cdot \left( e^{-j2\pi k/N} \right)^i$$

For each $k$, the DFT is the sum of all the input coefficients $a_i$ multiplied by $\left(x_k\right)^i = \left( e^{-j2\pi k/N} \right)^i$. By doing this, we are effectively evaluating the polynomial expressed by the coefficients $a_i$ at all the points $x_k$.

We can calculate the coefficient representation from a point-value representation at the points $x_k = e^{-j2\pi k/N}$ by using the inverse DFT. It turns out this is quite the simple reversal:

$$a_i = \frac{1}{N} \sum_{k=0}^{N-1} A_k \cdot \left( e^{j2\pi k/N} \right)^i$$

You can look up the proof elsewhere.

The Algorithm

The building blocks of our algorithm are complete.

Input: $A_v$ and $B_v$, coefficient representation of polynomials $A(x)$ and $B(x)$ respectively.

Output: $C_v$, coefficient representation of polynomial $C(x) = A(x) \cdot B(x)$.

  1. Let $\left| A_v \right|$ be the length of $A_v$, that is, the lowest degree-bound of $A(x)$. Let $\left| B_v \right|$ be the length of $B_v$. Let $\left| C_v \right| = \left| A_v \right| + \left| B_v \right| - 1$ be the lowest degree-bound of the result $C_v$.

  2. Let $n = \min_i 2^i$ subject to $2^i \geq \left| C_v \right|$. That is, the smallest power of two that is greater than or equal to $\left| C_v \right|$. To compute it: iterate from $i=0$, increasing by $1$ every time, until $2^i \geq \left| C_v \right|$.

  3. Let $A_v$ be the vector representing $A(x)$ padded with zeroes so that its total length is $n$. Let $B_v$ be the vector representing $B(x)$ padded with zeroes so that its total length is $n$.

  4. Let $A_v’ = FFT(A_v)$ and $B_v’ = FFT(B_v)$.

  5. Let $C_v’ = A_v’ \cdot B_v’$, the element-wise product of $A_v’$ and $B_v’$. That is, multiply every element of $A_v’$ with the element in the same position of $B_v’$, and assign the result to the same position in $C_v’$.

  6. Let $C_v = FFT^{-1}(C_v’)$. $C_v$ is $C(x)$ in coefficient representation. Terminate.


I will show, as formally as I can, why this algorithm is $\Theta(n \log n)$. You may skip this part if it’s obvious to you, or try to figure it out yourself before reading it.

  1. is $\Theta(1)$ if we know the length of the inputs. It may be up to $O(n)$ if we have to measure them, for example by iterating until we find a sentinel at the end.

  2. is $\Theta(\log n)$. Proof: $n = 2^i$, so the loop runs $i = \log_2 n$ times.

  3. is $\Theta(n)$. We have to add $\left( n - \left| A_v \right| \right) + \left( n - \left| B_v \right| \right)$ zeroes, which is a linear function of $n$.

  4. is $\Theta (n \log n)$, because the FFT is too.

  5. is $\Theta(n)$, because we iterate once for each element of $C_v’$.

  6. is $\Theta (n \log n)$. The inverse FFT is a regular FFT followed by dividing every element by $N$. Thus, it’s $\Theta(n \log n + n) = \Theta(n \log n)$.

Thus, the whole algorithm is:

$$\Theta(n + \log n + n + n \log n + n + n \log n) = \Theta(n \log n)$$

Isn’t it easy to simplify expressions in asymptotic notation?


Implementation

If you are writing an application that needs an FFT of any form, I suggest using a library. Numerous smart people have spent their time making such libraries bug-free and performant, and you can greatly profit from their effort. I only know of one FFT library, the Fastest Fourier Transform in the West (FFTW), but there are probably others that are just as fast, or bindings for this one to other languages.

However, if you must embed the routine in your program, for instance in a programming contest, you have to “implement it yourself”. The algorithm is not very complicated, but time spent debugging it is time wasted. I suggest copying Stanford University ACM Team’s Notebook. I endorse copying many things from there, actually. At the very least, their implementation is clear enough that you can base yours off of it.

Assuming we have included Stanford’s implementation earlier in the C++ file:

#include <cassert>
#include <cstring>

// Assume we have at most 2^18 = 262144 coefficients
const int MAX_LEN = 262144 * 2;

// For those of you who don't want to read the Stanford source:
// "cpx" represents a complex number, with real part .a
// FFT(input, output, internal=1, size of input, direction)
// Direction is 1 for FFT, -1 for inverse FFT

cpx A[MAX_LEN], B[MAX_LEN], C[MAX_LEN];
int A_len, B_len, C_len;

void multiply() {
    /* Before calling this, set the appropriate coefficients in the inputs A
     * and B's real-valued parts, and their lengths in A_len and B_len. */

    // find the smallest power of two >= A_len + B_len - 1
    for (C_len = 1; C_len < A_len + B_len - 1; C_len *= 2);

    // This is assumed in the 12879 Golf Bot problem
    assert(C_len < MAX_LEN);

    // pad A and B with zeroes up to C_len
    memset(A + A_len, 0, (C_len - A_len) * sizeof(cpx));
    memset(B + B_len, 0, (C_len - B_len) * sizeof(cpx));

    // fill C with A's point-value representation
    FFT(A, C, 1, C_len, 1);

    // fill A with B's point-value representation
    FFT(B, A, 1, C_len, 1);

    // Multiply the polynomials in point-value representation, into A
    for (int i = 0; i < C_len; i++)
        A[i] = A[i] * C[i];

    // Convert the product back to coefficient representation, into C
    FFT(A, C, 1, C_len, -1);
    // The Stanford inverse FFT does not divide by N, so do it here
    for (int i = 0; i < C_len; i++)
        C[i].a /= C_len;

    // now C[i].a (the real-valued parts) contain the result
}

Solving 12879 Golf Bot

To solve 12879 Golf Bot, what we need to do is: build a vector that is 1 at the distances the golf bot can shoot and 0 at the distances it cannot. Convolving that vector with itself (adding a shifted copy of the vector for every 1 it contains) yields a vector that is positive exactly at the distances reachable with two shots. That operation is the discrete convolution, and it is the equivalent of multiplying two polynomials in coefficient form. So you do that, but you don’t need to divide by C_len at the end, because we only need to know whether a given position is $>0$.

It’s a very quick explanation, but the problem isn’t the focus of this post :)

Final thoughts

Hope this post was enjoyable, easy to read, and useful.

Don’t use this in production over the simple naive quadratic algorithm if you do not handle big enough inputs. How big is big enough? Measure.


Sources

This PDF from Portland State University’s Algorithm Design and Analysis class contains more information about operations, properties, and algorithms related to polynomials.

Wikipedia’s definition of the DFT and inverse DFT.

A Stack Exchange answer with an example of this algorithm, which helped me understand it a lot.