You have to run your program on a remote server. However your favourite editor
with your favourite configuration isn’t available over SSH. Or its over-SSH mode
is very slow (looking at you, TRAMP).
Also, you’d like to work online sometimes.

Unison is like Dropbox but without an intermediary server.

Setting it up

First, you need to have two^{1} directories, one in your local machine and one in
your remote machine, that will be synchronised. For example,
/local/path/to/my/software in your local machine and /remote/path/software
in your remote server. You must have SSH access to the server, which is for
example at user@host.

Then you download and
install Unison.
Preferrably use your system’s package. You need two executables: unison and
unison-fsmonitor. The second one is not always available with the unison
package: it is not there, for example on Ubuntu or OS X. To install it:

Ubuntu: install the ocaml package to compile
Unison
from source. Then
just run make. It might give an error, but with a little bit of luck you
will have compiled the unison-fsmonitor executable before that. Copy it to
your $PATH.

This will synchronize all files ending in .py, .txt, .md… or having no
“.” in their filename, between the two directories mentioned at the beginning.
Read more about how the ignore and ignorenot rules
work
here and
here.

Let’s say we believe
consequentialism and utilitarianism.
Roughly, we hold that the morality of an action depends only on its
consequences, and the consequences we care about are only changes in the amount
of happiness in the world. We wish to increase the amount of happiness, as much
as possible. We are rational, so every time we act we will choose the action
that we expect to lead to the outcome with maximum happiness. (This is
essentially what effective altruists try
to do.)

Now, the world we live in has many human (and maybe nonhuman) individuals with moral
weight. To make our life easier, let’s say we can quantify the happiness (also
called welfare or utility) of an individual. That utility is a real number,
positive if the individual’s life is worth living, zero if it is neutral, and
negative if the individual is better off not existing. Let’s also assume we
know, for each action, the happiness outcome of each individual.^{1}

If none of the actions change identity or number of the agents that exist,
choosing is easy: the best action is that which maximises the sum^{2} of
happiness.

Population axiologies

However, what if the actions can also change the number of individuals in the
population? That happens in many cases: deciding whether to have a child
personally, designing the structure of money incentives and tax changes to
encourage/discourage children, giving away or selling contraception… For these
cases, we need to be able to compare populations with a different number of
individuals.

Thus we get to the problem of population axiology: give an “at least good as”
complete, transitive ordering of all possible populations, solely based on
people’s utility.

Borrowing notation from
Arrhenius (2000).
Let $A$, $B$, $C$, $B \cup
C$, … be populations, or sets of individuals. We may indicate the population
size: $A_n$ is a population of $n$ individuals: $a_1, a_2, \dots, a_n$. The
numerical representation of $a_i$’s welfare is given by the function $v(a_i)$.
Since it is transitive, we can safely represent the ordering with a “$\geq$”
symbol. The problem can then be written: for any two populations $A$ and $B$, is
$A_n \geq B_m$ or $A_n \leq B_m$ ?

Let’s take care of the case where $n \neq m$. We could, again, just care about
the sum of utility, so $A \geq B$ if and only if $\sum_{a_i \in A} v(a_i)
\geq \sum_{b_i \in B} v(b_j)$. But this opens us up to the
Repugnant Conclusion (Parfit, 1984)
(video explanation by Julia Galef):
a population of $10^{100}$ people with welfare 1 each would be much better than a
population of 10 billion with welfare $10^{80}$ each.

And this, to most people, is unpalatable. However, some philosophers think we
should accept the Repugnant Conclusion. Tännsjö (2002) argues
that when we picture a “life barely worth living” we picture precarious hygienic
conditions, lack of food, and depending on the weather for your food crop. But
actually it is our first-world materially comfortable lives that are barely
worth living. Also, we do not care about the obvious goodness of the enormous
increase in population because of our
scope insensitivity bias.

Still, the Repugnant Conclusion created the problem of population axiology and
led to the following findings.

Impossibility theorems

Many alternatives to summing utility have been proposed: counting average
utility. Having each additional life have less weight, thus having any sum of
bounded utilities converge. Having a “critical level” which marks a minimum for
well-being that is above a life barely worth livin. See introduction by
Greaves (2016).

Arrhenius showed, in
2011, the impossibility of a satisfactory population ethics.
After his
already discouraging results in
2000,
his findings challenged population ethics even as moral framework. Both papers present
really compelling conditions that a population axiology ought to satisfy,
represent them formally, and mathematically prove them impossible to hold in the
same axiology. I will describe the conditions of the 2011 paper:

The ordering is complete (no two populations are incomparable) and
transitive ($A \geq B$ and $B \geq C$ implies $A \geq C$).

Given two populations of the same size and with all the individuals having
the same welfare, the one with higher sum^{3} of welfare is not worse.

Let $A_n$ be a population of $n-1$ people with very high welfare and one
person with welfare $w$. Let $B_n$ be a population of $n-1$ people with
very low positive welfare and one person with welfare $w+1$. Then, for any
population $X$, there exists an $n$ sufficiently large such that $A \cup X
\geq B \cup X$. Roughly, if you change enough people from low positive to
really high welfare in a population, that compensates for one individual
with a slightly lower welfare.

Let $B_n$ be a population of $n$ individuals with welfare $w$. Let $A_n$
be a population of an individual with welfare $w+1$, and $n-1$ individuals
with welfare $c$, $c < w$. Given a population $X$ with the welfare of
everyone being between $c$ and $w$, there exists a sufficiently large $n$
such that $B_n \cup X \geq A_n \cup X$. Roughly, making a lot of people
better off at the expense of very little of one person’s welfare, such that
these people mentioned are all equally happy, does not make things worse.

There is at least one negative welfare level $w<0$, such that adding any
number of people with positive welfare $v>0$ is not worse than adding a person
with welfare $w$.

For any population $X$, there exists a perfectly equal very-high-welfare
population $A_n$, and very-negative-welfare population $B_m$, such that $X
\cup A_n \geq X \cup B_m \cup C_l$, for all, arbitrarily large, values of
$l$. Population $C$ is composed of very low positive lives. This means
avoidance of a Repugnant Conclusion on steroids: not only is there a
vast number of barely-worth-living lives, but there also exist many lives
that are much worse than not existing.

Reiterating: it is impossible to have all these properties in the same
population axiology.

Side-stepping the issue: dropping the “axiology” requirement

Notice that the desirable properties of the “population axiology” can be
framed as transitions. This is a very similar form to (one of) the intended
use(s) for them, namely, deciding which is the most moral of any two actions
with known^{4} outcomes. Let’s take that strong hint, and re-frame the choice
not as deciding between any two populations, but between any two populations
starting from the status quo.

I propose ascribing moral weight only to individuals who currently exist.
When taking a decision, choose the outcome that maximises the sum^{5} of the
utility of the people who are alive at that moment. This way of looking at
population ethics is unfortunately not an axiology. An axiology would allow us
to output a single number measuring the utility of the whole population, and
using that as a drop-in replacement for single-agent utility in all the game
theory (and game practice, also called Artificial Intelligence, AI) that we
know. Instead, our criterion is only useful when an agent is taking a decision
based on a model of the world and on future outcomes of their current action.

We can illustrate this more easily within the somewhat more restricted framework
of AI as online search
(Russell and Norvig, 2009, Section 4.5). In the following figure,
we show a decision tree. At each node we face a decision, with as many branches
as possible courses of action we can take. We are currently in the bottom-most
node, the root, at time instant $t=0$. At each time step, the individuals in our
population have a positive or negative experience. At $t=3$ we have the expected
return^{6}^{7} for each chain of actions. We depict surrounded in green the utility
of the individuals who are alive when taking the decision, and in purple the
utility of those who start existing later. The dotted green-purple line marks
the moment where, in the possible future of each path, at least one new
individual exists.

The decision process can be viewed as an adversarial search
(Russell and Norvig, 2009, Section 5) process. In this game we
have two players, represented as blue and green edges in the picture. Blue plays
in states where at least one new individual has been born, that is, above the
dotted line. Green plays below. Green plays at the root node, which means the
agent that is deciding is Green. Green cares about maximising the return of the
people that exist in $t=0$, that is, the return we depict surrounded in green. Blue
wants to maximise the utility of all the people, that is the sum of the numbers
surrounded in green and purple. The edge that is highlighted in the relevant
player’s colour is the edge the player would choose if he were to play from that
edge’s parent node. Notice that, in the right-most outcome, Blue never plays,
since nobody is born.

In our example, we choose the bubble with (13, -4), because
it is the one with the highest Green-utility out of the options Blue leaves
available.

This system behaves as if we decided to increase the utility of the existing
people now, knowing that, when new people are born, we will care about them
too. It resembles our current society’s treatment of babies: you can use
contraception to prevent a baby’s existence all you want, but once you have a
baby, if you mistreat it or kill it you shall face severe consequences. Barring
contraception accidents, children exist because their parents thought they would
be happier raising them.^{8}

Does this avoid the Repugnant Conclusion?

Yes. We start from a population of $n$ individuals with a level of well-being.
The Repugnant Conclusion entails sacrificing these individuals utility to create
many more individuals with low positive welfare, so at some point we have to
allow a birth. If we do not have a choice over this it is useless to apply
any moral theory. Thus we can decide, at some point in the middle, to bring into
existence an individual, or not to. We know that, if we bring into existence
that individual, we will be morally forced to decrease the others’ well-being
for it. Thus, we decide for the good of the existing beings, and do not bring
into existence the individual.^{9}

This can fail when the decider lacks sufficient foresight, and incorrectly
predicts that the new individual will not decrease the others’ well-being. But
that is not a problem of our decision criterion, rather, of our prediction
mechanism. Consider the following situation of an agent maximising its utility:

The agent chooses the right-most node, because it thinks it will lead to 14
return. But, one time-step later than the agent has looked, it gets very low
reward, thus very low return. This should be alleviated in some way, but that is
not in the scope of this post.

So when the prediction mechanism works as intended, and the world does not take
such drastic turns, the decision criterion avoids the Repugnant Conclusion.

Does this fail in any other way?

Yes. The criterion as-is needs at least one amendment. Currently, an agent
deciding by this criterion will not hesitate to create arbitrarily many lives
with negative utility, to increase the utility of the people who are alive just
a little. One may think the commitment to a life when it exists would fix that,
but that is not the case. If Green can restrict its decisions in some way, it
can take all the necessary decisions before anyone is born, and thus completely
prevent Blue from playing. In a real world example, Green becomes president,
passes laws mandating the construction of inhuman but cheap factory farms, laws
preventing anyone from messing with those farms, and then removes itself out of
office before any farm is in operation, for example by committing a crime. Thus
Green will become very sad when the farms start working, but its own purpose at
the moment of decision will be fulfilled, namely, maximum utility for the beings
who were alive back then.

We could forbid the creation of any lives with negative welfare entirely. But
that would outlaw really positive scenarios, such as one person working a shitty
job and having nobody to talk to, in order for $10^{100}$ people to live lives
with $10^{100}$ welfare. No, instead, we need to forbid bad trade-offs.^{10}

The solution is: when a purple life has negative welfare, it gets counted into
the green utility. Thus, future net negative lives have the same weight as
present net negative lives. This reflects the Asymmetry Intuition: we may be
indifferent about making happy people, while we prefer making people happy, not
making miserable people, and not making people miserable.
(Benatar, 2008)

In practice, if we end up having an AI that needs to make this sort of
decisions, I suspect this will lean towards just not creating any lives, and
creating lives with negative welfare should be rare (what for? Humans are slow
and inefficient at working, and they decrease utility when unhappy). If it’s an
economist or politician, they can use this rule, but they wouldn’t do anything
truly absurd; they are human.

Closing thoughts

Based on insights from AI and a few moral intuitions, I have created a decision
criterion for a changing population, when considering possible futures. The
criterion avoids the Repugnant Conclusion, behaves like a simple sum of
utilities when population doesn’t change, mirrors our current accepted morals
for treatment of nonexistent individuals, and doesn’t seem to have any glaring
holes.

Of course, such holes may be found, if you do find one please comment,
email me, or otherwise
let me know.

References

(I recommend the
Google Scholar Button
browser add-on for comfortable fetching of papers, but I’ve included links for convenience here.)

[Blog] Armstrong, S. Integral versus differential ethics.

[PDF]
Arrhenius, G., 2000, An Impossibility Theorem for Welfarist Axiologies.
Economics and Philosophy, 16: 247–266.

[PDF]
Arrhenius, G., 2011. The impossibility of a satisfactory population ethics.
Descriptive and normative approaches to human behavior.

[Book]
Benatar, D., 2008. Better never to have been: The harm of coming into
existence. Page 32. Oxford University Press. Chicago

[Book] Parfit, D., 1984.
Reasons and persons. OUP Oxford.

[Book] Russell, S. and Norvig P., 2009. Artificial Intelligence: A Modern
Approach. 3rd. Prentice Hall Press, Upper Saddle River, NJ, USA. ISBN:
0136042597, 9780136042594.

[PDF]
Tännsjö, T., 2002. Why we ought to accept the repugnant conclusion. Utilitas,
14(3), pp.339-359.

I sometimes write “people” in this essay because it reads natural, but you may substitute for “individuals with moral weight” every time. Also, since we quantify utility as numeric, a moral weight can be really a numeric weight, to multiply that individual’s utility by.
^{[return]}

Or average, if you prefer, it makes no difference. Quick proof: Let $A$ be the set of possible actions, $U(i, a) : I \times A \mapsto \mathbb{R}$ is the function giving the utility, a real number, of individual $i$ when taking action $a$. Given the set of individuals $I$ and actions $A$, clearly $$ \forall a, a’ \in A \, \sum_i U(i, a) \geq \sum_i U(i, a’) \iff \frac{\sum_i U(i, a)}{|I|} \geq \frac{\sum_i U(i, a’)}{|I|} $$ since $|I|=|I|$.
^{[return]}

If you believe in expected value utility, you can extend that to population outcomes with probabilities. While I don’t know of any alternative to it, expected value utility has some problems, namely Pascal’s Mugging.
^{[return]}

See footnotes 2 and 3. Aren’t populations with the same number of individuals wonderful?
^{[return]}

This term is taken from the artificial intelligence literature, specifically from sequential decision processes, which is what are facing here. The idea is that, at each time step $i$, the agent receives a numeric reward $r_i$. The agent also has a discount rate $\gamma$, $0 \leq \gamma \leq 1$, which models how much the agent cares about receiving rewards sooner.^{11} The return at time $t$, $R_t$, is the sum of all discounted future rewards until then, that is: $$ R_t = r_0 + \gamma r_1 + \dots + \gamma^t r_t = \sum_{i=0}^t \gamma^i r_i $$ The path of actions that has the highest return at time $t$, when the agent’s decisions end ($t$ can be infinite too, if the decisions never end), is the path that the rational agent prefers.
^{[return]}

Why are we using returns and not utility over the whole lifetime now? We have gone down into the realm of making decisions while the population is changing, while individuals experience pain or pleasure and are born and die. It makes much more sense to think about populations and the pain or pleasure experienced during a time interval of each individual, rather than during their whole lifetime. This is the notion of returns, and thus our use of them, and why we will use the term somewhat interchangeably with “utility” around this passage.
^{[return]}

Be it because they enjoy the raising, because their child being happy makes them happy, or to make their own parents happy. But it all reflects back to the parents’ own projected well-being.
^{[return]}

This idea very much resembles that of Integral Ethics (Armstrong, 2014), which was inspiration for this post’s approach.
^{[return]}

In economic terms, $\gamma$ is the time preference of the agent. Note that the terms “high” and “low” are reversed for time preferences and discount rates. A high $\gamma$ means a low time preference, and vice versa.
^{[return]}

Attention! For those of you who do not understand Catalan, There is an English
translation below!

La meva germana i jo vam començar a fer la primera d’aquestes frases. No em
semblen especialment difícils de dir, però això potser és perquè els he escrit
jo. Vaig gaudir del procés i el resultat, i per això vaig escriure les altres
tres. Aquí van! Prova de dir-les!

La mangosta Agustina i la llagosta Gustau, vaguistes, gasten en gustos a
l’Agost.

Manca de xancles, xanques blanques i tanques de branques fan que tanquin
encara la banca.

La rara barana que el pare i la mare preparen fa cara la vara on para la Sara
i s’encara al sarau i gatzara.

Fa poc que moc el soc del lloc al foc i toco un roc. Jo sóc un cóc de mocs que
xoca al boc.

Estan creades de forma que tenen una seqüència concreta de sons repetida moltes
vegades. Excepte la primera, també alternen síl·labes tòniques i àtones, per
donar ritme. Noteu també la diferència entre “o”s obertes [IPA ɔ] i tancades [o] en
l’última.

Traduir aquesta entrada m’ha fet adonar-me que, sí, les llengües romàniques i
l’anglès tenen gairebé la mateixa gramàtica.
Que fort.

In English

Title: Tongue Twisters

My sister and I started composing the first of these sentences. They don’t seem
hard to say to me, but that may be because I wrote them. I enjoyed the process
and results, so I wrote the other three. Here they go! Try to read them out
loud!

La mangosta Agustina i la llagosta Gustau, vaguistes, gasten en gustos a
l’Agost. (Augustine (female) the mongoose and Gustaf the lobster, strikers,
spend on pleasures during August.)

Manca de xancles, xanques blanques i tanques de branques fan que tanquin
encara la banca. (Lack of flipflops, white stilts and fences made of
branches make the bank still be closing.)

La rara barana que el pare i la mare preparen fa cara la vara on para la Sara
i s’encara al sarau i gatzara. (The rare railing that father and mother
prepare makes expensive the rod where Sara stops and turns to face the party and
rejoicing.)

Fa poc que moc el soc del lloc al foc i toco un roc. Jo sóc un cóc de mocs que
xoca al boc. (For a short time, I have been moving the stump from the place
to the fire and touching a rock. I am a snot pie that shocks the buck.)

They are created to have a certain sequence of sounds repeated many times.
Except for the first one, they also alternate stressed and unstressed syllables,
to give rhythm. Note also the difference between open [IPA ɔ] and closed [o] “o”s in
the last one.

Translating this post made me realize that yes, latin-derived languages and
English have almost the same grammar. Damn.

OS X has a nice screenshot feature. It’s fast and easy to use: just press
Cmd-Shift-4 and select your shot, which will appear in the desktop. By
pressing space, you can switch to window shot.

The problems come when you want to share your screenshot on, say, IRC. Direct
transfer (DCC) is usually too much hassle for that, and some clients do not
support it. What you usually do is upload the picture to the web, and link it.

You could just manually upload the file the screenshot feature produces. But I’m
spoiled by the use of Nyazo on Linux, and I wanted better.

So in a few minutes you can tweak Nyazo’s Linux script and end up with this:

Save it as something like /usr/local/bin/nyazo and make it executable (chmod
+x <file>). Then open Automator and create a new Service. Add the “Run Shell
Script” action to it, write /usr/local/bin/nyazo -i to it, and save it.

Click the “Run” button at the top right of the window to test the service works
as intended. A coordinates selecting cursor should appear. If you select a
region, it should be uploaded to Nyazo.

Now open “Keyboard” in “System Preferences”. Go to the “Shortcuts” tab, and
select “Services” on the left. At the end of everything, you can bind the
interactive Nyazo Service you just created. I have it bound to
Option-Cmd-Shift-4.

Of course, you can also bind the fullscreen screenshot Nyazo to keyboard. Just
repeat the same steps, removing the “-i” option from the service, and binding to
something else.

This is a writeup of the problems my team solved in the Murcia contest, which we
were participating in as preparation for SWERC. Those people are way better than
us, but we will not give up!

The problems are given the UVa numbers that will
soon appear.

12962 Average Reuse Distance

At first read this problem seems easier than it really is. But it’s not a hard
problem, it’s actually a fairly straightforward simulation.

The Problem

Given a sequence of memory accesses to different addresses (represented by the
characters 'a' to 'z'), give its Average Reuse Distance. The Reuse Distance
of a memory access to address $x$ is the number of other addresses accessed
since the last access of $x$. First accesses to any given position do not count
towards the average.

The solution

The core of this problem was keeping track of the different memory accesses
since the last access of each address, and then counting them in an efficient
way. We solved this by storing which addresses had been accessed as a
bitmask. You can very easily reset the bitmask by setting it to 0, and count the
number of ones in the bitmask with GCC and Clang’s __builtin_popcount.

The algorithm is the following:

Total sum = 0

For each character $c$ of input read:

If $c$ hasn’t been encountered before, mark it as encountered.

Otherwise, add the number of ones in the bitmask to the total sum and
add 1 to the number to divide the sum by.

For all characters a-z, set their mask’s th bit to 1.

Reset $c$’s bitmask to 0

In the end, you output the mean from the sum you gathered, or NaN if the divider is 0.

We technically did not solve this problem within the contest, but came
very close. The night after the contest, when sitting down to look at
the problems again, we found what was wrong in 5 minutes.

The problem

You are given a grid with characters on it. You have to find
the horizontal instances of the word “GARBANCILLOS” and the vertical
instances of the word “ASTRAGALUS”. Whenever they intersect is a
possible location, but each word instance can only be used
once. Determine the number of locations.

The grid is at most 150x150.

The solution

You need to do two things here. First, find the strings’ locations and
where they intersect. Then, compute the maximum matching of vertical
and horizontal strings.

To find the strings’ location in $\Theta(n^2)$, we used a rolling
hash. The longest of the strings is 12 characters, $12 \cdot 5 = 60 <
64$, so a hash with 5 bits per character can be used without
collisions. We discard all the characters outside of the range from
'A' to 'Z', and give them the mask 0x1f.

First we look at the character matrix horizontally, and store the
position of the found “GARBANCILLOS” in a sorted vector, one for each
row. We also store its future index in the graph of intersecting
strings in an std::map. Then we look at the character matrix vertically,
and when we find a match we binary search for the possible horizontal
matches in each row. If one is find, the edge is added to the
maximum-matching matrix.

Finally, we output the maximum matching of the matrix we built.

You have some product’s producers, and some consumers. Each of them
has an ideal price. The consumers will get angry if the price is
higher than their ideal price. The producers will get angry if the
price is lower than their ideal price. Find the price such that the
number of angry people is minimal, and how many angry people there
are.

The prices range from $0$ to $10^9$. There are up to $10^4$ producers
and $10^4$ consumers.

The solution

It’s amazing how many problems are solved this way. If we check every
possible price in order (unfeasible), we will have more consumers
progressively getting angry, and more producers progressively getting
happy.

Thus, we assign producers the number 1, and consumers the number
-1. We create pairs of <price, assigned number> for each of them,
and sort those pairs. We increase the price of the consumers in the
pairs by 1 because that is when the consumer will start being
unhappy. We sort this sequence and then iterate over it, keeping a sum
of all the assigned numbers we have found. The point where this sum is
maximum is the optimal point.

However, the name of the problem in the contest is “Spray Graphs”. And the
page’s title is
“Spary graphs”. Haha.

The problem

You are given the graph structure in the picture. In this picture,
each “level” of the graph has an assigned letter. We can build more levels by
branching every node in right angles as shown in the figure.

Given a number of levels, compute how many paths there are to get to any of the
leaves. The levels range from 1 to 30.

The solution

Notice two things about this graph. First, it has four-way symmetry around the
central node, so we can just compute the paths of a quarter of the graph and
then multiply by 4, we can get the answer.

It is also very much like a grid, each node’s paths come only from the nodes to
the left and up.

We can then build a grid with the number of paths for each node.
The grid’s top row will be counting $1, 2, 3, …$ since the nodes in the top row of
the separated portions in the previous picture have that number of paths. The
first column will be all $1$s, since there is only one straight path down
there. Then, we can compute each node’s number of paths as the paths coming from
the left plus the paths coming from the top.

Here’s the grid for a size 5 graph:

1

2

3

4

1

$2+1=3$

$3+3=6$

1

$3+1=4$

1

Finally, just add up the leaves (the last diagonal), and multiply by 4. Bingo!

Doing my university homework, I came across the following.

Prove that $\sum_{i=0}^n r^i = 1 + r + r^2 + … + r^n$ is $O(1)$ if
$0 < r < 1$.

With that range of $r$, $r^a > r^b$
iff $a < b$. Since
$0 < 1 < … < n$, $1 > r > r^2 > … > r^n$. Thus,
$\sum_{i=0}^n r^i = O(1)$

Equivalently, as $n$ approaches infinite, the series converges to a
finite value.

Is this proof even correct? Stay tuned to find out!

Incorrect!

$\sum_{i=0}^n r^i = O(1)$ is not implied by $1 > r > … > r^n$. The
correct proof is the following.

This is the geometric series,
so its value (only for $r \neq 1$) is $\frac{1-r^{n+1}}{1 - r}$.

If $0 < r < 1$, $r^{n+1}$ approaches $0$ as $n$ tends to infinite, and for no
value of $n$ such that $n \geq 0$ this is greater than $r$. Thus, the expression
is something with a constant upper bound divided by a constant, which has a
constant upper bound.

Since there exists a constant $c$ for which $c \geq \frac{1-r^{n+1}}{1 - r}$ for
$n \geq 0$, $\sum_{i=0}^n r^i = O(1)$ . Thus it has been proven.

The information in this post is taken from the sources listed in the
Sources section at the end of it. If you don’t care
about understanding the algorithm and need an implementation now, skip to the
Implementation subsection.

Why?

Why would anyone want to multiply polynomials in $\Theta (n \log n)$ instead
of the naive $\Theta(n^2)$ time? Sometimes happens. I did when I got confronted
with the UVa Online Judge
12879 Golf Bot
problem. I knew I needed a discrete convolution, but then I got stuck because
$200 000^2$ is way too high and the only way to calculate it I knew was quadratic.

If you mean “Why would you write this post?”, well, talking about an algorithm I
recently learned to solve a problem is proving to be an easy way to kick-start
my blog.

Gathering building blocks

So how do we do it? We are going to have to talk a bit about the representation
of polynomials in computer memory first. (Haha get it?)

Coefficient representation

As far as I know, there are two useful ways of doing that. The first, and most
obvious, is to store the coefficients of every term in an array, vector, list,
you name it. Let $A(x)$ be a polynomial of degree-bound $n$:

The “degree-bound $n$” is any $n$ strictly greater than the degree of the
polynomial. We will always be talking about the smallest $n$ that fulfills this
condition, that is, the degree plus one. The polynomial’s coefficients
are $a_0, a_1, …, a_{n-1}$, and that’s what we store. For
example, the polynomial:

$$3 + 2x^2 + 5x^3$$

Has the representation $3, 0, 2, 5$. Notice there is no “$x = x^1$” term, so the
coefficient in position 1 is 0.

There is a straightforward way to evaluate the polynomial’s value for any $x_0$
in $\Theta(n)$. We can elevate $x_0$ to the required value for each
coefficient, multiply for every coefficient and add all that up. I like Horner’s
rule though:

$$A(x_0) = a_0 + x_0(a_1 + x_0(... + x_0(a_n)))$$

Multiplying two polynomials of degree-bound $n$ and $m$ takes $\Theta(nm)$
time. One has to multiply every coefficient of one with every coefficient of the
other. The resulting polynomial will be of degree-bound $m + n - 1$. You’ve
probably done this by hand countless times when trying to solve or simplify
equations.

Point-value representation

To represent a polynomial $A(x)$ of degree-bound $n$, we store $n-1$ point-value pairs:

The algorithm to multiply polynomials in point-value representation is as
follows. Let $A(x)$ and $B(x)$ be polynomials of degree-bound $n$ and $m$
respectively. They are both of degree-bound $n+m-1$ too. (Proof: $n+m-1 \geq n$
and $n+m-1 \geq m$ if $n, m \geq 1$, which is the case with
degree-bounds). Represent $A(x)$ and $B(x)$ as vectors of $n+m-1$ point-value
pairs, with the same points. Then multiply the values of each corresponding
point-value pair.

Intuitively, because the value at each point of the result is the
multiplication of the values at each point of the two polynomials we
are multiplying, that is exactly what we do!

Let us do an example. We multiply:

$$A(x) = 3x^2 + x$$

$$B(x) = -5x^3 + 2x^2 - x + 4$$

Their minimum degree-bounds are, respectively, $3$ and $4$. Thus, we need $3+4-1=6$
point-value pairs for each:

$x_0$

$A(x_0)$

$B(x_0)$

$C(x_0) = A(x_0) \cdot B(x_0)$

0

$A(0) = 0$

$B(0) = 4$

$C(0) = 0$

1

$A(1) = 4$

$B(1) = 3$

$C(1) = 12$

2

$A(2) = 14$

$B(2) = -6$

$C(2) = -84$

3

$A(3) = 30$

$B(3) = -35$

$C(3) = -1050$

4

$A(4) = 52$

$B(4) = -96$

$C(4) = -4992$

5

$A(5) = 80$

$B(5) = -201$

$C(5) = -16080$

$C(x)$ on the right is the result of $A(x) \cdot B(x)$, and its point-value
representation is
$(0, 0), (1, 12), (2, -84), (3, -1050), (4, -4992), (5, -16080)$.

“That’s it!”, you may think. “Problem solved, and it’s $\Theta(n)$!“. Not so fast.

To convert from coefficient to point-value representation we need to evaluate
$A(x)$ at $n$ points. The time needed to do that with what we know,
Horner’s rule, is of course $\Theta(n^2)$.

Cutting time with FFT (Fast Fourier Transform)

FFT is a very important algorithm in signal analysis. It computes the DFT
(Discrete Fourier Transform) of a signal in $\Theta(n \log n)$ time. The DFT
maps the time domain to the frequency domain, that is; from a sequence of
samples it extracts which frequencies are present and in which intensity. It is
used, for example, to measure the pitch of an sound input, in applications such as
Learn by Music Technology Group or one
of those singing games.

But we are more interested in the mathematical definition of the DFT.

$$X_k = \sum_{n=0}^{N-1} x_n \cdot e^{-j2\pi k n / N}$$

Let’s break this definition down and rearrange it to understand why it is
relevant to us. We also change the variables $A_k = X_k$, $a = x$, $i = n$, to
reflect the correspondences with the naming conventions we used for polynomials.

For each $k$, the DFT is the sum of all the input coefficients $a_i$ multiplied
by $\left(x_k\right)^i = \left( e^{-j2\pi k/N} \right)^i$. By doing this, we
are effectively evaluating the polynomial expressed by the coefficients $a_i$
in all the points $x_k$.

We can calculate the
coefficient representation from a point-value representation with points
$\left(x_k\right)^i = \left( e^{-j2\pi k/N} \right)^i$ by using the inverse
DFT. Turns out this is quite the simple reversal:

Let $\left| A_v \right|$ be the length of $A_v$, that is, the lowest
degree-bound of $A(x)$. Let $\left| B_v \right|$ be the length of
$B_v$. Let $\left| C_v \right| = \left| A_v \right| + \left| B_v
\right| - 1$ be the lowest degree-bound of the result $C_v$.

Let $n = \min_i 2^i$ subject to $2^i \geq \left| C_v \right|$. That is,
the smallest power of two that is greater or equal than $\left| C_v
\right|$. To compute it: iterate from $i=0$, increasing by $1$ every time, until
$2^i \geq \left| C_v \right|$.

Let $A_v$ be the vector representing $A(x)$ padded with zeroes so that its
total length is $n$. Let $B_v$ be the vector representing $B(x)$ padded with
zeroes so that its total length is $n$.

Let $A_v’ = FFT(A_v)$ and $B_v’ = FFT(B_v)$.

Let $C_v’ = A_v’ \cdot B_v’$, the dot product of $A_v’$ and $B_v’$. That
is, multiplying every element of $A_v’$ with the element in the same
position $B_v’$, and assigning that to the same position in $C_v’$.

Let $C_v = FFT^{-1}(C_v’)$. $C_v$ is $C(x)$ in coefficient
representation. Terminate.

Analysis

We show, as formally as I can, why this algorithm is $\Theta(n \log n)$. You may
skip this section if it’s obvious to you, or try and figuring out yourself
before reading it.

is $\Theta(1)$ if we know the length of the inputs. It may be up to
$O(n)$ if we have to measure them, for example by iterating until we
find a sentinel at the end.

is $\Theta(n)$. We have to add $\left( n - \left| A_v \right| \right) +
\left( n - \left| B_v \right| \right)$, which a linear function of $n$.

is $\Theta (n \log n)$, because the FFT is too.

is $\Theta(n)$, because we iterate once for each elemnt of $C_v’$.

is $\Theta (n \log n)$. The inverse FFT is like a regular FFT, with then
dividing every element by $N$. Thus, it’s $\Theta(n \log n + n) = \Theta(n \log
n)$

Thus, the whole algorithm is:

$$\Theta(n + \log n + n + n \log n + n + n \log n) = \Theta(n \log n)$$

Isn’t it easy to simplify expressions in asymptotic notation?

Implementation

If you are writing an application that needs an FFT of any form, I suggest using
a library. Numerous smart people have spent their time making said library
bug-free and performant, and you can greatly profit from their effort. I only
know of one FFT library, the
Fastest Fourier Transform in the West, but there are
probably others that are just as fast, or bindings for this one to other
languages.

However, if you must embed the routine in your program, for instance in a
programming contest, you have to “implement it yourself”. The algorithm is not
very complicated, but time spent debugging it is time wasted. I suggest copying
Stanford University ACM Team’s
Notebook. I endorse
copying many things from there, actually. At the very least, their
implementation is clear enough that you can base yours off of it.

Assuming we have included Stanford’s implementation earlier in the C++ file:

// Assume we have at most 2^18 = 262144 coefficients
constintMAX_LEN=262144*2;// For those of you who don't want to read the Stanford source:
// "cpx" represents a complex number.
// FFT(input, output, internal=1, size of input, direction)
// Direction is 1 for FFT, -1 for inverse FFT
cpxA[MAX_LEN],B[MAX_LEN],C[MAX_LEN];intA_len,B_len,C_len;/* set the appropriate coefficients in the inputs A and B's real-valued part,
* and their length in A_len and B_len.
*/// find the immediately greatest power of two
for(C_len=1;!(C_len>A_len+B_len-1);C_len*=2);// This is assumed in the 12879 Golf Bot problem
assert(C_len<MAX_LEN);// pad A and B with zeroes
memset(A+A_len,0,(C_len-A_len)*sizeof(cpx));memset(B+B_len,0,(C_len-B_len)*sizeof(cpx));// fill C with A's point-value
FFT(A,C,1,C_len,1);// fill A with B's point-value
FFT(B,A,1,C_len,1);// Multiply the polynomials in point-value
for(inti=0;i<C_len;i++)A[i]=A[i]*C[i];// Convert back to coefficient representation
FFT(C,A,1,C_len,-1);for(inti=0;i<C_len;i++)C[i].a/=C_len;//nowC[i].a(thereal-valuedparts)containtheresult

Solving 12879 Golf Bot

To solve
12879 Golf Bot,
what we need to do is: have a vector that is 1 at the distances which
the golf bot can shoot and 0 at the distances which it cannot. We need
to add that vector to a vector of zeroes itself starting from every 1
it itself has. This is the operation called discrete convolution,
and it’s the equivalent of multiplying two polynomials in coefficient
form. So you do that, but you don’t need to divide by C_len at the
end, because we only need to know if a given position is $>0$.

It’s a very quick explanation, but the problem isn’t the focus of this post :)

Final thoughts

Hope this post was enjoyable, easy to read, and useful.

Don’t use this in production over the simple naive quadratic algorithm if you do
not handle big enough inputs. How big is big enough? Measure.

Sources

This pdf
from Portland State University’s Algorithm Design and Analysis class. Contains
more information about operations, properties and algorithms related to
polynomials.

Wikipedia’s
definition of the DFT and inverse DFT.

A
Stack Exchange
answer with an example of this algorithm. Helped me understand this a lot.