We've all seen the interesting subdomains that get automatically assigned when you deploy
an app to Heroku with a bare "heroku create". Some examples:
blazing-mist-4652, electric-night-4641, morning-frost-5543, radiant-river-7322, and so
on.
It seems they all follow an adjective-noun-4digitnumber pattern (for the most part). Did
they simply type out a dictionary of adjectives and nouns, then choose combinations from
them at random when you push an app? Is there a Ruby gem that accomplishes this, perhaps
one that provides a dictionary searchable by part of speech, or is this something to be
done manually?
Engineer on the Heroku API team here: we went with the simplest approach to generate app
names, which is basically what you suggested: keep arrays of adjectives and nouns in
memory, pick an element from each at random, and combine them with a random number from
1000 to 9999.
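A minimal sketch of that approach in Ruby. The word lists are short placeholders built from the example names in the question (the real lists are not shown here), and `random_name` is a hypothetical helper name:

```ruby
# Placeholder word lists; the real pool is larger and is not shown in this post.
ADJECTIVES = %w[blazing electric morning radiant].freeze
NOUNS      = %w[mist night frost river].freeze

# Hypothetical helper: builds "adjective-noun-number" with the number in 1000..9999.
def random_name
  "#{ADJECTIVES.sample}-#{NOUNS.sample}-#{rand(1000..9999)}"
end

puts random_name
```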
Not the most thrilling code I've written, but it's interesting to see what we had to do
in order to scale this:
At first we picked a name, tried to INSERT it, and rescued the uniqueness constraint
error to pick a different name. This worked fine while we had a large pool of names (and
a not-so-large set of apps using them), but at a certain scale we started to notice a lot
of collisions during name generation.
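The insert-and-rescue loop might look roughly like the sketch below. There is no real database here: `FakeAppTable` and `NameTaken` are made-up stand-ins for the apps table and its uniqueness constraint error, and `create_app` and `max_attempts` are illustrative names, not Heroku's code.

```ruby
require "set"

# Stand-in for the database's uniqueness constraint error (assumption).
class NameTaken < StandardError; end

# Stand-in for the apps table with a unique index on the name column (assumption).
class FakeAppTable
  def initialize
    @names = Set.new
  end

  # Mimics an INSERT that fails when the name already exists.
  def insert(name)
    raise NameTaken, name unless @names.add?(name)
    name
  end
end

ADJECTIVES = %w[blazing electric morning radiant].freeze
NOUNS      = %w[mist night frost river].freeze

def random_name
  "#{ADJECTIVES.sample}-#{NOUNS.sample}-#{rand(1000..9999)}"
end

# Try a random name; on a uniqueness error, pick another one and try again.
def create_app(table, max_attempts: 10)
  attempts = 0
  begin
    attempts += 1
    table.insert(random_name)
  rescue NameTaken
    retry if attempts < max_attempts
    raise
  end
end

table = FakeAppTable.new
puts create_app(table)
```

Each collision costs a full failed INSERT, which is why this gets expensive once a noticeable share of the pool is in use.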
To make it more resilient we decided
to pick several names and check which ones are still available with a single query. We
obviously still need to check for errors and retry because of race conditions, but with
so many apps in the table this is clearly more
effective.
It also has the added benefit of providing an easy hook for us to get an alert if our
name pool is running low (e.g. if 1/3 of the random names are taken, send an alert).
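A sketch of the batch check, under some assumptions: the set `taken` stands in for the apps table, the `select` call stands in for the single query (something like `SELECT name FROM apps WHERE name IN (...)`), and `check_candidates` and `alert_threshold` are hypothetical names.

```ruby
require "set"

# Given a batch of candidate names and the names already in use, return the
# candidates that are still free plus a flag saying whether the pool looks low.
def check_candidates(candidates, taken, alert_threshold: Rational(1, 3))
  # In a real system this would be one database query over all candidates.
  already_taken = candidates.select { |name| taken.include?(name) }

  {
    available: candidates - already_taken,
    # Flag the pool as low once the taken share reaches the threshold.
    pool_low: already_taken.size >= candidates.size * alert_threshold
  }
end

taken  = Set["blazing-mist-4652"]
result = check_candidates(%w[blazing-mist-4652 morning-frost-5543 radiant-river-7322], taken)
p result[:available]
p result[:pool_low]
```

The caller would still INSERT one of the available names and retry on a uniqueness error, since another request can claim the same name between the check and the insert.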
The first time we had issues with collisions we just radically increased the size of our
name pool by going from 2 digits to 4. With 61 adjectives and 74 nouns this took us from
~400k to ~40 million names (61 * 74 * 8999).
But by the time we were running 2 million apps we started receiving collision alerts
again, and at a much higher rate than expected: about half of the names were colliding,
which made no sense considering our pool size and the number of apps running.
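To see why half looked wrong, here is the back-of-the-envelope arithmetic. It assumes the two-digit numbers ranged over 10..99, that names are drawn uniformly, and that every running app holds a generated name; none of that is confirmed above, so treat the result as a rough upper bound.

```ruby
adjectives = 61
nouns      = 74

two_digit_pool  = adjectives * nouns * 90    # assumes numbers 10..99
four_digit_pool = adjectives * nouns * 8999  # multiplier as quoted above

apps = 2_000_000
# Chance that one uniformly drawn name is already taken.
expected_collision_rate = apps.to_f / four_digit_pool

puts two_digit_pool                            # 406260
puts four_digit_pool                           # 40621486
puts((expected_collision_rate * 100).round(1)) # 4.9 (percent)
```

Roughly 1 in 20 names should have collided, not 1 in 2.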
The culprit, as you might have guessed, is that rand is a pretty bad pseudorandom number
generator (http://en.wikipedia.org/wiki/Pseudorandom_number_generator). Picking random
elements and numbers with SecureRandom instead radically lowered the number of
collisions, making it match what we expected in the first place.
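A sketch of the same generator using SecureRandom from Ruby's standard library. The word lists are placeholders and `secure_random_name` is a hypothetical name:

```ruby
require "securerandom"

# Placeholder word lists; the real pool is larger.
ADJECTIVES = %w[blazing electric morning radiant].freeze
NOUNS      = %w[mist night frost river].freeze

def secure_random_name
  # SecureRandom.random_number(n) returns an integer in 0...n.
  adjective = ADJECTIVES[SecureRandom.random_number(ADJECTIVES.size)]
  noun      = NOUNS[SecureRandom.random_number(NOUNS.size)]
  number    = 1000 + SecureRandom.random_number(9000) # 1000..9999
  "#{adjective}-#{noun}-#{number}"
end

puts secure_random_name
```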
With so much work going into scaling this approach we had to ask whether there's a better
way to generate names in the first place. Some of the ideas discussed were:
Make the name generation a function of the application id. This would be much faster and
avoid the issue with collisions entirely, but on the downside it would waste a lot of
names on deleted apps (and damn, we have A LOT of apps being created and deleted shortly
after as part of different integration tests).
Another option to make name generation deterministic is to have the pool of available
names in the database. This would make it easy to do things like only reusing a name 2
weeks after the app was deleted.
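The first idea could be sketched as a mixed-radix decomposition of the app id. This is purely illustrative: the word lists are placeholders, and `name_for_id` and `NUMBER_COUNT` are hypothetical names, not anything Heroku built.

```ruby
# Placeholder word lists; the real pool is larger.
ADJECTIVES   = %w[blazing electric morning radiant].freeze
NOUNS        = %w[mist night frost river].freeze
NUMBER_COUNT = 9000 # numbers 1000..9999

# Map an integer id to exactly one name; distinct ids give distinct names.
def name_for_id(app_id)
  pool_size = ADJECTIVES.size * NOUNS.size * NUMBER_COUNT
  raise ArgumentError, "id outside name pool" unless (0...pool_size).cover?(app_id)

  # Peel off the number first, then the noun, then the adjective.
  rest, number_offset = app_id.divmod(NUMBER_COUNT)
  adjective_index, noun_index = rest.divmod(NOUNS.size)
  "#{ADJECTIVES[adjective_index]}-#{NOUNS[noun_index]}-#{1000 + number_offset}"
end

puts name_for_id(0)    # blazing-mist-1000
puts name_for_id(9000) # blazing-night-1000
```

Consecutive ids produce consecutive-looking names with this mapping, so a real version would probably scramble the id first. And as noted above, every deleted app permanently uses up its name.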
Excited to see what we'll do next time the collision alert triggers!
Hope this helps anyone working on friendly name generation out there.