Saturday 20 January 2018

ruby - How can I programmatically generate Heroku-like subdomain names?


We've all seen the interesting
subdomains that you get automatically assigned when you deploy an app to Heroku with a
bare "heroku create".



Some examples:
blazing-mist-4652, electric-night-4641, morning-frost-5543, radiant-river-7322, and so
on.



It seems they all follow an
adjective-noun-4digitnumber pattern (for the most part). Did they simply type out a
dictionary of some adjectives and nouns, then choose combinations from them at random
when you push an app? Is there a Ruby gem that accomplishes this, perhaps one that provides a
dictionary which one could search by parts of speech, or is this something to be done
manually?



Answer





Engineer on the Heroku API team
here: we went with the simplest approach to generate app names, which is basically what
you suggested: keep arrays of adjectives and nouns in memory, pick an element from each
at random and combine them with a random number from 1000 to
9999.
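
A minimal sketch of that approach in Ruby, not Heroku's actual code. The word lists are placeholders taken from the example names in the question (the real lists are not given), and generate_name is a hypothetical helper name. It draws from SecureRandom rather than rand, for the reason described further down in this answer.

```ruby
require "securerandom"

# Placeholder word lists built from the example names in the question;
# the real lists are not published in this answer.
ADJECTIVES = %w[blazing electric morning radiant].freeze
NOUNS = %w[mist night frost river].freeze

# Hypothetical helper: one adjective, one noun, one number in 1000..9999.
def generate_name
  adjective = ADJECTIVES[SecureRandom.random_number(ADJECTIVES.size)]
  noun = NOUNS[SecureRandom.random_number(NOUNS.size)]
  number = 1000 + SecureRandom.random_number(9000) # 1000 through 9999
  "#{adjective}-#{noun}-#{number}"
end

puts generate_name
```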



Not the most thrilling code I've written,
but it's interesting to see what we had to do in order to scale
this:




  • At first we
    were picking a name, trying to INSERT and then rescuing the
    uniqueness constraint error to pick a different name. This worked fine while we had a
    large pool of names (and a not-so-large set of apps using them), but at a certain scale
    we started to notice a lot of collisions during name
    generation.



    To make it more resilient we decided
    to pick several names and check which ones are still available with a single query. We
    obviously still need to check for errors and retry because of race conditions, but with
    so many apps in the table this is clearly more
    effective.



    It also has the added benefit of
    providing an easy hook for us to get an alert if our name pool is low (e.g., if 1/3 of the
    random names are taken, send an
    alert).



  • The first time we
    had issues with collisions we just radically increased the size of our name pool by
    going from 2 digits to 4. With 61 adjectives and 74 nouns this took us from ~400k to
    ~40 million names (61 * 74 * 8999).


  • But by the time we
    were running 2 million apps we started receiving collision alerts again, and at a much
    higher rate than expected: about half of the names were colliding, which made no sense
    considering our pool size and the number of apps
    running.



    The culprit, as you might have guessed,
    is that rand is a pretty bad pseudorandom number
    generator (http://en.wikipedia.org/wiki/Pseudorandom_number_generator). Picking random elements and numbers with
    SecureRandom instead radically lowered the number of
    collisions, making it match what we expected in the first
    place.
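
The batch check and the low-pool alert from the first bullet could look roughly like the sketch below. Everything here is an assumption made for illustration: pick_available_name, the batch size of 10, and the taken_lookup callable, which stands in for a single query such as SELECT name FROM apps WHERE name IN (...). The word lists are again placeholders taken from the example names.

```ruby
require "securerandom"
require "set"

# Placeholder word lists; the real ones are not given in the answer.
ADJECTIVES = %w[blazing electric morning radiant].freeze
NOUNS = %w[mist night frost river].freeze

def random_name
  adjective = ADJECTIVES[SecureRandom.random_number(ADJECTIVES.size)]
  noun = NOUNS[SecureRandom.random_number(NOUNS.size)]
  "#{adjective}-#{noun}-#{1000 + SecureRandom.random_number(9000)}"
end

# taken_lookup is a hypothetical callable standing in for ONE database query.
# It receives the candidate names and returns those that are already in use.
def pick_available_name(taken_lookup, batch_size: 10, alert_ratio: 1.0 / 3)
  candidates = Array.new(batch_size) { random_name }.uniq
  taken = taken_lookup.call(candidates)

  # Alert hook: a high share of taken candidates suggests the pool is low.
  if taken.size >= candidates.size * alert_ratio
    warn "name pool running low: #{taken.size}/#{candidates.size} candidates taken"
  end

  # nil when every candidate is taken; the caller is expected to retry.
  (candidates - taken).first
end

# An in-memory Set plays the role of the apps table in this sketch.
existing = Set["blazing-mist-4652", "electric-night-4641"]
lookup = ->(names) { names.select { |name| existing.include?(name) } }
puts pick_available_name(lookup)
```

The INSERT that follows still needs the uniqueness constraint and a retry on failure, since another request can claim the same name between the check and the insert.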




With so
much work going into scaling this approach we had to ask whether there's a better way to
generate names in the first place. Some of the ideas discussed
were:




  • Make the
    name generation a function of the application id. This would be much faster and avoid
    the issue with collisions entirely, but on the downside it would waste a lot of names
    with deleted apps (and damn, we have A LOT of apps being created and deleted shortly
    after as part of different integration
    tests).



  • Another option to
    make name generation deterministic is to have the pool of available names in the
    database. This would make it easy to do things like only reusing a name 2 weeks after
    the app was
    deleted.
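
To make the first idea concrete, here is one way a name could be derived from the application id. This is only an illustration of the idea, not something the answer says was built: name_for_app_id is a hypothetical helper, the word lists are placeholders, and the id is assumed to be a zero-based integer.

```ruby
# Placeholder word lists; the real ones are not given in the answer.
ADJECTIVES = %w[blazing electric morning radiant].freeze
NOUNS = %w[mist night frost river].freeze
NUMBERS = 9000 # the values 1000 through 9999

# Hypothetical helper: decompose the id into (adjective, noun, number)
# like digits of a mixed-radix number, so each id maps to exactly one name.
def name_for_app_id(app_id)
  pool_size = ADJECTIVES.size * NOUNS.size * NUMBERS
  raise ArgumentError, "app id outside name pool" unless (0...pool_size).cover?(app_id)

  rest, number = app_id.divmod(NUMBERS)
  adjective_index, noun_index = rest.divmod(NOUNS.size)
  "#{ADJECTIVES[adjective_index]}-#{NOUNS[noun_index]}-#{1000 + number}"
end

puts name_for_app_id(0)    # => blazing-mist-1000
puts name_for_app_id(9000) # => blazing-night-1000
```

No collision check is needed because distinct ids give distinct names, but the downside mentioned above is visible too: every id consumes its name permanently, even if the app is deleted seconds later. Consecutive ids also give consecutive names, so a real scheme would probably scramble the id first.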




Excited
to see what we'll do next time the collision alert
triggers!



Hope this helps anyone working on
friendly name generation out there.

