Saturday 20 January 2018

c++ - Image Processing: Algorithm Improvement for 'Coca-Cola Can' Recognition


One of the most interesting projects I've worked on in the past couple of years was a project about image processing (https://en.wikipedia.org/wiki/Image_processing). The goal was to develop a system able to recognize Coca-Cola 'cans' (note that I'm stressing the word 'cans'; you'll see why in a minute). You can see a sample below, with the can recognized in the green rectangle with scale and rotation.



src="https://i.stack.imgur.com/irQtR.png" alt="Template
matching">



Some constraints on the project:





  • The background could be very noisy.

  • The can could have any scale or rotation, or even orientation (within reasonable limits).

  • The image could have some degree of fuzziness (contours might not be entirely straight).

  • There could be Coca-Cola bottles in the image, and the algorithm should only detect the can!

  • The brightness of the image could vary a lot (so you can't rely "too much" on color detection).

  • The can could be partly hidden on the sides or in the middle, and possibly partly hidden behind a bottle.

  • There could be no can at all in the image, in which case you had to find nothing and write a message saying so.




So you could end up with tricky things like this (which, in this case, made my algorithm fail completely):



src="https://i.stack.imgur.com/Byw82.png" alt="Total
fail">



I did this project a while ago and had a lot of fun doing it, and I ended up with a decent implementation. Here are some details about it:



Language:
Done in C++ using the OpenCV library.



Pre-processing:
For the image pre-processing, i.e. transforming the image into a more basic form to feed to the algorithm, I used three steps (a minimal OpenCV sketch of the whole pipeline follows the list):





  1. Changing the color domain from RGB to HSV and filtering based on "red" hue, with saturation above a certain threshold to avoid orange-like colors, and filtering out low values to avoid dark tones. The end result was a binary black-and-white image, where the white pixels represent the pixels that match this threshold. Obviously there is still a lot of crap in the image, but this reduces the number of dimensions you have to work with.
     [Image: binarized image]

  2. Noise reduction using median filtering (replacing each pixel with the median value of its neighborhood).

  3. Using the Canny edge detector (http://en.wikipedia.org/wiki/Canny_edge_detector) to get the contours of all items after the two previous steps.
     [Image: contour detection]
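
For illustration, here is a minimal C++/OpenCV sketch of that three-step pipeline. The hue bands, saturation/value floors, kernel size and Canny thresholds below are illustrative guesses, not the exact values I used:

    #include <opencv2/imgproc.hpp>

    // Pre-processing: red-hue HSV threshold -> median filter -> Canny edges.
    cv::Mat preprocess(const cv::Mat& bgr)
    {
        cv::Mat hsv;
        cv::cvtColor(bgr, hsv, cv::COLOR_BGR2HSV);

        // Red wraps around hue 0 in OpenCV's 0-179 hue range, so threshold
        // two bands and combine them. The saturation/value floors reject
        // orange-ish and dark pixels. All bounds here are illustrative.
        cv::Mat lowRed, highRed, mask;
        cv::inRange(hsv, cv::Scalar(0, 120, 70),   cv::Scalar(10, 255, 255),  lowRed);
        cv::inRange(hsv, cv::Scalar(170, 120, 70), cv::Scalar(179, 255, 255), highRed);
        cv::bitwise_or(lowRed, highRed, mask);

        // Median filter to knock out salt-and-pepper noise in the binary mask.
        cv::medianBlur(mask, mask, 5);

        // Canny edge detection on the cleaned-up mask to extract contours.
        cv::Mat edges;
        cv::Canny(mask, edges, 50, 150);
        return edges;
    }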



Algorithm:
The algorithm I chose for this task was taken from this awesome book on feature extraction (https://rads.stackoverflow.com/amzn/click/com/0123725380) and is called the Generalized Hough Transform (http://en.wikipedia.org/wiki/Generalised_Hough_transform), which is quite different from the regular Hough Transform. It basically boils down to a few things:





  • You can describe an object in space without knowing its analytical equation (which is the case here).

  • It is resistant to image deformations such as scaling and rotation, as it basically tests your image for every combination of scale factor and rotation factor.

  • It uses a base model (a template) that the algorithm will "learn".

  • Each pixel remaining in the contour image votes for another pixel which is supposedly the center of gravity of your object, based on what it learned from the model.



In the end, you get a heat map of the votes. For example, here all the pixels of the can's contour will vote for its center of gravity, so you'll have a lot of votes in the same pixel corresponding to the center, and will see a peak in the heat map, as below:



src="https://i.stack.imgur.com/wxrT1.png"
alt="GHT">




Once you have that, a simple threshold-based heuristic can give you the location of the center pixel, from which you can derive the scale and rotation and then plot your little rectangle around it (the final scale and rotation factors will obviously be relative to your original template). In theory, at least...
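
For what it's worth, recent OpenCV versions (3.x/4.x) ship a Generalized Hough implementation of their own, which didn't exist when I wrote mine. A minimal sketch of the detection step using cv::createGeneralizedHoughGuil might look like this; all parameter values and file names are illustrative placeholders:

    #include <opencv2/imgproc.hpp>
    #include <opencv2/imgcodecs.hpp>
    #include <vector>

    int main()
    {
        // Placeholder file names; substitute your own template and scene.
        cv::Mat templ = cv::imread("can_template.png", cv::IMREAD_GRAYSCALE);
        cv::Mat scene = cv::imread("scene.png", cv::IMREAD_GRAYSCALE);

        // Guil's variant handles scale and rotation (Ballard's does not).
        cv::Ptr<cv::GeneralizedHoughGuil> ght = cv::createGeneralizedHoughGuil();
        ght->setMinDist(100);      // minimum distance between detections
        ght->setMinAngle(0);       // rotation search range (degrees)
        ght->setMaxAngle(360);
        ght->setAngleStep(1);      // rotation search granularity
        ght->setMinScale(0.5);     // scale search range, relative to template
        ght->setMaxScale(2.0);
        ght->setScaleStep(0.05);
        ght->setPosThresh(100);    // vote threshold for the position peak
        ght->setTemplate(templ);   // "learn" the model contour

        // Each detection is (x, y, scale, angle): the vote peak plus the
        // scale/rotation relative to the template, as described above.
        std::vector<cv::Vec4f> positions;
        ght->detect(scene, positions);
        return 0;
    }

The exhaustive scan over every angle/scale combination is exactly what makes this approach so slow; coarser angle and scale steps trade accuracy for speed.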



Results:
Now, while this approach worked in the basic cases, it was severely lacking in some areas:




  • It is extremely slow! I can't stress this enough. Almost a full day was needed to process the 30 test images, obviously because I had a very high scaling factor for rotation and translation, since some of the cans were very small.

  • It was completely lost when bottles were in the image, and for some reason almost always found the bottle instead of the can (perhaps because bottles were bigger, and thus had more pixels and therefore more votes).

  • Fuzzy images were also no good, since the votes ended up in pixels at random locations around the center, resulting in a very noisy heat map.

  • Invariance to translation and rotation was achieved, but not to orientation, meaning that a can that was not directly facing the camera lens wasn't recognized.




Can you help me improve my specific algorithm, using exclusively OpenCV features, to resolve the four specific issues mentioned?



I hope some people will learn something out of it as well; after all, I think it's not only the people who ask questions who should learn. :)



Answer




An alternative approach would be to extract features (keypoints) using the scale-invariant feature transform (SIFT: https://en.wikipedia.org/wiki/Scale-invariant_feature_transform) or Speeded Up Robust Features (SURF: https://en.wikipedia.org/wiki/Speeded_up_robust_features).



Both are implemented in OpenCV (https://en.wikipedia.org/wiki/OpenCV) 2.3.1.




You can find a nice code example using features in the tutorial Features2D + Homography to find a known object (http://docs.opencv.org/2.4/doc/tutorials/features2d/feature_homography/feature_homography.html).


Both algorithms are invariant to scaling and rotation. Since they work with features, you can also handle occlusion (as long as enough keypoints are visible).



src="https://i.stack.imgur.com/kF63R.jpg" alt="Enter image description
here">



Image source: tutorial example



The processing takes a few hundred milliseconds for SIFT; SURF is a bit faster, but still not suitable for real-time applications. ORB uses FAST, which is weaker regarding rotation invariance.
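
To make the idea concrete, here is a minimal sketch along the lines of that tutorial, assuming OpenCV 4.4+ (where SIFT lives in the main features2d module); the file names and the 0.75 ratio-test threshold are illustrative:

    #include <opencv2/features2d.hpp>
    #include <opencv2/calib3d.hpp>
    #include <opencv2/imgcodecs.hpp>
    #include <vector>

    int main()
    {
        // Placeholder file names; substitute your own template and scene.
        cv::Mat templ = cv::imread("can_template.png", cv::IMREAD_GRAYSCALE);
        cv::Mat scene = cv::imread("scene.png", cv::IMREAD_GRAYSCALE);

        // Detect keypoints and compute descriptors in both images.
        cv::Ptr<cv::SIFT> sift = cv::SIFT::create();
        std::vector<cv::KeyPoint> kpT, kpS;
        cv::Mat descT, descS;
        sift->detectAndCompute(templ, cv::noArray(), kpT, descT);
        sift->detectAndCompute(scene, cv::noArray(), kpS, descS);

        // Match with Lowe's ratio test to discard ambiguous matches.
        cv::BFMatcher matcher(cv::NORM_L2);
        std::vector<std::vector<cv::DMatch>> knn;
        matcher.knnMatch(descT, descS, knn, 2);
        std::vector<cv::Point2f> ptsT, ptsS;
        for (const auto& m : knn) {
            if (m.size() == 2 && m[0].distance < 0.75f * m[1].distance) {
                ptsT.push_back(kpT[m[0].queryIdx].pt);
                ptsS.push_back(kpS[m[0].trainIdx].pt);
            }
        }

        // A RANSAC homography maps the template onto the scene; the
        // surviving inliers tolerate partial occlusion of the can.
        if (ptsT.size() >= 4) {
            cv::Mat H = cv::findHomography(ptsT, ptsS, cv::RANSAC);
            // Project the template corners with H to draw the bounding box.
        }
        return 0;
    }

Swapping cv::SIFT::create() for cv::ORB::create() (with cv::NORM_HAMMING in the matcher) trades matching robustness for speed, per the note above.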




The original papers are Lowe's "Distinctive Image Features from Scale-Invariant Keypoints" (SIFT) and Bay et al.'s "SURF: Speeded Up Robust Features".



