Monday 18 December 2017

opencv - Use Azure Machine learning to detect symbol within an image


Four years ago I posted this question (https://stackoverflow.com/q/6999920/411094) and got a
few answers that were unfortunately outside my skill level. I just attended a Build Tour
conference where they spoke about machine learning, and this got me thinking about the
possibility of using ML as a solution to my problem. I found this
(https://gallery.azureml.net/MachineLearningAPI/02ce55bbc0ab4fea9422fe019995c02f) on the
Azure site, but I don't think it will help me because its scope is pretty narrow.



Here is what I am trying to achieve:




I have a source image:



[Image: source image (https://i.stack.imgur.com/6y76s.jpg)]



and I want to know which of the following symbols (if any) are contained in the image
above:



[Image: symbols (https://i.stack.imgur.com/SuHkU.jpg)]



The comparison needs to tolerate minor distortion, scaling, color differences, rotation,
and brightness differences.




The number of symbols to match will ultimately be more than 100.



Is ML a good tool to solve this problem? If so, any tips on where to start?


Answer



As far as I know, Project Oxford (the MS Azure CV API) wouldn't be suitable for your task.
Its APIs are very focused on face-related tasks (detection, verification, etc.), OCR and
image description. And apparently you can't extend their models or train new ones from
the existing ones.



However, even though I don't know of an out-of-the-box solution for your object detection
problem, there are simple enough approaches that you could try and that would give you
some starting-point results.



For instance, here is a naive method
you could use:




1) Create your dataset:

This is probably the most tedious step and, paradoxically, a crucial one. I will assume
you have a good number of images to work with. What you need to do is pick a fixed window
size and extract positive and negative examples.



If some of the images in your dataset are different sizes, you need to rescale them to a
common size. You don't need to go overboard with the size; 30x30 images would probably
be more than enough. To make things easier, I would also convert the images to grayscale.
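A minimal sketch of the step 1 preprocessing in plain Python, so it runs without extra libraries. The helper names (`to_grayscale`, `resize_nearest`, `crop`) are hypothetical, images are assumed to be nested lists of pixel rows, and the grayscale weights are the common BT.601 luminance coefficients. In practice OpenCV's `cv2.cvtColor` and `cv2.resize` would do the same job.

```python
# Hypothetical step 1 helpers. Images are nested lists (rows of pixels)
# so this sketch has no third-party dependencies.

WINDOW = 30  # assumed common training size (30x30), as suggested above


def to_grayscale(rgb_image):
    """Convert rows of (R, G, B) tuples to rows of 0-255 luminance values."""
    return [
        [round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row]
        for row in rgb_image
    ]


def resize_nearest(gray_image, size=WINDOW):
    """Rescale a 2-D grayscale image to size x size (nearest neighbour)."""
    h, w = len(gray_image), len(gray_image[0])
    return [
        [gray_image[y * h // size][x * w // size] for x in range(size)]
        for y in range(size)
    ]


def crop(gray_image, x, y, win):
    """Cut a win x win patch whose top-left corner is (x, y)."""
    return [row[x:x + win] for row in gray_image[y:y + win]]
```

For example, a positive example for a logo occupying a 60x60 region at (x, y) of a hypothetical image `img` would be `resize_nearest(crop(to_grayscale(img), x, y, 60))`. Negative examples are cut the same way from regions without a logo.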



2) Pick a classification algorithm and train it:

There are an awful lot of classification algorithms out there. But if you are new to
machine learning, I would pick the one that is easiest to understand. With that in mind,
I would check out logistic regression: it gives decent results, it's easy enough for
beginners, and there are plenty of libraries and tutorials for it. For instance, this one
(http://blog.yhathq.com/posts/logistic-regression-and-python.html) or this one
(https://msdn.microsoft.com/en-us/magazine/dn948113.aspx). At first I would focus on a
binary classification problem (e.g. whether or not there is a UD logo in the picture), and
once you master that, you can move on to the multi-class case. There are resources for
that too (http://www.codeproject.com/Articles/821347/MultiClass-Logistic-Classifier-in-Python),
or you can always train several models, one per logo, and run this recipe for each one
separately.
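To make the logistic regression idea concrete, here is a small from-scratch sketch trained with stochastic gradient descent. A library implementation like the ones in the tutorials would normally be preferable. The function names, learning rate and epoch count are illustrative assumptions. `train_one_per_logo` shows the "one model per logo" option, not true multi-class logistic regression.

```python
import math
import random


def sigmoid(z):
    # Clamp to avoid math.exp overflow for extreme inputs.
    z = max(-500.0, min(500.0, z))
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic_regression(X, y, lr=0.1, epochs=200, seed=0):
    """Binary logistic regression fitted with plain stochastic gradient descent.

    X: list of feature vectors (e.g. flattened 30x30 windows scaled to [0, 1]).
    y: list of 0/1 labels (1 = logo present in the window).
    Returns a (weights, bias) tuple. Hypothetical helper, for illustration.
    """
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    b = 0.0
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, X[i])) + b)
            err = p - y[i]  # gradient of the log-loss w.r.t. the logit
            w = [wj - lr * err * xj for wj, xj in zip(w, X[i])]
            b -= lr * err
    return w, b


def predict_proba(model, x):
    """Probability that feature vector x contains the logo."""
    w, b = model
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def train_one_per_logo(X, labels, logo_names, **kw):
    """'Several models, one per logo': an independent binary classifier per symbol."""
    return {
        name: train_logistic_regression(X, [1 if l == name else 0 for l in labels], **kw)
        for name in logo_names
    }
```

A window would then be flagged as containing the logo when `predict_proba(model, x)` exceeds some threshold, 0.5 being the usual starting point.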




To train your model, you just need to read the images generated in step 1, turn each one
into a vector and label it accordingly. That will be the dataset that feeds your model.
If you are using grayscale images, each position in the vector corresponds to a pixel
value in the range 0-255. Depending on the algorithm, you might need to rescale those
values to the range [0, 1] (some algorithms perform better with values in that range).
Rescaling is easy in this case: new_value = value / 255.
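The vectorizing, rescaling and labelling could look like this, assuming each window is a 2-D list of 0-255 grayscale values. `to_feature_vector` and `build_dataset` are hypothetical names.

```python
def to_feature_vector(window):
    """Flatten a 2-D grayscale window (values 0-255) and rescale to [0, 1]."""
    return [value / 255 for row in window for value in row]


def build_dataset(positive_windows, negative_windows):
    """Hypothetical helper: label positives 1 and negatives 0, return (X, y)."""
    X = [to_feature_vector(w) for w in positive_windows + negative_windows]
    y = [1] * len(positive_windows) + [0] * len(negative_windows)
    return X, y
```

For instance, `to_feature_vector([[0, 255], [51, 102]])` gives `[0.0, 1.0, 0.2, 0.4]`.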



You also need to split your dataset,
reserving some examples for training, a subset for validation and another one for
testing. Again, there are different ways to do this, but I'm keeping this answer as
naive as possible.
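One naive way to do the split is to shuffle once and cut the list into three parts. The 70/15/15 proportions and the `split_dataset` name are assumptions for illustration, not a recommendation from the answer.

```python
import random


def split_dataset(X, y, train=0.7, validation=0.15, seed=0):
    """Hypothetical helper: shuffle, then split into train / validation / test.

    Returns three (X_part, y_part) tuples; the test part gets whatever is left.
    """
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)  # fixed seed so the split is reproducible
    n_train = int(len(idx) * train)
    n_val = int(len(idx) * validation)
    parts = (idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:])
    return [([X[i] for i in p], [y[i] for i in p]) for p in parts]
```

Shuffling first matters because a dataset built as "all positives, then all negatives" would otherwise put only one class in the test set.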



3) Perform the detection:

Now let's start the fun part. Given any image, you want to run your model and produce
the coordinates in the picture where there is a logo. There are different ways to do
this, and I will describe one that is probably neither the best nor the most efficient,
but is the easiest to develop in my opinion.



You are going to scan the picture, extracting the pixels in a "window", rescaling those
pixels to the size you selected in step 1, and then feeding them to your model.



[Image: Extracting windows to feed the model (https://i.stack.imgur.com/VGk3f.png)]
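A sketch of the window scan at a single window size, assuming the image is a 2-D list of grayscale values and the training size from step 1 is 30x30. `scan_windows` and its `stride` parameter (the spacing between windows) are hypothetical.

```python
def scan_windows(gray_image, win, stride, size=30):
    """Hypothetical helper: slide a win x win window, yield (x, y, patch).

    Each patch is rescaled to size x size (nearest neighbour) so it matches
    the training examples from step 1.
    """
    h, w = len(gray_image), len(gray_image[0])
    for y in range(0, h - win + 1, stride):
        for x in range(0, w - win + 1, stride):
            patch = [row[x:x + win] for row in gray_image[y:y + win]]
            resized = [
                [patch[r * win // size][c * win // size] for c in range(size)]
                for r in range(size)
            ]
            yield x, y, resized
```

Each yielded patch can be flattened, scaled to [0, 1] and passed to the classifier from step 2. On a 60x60 image with `win=30` and `stride=15`, this visits 3 x 3 = 9 windows.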




If the model gives you a positive answer, you mark that window in the original image.
Since the logo might appear at different scales, you need to repeat this process with
different window sizes. You would also need to tweak the amount of space between
windows.
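Repeating the scan at several window sizes could be sketched as below. The window sizes, the choice of a stride that is a fixed fraction of the window size, and the `is_logo` callable are all assumptions to tweak. The function simply returns every box where the classifier fired and does not merge overlapping hits.

```python
def detect(gray_image, is_logo, window_sizes=(30, 45, 60), stride_ratio=0.25, size=30):
    """Hypothetical multi-scale sliding-window detector.

    is_logo: callable taking a size x size patch and returning True/False
    (e.g. a thresholded logistic-regression probability).
    Returns a list of (x, y, win) boxes where the classifier fired.
    """
    h, w = len(gray_image), len(gray_image[0])
    boxes = []
    for win in window_sizes:
        if win > min(h, w):
            continue  # window larger than the image at this scale
        stride = max(1, int(win * stride_ratio))  # spacing between windows
        for y in range(0, h - win + 1, stride):
            for x in range(0, w - win + 1, stride):
                # Crop and nearest-neighbour rescale in one pass.
                patch = [
                    [gray_image[y + r * win // size][x + c * win // size] for c in range(size)]
                    for r in range(size)
                ]
                if is_logo(patch):
                    boxes.append((x, y, win))
    return boxes
```

A smaller `stride_ratio` finds logos more reliably at the cost of many more classifier calls; that is the "space between windows" trade-off mentioned above.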



4) Rinse and repeat:

On the first iteration it's very likely that you will get a lot of false positives. You
then need to take those as negative examples and retrain your model. This is an iterative
process, and hopefully on each iteration you will get fewer and fewer false positives and
false negatives.
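The retraining loop can be sketched generically. This version assumes you have some images known to contain no logo, so that every detection on them is a false positive. `train`, `detect` and the other names are hypothetical placeholders for your own step 2 and step 3 code.

```python
def hard_negative_mining(train, detect, positives, negatives, background_images, rounds=3):
    """Hypothetical loop: retrain, adding false positives as new negative examples.

    train(positives, negatives) -> model
    detect(model, image) -> list of patches the model flagged as logos
    background_images: images assumed to contain no logo, so every
    detection on them counts as a false positive.
    """
    negatives = list(negatives)
    model = train(positives, negatives)
    for _ in range(rounds):
        false_positives = [p for img in background_images for p in detect(model, img)]
        if not false_positives:
            break  # no false positives left on this background set
        negatives.extend(false_positives)
        model = train(positives, negatives)
    return model, negatives
```

The fixed number of rounds is only a safety limit; in practice you would stop when the validation results from step 2 stop improving.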



Once you are reasonably happy with your solution, you might want to improve it. You
might want to try other classification algorithms like SVM or Deep Learning
(https://en.wikipedia.org/wiki/Deep_learning) Artificial Neural Networks, or try better
object detection frameworks like Viola-Jones
(https://en.wikipedia.org/wiki/Viola%E2%80%93Jones_object_detection_framework). Also, you
will probably need to use cross-validation
(https://en.wikipedia.org/wiki/Cross-validation_%28statistics%29) to compare all your
solutions (you can actually use cross-validation from the beginning). By this point I bet
you would be confident enough to use OpenCV or another ready-to-use framework, in which
case you will have a fair understanding of what is going on under the hood.
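For comparing solutions, a plain k-fold cross-validation loop is enough to start with. This sketch uses contiguous folds, so shuffle the data first. `fit` and `predict` are hypothetical callables wrapping whichever classifier you are evaluating, and the sketch assumes you have at least k examples.

```python
def k_fold_indices(n, k=5):
    """Yield (train_idx, test_idx) for k roughly equal, contiguous folds (n >= k)."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        yield train, test
        start += size


def cross_val_accuracy(X, y, fit, predict, k=5):
    """Hypothetical helper: mean accuracy of a (fit, predict) pair over k folds.

    fit(X, y) -> model, predict(model, x) -> label.
    """
    scores = []
    for train_idx, test_idx in k_fold_indices(len(X), k):
        model = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        correct = sum(predict(model, X[i]) == y[i] for i in test_idx)
        scores.append(correct / len(test_idx))
    return sum(scores) / len(scores)
```

Running `cross_val_accuracy` with the same folds for logistic regression, an SVM, or any other candidate gives a like-for-like comparison.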



Also, you could just disregard this whole answer and go for an OpenCV object detection
tutorial like this one (http://note.sonots.com/SciSoftware/haartraining.html), or take an
answer from another question like this one
(https://stackoverflow.com/questions/10168686/algorithm-improvement-for-coca-cola-can-shape-recognition?rq=1).
Good luck!

