
# Image Processing: Algorithm Improvement for 'Coca-Cola Can' Recognition

Charles Menguy
1#
Charles Menguy Published in 2012-04-16 04:23:16Z
Tiago
2#

An alternative approach would be to extract features (keypoints) using the scale-invariant feature transform (SIFT) or Speeded Up Robust Features (SURF).

Both are implemented in OpenCV 2.3.1.

You can find a nice code example using features in the OpenCV tutorial "Features2D + Homography to find a known object".

Both algorithms are invariant to scaling and rotation. Since they work with features, you can also handle occlusion (as long as enough keypoints are visible).

Image source: tutorial example

The processing takes a few hundred ms for SIFT. SURF is a bit faster, but still not suitable for real-time applications. ORB uses FAST, which is weaker regarding rotation invariance.

### The original papers

• SURF: Speeded Up Robust Features
• Distinctive Image Features from Scale-Invariant Keypoints
• ORB: an efficient alternative to SIFT or SURF
Fantastic Mr Fox
3#
Fantastic Mr Fox Reply to 2015-10-20 15:20:22Z
If you are not limited to just a camera (which wasn't one of your constraints), perhaps you can move to a range sensor like the Xbox Kinect. With this you can perform depth- and colour-based matched segmentation of the image, which allows for faster separation of objects in the image. You can then use ICP matching or similar techniques to match the shape of the can rather than just its outline or colour. Given that the can is cylindrical, this may be a valid option for any orientation if you have a previous 3D scan of the target. These techniques are often quite quick, especially when used for such a specific purpose, which should solve your speed problem.

I could also suggest, not necessarily for accuracy or speed but for fun, using a trained neural network on your hue-segmented image to identify the shape of the can. These are very fast and can often be up to 80/90% accurate. Training would be a bit of a long process though, as you would have to manually identify the can in each image.
Darren Cook
4#
Darren Cook Reply to 2012-04-16 05:03:20Z
 Fun problem: when I glanced at your bottle image I thought it was a can too. But, as a human, what I did to tell the difference is that I then noticed it was also a bottle... So, to tell cans and bottles apart, how about simply scanning for bottles first? If you find one, mask out the label before looking for cans. Not too hard to implement if you're already doing cans. The real downside is it doubles your processing time. (But thinking ahead to real-world applications, you're going to end up wanting to do bottles anyway ;-)
Shep
5#
This may be a very naive idea (or may not work at all), but the dimensions of all Coke cans are fixed. So if the same image contains both a can and a bottle, you may be able to tell them apart by size (bottles are going to be larger). Because depth is lost when the 3D scene is projected onto a 2D image, it's possible that a bottle farther away appears shrunk and there isn't a visible size difference. You may recover some depth information using stereo imaging and then recover the original size.
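The size recovery mentioned above can be sketched with the standard pinhole stereo relations. This is only an illustration: it assumes a rectified stereo pair with a known focal length (in pixels) and baseline, and the function names and numbers are made up for the example.

```python
def depth_from_disparity(focal_px, baseline_m, disparity_px):
    """Depth Z = f * B / d for a rectified stereo pair (pinhole model)."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_px * baseline_m / disparity_px


def real_height(height_px, focal_px, depth_m):
    """Back-project an image height (pixels) to a physical height (metres)."""
    return height_px * depth_m / focal_px


# Illustrative values only: 800 px focal length, 10 cm baseline.
depth = depth_from_disparity(focal_px=800, baseline_m=0.1, disparity_px=40)
height = real_height(height_px=49, focal_px=800, depth_m=depth)
```

With an estimated physical height per object, cans and bottles could then be separated by a simple threshold, provided the disparity estimate is reliable enough.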
Alex L
6#
Alex L Reply to 2012-04-16 08:05:11Z
I would detect red rectangles: convert RGB to HSV, filter for red to get a binary image, then close it (dilate then erode, known as imclose in MATLAB).

Then look through the rectangles from largest to smallest. Rectangles that have a smaller rectangle at a known position/scale can both be removed (assuming bottle proportions are constant, the smaller rectangle would be the bottle cap).

This would leave you with red rectangles. You'll then need to somehow detect the logos to tell whether each one is just a red rectangle or a Coke can. Like OCR, but with a known logo?
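A minimal sketch of the first step (red filter in HSV, then morphological closing), written in plain Python so it runs without OpenCV. The thresholds and helper names are assumptions chosen for illustration, and a real implementation would use an image library rather than nested lists.

```python
import colorsys


def red_mask(pixels, max_hue_dist=0.05, min_sat=0.5, min_val=0.3):
    """Binary mask (rows of 0/1) of strongly red pixels; input is rows of (r, g, b) in 0-255."""
    mask = []
    for row in pixels:
        out = []
        for r, g, b in row:
            h, s, v = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
            # Red sits at both ends of the hue circle, so measure distance to 0 or 1.
            hue_dist = min(h, 1.0 - h)
            is_red = hue_dist <= max_hue_dist and s >= min_sat and v >= min_val
            out.append(1 if is_red else 0)
        mask.append(out)
    return mask


def _neighbourhood(mask, y, x):
    """Values in the 3x3 window around (y, x); cells outside the image are skipped."""
    h, w = len(mask), len(mask[0])
    return [mask[j][i]
            for j in range(max(0, y - 1), min(h, y + 2))
            for i in range(max(0, x - 1), min(w, x + 2))]


def dilate(mask):
    return [[1 if any(_neighbourhood(mask, y, x)) else 0
             for x in range(len(mask[0]))] for y in range(len(mask))]


def erode(mask):
    return [[1 if all(_neighbourhood(mask, y, x)) else 0
             for x in range(len(mask[0]))] for y in range(len(mask))]


def close_mask(mask):
    """Morphological closing: dilate, then erode. Fills small holes in red regions."""
    return erode(dilate(mask))
```

Closing fills small gaps such as white lettering inside the red area, so the region is more likely to come out as one solid rectangle before the size-ordered search.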
tskuzzy
7#
### Looking at shape

Take a gander at the shape of the red portion of the can/bottle. Notice how the can tapers off slightly at the very top, whereas the bottle label is straight. You can distinguish between these two by comparing the width of the red portion across the length of it.

### Looking at highlights

One way to distinguish between bottles and cans is the material. A bottle is made of plastic whereas a can is made of aluminum. In sufficiently well-lit situations, looking at the specularity would be one way of telling a bottle label from a can label.

As far as I can tell, that is how a human would tell the difference between the two types of labels. If the lighting conditions are poor, there is bound to be some uncertainty in distinguishing the two anyway. In that case, you would have to be able to detect the presence of the transparent/translucent bottle itself.
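The width comparison from the shape idea could look roughly like this. It is a sketch under assumptions not stated in the answer: the object is upright, the red region is already available as a binary mask (rows of 0/1), and the `top_fraction` and `min_ratio` thresholds are arbitrary illustrative values.

```python
def row_widths(mask):
    """Width of the red region on each row: span between first and last set pixel."""
    widths = []
    for row in mask:
        cols = [x for x, v in enumerate(row) if v]
        widths.append(cols[-1] - cols[0] + 1 if cols else 0)
    return widths


def tapers_at_top(mask, top_fraction=0.2, min_ratio=0.9):
    """True if the top rows are noticeably narrower than the typical (median) row."""
    widths = [w for w in row_widths(mask) if w > 0]
    if len(widths) < 2:
        return False
    n_top = max(1, int(len(widths) * top_fraction))
    top = sum(widths[:n_top]) / n_top
    body = sorted(widths)[len(widths) // 2]  # median width of the region
    return top < min_ratio * body
```

A mask whose top rows are narrower than the body would be flagged as can-like, while a constant-width label would not. Perspective and partial occlusion can easily break this, so it is best treated as one cue among several.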
Nakilon
8#
Please take a look at Zdenek Kalal's Predator tracker. It requires some training, but it can actively learn how the tracked object looks at different orientations and scales, and it does so in real time!

The source code is available on his site. It's in MATLAB, but perhaps there is a Java implementation already done by a community member. I have successfully re-implemented the tracker part of TLD in C#. If I remember correctly, TLD uses Ferns as the keypoint detector. I use either SURF or SIFT instead (already suggested by @stacker) to reacquire the object if it was lost by the tracker. The tracker's feedback makes it easy to build up, over time, a dynamic list of SIFT/SURF templates that enable reacquiring the object with very high precision.

If you're interested in my C# implementation of the tracker, feel free to ask.
Peter Mortensen
9#
Peter Mortensen Reply to 2012-05-19 20:46:12Z
To speed things up, I would take advantage of the fact that you are not asked to find an arbitrary image/object, but specifically one with the Coca-Cola logo. This is significant because this logo is very distinctive, and it should have a characteristic, scale-invariant signature in the frequency domain, particularly in the red channel of RGB. That is to say, the alternating pattern of red-to-white-to-red encountered by a horizontal scan line (trained on a horizontally aligned logo) will have a distinctive "rhythm" as it passes through the central axis of the logo. That rhythm will "speed up" or "slow down" at different scales and orientations, but will remain proportionally equivalent. You could identify/define a few dozen such scan lines, both horizontally and vertically through the logo and several more diagonally, in a starburst pattern. Call these the "signature scan lines."

Searching for this signature in the target image is a simple matter of scanning the image in horizontal strips. Look for a high frequency in the red channel (indicating a move from a red region to a white one), and once found, see if it is followed by one of the frequency rhythms identified in the training session. Once a match is found, you will instantly know the scan line's orientation and location in the logo (if you keep track of those things during training), so identifying the boundaries of the logo from there is trivial.

I would be surprised if this weren't a linearly-efficient algorithm, or nearly so. It obviously doesn't address your can-bottle discrimination, but at least you'll have your logos.

(Update: for bottle recognition I would look for coke (the brown liquid) adjacent to the logo, that is, inside the bottle. Or, in the case of an empty bottle, I would look for a cap, which will always have the same basic shape, size, and distance from the logo and will typically be all white or red. Search for a solid-color elliptical shape where a cap should be, relative to the logo. Not foolproof of course, but your goal here should be to find the easy ones fast.)

(It's been a few years since my image processing days, so I kept this suggestion high-level and conceptual. I think it might slightly approximate how a human eye might operate, or at least how my brain does!)
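One way to make the "rhythm" idea concrete is to run-length encode a binarized scan line (1 = red, 0 = white) and compare run lengths as proportions, which do not change under uniform scaling. This is an interpretation rather than the answer's exact method; the function names and the tolerance are hypothetical.

```python
def run_lengths(line):
    """Collapse a binary scan line into [value, length] runs."""
    runs = []
    for v in line:
        if runs and runs[-1][0] == v:
            runs[-1][1] += 1
        else:
            runs.append([v, 1])
    return runs


def rhythm(runs):
    """Run lengths as fractions of the total span, so the pattern is scale-invariant."""
    total = sum(n for _, n in runs)
    return [n / total for _, n in runs]


def find_signature(line, signature_line, tol=0.1):
    """Return the pixel offset where the signature's rhythm occurs in line, else None."""
    sig_runs = run_lengths(signature_line)
    target = rhythm(sig_runs)
    runs = run_lengths(line)
    k = len(sig_runs)
    offset = 0
    for i in range(len(runs) - k + 1):
        window = runs[i:i + k]
        same_colours = all(w[0] == s[0] for w, s in zip(window, sig_runs))
        if same_colours and all(abs(a - b) <= tol
                                for a, b in zip(rhythm(window), target)):
            return offset
        offset += runs[i][1]
    return None
```

Each scan line is processed in a single pass plus a sliding comparison over its runs, which is in the spirit of the near-linear cost suggested above. One limitation of this sketch: if the logo touches a same-coloured region, the first or last run grows and the proportions no longer match.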
Abid Rahman K
10#
Abid Rahman K Reply to 2017-10-27 06:13:51Z
Isn't it difficult even for humans to distinguish between a bottle and a can in the second image (provided the transparent region of the bottle is hidden)?

They are almost the same except for a very small region (that is, the width at the top of the can is a little smaller, while the wrapper of the bottle is the same width throughout, but that's a minor change, right?).

The first thing that came to my mind was to check for the red top of the bottle. But that is still a problem if the bottle has no top, or if it is partially hidden (as mentioned above).

The second thing I thought of was the transparency of the bottle. OpenCV has some work on finding transparent objects in an image. Check the links below.

OpenCV Meeting Notes Minutes 2012-03-19

OpenCV Meeting Notes Minutes 2012-02-28

Particularly look at this to see how accurately they detect glass:

OpenCV Meeting Notes Minutes 2012-04-24

See their implementation result:

They say it is an implementation of the paper "A Geodesic Active Contour Framework for Finding Glass" by K. McHenry and J. Ponce, CVPR 2006. (Download paper.)

It might help a little in your case, but the problem arises again if the bottle is filled.

So I think you can search for the transparent body of the bottles first, or for a red region connected laterally to two transparent objects, which is obviously the bottle. (When working ideally, an image as follows.)

Now you can remove the yellow region, that is, the label of the bottle, and run your algorithm to find the can.

Anyway, this solution has its own problems, like the other solutions.

It works only if your bottle is empty. If the bottle is full, you will have to search for the red region between the two black regions (if the Coca-Cola liquid appears black).

Another problem arises if the transparent part is covered.

But anyway, if none of the above problems occur in the pictures, this seems to be a better way.
techExplorer
11#
I'm not familiar with OpenCV, but looking at the problem logically, I think you could differentiate between bottle and can by changing the image you are looking for, i.e. the Coca-Cola logo. You should extend the template to include the top portion of the can: on a can there is a silver lining above the Coca-Cola label, and on a bottle there is no such silver lining. Obviously this algorithm will fail when the top of the can is hidden, but in that case even a human will not be able to differentiate between the two (if only the Coca-Cola portion of the bottle/can is visible).
Community
12#
Guilherme Defreitas
13#
Guilherme Defreitas Reply to 2013-01-03 15:26:41Z
There are a bunch of color descriptors used to recognise objects; the paper below compares a lot of them. They are especially powerful when combined with SIFT or SURF. SURF or SIFT alone are not very useful on a Coca-Cola can image because they don't find many interest points, so you need the color information to help. I used BIC (Border/Interior Pixel Classification) with SURF in a project and it worked great for recognising objects.

Color descriptors for Web image retrieval: a comparative study
aaronsnoswell
14#
I like your question, regardless of whether it's off topic or not :P

An interesting aside: I've just completed a subject in my degree where we covered robotics and computer vision. Our project for the semester was incredibly similar to the one you describe.

We had to develop a robot that used an Xbox Kinect to detect Coke bottles and cans in any orientation, in a variety of lighting and environmental conditions. Our solution involved using a band-pass filter on the hue channel in combination with the Hough circle transform. We were able to constrain the environment a bit (we could choose where and how to position the robot and Kinect sensor); otherwise we were going to use the SIFT or SURF transforms.

You can read about our approach on my blog post on the topic :)
Darien Pardinas
15#
Darien Pardinas Reply to 2013-09-24 14:14:11Z
I like the challenge and wanted to give an answer which, I think, solves the issue.

• Extract features (keypoints and descriptors such as SIFT or SURF) of the logo.
• Match the points with a model image of the logo (using a matcher such as Brute Force).
• Estimate the coordinates of the rigid body (PnP problem, SolvePnP).
• Estimate the cap position according to the rigid body.
• Do back-projection and calculate the image pixel position (ROI) of the cap of the bottle (I assume you have the intrinsic parameters of the camera).
• Check with a method whether the cap is there or not. If it is, then this is the bottle.

Detection of the cap is another issue. It can be either complicated or simple. If I were you, I would simply check the color histogram in the ROI for a simple decision.

Please give feedback if I am wrong. Thanks.
Deji
16#
17#
As an alternative to all these nice solutions, you can train your own classifier and make your application robust to errors. For example, you can use Haar training, providing a good number of positive and negative images of your target. It can be useful for extracting only cans, and it can be combined with the detection of transparent objects.
Semih Korkmaz
18#
Semih Korkmaz Reply to 2017-05-08 19:27:00Z
### Deep Learning

Gather at least a few hundred images containing cola cans and annotate the bounding boxes around them as the positive class. Include cola bottles and other cola products, as well as random objects, and label them as negative classes.

Unless you collect a very large dataset, use the trick of applying deep learning features to a small dataset, ideally combining Support Vector Machines (SVM) with deep neural nets.

Once you feed the images to a previously trained deep learning model (e.g. GoogLeNet), instead of using the neural network's decision (final) layer to do the classification, use the output of earlier layer(s) as features to train your classifier.

OpenCV and GoogLeNet: http://docs.opencv.org/trunk/d5/de7/tutorial_dnn_googlenet.html

OpenCV and SVM: http://docs.opencv.org/2.4/doc/tutorials/ml/introduction_to_svm/introduction_to_svm.html
Ken
19#
If you are interested in it being real time, then what you need is a pre-processing filter that determines what gets scanned with the heavy-duty stuff. A good, fast, very real-time pre-processing filter, one that lets you scan the things most likely to be a Coca-Cola can before moving on to more iffy things, is something like this: search the image for the biggest patches of color that are within a certain tolerance of the sqrt(pow(red,2) + pow(blue,2) + pow(green,2)) of your Coca-Cola can. Start with a very strict color tolerance, and work your way down to more lenient color tolerances. Then, when your robot runs out of the time allotted to process the current frame, it uses the bottles found so far for your purposes. Please note that you will have to tweak the RGB colors in the sqrt(pow(red,2) + pow(blue,2) + pow(green,2)) to get them just right.

Also, this is gonna seem really dumb, but did you make sure to turn on -Ofast compiler optimizations when you compiled your C code?
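A rough sketch of the strict-to-lenient colour filter. It reads the sqrt(...) expression above as a Euclidean distance in RGB between each pixel and a reference can colour, which is an assumption, and the tolerance values and function names are placeholders. Grouping the hits into "biggest patches" (connected components) is left out.

```python
import math


def colour_distance(a, b):
    """Euclidean distance between two (r, g, b) colours."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def candidate_tiers(pixels, target, tolerances=(20.0, 60.0, 120.0)):
    """Yield (tolerance, new_hits) from strictest to most lenient tolerance.

    Each pixel is reported once, in the first tier that accepts it, so a caller
    can stop consuming the generator when its per-frame time budget runs out.
    """
    seen = set()
    for tol in tolerances:
        hits = [(y, x)
                for y, row in enumerate(pixels)
                for x, p in enumerate(row)
                if (y, x) not in seen and colour_distance(p, target) <= tol]
        seen.update(hits)
        yield tol, hits
```

Because the tiers come from a generator, the caller can check a deadline between tiers and pass whatever has been found so far to the expensive detector.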
Nuelsian
20#
The answers on this page really amount to:

• "use SIFT"
• "use a Kinect"

If you're not interested in the actual computer science of image recognition, and you just want to "use" something (like SIFT or Kinect), it is ubiquitous today to just use the commonly-available image recognition systems.

As of 2017 and for years now, image recognition is widely and trivially available.

You would no more sit down and (try to) achieve image recognition from scratch, than you would sit down and start gathering and displaying maps, or that you would start rendering HTML from scratch, or write an SQL database from scratch.

You just use Google's tensorflow (they have reached the point of building chips, for goodness sake, to process tensorflow faster), Clarifai, Bluemix or whatever.

AWS just released a good one for image recognition (2018).

For example, to use any of these services it's a few lines of code:

```swift
func isItACokeCan() {

    jds.headers = ["Accept-Language":"en"]
    let h = JustOf ...use your favorite http library

    let u: String =
        "https://gateway-a.watsonplatform.net/visual-recognition/api/v3/classify"
        + "?api_key= ... your API key ..."
        + "&version=2016-05-20"
        + "&classifier_ids= ... your Classifier name ..."

    h.post(
        u,
        files: ["x.jpeg": .data("x.jpeg", liveImageData!, "image/jpeg")]
    ) { r in
        if r.ok {
            DispatchQueue.main.async { self.processResult(r.json) }
        } else {
            DispatchQueue.main.async { self.doResults("network woe?") }
        }
    }
}

func processResult(_ rr: Any?) {
    let json = JSON(rr!)
    print("\(json)")
}
```

That will literally give you the best, existing, coke-can-recognition on Earth, presently achieved.

As of 2018, you can no more sit down and "write better coke-can recognition than Bluemix", than you could "sit down and write a better Go program than AlphaGo".

Systems such as Siri, Google Maps, BAAS, the major image processing endeavours, and obviously Google text search itself, are game-changing.

Notice the incredible difference just since this question was asked six years ago.
By all means, if you're into the actual computer science of image recognition, go for it. But this QA seems to be more of a review of technology.

Inasmuch as the answers here are saying "use a SIFT library", you really wouldn't do that. (Again, no more than you would, for some reason, laboriously program a web server or SQL database from scratch!) You just connect up to the well-known, ubiquitous image recognition "BAAS" systems. It's a line of code.