![]() |
Histogram Intersection Distances
I am working with two color (rgb) histograms which are divided into 4 bins. I am trying to find the similarity/difference between the two histograms. I have done some research and found papers which contain formulas which show how this is done. However because of my subpar math skills, I am unable to visualize how I would actually go about doing this in my code.
In my code the histograms are represented by a three dimensional array which contains 4 elements in each dimension.(I don't know if im saying this correctly):( Here are the formulas I have found: Histogram euclidean distance http://scien.stanford.edu/class/psyc...g/index.15.gif Histogram intersection distance http://scien.stanford.edu/class/psyc...g/index.16.gif I am not asking anyone to do anything but give a hint or two about how I would go about implementing these formulas. From what I understand the formula for Euclidean distance says that I should square each r,g,b value and then subtract them from each other, and this is supposed to give me the distance squared. The second equation is actually the one I am more interested in, but I don't understand it very well.:eek: Thanks in advance. |
You probably should mention the original website, since it gives some explanation of the variables in the equation. A, B and C represent the three colour channels in the image (typically RGB or HSV), and h and g represent the two histograms.
The first equation is simple enough. In pseudo-code, it would look something like this: :
sum = 0Essentially, this sums the square of the distance between every combination of bins. This is akin to the standard deviation, or the distance between two points, and it's easy to see why this would be used to compare histograms. The second equation is more curious: :
minsum = 0Quote:
I checked around, and found this formula: http://www.eng.tau.ac.il/~hayit/553....2/image7SP.JPG Which more clearly shows that yes, it is the height by the width (at least if you take a histogram of all pixels, which I figure you want to do). Or, to put it another way, the sum of all the bins in the histogram (which amounts to the same thing). Now, if this is the case it's not technically a distance algorithm. But other sites refer to it as the "histogram intersection" rather than the "histogram intersection distance", which makes more sense. The theory behind this took me a while to figure out. If the two images are almost identical, then the sum of the minimums will be almost the same as the sum of the smallest of the two histograms. This means that 1.0 signifies identity, and the larger the deviation from that, the more dissimilar the image. To give you an example with a single dimensional histogram: :
h = [16, 45, 23, 1]:
>>> def intersection(h, g): |
Thanks for clearing it up for me. I will try to implement it in the program when I get back from school.
|
You may find it advantageous to apply a lossy compression (as in jpeg) to the images before comparison, to reduce the number of highly-similar colors.
|
Would that matter, if the colours were all being placed in discrete bins anyway?
Edit: Returning to the problem of proving the histogram's range, I know now that the histogram intersection will always be between 0 and 1, with 1 being an exact match and 0 being a complete failure. This is because the sum of the minimums (at the top of the divisor) is never greater than the minimum sum (at the bottom of the divisor). |
It depends on what one is shooting for. If an exact match is desired, one wouldn't want to degrade the image. Humans can't perceive all the possible colors, so lumping them into perceivable colors would exaggerate the effects and, I believe, introduce some semantic content that is otherwise missed. I think this would enhance a search for similarity, versus exactitude. Just musings, no supportive evidence.
|
A histogram lumps together sets of colours anyway. Only in cases when a pixel's color is pushed over the boundry between sets would it make a difference to the histogram, which is likely to be a relatively rare occurance if the number of distinct sets is small.
|
A histogram lumps colors if you build it to do so. I'm suggesting that's a Good Thang to do.
|
Thats what I have it doing the range of each bin is 64 colors. This is just for now. Later on I will try to change the sizes of bins to see what works better.
|
That sounds reasonable. Still, I'm just shooting in the dark, here, random thoughts as I read your posts.
|
| All times are GMT -5. The time now is 1:47 AM. |
Powered by vBulletin® Version 3.7.0, Copyright ©2000 - 2008, Jelsoft Enterprises Ltd.
Copyright ©2007 DaniWeb® LLC