Kanav Kahol Website

Content Based Image Retrieval

Research Synopsis

This research was a collaboration with Dr. John Black at ASU. It centered on the development of perceptually sound, content-based image retrieval systems.

Research Summary

During the 1990s (which the US Congress designated the “Decade of the Brain”), tremendous advances were made in the study of the human brain, as well as human perception and cognition. The result has been a new interdisciplinary research area called Cognitive Science. This new science is based on findings from a wide range of scientific disciplines, including cytology, neuroscience, medicine, philosophy, linguistics, anthropology, and clinical psychology.

The next logical step would be to apply the knowledge gained by cognitive scientists to the design of computer-based systems that process information in a manner that mimics the human mind. This approach (which has been called cognitive computing) employs a collection of biologically inspired processing techniques to provide a form of “soft computing” that can handle the imprecision, uncertainty, and partial truth of the real world.

The performance of content-based image retrieval using low-level visual content has largely been judged to be unsatisfactory. Perceived performance could probably be improved if retrieval were based on higher-level content. However, researchers have not been very successful in bridging what is now called the "semantic gap" between low-level content detectors and higher-level visual content. We propose a novel "top-down" approach to bridging this semantic gap. A list of primitive words (which we call "lexical basis functions") is selected from a lexicon of the English language and is used to characterize the higher-level content of natural outdoor images. Visual similarity between a pair of images is then "computed" based on the degree of similarity between their respective word lists. These "computed" similarities are then shown to correlate with subjectively perceived similarities between pairs of images. This demonstrates that the chosen set of lexical basis functions is able to characterize the multidimensional content (and similarity) of these image pairs in a manner that parallels their subjectively perceived content (and similarity). If a retrieval system could be designed to automatically detect the visual content represented by these basis functions, it could compute a similarity measure that would correlate with human subjective similarity rankings.
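To make the comparison between "computed" and subjectively perceived similarities concrete, here is a minimal sketch in Python. It assumes the Pearson correlation coefficient as the measure of agreement, which is our choice for illustration; the text above does not state which correlation measure was used. The function name `pearson` and all numeric values are hypothetical and are not data from the study.

```python
from math import sqrt
from typing import Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation between two equal-length sequences of scores."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Made-up illustrative scores for four image pairs (NOT results from the study):
# similarity computed from word lists, and similarity rated by an observer.
computed = [3.0, 1.0, 0.0, 2.0]
subjective = [0.8, 0.4, 0.1, 0.7]

r = pearson(computed, subjective)
print(f"correlation between computed and subjective similarity: {r:.3f}")
```

A value of r near 1 would indicate that image pairs judged similar by observers also receive high computed similarity. Because the text speaks of similarity rankings, a rank-based coefficient could be substituted if only the ordering of pairs matters.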

As a sample, consider the following image:

Figure 1. Sample Image

Given below are 98 words that form a feature set against which this image can be evaluated.

For example, observer A feels this image is cloudy, wooded, watery, muddy, and natural.

We can evaluate an image against this list and find a representative set of words for it. Converting an image to words helps characterize its content. To measure the similarity of two images, we can take the dot product of their lexical basis function vectors. This similarity measure lines up well with subjective similarity measures.
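The dot-product comparison can be sketched in a few lines of Python. This is only an illustration under stated assumptions: each image is encoded as a binary vector (1 if the observer chose the word, 0 otherwise), and `LEXICON` is a hypothetical eight-word stand-in for the 98-word list. The words for the first image are those reported by observer A; the other two word lists, and the names `to_vector` and `dot_similarity`, are made up for the example.

```python
from typing import Iterable, List, Sequence

# Hypothetical mini-lexicon standing in for the full list of 98 words.
LEXICON: List[str] = [
    "cloudy", "wooded", "watery", "muddy", "natural", "snowy", "rocky", "sunny",
]


def to_vector(words: Iterable[str], lexicon: Sequence[str] = LEXICON) -> List[int]:
    """Encode a word list as a binary vector over the lexicon (assumed encoding)."""
    chosen = set(words)
    return [1 if word in chosen else 0 for word in lexicon]


def dot_similarity(a: Sequence[int], b: Sequence[int]) -> int:
    """Dot product of two vectors; with binary vectors this counts shared words."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(x * y for x, y in zip(a, b))


# Observer A's words for the sample image (from the text above).
image_a = to_vector(["cloudy", "wooded", "watery", "muddy", "natural"])
# Two made-up images for comparison.
image_b = to_vector(["cloudy", "snowy", "rocky", "natural"])
image_c = to_vector(["sunny", "rocky"])

sim_ab = dot_similarity(image_a, image_b)  # shares "cloudy" and "natural"
sim_ac = dot_similarity(image_a, image_c)  # shares no words
print(sim_ab, sim_ac)
```

With a binary encoding the dot product simply counts the words two images have in common. If observers instead rated each word on a graded scale, the same function would apply to real-valued vectors, and normalizing by vector length would keep images with long word lists from dominating.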

Some of my publications give deeper insight into the approach. We are applying the same approach to face recognition as well as motion.

We have extended the study to find clusters in images, and the results have been very encouraging. Given below is a sample cluster:

Snowy:

 

Publications

J Black, K Kahol, P Tripathi, S Panchanathan, "Indexing natural images for retrieval based on Kansei", accepted for publication at Human Vision and Electronic Imaging Conference 2004

S Panchanathan, J Black, P Tripathi, K Kahol, "Cognitive Multimedia Computing", published in IEEE International Symposium on Information Science and Electrical Engineering 2003 (ISEE 2003), November 13-15, 2003, ACROS Fukuoka, Fukuoka, Japan

J Black, K Kahol, P Tripathi, S Panchanathan, "Visual Concept Derivation from Natural Scenery Images Using Lexical Basis Functions, Multidimensional Scaling, and Density Clustering", accepted for publication at IEEE Indian International Conference on Artificial Intelligence 2003

J Black, K Kahol, G Fahmy, P Kuchi, S Panchanathan, "Characterizing the high-level content of natural images using lexical basis functions", accepted at Human Vision and Electronic Imaging Conference SPIE 2003, Santa Clara

J Black, K Kahol, P Kuchi, S Panchanathan, "The use of lexical basis functions to characterize faces, and to measure their perceived similarity", ICONIP 2002, Singapore

CUbiC | ASU | ©2005 Kanav Kahol