Understanding an image and concisely articulating its content in words is a key cognitive task at which humans excel. We are building a cognitive system that, given an image, automatically summarizes its salient content in a few descriptive sentences. The system aims to produce descriptions at different levels of granularity (from generic, concise captions to very specific, detailed textual descriptions) and abstraction (from visual denotations to contextual connotations). This exploratory research project goes beyond simple object recognition and aims to build a cross-modal image-to-text translation system with many interesting potential applications.