What Makes Paris Look like Paris?

Carl Doersch 1

Saurabh Singh 1

Abhinav Gupta 1

Josef Sivic 2

Alexei A. Efros 1, 2

1 Carnegie Mellon University

2 INRIA / Ecole Normale Supérieure, Paris

Figure 1: These two photos might seem nondescript, but each contains hints about which city it might belong to. Given a large image database of a given city, our algorithm is able to automatically discover the geographically-informative elements (patch clusters to the right of each photo) that help in capturing its “look and feel”. On the left, the emblematic street sign, a balustrade window, and the balcony support are all very indicative of Paris, while on the right, the neoclassical columned entryway sporting a balcony, a Victorian window, and, of course, the cast iron railing are very much features of London.


1 Introduction

Consider the two photographs in Figure 1, both downloaded from Google Street View. One comes from Paris, the other one from London. Can you tell which is which? Surprisingly, even for these nondescript street scenes, people who have been to Europe tend to do quite well on this task. In an informal survey, we presented 11 subjects with 100 random Street View images of which 50% were from Paris, and the rest from eleven other cities. We instructed the subjects (who have all been to Paris) to try and ignore any text in the photos, and collected their binary forced-choice responses (Paris / Not Paris). On average, subjects were correct 79% of the time (std = 6.3), with chance at 50% (when allowed to scrutinize the text, performance for some subjects went up as high as 90%).

What this suggests is that people are remarkably sensitive to the geographically-informative features within the visual environment. But what are those features? In informal debriefings, our subjects suggested that for most images, a few localized, distinctive elements “immediately gave it away”. E.g. for Paris, things like windows with railings, the particular style of balconies, the distinctive doorways, the traditional blue/green/white street signs, etc. were particularly helpful. Finding those features can be difficult though, since every image can contain more than 25,000 candidate patches, and only a tiny fraction will be truly distinctive.

In this work, we want to find such local geo-informative features automatically, directly from a large database of photographs from a particular place, such as a city. Specifically, given tens of thousands of geo-localized images of some geographic region R, we aim to find a few hundred visual elements that are both: 1) repeating, i.e. they occur often in R, and 2) geographically discriminative, i.e. they occur much more often in R than in R^C.
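
The two criteria above (repeating, and geographically discriminative) can be illustrated with a toy filter over detection counts. This is only a minimal sketch and not the discovery algorithm of this paper: it assumes that each candidate element already comes with a count of images in which it is detected inside R and outside R (reading R^C as the complement of R). The names `ElementStats` and `select_elements`, and the thresholds `min_freq` and `min_ratio`, are hypothetical and chosen purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class ElementStats:
    """Detection counts for one candidate element (hypothetical structure)."""
    name: str
    hits_in_region: int   # images from R in which the element is detected
    hits_outside: int     # images from R^C in which the element is detected


def frequency(hits: int, n_images: int) -> float:
    """Fraction of images in which the element is detected."""
    return hits / n_images


def discriminativeness(stats: ElementStats, n_region: int, n_outside: int,
                       eps: float = 1e-6) -> float:
    """Ratio of detection frequency in R to detection frequency in R^C.

    eps is an assumed smoothing constant that avoids division by zero
    when an element never fires outside R.
    """
    f_in = frequency(stats.hits_in_region, n_region)
    f_out = frequency(stats.hits_outside, n_outside)
    return f_in / (f_out + eps)


def select_elements(candidates, n_region, n_outside,
                    min_freq=0.01, min_ratio=5.0):
    """Keep elements that are both repeating and discriminative.

    min_freq and min_ratio are illustrative thresholds, not values
    taken from the paper.
    """
    kept = [
        c for c in candidates
        if frequency(c.hits_in_region, n_region) >= min_freq
        and discriminativeness(c, n_region, n_outside) >= min_ratio
    ]
    # Most discriminative elements first.
    return sorted(
        kept,
        key=lambda c: discriminativeness(c, n_region, n_outside),
        reverse=True,
    )
```

With made-up counts, an element such as a street sign that fires in 400 of 10,000 images from R but only 10 of 10,000 images elsewhere would pass both tests. A sidewalk patch that fires equally often everywhere would fail the second test, and an element seen only three times in R would fail the first.
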
Figure 1 shows sample output of our algorithm: for each photograph we show three of the most geo-informative visual elements that were automatically discovered. For the Paris scene (left), the street sign, the window with railings, and the balcony support are all flagged as informative. But why is this topic important for modern computer graphics? 1) Scientifically, the goal of understanding which visual elements are fundamental to our perception of a complex visual concept, such as a place, is an interesting and useful one. Our paper shares this motivation with a number of other recent works that don’t actually synthesize new visual imagery, but rather propose ways of finding and visualizing existing image data in better ways, be it selecting candid portraits from a video stream [Fiss et al. 2011], summarizing

Given a large repository of geotagged imagery, we seek to automatically find visual elements, e.g. windows, balconies, and street signs, that are most distinctive for a certain geo-spatial area, for example the city of Paris. This is a tremendously difficult task as the visual features distinguishing architectural elements of different places can be very subtle. In addition, we face a hard search problem: given all possible patches in all images, which of them are both frequently occurring and geographically informative? To address these issues, we propose to use a discriminative clustering approach able to take into account the weak geographic supervision. We show that geographically representative image elements can be discovered automatically from Google Street View imagery in a discriminative manner. We demonstrate that these elements are visually interpretable and perceptually geo-informative. The discovered visual elements can also support a variety of computational geography tasks, such as mapping architectural correspondences and influences within and across cities, finding representative elements at different geo-spatial scales, and geographically-informed image retrieval.

CR Categories: I.3.m [Computer Graphics]: Miscellaneous—Application; I.4.10 [Image Processing and Computer Vision]: Image Representation—Statistical

Keywords: data mining, visual summarization, reference art, big data, computational geography, visual perception


