Flickr: The Help Forum: cornell univ.'s database of flickr images

Just stumbled across this news item: "analysis of Flickr photos could lead to online travel books." Cornell has apparently created a two-terabyte database of Flickr photos and metadata, harvested using the Flickr API. From the paper: Our dataset was collected by downloading images and photo metadata from Flickr.com using the site's public API.

Our goal was to retrieve as large and unbiased a sample of geotagged photos as possible.

To do this, we first sample a photo id uniformly at random from the space of Flickr photo id numbers, look up the corresponding photographer, and download all the geotagged photos (if any) of that initial user.

For each photo we download metadata (textual tags, date and time taken, geolocation) and the image itself. We then crawl the graph of contacts starting from this user, downloading all the geotagged photos.

We repeat the entire process for another randomly selected photo id number, keeping track of users who have already been processed so that their photos and contact lists are not re-crawled. This crawl was performed during a six-month period in the summer and fall of 2008.

In total we retrieved 60,742,971 photos taken by 490,048 Flickr users.

I realize that as "academic research"*, this probably falls under fair use.

Nevertheless, I'm not sure I'm comfortable with the idea of my ARR (All Rights Reserved) photos existing in alternate databases, apparently with Flickr's blessing.

* Funded in part by the National Science Foundation (NSF), Google, Yahoo!, and the John D. and Catherine T. MacArthur Foundation. The CAC is supported by Cornell, the NSF, the Department of Defense, the Department of Agriculture and members of its corporate program.

Maybe it's finally time to hide from API searches.
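The sampling-and-crawl procedure quoted from the paper in the first post can be sketched as a walk over the contact graph that restarts from a fresh random photo id each time, with a shared set of already-processed users so nobody is re-crawled. This is a minimal sketch, not the authors' code: the in-memory `PHOTO_OWNER`, `GEOTAGGED`, and `CONTACTS` tables are hypothetical stand-ins for Flickr API lookups, and the breadth-first order is an assumption (the paper only says the contact graph was crawled).

```python
import random
from collections import deque

# Hypothetical stand-ins for Flickr API lookups (not real API calls).
PHOTO_OWNER = {101: "alice", 102: "bob", 205: "carol", 310: "dave"}  # photo id -> owner
GEOTAGGED = {  # owner -> that user's geotagged photos with metadata
    "alice": [{"id": 101, "tags": ["bridge"], "lat": 40.7, "lon": -74.0}],
    "bob": [],
    "carol": [{"id": 205, "tags": ["falls"], "lat": 43.1, "lon": -79.1}],
    "dave": [{"id": 310, "tags": ["lake"], "lat": 40.8, "lon": -111.9}],
}
CONTACTS = {"alice": ["bob", "carol"], "bob": ["alice"], "carol": [], "dave": ["alice"]}


def crawl(num_seeds, max_photo_id, rng):
    """Sample random seed photos, then crawl each seed owner's contact graph.

    rng is any object with a randint(a, b) method, e.g. random.Random(seed).
    """
    processed = set()  # users whose photos and contacts were already fetched
    photos = []
    for _ in range(num_seeds):
        # Sample a photo id uniformly at random from the id space.
        seed_id = rng.randint(1, max_photo_id)
        owner = PHOTO_OWNER.get(seed_id)
        if owner is None:
            continue  # id maps to no photo; move on to the next draw
        queue = deque([owner])
        while queue:
            user = queue.popleft()
            if user in processed:
                continue  # never re-crawl a user reached earlier
            processed.add(user)
            photos.extend(GEOTAGGED.get(user, []))  # download geotagged photos
            queue.extend(c for c in CONTACTS.get(user, []) if c not in processed)
    return photos, processed


photos, users = crawl(num_seeds=10, max_photo_id=400, rng=random.Random(42))
```

Because `processed` is shared across seeds, a user reached from two different seeds is downloaded only once, which matches the paper's note about keeping track of users who have already been processed.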

Hmm, I saw that the other day, but didn't read the entire paper to see that they'd downloaded all the photos.

Academic use is generally Fair Use, but I suppose I do question what happens to those photos in that database now.

Two other interesting bits... They have a KML file with selected photos they've downloaded, for anyone's use in Google Earth (www.cs.cornell.edu/~crandall/photomap/). And when you view this file, you can choose to see more images.

Here's a sample page for Salt Lake City: www.cs.cornell.edu/~crandall/maps/random/random_444.html

Interestingly, all of those thumbs are hosted at Flickr, but none of them are linked back to the Flickr photo pages; instead they link to the static image on Flickr.

These are the medium sizes, so now all attribution and most EXIF data is lost. All for the good of research, eh?

I think the point of the research is developing real-world applications (otherwise why would Yahoo even be involved?). So you could argue that the long-term goal of this is commercial applications, which would violate the API terms. Of course, not linking back to the Flickr photo pages violates the terms, too.

Guess I naively presumed that yahoo, google, and other enterprises would be developing their commercial applications using their own databases.

Didn't occur to me that they'd want to farm the data and research out to third parties.

Sigh.

Cornell must abide by the Flickr API TOS and remove your photos from their database IMMEDIATELY upon request, or Flickr can revoke their API access. I agree, though: it just takes one unscrupulous person with access to this dataset making their own copy and there's a Niagara Falls-sized hole for your photos to flow through.

The techniques developed in this paper could be quite useful in photo management and organization applications.

For example, the geo-classification method we propose could allow photo management systems like Flickr to automatically suggest geotags, significantly reducing the labor involved in adding geolocation annotations. That'd be neat.

Is Flickr planning on using this research to do that?

;-)

"The scalability of our methods allows for automatically mining the information latent in very large sets of images; for instance, Figures 2 and 3 raise the intriguing possibility of an online travel guidebook that could automatically identify the best sites to visit on your next vacation, as judged by the collective wisdom of the world's photographers."

Which, of course, implies being able to use a database as comprehensive as Flickr's to build a commercial travel site. I just wish they'd linked the photos back to the photo pages.

Yeah, that kind of has a bad pong about it all.

I just checked that I'm opted out of that, that's for sure. It's times like this I wish I had a 'contacts only' privacy level back like we used to...

Guess I naively presumed that yahoo, google, and other enterprises would be developing their commercial applications using their own databases. I'm assuming the goal of monetizing flickr is finally being met.

Hi dbthayer, thanks for bringing this to our attention.

We are taking a look at this.

Interesting article with bigger images of some of the "heat maps" generated by the raw data: gizmodo.com/5232419/35 -flickr-photos-mapped

Of note, when it comes to using "crowd wisdom" to generate maps of "interest": the Apple store in Manhattan is the fifth most photographed place in NYC, beating out the Statue of Liberty. Sigh.

The research is interesting. Baa baa, bleat.

;-)

Brock, how do you opt out of that?

The second checkbox here (www.flickr.com/account/prefs/optout/?from=privacy). But based on the study, it seems they also went contact hopping.

The 2nd box in the setting above only hides your photos from API searches.

To hide your profile (to hide the ability to find your photos via your username), you also have to hide your profile from searches, which is the 3rd box on that page.

But then, no one will be able to find you in a Flickr member search, either.

Glad I decided a long time ago to *not* geotag my stuff...

Edit: "It's times like this I wished I had a 'contacts only' privacy level back like we used to..." I apparently joined after that was scrapped, but I've been pro that idea for a long time.

Yeah, it was really, really early on.

I didn't get the complexity behind it, at the time (which was one of the reasons it was scrapped, I think) but there was a level for that for a short while.

But I think it was only for part of 2004 (maybe only the first few months?). The number of people whose heads get blown up by the existing system kind of supports that, but it's the kind of level I'd choose for most of my stuff now, if it were an option.

FlyButtafly: geo privacy options exist.

I do not know whether they would be effective against what was done here.

For now it's just research and doesn't look like some diabolical plot to commercialize pics.

If that happens, whoever does it is bound to abide by the licensing anyway. The Flickr paranoia is a bit laughable; after all, you are posting pics to the internet, geotagged and free.

Personally, I would invite Google Earth or similar to mine my pics and use them automatically for their content (provided they linked back).

Kleinber@cs.cornell.edu is the professor's email, if you feel like writing to him.

More information here: www.cs.cornell.edu/home/kleinber/#contact

I've found one of my pictures so far, not that you'd know it's mine (it links back to the static medium image).

Actually, David Crandall is the first author on the paper, so he'd be the best person to contact about the project.

Not sure I've made up my mind about this yet. I'm certainly not against fair use for academic purposes. Mostly, I'm just amazed at the audacity of these guys.

Huevos grandes (a lot of nerve), to create their own Flickr database alternate universe.

Without the slightest glimmer of awareness that some Flickr users might react negatively, or have concerns about the security of their intellectual property (now residing in Ithaca), or care about little things like linkbacks... But they're probably harmless enough, even if they're clueless about public relations. And the way that Flickr (and Yahoo, with the BOSS API) passes out API keys doesn't thrill me either.

"But they're probably harmless enough, even if they're clueless about public relations." Except that the scholars associated with this research specialize in social media and social networks.

And their scholarly research is protected by copyright, meaning that to access most academic papers or journals, one must either pay for the privilege of reading them or be affiliated with an academic institution.

And forget about disseminating their research without permission. In this case, the paper was publicly published, but what of the database?

If indeed they downloaded Flickr photos to their servers, who has access to them and how is that access controlled?

Is any attribution or copyright information still associated with this database of images, or is it as sloppy as those pages and that kml file, with little to no EXIF data or attribution or source information beyond the static url of the Flickr images? I'm in favor of academic research and Fair Use as it applies to it, but it seems to me that this is one of those new frontiers (when else has anyone been privy to such a grand scale of creative works?) that hasn't fully been thought through.

"In this case, the paper was publicly published." Er. OK.

Well, I only meant it's refreshing to have access to an academic paper, instead of it being behind a wall.

Any word from Flickr on this?

Eric in SF: Not over the weekend.

The comment that they would look into it was last Thursday. Has Flickr stopped checking the forums and posting replies outside of west coast work hours?

That's not always been the case.

More Flickr staff were laid off last week.

Whatever the weekend staff level is, I suspect it's less than last week.

Brock: "It's times like this I wished I had a 'contacts only' privacy level back like we used to..." I think we still do.

I have one contact, and when I view his/her pictures it has a yellow square with a dot in it (like the green and red ones for public and private) and says something like "Only contacts can see this photo."

That's Friends/Family.

There is no Contact level privacy.

Recently, every time I update one of my websites, I get a flurry of accesses from the Amazon EC2 servers.

Now, it might be a spider, or it might be some scraper or data miner, but whatever it is, it has a habit of trying to access the page editing features, which is a bit anti-social, especially as the robots.txt says don't go there. So I've blocked all the EC2 servers.

Fuck 'em one and all. Now what about the ability to blacklist on Flickr?
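The robots.txt complaint above hinges on crawlers voluntarily checking that file before fetching a page. As a rough illustration of what a well-behaved crawler does, here is a minimal sketch using Python's standard `urllib.robotparser` module. The site's actual robots.txt isn't shown in the post, so the `Disallow: /edit/` rule, the example.com URLs, and the `ExampleBot` user agent are all hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that keeps all crawlers out of a page-editing area.
ROBOTS_TXT = """\
User-agent: *
Disallow: /edit/
"""


def is_allowed(url: str, user_agent: str = "ExampleBot") -> bool:
    """Return True if the robots.txt rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() accepts the file's lines directly, so no network fetch is needed.
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


print(is_allowed("http://www.example.com/index.html"))  # True
print(is_allowed("http://www.example.com/edit/page"))   # False
```

This only helps when the crawler cooperates: robots.txt is advisory, which is why blocking the offending address ranges outright, as the poster did, is the stronger measure.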

Colleen - not to go too off-topic, but how did you know about the Flickr layoffs?

They've not made the common news but if it was buried in a small notice somewhere I could have missed it.

A couple of them were buy-outs, not layoffs, but it made the press last week when it happened:
geobloggers.com/2009/04/29/on-my-last-day-at-flickr/
gigaom.com/2009/04/29/flickr-hit-hard-by-yahoo-layoffs/
latimesblogs.latimes.com/technology/2009/05/around-the-we...
www.webpronews.com/blogtalk/2009/05/01/flickr-is-target-o...