Flickr: The Help Forum: "noindex" Flickr pages showing in Google Search again [Flickr Case 1073716]
I know there is little flickr can do about that, but i noticed that some of my flickr pages created last week are indexed by google, even though they have the "noindex" metatag.
i reported the problem on the google forum, with an example of such a flickr page that should not have been indexed by google:
www.google.com/support/forum/p/Webmasters/thread?tid=573e343f298204b2&hl=en
maybe flickr engineers could ping the google technical team to let them know that they have a problem, i.e. they index pages that they should not index.
EDIT:
- i fixed the thread title - this is not related to google image, but to the regular google index -
- the problem was confirmed in the google forum, and it was traced to a syntax error in the flickr markup used in the header of all flickr pages.
the syntax error (maybe recently introduced by flickr in all the headers) is in the way the metatag just before the "noindex" one is closed.
The head of that page was cut right at that bad meta tag:
<meta name="viewport" content="width=820" />
The meta tag is bad not because of its name and content, those are irrelevant to Googlebot.
Meta tags are used by user agents that understand them and ignored by others.
But it is closed with /> instead of just >, which is what is needed since the page does not have an xhtml doctype.
Once that / is found, the head is deemed closed.
therefore the following metatag with "noindex" is now ignored by googlebot.
it is also possible that googlebot code changed recently and it is now less resilient to serious syntax errors, i.e. this syntax error causes it to now ignore the rest of the header (including the "noindex" metatag).
|
According to some comments from "Autocrat" on the google forum:
the html validator:
validator.w3.org/check?uri=http%3A%2F%2Fwww.flickr.com%2F...
shows that the markup of flickr pages is very invalid...
The use of...
<meta name="viewport" content="width=820" />
may make it possible that...
<meta name="robots" content="noindex,follow">
is then ignored?
|
As I understand it, the "..." is pretty much just a site's way of saying "This shit is too long to display", and simply not displaying the rest of the URL in the link.
The link will still work if you click on it, unless you copy the truncated link and use that - that won't work...
|
Apparently the issue is that the "viewport" metatag is closed with "/>", instead of ">".
and this syntax error causes google to ignore the "noindex" metatag, and to index all the flickr pages that should NOT be indexed.
|
I am forwarding another comment from the google forum:
Quote:
The head of that page was cut right at that bad meta tag:
<meta name="viewport" content="width=820" />
The meta tag is bad not because of its name and content, those are irrelevant to Googlebot.
Meta tags are used by user agents that understand them and ignored by others.
But it is closed with /> instead of just >, which is what is needed since the page does not have an xhtml doctype.
Once that / is found, the head is deemed closed.
If you notice, the validator says about the next line:
Line 10, Column 45: document type does not allow element "META" here .
<meta name="robots" content="noindex,follow">
So that is now ignored.
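To make the explanation quoted above concrete, here is a minimal Python sketch of the behaviour it describes. It is a toy model only: the function name and the sample markup constants are made up for illustration, and it is not how Googlebot or the W3C validator is actually implemented. It simply assumes that a strict parser deems the head closed at the first tag written with an XHTML-style "/>" ending and drops everything after it.

```python
import re

# Matches any start tag: group 1 is the tag name, group 2 the rest up to ">".
TAG_RE = re.compile(r"<([a-zA-Z][a-zA-Z0-9]*)\b([^>]*)>")
NAME_RE = re.compile(r'name\s*=\s*"([^"]*)"', re.IGNORECASE)

# Sample head fragments (illustrative; based on the tags quoted in this thread).
BROKEN_HEAD = (
    '<meta http-equiv="imagetoolbar" content="no">\n'
    '<meta name="viewport" content="width=820" />\n'
    '<meta name="robots" content="noindex,follow">\n'
)
FIXED_HEAD = BROKEN_HEAD.replace(" />", ">")


def metas_seen_by_strict_parser(head_markup):
    """Return the names of meta tags still treated as head content.

    Assumption (toy model): the head is deemed closed at the first tag
    that ends in "/>", so any tag after it is ignored.
    """
    seen = []
    for match in TAG_RE.finditer(head_markup):
        tag, rest = match.group(1).lower(), match.group(2)
        if tag == "meta":
            name = NAME_RE.search(rest)
            if name:
                seen.append(name.group(1).lower())
        if rest.rstrip().endswith("/"):
            break  # head deemed closed here; later tags are dropped
    return seen


print(metas_seen_by_strict_parser(BROKEN_HEAD))
print(metas_seen_by_strict_parser(FIXED_HEAD))
```

Under that assumption, the "robots" tag only shows up in the second result, which matches what the validator output above suggests.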
|
If that were true then all pages on flickr would have the same problem.
When I search for my stuff on google I get back things that it can find through other means *mostly* (like through blogspot) but there is the occasional flickr page that gets returned for no reason obvious to me.
But I would believe that in that case it is just a link out in the world I am not aware of that is causing the result.
Like I said, if it were a blatant syntax error, all pages would have the problem.
|
> if that were true then all pages on flickr would have the same problem.
no, i think flickr just introduced the bug in their markup quite recently, so just recent pages were indexed by google while they should not.
in any case, the bug in the flickr markup is true, there is no doubt there, you can just look at the source.
This was confirmed by several people on the google forum that i mentioned in my OP.
and the page i mentioned is in the google index, no doubt about that either, you can check.
and i never changed my privacy setting, and this page has had the "noindex" metatag since it was created.
the problem is that the syntax error in the flickr header prevents some parsers (including googlebot) from actually seeing their noindex metatag.
also, i have a website that has a lot of traffic (and high ranking on google), and that points to my flickr pages, so my new flickr pages are being crawled by googlebot in no time.
That is not the case for most flickr pages, which are not crawled by googlebot very often.
|
I see no recent photos whatsoever when I google "dsphoto- flickr"
|
Please read the last paragraph i added to my previous post.
if your pages have not been crawled since flickr introduced the markup bug (the "/" that should not be there), they will not be in the index.
plus, not all the pages crawled by googlebot make their way into the google index.
i found another of my pages that made its way into the google index, while it should never have:
flickr.com/photos/loupiote/2898187230/meta/in/set-7215760...
this page was created after i configured my flickr account to have "noindex".
So this is not an isolated quirk.
i'm pretty sure the <meta name="viewport" content="width=820" /> tag (with that syntax error) has been added very recently.
i don't remember seeing it a few weeks ago.
|
My flickr pages get crawled almost every day.
www.google.co.uk/search?num=50&hl=en&client=firef...
I somehow doubt I'm that 'special'.
|
I too have a webpage that is indexed by google and has links to my flickr account on the front page.
The search engines used to index the flickr pages no matter what, it was only recently that they stopped returning results (like in the last year).
I read in one of your previous posts in this forum that this was because of some change that flickr made, depending on whether I have the "allow 3rd party api" thingie set here on flickr.
so now if I go to yahoo or google or whatever, all the paths lead to my webpage and possibly to flickr from other places like blogspot, etc if there is a link out there.
Yes, that makes more sense and that still seems to be working for me
|
There is now a case number for this issue: [Flickr Case 1073716]
@Walwyn
you have not configured your flickr account to have the "noindex" metatag, so it is perfectly normal that your flickr pages are indexed by google.
but new pages with "noindex" should not be indexed.
the problem was traced to a syntax error in the way the previous metatag is closed.
it is closed with /> instead of just >, which is what is needed since the page does not have an xhtml doctype.
Once that / is found, the head is deemed closed, i.e. the "noindex" metatag is ignored by google (because of this syntax error that flickr introduced recently).
|
@ dsphoto
> it was only recently that they stopped returning results (like in the last year)
yes, but for the past few days, i.e. since flickr introduced that syntax error in all their headers, your pages can be indexed again, even if the "noindex" is there, because of what i explained above (confirmed on the google forum).
Basically that small innocuous extraneous "/" (a bug recently introduced by flickr) causes the "noindex" to be ignored.
but it causes no problem with browsers, of course.
|
Loupiote (Old Skool)
I well know that my pages don't have 'noindex', my response was to your assertion that dsphoto-'s stream might not have been indexed.
My stream is being indexed daily, I somehow doubt that I'd be being treated differently by the googlebot.
|
@Walwyn
there is nothing wrong in your case.
The syntax error in the header has no effect on you, since in your case there is no "noindex" tag following the syntax error in the flickr header.
So whatever you are trying to say, it is unrelated to the topic at hand :)
flickr streams are not crawled at the same frequency, and there may be a lot of reasons why some streams are crawled more often than others, e.g. the ranking of your flickr pages, the number of people who have links to them, etc.
But again this is off-topic here.
|
Simply that I cannot confirm the bug if there is one at this time, that is all I was saying
|
> simply that I cannot confirm the bug if there is one at this time, that is all I was saying
you can easily confirm the bug:
just go to the google forum thread:
www.google.com/support/forum/p/Webmasters/thread?tid=573e343f298204b2&hl=en
and do the google search indicated there:
"DSC04449 - Kat at the Giant Pillow Fight (San Francisco)"
you will find a flickr photo page (this one) in the google index.
That photo page was created on feb 15th 2009, and it has always had the "noindex".
also, you can use the "view source" of your browser, and you can see the bug with your own eyes in the "header" section of each flickr page.
You should be able to see the syntax error, which is a "/" just before the closing > at the end of the "viewport" metatag.
I can see the syntax error in your pages, too.
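For anyone who would rather not eyeball the source, the same check can be scripted. The sketch below uses Python's standard html.parser module, whose handle_startendtag hook is called for tags written with a "/>" ending. The class name, function name and the PAGE_SOURCE sample are made up for illustration; in practice you would paste in the text you get from "view source".

```python
from html.parser import HTMLParser

# Illustrative sample, shortened from a flickr page head.
PAGE_SOURCE = (
    '<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">\n'
    "<html>\n"
    "<head>\n"
    '<meta http-equiv="imagetoolbar" content="no">\n'
    '<meta name="viewport" content="width=820" />\n'
    '<meta name="robots" content="noindex,follow">\n'
    "</head>\n"
    "</html>\n"
)


class SelfClosedTagFinder(HTMLParser):
    """Collect tags written with an XHTML-style "/>" ending."""

    def __init__(self):
        super().__init__()
        self.found = []  # (line number, tag name, attributes as a dict)

    def handle_startendtag(self, tag, attrs):
        # html.parser calls this hook for tags of the form <tag ... />.
        line, _column = self.getpos()
        self.found.append((line, tag, dict(attrs)))


def find_self_closed_tags(page_source):
    finder = SelfClosedTagFinder()
    finder.feed(page_source)
    finder.close()
    return finder.found


for line, tag, attributes in find_self_closed_tags(PAGE_SOURCE):
    print(f"line {line}: <{tag}> closed with '/>': {attributes}")
```

Note that html.parser is forgiving in the way browsers are, so it only reports the "/>" ending; it does not tell you how a stricter parser would react to it.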
|
I don't think it's a bug, I think it's a markup error.
I think google's robot is inferring the end of the head element when it hits the " />" and the beginning of the body element (where meta is not allowed).
I think that's what's happening with the validator you're using.
This makes a cascade of errors follow, since there is line after line of more markup in the head element that isn't allowed in the body element.
I believe your analysis of the problem is correct and that changing the ending of that tag to a plain ">" will solve the problem.
It should end as an HTML tag, not as XHTML.
|
> if that were true then all pages on flickr would have the same problem.
They may.
That error is repeated on this page, for example.*
Most _browsers_ are very forgiving and just parse the " />" as if it were the closing angle bracket, but apparently google's bot follows HTML doc types strictly.
*<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<title>Flickr: The Help Forum: "noindex" Flickr pages showing in Google Search again [Flickr Case 1073716]</title>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<meta name="keywords" content="photography, digital photography, cameraphones, camera, hobby photography, photo, digital camera, compactflash, smartmedia, cameras, canon, nikon, olympus, fujifilm, video">
<meta name="description" content="Flickr is almost certainly the best online photo management and sharing application in the world. Show off your favorite photos and videos to the world, securely and privately show content to your friends and family, or blog the photos and videos you take with a cameraphone.">
<meta http-equiv="imagetoolbar" content="no">
<meta name="viewport" content="width=820" />
|
@Civilized Explorer
i agree with you, the "bug" is a markup error in the flickr markup, in the header section.
my analysis was in fact the analysis of several experts on the google forum, but i do agree with it and i verified that their analysis is correct.
yes, flickr should correct the markup and they should NOT use XHTML ending in any tag, since they do not include an xhtml doctype.
|
This viewport metatag with the illegal xhtml closing "/>" appears in the header of every flickr page (except for flickr mobile).
<meta name="viewport" content="width=820" />
> Most _browsers_ are very forgiving and just parse the " />" as if it were the closing angle bracket, but apparently google's bot follows HTML doc types strictly.
yes, or at least, it does now (maybe googlebot was more forgiving for those errors in the past).
|
We're looking into that issue, thanks loupiote.
|
I'm not entirely sure what is causing this problem.
The potential cause listed above makes sense at first blush, but doesn't seem to really hold up on further examination.
The majority of the pages on the internet contain invalid markup; if the Googlebot were so easily tripped up, Google Search wouldn't be very useful.
That being said, it's certainly possible that the invalid meta tag is causing this, and as such, I have fixed that bug.
I removed the unnecessary / before the closing > *and* moved the viewport meta tag down below the other tags, so that the noindex tag should come first.
Hopefully Googlebot will respect the noindex tags now.
We'll keep our eye on this problem. Thanks loupiote for all the research!
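As a rough illustration of the two changes described in this post, here is a small Python sketch that applies them to a list of head lines. It is not Flickr's actual code: the function name and sample lines are hypothetical, and it assumes one tag per line.

```python
import re

# Matches an XHTML-style " />" ending at the end of a line.
SELF_CLOSE_RE = re.compile(r"\s*/>\s*$")

# Hypothetical sample: the head lines as they looked before the fix.
BEFORE = [
    '<meta http-equiv="imagetoolbar" content="no">',
    '<meta name="viewport" content="width=820" />',
    '<meta name="robots" content="noindex,follow">',
]


def is_viewport(line):
    return 'name="viewport"' in line


def repair_head(head_lines):
    """Apply both changes: plain ">" endings, and viewport tag moved last."""
    # 1. Use the plain HTML ending ">" instead of " />".
    cleaned = [SELF_CLOSE_RE.sub(">", line) for line in head_lines]
    # 2. Move the viewport tag below the others, keeping their order.
    others = [line for line in cleaned if not is_viewport(line)]
    viewport = [line for line in cleaned if is_viewport(line)]
    return others + viewport


for line in repair_head(BEFORE):
    print(line)
```

With the viewport tag last, the noindex tag comes first even for a parser that gives up on the head early, which is the belt-and-braces reasoning given above.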
|
Thanks!
i'm not entirely sure either what is causing the problem, but there is no question that some of my recently-created flickr pages with "noindex" got into the google index, while they should not have.
there is also no question that the w3c markup validator finds lots of syntax errors in the flickr markup, including the one that i mentioned (and that you have fixed).
I think it's better to have the most correct markup, especially with dynamically-generated pages like on flickr.
> If the Googlebot were so easily tripped up, Google Search wouldn't be very useful.
true. so maybe googlebot is having hiccups and sometimes it just misses the "noindex".
anyway, i'll keep an eye on that, and if i see more of my pages making their way into the google index, i'll let you know.
|
Okay, for someone who knows nothing about computers and whose images are showing up at google a year after I opted out of EVERYTHING I thought I needed to opt out of, can someone explain in plain English for a stupid computer novice like me what the "noindex" means?
Am I supposed to have something in particular as a tag on all my photos to help prompt them not to image them?
Thanks in advance.
|
Disclaimer: I am a linguist, not a computer programming (Information Systems, Computer Science) major.
jeanneg, from what I understand, you do not need to do anything extra.
Opting out of public searches and API searches should prevent most search engines from indexing your photo pages.
When a search engine sees the "noindex" metatag in the HTML page, it skips that page instead of adding it to its index.
I believe flickr automatically adds the "noindex" HTML tag to the photo page.
Or that is how I think the "noindex" metatag is supposed to work.
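For the curious, here is a minimal Python sketch of how a well-behaved crawler might read that tag. It is only an illustration of the idea, not any search engine's real code; the class name, function name and sample pages are made up.

```python
from html.parser import HTMLParser

# Made-up sample pages: one opted out of indexing, one not.
OPTED_OUT_PAGE = (
    "<html><head><title>photo</title>"
    '<meta name="robots" content="noindex,follow">'
    "</head><body></body></html>"
)
NORMAL_PAGE = "<html><head><title>photo</title></head><body></body></html>"


class RobotsMetaReader(HTMLParser):
    """Collect the directives from any <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "meta" and (attributes.get("name") or "").lower() == "robots":
            content = attributes.get("content") or ""
            # "noindex,follow" becomes {"noindex", "follow"}.
            self.directives.update(
                part.strip().lower() for part in content.split(",") if part.strip()
            )


def may_index(page_source):
    """Return False when the page asks not to be indexed."""
    reader = RobotsMetaReader()
    reader.feed(page_source)
    reader.close()
    return "noindex" not in reader.directives


print(may_index(OPTED_OUT_PAGE))
print(may_index(NORMAL_PAGE))
```

The "follow" part of "noindex,follow" means the crawler may still follow the links on the page, even though the page itself should stay out of the index.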
|
Yes, when you opt-out, Flickr adds that data to the page.
To be clear, the opt-out in question is the 2nd box on this page:
www.flickr.com/account/prefs/optout/?from=privacy
However, it can take a very long time for your pages to drop out of a search engine's cache if those pages were ever crawled.
It all depends on when that search engine gets around to re-crawling your pages when you make that change.