Yesterday we looked at how to use the Robots Exclusion Standard (your “robots.txt” file, to you and me) to prevent spiders from visiting certain pages on your web site. Today we’re going to look at how to do the same with the Robots Meta Tag.
This is similar to your robots.txt file, but it applies to all spiders rather than letting you name individual ones. It has the added option of stipulating that a spider can index a page but not follow the links on it, or vice versa. This is useful for pages you want indexed but whose links you don’t want passing Page Rank.
I see plenty of sites that include the tag below, which says ‘yes please follow my links and yes please spider my page.’ The spiders are going to do this anyway, so why should you bother? The answer is simply that you shouldn’t:
<META NAME="ROBOTS" CONTENT="INDEX, FOLLOW">
The three useful variations are: please follow but don’t index, please index but don’t follow, or do neither.
<META NAME="ROBOTS" CONTENT="NOINDEX, FOLLOW">
<META NAME="ROBOTS" CONTENT="INDEX, NOFOLLOW">
<META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW">
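If you want to check what one of your own pages is actually telling the spiders, a few lines of Python will do it. The following is only a minimal sketch using the standard library's html.parser; the names RobotsMetaParser and robots_policy are made up for this illustration, and it assumes that a page with no Robots Meta Tag is treated as index and follow, as described above.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag != "meta":
            return
        attr_map = {key: (value or "") for key, value in attrs}
        if attr_map.get("name", "").lower() != "robots":
            return
        # The content attribute is a comma-separated list of directives.
        for token in attr_map.get("content", "").split(","):
            token = token.strip().lower()
            if token:
                self.directives.add(token)


def robots_policy(html):
    """Return (may_index, may_follow); both are True when nothing forbids them."""
    parser = RobotsMetaParser()
    parser.feed(html)
    may_index = "noindex" not in parser.directives
    may_follow = "nofollow" not in parser.directives
    return may_index, may_follow
```

Feeding it the HTML of a page gives you a pair of flags: whether the page may be indexed and whether its links may be followed.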
If you’re going to use both “no follow” and “no index” together, we’d advise stipulating that in your robots.txt file instead. For the spiders, the meta tag is a bit like knocking on someone’s door, having the door open, and then being told not to come in. The power of the Robots Meta Tag lies in the cases where you want the one but not the other.
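To see the robots.txt route in action, here is a rough sketch using Python's standard urllib.robotparser module. The /private/ path, the www.example.com address and the helper name is_allowed are placeholders invented for this example, not anything from a real site.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content: keep every spider out of /private/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def is_allowed(url, robots_txt=ROBOTS_TXT, user_agent="*"):
    """Return True if the given robots.txt rules let user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)
```

With these rules a well-behaved spider never requests anything under /private/ in the first place, so it is turned away before it ever knocks on the door.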
It’s important to know that the index-but-don’t-follow tag is one of the two ways you can have content spidered without passing Link Juice or Page Rank to the pages you’re linking to. It is often used by rogue companies who advertise trading in either but actually cheat you out of a valid link. The other method is the “no follow” attribute (rel="nofollow") you can put on individual links, as mentioned in our post on Benchmarking Incoming Links.
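If you suspect a link partner is quietly marking your link as “no follow”, you can check their page yourself. The sketch below is one possible way to do it with Python's standard html.parser; LinkAuditParser and audit_links are hypothetical names for this example, and it only looks at the rel attribute on individual links, not at the page-level Robots Meta Tag.

```python
from html.parser import HTMLParser


class LinkAuditParser(HTMLParser):
    """Records each link's href and whether it carries rel="nofollow"."""

    def __init__(self):
        super().__init__()
        self.links = []  # list of (href, is_nofollow) tuples

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr_map = {key: (value or "") for key, value in attrs}
        href = attr_map.get("href")
        if href is None:
            return  # an anchor without href is not a link
        # rel can hold several space-separated values, e.g. "nofollow noopener".
        rel_tokens = attr_map.get("rel", "").lower().split()
        self.links.append((href, "nofollow" in rel_tokens))


def audit_links(html):
    """Return a list of (href, is_nofollow) for every link in the HTML."""
    parser = LinkAuditParser()
    parser.feed(html)
    return parser.links
```

Any link to your site that comes back flagged as nofollow is not passing anything on to you.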