Last Thursday, I realized that this blog was being scraped. I’m sure it was being scraped before, but I never really wanted to spend time looking into it. A blog takes so much time to write and manage that I didn’t want to add another task to my blogging plate.

But now that I have seen the sploggers in action, I realize how truly irritating and damaging sploggers are. But I guess that’s the case with all theft.

Anyways, I turned to you, my readers, to get your advice, and I got some amazing tips from the people that responded. I also did a little extra searching around, plus Darren Hoyt wrote a post with some good links in it, so here’s a summary of 10 ways to stop sploggers.

Part 1: 6 ways to find out if you’re being splogged

  1. I found out about one of the scrapers via FeedBurner’s Uncommon Use feature. When you’re checking your feed stats in FeedBurner, make sure to keep an eye on the Uncommon Uses section to see if any sites you’re not familiar with are showing up. Jonathan Bailey from Plagiarism Today also has a detailed post on this subject.
    [Image: FeedBurner Uncommon Uses]
  2. Use the Copyfeed WordPress plugin, or any other plugin that adds a footer to your feed, and add a link back to your site in the footer. This way, if a splogger publishes your complete posts on their site, you’ll get Incoming Link notices in your WordPress dashboard.
  3. Lorelle has a bunch of steps and plugins that you can use to identify content theft.
  4. Use the Digital Fingerprint WordPress plugin. This plugin places a customized digital fingerprint into blog posts, which is only visible in the feed: “Once embedded in your post, the plugin allows you to quickly and easily search the blogosphere for references to the digital fingerprint using Google Blogsearch, BlogPulse, and Sphere… It also allows the quick and easy search for your digital fingerprint of the web itself using Google, Yahoo!, and MSN. An optional quick search can be included in the dashboard itself… Lastly, the plugin provides a few resources and links to places that will help you combat splog and spam should you find your content plagiarised or stolen on another website.” Talk about comprehensive!
  5. Update Jan. 23, 2008: Joost de Valk explains five more ways that you can search around the web for your URL and feed URL to find instances where your blog is being scraped.
  6. Update Feb. 24, 2008: Jacques left a comment here suggesting using Copyscape to track down sploggers.
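
For the curious, here is a rough Python sketch of the fingerprint idea from step 4. It is not the Digital Fingerprint plugin's actual code (that is a WordPress plugin, and I haven't reproduced its internals); the function names, the "fp-" prefix and the secret are all made up for illustration. The idea is simply to derive a token that nobody else would ever type, tack it onto the feed version of a post, and then treat any page on another domain that contains the token as a likely scraped copy.

```python
import hashlib
import hmac
from urllib.parse import urlparse


def make_fingerprint(site_url: str, secret: str) -> str:
    """Derive a short, hard-to-guess token tied to one site (hypothetical scheme)."""
    digest = hmac.new(
        secret.encode("utf-8"), site_url.encode("utf-8"), hashlib.sha256
    ).hexdigest()
    return "fp-" + digest[:16]


def add_fingerprint(feed_html: str, fingerprint: str) -> str:
    """Append the token to the feed version of a post only."""
    return f'{feed_html}\n<p class="fingerprint">{fingerprint}</p>'


def is_scraped_copy(page_text: str, page_url: str, site_url: str, fingerprint: str) -> bool:
    """True when the token appears on a page hosted on a different domain."""
    same_site = urlparse(page_url).netloc == urlparse(site_url).netloc
    return fingerprint in page_text and not same_site
```

You would still have to get the page text from somewhere, for example by pasting the token into a search engine and checking the results by hand, which is essentially what the plugin's search links do for you.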

Part 2: You’re being splogged? 10 ways to stop ’em

  1. Alyk says to contact the sploggers and ask them to stop. This may sound simple, but it’s a great way to start…as long as the sploggers have left their email addresses or a contact form on the site. In my case, they hadn’t.
  2. Blaine, Jonathan and Jacob all suggested complaining to Google if the splogger is using Google AdSense to monetize the site. Jacob sent the following links:
    Easy: report them as spam to Google here:
    https://www.google.com/webmasters/tools/spamreport?hl=en
    Harder: submit a DMCA complaint to Google:
    http://www.google.com/dmca.html
  3. Lorelle gives six steps that you can take to stop sploggers, such as how to contact them, and who to contact if they don’t respond. She also says what NOT to do.
  4. Ryan pointed me to a thread on SitePoint where a guy’s whole site was being lifted, design, database and all. The guy comes up with a creative piece of code that appears on the copycat site insulting the current owner, but it’s not a perfect workaround. He explains how he found out about this:

    I just noticed that a few of the pages visited were mywords.info/… rather than teleclick.ca/… If the owner of MyWords hadn’t been stupid enough to keep my StatCounter code on his copy of the site, I might not have found out about it for months.
    The surest way to locate mirrors of your website, however, is simply to invest a few minutes a week Googleing selections of your own content and checking for ripoffs. You’ll likely find plenty of stolen content from your sites if they’re popular enough, and can ask the re-publishers to remove it, or give you proper credit, depending on your policy.

  5. I checked one of my scrapers’ page source and noticed that they’re using FeedBurner to track their feeds, and scraping my FeedBurner feed (as opposed to the regular site feed). They’re also using Google Analytics to track site stats. I haven’t done this yet, but I plan on contacting FeedBurner and Google about this user. I’m guessing Google will have a particular interest in making sure that FeedBurner feeds aren’t being abused.
  6. A few readers suggested filing DMCA (Digital Millennium Copyright Act) notices with the host of the splogger, but aside from the fact that I suspect that this is worth about as much as the paper it’s written on, it doesn’t apply to non-Americans. So if the splogger or splogee is not American, it seems this won’t apply.
  7. Use the Copyfeed WordPress plugin (see step 2 under Part 1 above), or any other plugin that adds a footer to your feed. I use Copyfeed, since it allows you to do a whole bunch of useful things to your feed, but most importantly I have put a copyright notice in my feed footer with a link back to my site. This way, at the very least, if someone finds my content elsewhere on the web and they like it, they can easily find my site. Another similar plugin that does the same thing is Joost de Valk’s RSS Footer plugin. Joost’s plugin automatically adds a link back to the original post in the footer as well. (Hat tip to Glenn Dixon for that one.)
  8. Some people suggest using the FeedEntryHeader WordPress plugin which adds a copyright statement and a link to the original article to the top of your feed entries. However, I’ve seen quite a few sites appear in Google Alerts where the excerpt from the post being cited is a header stating that if you’re reading this post off-site, it’s being scraped. But we want Google to scrape our sites and recommend them to others, and having that text appear in the Alert is not so user friendly.
  9. The AntiLeech WordPress plugin is for those of us who don’t just want to stop sploggers, but want to get back at them as well. AntiLeech doesn’t prevent sploggers from accessing your site; “it produces a fake set of content especially for them that includes links back to your site (and [his], too, ok?) and sends it only to them.”
  10. Only publish a partial feed. This way, if your feed is scraped, readers who want to read the whole thing will be forced to click on the “Read More” link, and will be taken to your site. The drawback? Now your feed will be a partial feed. I know that I personally don’t have patience for partial feeds, and I don’t read them, since what’s the point of having a feed if you’re forcing me to visit your site? There’s a reason I’m using feeds, and it’s because I don’t have time and patience to open up every post in its own window. Since I feel this way, I really wouldn’t want to do that to my readers. And anyway, does everyone have to suffer because of a bunch of miserable, blood-sucking sploggers?
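
To make steps 7 and 10 a bit more concrete, here is a small Python sketch of what a feed footer and a partial feed boil down to. This is my own illustration, not how Copyfeed or RSS Footer are implemented (those are WordPress plugins); every function name, the 55-word default and the footer wording are assumptions. With partial=False you get the full post plus a footer linking back to the original; with partial=True you get a plain-text excerpt, a “Read More” link, and the same footer.

```python
import html
import re


def strip_tags(markup: str) -> str:
    """Crudely drop HTML tags and collapse whitespace (fine for an excerpt)."""
    text = re.sub(r"<[^>]+>", " ", markup)
    return " ".join(html.unescape(text).split())


def make_excerpt(post_html: str, max_words: int = 55) -> str:
    """Return the first max_words words of the post as plain text."""
    words = strip_tags(post_html).split()
    if len(words) <= max_words:
        return " ".join(words)
    return " ".join(words[:max_words]) + " [...]"


def feed_footer(title: str, permalink: str, site_name: str, site_url: str) -> str:
    """Copyright line with links back to the original post and the site."""
    return (
        f'<p>&copy; {html.escape(site_name)}. '
        f'<a href="{html.escape(permalink, quote=True)}">{html.escape(title)}</a> '
        f'was originally published on '
        f'<a href="{html.escape(site_url, quote=True)}">{html.escape(site_name)}</a>.</p>'
    )


def build_feed_entry(post_html: str, title: str, permalink: str, site_name: str,
                     site_url: str, partial: bool = False, max_words: int = 55) -> str:
    """Build the feed body: full post or excerpt, always followed by the footer."""
    if partial:
        link = html.escape(permalink, quote=True)
        body = (
            f"<p>{html.escape(make_excerpt(post_html, max_words))}</p>\n"
            f'<p><a href="{link}">Read More</a></p>'
        )
    else:
        body = post_html
    return body + "\n" + feed_footer(title, permalink, site_name, site_url)
```

Either way the footer travels with the content, so even a scraped copy points readers back to the original post.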

Even with all the steps above, I suspect that the sploggers will always retain the upper hand and we will never be able to completely stop them, or any of the other dark and sleazy types lurking on the web. If we could visualize the web, I bet the dark part takes up the majority of the web, and is probably growing faster too since it’s so much easier to crank out stolen or lowest-common-denominator content than good, useful, high-quality content.