Distil: Tech Partner Spotlight
Posted by Guest Blog in Partner Marketplace, Technology
This guest blog comes to us from Distil.it, a featured member of the SoftLayer Technology Partners Marketplace. Distil is the first content protection network that helps companies identify and block malicious content scraping and data theft. In this video we talk to Distil CEO Rami Essaid about how the company developed, their participation in the TechStars program and most importantly, how they can help you!
Tech Partners Marketplace: http://www.softlayer.com/partners/marketplace/distil
When Google’s “Panda” Algorithm Collides with Duplicate Content
If you’re a webmaster, it’s likely you’ve heard about Google’s latest search algorithm, “Panda,” and all the benefits and implications of this update. Today, we want to highlight what happens when Google Panda collides with duplicate content online. There have been plenty of opinions written about Google Panda and duplicate content, but we want to provide some background and examples to help you better understand how they might affect you.
What is Duplicate Content?
Duplicate content is a term used in search engine optimization to describe substantially similar content that appears on more than one web page, whether within a single web site or across different sites. When multiple pages contain essentially the same content, search engines such as Google may penalize those pages or exclude them from relevant search results.
Should you be Concerned?
When Google released Panda, there was a significant outcry from legitimate businesses and publishers that were either downgraded overnight in their search rankings or dropped altogether. For many of these businesses, the Panda algorithm reduced SEO rank and decreased visitors, site revenue and online market awareness. Some websites even experienced damage to their brand, as their customers and prospects questioned whether they were still in business.
We’ve spoken with Cult of Mac, Digital Trends and several Fortune 1000 businesses, and they’ve all said the same thing: They were penalized and downgraded after the Panda release because of unauthorized duplication of their content. They had done everything to comply with Google in optimizing their SEO configurations, but third-party websites scraping and duplicating their content (outside of their control) caused their page ranks to fall.
Google’s Official Stance on Duplicate Content:
“We do a good job of choosing a version of the content to show in our search results.”
“In rare situations, our algorithm may select a URL from an external site that is hosting your content without your permission. If you believe that another site is duplicating your content in violation of copyright law, you may contact the site’s host to request removal. In addition, you can request that Google remove the infringing page from our search results by filing a request under the Digital Millennium Copyright Act.”
Where is This “External” Duplicate Content Coming From?
Sometimes, it’s not clear how third-party sites obtain copies of legitimate work. Typically, they steal it by scraping the content, either manually or automatically. The scraped content is then republished on their sites with no credit or link to the original work.
What does that look like? It’s not difficult to find examples, but I tracked one down that seemed particularly ironic. Here’s an original article by PC World on Google’s War Against Scraper Sites:
Here’s a duplicate copy of the same story that doesn’t give any credit to the original PC World article:
It’s clear that we’re not looking at a coincidence here. The title, article content and images are all identical. The scraping site didn’t even attempt to mask their plagiarism with synonym changes. Why would they do that? Just take a look at the ads on the scraper site … They want to profit from the keywords and traffic driven by PC World’s content.
What Can You Do About It?
- Listen to Google
Google provides a list of tips for using rel="nofollow" and canonicalization to ensure it can identify you as the original author of your content and avoid penalizing or downgrading your business’s search rankings.
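As a concrete illustration of canonicalization, a page can tell search engines which URL is the authoritative copy by emitting a rel="canonical" hint. The sketch below builds both the HTML tag and the equivalent HTTP header; the URL and function names are hypothetical examples, not part of any Google API:

```python
# Minimal sketch of canonical hints a page can serve so that search
# engines credit the original URL. URLs here are placeholder examples.

def canonical_link_tag(original_url: str) -> str:
    """HTML <link> element to place in the page's <head>."""
    return f'<link rel="canonical" href="{original_url}" />'

def canonical_http_header(original_url: str) -> str:
    """Equivalent HTTP response header (useful for non-HTML resources like PDFs)."""
    return f'Link: <{original_url}>; rel="canonical"'

if __name__ == "__main__":
    url = "https://www.example.com/articles/original-post"
    print(canonical_link_tag(url))
    print(canonical_http_header(url))
```

Syndication partners who republish your content with your permission can include the same tag pointing back at your URL, which helps search engines attribute the page to you.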
- Learn About DMCA and Use It
If your content has already been duplicated by unauthorized publishers, you should learn more about the Digital Millennium Copyright Act (DMCA) and how it can help you get your content removed from infringing websites. Two helpful resources for learning the law and your rights are Google’s official DMCA policy page and the United States Copyright Office.
- Be Proactive About Stopping Scrapers
We believe the best solution is to implement practices or services that proactively prevent people or web scrapers from harvesting your content in the first place. Although web scrapers can be difficult to detect, there are tactics and services that can limit certain behaviors on your website(s). Some of the quickest ways to make strides in the right direction are to implement rate-limiting rules, to block traffic from blacklisted IP addresses and to use CAPTCHAs to help reduce automated web scraping.
While none of these tactics is a fool-proof way to completely prevent your content from being duplicated, the more barriers you put up, the more difficult it will be for web scrapers to repeatedly duplicate your content. Distil built an enterprise-ready platform to monitor and prevent site scraping, so if you want some help protecting your content, try our service. Whatever route you take, the key is to make sure that whatever tactics or services you implement, you don’t forget about your legitimate traffic ... You don’t want to throw out the baby with the bathwater. Be proactive, but keep your priorities on the user experience and quality of your site(s).
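To make the rate-limiting and blacklist tactics above concrete, here is a minimal sketch of a per-IP sliding-window check a site could run before serving each request. The thresholds, IP addresses and function names are hypothetical; a production setup would usually live in a web server, CDN or a service like ours rather than application code:

```python
import time
from collections import defaultdict, deque
from typing import Optional

# Hypothetical thresholds -- tune these to your own traffic patterns.
BLACKLIST = {"203.0.113.7"}        # example known-bad address
MAX_REQUESTS = 100                 # requests allowed per window, per IP
WINDOW_SECONDS = 60.0

_request_log = defaultdict(deque)  # ip -> timestamps of recent requests

def allow_request(ip: str, now: Optional[float] = None) -> bool:
    """Return True if the request should be served, False if blocked."""
    if ip in BLACKLIST:
        return False               # blacklisted IPs are always refused
    now = time.monotonic() if now is None else now
    log = _request_log[ip]
    while log and now - log[0] > WINDOW_SECONDS:
        log.popleft()              # drop timestamps outside the window
    if len(log) >= MAX_REQUESTS:
        return False               # rate limit exceeded for this window
    log.append(now)
    return True
```

A request that trips the limit doesn’t have to be dropped outright: serving a CAPTCHA instead lets a real user who happens to browse quickly continue, while stopping most automated scrapers.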
-Sean Harmer, Distil
These Partners have built their businesses on the SoftLayer Platform, and we’re excited for them to tell their stories. New Partners will be added to the Marketplace each month, so stay tuned for many more to come.