"Blackhat SEO" and distribution of malwares

Date : May 05, 2011

A growing number of computer attacks exploit the web surfing habits of Net users. Attack attempts usually arrive by email (e.g. phishing) or while browsing web sites compromised by hackers. They can now also strike at random, from an innocent query on a search engine such as Google: an ordinary search can lead a Net user who clicks on one of the links suggested in the result page to download malware that will then compromise the system.

To do so, hackers have discovered how to «hijack» the indexing and referencing algorithms used by search engines such as Google, Bing, etc., which SEO (Search Engine Optimization) techniques normally target, in order to propagate all kinds of malware. These attacks regularly make headlines in the press, because hackers use underground techniques known as «Blackhat SEO». As an example, Apple users were among the latest victims of such an attack: at the end of April, a fake antivirus for Mac OS X was circulating on the Internet by means of illegitimate SEO referencing techniques, as recalled by the following ZDNet article: «New MAC OS X scareware delivered through blackhat SEO».

Let me emphasize that “Blackhat SEO” is not dangerous in itself, nor is it a technique designed for malware propagation. It would in fact be more appropriate to call it “unethical SEO” or “unethical referencing”, because it mostly reflects “unfair” practices used by specialized SEO companies on the Internet. As a side effect, however, it can be used as a propagation vector for malicious code. This is what we will see in the rest of this article.

This article aims to explain how unethical referencing allows hackers to distribute malware such as viruses, worms or any other malicious code on a massive scale, with the unwitting complicity of the search engines. We will not describe the intricacies of referencing here, nor will we explain the complex technologies behind web page indexing. Likewise, we will not cover SEA (Search Engine Advertising) or SEM (Search Engine Marketing), also known as “sponsored links” and exploited by advertising agencies. We will focus instead on SEO, and especially on its hijacking by hackers.

 

The search engine «holy grail»: first place!

On the Internet, web sites multiply techniques to increase their visibility and attract more visitors than their competitors. In other words, web sites operating in very competitive sectors try to be more visible on search engines. Search engines have become, de facto, the battlefields of referencing. The winner is the one who obtains the best place in the index, that is to say the best ranking (loosely called PR, after Google's PageRank score).

Indeed, nearly 80% of Net users use a search engine to find information or a resource on the Internet. Needless to say, it is easier to go through a search engine than to memorize far more complex URLs that may change over time.

Page referencing and page indexing are fields in their own right. Net users generally do not even realize that referencing strategies are at work when they browse the web. Clicks on links and previously visited URLs are vital data for the search engine, which uses them to build the result page it displays. Companies specialized in referencing now exploit these data to offer better visibility to their customers' web sites. These companies have become masters at using various techniques to appear in the first pages of results, or even among the very first links.

Before going further into the distribution of malware through “blackhat SEO” strictly speaking, it is necessary to understand what SEO referencing techniques are about.

 

What is SEO?

As mentioned above, the acronym SEO stands for “Search Engine Optimization”. It is also called “organic referencing” or “natural referencing”, which is just as obscure as the acronym. Behind these 3 letters hide complex techniques for getting HTML web pages hosted on the Internet referenced and indexed by the search engines, which aim to present the best results in response to Net user requests.

What Net users are not aware of is that creating web sites with strong added value, such as online e-business sites (among others), requires considerable work in terms of visibility. As mentioned above, the objective is to obtain a better rank than others. In SEO terms, this means that the following points (among others) are crucial:

  • The choice of expressions or keywords used in the web pages,
  • The optimization of the pages,
  • The architecture of the site,
  • The ease of indexing the contents of the pages,
  • The site traffic,
  • The strategic and marketing positioning,
  • Etc.


These aspects are fundamental to the successful referencing of a web site. They must be accompanied by regularly keeping up with good referencing practices (a kind of search engine charter). If we consider Google, currently the most popular search engine, no fewer than 200 parameters are taken into account by its own referencing and indexing algorithms (Caffeine and Mayday), which are in charge of presenting the results of Net user requests.
In short, SEO can be summarized as a set of complex processes used to improve the visibility of web sites on the Internet.

 

Why optimize your web site?

The first reason concerns the “Web marketing” aspect. It is necessary to obtain, as quickly as possible, the best possible rank (loosely called PR, for PageRank) in the result pages, which SEO jargon calls SERPs (Search Engine Result Pages). As shown in a study by the Optify company, the click rate of Net users who have submitted a request to a search engine is highest on the first page of results displayed. The first suggested link has approximately a 37% chance of being clicked first by a Net user, while the 10th link has barely 2.5%.

According to paradiseo, taking as an example Google, which handles almost 90% of search requests on the Internet in France:

  • 85% of the Net users do not go beyond the first page,
  • The first 3 results of the first page monopolize more than 60% of clicks,
  • The first position receives approximately 4 times more clicks than the second.
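
To make these percentages concrete, the short Python sketch below converts the Optify click rates quoted above into expected clicks. The volume of 10,000 monthly searches and all names used in the code are illustrative assumptions, not data from either study.

```python
# Click rates quoted from the Optify study: position in the result page -> share of clicks.
CLICK_RATE = {1: 0.37, 10: 0.025}

# Hypothetical number of monthly searches for a keyword (assumption for illustration only).
MONTHLY_SEARCHES = 10_000


def expected_clicks(position: int, searches: int = MONTHLY_SEARCHES) -> int:
    """Estimate how many clicks a link receives at a given position."""
    return round(searches * CLICK_RATE[position])


first = expected_clicks(1)
tenth = expected_clicks(10)
print(f"Position 1: about {first} clicks, position 10: about {tenth} clicks")
print(f"The first link gets about {first / tenth:.1f} times more clicks than the tenth")
```

Under these assumptions, moving from the bottom to the top of the first page multiplies the expected traffic by more than ten, which is why the first place is so sought after.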

Consequently, it is obvious that referencing is crucial for web sites that want to be popular in highly competitive sectors.

 

Is SEO dangerous?

That being said, one may wonder about the danger this induces. Indeed, SEO can give the impression of “tricking” the results. That should not be the case, beyond the “Page Ranking war” between merchant web sites. Search engine suppliers claim to make sure that webmasters respect best practices. As mentioned above, charters exist and webmasters are expected to follow them. The risks incurred by those who do not respect them range from the degradation of search criteria on their web links, to downgrading, to filtering, and even to isolation (aka “Google sandboxing”).

SEO becomes dangerous when it is used for unfair ends, a practice commonly called “blackhat SEO”. In the minds of security engineers, the word “blackhat” remains associated with hackers. In the SEO realm this term is somewhat confusing, and “unethical SEO” would be more appropriate. Let us now look at these “unethical” practices.
 

What is «blackhat SEO» all about?

Unethical SEO referencing thus consists in influencing the results displayed by search engines. Many techniques exist to favor some links over others, and they are often implemented on a massive scale. These techniques are known as “spamdexing”. They are nothing more than abusive referencing techniques. All of them consist in misleading the search engines about the quality or popularity of web pages or web sites, in order to obtain the best possible rank in the result pages for a given request, preferably on the very first page because, as mentioned above, users seldom go beyond it.

Without being exhaustive, let us cite some well-known ones:

  • “meta keywords” tags: Historically, this technique was exploited, and even overexploited, by webmasters to describe their web sites with keywords. Many abuses, which consisted in stuffing the tags with all kinds of keywords that sometimes had no relationship with the site itself, were severely condemned by search engine providers such as Google. As a consequence, providers now give these tags very little weight among the positioning criteria for result pages.
  • Hidden contents: Since contents are indexed by the search engines, hackers use content hidden in web pages to obtain a better rank in result pages. The commonly employed techniques are:
    • Text of the same color as the background,
    • Keywords with extremely small fonts, which will be almost invisible when displayed,
    • Contents hidden through CSS style sheets,
    • Text positioned outside the zone displayed on the screen (e.g. with negative coordinates),
    • Use of particular CSS properties (e.g. display:none, visibility:hidden, etc.),
    • Etc.
  • Contents duplication: This practice makes it possible to artificially increase the popularity of a site during the indexing phase. However, search engines have difficulty telling the original site from the duplicate. Although commonly used, this technique is condemned by professional SEO companies as well as by search engine suppliers.
  • “pagejacking”: This technique consists in copying the “meta” tags of an already well-indexed page and reusing them on your own page, in order to be better referenced.
  • “backlinks farming”: Literally, a “farm of backlinks”. It consists in massively creating, within “farms”, backlinks (aka incoming/inbound links) that point to the site whose visibility one wishes to improve. Indeed, the more backlinks a site has, the better its reputation.
  • “UserAgent cloaking”: “Cloaking” consists in presenting different web page contents depending on whether the “UserAgent” (presumably) belongs to a Net user or to an indexing robot. The idea is to present a normal page to Net users, and a page stuffed with keywords (meta or others) to the search engine indexer.
  • «Referer cloaking»: Like its counterpart above, this technique consists in presenting different contents depending on the previously visited link (the HTTP “Referer” header). It allows the hacker to single out visitors arriving from a search engine, and consequently to present them with specially adapted contents.
  • “IP cloaking”: This technique is similar to “UserAgent cloaking”, but discriminates on the IP address rather than on the “UserAgent”. In the case of malicious web pages, it consists in behaving differently depending on whether the visit comes from a search engine indexer (whose IP addresses are already known) or from a normal Net user.
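
As an illustration of the “hidden contents” techniques listed above, here is a minimal Python sketch of how a reviewer might flag suspicious inline styles in a page. It is only a rough heuristic under stated assumptions: the names `SUSPICIOUS_STYLES` and `find_hidden_content`, the thresholds and the sample HTML are all hypothetical, and the check only looks at inline `style` attributes. Text of the same color as the background or rules placed in external style sheets would require rendering the page and are not covered.

```python
import re
from html.parser import HTMLParser

# Inline-style patterns matching some of the hiding tricks listed above.
# Thresholds (font size of 0-2 px/pt, any negative offset) are arbitrary assumptions.
SUSPICIOUS_STYLES = {
    "display:none": re.compile(r"display\s*:\s*none"),
    "visibility:hidden": re.compile(r"visibility\s*:\s*hidden"),
    "tiny font": re.compile(r"font-size\s*:\s*[0-2](px|pt)"),
    "off-screen position": re.compile(r"(left|top|text-indent)\s*:\s*-\d+"),
}


class HiddenContentFinder(HTMLParser):
    """Collects (tag, reason) pairs for elements whose inline style hides them."""

    def __init__(self):
        super().__init__()
        self.findings = []

    def handle_starttag(self, tag, attrs):
        # A style attribute without a value is reported as None by the parser.
        style = (dict(attrs).get("style") or "").lower()
        for reason, pattern in SUSPICIOUS_STYLES.items():
            if pattern.search(style):
                self.findings.append((tag, reason))


def find_hidden_content(html: str):
    """Return the list of (tag, reason) pairs found in an HTML string."""
    finder = HiddenContentFinder()
    finder.feed(html)
    return finder.findings


# Made-up page mixing one legitimate paragraph with three hidden keyword blocks.
sample = """
<p>Welcome to our shop.</p>
<div style="display: none">cheap pills free download</div>
<span style="position:absolute; left:-9999px">celebrity scandal video</span>
<p style="font-size:1px">earthquake news photos</p>
"""

for tag, reason in find_hidden_content(sample):
    print(f"<{tag}> looks hidden: {reason}")
```

Such a check produces false positives, since these CSS properties also have legitimate uses (menus, accessibility helpers, etc.), so it should be read as a starting point for manual review rather than as a verdict.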

Note: “Cloaking” techniques are firmly condemned by search engine suppliers. However, the latter have difficulty fighting them, because the same techniques are often used by webmasters for legitimate purposes, such as presenting pages translated into the visitor's language.
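
To illustrate “UserAgent cloaking” from the defender's side, the following Python sketch requests the same URL once as a browser and once as an indexing robot, then compares the two answers. It is a simplified sketch under stated assumptions: `looks_cloaked`, `fake_fetch`, the 0.5 threshold, the `.example` URLs and the User-Agent strings are hypothetical choices made for illustration, and `fake_fetch` stands in for a real HTTP client so that the code runs offline. It does not cover “Referer” or “IP” cloaking, and a site that legitimately adapts its pages (e.g. translations) could be flagged by mistake.

```python
from difflib import SequenceMatcher
from typing import Callable

# Example User-Agent strings; real browsers and crawlers send many variants.
BROWSER_UA = "Mozilla/5.0 (Windows NT 6.1; rv:2.0) Gecko/20100101 Firefox/4.0"
CRAWLER_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

# A fetcher takes (url, user_agent) and returns the page body as text.
Fetcher = Callable[[str, str], str]


def similarity(a: str, b: str) -> float:
    """Return a ratio between 0 and 1 describing how alike two page bodies are."""
    return SequenceMatcher(None, a, b).ratio()


def looks_cloaked(url: str, fetch: Fetcher, threshold: float = 0.5) -> bool:
    """Flag a URL whose content differs strongly between browser and crawler."""
    as_browser = fetch(url, BROWSER_UA)
    as_crawler = fetch(url, CRAWLER_UA)
    return similarity(as_browser, as_crawler) < threshold


def fake_fetch(url: str, user_agent: str) -> str:
    """Offline stand-in for an HTTP client (assumption, for demonstration only)."""
    if url == "http://cloaked.example/" and "Googlebot" in user_agent:
        # Keyword-stuffed page served only to the indexing robot.
        return "earthquake news celebrity scandal free antivirus download " * 5
    return "<html><body><h1>Welcome</h1><p>Our products.</p></body></html>"


print(looks_cloaked("http://cloaked.example/", fake_fetch))
print(looks_cloaked("http://honest.example/", fake_fetch))
```

Passing the fetcher as a parameter keeps the comparison logic independent from the network layer, so the stand-in can be swapped for a real HTTP client without changing `looks_cloaked`.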

Other techniques exist, such as those exploiting forums, community blogs, RSS feeds and community or social networks in order to promote links artificially. Certain tools can automate posts on forums in order to promote a given web site. Even though these posts are often easy to detect, since the tools' command of human language is poor, they remain an easily deployed source of annoyance on targeted forums, blogs and the like.

Note: We have not discussed the technique known as “negative SEO”. This “blackhat” technique is used to denigrate competitors: it consists in degrading the rating of a target web site, that is to say decreasing its visibility on search engines, by stealthily associating it with keywords that are damaging to its reputation (pornography, drugs, etc.).

 

Blackhat SEO and the distribution of malware

So far, the threat in terms of security has still not been clearly identified. Indeed, until now we have only considered the part inherent to “blackhat SEO” itself, which amounts to “tricking” search results.

However, the threat is real. It results from the malicious ingenuity of hackers (blackhat ones) who have managed to understand and exploit indexing and referencing algorithms, such as Google's, in order to promote links that redirect Net users to real IT threats (viruses, worms, backdoors, etc.).

Indeed, hackers use these “blackhat” referencing techniques to promote links likely to infect Net users' computers, which then show up at random in response to ordinary search requests. To increase their chances of success, hackers manage to have search engines index pages that answer requests on “banal” keywords. These trapped requests are varied and, unfortunately, are often associated with events that receive heavy media coverage (earthquakes, wars, attacks, nuclear catastrophes, deaths, etc.) or with frequently used expressions (stock options, etc.).


At the beginning of 2010, a blackhat SEO attack aimed at many search engines exploited keywords referring to the tragedy that struck Haiti following the earthquake.

Last January, the well-known American hosting provider GoDaddy was targeted by a blackhat SEO attack that distributed malware through numerous web sites hosted on its networks. The attack redirected Net users to malicious web sites hosting malware whenever they typed keywords referring to celebrity names or to sexual or political scandals into search engines.
 

How to protect yourself?

We would have liked to be able to give a definite answer. Unfortunately, it is extremely difficult to find countermeasures against the propagation of malware through “blackhat SEO”. The solutions most probably lie in the hands of the search engine suppliers and in their ability to detect bad referencing practices (so-called blackhat SEO) as well as malicious links.

Google recently reinforced its filtering with a new algorithm called “Panda”, in charge of hunting down bad SEO practices and, where necessary, degrading the rating of offending sites. This new algorithm had the effect of a “bomb” in the media because many well-known sites saw their rating degraded, as recalled by the following article. The penalties are now varied: for example, the rating of a single web page can now lower that of the whole site, excessive advertising can also degrade the rating of a site, and Flash-based sites will also be penalized because of the difficulties related to indexing them, etc.

It is nevertheless a first step: even though these measures are not specifically dedicated to fighting the distribution of malware, they help fight the means used to spread it and make it possible to clean up bad SEO practices a little. Consequently, they will slow down the techniques commonly used by hackers, at least for a while.

Finally, Net users should keep good security practices in mind to protect themselves: vigilance when browsing the Internet or downloading files, reporting suspicious emails or web links to the IT department, watching for unusual events on the system, and patch management, that is, keeping the system and the applications installed on computers up to date.
