Sunday, April 22, 2007

Click Fraud Attacks: Emerging Trends

Click fraud attacks have become significantly more sophisticated over the last few months. At the same time, click fraud detection systems have become increasingly effective at catching these smarter attacks. Here, we describe three cases caught by Authenticlick over the last seven days.
  • Bogus Conversions

    Over a period of several months, a single distribution partner generating well over 1% of the traffic from the leading search engine network was responsible for up to 15% of the downstream conversions. All of these conversions were found to be fake. The distribution partner in question was targeting advertisers whose conversions consist of filling out a web form; these advertisers are an easy target for smart fraudsters. In addition to generating bogus conversions, the culprit operated from abroad and experienced an unusually fast rate of exponential growth over the last two years.

  • Fraud through AOL and other "good proxies"


    Another fraud case, identified last week, generated a large proportion of its clicks through known good proxies, including AOL. This type of scheme is more difficult to detect. Authenticlick was able to unearth the fraudulent activity thanks to an advanced methodology based on network topology metrics. Notably, the scheme was detected even though the data submitted by the search engine did not include any user agent information.

  • Fraud involving a symbiotic relationship between a distribution partner and an advertiser


    This interesting fraud case involves a very large number of IP addresses but a very small number of advertisers. It was first identified by Authenticlick in April 2007. We believe that either the advertiser and the fraudster have a symbiotic relationship, or the advertiser is an unwitting beneficiary: the fraudster's activity actually improves the advertiser's ROI, through the particular type of fraud described here.


Additional Notes about Adware

The last fraud case discussed in this article is particularly interesting in that it almost certainly involves viruses (adware or spyware) installed on, and remotely controlled across, thousands of computers. Two types of viruses are currently active:
  • The first type is an Internet Explorer parasite: it actually triggers clicks through Internet Explorer, and is best described in Google's paper. This type of virus is easier to detect, as it generates too many clicks per user.

  • The second type of hitbot does not rely on Internet Explorer to trigger clicks; instead, it has its own code to communicate over the HTTP protocol. This type of virus, more widespread than the first, is more difficult to detect. Yet because it relies on user agent lookup tables to generate clicks, Authenticlick has been able to identify this type of fraudulent activity: criminals have (so far) been unable to correctly replicate the expected underlying multivariate distributions. Also note that we have developed a patented solution to catch this type of fraud. A rough illustration of the general idea follows.
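
To make the statistical idea concrete, here is a minimal sketch that compares a source's user-agent mix against an expected baseline. The baseline shares, counts, and significance threshold are hypothetical assumptions for illustration; this is the general principle, not the patented method.

```python
# Minimal sketch: flag a traffic source whose user-agent mix deviates
# sharply from a baseline. Baseline shares and the alpha threshold are
# hypothetical; this illustrates the idea, not the patented method.
from scipy.stats import chisquare

# Hypothetical user-agent shares observed on known-good traffic.
BASELINE = {"IE6": 0.48, "IE7": 0.22, "Firefox": 0.20, "Safari": 0.06, "Other": 0.04}

def looks_like_hitbot(ua_counts, alpha=0.001):
    """Chi-square goodness of fit of a source's user-agent counts
    against BASELINE; a tiny p-value suggests a synthetic mix, e.g.
    one replayed from a user agent lookup table."""
    total = sum(ua_counts.get(ua, 0) for ua in BASELINE)
    observed = [ua_counts.get(ua, 0) for ua in BASELINE]
    expected = [share * total for share in BASELINE.values()]
    _, p_value = chisquare(observed, f_exp=expected)
    return p_value < alpha

# A hitbot replaying agents from a lookup table rarely matches the
# real mix: here IE6 is massively over-represented.
print(looks_like_hitbot({"IE6": 900, "IE7": 50, "Firefox": 30, "Safari": 10, "Other": 10}))
```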

Sunday, April 15, 2007

How Can Advertisers Benefit from Click Scoring?

Because click fraud detection is a rudimentary application of click scoring, click scoring is often thought of as merely a tool to eliminate unqualified traffic. Click scoring can actually do much more, such as determining the optimum price associated with a click, identifying new sources of potentially converting traffic, measuring traffic quality in the absence of conversions or in the presence of bogus conversions, and assessing the quality of distribution partners, to name a few applications. Also note that scoring is not limited to clicks; it can also involve impressions and metrics such as clicks per impression. The pricing application, for instance, is sketched below.
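
As a concrete example of the pricing application, here is a minimal sketch that maps a click score to a billable price. The linear mapping, the score bounds, and the function name are assumptions made for illustration; the 425 floor echoes the score distribution discussed later in this post.

```python
# Minimal sketch: price a click in proportion to its score.
# The linear ramp and the bounds are hypothetical.
def price_per_click(base_cpc, score, floor=425, ceiling=800):
    """Pay nothing below the floor (clearly unbillable clicks),
    full price at or above the ceiling, and scale linearly between."""
    if score < floor:
        return 0.0
    factor = min(1.0, (score - floor) / (ceiling - floor))
    return round(base_cpc * factor, 4)

print(price_per_click(0.50, 700))  # 0.3667: discounted for mid-range quality
print(price_per_click(0.50, 400))  # 0.0: unbillable
```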

From the advertiser's viewpoint, one important application of click scoring is detecting new sources of traffic to improve total revenue, in a way that cannot be accomplished through A/B/C testing, traditional ROI optimization, or SEO. The idea consists of tapping into carefully selected new traffic sources rather than merely improving existing ones.

Let us consider a framework where we have two types of scores:

  • Score I: generic score computed using a pool of advertisers, possibly dozens of advertisers from the same category.
  • Score II: customized score specific to a particular advertiser.

What can we do when we combine these two scores? Here are the four possible cases (a short decision sketch follows the list):

  1. Scores I and II are good. This is usually one of the two traffic segments that advertisers consider. Typically, advertisers focus their efforts on SEO or A/B testing to further refine the quality and gain a small edge.
  2. Score I is good and score II is bad. This traffic is usually rejected, and no effort is made to understand why the good traffic is not converting. Advertisers rejecting this traffic may be missing major sources of revenue.
  3. Score I is bad and score II is good. This is the other traffic segment that advertisers consider. Unfortunately, this situation makes advertisers happy: they are getting conversions. However, it is a red flag indicating that the conversions might be bogus. This happens frequently when conversions consist of filling out web forms. Any attempt to improve conversions (e.g. through SEO) is counterproductive; instead, the traffic should be seriously investigated.
  4. Scores I and II are bad. Here, most of the time, the reaction consists of dropping the traffic source entirely and permanently. Again, this is a bad approach. By reducing the traffic on a schedule based on click scores, one can significantly lower exposure to bad traffic while not missing the opportunity when traffic quality improves.
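
Here is a minimal sketch of this four-way decision table. The 600-point cutoff and the action labels are illustrative assumptions, not a prescribed implementation:

```python
# Minimal sketch of the decision table above; the cutoff is hypothetical
# and would be calibrated per scorecard and per advertiser in practice.
def traffic_action(score_generic, score_custom, cutoff=600):
    good_generic = score_generic >= cutoff  # Score I: pooled advertisers
    good_custom = score_custom >= cutoff    # Score II: this advertiser
    if good_generic and good_custom:
        return "keep; refine further via SEO or A/B testing"
    if good_generic:
        return "investigate why good traffic is not converting (possible missed revenue)"
    if good_custom:
        return "red flag: audit conversions, they may be bogus"
    return "throttle on a score-based schedule and keep monitoring"

print(traffic_action(720, 430))  # good Score I, bad Score II -> investigate
```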

This discussion illustrates how scoring can help advertisers substantially improve their revenue.

Case Study
We have applied this concept to optimize the traffic on a partner website, where conversions consist of filling out a web form to subscribe to a newsletter.

  • One source representing 25% of the traffic was producing negative results, even though the scores were very high. After investigating the case, we realized that the landing page was not targeted at the user segment in question. After modifying the content to better target these users, the website experienced a substantial increase in page views and visit depth, and higher revenue. Eventually we decided to increase this source to 50% of the total traffic.
  • Another source represented 2% of the paid clicks but 30% of the conversions from a major network. After investigation, all conversions originating from this source (most of them bogus) were discarded, though the source continued to be monitored. Without this discovery, the partner would have unknowingly sent newsletters to thousands of people who never actually subscribed, until complaints started to arrive.

Comparing Click Scores with Conversions: Goodness of Fit




[Image: goodness-of-fit chart comparing conversions predicted from click scores with observed conversions over time]

Comments (see the sketch after this list):
  • Overall good fit
  • Peaks could mean:

    1. Bogus conversions
    2. Residual noise
    3. Model needs improvement (e.g. incorporate anti-rules)

  • Valleys could mean:

    1. Undetected conversions
    2. Residual noise
    3. Model needs improvement
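
As a rough illustration of this diagnostic, here is a minimal sketch that flags peaks and valleys by comparing conversions predicted from click scores with observed conversions; the data and the two-sigma threshold are made up:

```python
# Minimal sketch: flag periods where observed conversions deviate
# strongly from what the click scores predict. Data and threshold
# are hypothetical.
import statistics

def flag_residuals(predicted, observed, n_sigmas=2.0):
    """Residuals beyond n_sigmas standard deviations are flagged;
    everything else is treated as residual noise."""
    residuals = [o - p for p, o in zip(predicted, observed)]
    sigma = statistics.stdev(residuals)
    flags = []
    for i, r in enumerate(residuals):
        if r > n_sigmas * sigma:
            flags.append((i, "peak: possible bogus conversions"))
        elif r < -n_sigmas * sigma:
            flags.append((i, "valley: possible undetected conversions"))
    return flags

# Day 3 shows far more conversions than the scores predict.
print(flag_residuals([40, 42, 39, 41, 40, 43], [41, 40, 40, 90, 39, 44]))
```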

Typical Click Score Distribution




[Image: histogram of the click score distribution, showing a reverse bell curve]

Comments:
  • Reverse bell curve
  • Scores below 425 correspond to clicks that are clearly unbillable
  • Spikes at the very bottom and the very top
  • 50% of the traffic has good scores
  • In this scorecard, a drop of 50 points represents a 50% drop in conversion rate: clicks with a score of 700 convert twice as frequently as clicks with a score of 650 (see the sketch below).
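
To make the scaling rule explicit: the scorecard implies that the conversion rate doubles with every 50-point increase, i.e. rate(score) = rate(ref) * 2^((score - ref)/50). A minimal sketch, assuming a hypothetical 1% reference rate at score 650:

```python
# Minimal sketch of the scorecard's scaling rule: every 50-point drop
# halves the conversion rate. The reference rate is hypothetical.
def expected_conversion_rate(score, ref_score=650, ref_rate=0.01):
    return ref_rate * 2 ** ((score - ref_score) / 50)

print(expected_conversion_rate(700))  # 0.02  (twice the rate at 650)
print(expected_conversion_rate(600))  # 0.005 (half the rate at 650)
```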