<?xml version='1.0' encoding='UTF-8'?><?xml-stylesheet href="http://www.blogger.com/styles/atom.css" type="text/css"?><feed xmlns='http://www.w3.org/2005/Atom' xmlns:openSearch='http://a9.com/-/spec/opensearchrss/1.0/' xmlns:georss='http://www.georss.org/georss' xmlns:gd='http://schemas.google.com/g/2005' xmlns:thr='http://purl.org/syndication/thread/1.0'><id>tag:blogger.com,1999:blog-24017879</id><updated>2012-01-28T14:56:57.206-08:00</updated><title type='text'>Click Scoring</title><subtitle type='html'>Articles and press releases related to click fraud detection, click scoring, impression fraud and real time click scoring.&lt;img src="http://www.datashaping.com/clickscoringblog.GIF"&gt;</subtitle><link rel='http://schemas.google.com/g/2005#feed' type='application/atom+xml' href='http://clickscoring.blogspot.com/feeds/posts/default'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/24017879/posts/default?max-results=100'/><link rel='alternate' type='text/html' href='http://clickscoring.blogspot.com/'/><link rel='hub' href='http://pubsubhubbub.appspot.com/'/><author><name>Vincent Granville</name><uri>http://www.blogger.com/profile/11458380199915437888</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='21' height='32' src='http://www.datashaping.com/granville2.jpg'/></author><generator version='7.00' uri='http://www.blogger.com'>Blogger</generator><openSearch:totalResults>10</openSearch:totalResults><openSearch:startIndex>1</openSearch:startIndex><openSearch:itemsPerPage>100</openSearch:itemsPerPage><entry><id>tag:blogger.com,1999:blog-24017879.post-7136153798982008391</id><published>2008-08-19T01:57:00.000-07:00</published><updated>2008-08-19T01:59:14.333-07:00</updated><title type='text'>New fraud scheme on Google (phishing / click fraud)</title><content type='html'>Fraudsters send you a fake email about your AdWord account being terminated. 
They ask you to renew your account by logging in to a fake Google AdWords website that looks real. That's how they steal your login/password. Once your account is hijacked, they increase your daily budget and your bids for keywords that are part of their botnet system. In the process, they might also steal your credit card info or other useful info (your address for identity theft, your keyword list to feed their botnet).&lt;br /&gt;&lt;br /&gt;&lt;strong&gt;Complaint received from a client:&lt;/strong&gt;&lt;br /&gt;&lt;br /&gt;&lt;em&gt;Vincent....don't know if you'd be interested in this...but i use&lt;br /&gt;google ad words &amp; just recently someone hacked into my profile &amp;&lt;br /&gt;changed my daily max from $10 to $6,810, and then miraculously i&lt;br /&gt;received over 1,000 clicks that day at $5.50 per click....they were&lt;br /&gt;trying to charge me over $7K. I reported it and about a week later&lt;br /&gt;they admitted it was not legitimate. Have you heard of this&lt;/em&gt;&lt;br /&gt;&lt;br /&gt;&lt;strong&gt;Email sent by fraudsters:&lt;/strong&gt;&lt;br /&gt;&lt;br /&gt;&lt;em&gt;Renew Your Account Now !&lt;br /&gt;&lt;br /&gt;Dear Member,&lt;br /&gt;&lt;br /&gt;This is your official notification from Google Inc. that the service(s) listed below will be deactivated and deleted if not renewed immediately.&lt;br /&gt;&lt;br /&gt;As the Primary Contact, you must renew the service(s) listed below or it will be deactivated and deleted.&lt;br /&gt;&lt;br /&gt;Renew Now your Google AdWords services. [link deleted]&lt;br /&gt;&lt;br /&gt;SERVICE: Google AdWords&lt;br /&gt;EXPIRATION: August, 19 2008&lt;br /&gt;&lt;br /&gt;Thank you for using Google Inc service.&lt;br /&gt;We appreciate your business and the opportunity to serve you.&lt;br /&gt;&lt;br /&gt;Google AdWords Service . 
&lt;/em&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/24017879-7136153798982008391?l=clickscoring.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://clickscoring.blogspot.com/feeds/7136153798982008391/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=24017879&amp;postID=7136153798982008391' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/24017879/posts/default/7136153798982008391'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/24017879/posts/default/7136153798982008391'/><link rel='alternate' type='text/html' href='http://clickscoring.blogspot.com/2008/08/new-fraud-scheme-on-google-phishing.html' title='New fraud scheme on Google (phishing / click fraud)'/><author><name>Vincent Granville</name><uri>http://www.blogger.com/profile/11458380199915437888</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='21' height='32' src='http://www.datashaping.com/granville2.jpg'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-24017879.post-1644858931837787415</id><published>2008-02-24T20:29:00.001-08:00</published><updated>2008-02-24T20:29:56.195-08:00</updated><title type='text'>Invitation to join Analytic Bridge</title><content type='html'>Analytic Bridge has grown from 20 to about 400 people in just one week. We invite you to revisit our network, and sign up if you are not already a member.&lt;br /&gt;&lt;br /&gt;In the last seven days, we have added many groups, several white papers, dozens of useful links. 
Also, members have contributed to several forums, including&lt;br /&gt;&lt;ul&gt;&lt;br /&gt;&lt;li&gt;Explanation of Variance Inflation Factor&lt;br /&gt;&lt;li&gt;Data Validation&lt;br /&gt;&lt;li&gt;Post your best graphs in our photo section&lt;br /&gt;&lt;li&gt;How to produce nice graphs with R?&lt;br /&gt;&lt;li&gt;Data Warehousing, ETL and Business Intelligence opportunites&lt;br /&gt;&lt;li&gt;Professional Certificates (chartered statistician, SAS certified, series 6, etc.)&lt;br /&gt;&lt;li&gt;Spatial ETL Pros Needed for Leader in Geographic Business Intelligence Solutions&lt;br /&gt;&lt;li&gt;Genetic Data Mining Method for the Proper Use of the Correlation Coefficient&lt;br /&gt;&lt;li&gt;Who makes $100K or more a year?&lt;br /&gt;&lt;li&gt;Companies hiring statisticians and data miners&lt;br /&gt;&lt;li&gt;Best books for learning data mining&lt;br /&gt;&lt;li&gt;Basic Introduction to Text Mining&lt;br /&gt;&lt;li&gt;Non-Linear ARIMA using neural nets?&lt;br /&gt;&lt;li&gt;Statistics handbooks now available in the links section&lt;br /&gt;&lt;li&gt;Jobs in Switzerland&lt;br /&gt;&lt;li&gt;Interesting discussions on the Web Analytics group&lt;br /&gt;&lt;li&gt;Data mining blog&lt;br /&gt;&lt;li&gt;LinkedIn, Plaxo, Facebook and other networks&lt;br /&gt;&lt;li&gt;Building Statistical Regression Models: Straight Data are Necessary&lt;br /&gt;&lt;li&gt;Domain names for sale&lt;br /&gt;&lt;li&gt;Career paths: switching to a different industry&lt;br /&gt;&lt;li&gt;Generalized Goldbach Conjecture and Integer Coverages&lt;br /&gt;&lt;li&gt;XML job feeds available for your blog&lt;br /&gt;&lt;li&gt;Starting Salaries for Analytic Graduates&lt;br /&gt;&lt;li&gt;Useful links&lt;br /&gt;&lt;li&gt;Statistical Software - Comparative Analysis&lt;br /&gt;&lt;/li&gt;&lt;/ul&gt;To join Analytic Bridge, visit us at &lt;a href="http://www.analyticbridge.com/"&gt;http://www.analyticbridge.com/&lt;/a&gt;. 
Members are also entitled to a 20% discount on all products available on &lt;a href="http://www.datashapingstore.com/"&gt;DataShapingStore.com&lt;/a&gt;. Please contact us for details.&lt;br /&gt;&lt;p&gt;&lt;br /&gt;&lt;i&gt;Vincent Granville, Ph.D.&lt;br /&gt;Founder and Principal&lt;br /&gt;&lt;a href="http://www.datashaping.com/"&gt;http://www.datashaping.com/&lt;/a&gt;&lt;br /&gt;&lt;a href="http://www.analyticbridge.com/"&gt;http://www.analyticbridge.com/&lt;/a&gt;&lt;br /&gt;&lt;/i&gt;&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/24017879-1644858931837787415?l=clickscoring.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://clickscoring.blogspot.com/feeds/1644858931837787415/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=24017879&amp;postID=1644858931837787415' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/24017879/posts/default/1644858931837787415'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/24017879/posts/default/1644858931837787415'/><link rel='alternate' type='text/html' href='http://clickscoring.blogspot.com/2008/02/invitation-to-join-analytic-bridge.html' title='Invitation to join Analytic Bridge'/><author><name>Vincent Granville</name><uri>http://www.blogger.com/profile/11458380199915437888</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='21' height='32' src='http://www.datashaping.com/granville2.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-24017879.post-225197545400857913</id><published>2007-07-03T17:04:00.000-07:00</published><updated>2007-07-05T13:03:47.285-07:00</updated><title type='text'>Massive Click Fraud Case Unearthed in our 
Laboratory</title><content type='html'>Here we provide specific details about a widespread botnet still operating.  As many as 50% of all advertisers may be victims, albeit with a low frequency.  It is connected with a particular search distribution partner on the largest search engine network.  We will call it Spiralup, although its real name is different.  Their brand is associated with spyware, though they have clearly added click fraud to their areas of focus.&lt;br /&gt;&lt;ol&gt;&lt;br /&gt;&lt;li&gt; Their traffic has been growing exponentially over the last few years, according to Alexa (see graph below).  Note that Alexa can’t always discriminate between real and fake traffic.  Software (AlexaBooster) is available which allows a user to artificially inflate Alexa rankings.&lt;br /&gt;&lt;li&gt; Note two sharp dips in early 2006 and 2007 (see graph below).&lt;br /&gt;&lt;li&gt; In 2006, the browser distribution was different, with more Firefox, possibly indicating a network of human beings paid to click.&lt;br /&gt;&lt;li&gt; In 2007, the browser distribution shifted, favoring Internet Explorer, as they employ a botnet programmed specifically for IE but not for other browsers.&lt;br /&gt;&lt;li&gt; They continually add new advertisers to their target list, but rarely generate more than 3 clicks per day per advertiser.  Newly infected computers are assigned to advertisers recently  added to their list.&lt;br /&gt;&lt;li&gt; Advertisers accepting clicks from foreign countries, and small advertisers, are hit hardest. 
&lt;br /&gt;&lt;li&gt; A portion of their traffic is real, a portion of it is bogus, generated by botnets (clicking agents attached to viruses), and a portion of it comes from human beings paid to click according to a pre-specified schedule.&lt;br /&gt;&lt;li&gt; Because they have infected so many computers, they are able to use a very large pool of IP addresses, though the traffic skews towards international, and some specific IP blocks and foreign transparent proxies are widely used.&lt;br /&gt;&lt;li&gt; Their traffic patterns are associated with unrealistic variances and they generate an extremely high proportion of bogus conversions.&lt;br /&gt;&lt;li&gt; Below is a table with four sample clicks: &lt;br /&gt;&lt;ul type="square"&gt;&lt;br /&gt;&lt;li&gt;13/May/2007:08:58:54, query=data+marts, IP=xxx.139.16.154&lt;br /&gt;&lt;li&gt;02/May/2007:04:31:47, query=on+line+shopping+sears+canada, IP=xxx.55.121.2&lt;br /&gt;&lt;li&gt;06/Jan/2007:02:22:23, query=malpractice, IP=xxx.115.106.226&lt;br /&gt;&lt;li&gt;13/Feb/2007:19:33:17, query=fort+myers+mesothelioma+lawyers, IP=xxx.152.21.8&lt;br /&gt;&lt;/ul&gt;&lt;br /&gt;Details:&lt;br /&gt;&lt;ul type="square"&gt;&lt;br /&gt;&lt;li&gt; Each click is from a different advertiser.&lt;br /&gt;&lt;li&gt; Each click has a Google gclid tag.&lt;br /&gt;&lt;li&gt; The time zone is from the advertiser log.&lt;br /&gt;&lt;li&gt; The first click was billed at full price (even days later, the charge did not disappear).  It resulted in a bogus conversion.  It also triggered an HTTP request on the target page for a blank stylesheet.&lt;br /&gt;&lt;li&gt; This means that the botnet is a parasite of Internet Explorer, and does not have its own code to connect to the Internet, but rather relies on Internet Explorer to do so. 
&lt;br /&gt;&lt;li&gt; All four clicks have IE 6 as a user agent, as one would expect.&lt;br /&gt;&lt;/ul&gt;&lt;br /&gt;&lt;/ol&gt;&lt;br /&gt;&lt;i&gt;Spiralup&lt;/i&gt;'s exponential traffic growth:&lt;br /&gt;&lt;p&gt;&lt;br /&gt;&lt;center&gt;&lt;br /&gt;&lt;img src="http://www.datashaping.com/spiralup2.JPG"&gt;&lt;br /&gt;&lt;/center&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/24017879-225197545400857913?l=clickscoring.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://clickscoring.blogspot.com/feeds/225197545400857913/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=24017879&amp;postID=225197545400857913' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/24017879/posts/default/225197545400857913'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/24017879/posts/default/225197545400857913'/><link rel='alternate' type='text/html' href='http://clickscoring.blogspot.com/2007/07/massive-click-fraud-case-unearthed-in.html' title='Massive Click Fraud Case Unearthed in our Laboratory'/><author><name>Vincent Granville</name><uri>http://www.blogger.com/profile/11458380199915437888</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='21' height='32' src='http://www.datashaping.com/granville2.jpg'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-24017879.post-2554498596705275442</id><published>2007-04-22T19:06:00.000-07:00</published><updated>2007-04-24T15:46:02.734-07:00</updated><title type='text'>Click Fraud Attacks: Emerging Trends</title><content type='html'>Click fraud attacks have become significantly more sophisticated over the last few months. 
At the same time, click fraud detection systems are becoming increasingly effective at detecting smart attacks.  Here, we describe three cases that were caught by &lt;a href="http://www.authenticlick.net"&gt;Authenticlick&lt;/a&gt; over the last seven days.&lt;ul type="square"&gt;&lt;li&gt;Bogus Conversions&lt;br /&gt;&lt;br /&gt;Over a period of several months, a single distribution partner generating well over 1% of the traffic from the leading search engine network was responsible for up to 15% of the downstream conversions. All these conversions were found to be fake. The distribution partner in question was targeting advertisers whose conversions consist of filling out a web form. These advertisers are an easy target for smart fraudsters. In addition to generating bogus conversions, the culprit operated from abroad and experienced an unusually fast rate of exponential growth over the last two years.&lt;br /&gt;&lt;br /&gt;&lt;li&gt;Fraud through AOL and other "good proxies"&lt;p&gt;&lt;br /&gt;Another fraud case was identified last week, generating a large proportion of clicks from known good proxies including AOL. This type of scheme is more difficult to detect. &lt;a href="http://www.authenticlick.net"&gt;Authenticlick&lt;/a&gt; was able to unearth the fraudulent activity thanks to advanced methodology based on network topology metrics. It is interesting to note that the fraud scheme was detected, even though the data submitted by the search engine did not include any information about the user agent.&lt;br /&gt;&lt;br /&gt;&lt;li&gt;Fraud involving a symbiotic relationship between a distribution partner and an advertiser &lt;p&gt;&lt;br /&gt;This interesting fraud case involves a very large number of IP addresses, but a very small number of advertisers. It was first identified by &lt;a href="http://www.authenticlick.net"&gt;Authenticlick&lt;/a&gt; in April 2007. 
It is believed that either the advertiser and the fraudster have a symbiotic relationship, or the advertiser is a victim who benefits from click fraud as the fraudster improves the victim's ROI, through a particular type of fraud described &lt;a href="http://clickscoring.blogspot.com/2007/03/click-fraud-new-definition-and.html"&gt;here&lt;/a&gt;.&lt;br /&gt;&lt;/ul&gt;&lt;br /&gt;&lt;b&gt;Additional Notes about Adware&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;The last fraud case discussed in this article is particularly interesting because it almost certainly involves viruses (adware or spyware) installed and remotely controlled on thousands of computers. Two types of viruses are currently active:&lt;ul type="square"&gt;&lt;li&gt;The first type actually triggers Internet Explorer and is best described in &lt;a href="http://datashaping.com/daswani.pdf"&gt;Google's paper&lt;/a&gt;. It is an Internet Explorer parasite. This type of virus is easier to detect as it generates too many clicks per user. &lt;br /&gt;&lt;br /&gt;&lt;li&gt;The second type of hitbot does not rely on Internet Explorer to trigger clicks. Instead, it has its own code to communicate using the HTTP protocol. This type of virus, more widespread than the first, is more difficult to detect. Yet, as it relies on user agent lookup tables to generate clicks, Authenticlick has been able to identify this type of fraudulent activity, as criminals (so far) have not been able to correctly replicate the expected underlying multivariate distributions. 
Also note that we have developed a patented solution to catch this type of fraud.&lt;br /&gt; &lt;br /&gt;&lt;/li&gt;&lt;/ul&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/24017879-2554498596705275442?l=clickscoring.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://clickscoring.blogspot.com/feeds/2554498596705275442/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=24017879&amp;postID=2554498596705275442' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/24017879/posts/default/2554498596705275442'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/24017879/posts/default/2554498596705275442'/><link rel='alternate' type='text/html' href='http://clickscoring.blogspot.com/2007/04/click-fraud-attacks-emerging-trends.html' title='Click Fraud Attacks: Emerging Trends'/><author><name>Vincent Granville</name><uri>http://www.blogger.com/profile/11458380199915437888</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='21' height='32' src='http://www.datashaping.com/granville2.jpg'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-24017879.post-5286149681685545859</id><published>2007-04-15T20:30:00.000-07:00</published><updated>2007-04-16T01:24:42.644-07:00</updated><title type='text'>How Can Advertisers Benefit from Click Scoring?</title><content type='html'>Since click fraud detection is a rudimentary application of click scoring, one thinks of click scoring as a tool to eliminate unqualified traffic. 
Click scoring can actually do much more, such as determine optimum pricing associated with a click, identify new sources of potentially converting traffic, measure traffic quality in the absence of conversions or in the presence of bogus conversions, and assess the quality of distribution partners, to name a few applications. Also note that scoring is not limited to clicks but can also involve impressions and metrics such as clicks per impression.&lt;br /&gt;&lt;br /&gt;From the advertiser viewpoint, one important application of click scoring is to detect new sources of traffic to improve total revenue, in a way that cannot be accomplished through A/B/C testing, traditional ROI optimization or SEO. The idea consists of tapping into carefully selected new traffic sources rather than improving existing ones.&lt;br /&gt;&lt;br /&gt;Let us consider a framework where we have two types of scores:&lt;br /&gt;&lt;ul&gt;&lt;br /&gt;&lt;li&gt;Score I: generic score computed using a pool of advertisers, possibly dozens of advertisers from the same category.&lt;br /&gt;&lt;li&gt;Score II: customized score specific to a particular advertiser.&lt;br /&gt;&lt;/ul&gt;&lt;br /&gt;What can we do when we combine these two scores? There are four possible cases:&lt;br /&gt;&lt;ol&gt;&lt;br /&gt;&lt;li&gt;Scores I and II are good. This is usually one of the two traffic segments that advertisers are considering. Typically advertisers focus their efforts on SEO or A/B testing to further refine the quality and gain a little edge.&lt;br /&gt;&lt;li&gt;Score I is good and score II is bad. This traffic is usually rejected. No effort is made to understand why the good traffic is not converting. Advertisers rejecting this traffic might miss major sources of revenue.&lt;br /&gt;&lt;li&gt;Score I is bad and score II is good. This is the other traffic segment that advertisers are considering. Unfortunately this situation makes advertisers happy: they are getting conversions. 
However, this is a red flag, indicating that the conversions might be bogus. This happens frequently when conversions consist of filling out web forms. Any attempt to improve conversions (e.g. through SEO) is counter-productive. Instead, the traffic should be seriously investigated.&lt;br /&gt;&lt;li&gt;Scores I and II are bad. Here, most of the time, the reaction consists of dropping the traffic source entirely and permanently. Again, this is a bad approach. By reducing the traffic using a schedule based on click scores, one can significantly lower exposure to bad traffic and at the same time not miss the opportunity when the traffic quality improves.&lt;br /&gt;&lt;/ol&gt;&lt;br /&gt;This discussion illustrates how scoring can help advertisers substantially improve their revenue.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Case Study&lt;/b&gt;&lt;br /&gt;We have applied this concept to optimize the traffic on a partner website, where conversions consist of filling out a web form to subscribe to a newsletter. &lt;br /&gt;&lt;ul&gt;&lt;br /&gt;&lt;li&gt;One source representing 25% of the traffic was producing negative results, even though the scores were very high. After investigating the case, we realized that the landing page was not targeted for the user segment in question. After modifying the content to better target these users, the website experienced a substantial increase in page views and visit depth - and higher revenue. Eventually we decided to increase this source to 50% of the total traffic.&lt;br /&gt;&lt;li&gt;Another source represented 2% of the paid clicks but 30% of the conversions from a major network. After investigation, all conversions originating from this source (most of them bogus) were discarded, but the source continued to be monitored. 
Without this discovery, they would be sending newsletters to thousands of people who never actually subscribed, without knowing it (until complaints arrive).&lt;br /&gt;&lt;/ul&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/24017879-5286149681685545859?l=clickscoring.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://clickscoring.blogspot.com/feeds/5286149681685545859/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=24017879&amp;postID=5286149681685545859' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/24017879/posts/default/5286149681685545859'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/24017879/posts/default/5286149681685545859'/><link rel='alternate' type='text/html' href='http://clickscoring.blogspot.com/2007/04/how-can-advertisers-benefit-from-click.html' title='How Can Advertisers Benefit from Click Scoring?'/><author><name>Vincent Granville</name><uri>http://www.blogger.com/profile/11458380199915437888</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='21' height='32' src='http://www.datashaping.com/granville2.jpg'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-24017879.post-7363089582249137158</id><published>2007-04-15T17:29:00.000-07:00</published><updated>2007-04-19T00:00:24.047-07:00</updated><title type='text'>Comparing Click Scores with Conversions: Goodness of Fit</title><content type='html'>&lt;center&gt;&lt;br /&gt;&lt;a href="http://www.datashaping.com/gof.GIF"&gt;&lt;img src="http://www.datashaping.com/gofx.GIF"&gt;&lt;/a&gt;&lt;br&gt;&lt;br /&gt;(click on image to enlarge)&lt;br /&gt;&lt;/center&gt;&lt;br /&gt;&lt;b&gt;Comments:&lt;/b&gt;&lt;ul&gt;&lt;li&gt;Overall 
good fit&lt;br /&gt;&lt;li&gt;Peaks could mean:&lt;ol&gt;&lt;br /&gt;&lt;li&gt;Bogus conversions&lt;br /&gt;&lt;li&gt;Residual noise&lt;br /&gt;&lt;li&gt;Model needs improvement (e.g. incorporate anti-rules)&lt;/ol&gt;&lt;br /&gt;&lt;li&gt;Valleys could mean:&lt;ol&gt;&lt;br /&gt;&lt;li&gt;Undetected conversions&lt;br /&gt;&lt;li&gt;Residual noise&lt;br /&gt;&lt;li&gt;Model needs improvement&lt;/ol&gt;&lt;br /&gt;&lt;/ul&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/24017879-7363089582249137158?l=clickscoring.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://clickscoring.blogspot.com/feeds/7363089582249137158/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=24017879&amp;postID=7363089582249137158' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/24017879/posts/default/7363089582249137158'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/24017879/posts/default/7363089582249137158'/><link rel='alternate' type='text/html' href='http://clickscoring.blogspot.com/2007/04/comparing-click-score-with-conversions.html' title='Comparing Click Scores with Conversions: Goodness of Fit'/><author><name>Vincent Granville</name><uri>http://www.blogger.com/profile/11458380199915437888</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='21' height='32' src='http://www.datashaping.com/granville2.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-24017879.post-1055869272325219059</id><published>2007-04-15T10:23:00.000-07:00</published><updated>2007-04-19T00:03:05.553-07:00</updated><title type='text'>Typical Click Score Distribution</title><content type='html'>&lt;center&gt;&lt;br /&gt;&lt;a 
href="http://www.datashaping.com/scores.GIF"&gt;&lt;img src="http://www.datashaping.com/scoresx.GIF"&gt;&lt;/a&gt;&lt;br&gt;&lt;br /&gt;(click on image to enlarge)&lt;br /&gt;&lt;/center&gt;&lt;br /&gt;&lt;b&gt;Comments:&lt;/b&gt;&lt;br /&gt;&lt;ul&gt;&lt;li&gt;Reverse bell curve&lt;br /&gt;&lt;li&gt;Scores below 425 correspond to clicks that are clearly unbillable&lt;br /&gt;&lt;li&gt;Spike at the very bottom and very top&lt;br /&gt;&lt;li&gt;50% of the traffic has good scores&lt;br /&gt;&lt;li&gt;In this scorecard, a drop of 50 points represents a 50% drop in conversion rate: clicks with a score of 700 convert twice as frequently as clicks with a score of 650.&lt;br /&gt;&lt;/ul&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/24017879-1055869272325219059?l=clickscoring.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://clickscoring.blogspot.com/feeds/1055869272325219059/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=24017879&amp;postID=1055869272325219059' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/24017879/posts/default/1055869272325219059'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/24017879/posts/default/1055869272325219059'/><link rel='alternate' type='text/html' href='http://clickscoring.blogspot.com/2007/04/typical-click-score-distribution.html' title='Typical Click Score Distribution'/><author><name>Vincent Granville</name><uri>http://www.blogger.com/profile/11458380199915437888</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='21' height='32' 
src='http://www.datashaping.com/granville2.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-24017879.post-8685241527537395705</id><published>2007-03-21T00:30:00.000-07:00</published><updated>2007-04-19T00:50:14.791-07:00</updated><title type='text'>Click Fraud: New Definition and Methodology to Assess Generic Traffic Quality</title><content type='html'>&lt;p class="MsoNormal" style="MARGIN: 0in 0in 0pt"&gt;&lt;strong style="mso-bidi-font-weight: normal"&gt;1. What is click fraud? &lt;p&gt;&lt;/strong&gt;&lt;/p&gt;&lt;p class="MsoNormal" style="MARGIN: 0in 0in 0pt"&gt;&lt;p&gt;&lt;/p&gt;&lt;p&gt;&lt;/p&gt;&lt;p class="MsoNormal" style="MARGIN: 0in 0in 0pt"&gt;Click fraud is usually defined as the act of purposely clicking on ads on pay-per-click programs with no interest in the target web site. Two types of fraud are usually mentioned:&lt;/p&gt;&lt;p class="MsoNormal" style="MARGIN: 0in 0in 0pt"&gt;&lt;p&gt;&lt;/p&gt;&lt;p&gt;&lt;/p&gt;&lt;ul style="MARGIN-TOP: 0in" type="square"&gt;&lt;li class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-list: l2 level1 lfo1; tab-stops: list .5in"&gt;An advertiser clicking on competitor ads to deplete their ad spend budgets, with fraud frequently taking place early in the morning and through multiple distribution partners:&lt;span style="mso-spacerun: yes"&gt; &lt;/span&gt;AOL, Ask.com, MSN, Google, Yahoo, etc. &lt;/li&gt;&lt;li class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-list: l2 level1 lfo1; tab-stops: list .5in"&gt;A malicious distribution partner trying to increase its income, using clickbots or paid human beings to generate traffic that looks like genuine clicks.&lt;/li&gt;&lt;/ul&gt;&lt;p class="MsoNormal" style="MARGIN: 0in 0in 0pt"&gt;While these are two important sources of non-converting traffic, there are many other sources of poor traffic. 
Some of them are sometimes referred to as invalid clicks rather than click fraud, but from the advertiser or publisher viewpoint, there is no difference. In this paper, we consider all types of non-billable or partially billable traffic, whether or not it is the result of fraud, whether or not there is intent to defraud, and whether or not there is a financial incentive to generate the traffic in question. These sources of undesirable traffic include:&lt;/p&gt;&lt;p class="MsoNormal" style="MARGIN: 0in 0in 0pt"&gt;&lt;p&gt;&lt;/p&gt;&lt;p&gt;&lt;/p&gt;&lt;ul style="MARGIN-TOP: 0in" type="square"&gt;&lt;li class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-list: l3 level1 lfo2; tab-stops: list .5in"&gt;&lt;strong style="mso-bidi-font-weight: normal"&gt;Accidental fraud&lt;/strong&gt;: a home-made robot not designed for click fraud purposes, running loose, out of control, clicking on every link, possibly because of a design flaw. An example is a robot run by spammers harvesting email addresses. This robot was not designed for click fraud purposes, but it nevertheless ended up costing advertisers money. &lt;/li&gt;&lt;li class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-list: l3 level1 lfo2; tab-stops: list .5in"&gt;&lt;strong style="mso-bidi-font-weight: normal"&gt;Political activists&lt;/strong&gt;: people with no financial incentives, but motivated by hate. This kind of clicking activity has been found against companies recruiting people in class action lawsuits, and results in artificial clicks and bogus conversions. 
It is a pernicious kind of click fraud because the victim thinks its PPC campaigns generate many leads, while in reality most of these leads (email addresses) are bogus.&lt;/li&gt;&lt;li class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-list: l3 level1 lfo2; tab-stops: list .5in"&gt;&lt;strong style="mso-bidi-font-weight: normal"&gt;Disgruntled individuals&lt;/strong&gt;: this could be a recently fired employee of a PPC advertiser or a search engine, or a publisher who believes it was unjustifiably banned.&lt;/li&gt;&lt;li class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-list: l3 level1 lfo2; tab-stops: list .5in"&gt;&lt;strong style="mso-bidi-font-weight: normal"&gt;Unethical guys in the PPC community&lt;/strong&gt;: small search engines trying to make their competitors look bad by generating unqualified clicks, or shareholder fraud.&lt;/li&gt;&lt;li class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-list: l3 level1 lfo2; tab-stops: list .5in"&gt;&lt;strong style="mso-bidi-font-weight: normal"&gt;Organized criminals&lt;/strong&gt;: spammers and other internet pirates accustomed to running bots and viruses, who found that their devices could be programmed to generate click fraud. Terrorism funding falls into this category, and is investigated by both the FBI and the SEC.&lt;/li&gt;&lt;li class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-list: l3 level1 lfo2; tab-stops: list .5in"&gt;&lt;strong style="mso-bidi-font-weight: normal"&gt;Hackers&lt;/strong&gt;: many people now have access to home-made web robots (the source code in Perl or Java is available for free). While it is easy to fabricate traffic with a robot, it is more complicated to emulate legitimate traffic, as it requires spoofing thousands of ordinary IP addresses – not something any amateur can do well. Some individuals might see this as a challenge and generate high-quality emulated traffic just for the sake of it, with no financial incentives. 
&lt;/li&gt;&lt;li class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-list: l3 level1 lfo2; tab-stops: list .5in"&gt;&lt;strong&gt;Traditional media&lt;/strong&gt; losing market share to PPC advertising have an incentive to contribute to click fraud. &lt;/li&gt;&lt;/ul&gt;&lt;p class="MsoNormal" style="MARGIN: 0in 0in 0pt"&gt;In this paper, we will be even more general by encompassing other sources of problems not generally labeled as click fraud, but sometimes referred to as invalid, non-billable, or low-quality clicks. This includes:&lt;/p&gt;&lt;p class="MsoNormal" style="MARGIN: 0in 0in 0pt"&gt;&lt;p&gt;&lt;/p&gt;&lt;p&gt;&lt;/p&gt;&lt;ul style="MARGIN-TOP: 0in" type="square"&gt;&lt;li class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-list: l4 level1 lfo3; tab-stops: list .5in"&gt;&lt;strong style="mso-bidi-font-weight: normal"&gt;Impression fraud&lt;/strong&gt;: impressions and clicks should always be considered jointly, not separately. This can be an issue for search engines, as they need to join very large databases and match users with both impressions and clicks. In some schemes, fraudulent impressions are generated to make a competitor’s CTR look low. Advanced schemes use good proxy servers (e.g. AOL) to hide the activity. When the CTR drops low enough, the competitor’s ad is no longer displayed. This scheme is usually associated with &lt;strong style="mso-bidi-font-weight: normal"&gt;self-clicking&lt;/strong&gt;, a practice where an advertiser clicks on its own ads through proxy servers to improve its ranking, and thus improve its position in search result pages. 
This scheme targets both paid and organic traffic.&lt;/li&gt;&lt;li class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-list: l4 level1 lfo3; tab-stops: list .5in"&gt;&lt;strong style="mso-bidi-font-weight: normal"&gt;Multiple clicks&lt;/strong&gt;: while multiple clicks are not necessarily fraudulent, they end up either (i) costing advertisers a lot of money when they are billed at the full price or (ii) costing publishers and search engines a lot of money if only the first click is charged for. Another issue is how to accurately determine that two clicks – say, five minutes apart – are attached to the same user. &lt;/li&gt;&lt;li class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-list: l4 level1 lfo3; tab-stops: list .5in"&gt;&lt;strong style="mso-bidi-font-weight: normal"&gt;Fictitious fraud&lt;/strong&gt;: clicks that appear fraudulent, but are never charged for. These clicks can be made up by unethical click fraud companies. Or they can be the result of testing campaigns, and we call them click noise. A typical example is Googlebot. While Google never charges for clicks originating from its Googlebot robot, other search engines that do not have the most up-to-date list of Googlebot IP addresses might accidentally charge for these clicks. Another example of fictitious fraud further discussed in this paper is fictitious clicks. We explain what fictitious clicks are and how they can be detected.&lt;/li&gt;&lt;/ul&gt;&lt;p class="MsoNormal" style="MARGIN: 0in 0in 0pt"&gt;&lt;strong style="mso-bidi-font-weight: normal"&gt;2. A Black and White Universe, or is it Grey? &lt;p&gt;&lt;/strong&gt;&lt;/p&gt;&lt;p class="MsoNormal" style="MARGIN: 0in 0in 0pt"&gt;&lt;p&gt;&lt;/p&gt;&lt;p&gt;&lt;/p&gt;&lt;p class="MsoNormal" style="MARGIN: 0in 0in 0pt"&gt;Our experience has shown that web traffic isn’t black or white, and that there is a whole range from low quality to great traffic. 
Also, non-converting traffic is not necessarily bad, and in many cases can actually be very good. Lack of conversions might be due to poor ads, or poorly targeted ads. This raises two points:&lt;/p&gt;&lt;p class="MsoNormal" style="MARGIN: 0in 0in 0pt"&gt;&lt;p&gt;&lt;/p&gt;&lt;p&gt;&lt;/p&gt;&lt;ul style="MARGIN-TOP: 0in" type="square"&gt;&lt;li class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-list: l0 level1 lfo4; tab-stops: list .5in"&gt;&lt;strong style="mso-bidi-font-weight: normal"&gt;Traffic scoring&lt;/strong&gt;: while as much as 5% of the traffic from any source can be easily and immediately identified as totally unbillable, with no chance of ever converting, a much larger portion of the traffic has generic quality issues – issues that are not specific to a particular advertiser. A traffic scoring approach (click or impression scoring) provides a much more actionable mechanism both for search engines interested in ranking distribution partners, and for advertisers refining their ad campaigns.&lt;/li&gt;&lt;li class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-list: l0 level1 lfo4; tab-stops: list .5in"&gt;A &lt;strong style="mso-bidi-font-weight: normal"&gt;generic, universal scoring&lt;/strong&gt; approach allows advertisers with limited or no ROI metrics to test new sources of traffic, knowing beforehand where the generically good traffic is, regardless of conversions. This can help advertisers substantially increase their reach and tap into new traffic sources, as opposed to obtaining very small ROI improvements from A/B testing. Advertisers converting offline, affected by bogus conversions, or interested in branding will find click scores most valuable.&lt;/li&gt;&lt;/ul&gt;&lt;p class="MsoNormal" style="MARGIN: 0in 0in 0pt"&gt;A scoring approach can help search engines determine the optimum price for multiple clicks (here I mean true user-generated multiple clicks, not a double click that results from a technical glitch). 
By incorporating the score in their &lt;strong style="mso-bidi-font-weight: normal"&gt;smart pricing&lt;/strong&gt; algorithm, they can reduce the loss due to the simplified business rule “one click per ad per user per day”.&lt;/p&gt;&lt;p class="MsoNormal" style="MARGIN: 0in 0in 0pt"&gt;&lt;p&gt;&lt;/p&gt;&lt;p&gt;&lt;/p&gt;&lt;p class="MsoNormal" style="MARGIN: 0in 0in 0pt"&gt;Search engines, publishers and advertisers can all win, as poor-quality publishers can now be accepted in a network, but priced correctly so that the advertiser still has a positive ROI. And good publishers experiencing a drop in quality can have their commission lowered according to click scores, rather than being discontinued outright. When their traffic gets better, their commission increases accordingly, based on scores.&lt;/p&gt;&lt;p class="MsoNormal" style="MARGIN: 0in 0in 0pt"&gt;&lt;p&gt;&lt;/p&gt;&lt;p&gt;&lt;/p&gt;&lt;p class="MsoNormal" style="MARGIN: 0in 0in 0pt"&gt;In order to make sense for search engines, a scoring system needs to be as generic as possible. The scores that we have developed meet this criterion. Our click scores have been designed to match the conversion rate distribution, using very &lt;strong style="mso-bidi-font-weight: normal"&gt;generic conversions&lt;/strong&gt;, taking into account bogus conversions, and based on a patent-pending methodology to match a conversion with a click, through correct user identification. As everybody knows, an IP can have multiple users attached to it, and a single user can have multiple IP addresses within a two-minute period. 
&lt;strong style="mso-bidi-font-weight: normal"&gt;Cookies&lt;/strong&gt; (particularly in server logs, less so in redirect logs) also have notorious flaws, and we do not rely on cookies when dealing with advertiser server log data.&lt;/p&gt;&lt;p class="MsoNormal" style="MARGIN: 0in 0in 0pt"&gt;&lt;p&gt;&lt;/p&gt;&lt;p&gt;&lt;/p&gt;&lt;p class="MsoNormal" style="MARGIN: 0in 0in 0pt"&gt;We have designed scores based on click logs, relying – among others – on &lt;strong style="mso-bidi-font-weight: normal"&gt;network topology&lt;/strong&gt; metrics. We also have designed scores based on advertiser server logs, also relying on network topology metrics (distribution partners, unique browsers per IP cluster, etc.) and even on impression-to-click ratio and other search engine metrics, as we reconcile server logs with search engine reports to get the most accurate picture. &lt;em style="mso-bidi-font-style: normal"&gt;Using search engine metrics to score advertiser traffic allows us to design good scores for search engine data, and the other way around, as search engine scores are correlated with true conversions. It also makes us one of the very few third-party traffic scoring companies serving both sides equally well.&lt;/em&gt; &lt;/p&gt;&lt;p class="MsoNormal" style="MARGIN: 0in 0in 0pt"&gt;&lt;p&gt;&lt;/p&gt;&lt;p&gt;&lt;/p&gt;&lt;p class="MsoNormal" style="MARGIN: 0in 0in 0pt"&gt;When dealing with advertiser server logs, the &lt;strong style="mso-bidi-font-weight: normal"&gt;reconciliation&lt;/strong&gt; process and the use of appropriate tags (e.g. Google’s gclid) whenever possible allow us to avoid counting clicks that are an artifact of browser technology. 
We have actually submitted a patent application to eliminate what is called “&lt;strong style="mso-bidi-font-weight: normal"&gt;fictitious clicks&lt;/strong&gt;” by Google, and more generally, to eliminate clicks from clickbots.&lt;/p&gt;&lt;p class="MsoNormal" style="MARGIN: 0in 0in 0pt"&gt;&lt;p&gt;&lt;/p&gt;&lt;p&gt;&lt;/p&gt;&lt;p class="MsoNormal" style="MARGIN: 0in 0in 0pt"&gt;Advertiser scores are designed to be a good indicator of conversion rate. Search engine scores use a combination of weights based both on expert knowledge and advertiser data. Scores have been &lt;strong style="mso-bidi-font-weight: normal"&gt;smoothed&lt;/strong&gt; and &lt;strong style="mso-bidi-font-weight: normal"&gt;standardized&lt;/strong&gt; using the same methodology as for credit card scoring. The best quality assessment systems will rely on both our &lt;strong style="mso-bidi-font-weight: normal"&gt;real-time&lt;/strong&gt; and less granular scores, such as &lt;strong style="mso-bidi-font-weight: normal"&gt;end-of-day&lt;/strong&gt;.&lt;/p&gt;&lt;p class="MsoNormal" style="MARGIN: 0in 0in 0pt"&gt;&lt;p&gt;&lt;/p&gt;&lt;p&gt;&lt;/p&gt;&lt;p class="MsoNormal" style="MARGIN: 0in 0in 0pt"&gt;The use of a smooth score, based on solid metrics, substantially reduces &lt;strong style="mso-bidi-font-weight: normal"&gt;false positives&lt;/strong&gt;.&lt;span style="mso-spacerun: yes"&gt; &lt;/span&gt;If a single rule is triggered, or even two rules are triggered, it might barely penalize the click. Also, if a rule is triggered by too many clicks or is not correlated with true conversions, it is ignored. For instance, a rule formerly known as “double click” (with enough time between the two clicks) has been found to be a good indicator of conversion, and was changed from a rule into an anti-rule in our system, whenever the correlation is positive. 
A click with no external referral but otherwise normal will not be penalized, after score standardization.&lt;/p&gt;&lt;p class="MsoNormal" style="MARGIN: 0in 0in 0pt"&gt;&lt;p&gt;&lt;/p&gt;&lt;p&gt;&lt;/p&gt;&lt;p class="MsoNormal" style="MARGIN: 0in 0in 0pt"&gt;&lt;strong style="mso-bidi-font-weight: normal"&gt;3. Mathematical Model &lt;p&gt;&lt;/strong&gt;&lt;/p&gt;&lt;p class="MsoNormal" style="MARGIN: 0in 0in 0pt"&gt;&lt;p&gt;&lt;/p&gt;&lt;p&gt;&lt;/p&gt;&lt;p class="MsoNormal" style="MARGIN: 0in 0in 0pt"&gt;The scoring methodology developed by &lt;a href="http://www.authenticlick.net"&gt;Authenticlick&lt;/a&gt; is state-of-the-art. It is based on almost 30 years of experience in auditing, statistics and fraud detection, both in real-time and on historical data. Several patents are currently pending.&lt;/p&gt;&lt;p class="MsoNormal" style="MARGIN: 0in 0in 0pt"&gt;&lt;p&gt;&lt;/p&gt;&lt;p&gt;&lt;/p&gt;&lt;p class="MsoNormal" style="MARGIN: 0in 0in 0pt"&gt;It combines sophisticated cross-validation, design of experiments, linkage and unsupervised clustering to find new rules, machine learning, and the most advanced models ever used in scoring, with a parallel implementation and fast, robust algorithms to produce at once a large number of small overlapping decision trees. The clustering algorithm is a hybrid combination of unique decision-tree technology with a new type of PLS logistic stepwise regression to handle tens of thousands of highly redundant metrics. It provides meaningful regression coefficients computed in a very short amount of time, and efficiently handles interaction between rules.&lt;/p&gt;&lt;p class="MsoNormal" style="MARGIN: 0in 0in 0pt"&gt;&lt;p&gt;&lt;/p&gt;&lt;p&gt;&lt;/p&gt;&lt;p class="MsoNormal" style="MARGIN: 0in 0in 0pt"&gt;Some aspects of the methodology show limited similarities with ridge regression, tree bagging and tree boosting. 
Below we compare the efficiency of different systems to detect click fraud on highly realistic simulated data. The criterion for comparison is the mean square error, a metric that measures the fit between scored clicks and conversions:&lt;/p&gt;&lt;p class="MsoNormal" style="MARGIN: 0in 0in 0pt"&gt;&lt;p&gt;&lt;/p&gt;&lt;p&gt;&lt;/p&gt;&lt;ul style="MARGIN-TOP: 0in" type="square"&gt;&lt;li class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-list: l1 level1 lfo5; tab-stops: list .5in"&gt;Scoring system with identical weights: 60% improvement over binary (fraud / non fraud) &lt;span style="mso-spacerun: yes"&gt;&lt;/span&gt;approach&lt;/li&gt;&lt;li class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-list: l1 level1 lfo5; tab-stops: list .5in"&gt;First-order PLS regression: 113% improvement over binary approach&lt;/li&gt;&lt;li class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-list: l1 level1 lfo5; tab-stops: list .5in"&gt;Full standard regression (not recommended as it provides highly unstable and non-interpretable results): 157% improvement over binary approach&lt;/li&gt;&lt;li class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-list: l1 level1 lfo5; tab-stops: list .5in"&gt;Second-order PLS regression: 197% improvement over binary approach, easy interpretation and robust, nearly parameter-free technique&lt;/li&gt;&lt;/ul&gt;&lt;p class="MsoNormal" style="MARGIN: 0in 0in 0pt"&gt;Substantial additional improvement is achieved when the decision trees component is added to the mix. Improvement rates on real data are similar. &lt;/p&gt;&lt;p class="MsoNormal" style="MARGIN: 0in 0in 0pt"&gt;&lt;p&gt;&lt;/p&gt;&lt;p&gt;&lt;/p&gt;&lt;p&gt;&lt;/p&gt;&lt;p class="MsoNormal" style="MARGIN: 0in 0in 0pt"&gt;&lt;strong style="mso-bidi-font-weight: normal"&gt;4. 
Bogus Conversions &lt;p&gt;&lt;/strong&gt;&lt;/p&gt;&lt;p class="MsoNormal" style="MARGIN: 0in 0in 0pt"&gt;&lt;strong style="mso-bidi-font-weight: normal"&gt;&lt;p&gt;&lt;/p&gt;&lt;/strong&gt;&lt;p&gt;&lt;/p&gt;&lt;p class="MsoNormal" style="MARGIN: 0in 0in 0pt"&gt;We elaborate a bit on bogus conversions because their impact is worse than most people think. If not addressed, they can seriously bias a fraud detection system. Search engines that rely on pre-sales or non-sales conversions such as sign-up forms to assess traffic performance can be misled into thinking that some traffic is good when it actually is poor, and the other way around.&lt;/p&gt;&lt;p class="MsoNormal" style="MARGIN: 0in 0in 0pt"&gt;&lt;p&gt;&lt;/p&gt;&lt;p&gt;&lt;/p&gt;&lt;p class="MsoNormal" style="MARGIN: 0in 0in 0pt"&gt;Usually, the advertiser is not willing to provide too much information to the search engine, and thus conversions are generally computed as a result of the advertiser placing some JavaScript code or a clear gif on target conversion pages. The search engine is then able to track conversions on these pages. However, the search engine has no control over which “converting pages” the advertiser wants to track. Also, the search engine has no visibility into what is happening between the click and the conversion, or after the conversion. If the search engine has access to pre-sale data only, the risk of bogus conversions is high. We have actually noticed a significant increase in bogus conversions from some specific traffic segments.&lt;/p&gt;&lt;p class="MsoNormal" style="MARGIN: 0in 0in 0pt"&gt;&lt;p&gt;&lt;/p&gt;&lt;p&gt;&lt;/p&gt;&lt;p class="MsoNormal" style="MARGIN: 0in 0in 0pt"&gt;Another issue with bogus conversions is when an advertiser (let’s call it an ad broker) purchases traffic upstream, and then acts as a search engine and distributes the traffic downstream to other advertisers. This business model is widespread. 
&lt;span style="mso-spacerun: yes"&gt;&lt;/span&gt;If the traffic upstream is artificial but results in many bogus conversions – a conversion being a click or lead delivered downstream – the ad broker does not see a drop in ROI. She might actually see an increase in ROI. Only the advertisers downstream start to complain. By the time the problem is addressed, it might be too late, and the ad broker can lose clients. Had the ad broker used a scoring system such as ours, the bogus conversions would have been detected early, even if the ROI was unchanged.&lt;/p&gt;&lt;p class="MsoNormal" style="MARGIN: 0in 0in 0pt"&gt;&lt;p&gt;&lt;/p&gt;&lt;p&gt;&lt;/p&gt;&lt;p class="MsoNormal" style="MARGIN: 0in 0in 0pt"&gt;This business flaw can be exploited by criminals running a network of distribution partners. Smart criminals will hit this type of “ad broker” advertiser harder: the criminals can generate bogus clicks to make money themselves, and as long as they generate a decent amount of bogus conversions, the victim is making money too and might not notice the scheme. If the conversions are tracked by the upstream search engine (where the traffic originates), the clicks might erroneously be considered very good.&lt;/p&gt;&lt;p class="MsoNormal" style="MARGIN: 0in 0in 0pt"&gt;&lt;p&gt;&lt;/p&gt;&lt;p&gt;&lt;/p&gt;&lt;p class="MsoNormal" style="MARGIN: 0in 0in 0pt"&gt;&lt;strong style="mso-bidi-font-weight: normal"&gt;5. A Few Misconceptions &lt;p&gt;&lt;/strong&gt;&lt;/p&gt;&lt;p class="MsoNormal" style="MARGIN: 0in 0in 0pt"&gt;&lt;strong style="mso-bidi-font-weight: normal"&gt;&lt;p&gt;&lt;/p&gt;&lt;/strong&gt;&lt;p&gt;&lt;/p&gt;&lt;p class="MsoNormal" style="MARGIN: 0in 0in 0pt"&gt;It has been argued that the victims of click fraud are good publishers, not advertisers, as advertisers automatically adjust their bids. However, this does not apply to advertisers lacking good conversion metrics (e.g. 
if conversion takes place offline) nor to smaller advertisers who do not update bids and keywords in real time. It can actually lead advertisers to permanently eliminate whole traffic segments, and miss out on good ROI once the fraud problem gets fixed on the network. On some 2&lt;sup&gt;nd&lt;/sup&gt;-tier networks, &lt;strong style="mso-bidi-font-weight: normal"&gt;impression fraud&lt;/strong&gt; can lead an advertiser to be kicked out one day, without the ability to ever come back. Both the search engine and the advertiser lose in this case, and the winners are the bad guys now displaying cheesy, irrelevant ads on the network. The website user loses too, as all good ads have been replaced with irrelevant material. &lt;/p&gt;&lt;p class="MsoNormal" style="MARGIN: 0in 0in 0pt"&gt;&lt;p&gt;&lt;/p&gt;&lt;p&gt;&lt;/p&gt;&lt;p class="MsoNormal" style="MARGIN: 0in 0in 0pt"&gt;Another point that we sometimes hear is that 3&lt;sup&gt;rd&lt;/sup&gt; party auditors do not have access to the right data. Not only can auditors with large volumes of traffic track network flows just as search engines do, but they also have access to more comprehensive conversion data, and are better equipped to detect &lt;strong style="mso-bidi-font-weight: normal"&gt;bogus conversions&lt;/strong&gt;. In our case, we process search engine and advertiser data: large volumes of data in both cases. However,&lt;span style="mso-spacerun: yes"&gt; &lt;/span&gt;&lt;span style="mso-spacerun: yes"&gt;&lt;/span&gt;some auditing firms lacking statistical expertise and / or domain knowledge have had serious flaws in their counting methodology. These flaws have been highly publicized by Google, and overestimated. Due to “&lt;strong style="mso-bidi-font-weight: normal"&gt;fictitious clicks&lt;/strong&gt;”, 1,000 clicks are on average reported as 1,400 clicks by some auditing firms, according to a well-known source. 
The 400 extra “non-clicks” or “fictitious clicks” (they really never existed) are said to be from users clicking on the back button of their browser. It is well known that most visits are just one page long, and content displayed by back-clicking in your browser is usually served from the browser cache, not by the advertiser's server, so it never reaches the server logs. Thus this 1,400 / 1,000 ratio does not make sense. We believe that the issue is of a different nature, such as counting all HTTP requests associated with one page as clicks, since the click tags are attached to all requests, depending on server configuration. It is also an issue that we addressed long ago.&lt;/p&gt;&lt;p class="MsoNormal" style="MARGIN: 0in 0in 0pt"&gt;&lt;p&gt;&lt;/p&gt;&lt;p&gt;&lt;/p&gt;&lt;p class="MsoNormal" style="MARGIN: 0in 0in 0pt"&gt;Auditing firms performing good-quality reconciliation also have access to many metrics typically used by fraud detection systems for search engines: average ad position, bid, impression-to-click ratio, etc.&lt;/p&gt;&lt;p class="MsoNormal" style="MARGIN: 0in 0in 0pt"&gt;&lt;/p&gt;&lt;p class="MsoNormal" style="MARGIN: 0in 0in 0pt"&gt;&lt;/p&gt;&lt;p class="MsoNormal" style="MARGIN: 0in 0in 0pt"&gt;&lt;/p&gt;&lt;p class="MsoNormal" style="MARGIN: 0in 0in 0pt"&gt;&lt;/p&gt;&lt;p class="MsoNormal" style="MARGIN: 0in 0in 0pt"&gt;&lt;/p&gt;&lt;p class="MsoNormal" style="MARGIN: 0in 0in 0pt"&gt;&lt;/p&gt;&lt;p class="MsoNormal" style="MARGIN: 0in 0in 0pt"&gt;&lt;/p&gt;&lt;p class="MsoNormal" style="MARGIN: 0in 0in 0pt"&gt;&lt;/p&gt;&lt;p class="MsoNormal" style="MARGIN: 0in 0in 0pt"&gt;&lt;/p&gt;&lt;p class="MsoNormal" style="MARGIN: 0in 0in 0pt"&gt;&lt;span style="mso-spacerun: yes"&gt;&lt;/span&gt;&lt;/p&gt;&lt;p class="MsoNormal" style="MARGIN: 0in 0in 0pt"&gt;&lt;/p&gt;&lt;p class="MsoNormal" style="MARGIN: 0in 0in 0pt"&gt;&lt;p&gt;&lt;p&gt;Finally, many systems to detect fraud are still essentially based on &lt;strong style="mso-bidi-font-weight: normal"&gt;outlier 
detection&lt;/strong&gt; and detecting shifts from average. Based on our experience in the credit card fraud industry, we know that most fraudsters try very hard to look as average as possible, avoiding expensive or cheap clicks, using the right distribution of user agents, generating a small random number of clicks per infected computer per day, except possibly for clicks going through AOL or other proxies. This type of fraud needs a truly multivariate approach, looking at billions of combinations of several carefully selected variables simultaneously, looking for statistical evidence in billions of tiny click segments, to unearth the more sophisticated fraud cases impacting large volume of clicks, possibly orchestrated by terrorists or large corrupt financial institutions rather than distribution partners.&lt;/p&gt;&lt;p class="MsoNormal" style="MARGIN: 0in 0in 0pt"&gt;&lt;p&gt;&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/24017879-8685241527537395705?l=clickscoring.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://clickscoring.blogspot.com/feeds/8685241527537395705/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=24017879&amp;postID=8685241527537395705' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/24017879/posts/default/8685241527537395705'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/24017879/posts/default/8685241527537395705'/><link rel='alternate' type='text/html' href='http://clickscoring.blogspot.com/2007/03/click-fraud-new-definition-and.html' title='Click Fraud: New Definition and Methodology to Assess Generic Traffic Quality'/><author><name>Vincent 
Granville</name><uri>http://www.blogger.com/profile/11458380199915437888</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='21' height='32' src='http://www.datashaping.com/granville2.jpg'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-24017879.post-115177632866824188</id><published>2006-07-01T10:50:00.000-07:00</published><updated>2007-02-19T20:53:44.546-08:00</updated><title type='text'>Efficient Click Fraud Detection using Advanced Analytics</title><content type='html'>To some extent, the technology to combat click fraud is similar to what banks are using to combat credit card fraud. The best systems are based on statistical scoring technology, as the transaction - a click in our context - is usually not either bad or good.&lt;br /&gt;&lt;p&gt;&lt;br /&gt;Multiple scoring systems based e.g. on IP and click scores, scorecards and metric mix optimization are the basic ingredients. Because of the vast amount of data, and potentially millions of metrics used in a good scoring system, combinatorial optimization is required, using algorithms such as Markov Chain Monte Carlo or simulated annealing.&lt;br /&gt;&lt;p&gt;&lt;br /&gt;While scoring advertiser data can be viewed as a regression problem, the dependent variable being the conversion metric, scoring search engine data is more challenging as conversion data is not readily available. Even when dealing with advertiser data, we have several issues to address. First, the scores need to be standardized. Two identical ad campaigns might perform very differently if the landing pages are different. The scoring system needs to address this issue.&lt;br /&gt;&lt;p&gt;&lt;br /&gt;Also, while scoring can be viewed as a regression problem, it is a very difficult one. First, the metrics involved are usually highly correlated, making the problem ill-conditioned from a mathematical viewpoint. 
Second, there might be more metrics (and thus more regression coefficients) than observed clicks, making the regression approach highly unstable. Finally, the regression coefficients - also referred to as weights - must be constrained to take only a few potential values. The dependent variable being binary, we are dealing with a sophisticated ridge logistic regression problem.&lt;br /&gt;&lt;p&gt;&lt;br /&gt;The best technology will actually rely on a hybrid system that can handle contrarian configurations, such as "time &lt; 4am" is bad, "country not US" is bad, but "time &lt; 4am and country = UK" is good. Good cross-validation is also critical to eliminate configurations and metrics with no statistical significance or poor robustness. Careful metric binning and a fast distributed feature optimization algorithm are important as well.&lt;br /&gt;&lt;p&gt;&lt;br /&gt;Finally, design of experiments to create test campaigns - some with a high proportion of fraud and some with no fraud - as well as the use of generic conversions and proper user identification are critical. 
And let's not forget that failing to remove bogus conversions will result  in a biased system with many false positives.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/24017879-115177632866824188?l=clickscoring.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://clickscoring.blogspot.com/feeds/115177632866824188/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=24017879&amp;postID=115177632866824188' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/24017879/posts/default/115177632866824188'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/24017879/posts/default/115177632866824188'/><link rel='alternate' type='text/html' href='http://clickscoring.blogspot.com/2006/07/efficient-click-fraud-detection-using.html' title='Efficient Click Fraud Detection using Advanced Analytics'/><author><name>Vincent Granville</name><uri>http://www.blogger.com/profile/11458380199915437888</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='21' height='32' src='http://www.datashaping.com/granville2.jpg'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-24017879.post-114871042606137100</id><published>2006-05-26T23:09:00.000-07:00</published><updated>2007-02-16T05:28:29.706-08:00</updated><title type='text'>New Developments in Click Fraud Detection</title><content type='html'>&lt;b&gt;Definition&lt;/b&gt;&lt;br /&gt;&lt;p&gt;&lt;br /&gt;Although there is no formal definition of click fraud, it is customary to consider fraudulent any click not resulting from a user genuinely interested in an ad found in a pay-per-click search engine network such as Google or Yahoo. 
This definition encompasses competitor fraud (depleting your competitor's budget), distribution partner fraud and other types of fraud committed either with or without financial incentives, as well as accidental fraud. Most but not all click fraud cases are potentially subject to prosecution, e.g. under the unfair business practice code.&lt;br /&gt;&lt;p&gt;&lt;br /&gt;&lt;b&gt;New Patterns and Trends&lt;/b&gt;&lt;br /&gt;&lt;p&gt;&lt;br /&gt;There is increasing evidence that new patterns are emerging. While Google has improved impression fraud detection – a practice consisting of generating bogus impressions to reduce ad relevancy of your competitors to drive them out of Google – the fraud has spread to Yahoo and MSN. And more sophisticated bogus impression schemes are taking place on Google. Political activists and disgruntled employees, a new type of fraudster not motivated by money, click on expensive paid ads from companies that they hate. They know which keywords are expensive.&lt;br /&gt;&lt;p&gt;&lt;br /&gt;Traffic distribution partners wanting to eliminate competing affiliates on a search engine network are rumored to have used click fraud warfare, or clickware. Other fraudsters, in an attempt to hide their activity, are generating bogus impressions, bogus clicks and also bogus conversions. To remain undetected, they keep their CTR and conversion rates at more discreet - yet still too high - levels.&lt;br /&gt;&lt;p&gt;&lt;br /&gt;On the other hand, many companies are changing their employee internet usage policy for increased security. This means that sometimes, the same company or government agency uses spoofed IP addresses, or a single IP address and a single browser shared by 50,000 employees. This can cause fraud detection systems to fail and generate many false positives, thus inflating fraud numbers. 
As far as organic search is concerned, we are worried by individuals who have been banned by Google and who use the same technology that got them banned to eliminate their competitors. This and other schemes have the potential to reduce search result relevancy, already low in some categories such as mortgages. However, search engines will fight back with more advanced relevancy algorithms. This is actually one of the priorities for MSN and many others.&lt;br /&gt;&lt;p&gt;&lt;br /&gt;On the positive side, we see that some search engines are taking the click fraud issue seriously. Over the long term, we believe that the concept of click fraud will be replaced by the much more meaningful concept of click quality or click profiling, a concept that we are currently implementing (see &lt;a href="http://www.clickprofiling.com/"&gt;ClickProfiling.com&lt;/a&gt;). &lt;/p&gt;&lt;p&gt;&lt;p&gt;&lt;br /&gt;True click fraud is illegal clicking worth investigating by the SEC or FBI because of potential connections with international crime, shareholder fraud or terrorism funding. It represents a small but potentially fast-growing percentage of clicks due to the technical expertise of these groups. From a click scoring viewpoint, extremely poor clicks account for 10%, very poor clicks for 10%, poor clicks for 10%, and below-average clicks for another 20% of all clicks. Correctly identifying these click segments using an appropriate click scoring system is of critical importance to increase ROI. Sophisticated keyword selection systems should automatically buy tens of thousands of under-sold keywords and automatically set ads on Google and Yahoo, ideally three ads per keyword. eBay and Amazon have yet to substantially improve their automated bidding tools, though. &lt;/p&gt;&lt;p&gt;&lt;p&gt;&lt;br /&gt;In the long run, advertisers will get smarter. Increased PPC with increased fraud and thus lower ROI or even negative ROI cannot be sustained over the long term. 
We believe that the future will eventually bring better fraud detection and increased ROI – possibly with higher PPC - thanks in part to more knowledgeable advertisers and better relevancy algorithms.&lt;br /&gt;&lt;/p&gt;&lt;p&gt;&lt;br /&gt;&lt;b&gt;Case Studies &lt;/b&gt;&lt;br /&gt;&lt;p&gt;&lt;br /&gt;Examples of false positives that we were able to identify include a large corporation, let's call it Acme, and the US Army. In the case of Acme, an alarm was raised because of thousands of clicks per day, day after day, by the same IP and same browser, all seemingly coming from the same user. However, the keywords associated with the clicks – both paid and unpaid - the velocity and timing, the proportion of paid clicks and referrals did not show unusual patterns. It was found that Acme uses one IP and one browser for all its employees. Similarly, after investigating a bucket of clicks with highly suspicious spoofed IPs, it was found that the addresses were used by the US Army to hide their true origin. This prevents potential criminals from being indirectly informed (by checking IP addresses in their server logs) that they are being monitored by the Army. Again, the clicks were legitimate.&lt;br /&gt;&lt;p&gt;&lt;br /&gt;Conversely, we correctly identified another set of spoofed IP addresses as fraudulent with our metric mix that incorporates proprietary keyword categorizations and multivariate statistical distributions. Email spammers, whose web robots accidentally clicked on paid links while harvesting email addresses, made a few mistakes: they generated the same number of clicks per IP per day, at least on the IP addresses that they did not share with legitimate users. In another case, our linkage analysis revealed that thousands of IP addresses were switched off by one distribution partner caught in click fraud. When they reappeared, they were attached to a new partner, clearly showing that the fraud involved clickware or adware. 
The fraudster knew which computers were infected and possibly sold this information to another criminal.&lt;br /&gt;&lt;p&gt;&lt;br /&gt;Finally, we are dealing not only with counterfeit clicks, but also fake impressions and bogus conversions. Click scoring is a complex problem: bogus conversions involve purchases with stolen credit cards or users paid to fill in forms and provide fake information. They can make poor clicks look good if undetected. However, we have developed a methodology that preserves the quality of our click scoring system. Interestingly, one of our clients had been using a click fraud detection system that failed to capture these bogus conversions in a fraud scheme, because it relied on JavaScript and clear GIFs.&lt;br /&gt;&lt;p&gt;&lt;br /&gt;&lt;b&gt;Fraud Schemes, Clickware &lt;/b&gt;&lt;br /&gt;&lt;p&gt;&lt;br /&gt;Different types of undetectable attacks can be carried out against internet companies that bill advertising clients using logfile statistics. These attacks usually rely on IP masking, IP masquerading and fake referrals. IP masking is accomplished by having a web robot access web pages through several hundred anonymous proxy servers.&lt;br /&gt;&lt;p&gt;&lt;br /&gt;In another scenario, trojans are uploaded to popular shareware sites. Once downloaded by a user, these trojans perform the useful tasks they are supposed to do (e.g. hard drive cleaning, virus scanning etc.) but in addition, they randomly "click" on target links, writing fake information in target logfiles using web robot technology.&lt;br /&gt;&lt;p&gt;&lt;br /&gt;Competing advertisers, affiliates or partners in a pay-per-click program might want to kill each other to gain market share, using click spam. Target links could consist of paid links associated with selected advertising clients (e.g. perpetrator's competitors) or expensive paid keywords (e.g. "bulk Email" or "online casino") on pay-per-click search engines. 
Another version of this attack could rely on a virus with an embedded web robot instead of a trojan. The resulting fake information in the target logfiles cannot be distinguished from legitimate clicks from real users. The fake clicks have a 0% click-to-sale ratio, driving the advertiser's ROI into negative territory. We have computed that it is possible to generate $200 million in illegitimate charges with a click spam program running non-stop over a 12-month period on one server.&lt;br /&gt;&lt;p&gt;&lt;br /&gt;More recent cases involve ad relevancy fraud. It is possible to eradicate advertisers on AdSense for popular keywords, with a combination of bogus impressions and self-clicks, without using fraudulent clicks.&lt;br /&gt;Another scenario consists of a shareholder essentially using AOL IP addresses and other non-anonymous proxies to commit large scale fraud on high dollar keywords on a 3rd-tier search engine, to manipulate the stock price. Once caught, the shareholder would claim that he is the victim of very sophisticated criminals who have spoofed his IP address and are trying to hurt the company that he targets with click fraud. Such a bogus claim is almost impossible to defeat in court, as true IP spoofing really exists and makes the true (non-existent, in this case) "spoofer" essentially indistinguishable from the (self-proclaimed, in this case) "spoofee". &lt;/p&gt;&lt;p&gt;&lt;p&gt;&lt;br /&gt;A final example would be an advertiser who was banned from Google organic search through &lt;a href="http://www.pandia.com/features/banned.html"&gt;nefarious actions&lt;/a&gt; committed by one of his competitors, unable to get back into Google unpaid search results, and then seeking revenge and retaliating against all his competitors. He would use an expert scheme involving trending, impression and click fraud distilled over many months. 
The fraud would increase very slowly over time, making competitors' CTRs a little bit worse each month and his own CTR better (by clicking on his own ads once in a while). Along the same lines, one can think of a distribution partner artificially inflating his revenues by 1% the first month, 2% the second month, etc. with a cap set to 5%.&lt;br /&gt;&lt;/p&gt;&lt;p&gt;&lt;br /&gt;&lt;b&gt;Our Approach: Click Scoring &lt;/b&gt;&lt;br /&gt;&lt;p&gt;&lt;br /&gt;While we have considerable experience with both advertiser and search engine data, this section focuses on advertiser data. One critical issue is how to attach a conversion to a click. We have developed patent-pending technology that enables us to correctly identify a unique AOL user, whether genuine, bogus or spoofed. The algorithm even recognizes when a sale recorded from one IP address originates from a click made from a totally different IP address. It will also detect when a sale and a click from the same IP are actually generated by unrelated users who share, even temporarily, the same IP address. In most cases, we are also able to explain the missing clicks: clicks listed in Google reports but not seen in server logs. This amounts to 50% of billed clicks in some cases. In one severe case of missing clicks, we were able to reduce the discrepancy from 50% to 0% and maximize savings to the client.&lt;br /&gt;&lt;p&gt;&lt;br /&gt;From a statistical viewpoint, click scoring for advertiser data can be viewed as a general scoring technology. The scoring system is designed in such a way that the score distribution matches conversion rates. Critical issues include the use of universal conversions (with detection of bogus conversions) and standardized scores, selection of an efficient metric mix and optimized robust metric weights generally obtained as the solution of a ridge regression problem involving combinatorial optimization (e.g. 
meta-feature optimization), optimum metric binning, tree forests or contrarian scoring technology. It is also important to detect the (possibly site-dependent) optimum timeout parameter in the user identification algorithm, as we cannot rely on cookies to identify users.&lt;br /&gt;&lt;p&gt;&lt;br /&gt;&lt;b&gt;Reference &lt;/b&gt;&lt;br /&gt;&lt;p&gt;&lt;br /&gt;&lt;i&gt;Click Fraud Resistant Methods for Learning Click-Through Rates&lt;/i&gt;. Nicole Immorlica et al. Microsoft Research, 2006. &lt;a href="http://datashaping.com/ppc2.shtml#top"&gt;&lt;/a&gt;&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/24017879-114871042606137100?l=clickscoring.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://clickscoring.blogspot.com/feeds/114871042606137100/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=24017879&amp;postID=114871042606137100' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/24017879/posts/default/114871042606137100'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/24017879/posts/default/114871042606137100'/><link rel='alternate' type='text/html' href='http://clickscoring.blogspot.com/2006/05/new-developments-in-click-fraud.html' title='New Developments in Click Fraud Detection'/><author><name>Vincent Granville</name><uri>http://www.blogger.com/profile/11458380199915437888</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='21' height='32' src='http://www.datashaping.com/granville2.jpg'/></author><thr:total>0</thr:total></entry></feed>
