#
# IAB_ABCe_International_Spiders_and_Robots_200612
#
# December 20, 2006
#
# **********COMMENTS SECTION***************************************************
#
# This list has been reviewed by the IAB MTF Spider & Robot Policy Board.
# This file contains a list of patterns that may be matched against HTTP User
# Agent (UA) strings to determine whether that UA matches a known spider or
# robot. This is one of several steps required for compliance with the IAB
# Advertising Measurement Guidelines.
#
# The list is valid for use when counting Client Side Counting (CSC)
# transactions. See [http://www.iab.net/standards/pdf/2292%20IAB%20spreads.pdf]
# for more info.
#
# Rule: If any of these patterns matches any part of the HTTP User-Agent
# string (compared case-insensitively), the request is identified as a
# non-human interaction and should be filtered from counts.
#
# It is strongly suggested that users analyze their own log data and sort this
# list in order of frequency to allow their filter program to work as
# efficiently as possible.
#
# This list is provided in good faith but must be used at the user's own risk.
# The IAB, ABC ELECTRONIC and ImServices accept no responsibility for any
# legal, technical or commercial consequences arising from the use of this list.
#
# Special characters in this file:
#
# # - (only at the start of a line) this line is a comment
# | - field separator
# , - field separator (used when there are multiple exceptions)
# blank lines may be present; ignore them.
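The matching rule and the advice to order the list by frequency can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the IAB's reference filter: `is_robot` and `order_by_frequency` are hypothetical names, and the sample patterns and log lines are made up for the example.

```python
from collections import Counter
from typing import Iterable, List


def is_robot(user_agent: str, patterns: Iterable[str]) -> bool:
    """Return True if any pattern occurs anywhere in the UA, ignoring case.

    Both sides are casefolded, so the mixed-case spellings kept in the
    list for legibility still match.
    """
    ua = user_agent.casefold()
    return any(p.casefold() in ua for p in patterns)


def order_by_frequency(patterns: Iterable[str], user_agents: Iterable[str]) -> List[str]:
    """Sort patterns so that those matching the most logged UAs come first.

    `user_agents` stands in for the UA column of your own log data
    (hypothetical input). Ties keep their original order because
    sorted() is stable.
    """
    pattern_list = list(patterns)
    hits: Counter = Counter()
    for ua in user_agents:
        lowered = ua.casefold()
        for p in pattern_list:
            if p.casefold() in lowered:
                hits[p] += 1
    return sorted(pattern_list, key=lambda p: -hits[p])


if __name__ == "__main__":
    patterns = ["crawler", "googlebot", "wget"]  # small excerpt of the list
    log = [
        "Wget/1.21.4",
        "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
        "Wget/1.20.3 (linux-gnu)",
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
    ]
    print(is_robot(log[1], patterns))  # True: contains "googlebot"
    print(is_robot(log[3], patterns))  # False: no pattern matches
    print(order_by_frequency(patterns, log))  # ['wget', 'googlebot', 'crawler']
```

Because `any()` stops at the first hit, putting the patterns that match most often first means a typical robot UA is rejected after fewer comparisons, which is the efficiency gain the comment above refers to.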
#
#
# Fields - delimited by a pipe symbol [|]:
# 1) pattern - case-insensitive string to match anywhere in the string
#      reserved characters are URL-escaped if present (|=%7C #=%23)
# 2) active flag
#      1=pattern is active and should be matched
#      0=pattern is inactive and should be ignored
# 3) [optional] comma-separated list of exception patterns
#      reserved characters are URL-escaped if present (|=%7C #=%23 ,=%2C)
# 4) An additional flag was added to this list in November 2005 to identify
#    those user-agent strings on this list that would not pass the valid
#    user-agent test and are therefore redundant if both lists are used.
#      1=this entry is not needed for those who use a two-pass approach
#      0=this entry is always needed for both one-pass and two-pass
#        approaches
# 5) Another flag was added to this list when the IAB and ABCe merged their two
#    lists (01/06) to identify those strings that primarily impact page
#    impression measurement vs. those strings that primarily impact ad
#    impression measurement (or both). The flags are as follows:
#      0=this entry primarily impacts page impression measurement
#      1=this entry primarily impacts ad impression measurement
#      2=this entry impacts both
#
# NOTES:
# The 3rd column supports an 'exception' feature, which lets the file specify
# broadly matching patterns and then allow special cases. For instance, if a UA
# advertises itself as a 'robot', it should be ignored for counting purposes
# unless the string 'robotics' is present, which allows for the counting of US
# Robotics cobranded browsers. There may be more than one exception for each
# pattern, separated by commas. Use of this field is optional.
#
# The 5th column attempts to associate the robot with page impressions or ad
# impressions (or both) but should be used only as a guide. Application of this
# list should be based on an analysis of the activity itself before excluding
# any entries.
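The five-field layout and the exception feature can be illustrated with a small parser and matcher. This sketch assumes a pipe-delimited line such as `robot|1|robotics|0|2` (a hypothetical record; the entries later in this file are written as robots.txt records rather than in this layout), and `IabEntry`, `parse_entry` and `matches` are illustrative names. Each line is split on the literal `|` and `,` separators before `%7C`, `%23` and `%2C` are decoded with `urllib.parse.unquote`, so an escaped separator inside a pattern is never mistaken for a real one.

```python
from dataclasses import dataclass, field
from typing import List, Optional
from urllib.parse import unquote


@dataclass
class IabEntry:
    pattern: str                       # field 1, already unescaped
    active: bool                       # field 2: 1=active, 0=inactive
    exceptions: List[str] = field(default_factory=list)  # field 3 (optional)
    two_pass_redundant: Optional[bool] = None  # field 4: 1=not needed in two-pass
    impact: Optional[int] = None       # field 5: 0=page, 1=ad, 2=both


def _opt(parts: List[str], i: int) -> str:
    """Return field i stripped, or "" if the line has fewer fields."""
    return parts[i].strip() if len(parts) > i else ""


def parse_entry(line: str) -> Optional[IabEntry]:
    """Parse one pipe-delimited record; return None for comments and blanks."""
    line = line.strip()
    if not line or line.startswith("#"):
        return None
    parts = line.split("|")  # split first, then unescape %7C / %23 / %2C
    exc_field = _opt(parts, 2)
    exceptions = [unquote(e.strip()) for e in exc_field.split(",") if e.strip()]
    two_pass = _opt(parts, 3)
    impact = _opt(parts, 4)
    return IabEntry(
        pattern=unquote(parts[0].strip()),
        active=_opt(parts, 1) == "1",  # a missing flag is treated as inactive
        exceptions=exceptions,
        two_pass_redundant=(two_pass == "1") if two_pass else None,
        impact=int(impact) if impact else None,
    )


def matches(entry: IabEntry, user_agent: str) -> bool:
    """True if an active pattern occurs in the UA and none of its exceptions do."""
    if not entry.active:
        return False
    ua = user_agent.casefold()
    if entry.pattern.casefold() not in ua:
        return False
    return not any(e.casefold() in ua for e in entry.exceptions)


if __name__ == "__main__":
    entry = parse_entry("robot|1|robotics|0|2")
    print(matches(entry, "SomeRobot/1.0"))  # True
    print(matches(entry, "Mozilla/4.0 (compatible; MSIE 6.0; US Robotics)"))  # False
```

Fields 4 and 5 are parsed but deliberately not used for matching, since the comments describe them as guidance for one-pass versus two-pass setups and for page versus ad impressions rather than as part of the match itself.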
#
# UA strings are considered uncountable (per IAB Guidelines) if they contain
# any of the following patterns (note: patterns are case insensitive, but left
# in this file in mixed case for human legibility)
#
# Contact ImServices Group in the U.S. (spiders.bots@imservicesgroup.com) or
# ABC Electronic in the UK (spiders.bots@abce.org.uk) with any feedback
# regarding this file.
#
#******************* END OF COMMENTS ******************************************
#
#
User-agent: 1job
Disallow: /

User-agent: 247sitewatch
Disallow: /

User-agent: abilon
Disallow: /

User-agent: abot
Disallow: /

User-agent: accoona-ai-agent
Disallow: /

User-agent: agentname
Disallow: /

User-agent: aipbot
Disallow: /

User-agent: aladdino
Disallow: /

User-agent: apachebench
Disallow: /

User-agent: aport
Disallow: /

User-agent: appie
Disallow: /

User-agent: applesyndication
Disallow: /

User-agent: arachnia
Disallow: /

User-agent: aranha
Disallow: /

User-agent: art-online.com
Disallow: /

User-agent: ask jeeves
Disallow: /

User-agent: ask+jeeves
Disallow: /

User-agent: asterias
Disallow: /

User-agent: atomz
Disallow: /

User-agent: avantgo
Disallow: /

User-agent: avsearch
Disallow: /

User-agent: b2w
Disallow: /

User-agent: backweb
Disallow: /

User-agent: baidu
Disallow: /

User-agent: becomebot
Disallow: /

User-agent: bigbrother
Disallow: /

User-agent: BIMBO
Disallow: /

User-agent: blitzbot
Disallow: /

User-agent: bloglines
Disallow: /

User-agent: bordermanager
Disallow: /

User-agent: bumblebee
Disallow: /

User-agent: CE-Preload
Disallow: /

User-agent: change detection
Disallow: /

User-agent: change+detection
Disallow: /

User-agent: changedetection
Disallow: /

User-agent: charlotte
Disallow: /

User-agent: check_http
Disallow: /

User-agent: checkurl
Disallow: /

User-agent: chkd
Disallow: /

User-agent: coast
Disallow: /

User-agent: combine
Disallow: /

User-agent: cometsearch
Disallow: /

User-agent: contype
Disallow: /

User-agent: convera
Disallow: /

User-agent: copernicenterprisesearch
Disallow: /

User-agent: copyrightcheck
Disallow: /

User-agent: cosmos
Disallow: /

User-agent: crawler
Disallow: /

User-agent: crescent
Disallow: /

User-agent: crucial inforation miner
Disallow: /

User-agent: crucial+inforation+miner
Disallow: /

User-agent: curl
Disallow: /

User-agent: dialer
Disallow: /

User-agent: diphonet
Disallow: /

User-agent: Download Ninja
Disallow: /

User-agent: Download+Ninja
Disallow: /

User-agent: dtaagent
Disallow: /

User-agent: dts agent
Disallow: /

User-agent: dts+agent
Disallow: /

User-agent: earthcom.info
Disallow: /

User-agent: echo
Disallow: /

User-agent: emailsiphon
Disallow: /

User-agent: eNews Creator
Disallow: /

User-agent: eNews+Creator
Disallow: /

User-agent: Enfish Tracker
Disallow: /

User-agent: Enfish+Tracker
Disallow: /

User-agent: fast
Disallow: /

User-agent: favorg
Disallow: /

User-agent: feedonfeeds
Disallow: /

User-agent: fetch
Disallow: /

User-agent: filehound
Disallow: /

User-agent: Firehunter
Disallow: /

User-agent: flashget
Disallow: /

User-agent: freefind
Disallow: /

User-agent: frontier
Disallow: /

User-agent: geniebot
Disallow: /

User-agent: getright
Disallow: /

User-agent: go!zilla
Disallow: /

User-agent: golem
Disallow: /

User-agent: gomezagent
Disallow: /

User-agent: googlebot
Disallow: /

User-agent: grabber
Disallow: /

User-agent: grub
Disallow: /

User-agent: gulliver
Disallow: /

User-agent: hapax
Disallow: /

User-agent: harvest
Disallow: /

User-agent: hit list
Disallow: /

User-agent: hit+list
Disallow: /

User-agent: hitlist
Disallow: /

User-agent: htdig
Disallow: /

User-agent: httrack
Disallow: /

User-agent: ia_archive
Disallow: /

User-agent: ibot
Disallow: /

User-agent: ichiro
Disallow: /

User-agent: ideare
Disallow: /

User-agent: IEAutoDiscovery
Disallow: /

User-agent: iltrovatore-setaccio
Disallow: /

User-agent: indy library
Disallow: /

User-agent: indy+library
Disallow: /

User-agent: infolink
Disallow: /

User-agent: infoseek
Disallow: /

User-agent: inktomi search
Disallow: /

User-agent: inktomi+search
Disallow: /

User-agent: internet ninja
Disallow: /

User-agent: internet+ninja
Disallow: /

User-agent: internetseer
Disallow: /

User-agent: inverse ip insight
Disallow: /

User-agent: inverse+ip+insight
Disallow: /

User-agent: ipsentry
Disallow: /

User-agent: irlbot
Disallow: /

User-agent: isilo
Disallow: /

User-agent: jakarta
Disallow: /

User-agent: janrain-lobster
Disallow: /

User-agent: jetbot
Disallow: /

User-agent: jobo
Disallow: /

User-agent: justview
Disallow: /

User-agent: keepalive
Disallow: /

User-agent: keynote
Disallow: /

User-agent: kilroy
Disallow: /

User-agent: kinja
Disallow: /

User-agent: kummhttp
Disallow: /

User-agent: lachesis
Disallow: /

User-agent: larbin
Disallow: /

User-agent: libwww-perl
Disallow: /

User-agent: linkbot
Disallow: /

User-agent: linkchecker
Disallow: /

User-agent: linklint
Disallow: /

User-agent: linkscan
Disallow: /

User-agent: linksweeper
Disallow: /

User-agent: linkwalker
Disallow: /

User-agent: lisa
Disallow: /

User-agent: locust
Disallow: /

User-agent: lotusdiscovery
Disallow: /

User-agent: lwp
Disallow: /

User-agent: lydia
Disallow: /

User-agent: mac finder
Disallow: /

User-agent: mac+finder
Disallow: /

User-agent: MacReport
Disallow: /

User-agent: magenta
Disallow: /

User-agent: magus bot
Disallow: /

User-agent: magus+bot
Disallow: /

User-agent: markwatch
Disallow: /

User-agent: mazingo
Disallow: /

User-agent: mazzilla
Disallow: /

User-agent: mediapartners-google
Disallow: /

User-agent: mercator
Disallow: /

User-agent: mfc_tear_sample
Disallow: /

User-agent: microsoft internet explorer/4.40.426 (windows 95)
Disallow: /

User-agent: microsoft scheduled cache content download service
Disallow: /

User-agent: microsoft url control
Disallow: /

User-agent: microsoft+internet+explorer/4.40.426+(windows+95)
Disallow: /

User-agent: microsoft+scheduled+cache+content+download+service
Disallow: /

User-agent: microsoft+url+control
Disallow: /

User-agent: minuteman
Disallow: /

User-agent: mirago
Disallow: /

User-agent: missigua
Disallow: /

User-agent: miva
Disallow: /

User-agent: mj12bot
Disallow: /

User-agent: mobipocket webcompanion
Disallow: /

User-agent: mobipocket+webcompanion
Disallow: /

User-agent: moget
Disallow: /

User-agent: monitor
Disallow: /

User-agent: monkeycrawl
Disallow: /

User-agent: monster
Disallow: /

User-agent: mothra/126-paladium
Disallow: /

User-agent: motor
Disallow: /

User-agent: mozilla 2.0 (compatible; msie 3.02; update a; windows nt)
Disallow: /

User-agent: mozilla/5.0 (compatible; msie 5.0)
Disallow: /

User-agent: mozilla/5.0+(compatible;+msie+5.0)
Disallow: /

User-agent: mozilla+2.0+(compatible;+msie+3.02;+update+a;+windows+nt)
Disallow: /

User-agent: ms frontpage
Disallow: /

User-agent: MS Search
Disallow: /

User-agent: ms+frontpage
Disallow: /

User-agent: MS+Search
Disallow: /

User-agent: MSNPTC
Disallow: /

User-agent: nalanda
Disallow: /

User-agent: nbot
Disallow: /

User-agent: nessus
Disallow: /

User-agent: netmechanic
Disallow: /

User-agent: netnewswire
Disallow: /

User-agent: new/0.1libwww
Disallow: /

User-agent: newave-lisa
Disallow: /

User-agent: news search
Disallow: /

User-agent: news+search
Disallow: /

User-agent: newsapp
Disallow: /

User-agent: newsbot
Disallow: /

User-agent: newsfire
Disallow: /

User-agent: newsgator
Disallow: /

User-agent: newslookup
Disallow: /

User-agent: newsmachine
Disallow: /

User-agent: newsnow
Disallow: /

User-agent: newssearch
Disallow: /

User-agent: nextgensearchbot
Disallow: /

User-agent: ng/2.0
Disallow: /

User-agent: nomad
Disallow: /

User-agent: npbot
Disallow: /

User-agent: nutch
Disallow: /

User-agent: nutscrape
Disallow: /

User-agent: obot
Disallow: /

User-agent: ocelli
Disallow: /

User-agent: omniexplorer
Disallow: /

User-agent: openfind
Disallow: /

User-agent: oracle ultra search
Disallow: /

User-agent: oracle+ultra+search
Disallow: /

User-agent: patric
Disallow: /

User-agent: perman surfer
Disallow: /

User-agent: perman+surfer
Disallow: /

User-agent: pioneer
Disallow: /

User-agent: pita
Disallow: /

User-agent: pluck
Disallow: /

User-agent: plumtree
Disallow: /

User-agent: polybot
Disallow: /

User-agent: pompos
Disallow: /

User-agent: port huron labs
Disallow: /

User-agent: port+huron+labs
Disallow: /

User-agent: powermarks
Disallow: /

User-agent: proxysg
Disallow: /

User-agent: psbot
Disallow: /

User-agent: pulpfiction
Disallow: /

User-agent: quepasacreep
Disallow: /

User-agent: rational sitecheck
Disallow: /

User-agent: rational+sitecheck
Disallow: /

User-agent: realnamesbot
Disallow: /

User-agent: robot
Disallow: /

User-agent: rpt-http
Disallow: /

User-agent: rss client
Disallow: /

User-agent: rss+client
Disallow: /

User-agent: rssmaker-ng
Disallow: /

User-agent: rssreader
Disallow: /

User-agent: rufusbot
Disallow: /

User-agent: sawaalrobo
Disallow: /

User-agent: schmozilla
Disallow: /

User-agent: scirus
Disallow: /

User-agent: scooter
Disallow: /

User-agent: scoutabout
Disallow: /

User-agent: search.ch
Disallow: /

User-agent: seekbot
Disallow: /

User-agent: seeker.lookseek.com
Disallow: /

User-agent: servers alive
Disallow: /

User-agent: servers+alive
Disallow: /

User-agent: sherlock
Disallow: /

User-agent: shopwiki
Disallow: /

User-agent: sitescooper
Disallow: /

User-agent: slurp
Disallow: /

User-agent: slysearch
Disallow: /

User-agent: snooper
Disallow: /

User-agent: sohu
Disallow: /

User-agent: spider
Disallow: /

User-agent: spike
Disallow: /

User-agent: spinne
Disallow: /

User-agent: spyder
Disallow: /

User-agent: squid cache
Disallow: /

User-agent: squid+cache
Disallow: /

User-agent: stackrambler
Disallow: /

User-agent: stuff
Disallow: /

User-agent: sucker
Disallow: /

User-agent: sundoh search
Disallow: /

User-agent: sundoh+search
Disallow: /

User-agent: szukacz
Disallow: /

User-agent: taz
Disallow: /

User-agent: teleport
Disallow: /

User-agent: templeton
Disallow: /

User-agent: teoma
Disallow: /

User-agent: thunderstone
Disallow: /

User-agent: t-h-u-n-d-e-r-s-t-o-n-e
Disallow: /

User-agent: topix
Disallow: /

User-agent: ukonline
Disallow: /

User-agent: ultraseek
Disallow: /

User-agent: urchin
Disallow: /

User-agent: urlcheck
Disallow: /

User-agent: vagabondo
Disallow: /

User-agent: versus
Disallow: /

User-agent: voyager
Disallow: /

User-agent: web downloader
Disallow: /

User-agent: web+downloader
Disallow: /

User-agent: webauto
Disallow: /

User-agent: webcapture
Disallow: /

User-agent: webcheck
Disallow: /

User-agent: webclipping.com
Disallow: /

User-agent: WebCopier
Disallow: /

User-agent: webcrawl
Disallow: /

User-agent: webdup
Disallow: /

User-agent: webextractor
Disallow: /

User-agent: webinator
Disallow: /

User-agent: website extractor
Disallow: /

User-agent: website+extractor
Disallow: /

User-agent: webtool
Disallow: /

User-agent: webtrends
Disallow: /

User-agent: webvac
Disallow: /

User-agent: webwasher
Disallow: /

User-agent: webzip
Disallow: /

User-agent: wfarc
Disallow: /

User-agent: wget
Disallow: /

User-agent: whatsup
Disallow: /

User-agent: whizbang
Disallow: /

User-agent: worm
Disallow: /

User-agent: xenu
Disallow: /

User-agent: yacy
Disallow: /

User-agent: yandex
Disallow: /

User-agent: ync
Disallow: /

User-agent: yotta
Disallow: /

User-agent: zealbot
Disallow: /

User-agent: zeus
Disallow: /

User-agent: zibber
Disallow: /

User-agent: zipppbot
Disallow: /

User-agent: zyborg
Disallow: /

User-agent: ez publish link validator
Disallow: /

User-agent: ez+publish+link+validator
Disallow: /

User-agent: whistleblower
Disallow: /

User-agent: terrawizbot
Disallow: /

User-agent: Goldfire
Disallow: /

User-agent: SiteVigil
Disallow: /

User-agent: EmailSmartz
Disallow: /

User-agent: iOpus-I-M
Disallow: /

User-agent: BITS
Disallow: /

User-agent: heritrix
Disallow: /

User-agent: c r a w l e r
Disallow: /

User-agent: c+r+a+w+l+e+r
Disallow: /

User-agent: Freedom
Disallow: /

User-agent: yahoofeedseeker
Disallow: /

User-agent: internal zero-knowledge agent
Disallow: /

User-agent: internal+zero-knowledge+agent
Disallow: /

User-agent: NaverBot
Disallow: /

User-agent: SurveyBot/
Disallow: /

User-agent: Liferea
Disallow: /

User-agent: NetNewsWire
Disallow: /

User-agent: TPSystem
Disallow: /

User-agent: YahooSeeker
Disallow: /

User-agent: FindLinks
Disallow: /

User-agent: psycheclone
Disallow: /

User-agent: oodlebot
Disallow: /

User-agent: mackster
Disallow: /

User-agent: AdsBot-Google
Disallow: /

User-agent: InnovantageBot
Disallow: /

User-agent: 192.comAgent
Disallow: /

User-agent: NASA Search
Disallow: /

User-agent: KHTE
Disallow: /

User-agent: KTXN
Disallow: /

User-agent: AutoMapIt
Disallow: /

User-agent: Advanced Email Extractor
Disallow: /

User-agent: Advanced+Email+Extractor
Disallow: /

User-agent: MSRBOT
Disallow: /

User-agent: Moreoverbot
Disallow: /
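Because the entries above are written as robots.txt records, a crawler that honors robots.txt evaluates them with robots.txt matching rather than with the IAB substring rule. As a sketch, Python's standard `urllib.robotparser` can load a few of these records and answer `can_fetch` queries. The excerpt, the helper name `build_parser` and the example URL are illustrative assumptions. In recent CPython versions the parser compares each `User-agent` value against the lowercased product token before the first `/` of the caller's UA, so a UA such as `Mozilla/5.0 (compatible; Googlebot/2.1)` would not be caught by the `googlebot` record even though the IAB rule would flag it.

```python
from urllib.robotparser import RobotFileParser

# A short excerpt of the records above (illustrative subset).
ROBOTS_EXCERPT = """\
User-agent: googlebot
Disallow: /

User-agent: wget
Disallow: /

User-agent: ask jeeves
Disallow: /
"""


def build_parser(text: str) -> RobotFileParser:
    """Load robots.txt text without fetching anything over the network."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())  # parse() takes an iterable of lines
    return parser


if __name__ == "__main__":
    rp = build_parser(ROBOTS_EXCERPT)
    for ua in (
        "Wget/1.21.4",
        "Googlebot/2.1 (+http://www.google.com/bot.html)",
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
    ):
        # can_fetch() returns False when a matching record disallows the path.
        print(ua, "->", rp.can_fetch(ua, "https://www.example.com/page.html"))
```

The two mechanisms serve different purposes: robots.txt asks well-behaved crawlers not to request pages at all, while the IAB pattern list is applied afterwards to filter robot traffic out of counts, so a site may reasonably use both.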