Interdire aspirateur et bot via le robots.txt
10 messages
• Page 1 sur 1
- kitten13
- WRInaute discret

- Messages: 213
- Inscription: 30 Avr 2007
Interdire aspirateur et bot via le robots.txt
Bonjour,
Je souhaite interdire les aspirateurs et bots, j'ai donc ajouter ces lignes dans le robots.txt et fait un test avec intellitamper mais celui-ci arrive quand même à choper les pages, avez vous une astuce autre ?
Merci pour vos réponses
Je souhaite interdire les aspirateurs et bots, j'ai donc ajouter ces lignes dans le robots.txt et fait un test avec intellitamper mais celui-ci arrive quand même à choper les pages, avez vous une astuce autre ?
- Code: Tout sélectionner
User-agent: Alexibot
User-agent: asterias
User-agent: BackDoorBot/1.0
User-agent: Black Hole
User-agent: BlowFish/1.0
User-agent: BotALot
User-agent: BuiltBotTough
User-agent: Bullseye/1.0
User-agent: BunnySlippers
User-agent: Cegbfeieh
User-agent: CheeseBot
User-agent: CherryPicker
User-agent: CherryPickerElite/1.0
User-agent: CherryPickerSE/1.0
User-agent: CopyRightCheck
User-agent: cosmos
User-agent: Crescent
User-agent: Crescent Internet ToolPak HTTP OLE Control v.1.0
User-agent: DISCo Pump 3.1
User-agent: DittoSpyder
User-agent: EmailCollector
User-agent: EmailSiphon
User-agent: EmailWolf
User-agent: EroCrawler
User-agent: ExtractorPro
User-agent: Foobot
User-agent: Harvest/1.5
User-agent: hloader
User-agent: httplib
User-agent: humanlinks
User-agent: InfoNaviRobot
User-agent: IntelliTamper
User-agent: JennyBot
User-agent: Kenjin Spider
User-agent: LexiBot
User-agent: libWeb/clsHTTP
User-agent: LinkextractorPro
User-agent: LinkScan/8.1a Unix
User-agent: LinkWalker
User-agent: lwp-trivial
User-agent: lwp-trivial/1.34
User-agent: Mata Hari
User-agent: Microsoft URL Control - 5.01.4511
User-agent: Microsoft URL Control - 6.00.8169
User-agent: MIIxpc
User-agent: MIIxpc/4.2
User-agent: Mister PiX
User-agent: moget
User-agent: moget/2.1
User-agent: NetAnts
User-agent: NetAttache
User-agent: NetAttache Light 1.1
User-agent: NetMechanic
User-agent: NICErsPRO
User-agent: Offline Explorer
User-agent: Openfind
User-agent: Openfind data gathere
User-agent: ProPowerBot/2.14
User-agent: ProWebWalker
User-agent: psbot
User-agent: QueryN Metasearch
User-agent: RepoMonkey
User-agent: RepoMonkey Bait & Tackle/v1.01
User-agent: RMA
User-agent: SiteSnagger
User-agent: SpankBot
User-agent: spanner
User-agent: SuperBot
User-agent: SuperBot/2.6
User-agent: suzuran
User-agent: Szukacz/1.4
User-agent: Teleport
User-agent: Telesoft
User-agent: The Intraformant
User-agent: TheNomad
User-agent: TightTwatBot
User-agent: Titan
User-agent: toCrawl/UrlDispatcher
User-agent: True_Robot
User-agent: True_Robot/1.0
User-agent: turingos
User-agent: URLy Warning
User-agent: VCI
User-agent: VCI WebViewer VCI WebViewer Win32
User-agent: Web Image Collector
User-agent: WebAuto
User-agent: WebBandit
User-agent: WebBandit/3.50
User-agent: WebCopier
User-agent: webcopy
User-agent: WebEnhancer
User-agent: WebmasterWorldForumBot
User-agent: webmirror
User-agent: WebReaper
User-agent: WebSauger
User-agent: website extractor
User-agent: Website Quester
User-agent: Webster Pro
User-agent: WebStripper
User-agent: WebStripper/2.02
User-agent: WebZip
User-agent: WebZip/4.0
User-agent: Wget
User-agent: Wget/1.5.3
User-agent: Wget/1.6
User-agent: WinHTTrack
User-agent: WWW-Collector-E
User-agent: Xenu's
User-agent: Xenu's Link Sleuth 1.1c
User-agent: Zeus
User-agent: Zeus 32297 Webster Pro V2.9 Win32
User-Agent: MJ12bot
User-agent: HTTrack
User-agent: HTTrack 3.0
User-agent: TurnitinBot
User-agent: QuepasaCreep
Disallow: /
User-Agent: *
Allow: /
Merci pour vos réponses
- kitten13
- WRInaute discret

- Messages: 213
- Inscription: 30 Avr 2007
Re: Interdire aspirateur et bot via le robots.txt
Merci pour la réponse, je vois que webrankinfo utilise aussi cette cette technique via le robots.txt, est ce vraiment inefficace ?
Sinon la solution du .htaccess me semble la plus efficace aussi, mais sous quel forme écrire les règles ?
Merci
Sinon la solution du .htaccess me semble la plus efficace aussi, mais sous quel forme écrire les règles ?
Merci
- darkjukka
- WRInaute impliqué

- Messages: 669
- Inscription: 28 Avr 2007
Re: Interdire aspirateur et bot via le robots.txt
J'ai tendance à penser que oui, un logiciel malveillant pour aspirer un site n'aurait aucun intérêt à suivre les règles d'interdictions préconisées par un fichier qu'ils peuvent éviter.
Perso j'ai ce .htaccess dans un certains répertoire de mon site :
Tu peux t'inspirer de celui-ci
Perso j'ai ce .htaccess dans un certains répertoire de mon site :
- Code: Tout sélectionner
RewriteEngine on
RewriteCond %{HTTP_USER_AGENT} ^-?$ [OR]
RewriteCond %{HTTP_USER_AGENT} Advanced\ Email\ Extractor [OR]
RewriteCond %{HTTP_USER_AGENT} almaden [NC,OR]
RewriteCond %{HTTP_USER_AGENT} @nonymouse [OR]
RewriteCond %{HTTP_USER_AGENT} Art-Online [OR]
RewriteCond %{HTTP_USER_AGENT} CherryPicker [OR]
RewriteCond %{HTTP_USER_AGENT} Crescent\ Internet\ ToolPack [OR]
RewriteCond %{HTTP_USER_AGENT} DirectUpdate [OR]
RewriteCond %{HTTP_USER_AGENT} Download\ Accelerator [OR]
RewriteCond %{HTTP_USER_AGENT} eCatch [OR]
RewriteCond %{HTTP_USER_AGENT} echo\ extense [OR]
RewriteCond %{HTTP_USER_AGENT} EmailCollector [OR]
RewriteCond %{HTTP_USER_AGENT} EmailWolf [OR]
RewriteCond %{HTTP_USER_AGENT} ExtractorPro [OR]
RewriteCond %{HTTP_USER_AGENT} Fetch\ API\ Request [OR]
RewriteCond %{HTTP_USER_AGENT} flashget [NC,OR]
RewriteCond %{HTTP_USER_AGENT} frontpage [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Go!Zilla [OR]
RewriteCond %{HTTP_USER_AGENT} HTTP\ agent [OR]
RewriteCond %{HTTP_USER_AGENT} HTTPConnect [OR]
RewriteCond %{HTTP_USER_AGENT} HTTrack [OR]
RewriteCond %{HTTP_USER_AGENT} Indy\ Library [OR]
RewriteCond %{HTTP_USER_AGENT} IPiumBot\ laurion(dot)com [OR]
RewriteCond %{HTTP_USER_AGENT} Kapere [OR]
RewriteCond %{HTTP_USER_AGENT} libwww-perl [OR]
RewriteCond %{HTTP_USER_AGENT} Microsoft\ URL\ Control [OR]
RewriteCond %{HTTP_USER_AGENT} minibot\(NaverRobot\) [OR]
RewriteCond %{HTTP_USER_AGENT} NICErsPRO [OR]
RewriteCond %{HTTP_USER_AGENT} NPBot [OR]
RewriteCond %{HTTP_USER_AGENT} Offline\ Explorer [OR]
RewriteCond %{HTTP_USER_AGENT} Offline\ Navigator [OR]
RewriteCond %{HTTP_USER_AGENT} Program\ Shareware [OR]
RewriteCond %{HTTP_USER_AGENT} QuepasaCreep [OR]
RewriteCond %{HTTP_USER_AGENT} SiteMapper [OR]
RewriteCond %{HTTP_USER_AGENT} Star\ Downloader [OR]
RewriteCond %{HTTP_USER_AGENT} SurveyBot [OR]
RewriteCond %{HTTP_USER_AGENT} Teleport\ Pro [OR]
RewriteCond %{HTTP_USER_AGENT} Telesoft [OR]
RewriteCond %{HTTP_USER_AGENT} TuringOS [OR]
RewriteCond %{HTTP_USER_AGENT} TurnitinBot [OR]
RewriteCond %{HTTP_USER_AGENT} vobsub [NC,OR]
RewriteCond %{HTTP_USER_AGENT} webbandit [NC,OR]
RewriteCond %{HTTP_USER_AGENT} WebCapture [OR]
RewriteCond %{HTTP_USER_AGENT} webcollage [OR]
RewriteCond %{HTTP_USER_AGENT} WebCopier [OR]
RewriteCond %{HTTP_USER_AGENT} WebDAV [OR]
RewriteCond %{HTTP_USER_AGENT} WebEmailExtractor [OR]
RewriteCond %{HTTP_USER_AGENT} WebReaper [OR]
RewriteCond %{HTTP_USER_AGENT} WEBsaver [OR]
RewriteCond %{HTTP_USER_AGENT} WebStripper [OR]
RewriteCond %{HTTP_USER_AGENT} WebZIP [OR]
RewriteCond %{HTTP_USER_AGENT} Wysigot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Zeus.*Webster [OR]
RewriteCond %{HTTP_USER_AGENT} Zeus [OR]
RewriteCond %{HTTP_REFERER} ^XXX
RewriteCond %{HTTP_USER_AGENT} ADSARobot [OR]
RewriteCond %{HTTP_USER_AGENT} ah-ha [NC,OR]
RewriteCond %{HTTP_USER_AGENT} aktuelles [NC,OR]
RewriteCond %{HTTP_USER_AGENT} amzn_assoc [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Anarchie [OR]
RewriteCond %{HTTP_USER_AGENT} ASPSeek [OR]
RewriteCond %{HTTP_USER_AGENT} ASSORT [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ATHENS [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Atomz [OR]
RewriteCond %{HTTP_USER_AGENT} attach [NC,OR]
RewriteCond %{HTTP_USER_AGENT} attache [NC,OR]
RewriteCond %{HTTP_USER_AGENT} autoemailspider [NC,OR]
RewriteCond %{HTTP_USER_AGENT} BackWeb [OR]
RewriteCond %{HTTP_USER_AGENT} Bandit [OR]
RewriteCond %{HTTP_USER_AGENT} BatchFTP [OR]
RewriteCond %{HTTP_USER_AGENT} bdfetch [NC,OR]
RewriteCond %{HTTP_USER_AGENT} big.brother [NC,OR]
RewriteCond %{HTTP_USER_AGENT} BlackWidow [OR]
RewriteCond %{HTTP_USER_AGENT} bmclient [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Boston\ Project [OR]
RewriteCond %{HTTP_USER_AGENT} BravoBrian\ SpiderEngine\ MarcoPolo [OR]
RewriteCond %{HTTP_USER_AGENT} Buddy [OR]
RewriteCond %{HTTP_USER_AGENT} Bullseye [NC,OR]
RewriteCond %{HTTP_USER_AGENT} bumblebee [NC,OR]
RewriteCond %{HTTP_USER_AGENT} capture [OR]
RewriteCond %{HTTP_USER_AGENT} ChinaClaw [OR]
RewriteCond %{HTTP_USER_AGENT} CICC [OR]
RewriteCond %{HTTP_USER_AGENT} clipping [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Crescent [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Crescent\ Internet\ ToolPak [OR]
RewriteCond %{HTTP_USER_AGENT} Custo [OR]
RewriteCond %{HTTP_USER_AGENT} cyberalert [OR]
RewriteCond %{HTTP_USER_AGENT} Deweb [NC,OR]
RewriteCond %{HTTP_USER_AGENT} diagem [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Digger [OR]
RewriteCond %{HTTP_USER_AGENT} Digimarc [OR]
RewriteCond %{HTTP_USER_AGENT} DIIbot [OR]
RewriteCond %{HTTP_USER_AGENT} disco [NC,OR]
RewriteCond %{HTTP_USER_AGENT} DISCoFinder [OR]
RewriteCond %{HTTP_USER_AGENT} Download\ Demon [OR]
RewriteCond %{HTTP_USER_AGENT} Download\ Wonder [OR]
RewriteCond %{HTTP_USER_AGENT} Downloader [OR]
RewriteCond %{HTTP_USER_AGENT} Drip [OR]
RewriteCond %{HTTP_USER_AGENT} DSurf15a [OR]
RewriteCond %{HTTP_USER_AGENT} DTS.Agent [NC,OR]
RewriteCond %{HTTP_USER_AGENT} EasyDL [OR]
RewriteCond %{HTTP_USER_AGENT} ecollector [NC,OR]
RewriteCond %{HTTP_USER_AGENT} efp@gmx\.net [OR]
RewriteCond %{HTTP_USER_AGENT} Email\ Extractor [OR]
RewriteCond %{HTTP_USER_AGENT} EirGrabber [OR]
RewriteCond %{HTTP_USER_AGENT} EmailSiphon [OR]
RewriteCond %{HTTP_USER_AGENT} Express\ WebPictures [OR]
RewriteCond %{HTTP_USER_AGENT} EyeNetIE [OR]
RewriteCond %{HTTP_USER_AGENT} fastlwspider [NC,OR]
RewriteCond %{HTTP_USER_AGENT} FavOrg [OR]
RewriteCond %{HTTP_USER_AGENT} Favorites\ Sweeper [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Fetch [NC,OR]
RewriteCond %{HTTP_USER_AGENT} FEZhead [NC,OR]
RewriteCond %{HTTP_USER_AGENT} FileHound [OR]
RewriteCond %{HTTP_USER_AGENT} FlashGet\ WebWasher [OR]
RewriteCond %{HTTP_USER_AGENT} FlickBot [OR]
RewriteCond %{HTTP_USER_AGENT} fluffy [NC,OR]
RewriteCond %{HTTP_USER_AGENT} GalaxyBot [OR]
RewriteCond %{HTTP_USER_AGENT} Generic [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Getleft [NC,OR]
RewriteCond %{HTTP_USER_AGENT} GetRight [OR]
RewriteCond %{HTTP_USER_AGENT} GetSmart [OR]
RewriteCond %{HTTP_USER_AGENT} GetWeb! [OR]
RewriteCond %{HTTP_USER_AGENT} GetWebPage [NC,OR]
RewriteCond %{HTTP_USER_AGENT} gigabaz [OR]
RewriteCond %{HTTP_USER_AGENT} Girafabot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} go-ahead-got-it [NC,OR]
RewriteCond %{HTTP_USER_AGENT} GornKer [OR]
RewriteCond %{HTTP_USER_AGENT} Grabber [NC,OR]
RewriteCond %{HTTP_USER_AGENT} GrabNet [OR]
RewriteCond %{HTTP_USER_AGENT} Grafula [OR]
RewriteCond %{HTTP_USER_AGENT} Green\ Research [OR]
RewriteCond %{HTTP_USER_AGENT} Harvest [NC,OR]
RewriteCond %{HTTP_USER_AGENT} hhjhj@yahoo [NC,OR]
RewriteCond %{HTTP_USER_AGENT} hloader [OR]
RewriteCond %{HTTP_USER_AGENT} HMView [OR]
RewriteCond %{HTTP_USER_AGENT} HomePageSearch [OR]
RewriteCond %{HTTP_USER_AGENT} httpdown [OR]
RewriteCond %{HTTP_USER_AGENT} http\ generic [OR]
RewriteCond %{HTTP_USER_AGENT} IBM_Planetwide [OR]
RewriteCond %{HTTP_USER_AGENT} Image\ Stripper [OR]
RewriteCond %{HTTP_USER_AGENT} Image\ Sucker [OR]
RewriteCond %{HTTP_USER_AGENT} imagefetch [OR]
RewriteCond %{HTTP_USER_AGENT} IncyWincy [NC,OR]
RewriteCond %{HTTP_USER_AGENT} informant [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Ingelin [NC,OR]
RewriteCond %{HTTP_USER_AGENT} InterGET [OR]
RewriteCond %{HTTP_USER_AGENT} Internet\ Ninja [OR]
RewriteCond %{HTTP_USER_AGENT} InternetLinkAgent [OR]
RewriteCond %{HTTP_USER_AGENT} InternetSeer\.com [OR]
RewriteCond %{HTTP_USER_AGENT} Iria [OR]
RewriteCond %{HTTP_USER_AGENT} Irvine [OR]
RewriteCond %{HTTP_USER_AGENT} JBH*Agent [OR]
RewriteCond %{HTTP_USER_AGENT} JetCar [OR]
RewriteCond %{HTTP_USER_AGENT} JOC [OR]
RewriteCond %{HTTP_USER_AGENT} JOC\ Web\ Spider [OR]
RewriteCond %{HTTP_USER_AGENT} JustView [OR]
RewriteCond %{HTTP_USER_AGENT} KWebGet [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Lachesis [OR]
RewriteCond %{HTTP_USER_AGENT} larbin [NC,OR]
RewriteCond %{HTTP_USER_AGENT} LeechFTP [OR]
RewriteCond %{HTTP_USER_AGENT} LexiBot [OR]
RewriteCond %{HTTP_USER_AGENT} lftp [OR]
RewriteCond %{HTTP_USER_AGENT} libwww [OR]
RewriteCond %{HTTP_USER_AGENT} likse [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Link*Sleuth [OR]
RewriteCond %{HTTP_USER_AGENT} LINKS\ ARoMATIZED [OR]
RewriteCond %{HTTP_USER_AGENT} LinkWalker [OR]
RewriteCond %{HTTP_USER_AGENT} LWP [NC,OR]
RewriteCond %{HTTP_USER_AGENT} lwp-trivial [OR]
RewriteCond %{HTTP_USER_AGENT} Magnet [OR]
RewriteCond %{HTTP_USER_AGENT} Mac\ Finder [OR]
RewriteCond %{HTTP_USER_AGENT} Mag-Net [OR]
RewriteCond %{HTTP_USER_AGENT} Mass\ Downloader [OR]
RewriteCond %{HTTP_USER_AGENT} MCspider [NC,OR]
RewriteCond %{HTTP_USER_AGENT} MIDown\ tool [OR]
RewriteCond %{HTTP_USER_AGENT} Mirror [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Missigua\ Locator [OR]
RewriteCond %{HTTP_USER_AGENT} Mister\ PiX [NC,OR]
RewriteCond %{HTTP_USER_AGENT} MMMtoCrawl\/UrlDispatcherLLL [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mozilla$ [OR]
RewriteCond %{HTTP_USER_AGENT} MSProxy [OR]
RewriteCond %{HTTP_USER_AGENT} multithreaddb [NC,OR]
RewriteCond %{HTTP_USER_AGENT} nationaldirectory [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Navroad [OR]
RewriteCond %{HTTP_USER_AGENT} NearSite [OR]
RewriteCond %{HTTP_USER_AGENT} Net\ Vampire [OR]
RewriteCond %{HTTP_USER_AGENT} NetAnts [OR]
RewriteCond %{HTTP_USER_AGENT} NetCarta [NC,OR]
RewriteCond %{HTTP_USER_AGENT} NetMechanic [OR]
RewriteCond %{HTTP_USER_AGENT} netprospector [NC,OR]
RewriteCond %{HTTP_USER_AGENT} NetResearchServer [NC,OR]
RewriteCond %{HTTP_USER_AGENT} NetSpider [OR]
RewriteCond %{HTTP_USER_AGENT} NetZIP [OR]
RewriteCond %{HTTP_USER_AGENT} NetZip\ Downloader [OR]
RewriteCond %{HTTP_USER_AGENT} NetZippy [OR]
RewriteCond %{HTTP_USER_AGENT} NEWT [NC,OR]
RewriteCond %{HTTP_USER_AGENT} nicerspro [NC,OR]
RewriteCond %{HTTP_USER_AGENT} NPBot [OR]
RewriteCond %{HTTP_USER_AGENT} Octopus [OR]
RewriteCond %{HTTP_USER_AGENT} OpaL [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Openfind [OR]
RewriteCond %{HTTP_USER_AGENT} OpenTextSiteCrawler [NC,OR]
RewriteCond %{HTTP_USER_AGENT} OrangeBot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} PackRat [NC,OR]
RewriteCond %{HTTP_USER_AGENT} PageGrabber [OR]
RewriteCond %{HTTP_USER_AGENT} Papa\ Foto [OR]
RewriteCond %{HTTP_USER_AGENT} pavuk [NC,OR]
RewriteCond %{HTTP_USER_AGENT} PersonaPilot [OR]
RewriteCond %{HTTP_USER_AGENT} pcBrowser [OR]
RewriteCond %{HTTP_USER_AGENT} PingALink [OR]
RewriteCond %{HTTP_USER_AGENT} Pockey [OR]
RewriteCond %{HTTP_USER_AGENT} Proxy [OR]
RewriteCond %{HTTP_USER_AGENT} psbot [OR]
RewriteCond %{HTTP_USER_AGENT} PSurf [NC,OR]
RewriteCond %{HTTP_USER_AGENT} puf [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Pump [OR]
RewriteCond %{HTTP_USER_AGENT} PushSite [NC,OR]
RewriteCond %{HTTP_USER_AGENT} QRVA [OR]
RewriteCond %{HTTP_USER_AGENT} RealDownload [OR]
RewriteCond %{HTTP_USER_AGENT} Reaper [OR]
RewriteCond %{HTTP_USER_AGENT} Recorder [OR]
RewriteCond %{HTTP_USER_AGENT} ReGet [OR]
RewriteCond %{HTTP_USER_AGENT} replacer [NC,OR]
RewriteCond %{HTTP_USER_AGENT} RepoMonkey [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Robozilla [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Rover [NC,OR]
RewriteCond %{HTTP_USER_AGENT} RPT-HTTPClient [OR]
RewriteCond %{HTTP_USER_AGENT} Rsync [NC,OR]
RewriteCond %{HTTP_USER_AGENT} SearchExpress [OR]
RewriteCond %{HTTP_USER_AGENT} searchhippo [NC,OR]
RewriteCond %{HTTP_USER_AGENT} searchterms\.it [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Second\ Street\ Research [OR]
RewriteCond %{HTTP_USER_AGENT} Seeker [OR]
RewriteCond %{HTTP_USER_AGENT} Shai [NC,OR]
RewriteCond %{HTTP_USER_AGENT} sitecheck [NC,OR]
RewriteCond %{HTTP_USER_AGENT} SiteSnagger [OR]
RewriteCond %{HTTP_USER_AGENT} SlySearch [OR]
RewriteCond %{HTTP_USER_AGENT} SmartDownload [OR]
RewriteCond %{HTTP_USER_AGENT} snagger [NC,OR]
RewriteCond %{HTTP_USER_AGENT} SpaceBison [OR]
RewriteCond %{HTTP_USER_AGENT} Spegla [NC,OR]
RewriteCond %{HTTP_USER_AGENT} SpiderBot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} SqWorm [OR]
RewriteCond %{HTTP_USER_AGENT} Stripper [OR]
RewriteCond %{HTTP_USER_AGENT} Sucker [OR]
RewriteCond %{HTTP_USER_AGENT} SuperBot [OR]
RewriteCond %{HTTP_USER_AGENT} SuperHTTP [OR]
RewriteCond %{HTTP_USER_AGENT} Surfbot [OR]
RewriteCond %{HTTP_USER_AGENT} SurfWalker [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Szukacz [OR]
RewriteCond %{HTTP_USER_AGENT} tAkeOut [OR]
RewriteCond %{HTTP_USER_AGENT} tarspider [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Templeton [NC,OR]
RewriteCond %{HTTP_USER_AGENT} TrueRobot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} TV33_Mercator [OR]
RewriteCond %{HTTP_USER_AGENT} UIowaCrawler [NC,OR]
RewriteCond %{HTTP_USER_AGENT} URL_Spider_Pro [OR]
RewriteCond %{HTTP_USER_AGENT} UtilMind [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Vacuum [OR]
RewriteCond %{HTTP_USER_AGENT} vagabondo [NC,OR]
RewriteCond %{HTTP_USER_AGENT} vayala [NC,OR]
RewriteCond %{HTTP_USER_AGENT} visibilitygap [NC,OR]
RewriteCond %{HTTP_USER_AGENT} VoidEYE [OR]
RewriteCond %{HTTP_USER_AGENT} vspider [NC,OR]
RewriteCond %{HTTP_USER_AGENT} w3mir [NC,OR]
RewriteCond %{HTTP_USER_AGENT} web\.by\.mail [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Web\ Data\ Extractor [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Web\ Downloader [OR]
RewriteCond %{HTTP_USER_AGENT} Web\ Image\ Collector [OR]
RewriteCond %{HTTP_USER_AGENT} Web\ Sucker [OR]
RewriteCond %{HTTP_USER_AGENT} WebAuto [OR]
RewriteCond %{HTTP_USER_AGENT} Webclipping [OR]
RewriteCond %{HTTP_USER_AGENT} webcollector [NC,OR]
RewriteCond %{HTTP_USER_AGENT} WebCopier [OR]
RewriteCond %{HTTP_USER_AGENT} webcraft@bea [NC,OR]
RewriteCond %{HTTP_USER_AGENT} webdevil [NC,OR]
RewriteCond %{HTTP_USER_AGENT} webdownloader [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Webdup [OR]
RewriteCond %{HTTP_USER_AGENT} WebFetch [OR]
RewriteCond %{HTTP_USER_AGENT} WebGo\ IS [OR]
RewriteCond %{HTTP_USER_AGENT} WebHook [OR]
RewriteCond %{HTTP_USER_AGENT} Webinator [OR]
RewriteCond %{HTTP_USER_AGENT} WebLeacher [OR]
RewriteCond %{HTTP_USER_AGENT} WEBMASTERS [NC,OR]
RewriteCond %{HTTP_USER_AGENT} WebMiner [NC,OR]
RewriteCond %{HTTP_USER_AGENT} WebMirror [OR]
RewriteCond %{HTTP_USER_AGENT} webmole [NC,OR]
RewriteCond %{HTTP_USER_AGENT} WebReaper [OR]
RewriteCond %{HTTP_USER_AGENT} WebSauger [OR]
RewriteCond %{HTTP_USER_AGENT} Website\ eXtractor [OR]
RewriteCond %{HTTP_USER_AGENT} Website\ Quester [OR]
RewriteCond %{HTTP_USER_AGENT} WebSnake [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Webster [OR]
RewriteCond %{HTTP_USER_AGENT} WebStripper [OR]
RewriteCond %{HTTP_USER_AGENT} websucker [NC,OR]
RewriteCond %{HTTP_USER_AGENT} webvac [NC,OR]
RewriteCond %{HTTP_USER_AGENT} webwalk [NC,OR]
RewriteCond %{HTTP_USER_AGENT} WebWhacker [OR]
RewriteCond %{HTTP_USER_AGENT} webweasel [NC,OR]
RewriteCond %{HTTP_USER_AGENT} WebZIP [OR]
RewriteCond %{HTTP_USER_AGENT} whizbang [NC,OR]
RewriteCond %{HTTP_USER_AGENT} WhosTalking [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Widow [OR]
RewriteCond %{HTTP_USER_AGENT} WISEbot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} WUMPUS [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Wweb [OR]
RewriteCond %{HTTP_USER_AGENT} WWWOFFLE [OR]
RewriteCond %{HTTP_USER_AGENT} Xaldon\ WebSpider [OR]
RewriteCond %{HTTP_USER_AGENT} XGET [NC,OR]
RewriteCond %{HTTP_USER_AGENT} x-Tractor [OR]
RewriteCond %{HTTP_USER_AGENT} Yandex [OR]
RewriteCond %{HTTP_USER_AGENT} Yahoo!Slurp [NC,OR]
RewriteCond %{REMOTE_HOST} ^private$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} traffixer [NC,OR]
RewriteCond %{HTTP_USER_AGENT} netfactual [NC,OR]
RewriteCond %{HTTP_USER_AGENT} netcraft [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^[^?]*iaea\.org [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^[^?]*addresses\.com [NC,OR]
RewriteCond %{HTTP_USER_AGENT} [0-9A-Za-z]{15,} [OR]
RewriteCond %{HTTP_USER_AGENT} ^[0-9A-Za-z]+$ [OR]
RewriteCond %{HTTP_USER_AGENT} ^[^?]*\.ideography\.co\.uk [NC]
RewriteRule .*$ http://www.blind-guardian.fr [R,L]
Tu peux t'inspirer de celui-ci
-

Leonick - WRInaute accro

- Messages: 19595
- Inscription: 8 Aoû 2004
Re: Interdire aspirateur et bot via le robots.txt
le mieux étant de faire les 2 : un robots.txt, pour que les gentils robots bien corrects comprennent qu'on ne veut pas d'eux et le htaccess pour les vilains robots qui ne lisent pas, ou ne tiennent pas compte du robots.txt
comme cela, ceux qui suivent les directives de robot.txt ne viennent pas prendre bêtement des ressources à ton serveur
comme cela, ceux qui suivent les directives de robot.txt ne viennent pas prendre bêtement des ressources à ton serveur
-

Leonick - WRInaute accro

- Messages: 19595
- Inscription: 8 Aoû 2004
Re: Interdire aspirateur et bot via le robots.txt
absolument pas, mais cela filtre déjà ceux qui utilisent les aspirateurs sans les reconfigurer.kitten13 a écrit:Ok, par contre je vais être casse bonbon, j'ai une dernière question : est ce fiable de se basé sur le USER_AGENT ?
Parce que après, en vérifiant tes logs, tu verras que tu auras des googlebot qui se connectent avec wanadoo
Donc après, il faut une analyse de la navigation plus subtile pour éliminer encore quelques autres aspirateurs, faire des bloquages sur ip, ...
- kitten13
- WRInaute discret

- Messages: 213
- Inscription: 30 Avr 2007
Re: Interdire aspirateur et bot via le robots.txt
Merci pour les complément d'info 
édit : Une dernière question et promis j'arrête
Donc je viens de faire mais règles pr le .htaccess basé sur le modèle de darkjukka, par contre pr le robots.txt j'ai ma liste User-agent à bloquer au début du fichier et ensuite le :
Je voulais savoir dans quel sens je dois ajouter User-Agent: * Allow: / ? avant ou apres ma liste User-agent a bloquer ?
édit : Une dernière question et promis j'arrête
Donc je viens de faire mais règles pr le .htaccess basé sur le modèle de darkjukka, par contre pr le robots.txt j'ai ma liste User-agent à bloquer au début du fichier et ensuite le :
- Code: Tout sélectionner
User-Agent: *
Allow: /
Je voulais savoir dans quel sens je dois ajouter User-Agent: * Allow: / ? avant ou apres ma liste User-agent a bloquer ?
10 messages
• Page 1 sur 1
Lectures recommandées sur ce thème :
- Interdire certains robots ? Comment ? Quels robots ?
- Eviter les robots aspirateur de courriel
- Autoriser Google Adsense Bot, interdire tous les autres ?
- Interdire l'acces au fichier Robots
- Faut-il interdire certains robots ?
- Robots.txt interdire une url dynamique
- robots.txt : interdire tout sauf la racine
- interdire un dossier sans htaccess no robots.txt ?
- Interdire le passage des robots avec googlestats
- Interdire des pages dynamiques dans robots.txt
Qui est en ligne
Utilisateurs parcourant ce forum: Aucun utilisateur enregistré et 2 invités
