Simple method to block site rippers from downloading your site
- franceradio
- WRInaute passionné

- Messages: 667
- Joined: Thu Apr 19, 2007 18:27
Hello,
It is practically impossible to block site ripping 100%, but here is a list of the best-known site-ripping "bots".
Just add this list to your robots.txt file:
File after ohax's simplification:
User-agent: Fasterfox
User-agent: Alexibot
User-agent: asterias
User-agent: BackDoorBot/1.0
User-agent: Black Hole
User-agent: BlowFish/1.0
User-agent: BotALot
User-agent: BuiltBotTough
User-agent: Bullseye/1.0
User-agent: BunnySlippers
User-agent: Cegbfeieh
User-agent: CheeseBot
User-agent: CherryPicker
User-agent: CherryPickerElite/1.0
User-agent: CherryPickerSE/1.0
User-agent: CopyRightCheck
User-agent: cosmos
User-agent: Crescent
User-agent: Crescent Internet ToolPak HTTP OLE Control v.1.0
User-agent: DISCo Pump 3.1
User-agent: DittoSpyder
User-agent: EmailCollector
User-agent: EmailSiphon
User-agent: EmailWolf
User-agent: EroCrawler
User-agent: ExtractorPro
User-agent: Foobot
User-agent: Harvest/1.5
User-agent: hloader
User-agent: httplib
User-agent: humanlinks
User-agent: InfoNaviRobot
User-agent: JennyBot
User-agent: Kenjin Spider
User-agent: LexiBot
User-agent: libWeb/clsHTTP
User-agent: LinkextractorPro
User-agent: LinkScan/8.1a Unix
User-agent: LinkWalker
User-agent: lwp-trivial
User-agent: lwp-trivial/1.34
User-agent: Mata Hari
User-agent: Microsoft URL Control - 5.01.4511
User-agent: Microsoft URL Control - 6.00.8169
User-agent: MIIxpc
User-agent: MIIxpc/4.2
User-agent: Mister PiX
User-agent: moget
User-agent: moget/2.1
User-agent: NetAnts
User-agent: NetAttache
User-agent: NetAttache Light 1.1
User-agent: NetMechanic
User-agent: NICErsPRO
User-agent: Offline Explorer
User-agent: Openfind
User-agent: Openfind data gatherer
User-agent: ProPowerBot/2.14
User-agent: ProWebWalker
User-agent: psbot
User-agent: QueryN Metasearch
User-agent: RepoMonkey
User-agent: RepoMonkey Bait & Tackle/v1.01
User-agent: RMA
User-agent: SiteSnagger
User-agent: SpankBot
User-agent: spanner
User-agent: SuperBot
User-agent: SuperBot/2.6
User-agent: suzuran
User-agent: Szukacz/1.4
User-agent: Teleport
User-agent: Telesoft
User-agent: The Intraformant
User-agent: TheNomad
User-agent: TightTwatBot
User-agent: Titan
User-agent: toCrawl/UrlDispatcher
User-agent: True_Robot
User-agent: True_Robot/1.0
User-agent: turingos
User-agent: URLy Warning
User-agent: VCI
User-agent: VCI WebViewer VCI WebViewer Win32
User-agent: Web Image Collector
User-agent: WebAuto
User-agent: WebBandit
User-agent: WebBandit/3.50
User-agent: WebCopier
User-agent: webcopy
User-agent: WebEnhancer
User-agent: WebmasterWorldForumBot
User-agent: webmirror
User-agent: WebReaper
User-agent: WebSauger
User-agent: website extractor
User-agent: Website Quester
User-agent: Webster Pro
User-agent: WebStripper
User-agent: WebStripper/2.02
User-agent: WebZip
User-agent: WebZip/4.0
User-agent: Wget
User-agent: Wget/1.5.3
User-agent: Wget/1.6
User-agent: WinHTTrack
User-agent: WWW-Collector-E
User-agent: Xenu's
User-agent: Xenu's Link Sleuth 1.1c
User-agent: Zeus
User-agent: Zeus 32297 Webster Pro V2.9 Win32
User-Agent: MJ12bot
User-agent: HTTrack
User-agent: HTTrack 3.0
User-agent: TurnitinBot
User-agent: QuepasaCreep
Disallow: /
Original list (before ohax's reply)
User-agent: Alexibot
Disallow: /
User-agent: Aqua_Products
Disallow: /
User-agent: BackDoorBot
Disallow: /
User-agent: Black.Hole
Disallow: /
User-agent: BlackWidow
Disallow: /
User-agent: BlowFish
Disallow: /
User-agent: Bookmark search tool
Disallow: /
User-agent: BotALot
Disallow: /
User-agent: BotRightHere
Disallow: /
User-agent: BuiltBotTough
Disallow: /
User-agent: Bullseye
Disallow: /
User-agent: BunnySlippers
Disallow: /
User-agent: Cegbfeieh
Disallow: /
User-agent: CheeseBot
Disallow: /
User-agent: CherryPicker
Disallow: /
User-agent: CherryPickerElite
Disallow: /
User-agent: CherryPickerSE
Disallow: /
User-agent: ChinaClaw
Disallow: /
User-agent: Copernic
Disallow: /
User-agent: CopyRightCheck
Disallow: /
User-agent: Crescent
Disallow: /
User-agent: Custo
Disallow: /
User-agent: DISCo
Disallow: /
User-agent: DISCoFinder
Disallow: /
User-agent: DittoSpyder
Disallow: /
User-agent: Download Demon
Disallow: /
User-agent: EirGrabber
Disallow: /
User-agent: EmailCollector
Disallow: /
User-agent: EmailSiphon
Disallow: /
User-agent: EmailWolf
Disallow: /
User-agent: EroCrawler
Disallow: /
User-agent: Express WebPictures
Disallow: /
User-agent: ExtractorPro
Disallow: /
User-agent: EyeNetIE
Disallow: /
User-agent: FairAd Client
Disallow: /
User-agent: Flaming AttackBot
Disallow: /
User-agent: FlashGet
Disallow: /
User-agent: FlashGet WebWasher 3.2
Disallow: /
User-agent: Foobot
Disallow: /
User-agent: FrontPage
Disallow: /
User-agent: Gaisbot
Disallow: /
User-agent: GetRight
Disallow: /
User-agent: GetWeb!
Disallow: /
User-agent: Go!Zilla
Disallow: /
User-agent: Go-Ahead-Got-It
Disallow: /
User-agent: Googlebot-Image
Disallow: /
User-agent: GrabNet
Disallow: /
User-agent: Grafula
Disallow: /
User-agent: HMView
Disallow: /
User-agent: HTTrack
Disallow: /
User-agent: Harvest
Disallow: /
User-agent: Image Stripper
Disallow: /
User-agent: Image Sucker
Disallow: /
User-agent: Indy Library
Disallow: /
User-agent: InfoNaviRobot
Disallow: /
User-agent: InterGET
Disallow: /
User-agent: Internet Ninja
Disallow: /
User-agent: Iron33
Disallow: /
User-agent: JOC Web Spider
Disallow: /
User-agent: JennyBot
Disallow: /
User-agent: JetCar
Disallow: /
User-agent: Kenjin Spider
Disallow: /
User-agent: Kenjin.Spider
Disallow: /
User-agent: Keyword Density
Disallow: /
User-agent: Keyword.Density
Disallow: /
User-agent: LNSpiderguy
Disallow: /
User-agent: LeechFTP
Disallow: /
User-agent: LexiBot
Disallow: /
User-agent: LinkScan
Disallow: /
User-agent: LinkWalker
Disallow: /
User-agent: LinkextractorPro
Disallow: /
User-agent: MIDown tool
Disallow: /
User-agent: MIIxpc
Disallow: /
User-agent: MSIECrawler
Disallow: /
User-agent: Mass Downloader
Disallow: /
User-agent: Mata Hari
Disallow: /
User-agent: Microsoft URL Control
Disallow: /
User-agent: Microsoft.URL
Disallow: /
User-agent: Mister PiX
Disallow: /
User-agent: Mister PiX version.dll
Disallow: /
User-agent: Mister Pix II 2.01
Disallow: /
User-agent: Mister Pix II 2.02a
Disallow: /
User-agent: Mister.PiX
Disallow: /
User-agent: NICErsPRO
Disallow: /
User-agent: NPBot
Disallow: /
User-agent: NPbot
Disallow: /
User-agent: Navroad
Disallow: /
User-agent: NearSite
Disallow: /
User-agent: Net Vampire
Disallow: /
User-agent: NetAnts
Disallow: /
User-agent: NetMechanic
Disallow: /
User-agent: NetSpider
Disallow: /
User-agent: NetZIP
Disallow: /
User-agent: NetZip Downloader
Disallow: /
User-agent: NetZippy
Disallow: /
User-agent: Octopus
Disallow: /
User-agent: Offline Explorer
Disallow: /
User-agent: Offline Navigator
Disallow: /
User-agent: Offline.Explorer
Disallow: /
User-agent: Openbot
Disallow: /
User-agent: Openfind
Disallow: /
User-agent: Openfind data gatherer
Disallow: /
User-agent: Oracle Ultra Search
Disallow: /
User-agent: PageGrabber
Disallow: /
User-agent: Papa Foto
Disallow: /
User-agent: PerMan
Disallow: /
User-agent: ProPowerBot
Disallow: /
User-agent: ProWebWalker
Disallow: /
User-agent: Python-urllib
Disallow: /
User-agent: QueryN Metasearch
Disallow: /
User-agent: QueryN.Metasearch
Disallow: /
User-agent: RMA
Disallow: /
User-agent: Radiation Retriever 1.1
Disallow: /
User-agent: ReGet
Disallow: /
User-agent: RealDownload
Disallow: /
User-agent: RepoMonkey
Disallow: /
User-agent: SiteSnagger
Disallow: /
User-agent: SlySearch
Disallow: /
User-agent: SmartDownload
Disallow: /
User-agent: SpankBot
Disallow: /
User-agent: Sqworm
Disallow: /
User-agent: SuperBot
Disallow: /
User-agent: SuperHTTP
Disallow: /
User-agent: Surfbot
Disallow: /
User-agent: Szukacz
Disallow: /
User-agent: Teleport
Disallow: /
User-agent: Teleport Pro
Disallow: /
User-agent: TeleportPro
Disallow: /
User-agent: Telesoft
Disallow: /
User-agent: The Intraformant
Disallow: /
User-agent: The.Intraformant
Disallow: /
User-agent: TheNomad
Disallow: /
User-agent: TightTwatBot
Disallow: /
User-agent: Titan
Disallow: /
User-agent: True_Robot
Disallow: /
User-agent: TurnitinBot
Disallow: /
User-agent: URL Control
Disallow: /
User-agent: URL_Spider_Pro
Disallow: /
User-agent: URLy Warning
Disallow: /
User-agent: URLy.Warning
Disallow: /
User-agent: VCI
Disallow: /
User-agent: VCI WebViewer VCI WebViewer Win32
Disallow: /
User-agent: VoidEYE
Disallow: /
User-agent: WWW-Collector-E
Disallow: /
User-agent: WWWOFFLE
Disallow: /
User-agent: Web Image Collector
Disallow: /
User-agent: Web Sucker
Disallow: /
User-agent: Web.Image.Collector
Disallow: /
User-agent: WebAuto
Disallow: /
User-agent: WebBandit
Disallow: /
User-agent: WebCapture
Disallow: /
User-agent: WebCopier
Disallow: /
User-agent: WebEMailExtrac
Disallow: /
User-agent: WebEnhancer
Disallow: /
User-agent: WebFetch
Disallow: /
User-agent: WebGo IS
Disallow: /
User-agent: WebLeacher
Disallow: /
User-agent: WebReaper
Disallow: /
User-agent: WebSauger
Disallow: /
User-agent: WebStripper
Disallow: /
User-agent: WebWhacker
Disallow: /
User-agent: WebZIP
Disallow: /
User-agent: WebZip
Disallow: /
User-agent: WebmasterWorldForumBot
Disallow: /
User-agent: Website Quester
Disallow: /
User-agent: Website eXtractor
Disallow: /
User-agent: Website.Quester
Disallow: /
User-agent: Webster Pro
Disallow: /
User-agent: Webster.Pro
Disallow: /
User-agent: Wget
Disallow: /
User-agent: Widow
Disallow: /
User-agent: Xaldon WebSpider
Disallow: /
User-agent: Xenu's
Disallow: /
User-agent: Zeus
Disallow: /
User-agent: Zeus Link Scout
Disallow: /
User-agent: asterias
Disallow: /
User-agent: b2w
Disallow: /
User-agent: cosmos
Disallow: /
User-agent: eCatch
Disallow: /
User-agent: eCatch/3.0
Disallow: /
User-agent: hloader
Disallow: /
User-agent: httplib
Disallow: /
User-agent: humanlinks
Disallow: /
User-agent: ia_archiver
Disallow: /
User-agent: larbin
Disallow: /
User-agent: libWeb/clsHTTP
Disallow: /
User-agent: lwp-trivial
Disallow: /
User-agent: moget
Disallow: /
User-agent: pavuk
Disallow: /
User-agent: pcBrowser
Disallow: /
User-agent: psbot
Disallow: /
User-agent: searchpreview
Disallow: /
User-agent: spanner
Disallow: /
User-agent: suzuran
Disallow: /
User-agent: tAkeOut
Disallow: /
User-agent: toCrawl
Disallow: /
User-agent: turingos
Disallow: /
User-agent: webfetch
Disallow: /
User-agent: wget
Disallow: /
Last edited by franceradio on Sat Jul 21, 2007 12:33, edited 1 time in total.
Simpler:
User-agent: Mediapartners-Google*
Disallow:
User-agent: Fasterfox
User-agent: Alexibot
User-agent: asterias
User-agent: BackDoorBot/1.0
User-agent: Black Hole
User-agent: BlowFish/1.0
User-agent: BotALot
User-agent: BuiltBotTough
User-agent: Bullseye/1.0
User-agent: BunnySlippers
User-agent: Cegbfeieh
User-agent: CheeseBot
User-agent: CherryPicker
User-agent: CherryPickerElite/1.0
User-agent: CherryPickerSE/1.0
User-agent: CopyRightCheck
User-agent: cosmos
User-agent: Crescent
User-agent: Crescent Internet ToolPak HTTP OLE Control v.1.0
User-agent: DISCo Pump 3.1
User-agent: DittoSpyder
User-agent: EmailCollector
User-agent: EmailSiphon
User-agent: EmailWolf
User-agent: EroCrawler
User-agent: ExtractorPro
User-agent: Foobot
User-agent: Harvest/1.5
User-agent: hloader
User-agent: httplib
User-agent: humanlinks
User-agent: InfoNaviRobot
User-agent: JennyBot
User-agent: Kenjin Spider
User-agent: LexiBot
User-agent: libWeb/clsHTTP
User-agent: LinkextractorPro
User-agent: LinkScan/8.1a Unix
User-agent: LinkWalker
User-agent: lwp-trivial
User-agent: lwp-trivial/1.34
User-agent: Mata Hari
User-agent: Microsoft URL Control - 5.01.4511
User-agent: Microsoft URL Control - 6.00.8169
User-agent: MIIxpc
User-agent: MIIxpc/4.2
User-agent: Mister PiX
User-agent: moget
User-agent: moget/2.1
User-agent: NetAnts
User-agent: NetAttache
User-agent: NetAttache Light 1.1
User-agent: NetMechanic
User-agent: NICErsPRO
User-agent: Offline Explorer
User-agent: Openfind
User-agent: Openfind data gatherer
User-agent: ProPowerBot/2.14
User-agent: ProWebWalker
User-agent: psbot
User-agent: QueryN Metasearch
User-agent: RepoMonkey
User-agent: RepoMonkey Bait & Tackle/v1.01
User-agent: RMA
User-agent: SiteSnagger
User-agent: SpankBot
User-agent: spanner
User-agent: SuperBot
User-agent: SuperBot/2.6
User-agent: suzuran
User-agent: Szukacz/1.4
User-agent: Teleport
User-agent: Telesoft
User-agent: The Intraformant
User-agent: TheNomad
User-agent: TightTwatBot
User-agent: Titan
User-agent: toCrawl/UrlDispatcher
User-agent: True_Robot
User-agent: True_Robot/1.0
User-agent: turingos
User-agent: URLy Warning
User-agent: VCI
User-agent: VCI WebViewer VCI WebViewer Win32
User-agent: Web Image Collector
User-agent: WebAuto
User-agent: WebBandit
User-agent: WebBandit/3.50
User-agent: WebCopier
User-agent: webcopy
User-agent: WebEnhancer
User-agent: WebmasterWorldForumBot
User-agent: webmirror
User-agent: WebReaper
User-agent: WebSauger
User-agent: website extractor
User-agent: Website Quester
User-agent: Webster Pro
User-agent: WebStripper
User-agent: WebStripper/2.02
User-agent: WebZip
User-agent: WebZip/4.0
User-agent: Wget
User-agent: Wget/1.5.3
User-agent: Wget/1.6
User-agent: WinHTTrack
User-agent: WWW-Collector-E
User-agent: Xenu's
User-agent: Xenu's Link Sleuth 1.1c
User-agent: Zeus
User-agent: Zeus 32297 Webster Pro V2.9 Win32
User-Agent: MJ12bot
User-agent: HTTrack
User-agent: HTTrack 3.0
User-agent: TurnitinBot
User-agent: QuepasaCreep
Disallow: /
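A quick way to sanity-check the grouped format is Python's standard `urllib.robotparser`, which, like a compliant crawler, applies one `Disallow` line to every `User-agent` line stacked directly above it. This is a minimal sketch under stated assumptions: it uses a shortened excerpt of the list, `example.com` is a placeholder, and Python's name matching (a substring test on the part of the User-Agent before the first `/`) only approximates what any given crawler does.

```python
from urllib.robotparser import RobotFileParser

# Shortened excerpt of the grouped list above: several User-agent lines
# stacked over one shared "Disallow: /".
ROBOTS_TXT = """\
User-agent: Fasterfox
User-agent: WebZip
User-agent: Wget
User-agent: HTTrack
Disallow: /
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text locally, without fetching anything."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # can_fetch() refuses everything until a fetch time is recorded.
    parser.modified()
    return parser


if __name__ == "__main__":
    rp = build_parser(ROBOTS_TXT)
    for agent in ("Wget/1.21.4", "WebZip/4.0", "Googlebot/2.1"):
        # Listed agents hit the shared Disallow; unlisted ones fall through
        # (no "*" group here), so they are allowed.
        print(agent, rp.can_fetch(agent, "https://example.com/page.html"))
```

Wget and WebZip come out `False`, Googlebot `True`: one group is enough for the whole list, which is why the simplified file works.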
Hi,
thanks for the info, but:
I know nothing about "site-ripper bots"!
Can I trust you blindly and apply your technique?
Are you sure your list doesn't contain any "nice bots"?
Thanks in advance
PS:
User-agent: Mediapartners-Google*
Is that to allow AdSense across the whole site?
- franceradio
- WRInaute passionné

- Messages: 667
- Joined: Thu Apr 19, 2007 18:27
Ohax wrote: Simpler:
Thanks,
I've edited the original post so that visitors use your method from the start.
User-agent: Mediapartners-Google*
Disallow:
This allows Mediapartners-Google (the AdSense crawler) on the whole site, whatever the other rules say.
Could you use my original code, which includes this rule?
Source : https://www.google.com/adsense/support/ ... swer=10532
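The Mediapartners-Google rule can be checked the same way: a crawler obeys the group that names it and ignores the others, so an empty `Disallow:` in its own group lets it in even when another group blocks everyone. In this sketch the `User-agent: *` block is a hypothetical addition used only to show that precedence, `SomeOtherBot` is a made-up name, and the AdSense name is written without the trailing `*` because Python's parser compares names literally (substring test) rather than as wildcards.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical file: the catch-all group blocks everyone, but the AdSense
# crawler has its own group whose empty Disallow means "allow everything".
ROBOTS_TXT = """\
User-agent: Mediapartners-Google
Disallow:

User-agent: *
Disallow: /
"""

rp = RobotFileParser()
rp.parse(ROBOTS_TXT.splitlines())
rp.modified()  # record a fetch time so can_fetch() evaluates the rules

print(rp.can_fetch("Mediapartners-Google", "https://example.com/article.html"))  # True
print(rp.can_fetch("SomeOtherBot/1.0", "https://example.com/article.html"))      # False
```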
SparH wrote: Hi,
thanks for the info, but:
I know nothing about "site-ripper bots"!
Can I trust you blindly and apply your technique?
Are you sure your list doesn't contain any "nice bots"?
Thanks in advance
PS:
User-agent: Mediapartners-Google*
Is that to allow AdSense across the whole site?
My list is clean: it doesn't stop any so-called "nice" robot from coming along and eating up your bandwidth and resources.
Note: the controversial Slurp (Yahoo!'s crawler), which often appears in lists found online, is not in my robots.txt file.
That said, robots still have to honour what the file says...
That isn't always the case, and that's where other, more heavy-handed methods come in, but that's beyond the original topic.
My robots.txt for adojeunz: http://www.adojeunz.com/robots.txt
-

Marie-Aude - WRInaute accro

- Messages: 4942
- Joined: Mon Jun 05, 2006 14:15
The robots.txt file is useless against "bad" robots, since by definition they won't honour it. Robots have to be blocked in the .htaccess file.
I wouldn't be so categorical.
robots.txt already does a fair bit of cleanup.
.htaccess complements it.
A PHP script that counts pages per IP and bans the IP once a limit is reached complements .htaccess in turn.
Beyond that, blocking happens at the server level, but that's another story and not everyone can set it up.
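The per-IP counting idea can be sketched in a few lines. This is a hedged Python illustration, not the PHP script being referred to: the class name `PerIpPageLimiter`, the thresholds and the in-memory storage are all assumptions, and a real site would need storage shared across server processes.

```python
import time
from collections import defaultdict, deque
from typing import Callable, Deque, Dict


class PerIpPageLimiter:
    """Hypothetical per-IP page counter: bans an IP for `ban_seconds` once it
    requests more than `max_pages` pages within `window_seconds`."""

    def __init__(self, max_pages: int, window_seconds: float, ban_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_pages = max_pages
        self.window_seconds = window_seconds
        self.ban_seconds = ban_seconds
        self.clock = clock
        self.hits: Dict[str, Deque[float]] = defaultdict(deque)
        self.banned_until: Dict[str, float] = {}

    def allow(self, ip: str) -> bool:
        """Record one page request from `ip`; return False if it must be refused."""
        now = self.clock()
        # A banned IP is refused without being counted.
        if self.banned_until.get(ip, float("-inf")) > now:
            return False
        hits = self.hits[ip]
        # Forget requests older than the sliding window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        hits.append(now)
        if len(hits) > self.max_pages:
            # Limit exceeded: start the ban and reset the counter.
            self.banned_until[ip] = now + self.ban_seconds
            hits.clear()
            return False
        return True


if __name__ == "__main__":
    fake_now = [0.0]  # manual clock so the demo is deterministic
    limiter = PerIpPageLimiter(max_pages=3, window_seconds=10, ban_seconds=60,
                               clock=lambda: fake_now[0])
    print([limiter.allow("203.0.113.7") for _ in range(4)])  # [True, True, True, False]
    fake_now[0] = 61.0
    print(limiter.allow("203.0.113.7"))  # True (ban expired)
```

Unlike a User-Agent list, this catches a ripper whatever name it sends, at the cost of having to pick limits that real visitors never reach.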
Example from my .htaccess:
Options -Indexes
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} ADSARobot [OR]
RewriteCond %{HTTP_USER_AGENT} ah-ha [NC,OR]
RewriteCond %{HTTP_USER_AGENT} aktuelles [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Anarchie [OR]
RewriteCond %{HTTP_USER_AGENT} ASPSeek [OR]
RewriteCond %{HTTP_USER_AGENT} Alexibot [OR]
RewriteCond %{HTTP_USER_AGENT} Asterias [OR]
RewriteCond %{HTTP_USER_AGENT} ASSORT [NC,OR]
RewriteCond %{HTTP_USER_AGENT} autoemailspider [NC,OR]
RewriteCond %{HTTP_USER_AGENT} BackWeb [OR]
RewriteCond %{HTTP_USER_AGENT} BackDoorBot [OR]
RewriteCond %{HTTP_USER_AGENT} Bandit [OR]
RewriteCond %{HTTP_USER_AGENT} BatchFTP [OR]
RewriteCond %{HTTP_USER_AGENT} Black\ Hole [OR]
RewriteCond %{HTTP_USER_AGENT} bdfetch [NC,OR]
RewriteCond %{HTTP_USER_AGENT} big.brother [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Backstreet [OR]
RewriteCond %{HTTP_USER_AGENT} ^BlackWidow [OR]
RewriteCond %{HTTP_USER_AGENT} bmclient [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Boston\ Project [OR]
RewriteCond %{HTTP_USER_AGENT} BlowFish [OR]
RewriteCond %{HTTP_USER_AGENT} BravoBrian\ SpiderEngine\ MarcoPolo [OR]
RewriteCond %{HTTP_USER_AGENT} Buddy [OR]
RewriteCond %{HTTP_USER_AGENT} BotALot [OR]
RewriteCond %{HTTP_USER_AGENT} Bullseye [NC,OR]
RewriteCond %{HTTP_USER_AGENT} bumblebee [NC,OR]
RewriteCond %{HTTP_USER_AGENT} capture [OR]
RewriteCond %{HTTP_USER_AGENT} ^Bot\ mailto:craftbot@yahoo.com [OR]
RewriteCond %{HTTP_USER_AGENT} ^ChinaClaw [OR]
RewriteCond %{HTTP_USER_AGENT} ^Custo [OR]
RewriteCond %{HTTP_USER_AGENT} ^DISCo [OR]
RewriteCond %{HTTP_USER_AGENT} ^Download\ Demon [OR]
RewriteCond %{HTTP_USER_AGENT} Download\ Wonder [OR]
RewriteCond %{HTTP_USER_AGENT} Downloader [OR]
RewriteCond %{HTTP_USER_AGENT} Drip [OR]
RewriteCond %{HTTP_USER_AGENT} DSurf15a [OR]
RewriteCond %{HTTP_USER_AGENT} ^eCatch [OR]
RewriteCond %{HTTP_USER_AGENT} ^EirGrabber [OR]
RewriteCond %{HTTP_USER_AGENT} ^EmailCollector [OR]
RewriteCond %{HTTP_USER_AGENT} ^EmailSiphon [OR]
RewriteCond %{HTTP_USER_AGENT} ^EmailWolf [OR]
RewriteCond %{HTTP_USER_AGENT} ^Express\ WebPictures [OR]
RewriteCond %{HTTP_USER_AGENT} ^ExtractorPro [OR]
RewriteCond %{HTTP_USER_AGENT} ^EyeNetIE [OR]
RewriteCond %{HTTP_USER_AGENT} ^FlashGet [OR]
RewriteCond %{HTTP_USER_AGENT} Fasterfox [OR]
RewriteCond %{HTTP_USER_AGENT} ^GetRight [OR]
RewriteCond %{HTTP_USER_AGENT} ^GetWeb! [OR]
RewriteCond %{HTTP_USER_AGENT} ^Go!Zilla [OR]
RewriteCond %{HTTP_USER_AGENT} ^Go-Ahead-Got-It [OR]
RewriteCond %{HTTP_USER_AGENT} ^GrabNet [OR]
RewriteCond %{HTTP_USER_AGENT} ^Grafula [OR]
RewriteCond %{HTTP_USER_AGENT} ^HMView [OR]
RewriteCond %{HTTP_USER_AGENT} HTTrack [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Image\ Stripper [OR]
RewriteCond %{HTTP_USER_AGENT} ^Image\ Sucker [OR]
RewriteCond %{HTTP_USER_AGENT} Indy\ Library [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^InterGET [OR]
RewriteCond %{HTTP_USER_AGENT} ^Internet\ Ninja [OR]
RewriteCond %{HTTP_USER_AGENT} ^JetCar [OR]
RewriteCond %{HTTP_USER_AGENT} ^JOC\ Web\ Spider [OR]
RewriteCond %{HTTP_USER_AGENT} ^larbin [OR]
RewriteCond %{HTTP_USER_AGENT} ^LeechFTP [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mass\ Downloader [OR]
RewriteCond %{HTTP_USER_AGENT} ^Memoweb [OR]
RewriteCond %{HTTP_USER_AGENT} ^MIDown\ tool [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mister\ PiX [OR]
RewriteCond %{HTTP_USER_AGENT} ^Navroad [OR]
RewriteCond %{HTTP_USER_AGENT} ^NearSite [OR]
RewriteCond %{HTTP_USER_AGENT} ^NetAnts [OR]
RewriteCond %{HTTP_USER_AGENT} ^NetSpider [OR]
RewriteCond %{HTTP_USER_AGENT} ^Net\ Vampire [OR]
RewriteCond %{HTTP_USER_AGENT} ^NetZIP [OR]
RewriteCond %{HTTP_USER_AGENT} ^Octopus [OR]
RewriteCond %{HTTP_USER_AGENT} ^Offline\ Explorer [OR]
RewriteCond %{HTTP_USER_AGENT} ^Offline\ Navigator [OR]
RewriteCond %{HTTP_USER_AGENT} ^PageGrabber [OR]
RewriteCond %{HTTP_USER_AGENT} ^Papa\ Foto [OR]
RewriteCond %{HTTP_USER_AGENT} ^pavuk [OR]
RewriteCond %{HTTP_USER_AGENT} ^pcBrowser [OR]
RewriteCond %{HTTP_USER_AGENT} ^RealDownload [OR]
RewriteCond %{HTTP_USER_AGENT} ^ReGet [OR]
RewriteCond %{HTTP_USER_AGENT} ^SiteSnagger [OR]
RewriteCond %{HTTP_USER_AGENT} ^SmartDownload [OR]
RewriteCond %{HTTP_USER_AGENT} ^SuperBot [OR]
RewriteCond %{HTTP_USER_AGENT} ^SuperHTTP [OR]
RewriteCond %{HTTP_USER_AGENT} ^Surfbot [OR]
RewriteCond %{HTTP_USER_AGENT} ^tAkeOut [OR]
RewriteCond %{HTTP_USER_AGENT} ^Teleport\ Pro [OR]
RewriteCond %{HTTP_USER_AGENT} ^VoidEYE [OR]
RewriteCond %{HTTP_USER_AGENT} ^Web\ Image\ Collector [OR]
RewriteCond %{HTTP_USER_AGENT} ^Web\ Sucker [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebAuto [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebCopier [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebBandit [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebFetch [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebGo\ IS [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebLeacher [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebReaper [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebSauger [OR]
RewriteCond %{HTTP_USER_AGENT} ^WinHTTrack [OR]
RewriteCond %{HTTP_USER_AGENT} ^Website\ eXtractor [OR]
RewriteCond %{HTTP_USER_AGENT} ^Website\ Quester [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebStripper [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebWhacker [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebZIP [OR]
RewriteCond %{HTTP_USER_AGENT} ^Wget [OR]
RewriteCond %{HTTP_USER_AGENT} ^Widow [OR]
RewriteCond %{HTTP_USER_AGENT} ^WWWOFFLE [OR]
RewriteCond %{HTTP_USER_AGENT} ^Xaldon\ WebSpider [OR]
RewriteCond %{HTTP_USER_AGENT} ^Zeus
RewriteRule ^.* - [F,L]
# Local paths keep the real 403/404 status code; a full http:// URL here
# makes Apache send the client a redirect instead of the error code.
ErrorDocument 403 /404.htm
ErrorDocument 401 /v-web/errdocs/401.html
ErrorDocument 500 /v-web/errdocs/500.html
ErrorDocument 400 /v-web/errdocs/400.html
ErrorDocument 404 /404.htm
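To see what these conditions actually match, here is a rough Python model of a few of them, assuming Python's `re` behaves like Apache's regex engine for patterns this simple. A pattern without `^` matches anywhere in the User-Agent, `^` anchors it to the start, `NC` makes it case-insensitive, and conditions chained with `[OR]` trigger the `[F]` (403) rule as soon as one matches. The sample User-Agent strings are illustrative assumptions.

```python
import re
from typing import List, Tuple

# A few conditions copied from the .htaccess above as (regex, NC flag).
# Apache's "\ " (escaped space) is written as a plain space here.
RULES: List[Tuple[str, bool]] = [
    (r"HTTrack", True),          # HTTrack [NC,OR]      -> anywhere, any case
    (r"^Wget", False),           # ^Wget [OR]           -> start only, exact case
    (r"^Teleport Pro", False),   # ^Teleport\ Pro [OR]
    (r"Indy Library", True),     # Indy\ Library [NC,OR]
]

COMPILED = [
    re.compile(pattern, re.IGNORECASE if nocase else 0) for pattern, nocase in RULES
]


def is_blocked(user_agent: str) -> bool:
    """Model the [OR] chain: the request gets a 403 if any condition matches."""
    return any(rx.search(user_agent) for rx in COMPILED)


if __name__ == "__main__":
    for ua in (
        "Wget/1.21.4",
        "wget/1.21.4",
        "Mozilla/4.5 (compatible; HTTrack 3.0x; Windows 98)",
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) Firefox/115.0",
    ):
        print(is_blocked(ua), ua)
```

The lowercase `wget/1.21.4` slips through because `^Wget` has no `NC` flag, and any ripper can simply send a browser-like User-Agent, which is why the thread treats .htaccess as one layer among several.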
-

takeiteasy - WRInaute occasionnel

- Messages: 101
- Joined: Wed Jul 09, 2003 18:32
Can't a "bad" robot pass itself off as a "nice" one?
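It can, since the User-Agent is just a string the client chooses. A common way to catch such impostors is forward-confirmed reverse DNS, the procedure Google documents for verifying Googlebot: look up the hostname of the visiting IP, check that it belongs to the crawler's domain, then resolve that hostname and confirm it points back to the same IP. The sketch below is hypothetical in its names and uses fake DNS tables (documentation IP ranges) so it runs offline; in real use the defaults `socket.gethostbyaddr` and `socket.gethostbyname_ex` would do the lookups.

```python
import socket
from typing import Callable, List, Tuple

# Same shape as socket.gethostbyaddr / socket.gethostbyname_ex results:
# (hostname, aliases, ip_addresses).
HostInfo = Tuple[str, List[str], List[str]]

GOOGLE_SUFFIXES = (".googlebot.com", ".google.com")


def is_verified_crawler(
    ip: str,
    allowed_suffixes: Tuple[str, ...],
    reverse: Callable[[str], HostInfo] = socket.gethostbyaddr,
    forward: Callable[[str], HostInfo] = socket.gethostbyname_ex,
) -> bool:
    """Forward-confirmed reverse DNS: IP -> hostname in an allowed domain,
    and that hostname must resolve back to the same IP."""
    try:
        hostname = reverse(ip)[0]
    except OSError:  # socket.herror / gaierror are OSError subclasses
        return False
    if not hostname.endswith(allowed_suffixes):
        return False
    try:
        return ip in forward(hostname)[2]
    except OSError:
        return False


# Hypothetical DNS data so the sketch runs without network access.
FAKE_PTR = {
    "192.0.2.10": "crawl-192-0-2-10.googlebot.com",  # consistent both ways
    "192.0.2.99": "spoofed.googlebot.com",           # PTR set by an impostor
    "198.51.100.5": "host.example.net",              # unrelated host
}
FAKE_A = {
    "crawl-192-0-2-10.googlebot.com": ["192.0.2.10"],
    "spoofed.googlebot.com": ["192.0.2.10"],  # does not point back to .99
}


def fake_reverse(ip: str) -> HostInfo:
    if ip not in FAKE_PTR:
        raise socket.herror("unknown host")
    return FAKE_PTR[ip], [], [ip]


def fake_forward(host: str) -> HostInfo:
    if host not in FAKE_A:
        raise socket.gaierror("name not found")
    return host, [], FAKE_A[host]


if __name__ == "__main__":
    for ip in ("192.0.2.10", "192.0.2.99", "198.51.100.5", "203.0.113.9"):
        print(ip, is_verified_crawler(ip, GOOGLE_SUFFIXES, fake_reverse, fake_forward))
```

Only the first address passes; the spoofed PTR record fails because its hostname does not resolve back to the visiting IP.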
Ohax wrote: SparH wrote: Hi,
thanks for the info, but:
I know nothing about "site-ripper bots"!
Can I trust you blindly and apply your technique?
Are you sure your list doesn't contain any "nice bots"?
Thanks in advance
PS:
User-agent: Mediapartners-Google*
Is that to allow AdSense across the whole site?
My list is clean: it doesn't stop any so-called "nice" robot from coming along and eating up your bandwidth and resources.
Note: the controversial Slurp (Yahoo!'s crawler), which often appears in lists found online, is not in my robots.txt file.
That said, robots still have to honour what the file says...
That isn't always the case, and that's where other, more heavy-handed methods come in, but that's beyond the original topic.
My robots.txt for adojeunz: http://www.adojeunz.com/robots.txt
Thanks
-

Marie-Aude - WRInaute accro

- Messages: 4942
- Joined: Mon Jun 05, 2006 14:15
That said, as far as nice robots go... once you have Google, Yahoo and Live Search covered, missing another one isn't a big deal, is it?