  1. #1
    krystof, Senior Member (joined Jul 2005)
    I thought I had learned about robots.txt from robotstxt.org several years ago. However, now that I re-read it, this key article leaves me scratching my head:
    http://www.robotstxt.org/wc/exclusion-admin.html

    Question 1: what is meant by a squiggle (~) in this article?

    In particular, this example:
    Disallow: /~joe/

    (~) = replace this squiggle with any letters for this example? Or...
    (~) = actually use this squiggle in a robots.txt file as a wildcard? Or...
    (~) = some URLs actually contain a squiggle...?

    Question 2: can a single directive exclude multiple "image" folders on multiple levels?

    Using the standard method:
    Disallow: /images/
    Disallow: /forum/images/
    Disallow: /clients/joes-diner/images/
    Disallow: /clients/hotel-albion/images/

    Can all of this and more be replaced by the following single line...or something similar...?
    Disallow: images

    Question 3: is it a good idea to place a blank index.htm file in every "disallowed" folder (such as the images folders) to discourage snooping? And will it actually discourage snooping?

    That is, if a folder has no index file, anyone who types that folder's URL into the address bar is fed a convenient list of everything in it. I have read that this type of snoop particularly likes to go to the robots.txt file for directions on where to snoop, and that an index file that is blank, or that carries a notice such as "you are not authorized to enter this directory...", may significantly deter snooping...?

    Also, by the way...

    I did a search here at WDF for threads with "robots" in the title. I only found this 2004 thread:
    http://www.webdesignforums.net/showthread.php?t=13061

    It seems to me that this statement in that thread is incorrect (?):
    Originally posted by rosland:
        Disallow: /web-design-tips/index.shtml
        It will also exclude all files in the 'web-design-tips' folder.
    ref: http://www.searchengineworld.com/rob...s_tutorial.htm
    However I suspect this statement is correct (?):
    Originally posted by rosland:
        Disallow: index.shtml
        Would exclude only that file, regardless of which folder it resides in.
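
    Question 2 and the two quoted rules all turn on how a Disallow value is compared with a URL. The sketch below assumes the plain prefix matching of the original robots exclusion standard, where a URL is excluded when its path starts with the Disallow value. The helper `is_blocked` and the sample paths are hypothetical, made up for illustration, and a real crawler may behave differently.

```python
def is_blocked(url_path: str, disallow_values: list[str]) -> bool:
    """Return True if url_path starts with any non-empty Disallow value."""
    # An empty Disallow value means "nothing is disallowed", so it is skipped.
    return any(value and url_path.startswith(value) for value in disallow_values)


# Question 2: one rule per images folder versus a single short rule.
per_folder = ["/images/", "/forum/images/", "/clients/joes-diner/images/"]
print(is_blocked("/forum/images/logo.png", per_folder))     # True
print(is_blocked("/forum/images/logo.png", ["/images/"]))   # False: prefix, not substring
print(is_blocked("/forum/images/logo.png", ["images"]))     # False: paths begin with "/"

# The two quoted rules.
file_rule = ["/web-design-tips/index.shtml"]
print(is_blocked("/web-design-tips/index.shtml", file_rule))        # True
print(is_blocked("/web-design-tips/other.shtml", file_rule))        # False
print(is_blocked("/web-design-tips/index.shtml", ["index.shtml"]))  # False
```

    Under this assumption, a rule naming a file blocks only URLs that begin with that exact string, and a value without a leading slash matches nothing, so each images folder needs its own line. Some crawlers, Google's among them, also understand `*` as a wildcard; the sketch does not model that extension.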

  2. #2
    Wired, WDF Staff (joined Apr 2003)
    (~) = some URLs actually contain a squiggle

  3. #3
    filburt1, Senior Member (joined Jul 2002, Maryland, US)
    ~ usually denotes a user's home directory, so on some web hosts, especially crappy ones, your account is served from /~username.
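
    To make the point concrete: in `Disallow: /~joe/` the tilde is an ordinary character that is matched literally, not a wildcard. Below is a minimal sketch, assuming simple prefix matching with percent-escapes decoded on both sides. The helper `matches_rule` and the sample paths are hypothetical. Decoding every escape is a simplification, since an encoded slash (%2F) is generally not treated as equivalent to a literal one.

```python
from urllib.parse import unquote


def matches_rule(url_path: str, disallow_value: str) -> bool:
    """Prefix-match after decoding percent-escapes on both sides."""
    return unquote(url_path).startswith(unquote(disallow_value))


rule = "/~joe/"
print(matches_rule("/~joe/index.html", rule))    # True
print(matches_rule("/%7Ejoe/index.html", rule))  # True: %7E decodes to ~
print(matches_rule("/~jane/index.html", rule))   # False: ~ is not a wildcard
print(matches_rule("/xjoe/index.html", rule))    # False
```

    So the rule covers joe's home directory and nothing else; other users' directories would each need their own line.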