Page 1 of 3 (results 1 to 10 of 24)
  #1 | smokiedabear | Junior Member | Join Date: Aug 2014 | Posts: 12 | Member #: 39949

    Help with 404 and robots.txt

    Hey guys and girls,

    New to the site and new to website building. Here is my problem:

    I have an old Angelfire website that I tried to remove. I know this is a common problem. After countless emails I got the old account information, and I now have access to the account and the website. I unpublished the website, but now I want to remove it from Google's search results. I have signed up for Google Webmaster Tools and have verified the website.

    I created a robots.txt file and uploaded it to my main directory, but neither it nor my custom 404 page seems to work. I emailed Angelfire, and they told me that their hosting service does not support robots.txt. I do not know if that is true or the rep was just lazy. I am thinking about upgrading to premium and trying another method, but wanted to check here first.

    I have been unable to remove the link, because it comes back as still active even after I unpublish the site. Google Webmaster Tools told me that to truly remove the site from the index, I need a working robots.txt and a 404 error.

    To summarize: first, how do I make the main page of my website return a 404 error? Second, is there a roundabout way to get robots.txt to work on Angelfire with a premium account? Some relevant information: I am currently doing this on a Mac, but I also have an old PC lying around that I can use Notepad on. The text editor on the Mac is a pain to use because it does not like .txt or .htaccess files, though I was able to show the hidden .htaccess file with a Terminal command. I would appreciate answers for either Mac or PC. As an additional note, the PC does have WordPress, but I do not know if Angelfire supports WordPress (I think it's a premium feature?).


    Thanks!
    Last edited by smokiedabear; Aug 21st, 2014 at 01:29 PM.

  #2 | DerrickE | Member | Join Date: Jul 2007 | Location: Houston, TX | Posts: 58 | Member #: 15580 | Liked: 10 times
    robots.txt works with any account as long as you can upload actual files. You don't need any special server configuration. It is just a file accessible to the internet. It just happens to have a specific standardized name that search engines look for.

    Can you view robots.txt in your browser? You should be able to go to http://domain.com/robots.txt and see it.

    For further assistance:
    What does your .htaccess file contain?
    What is the actual URL of the site you want to fix?
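The browser check above can also be scripted. One detail worth knowing: crawlers only ever request robots.txt from the root of the host (scheme, host name and port), so a robots.txt uploaded into a subdirectory is never consulted. If a free host serves member sites from subfolders of a shared domain, a robots.txt in your own folder would have no effect, which may be what the Angelfire rep meant (an assumption, not confirmed in this thread). The helper name `robots_txt_url` is hypothetical, and `domain.com` is just the placeholder from the post.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_txt_url(page_url: str) -> str:
    """Return the one robots.txt URL crawlers consult for page_url's host.

    Hypothetical helper: the page's path, query and fragment are dropped,
    because robots.txt is always looked up at the root of scheme + host (+ port).
    """
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


if __name__ == "__main__":
    # A robots.txt placed at /some/robots.txt would never be requested.
    print(robots_txt_url("http://domain.com/some/page.html"))  # http://domain.com/robots.txt
```

Opening the printed URL in a browser is exactly the check suggested in the post.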

  #3 | Banned | Join Date: Jun 2014 | Location: Laredo, Texas | Posts: 16 | Member #: 39482
    The purpose of the robots.txt file is to tell the spider robots from Google, Bing and Yahoo what you do not want them to look at. I try to avoid having a robots.txt file at all, since I keep only what I need for the website on the server. If you only have what you need on the server, there is no need for a robots.txt file.
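To make "telling the spiders what not to look at" concrete, Python's standard `urllib.robotparser` can evaluate robots.txt rules roughly the way a well-behaved crawler does. The sketch below, built around a hypothetical `is_allowed` helper, compares a file that disallows the whole site with an empty one (an empty or missing robots.txt allows everything).

```python
from urllib.robotparser import RobotFileParser

# A robots.txt that asks every crawler to stay out of the entire site.
ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Hypothetical helper: may user_agent crawl url under these rules?"""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # Record a fetch time; can_fetch() treats a parser without one as unread.
    parser.modified()
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    print(is_allowed(ROBOTS_TXT, "Googlebot", "http://domain.com/index.html"))  # False
    print(is_allowed("", "Googlebot", "http://domain.com/index.html"))  # True
```

Note that `Disallow` controls crawling, not indexing: Google documents that a disallowed URL can still appear in search results if other pages link to it.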

    My suggestion is the following:

    1) In Google Webmaster Tools, go to the Crawl section.

    2) Go to Fetch as Google.

    3) Select Fetch and Render. Note that you will be asked whether to submit the site to Google's index; confirm that you want Google to index it.

    4) If you get a checkmark in the "Render requested" column and "Complete" in the "Status" column, you are good to go.

    Custom 404 pages add no value, in my opinion. Never use them.

    Quote Originally Posted by smokiedabear (post #1)

  #4 | DerrickE | Member | Join Date: Jul 2007 | Location: Houston, TX | Posts: 58 | Member #: 15580 | Liked: 10 times
    Some search engines (Google is especially strict about this) will hold off on crawling your site if they cannot fetch robots.txt. They don't want to accidentally index something you don't want indexed, for example when your site has a temporary problem and is returning 500 errors. If they request robots.txt and can't retrieve it, they will normally come back later and try again. Robots.txt is the most important file on your site.
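What happens when a crawler cannot read robots.txt depends on the HTTP status it gets back. The mapping below is a simplified sketch of the handling Google describes in its published robots.txt documentation, as I understand it: server errors (5xx) and rate limiting (429) make Google back off and retry, while client errors such as 404 are treated as if no robots.txt existed. Redirects, caching and how long an unreachable file keeps blocking crawling are deliberately left out; `RobotsOutcome` and `robots_fetch_outcome` are hypothetical names.

```python
from enum import Enum


class RobotsOutcome(Enum):
    PARSE_RULES = "read the file and obey its rules"
    NO_RESTRICTIONS = "act as if there were no robots.txt and crawl normally"
    HOLD_OFF = "treat the whole site as disallowed for now and retry later"


def robots_fetch_outcome(status: int) -> RobotsOutcome:
    """Simplified mapping from the robots.txt HTTP status to crawler behavior.

    Assumes 3xx redirects were already followed; long-term handling of a
    file that stays unreachable is not modeled.
    """
    if 200 <= status < 300:
        return RobotsOutcome.PARSE_RULES
    if status == 429 or 500 <= status < 600:
        # Server trouble or rate limiting: back off and come back later.
        return RobotsOutcome.HOLD_OFF
    if 400 <= status < 500:
        # Missing (404), forbidden (403), gone (410): no restrictions apply.
        return RobotsOutcome.NO_RESTRICTIONS
    raise ValueError(f"status {status} is outside this simplified model")


if __name__ == "__main__":
    for code in (200, 404, 503):
        print(code, robots_fetch_outcome(code).value)
```

In this model a missing robots.txt does not stop crawling; an unreachable one does.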

  #5 | smokiedabear | Junior Member | Join Date: Aug 2014 | Posts: 12 | Member #: 39949
    Quote Originally Posted by DerrickE (post #2)
    Yes, I was able to go to the link you provided and see the usual contents of a robots.txt. The problem is that I cannot send a 404 unless I create one with Angelfire and upload it. I was not able to get a 404 error, just a "your directory could not be found" message. I do not know if that is the same thing, but Google does not think so, because it says my site is still live. I do not want to give out the URL of my site because it contains my real name. I need help creating a .htaccess file, because I have been unable to create one that works on Angelfire and its outdated interface.

    Thanks btw for the useful info!

  #6 | smokiedabear | Junior Member | Join Date: Aug 2014 | Posts: 12 | Member #: 39949
    I should also clarify that I need to create a true 404 error, not just a page that displays "404 page not found." I just added <meta name="robots" content="noindex"> and a 404 message to my index.html, but I do not know whether that will get the link removed. I just requested removal and will see what happens.
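The difference between a true 404 and a page that only says "404" comes down to the status line of the HTTP response: crawlers act on the status code, and a "not found" page served with 200 OK is what Google calls a soft 404. This sketch uses only Python's standard library to serve the same page text on two hypothetical paths, `/soft` (status 200) and `/hard` (status 404), and reads back the status a client actually receives. It also illustrates why editing index.html alone cannot produce a 404: the server chooses the status code, not the HTML.

```python
import threading
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# The same "not found" text is served on both hypothetical paths.
PAGE = b"<html><body><h2>404 Page Not Found</h2></body></html>"


class DemoHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # /soft: the page *says* 404 but the status line says 200 OK (soft 404).
        # Any other path: the server sends a real 404 status code (true 404).
        status = 200 if self.path == "/soft" else 404
        self.send_response(status)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(PAGE)))
        self.end_headers()
        self.wfile.write(PAGE)

    def log_message(self, format, *args):
        pass  # silence per-request logging for the demo


def status_of(url: str) -> int:
    """Return the HTTP status a client sees for url."""
    # ProxyHandler({}) ignores any proxy settings from the environment.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(url) as resp:
            return resp.status
    except urllib.error.HTTPError as err:  # raised for 4xx/5xx responses
        return err.code


def demo() -> dict:
    server = HTTPServer(("127.0.0.1", 0), DemoHandler)  # port 0: any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    base = f"http://127.0.0.1:{server.server_port}"
    try:
        return {path: status_of(base + path) for path in ("/soft", "/hard")}
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    print(demo())  # {'/soft': 200, '/hard': 404}
```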

  #7 | smokiedabear | Junior Member | Join Date: Aug 2014 | Posts: 12 | Member #: 39949
    So, a quick update: I was able to block Google from indexing the site using the meta robots noindex tag. What can I put into my code that will return a true 404 and not a soft 404? I want to break my website so that Google will accept my removal request. I know that sounds strange, because people usually don't want 404s, but I want to make my site inaccessible and also prevent indexing.
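The robots meta tag only works if its `content` value contains the exact `noindex` token; directives are comma-separated and case-insensitive, so a variant such as `no index` is not recognized. The sketch below, with hypothetical names `RobotsMetaFinder` and `has_noindex`, uses Python's standard `html.parser` to pull those directives out of a page, roughly as a crawler would. It is a simplification: it ignores crawler-specific tags like `<meta name="googlebot">` and the `X-Robots-Tag` HTTP header.

```python
from html.parser import HTMLParser


class RobotsMetaFinder(HTMLParser):
    """Collect directives from every <meta name="robots" content="..."> tag."""

    def __init__(self) -> None:
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # HTMLParser lowercases tag and attribute names; values keep their case.
        attr_map = {name: value or "" for name, value in attrs}
        if attr_map.get("name", "").lower() == "robots":
            # Directives are comma-separated and case-insensitive.
            for token in attr_map.get("content", "").split(","):
                self.directives.add(token.strip().lower())


def has_noindex(html: str) -> bool:
    finder = RobotsMetaFinder()
    finder.feed(html)
    finder.close()
    return "noindex" in finder.directives


if __name__ == "__main__":
    print(has_noindex('<head><meta name="robots" content="noindex"></head>'))  # True
    print(has_noindex('<meta name="robots" content="no index">'))  # False
```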

  #8 | DerrickE | Member | Join Date: Jul 2007 | Location: Houston, TX | Posts: 58 | Member #: 15580 | Liked: 10 times
    I'd go with 403 Forbidden, but if you want a 404, that's also listed. Pick one, not both. Remove all files except .htaccess and robots.txt.

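    Return 403 - forbidden errors
    Code:
    # Message sent with every 403 response
    ErrorDocument 403 "forbidden"
    RewriteEngine On
    # Leave robots.txt readable so crawlers can still fetch it
    RewriteCond %{REQUEST_URI} !/robots\.txt$
    # Match any request that maps to an existing file...
    RewriteCond %{REQUEST_FILENAME} -f
    # ...and answer it with 403 Forbidden
    RewriteRule ^ - [F]
    Return 404 - not found errors
    Code:
    # Message sent with every 404 response
    ErrorDocument 404 "Not found"
    RewriteEngine On
    # Leave robots.txt readable so crawlers can still fetch it
    RewriteCond %{REQUEST_URI} !/robots\.txt$
    # Match any request that maps to an existing file...
    RewriteCond %{REQUEST_FILENAME} -f
    # ...and answer it with 404 Not Found (substitution "-" is dropped)
    RewriteRule ^ - [R=404,L]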
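To see what "pick one status" means from a client's point of view, here is a rough Python stand-in for the .htaccess approach, built on the standard library's `http.server`. It is not Apache and says nothing about how Angelfire's servers behave. Unlike the `RewriteCond ... -f` rules, which only match existing files, this sketch answers every path, including the main page `/`, with the chosen error, except `/robots.txt`, which it keeps readable (an assumption about why that file is kept). The same handler can send 410 Gone instead of 403 or 404. The names `make_handler`, `status_of` and `check` are hypothetical.

```python
import threading
import urllib.error
import urllib.request
from http import HTTPStatus
from http.server import BaseHTTPRequestHandler, HTTPServer

ROBOTS_TXT = b"User-agent: *\nDisallow: /\n"


def make_handler(block_status: HTTPStatus):
    """Build a handler that answers every path except /robots.txt with block_status."""

    class ClosedSiteHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            if self.path == "/robots.txt":
                # Still serve robots.txt so crawlers can read it.
                self.send_response(HTTPStatus.OK)
                self.send_header("Content-Type", "text/plain")
                self.send_header("Content-Length", str(len(ROBOTS_TXT)))
                self.end_headers()
                self.wfile.write(ROBOTS_TXT)
            else:
                # The main page and everything else get a real error status.
                self.send_error(block_status)

        def log_message(self, format, *args):
            pass  # silence per-request logging for the demo

    return ClosedSiteHandler


def status_of(url: str) -> int:
    # ProxyHandler({}) ignores any proxy settings from the environment.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(url) as resp:
            return resp.status
    except urllib.error.HTTPError as err:  # raised for 4xx/5xx responses
        return err.code


def check(block_status: HTTPStatus) -> dict:
    server = HTTPServer(("127.0.0.1", 0), make_handler(block_status))
    threading.Thread(target=server.serve_forever, daemon=True).start()
    base = f"http://127.0.0.1:{server.server_port}"
    try:
        return {p: status_of(base + p) for p in ("/", "/index.html", "/robots.txt")}
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    for code in (HTTPStatus.FORBIDDEN, HTTPStatus.NOT_FOUND, HTTPStatus.GONE):
        print(int(code), check(code))
```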

  #9 | smokiedabear | Junior Member | Join Date: Aug 2014 | Posts: 12 | Member #: 39949
    I am also considering a 410 error. Does anyone know, or have, some HTML code I could cut and paste into the index.html so that it returns a 410?

  #10 | smokiedabear | Junior Member | Join Date: Aug 2014 | Posts: 12 | Member #: 39949
    This is what I have in the code so far. What do I have to add to break the site? BTW, I am editing the index.html file itself instead of using robots.txt, because Angelfire told me it's not supported.

    <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
    <html xmlns="http://www.w3.org/1999/xhtml">
    <head>
    <!-- Ask search engines not to index this page -->
    <meta name="robots" content="noindex" />
    <title>404 Page Not Found</title>
    </head>

    <body>

    <!-- This text alone does not change the HTTP status; the server still sends 200 OK -->
    <h2>We're sorry but the page you're looking for could not be found</h2>

    </body>
    </html>




    Thanks again, Derrick!



All times are GMT -6.