  1. #1
    Junior Member
    Join Date: Apr 2005 | Posts: 3 | Member #: 9706
    I have a similar problem to the person who started this thread:
    http://www.webdesignforums.net/php_a...age_19048.html

    I would like to extract information from a webpage, but the URL is not known in advance. Instead, I use Google's 'I'm Feeling Lucky' feature to produce a page for me, and that is the page I would like to extract from.

    For example, if I wanted to extract information from the page produced using:
    http://www.google.co.uk/search?hl=en...UK%7CcountryGB
    how can I set the $url variable in my program to the page produced by the above address? I've tried setting it directly (i.e. $url = "http://www.google.co.uk/search?hl=en&q=web design forums&btnI=I%27m+Feeling+Lucky&meta=cr%3DcountryUK%7CcountryGB"), but this doesn't work; I get errors.
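The $url above contains literal spaces ("web design forums"), which are not valid in a URL and are a likely cause of the errors. A minimal sketch, assuming the address is built in PHP: percent-encode each parameter with http_build_query() instead of writing the query string by hand. build_lucky_url() is a hypothetical helper; the parameter names and values are copied from the URL in the post, and whether Google still honours the btnI parameter is not guaranteed.

```php
<?php
// Hypothetical helper: build the 'I'm Feeling Lucky' URL from its parts.
// Parameter names/values are taken from the URL in the post; Google may
// no longer behave the same way.
function build_lucky_url(string $query): string
{
    $params = [
        'hl'   => 'en',
        'q'    => $query,
        'btnI' => "I'm Feeling Lucky",
        'meta' => 'cr=countryUK|countryGB',
    ];
    // http_build_query() encodes spaces as '+' and other reserved
    // characters as %XX, e.g. "'" -> %27, '=' -> %3D, '|' -> %7C.
    return 'http://www.google.co.uk/search?' . http_build_query($params);
}

$url = build_lucky_url('web design forums');
// $url is now:
// http://www.google.co.uk/search?hl=en&q=web+design+forums&btnI=I%27m+Feeling+Lucky&meta=cr%3DcountryUK%7CcountryGB
```

Fetching the result with file_get_contents($url) additionally requires allow_url_fopen to be enabled on the server.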

    Any help would be GREATLY appreciated.

    Thanks.

    Tom

  3. #2
    Senior Member (rosland)
    Join Date: Jul 2003 | Location: Norway | Posts: 1,944 | Member #: 2096
    I'm not sure I understand.

    What kind of info are you looking to extract?
    A reg-ex has to be tailor-made to the page/document you're working with, which means you have to know the structure of the page when designing your reg-ex.

    If you try to lift info from randomly generated pages, that's close to impossible.
    Of course you can do simple stuff like lifting the page title by extracting info enclosed by <title></title> or similar, but more complex operations require knowledge of the page structure.
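For the simple case mentioned here, lifting the page title, a minimal sketch with preg_match() might look like this. extract_title() is a hypothetical helper name; it assumes a single, ordinary <title> element and is not a general HTML parser.

```php
<?php
// Hypothetical helper: return the text inside the first <title>...</title>
// pair, or null if the page has none.
function extract_title(string $html): ?string
{
    // i: match <TITLE> as well as <title>; s: let '.' cross newlines;
    // .*? is non-greedy, so the match stops at the first </title>.
    if (preg_match('/<title[^>]*>(.*?)<\/title>/is', $html, $m)) {
        // Turn entities such as &amp; back into characters and trim spaces.
        return trim(html_entity_decode($m[1], ENT_QUOTES, 'UTF-8'));
    }
    return null;
}
```

Anything beyond a single well-known tag like this needs a pattern written against the actual page layout.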
    S. Rosland

  4. #3
    Junior Member
    Join Date: Apr 2005 | Posts: 3 | Member #: 9706

    Sorry, I should have mentioned that the pages I want to extract from are all on the same site: I add 'site:www.examplesite.com' to the Google search string, so the generated pages have much the same structure. The exact page varies, though, depending on the user.

    I have three frames on my site: the first holds links, the second is the main frame and the third is an invisible frame. One link opens a PHP file in the invisible frame. This file sets its header to redirect to a Google 'I'm Feeling Lucky' search (e.g. http://www.google.co.uk/search?hl=en...K%7CcountryGB), so the page returned by that search is now open in the invisible frame. I opened it there so I could extract some text from it to display in the main frame, but I don't know how to get the URL of the invisible frame's page.

    For example, the scraper PHP file (http://www.webdesignforums.net/showthread.php?t=10555) opens in the mainFrame, and I want its $url variable to refer to the page open in the invisible frame, but I don't know how.
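A PHP script only sees the request made to it, and the browser's same-origin policy stops script on your own page from reading the address of a frame that shows another site's page, so the invisible frame's URL is hard to hand to the scraper. One alternative, sketched here under the assumption that Google answers the Lucky request with an HTTP redirect (not guaranteed), is for the scraper to request the Lucky URL itself and take the target from the Location response header. final_location() is a hypothetical helper, not a PHP built-in. By default PHP's HTTP wrapper follows redirects, and the header list it returns contains each response's lines in turn, so the last Location line is the final target.

```php
<?php
// Hypothetical helper: given raw response header lines (e.g. from
// get_headers($url) or $http_response_header after file_get_contents),
// return the value of the last Location header, or null if none.
function final_location(array $headerLines): ?string
{
    $location = null;
    foreach ($headerLines as $line) {
        // Header names are case-insensitive.
        if (stripos($line, 'location:') === 0) {
            // Note: the value may be a relative path rather than a full URL.
            $location = trim(substr($line, strlen('location:')));
        }
    }
    return $location;
}

// Hypothetical usage in the scraper (needs allow_url_fopen and network access):
// $headers = get_headers($luckyUrl);
// $url = ($headers === false) ? null : final_location($headers);
```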

    Any help is (again) greatly appreciated.

    Thanks.

    Tom

  5. #4
    Senior Member (rosland)
    Join Date: Jul 2003 | Location: Norway | Posts: 1,944 | Member #: 2096
    If you add the following script to a page, it will output that page's complete URL.
    PHP Code:
    <?php
    // URL = www.someplace.net/sketches/artwork/file.html
    $_SERVER['SERVER_NAME']; // returns: www.someplace.net
    $_SERVER['PHP_SELF'];    // returns: /sketches/artwork/file.html
    // Example: join the pieces into the full address of this script
    $a = "http://" . $_SERVER['SERVER_NAME'] . $_SERVER['PHP_SELF'];
    echo $a; // outputs the complete address
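The snippet above rebuilds the address from $_SERVER, which describes the request for the script that is running, so it reports the URL of that PHP file itself, not of a page open in another frame. A slightly fuller sketch that also keeps https and any query string might look like this; current_url() is a hypothetical helper that takes the server array as a parameter so it can be tried with made-up values.

```php
<?php
// Hypothetical helper: rebuild the address of the running script.
// Normally called as current_url($_SERVER).
function current_url(array $server): string
{
    // HTTPS is set to a non-empty value for HTTPS requests
    // (some IIS setups set it to 'off' instead of leaving it unset).
    $https  = !empty($server['HTTPS']) && strtolower($server['HTTPS']) !== 'off';
    $scheme = $https ? 'https' : 'http';
    // REQUEST_URI is the path plus query string (e.g. /file.php?id=3);
    // fall back to PHP_SELF (path only) if it is missing.
    $path = $server['REQUEST_URI'] ?? $server['PHP_SELF'] ?? '/';
    return $scheme . '://' . $server['SERVER_NAME'] . $path;
}

// Usage: echo current_url($_SERVER);
```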
    S. Rosland

  6. #5
    Junior Member
    Join Date: Apr 2005 | Posts: 3 | Member #: 9706
    Thank you very much. I'll try that out.


Powered by vBulletin® Version 4.2.3
Copyright © 2019 vBulletin Solutions, Inc. All rights reserved.
vBulletin Skin By: PurevB.com