Results 1 to 9 of 9
  1. #1
    Senior Member leprechaun13's Avatar
    Join Date
    May 2005
    Location
    Northampton
    Posts
    487
    Member #
    10058
    I'm trying to write a search engine and search bot using the Snoopy PHP class.

    I can get the source code of the web page, but I can't figure out how I would extract the meta data.

    Any help is appreciated.
    Regards, Phil


  2. #2
    Senior Member
    Join Date
    May 2003
    Location
    UK
    Posts
    2,354
    Member #
    1326
    It depends on how you "get the source code".

    If you're using file(), you could read the page into an array of lines and work from the line numbers where the meta data appears.
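
    As a rough sketch of that idea (the URL is only a placeholder, and it assumes allow_url_fopen is enabled and that each meta tag sits on its own line, which real pages don't guarantee):

        <?php
        // Sketch only: read the remote page line by line and keep the lines containing <meta> tags.
        $lines = file('http://www.example.com/');   // placeholder URL

        $metaLines = array();
        foreach ($lines as $number => $line) {
            if (stripos($line, '<meta') !== false) {
                $metaLines[$number] = trim($line);
            }
        }

        print_r($metaLines);
        ?>

    PHP also has a built-in get_meta_tags() function that reads a URL and returns the name/content pairs directly, which may save you the manual scan.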

  3. #3
    Senior Member Steax's Avatar
    Join Date
    Dec 2006
    Location
    Bandung, Indonesia
    Posts
    1,207
    Member #
    14572
    Or use a regular expression, which is probably the more precise approach.

    On the other hand, why are you making this?
    Note on code: if I give code, it is simply sample code to demonstrate an effect. It is not meant to be used as-is; adapting it is the programmer's job. I am not obliged to provide support, nor am I liable for anything that happens when you use my code.
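
    To make the regular-expression route concrete, here is a rough sketch with preg_match_all() (the pattern is deliberately naive and will miss meta tags whose attributes are ordered or quoted differently; the URL is a placeholder):

        <?php
        // Sketch only: pull name/content pairs out of simple <meta name="..." content="..."> tags.
        $html = file_get_contents('http://www.example.com/');   // placeholder URL

        preg_match_all(
            '/<meta\s+name=["\']([^"\']+)["\']\s+content=["\']([^"\']*)["\']/i',
            $html,
            $matches
        );

        $meta = array();
        if (!empty($matches[1])) {
            $meta = array_combine($matches[1], $matches[2]);
        }
        print_r($meta);
        ?>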

  4. #4
    Senior Member leprechaun13's Avatar
    Join Date
    May 2005
    Location
    Northampton
    Posts
    487
    Member #
    10058
    Firstly, I'm using the fetch command in Snoopy to get the source code, then htmlentities() to print the source here.

    And Steax, I'm not really sure why I'm writing a search engine; I just felt the urge to. I had the idea of using ereg(), but I've never used it before.
    Regards, Phil
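
    For reference, the Snoopy usage described above is roughly this (sketch only; the include path and URL are placeholders, and the property names are from memory of Snoopy's API):

        <?php
        // Sketch only: fetch a page with Snoopy and dump its source, HTML-escaped.
        include 'Snoopy.class.php';   // adjust the path to wherever Snoopy lives

        $snoopy = new Snoopy();

        if ($snoopy->fetch('http://www.example.com/')) {   // placeholder URL
            $source = $snoopy->results;                    // raw page source
            echo '<pre>' . htmlentities($source) . '</pre>';
        } else {
            echo 'Fetch failed: ' . $snoopy->error;
        }
        ?>

    On the regular-expression side, the preg_* (PCRE) functions are generally a better choice than ereg(); they are faster and far more widely used.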


  5. #5
    Senior Member Steax's Avatar
    Join Date
    Dec 2006
    Location
    Bandung, Indonesia
    Posts
    1,207
    Member #
    14572
    The thing with these devils is that they consume a lot of resources, and the coding is far more extensive than you might imagine. Not to mention the extra work of honouring robots.txt files...

    You will probably need regular expressions to find and follow links all over the web. I highly suggest you learn them.
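
    Honouring robots.txt can start out very small. A naive sketch (it only reads the blanket User-agent: * block and ignores wildcards, Allow lines, and crawl delays; the host and path are placeholders):

        <?php
        // Sketch only: check a path against the "User-agent: *" Disallow rules in a site's robots.txt.
        function isAllowedByRobots($host, $path)
        {
            $robots = @file_get_contents("http://$host/robots.txt");
            if ($robots === false) {
                return true;   // no robots.txt reachable, assume crawling is allowed
            }

            $appliesToUs = false;
            foreach (explode("\n", $robots) as $line) {
                $line = trim(preg_replace('/#.*/', '', $line));   // strip comments
                if (stripos($line, 'User-agent:') === 0) {
                    $appliesToUs = (trim(substr($line, 11)) === '*');
                } elseif ($appliesToUs && stripos($line, 'Disallow:') === 0) {
                    $rule = trim(substr($line, 9));
                    if ($rule !== '' && strpos($path, $rule) === 0) {
                        return false;   // the path falls under a Disallow rule
                    }
                }
            }
            return true;
        }

        var_dump(isAllowedByRobots('www.example.com', '/private/page.html'));
        ?>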

  6. #6
    Senior Member
    Join Date
    May 2003
    Location
    UK
    Posts
    2,354
    Member #
    1326
    I wrote a search engine a while back and, like Steax says, from the outset it sounds fairly simple.

    I wish(ed).

  7. #7
    Senior Member Steax's Avatar
    Join Date
    Dec 2006
    Location
    Bandung, Indonesia
    Posts
    1,207
    Member #
    14572
    A more realistic option would be just to set up a directory, which is actually a lot more efficient than building a search engine. Search engines are easily fooled, since they go out and gather everything themselves. In a directory, the technology only needs to make sure a page exists; people can then judge each entry directly and rate them as necessary.

    Besides, maintaining a directory gives better results - search engines work by brute force, picking up every page they see, both the good and the bad. Various engines have found ways to determine the good ones, but directories always have good ones, because the bad ones are found by hand and kicked out. Google would need an army if they wanted to moderate the pages they list.

    In fact, Google does have an army doing this.
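
    The "make sure the page exists" part of a directory can be as simple as checking the HTTP status line, for example with get_headers() (sketch only; placeholder URL):

        <?php
        // Sketch only: treat a directory entry as alive if the server answers with a 2xx or 3xx status.
        function pageExists($url)
        {
            $headers = @get_headers($url);
            if ($headers === false) {
                return false;   // DNS failure, timeout, etc.
            }
            // $headers[0] looks like "HTTP/1.1 200 OK"
            return (bool) preg_match('#^HTTP/\S+\s+[23]\d\d#', $headers[0]);
        }

        var_dump(pageExists('http://www.example.com/'));   // placeholder URL
        ?>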

  8. #8
    Senior Member
    Join Date
    Jun 2005
    Location
    Atlanta, GA
    Posts
    4,146
    Member #
    10263
    Liked
    1 times
    A better solution than straight regular expressions is writing (or using) a small framework that allows you to follow links and `click' buttons to submit forms and such. See the likes of WWW::Mechanize in the Perl world and scRUBYt! in the Ruby world. At their base, these work off of simple HTML parsers (see Hpricot). Of course, the words `simple' and `HTML parser' never really belong in the same sentence together... :-P But I'm sure someone's written an HTML parser for PHP somewhere, and you could use that to extract relevant elements and such.
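
    PHP does ship with one candidate: the DOM extension's DOMDocument can load even messy real-world HTML and hand back elements, so it works for pulling out meta tags and links (sketch only; placeholder URL):

        <?php
        // Sketch only: parse a page with DOMDocument and list its <meta> tags and links.
        $html = file_get_contents('http://www.example.com/');   // placeholder URL

        $doc = new DOMDocument();
        @$doc->loadHTML($html);   // @ silences warnings about sloppy markup

        // Meta tags: name => content
        foreach ($doc->getElementsByTagName('meta') as $meta) {
            echo $meta->getAttribute('name'), ' => ', $meta->getAttribute('content'), "\n";
        }

        // Links: every href on the page
        foreach ($doc->getElementsByTagName('a') as $a) {
            echo $a->getAttribute('href'), "\n";
        }
        ?>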

  9. #9
    Senior Member leprechaun13's Avatar
    Join Date
    May 2005
    Location
    Northampton
    Posts
    487
    Member #
    10058
    lol... I think I've got a PHP HTML parser already, I just googled one. Another point, Steax: Snoopy has a function to get all the links in the page and put them in an array.
    Regards, Phil
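
    If memory serves, that function is fetchlinks(), which leaves an array of absolute URLs in $snoopy->results; a rough sketch (placeholder URL and include path):

        <?php
        // Sketch only: grab every link on a page via Snoopy's fetchlinks().
        include 'Snoopy.class.php';   // adjust the path as needed

        $snoopy = new Snoopy();

        if ($snoopy->fetchlinks('http://www.example.com/')) {   // placeholder URL
            print_r($snoopy->results);
        } else {
            echo 'Fetch failed: ' . $snoopy->error;
        }
        ?>

    From there the crawler loop is essentially: pop a URL off a queue, check robots.txt, fetch the page, store its meta data, and push the page's links back onto the queue.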


