  1. #1
    Member zingmatter's Avatar
    Join Date
    Jun 2005
    Location
    Glasgow
    Posts
    84
    Member #
    10174
    Hi

    I have this code that is supposed to search a string (submitted from a form) and rewrite the URL in every link it finds. For example,
    <a href="home">Go to Home page</a>
    should be converted to
    <a href="index.php?pageid=1">Go to Home page</a>.

    Here's the code:
    PHP Code:
    function setInternalLink($txt) {
        $pttrn = 'href="[a-z]*">';

        //get the value of the match and extract the pagename
        if (eregi($pttrn, $txt, $regs)) {
            foreach ($regs as $i => $value) {
                $startmc = $value;
                $mc = $value;

                //now replace the pagename with index.asp?id=...
                $mc = str_replace('href="', '', $mc);
                $mc = str_replace('">', '', $mc);
                //call function to convert the page name to a pageid
                $pid = getPageId($mc);
                $txt = str_replace($startmc, 'href="index.php?id=' . $pid . '">', $txt);
            }
        }

        return $txt;
    }


    If the string being passed to the function has more than one url to parse then only the first url gets done. For some reason the array $regs only ever holds one value, when it's supposed to be storing each match in an array element.

    Can anyone see if I'm missing something really obvious, or is there something wrong with the syntax? I'm stumped.

    Thanks

  2.  

  3. #2
    Senior Member rosland's Avatar
    Join Date
    Jul 2003
    Location
    Norway
    Posts
    1,944
    Member #
    2096
    I may be wrong, but I think 'eregi' only stores the first match in a string, at least when I try to replicate your problem. (It might have stored more matches in earlier versions)
    PHP Code:
    $string = "this is a test is of is multiple is replacement";
    $needle = "is";

    eregi($needle, $string, $result);

    print_r($result); //only contains one array element contrary to preg_match_all
    If you use 'preg_match' you run into the same problem (although preg_match is faster than eregi), however, you can use 'preg_match_all'.
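
    To show the difference, here is a minimal sketch of the same test using preg_match_all. The variable names are carried over from the snippet above, and the pattern is wrapped in delimiters because the preg functions require them. With the default flags, $result[0] should hold every match of the whole pattern.
    PHP Code:
    $string = "this is a test is of is multiple is replacement";
    $needle = "/is/i"; // delimiters are required; the i flag mirrors eregi's case-insensitivity

    // preg_match_all returns the number of full-pattern matches it found
    $count = preg_match_all($needle, $string, $result);

    print_r($result[0]); // every match of the pattern, not only the first
    echo "Matches found: $count";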

    You can also replace all at once with preg_replace:
    PHP Code:
    $str = 'This is a link <a href="http://site.com">Click here</a> now. 
    Another link<a href="http://www.yahoo.com">here</a>. 
    And a last one<a href="http://google.com">here</a>.';

    print "Original string: $str.<p>";

    $pattern = '/(<a href=")(.*)(">)/';
    $replace = "$1http://index.php?pid=$3";

    $txt = preg_replace($pattern, $replace, $str, -1, $count);  //the ", -1, $count" are optional parameters

    echo "Altered string: $txt<br>(Number of replacements: $count)";
    I'm not sure how you plan to insert page IDs or organize your pages, as your function getPageId above is undefined, and I haven't given that part much thought.
    S. Rosland

  4. #3
    Senior Member
    Join Date
    Jun 2005
    Location
    Atlanta, GA
    Posts
    4,146
    Member #
    10263
    Liked
    1 times
    Well, if you only match once, of course you'll get only one match ;-)

    Since you don't have any capturing groups, all you'll have returned is the overall match, and there's only one of those. rosland's will run into a similar issue, but it can probably be fixed something like this:
    PHP Code:
    $pattern = '/(?:<a href="([^"]*)">)+/';
    That should match that whole <a href block one or more times, and within each of those matches it'll look for the href text. The (?: ... ) is a non-capturing group (I hope -- I'm fairly certain that's the appropriate syntax), so the only captures you get should be those on the href text.

  5. #4
    Member zingmatter's Avatar
    Join Date
    Jun 2005
    Location
    Glasgow
    Posts
    84
    Member #
    10174
    Thanks for that.

    The urls I'm trying to catch will be something like this

    <a href="asingleword">link text</a>

    What should happen is "asingleword" is found, sent to a function which searches the database and finds an id number and returns it. The <a href="asingleword"> then gets replaced by <a href="index.php?id=[the return id number]">. (full urls like http://www.google.com get omitted). I have a reverse function that converts an <a href="index.php?id=[a number]"> to <a href="[the page name as a single word]">. This too only seems to get the first instance.

    My understanding from the php manual at www.php.net was that all matches by ereg() or eregi() get put into an array, but as you say, this doesn't appear to happen. Indeed, doing count($regs) in the code example only returns 1.

    This works a treat in ASP using its regex model, and I was sure PHP had an equivalent.

  6. #5
    Senior Member rosland's Avatar
    Join Date
    Jul 2003
    Location
    Norway
    Posts
    1,944
    Member #
    2096
    Quote Originally Posted by Shadowfiend
    Well, if you only match once, of course you'll get only one match ;-)

    Since you don't have any capturing groups, all you'll have returned is the overall match, and there's only one of those. rosland's will run into a similar issue, but it can probably be fixed something like this:
    Actually, the example I provided earlier works fine (i.e. it replaces all occurrences of the pattern in the string).

    With regard to multiple matches, I guess regexes behave slightly differently between languages.
    In PHP, I believe it's the function type itself (eregi, preg_match, etc) that decides how many times a certain pattern is to be recognized in a string.
    In the expression itself, you can add various quantifiers to specify how many matches you want of a more generalized pattern (\w, \w+, \w*, \w?, \w{n}, etc). Groups are normally used for backreference manipulation.

    ereg() and eregi() will only store the first occurrence of a match in the array (as far as I can see). preg_match() does the same, but is (according to the manual) faster than eregi(), even though they both incur the overhead of the regular expression engine (unlike plain string functions such as stristr()).

    preg_match_all() stores every occurrence of the pattern.

    preg_replace() alters all occurrences of the pattern.

    (I tried your pattern example, Shadowfiend, and it too only returned one match with preg_replace(); I think the pattern would have to be rewritten to work with eregi.)
    Quote Originally Posted by zingmatter
    My understanding from the php manual at www.php.net was that all matches by ereg() or eregi() get put into an array, but as you say, this doesn't appear to happen. Indeed, doing count($regs) in the code example only returns 1.

    This works a treat in ASP using its regex model, and I was sure PHP had an equivalent.
    If you need to store the original content in <a href="*****"> for later DB interaction, I guess preg_match_all() will do the job for you.
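
    For the replacement itself, one possible approach is preg_replace_callback, which calls a function once per match so each page name can be looked up separately. This is only a rough sketch: the getPageId below is a stand-in with a hard-coded array (your real database lookup isn't shown in this thread), the sample page names are made up, and the anonymous function assumes PHP 5.3 or later.
    PHP Code:
    // Hypothetical stand-in for the real database lookup, which is not shown here.
    function getPageId($name) {
        $pages = array('home' => 1, 'about' => 2); // assumed sample data
        return isset($pages[$name]) ? $pages[$name] : 0; // 0 for unknown names
    }

    function setInternalLink($txt) {
        // [a-z]+ only matches single-word names, so full URLs such as
        // http://www.google.com do not match and are left untouched.
        return preg_replace_callback(
            '/href="([a-z]+)">/i',
            function ($m) {
                // $m[1] is the captured page name
                return 'href="index.php?id=' . getPageId($m[1]) . '">';
            },
            $txt
        );
    }

    echo setInternalLink('<a href="home">Home</a> and <a href="about">About</a>');

    Because the callback runs for every match, all links in the string get converted in one pass, and the reverse function could probably be written the same way with a pattern that captures the id number.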
    S. Rosland

  7. #6
    Senior Member
    Join Date
    Jun 2005
    Location
    Atlanta, GA
    Posts
    4,146
    Member #
    10263
    Liked
    1 times
    Oh, haha. You used preg_replace :-) I thought you were using another matcher. Mine actually would have run into another issue -- namely that it didn't take the intervening text into account (probably needed a .* on the end inside the non-capturing group, though that relies on non-greedy matching).

    Anyway, matching only one at a time is fairly standard behavior for regular expression engines. Typically they'll let you pass in an offset to match from, so you can move through it that way, or you can structure your regular expression to match multiple times yourself (which is what I was trying to do).
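
    In PHP the offset approach might look something like the sketch below. It assumes preg_match's optional flags and offset arguments, and the sample string and variable names are made up for illustration.
    PHP Code:
    $txt = '<a href="home">Home</a> <a href="about">About</a>';
    $pattern = '/href="([a-z]+)">/i';
    $offset = 0;
    $names = array();

    // Each pass finds the next match starting at or after $offset.
    while (preg_match($pattern, $txt, $m, PREG_OFFSET_CAPTURE, $offset)) {
        // With PREG_OFFSET_CAPTURE each entry is array(matched text, byte position).
        $names[] = $m[1][0];
        // Resume the search just past the end of this match.
        $offset = $m[0][1] + strlen($m[0][0]);
    }

    print_r($names);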

