  1. #1
    rosland (Senior Member)
    Join Date
    Jul 2003
    Location
    Norway
    Posts
    1,944
    Member #
    2096
    My reg-ex skills are a bit rusty.

    I have a large text file in which each line holds structured information. I need to extract segments of each line and insert them into a database table.

    The problem is best illustrated with an example.
    Below is an excerpt of some of the information from a couple of lines:
    (each line holds a lot more information than what is shown here)
    Code:
    18 NILSEN SVEIN BOGEN LINECHEC OSL 
    11 INGVALDSEN-TORP ODD Arne FC OSL 
    71 MATHIASSEN PER FLIGHT/SI SVG
    9 SØRENSEN KÅRE CAPTAIN TRD 
    10 NILSEN TIMMY BREEN VP OSL
    I'm working my way through the different elements. Extracting the leading numbers was obviously easy. Next came the last names, which were fairly easy as well (they vary in length, and some are double names separated by a hyphen).
    Extracting the first names was no big deal either, but then:

    The next piece of info describes their professional function within the company. The easiest way to isolate that word (which varies in length and in the characters it contains) is to look at what the lines have in common.
    The next 'word' after the descriptive word will always be one of these three:
    OSL, SVG or TRD.

    The expression I made looks like this:
    /([a-z|\/|\-]+)\sosl|svg|trd/i

    The mystery is that this extracts the info correctly from every line except this one:
    71 MATHIASSEN PER FLIGHT/SI SVG
    Instead of returning FLIGHT/SI as it should, it returns SVG.
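
    A minimal Python sketch of why the first expression behaves this way, assuming Python's `re` module treats this pattern like the PCRE-style engine used in the thread. The helper name `first_attempt` is made up for illustration. Alternation (`|`) binds more loosely than everything else, so the pattern reads as three separate alternatives: `([a-z|\/|\-]+)\sosl`, or a bare `svg`, or a bare `trd`. Only the first alternative contains the capture group.

```python
import re

# The first pattern from the post, copied verbatim (only the /.../i wrapper is
# replaced by re.IGNORECASE). Top-level "|" splits it into three alternatives.
FIRST = re.compile(r"([a-z|\/|\-]+)\sosl|svg|trd", re.IGNORECASE)


def first_attempt(line):
    """Return (whole match, capture group 1), or None if nothing matches."""
    m = FIRST.search(line)
    return (m.group(0), m.group(1)) if m else None


# OSL line: the first alternative matches, so group 1 holds the job title.
print(first_attempt("18 NILSEN SVEIN BOGEN LINECHEC OSL"))

# SVG line: only the bare "svg" alternative can match, so the whole match is
# "SVG" and the capture group never takes part in it.
print(first_attempt("71 MATHIASSEN PER FLIGHT/SI SVG"))
```

    In this sketch a line ending in TRD behaves like the SVG line, because `trd` is also a bare alternative with no capture group in front of it.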

    However, if I make the following minute alteration to the reg-ex:
    /([a-z|\/|\.|\-]+)\s[osl|svg|trd]/i
    It returns FLIGHT/SI as it should.

    The problem is that if I run that expression on the other lines, I get weird results!
    One example:
    18 NILSEN SVEIN BOGEN LINECHEC OSL
    Returns NILSEN instead of LINECHEC, which the first expression returned correctly (?)
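
    A short Python sketch of what the square brackets do here, again assuming Python's `re` module matches the thread's engine on this point (`second_attempt` is a hypothetical helper name). `[osl|svg|trd]` is a character class: it matches exactly one character out of o, s, l, v, g, t, r, d or a literal `|`. The pattern therefore stops at the first word that is followed by a space and any one of those letters.

```python
import re

# The second pattern from the post, copied verbatim. The trailing [...] is a
# character class, so it consumes a single character, not a three-letter code.
SECOND = re.compile(r"([a-z|\/|\.|\-]+)\s[osl|svg|trd]", re.IGNORECASE)


def second_attempt(line):
    """Return capture group 1 of the first match, or None if nothing matches."""
    m = SECOND.search(line)
    return m.group(1) if m else None


# "NILSEN" is followed by a space and "S" (from SVEIN), and "s" is in the
# class, so the search succeeds far too early.
print(second_attempt("18 NILSEN SVEIN BOGEN LINECHEC OSL"))

# Here no earlier word is followed by one of the class letters, so the match
# happens to land on the right word.
print(second_attempt("71 MATHIASSEN PER FLIGHT/SI SVG"))
```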

    Obviously I need one expression that will extract the correct information from all lines, otherwise the task at hand will be impossible (thousands of lines).

    Any ideas or explanations?

    EDIT:
    Fixed it.
    /([a-z|\/|\.|\-]+)\s(osl|svg|trd)/i
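
    As a rough check, the fixed expression can be run against all five sample lines. This is a sketch in Python rather than the language used in the thread (which is not stated), and `extract_function` is a made-up helper name. Wrapping the alternatives in parentheses limits the `|` to the three base codes, so the capture group in front always takes part in the match.

```python
import re

# The fixed pattern from the EDIT, copied verbatim. Group 1 is the job title,
# group 2 is the base code that follows it.
FIXED = re.compile(r"([a-z|\/|\.|\-]+)\s(osl|svg|trd)", re.IGNORECASE)

SAMPLE_LINES = [
    "18 NILSEN SVEIN BOGEN LINECHEC OSL ",
    "11 INGVALDSEN-TORP ODD Arne FC OSL ",
    "71 MATHIASSEN PER FLIGHT/SI SVG",
    "9 SØRENSEN KÅRE CAPTAIN TRD ",
    "10 NILSEN TIMMY BREEN VP OSL",
]


def extract_function(line):
    """Return (job title, base code) for one line, or None if no match."""
    m = FIXED.search(line)
    return (m.group(1), m.group(2)) if m else None


for line in SAMPLE_LINES:
    print(extract_function(line))
```

    Note that the `|` characters inside the square brackets are literal pipes, not alternation, so they could be dropped from the class without changing the result on these lines.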
    S. Rosland

  3. #2
    Senior Member
    Join Date
    Aug 2003
    Posts
    444
    Member #
    2801
    Hi Rosland

    Try this regex:

    Code:
    /([a-z|\/|\-]+)\s(osl|svg|trd)/i
    I just played with the Regular Expression Online Tester till it worked
    eKstreme
    eKstreme.com - Free website tools!
    fontfox - free fonts Hand-picked quality fonts.

  4. #3
    Senior Member
    Join Date
    Aug 2003
    Posts
    444
    Member #
    2801
    How did you fix it?

    And why did you do it exactly when I posted a fix? Grrrr
    eKstreme

  5. #4
    rosland (Senior Member)
    Join Date
    Jul 2003
    Location
    Norway
    Posts
    1,944
    Member #
    2096
    lol,
    I continued to play around with it until it matched (I actually finished the rest of the string matching as well, which is four times longer than the example string posted above).

    I re-edited my post back to reflect the initial problem.
    Like I said, my reg-ex is a bit rusty, so I put square brackets around the OSL|SVG|.. bit. That turns it into a character class, which matches only a single character of a potential match. :cross-eyed:
    I tried a 'look behind' as well, but that failed because a look-behind needs a fixed number of characters to match. (You can list alternatives of different fixed lengths, but since the length varies a lot throughout my file, that was not feasible.)
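
    One possible alternative to the look-behind, sketched in Python and not taken from the thread: a look-ahead has no fixed-length restriction on the text in front of it, so the job title can be matched directly and the base code only checked, not consumed. The helper name `title_before_base`, the trimmed character class and the trailing `\b` are assumptions made for this illustration.

```python
import re

# Match a run of letters, "/", "." or "-" that is immediately followed by
# whitespace and one of the three base codes. (?=...) is a look-ahead: it
# checks what comes next without making it part of the match.
LOOKAHEAD = re.compile(r"[a-z/.\-]+(?=\s(?:osl|svg|trd)\b)", re.IGNORECASE)


def title_before_base(line):
    """Return the word directly before OSL, SVG or TRD, or None."""
    m = LOOKAHEAD.search(line)
    return m.group(0) if m else None


print(title_before_base("71 MATHIASSEN PER FLIGHT/SI SVG"))
print(title_before_base("18 NILSEN SVEIN BOGEN LINECHEC OSL "))
```

    The `\b` is there so that a word merely starting with one of the codes is not mistaken for the code itself; whether that matters depends on the data in the real file.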

    My solution, which I posted in the 'EDIT' section of my post, is pretty much the same as what you came up with.
    (I only added an escaped period (.) symbol, as I encounter that character further down in the list.)

    Nice link BTW. I'll bookmark that.
    S. Rosland

