Page 1 of 2
Results 1 to 10 of 13

Thread: RegEx ideas

  1. #1
    Senior Member rosland
    Join Date
    Jul 2003
    Location
    Norway
    Posts
    1,944
    Member #
    2096
    I'm a bit exhausted by this issue, so the creative/intuitive part of my mind has gone on a well-deserved leave.

    The problem at hand is a schedule. The source of the schedule only offers two options:
    -present it on-screen
    -print it out

    What I want to do is transform it into a CSV file to import into MS Outlook, and from there transfer it to my cellphone.

    -------------

    Progress so far:
    I've used "print to file" to save the printout as a text file. This file contains a lot of unwanted characters and a lot of unwanted info.

    Through some RegEx work, I've made the two attached files. The 'raw' one is stripped of unwanted characters; the other shows how far I've got in refining the file, and among other things it has the dates in the correct MS format.

    As you can see, I have multiple dates, each followed by the specifics for that day.

    This:
    332 OSL 0805 0900 TRD 347 TRD 0925 1020 OSL

    Means:
    Flight number: 332
    Departure OSL at time 0805, landing TRD at 0900
    Flight number: 347
    Departure TRD at time 0925, landing OSL at 1020
    etc...

    So, based on the 'ExampleRef.txt' file, can anyone come up with a RegEx that splits the information by date, and then makes sub-arrays holding all the relevant flight info as listed above?
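Here is a minimal PHP sketch of the per-line split. It assumes every leg has exactly the five whitespace-separated fields of the sample line above: flight number, origin, departure HHMM, arrival HHMM and destination. `splitLegs` is a hypothetical helper name. Calling `preg_match_all` with `PREG_SET_ORDER` returns one match set per leg, so there is no need to capture a repeated group inside a single match.

```php
<?php
// Hypothetical helper: split one day's detail string into flight legs.
// Assumes every leg looks like "332 OSL 0805 0900 TRD":
// flight number, origin, departure HHMM, arrival HHMM, destination.
function splitLegs(string $details): array
{
    $pattern = '/\b(\d+)\s+([A-Z]{3})\s+(\d{4})\s+(\d{4})\s+([A-Z]{3})\b/';
    preg_match_all($pattern, $details, $matches, PREG_SET_ORDER);

    $legs = [];
    foreach ($matches as $m) {
        // $m[0] is the whole leg; $m[1]..$m[5] are the captured fields.
        $legs[] = [
            'flight' => $m[1],
            'from'   => $m[2],
            'dep'    => $m[3],
            'arr'    => $m[4],
            'to'     => $m[5],
        ];
    }
    return $legs;
}

print_r(splitLegs('332 OSL 0805 0900 TRD 347 TRD 0925 1020 OSL'));
```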

    The point is to have a multidimensional array containing all details for individual days, as well as the same on a monthly scale.
    Example array (not taken from the attached TXT files, but it represents the idea):

    Code:
    Array
    (
        [05.23.2006] => Array
            (
                [0] => Array
                    (
                        [0] => OSL
                        [1] => 0800
                        [2] => 0900
                        [3] => TRD
                    )
    
                [1] => Array
                    (
                        [0] => TRD
                        [1] => 1030
                        [2] => 1130
                        [3] => OSL
                    )
    
            )
    
        [05.24.2006] => Array
            (
                [0] => Array
                    (
                        [0] => OSL
                        [1] => 1300
                        [2] => 1600
                        [3] => SPU
                    )
    
                [1] => Array
                    (
                        [0] => SPU
                        [1] => 1700
                        [2] => 2000
                        [3] => CPH
                    )
    
            )
    
    )
    If so, I can easily incorporate that into a function that writes a CSV file in a format Outlook understands.
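For the monthly view mentioned above, a date-keyed array like this one can be regrouped in a few lines. This is a minimal sketch that assumes the keys use the MM.DD.YYYY form shown in the example. `groupByMonth` is a hypothetical helper.

```php
<?php
// Hypothetical helper: regroup a date-keyed schedule by month.
// Assumes keys use the MM.DD.YYYY form shown above (e.g. "05.23.2006").
function groupByMonth(array $byDay): array
{
    $byMonth = [];
    foreach ($byDay as $date => $legs) {
        [$month, $day, $year] = explode('.', $date);
        // "05.23.2006" goes under month key "05.2006"; the day keeps its own key.
        $byMonth["$month.$year"][$date] = $legs;
    }
    return $byMonth;
}

$schedule = [
    '05.23.2006' => [['OSL', '0800', '0900', 'TRD'], ['TRD', '1030', '1130', 'OSL']],
    '05.24.2006' => [['OSL', '1300', '1600', 'SPU'], ['SPU', '1700', '2000', 'CPH']],
];
print_r(groupByMonth($schedule));
```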
    S. Rosland


  3. #2
    Senior Member
    Join Date
    Jun 2005
    Location
    Atlanta, GA
    Posts
    4,146
    Member #
    10263
    Liked
    1 times
    So, why do you need a regular expression for this? I guess I'm confused as to exactly what you're looking for. Can't you just extract the date via a regular expression like `([0-9]{1,2}\.[0-9]{1,2}\.[0-9]{4}) (.*)`, then take the second capturing group, explode it by spaces once, and take every 4 items as a separate index? Or do you want a single regular expression that will produce that complete result?
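Here is a minimal PHP sketch of that approach. It assumes one day per line, with legs of exactly four tokens (origin, departure, arrival, destination) as in the example array. The sketch uses `preg_split` on whitespace instead of a plain `explode`, so repeated spaces don't produce empty tokens, and `array_chunk` to take every four items. `parseDayLine` is a hypothetical name, and the sample line is built from the example array in the first post.

```php
<?php
// Sketch of "regex for the date, split for the rest".
// Assumes one day per line: a date followed by legs of exactly four
// tokens each (origin, departure HHMM, arrival HHMM, destination).
function parseDayLine(string $line): ?array
{
    if (!preg_match('/^([0-9]{1,2}\.[0-9]{1,2}\.[0-9]{4}) (.*)$/', trim($line), $m)) {
        return null; // not a dated line
    }
    // Split the remainder on runs of whitespace, then take four tokens per leg.
    $tokens = preg_split('/\s+/', trim($m[2]), -1, PREG_SPLIT_NO_EMPTY);
    return [$m[1] => array_chunk($tokens, 4)];
}

print_r(parseDayLine('05.23.2006 OSL 0800 0900 TRD TRD 1030 1130 OSL'));
```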

  4. #3
    Senior Member
    Join Date
    Jun 2005
    Location
    Atlanta, GA
    Posts
    4,146
    Member #
    10263
    Liked
    1 times
    For reference, it looks like doing it with a single regexp will be difficult if not impossible: in PHP's regexp implementation, if you add a quantifier to a capturing subpattern, you'll only actually capture the last occurrence of that subpattern :-/
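A small PHP sketch makes this behaviour easy to see. The three-letter sample string is made up for illustration. A quantified capturing group in `preg_match` reports only its final repetition, while `preg_match_all` collects every occurrence.

```php
<?php
$line = 'OSL TRD SVG';

// The group ([A-Z]{3}) repeats three times, but only the last repetition is kept.
preg_match('/^(?:([A-Z]{3}) ?)+$/', $line, $m);
echo $m[1], "\n"; // prints SVG

// preg_match_all restarts the match for each occurrence, so every code is captured.
preg_match_all('/[A-Z]{3}/', $line, $all);
echo implode(',', $all[0]), "\n"; // prints OSL,TRD,SVG
```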

  5. #4
    Senior Member rosland
    Join Date
    Jul 2003
    Location
    Norway
    Posts
    1,944
    Member #
    2096
    I was mentally exhausted and sort of 'target fixated' on the organized multidimensional array.
    There might be other workarounds.

    I'm open to any solution.

    There has to be some sort of pattern recognition though. If you think you can solve it without using regular expressions, then I'm all ears.

    If you look at the 'raw' file, you will see a lot of unwanted info that has to be removed.
    With regard to what comes after the date in the refined version, it can be a number of things.
    It can be a combination of letter blocks like LHR EWR MUC and times 0800 0915 1000, or it can be codes containing numbers that do not represent time, like OSL E45 CDG BX45-7 F4 F3 etc., or it can be a combination of all.

    I need the script to recognize numbers that represent time, three-letter groups as geographical locators, and anything else as info related to the above.
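The following PHP sketch shows one way to express those three rules. `classifyToken` is a hypothetical helper. It assumes a time is exactly four digits forming a valid 24-hour HHMM, and a location is exactly three capital letters. Both rules are heuristics: a four-digit code that happens to look like a valid time, or a three-letter code that isn't an airport, would be misclassified.

```php
<?php
// Hypothetical classifier for the tokens that follow a date.
// Assumptions: a time is four digits forming a valid 24-hour HHMM;
// a location is exactly three capital letters; everything else is "info".
function classifyToken(string $token): string
{
    if (preg_match('/^([01][0-9]|2[0-3])[0-5][0-9]$/', $token)) {
        return 'time';
    }
    if (preg_match('/^[A-Z]{3}$/', $token)) {
        return 'location';
    }
    return 'info';
}

foreach (explode(' ', 'OSL E45 0800 CDG BX45-7 F4 0915') as $t) {
    echo $t, ' => ', classifyToken($t), "\n";
}
```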

    The solution doesn't have to be one all-encompassing regex. It can be a series of them that, through ifs and loops, build the array or the CSV file (speed is not an issue at all).

    (If you have access to Outlook, you can export a couple of days that contain appointments/info to a CSV file and see how it's structured.)

    What I want the final script to do is read the dumped schedule txt file and spit out the completed CSV in one operation.
    S. Rosland

  6. #5
    Senior Member
    Join Date
    Jun 2005
    Location
    Atlanta, GA
    Posts
    4,146
    Member #
    10263
    Liked
    1 times
    So the `raw' file is one that's already undergone some parsing, right? Or is it the one that came directly out of Outlook? I just tried to export, but the export to CSV isn't installed on my version of Outlook, and I don't have the CD handy :-/

  7. #6
    Senior Member rosland
    Join Date
    Jul 2003
    Location
    Norway
    Posts
    1,944
    Member #
    2096
    Quote Originally Posted by Shadowfiend
    So the `raw' file is one that's already undergone some parsing, right? Or is it the one that came directly out of Outlook?
    No, that is not from Outlook, but it has undergone some parsing to lose symbols like € $ @ ` etc.

    The previously attached 'raw' file is produced by a third-party program that publishes, among other things, a schedule for the following month.
    The only options this third-party program offers are to display the schedule on-screen (see attachment) or to send it to a printer.

    By adding "print to file" to my printer list, I can produce a simple txt file. That text file is identical to the previously mentioned 'raw' file (after having removed some gibberish).

    The end result (CSV file) would look something like what is listed below. That is a format MS Outlook can read.
    (My version of Outlook is in Norwegian, hence the 'strange' header names: Subject, Start Date, Start Time, End Date, End Time and Show Time As.)
    Code:
    "Emne","Startdato","Starttidspunkt","Sluttdato","Sluttidspunkt","Vis tid som"
    "BGO - OSL","14.6.2006","09:00:00","14.6.2006","10:30:00","2"
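Here is a PHP sketch that produces that row from one parsed leg. `outlookRow` is a hypothetical helper. It assumes times arrive as HHMM strings and that a leg never crosses midnight, so start and end share a date. The "2" in "Vis tid som" is copied from the sample row, not derived. Fields are quoted by hand so the output matches the sample exactly, because `fputcsv()` only adds quotes where it considers them necessary.

```php
<?php
// Hypothetical helper: format one leg as a row in the Outlook CSV layout above.
// Assumptions: times are HHMM strings, the leg does not cross midnight, and
// "Vis tid som" is always "2", copied from the sample row, not derived.
function outlookRow(DateTime $date, string $from, string $dep, string $arr, string $to): string
{
    $day = $date->format('j.n.Y'); // day.month.year without leading zeros, e.g. 14.6.2006

    // "0900" -> "09:00:00"
    $time = fn(string $hhmm): string => substr($hhmm, 0, 2) . ':' . substr($hhmm, 2, 2) . ':00';

    $fields = ["$from - $to", $day, $time($dep), $day, $time($arr), '2'];

    // Quote every field and double any embedded quotes, as in the sample file.
    $quoted = array_map(fn(string $f): string => '"' . str_replace('"', '""', $f) . '"', $fields);
    return implode(',', $quoted);
}

echo '"Emne","Startdato","Starttidspunkt","Sluttdato","Sluttidspunkt","Vis tid som"', "\n";
echo outlookRow(new DateTime('2006-06-14'), 'BGO', '0900', '1030', 'OSL'), "\n";
```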
    S. Rosland

  8. #7
    Senior Member
    Join Date
    Jun 2005
    Location
    Atlanta, GA
    Posts
    4,146
    Member #
    10263
    Liked
    1 times
    Okay, now it's making more sense.

    Quote Originally Posted by rosland
    It can be a combination of letter blocks like LHR EWR MUC and times 0800 0915 1000, or it can be codes containing numbers that do not represent time, like OSL E45 CDG BX45-7 F4 F3 etc., or it can be a combination of all.
    Okay, so what do the codes with numbers represent? Or is it just that the only thing you want is those five pieces of information -- date, departure airport, arrival airport, departure time, arrival time -- and ignoring the rest?

    (Sorry if I'm being a little slow on the uptake.)

  9. #8
    WDF Staff smoseley
    Join Date
    Mar 2003
    Location
    Boston, MA
    Posts
    9,729
    Member #
    819
    Liked
    205 times
    Here's what I got.

    Run these 5 search-and-replace regular expressions in sequence (each search pattern is followed by its replacement):
    Code:
    ^ +[A-Z][a-z]\** +{[0-9]^2}{[A-Z]^3}{[0-9]^2} +[A-Z][0-9]* *\n
    ['\1.\2.20\3'],\n
    
    ^ +[A-Z][a-z]\** +{[0-9]^2}{[A-Z]^3}{[0-9]^2}
    ['\1.\2.20\3',\n
    
    ^.*{[A-Z]^3} +{[0-9]^4} +{[0-9]^4} +{[A-Z]^3}.*\n +[0-9]^2\n
    \t['\1', '\2', '\3', '\4'],\n
    
    {\t\[.*\]},\n\[
    \1],\n[
    
    ^ +[^\[]*\n
    -- Replace this one with nothing... it eliminates "junk" --
    And you wind up with this:
    Code:
    ['25.MAY.2006'],
    ['26.MAY.2006'],
    ['27.MAY.2006'],
    ['28.MAY.2006'],
    ['29.MAY.2006',
    	['OSL', '0805', '0900', 'TRD'],
    	['TRD', '0925', '1020', 'OSL']],
    ['30.MAY.2006',
    	['OSL', '0700', '0755', 'TRD'],
    	['TRD', '0820', '0915', 'OSL'],
    	['OSL', '0945', '1035', 'SVG']],
    ['31.MAY.2006',
    	['SVG', '0640', '0715', 'BGO'],
    	['BGO', '0735', '0835', 'TRD'],
    	['TRD', '0900', '0955', 'BOO'],
    	['BOO', '1015', '1100', 'TOS'],
    Then, just wrap it all in a big array, and you get this:
    PHP Code:
    $schedule = [
        ['25.MAY.2006'],
        ['26.MAY.2006'],
        ['27.MAY.2006'],
        ['28.MAY.2006'],
        ['29.MAY.2006',
            ['OSL', '0805', '0900', 'TRD'],
            ['TRD', '0925', '1020', 'OSL']],
        ['30.MAY.2006',
            ['OSL', '0700', '0755', 'TRD'],
            ['TRD', '0820', '0915', 'OSL'],
            ['OSL', '0945', '1035', 'SVG']],
        ['31.MAY.2006',
            ['SVG', '0640', '0715', 'BGO'],
            ['BGO', '0735', '0835', 'TRD'],
            ['TRD', '0900', '0955', 'BOO'],
            ['BOO', '1015', '1100', 'TOS']]
    ];
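The five passes above don't look like PCRE. Their `{...}` groups and `^n` repeat counts appear to match the find-and-replace syntax of older Visual Studio versions. Below is a hedged PHP translation that uses `preg_replace` and applies the passes in the same order. Two details differ from the editor version. First, PCRE needs the `/m` flag so that `^` matches at every line start. Second, pass 5 adds `\n` to the negated class, because in PCRE `[^\[]` also matches newlines and would swallow several lines at once. The raw file isn't reproduced in the thread, so the sample input is invented to fit the patterns. `reshapeSchedule` is a hypothetical name.

```php
<?php
// PCRE translation of the five editor passes, applied in order.
// Editor {...} becomes (...), ^n becomes {n}; /m makes ^ match at each line start.
function reshapeSchedule(string $raw): string
{
    $passes = [
        // 1. Day with no flights (date followed by a single code) -> closed entry.
        '/^ +[A-Z][a-z]\** +([0-9]{2})([A-Z]{3})([0-9]{2}) +[A-Z][0-9]* *\n/m'
            => "['$1.$2.20$3'],\n",
        // 2. Any remaining day header -> opened entry.
        '/^ +[A-Z][a-z]\** +([0-9]{2})([A-Z]{3})([0-9]{2})/m'
            => "['$1.$2.20$3',\n",
        // 3. Flight line plus the two-digit line after it -> one leg array.
        '/^.*([A-Z]{3}) +([0-9]{4}) +([0-9]{4}) +([A-Z]{3}).*\n +[0-9]{2}\n/m'
            => "\t['$1', '$2', '$3', '$4'],\n",
        // 4. Last leg before the next day -> close that day's array.
        '/(\t\[.*\]),\n\[/'
            => "$1],\n[",
        // 5. Junk: lines that start with spaces and contain no '['.
        //    \n is excluded so the class cannot run across lines.
        '/^ +[^\[\n]*\n/m'
            => '',
    ];
    // With arrays, preg_replace applies each pattern to the previous result.
    return preg_replace(array_keys($passes), array_values($passes), $raw);
}

// Invented sample input in the layout the patterns expect.
$raw = "  Th   25MAY06  F4\n"
     . "  Mo*  29MAY06  C1 BX45-7\n"
     . "  332 OSL 0805 0900 TRD E45\n"
     . "   12\n"
     . "  347 TRD 0925 1020 OSL F4\n"
     . "   12\n"
     . "  Tu   30MAY06  F3\n";

echo reshapeSchedule($raw);
```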

  10. #9
    WDF Staff smoseley
    Join Date
    Mar 2003
    Location
    Boston, MA
    Posts
    9,729
    Member #
    819
    Liked
    205 times
    PS - to replace the month names, you'll have to do that programmatically.

    I suggest defining an array like this,

    $months = ['JAN', 'FEB', 'MAR', 'APR', 'MAY', 'etc.'];

    then just replace the name with its array index + 1.
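Here is a PHP sketch of that month lookup. It assumes the schedule uses upper-case English three-letter abbreviations; only `MAY` appears in the thread, so the rest of the list is an assumption. It also assumes the output should follow the day.month.year form of the Outlook sample (`14.6.2006`). `numericDate` is a hypothetical helper.

```php
<?php
// Assumed month abbreviations (only "MAY" is confirmed by the thread).
const MONTHS = ['JAN', 'FEB', 'MAR', 'APR', 'MAY', 'JUN',
                'JUL', 'AUG', 'SEP', 'OCT', 'NOV', 'DEC'];

// Replace the month name in "29.MAY.2006" with its array index + 1.
function numericDate(string $date): string
{
    [$day, $name, $year] = explode('.', $date);
    $index = array_search($name, MONTHS, true);
    if ($index === false) {
        throw new InvalidArgumentException("Unknown month: $name");
    }
    return $day . '.' . ($index + 1) . '.' . $year;
}

echo numericDate('29.MAY.2006'), "\n"; // prints 29.5.2006
```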

  11. #10
    Senior Member rosland
    Join Date
    Jul 2003
    Location
    Norway
    Posts
    1,944
    Member #
    2096
    Hi Steve!

    I'm just checking in from a short pit stop in BGO (haven't got my laptop with me).

    I'll try your suggestion tonight.
    Is your code aimed at the 'raw' file above, or the refined one?
    The refined one is produced by regexes I've already implemented, which, BTW, already change the month names.

    (I haven't got time to read through all your code right now.)

    Ståle

    EDIT:

    Shadowfiend,
    the other information not pertaining to time is still important.
    The reason I need the script to recognize time blocks is that in the final CSV file a time reference like 0830 needs to be positioned correctly. Otherwise Outlook won't understand that it's a reference to time, and will be unable to correctly place the other information (like E47, F4, BGO - OSL, etc.).
    S. Rosland

