Register

If this is your first visit, please click the Sign Up now button to begin the process of creating your account so you can begin posting on our forums! The Sign Up process will only take up about a minute of two of your time.

Page 1 of 2 1 2 LastLast
Results 1 to 10 of 13
  1. #1
    Senior Member medlington's Avatar
    Join Date
    Nov 2005
    Location
    Sheffield, UK
    Posts
    377
    Member #
    11968
    Liked
    4 times
    Hi,

    Im wanting to work with a full UK DB of postcodes which has about 1.8 million entries. I want to process these entries so I can list everything within a set distance of a postcode.

    Is this going to be really slow? I notice that just opening the file on my computer took a while and a couple of programmes couldnt handle it but what will MySql make of it?

    Thanks

  2.  

  3. #2
    Senior Member ravensjeff's Avatar
    Join Date
    Aug 2007
    Location
    Maryland
    Posts
    125
    Member #
    15751
    Off the top of my head, I believe mySQL can handle up to about 5 million rows. But that speaks nothing of speed.

    Speed will likely be more largely determined by what you are doing/how you are querying the data, coupled with the speed of your server.

    Having never worked with a table of that size myself, I don't have any benchmarks to provide, but I can recommend that you check with your host/other users at your host to gain experienced opinions on the same setup you'll be using.

  4. #3
    WDF Staff smoseley's Avatar
    Join Date
    Mar 2003
    Location
    Boston, MA
    Posts
    9,729
    Member #
    819
    Liked
    205 times
    Databases are designed to do this type of thing efficiently.

    How fast it works will depend on how you construct your query and how you optimize your table.

    Typically, postal code databases have associated latitude and longitude fields.

    Let's assume your table looks like this:

    postcode
    associated_region_id
    latitutde
    longitude


    You'll want to index the postcode, latitude, and longitude fields for search speed. This will increase the storage size, but will significantly decrease search time.

    So create a table like this:
    Code:
    CREATE TABLE postcodes (
        postcode INT NOT NULL PRIMARY KEY,
        latitude FLOAT NOT NULL INDEX,
        longitude FLOAT NOT NULL INDEX
    );
    We use INT for postcode because it has higher performance in a DB.

    Google's methodology for Android latitude and longitude is to multiply lat and long x 100000 and round. This gives you 6 decimal places of accuracy while using an integer format for performance. Ingenius, but difficult to reproduce in a DB.

    So then you have to do a query for results, based on a latitude, longitude, and radius.

    A stored procedure will yield the best performance. Be careful if you're using a proc with MySQL and PHP 5.2, however - some versions of PHP (I think after 5.2.3) have a bug in the MySQL library that won't retrieve results from a procedure. Pain in the ***!!!

    If you can't get it working in a stored procedure, just use the query.

    Note: radius is in miles:
    Code:
    DELIMITER $$
    CREATE PROCEDURE get_postcodes (IN i_lat FLOAT, IN i_long FLOAT, IN i_radius FLOAT)
    BEGIN
        SELECT *,
            ACOS(SIN( PI()* i_lat/180 )*SIN( PI()*latitude/180 ))
                + (cos(PI()* i_lat/180)*COS( PI()*latitude/180) 
                * COS(PI()*longitude/180-PI()* - i_long/180))
                * 3963.191 AS distance
    FROM postcodes
        WHERE distance < i_radius
        ORDER BY distance ASC
    END$$
    DELIMITER ;
    (Note: not tested)

  5. #4
    WDF Staff smoseley's Avatar
    Join Date
    Mar 2003
    Location
    Boston, MA
    Posts
    9,729
    Member #
    819
    Liked
    205 times
    Quote Originally Posted by ravensjeff
    Off the top of my head, I believe mySQL can handle up to about 5 million rows. .
    MySQL can handle billions of records in a table. If designed properly, speed should never be a concern.

  6. #5
    WDF Staff smoseley's Avatar
    Join Date
    Mar 2003
    Location
    Boston, MA
    Posts
    9,729
    Member #
    819
    Liked
    205 times
    BTW, I'm now creating a zip code proximity search DB for myself, and found this GREAT post on how to use MySQL Geometry classes to do lightning-fast proximity searches.

    http://forum.geonames.org/gforum/posts/list/692.page

    Read the 2nd post.

  7. #6
    Senior Member medlington's Avatar
    Join Date
    Nov 2005
    Location
    Sheffield, UK
    Posts
    377
    Member #
    11968
    Liked
    4 times
    Brilliant, thanks alot mate!

    Thats going to make quoting and actually doing this job so much easier. Im sure Il get stuck somewhere along the way when actually making it as Im only just starting to learn MySQL but Im alot better prepaired now than I was a couple of days ago.

    Thanks again

  8. #7
    Senior Member medlington's Avatar
    Join Date
    Nov 2005
    Location
    Sheffield, UK
    Posts
    377
    Member #
    11968
    Liked
    4 times
    ok, ive made a start on this now.

    I've got over the first hurdle of getting my 2 million rows of data into MySQL but now comes the complicated stuff.

    I've just read your first post smosely and think I should have read it before I created my DB.

    Firstly I dont think INT will work for UK postcodes as they are depicted as letters and numbers so I've used VARCHAR, they are no more than 7 or 8 digits though. Is there a better thing to use? the amount of digits does vary so I dont think CHAR is the correct one to use.

    I've not indexed the columns either, what does this actually do? Im guessing just build some form of index within the DB so that it has a betteer idea of where to start the search from.

    I have been wondering if it would improve performance to have all the postcode data split into seperate tables so that one table had all the postcodes that started with A, then next table started with B etc then when the Lat and Long needed finding I could write a small script that would first check the first letter of the postcode and then search the relevant table rather than searching through all the data. Is this basically what idexing is doing?

    The other thing is that I have only stored postcode, longitude and latitude data, do you think I need the associated_region_id column too?

    Im going to go back and recreate the DB now as you've advised using Indexing as Id much rather get this right from the start rather than trying to botch it later.

    Im sure Il be stuck again though when it comes to the geometry classes as I dont understand classes yet. so much to learn in so little time.

  9. #8
    WDF Staff smoseley's Avatar
    Join Date
    Mar 2003
    Location
    Boston, MA
    Posts
    9,729
    Member #
    819
    Liked
    205 times
    Use a VARCHAR(8) - it will use only one byte of data for allocating the memory, which is good.

    do this: ALTER TABLE mytable ADD NDEX (postalcodecolumn);

    Indexes are very important... they increase seek times by about 100x. The closest metaphor I can draw is that an index is like an index in a book. It's a quick look-up for data in your table, allowing your computer to map items to a location in memory very quickly. You should index any column you ever plan to do searches by.

  10. #9
    WDF Staff smoseley's Avatar
    Join Date
    Mar 2003
    Location
    Boston, MA
    Posts
    9,729
    Member #
    819
    Liked
    205 times
    Oh, also, you may want to use a Unique index for your postal code column.

    You could do this:

    ALTER TABLE mytable CHANGE COLUMN postcodecolumn postcodecolumn VARCHAR(8) NOT NULL UNIQUE;

  11. #10
    Senior Member medlington's Avatar
    Join Date
    Nov 2005
    Location
    Sheffield, UK
    Posts
    377
    Member #
    11968
    Liked
    4 times
    Thanks mate.

    One quick Q, whats the difference between a Primary Key and a Unique key and do I need a primary key aswell if I change the postcode field to a Unique key?


Page 1 of 2 1 2 LastLast

Remove Ads

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •  
All times are GMT -6. The time now is 08:01 PM.
Powered by vBulletin® Version 4.2.3
Copyright © 2019 vBulletin Solutions, Inc. All rights reserved.
vBulletin Skin By: PurevB.com