Results 1 to 8 of 8
  1. #1
    WDF Staff smoseley's Avatar
    Join Date
    Mar 2003
    Location
    Boston, MA
    Posts
    9,729
    Member #
    819
    Liked
    205 times
    I'm building a web app for a client that will sit on a hundred-million-record database, populated by data imports on the order of a million records each. He wanted to prototype the concept on the cheap with a few million records, so I decided on PHP/MySQL for the first iteration of the project, with an upgrade path to C++/Oracle if necessary.

    Unfortunately, I have little experience doing this scale of app with MySQL... the only multi-million-record database development I've done has been in MS-SQL (plus a little Oracle and Informix, but negligible). So I'm in uncharted waters!!

    Well, I'm doing my first import test, and after 12 hours it's only imported about 300,000 records total... It started out great... I was like "piece of cake"... but performance seems to be degrading linearly with the size of the database. :huh:

    Some things I'm considering:
    1. Dumping MySQL and going with Postgres, which has stored procedures... not sure if they'd improve insert performance, though!

    2. Finding a way to do batch inserts to improve performance (see the sketch after this list)... unfortunately, I'm normalizing the data as I import it, which means building up relationships as I go... but still.

    3. Eliminating indexing during inserts?

    4. Switching to prepared statements...

    5. Switching from MyISAM to InnoDB (for data integrity and possibly a performance increase?)
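
    For reference, here's a rough sketch of what I have in mind for options 2 and 4 combined -- batching rows into multi-row INSERTs through a prepared statement. The `records` table, its columns, and the connection details are just placeholders, not the real schema:

    PHP Code:
    <?php
    // Rough sketch of options 2 + 4: batch rows into multi-row INSERTs via a
    // prepared statement. Table/column names are placeholders.
    $pdo = new PDO('mysql:host=localhost;dbname=test', 'user', 'pass', [
        PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION,
    ]);

    // $rows would come from the parsed import file; tiny sample for illustration:
    $rows = [
        ['a' => 'foo', 'b' => 'bar', 'c' => 1],
        ['a' => 'baz', 'b' => 'qux', 'c' => 2],
    ];

    $batchSize = 1000;
    foreach (array_chunk($rows, $batchSize) as $batch) {
        // One "INSERT ... VALUES (?,?,?),(?,?,?),..." statement per batch.
        $placeholders = implode(',', array_fill(0, count($batch), '(?, ?, ?)'));
        $stmt = $pdo->prepare("INSERT INTO records (col_a, col_b, col_c) VALUES $placeholders");

        $params = [];
        foreach ($batch as $row) {
            array_push($params, $row['a'], $row['b'], $row['c']);
        }
        $stmt->execute($params);
    }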

    Any MySQL experts have any advice?

    Thanks!

  2. #2
    Member cctech's Avatar
    Join Date
    Jul 2007
    Location
    North Carolina Coast
    Posts
    70
    Member #
    15488
    We have never done a database that large either, so I am anxious to see how it is resolved.

    Here is a thread on PHP Builder forum that may give you some ideas:
    http://phpbuilder.com/board/showthread.php?t=10342378

  3. #3
    Member
    Join Date
    Apr 2007
    Posts
    96
    Member #
    15165
    Quote Originally Posted by transio
    I'm building a web app for a client that will sit on a hundred-million-record database, populated by data imports on the order of a million records each. He wanted to prototype the concept on the cheap with a few million records, so I decided on PHP/MySQL for the first iteration of the project, with an upgrade path to C++/Oracle if necessary.
    I would suggest that the database reside on its own server and that the server be optimized to handle typical large-database loads. I imagine an Oracle solution would push things along similar lines. You might find more information on this website; the following link is about getting performance out of selects.

    http://www.mysqlperformanceblog.com/...-large-tables/

  4. #4
    Senior Member
    Join Date
    Jun 2005
    Location
    Atlanta, GA
    Posts
    4,146
    Member #
    10263
    Liked
    1 times
    Hmm... as I recall, InnoDB actually sacrifices some performance for the additional integrity. However, with an engine that supports transactions, wrapping many inserts in a single transaction would be a good plan: non-transactional inserts are, I believe, flushed to disk very frequently by default, while transactional inserts only flush when you commit. This presentation (PDF!) seems to have some performance hints for various table types and other MySQL aspects. Slide 82 starts talking about table-type-specific optimizations.
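
    A quick sketch of what that could look like with PDO -- the table, columns, and connection details are made up, and the commit interval is arbitrary:

    PHP Code:
    <?php
    // Sketch: wrap long runs of inserts in a transaction so rows aren't flushed
    // to disk one at a time. Assumes an InnoDB table; all names are invented.
    $pdo = new PDO('mysql:host=localhost;dbname=test', 'user', 'pass', [
        PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION,
    ]);
    $stmt = $pdo->prepare('INSERT INTO records (col_a, col_b) VALUES (?, ?)');

    $rows = [['a' => 'foo', 'b' => 'bar']];   // stand-in for the parsed import data

    $pdo->beginTransaction();
    try {
        $i = 0;
        foreach ($rows as $row) {
            $stmt->execute([$row['a'], $row['b']]);
            if (++$i % 10000 === 0) {          // commit in chunks of 10k rows
                $pdo->commit();
                $pdo->beginTransaction();
            }
        }
        $pdo->commit();
    } catch (Exception $e) {
        $pdo->rollBack();
        throw $e;
    }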

    Where are you doing the normalization? In MySQL or in PHP? If you're doing it in PHP, it's worth considering writing a script (in PHP, Python, Ruby, Perl, whatever) to just go through and generate all the actual insert statements, and then separately insert the actual data.
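
    Something along these lines, assuming the raw data arrives as a CSV -- the file, table, and column names are invented:

    PHP Code:
    <?php
    // Sketch of "generate the statements first": parse and normalize in PHP,
    // write plain INSERT statements to a .sql file, then load that file in one
    // shot (e.g. mysql -u user -p dbname < inserts.sql). Names are placeholders.
    $in  = fopen('import.csv', 'r');
    $out = fopen('inserts.sql', 'w');

    while (($fields = fgetcsv($in)) !== false) {
        // ... normalization / foreign-key lookups would go here ...
        $a = addslashes($fields[0]);   // crude escaping, fine for a throwaway dump script
        $b = addslashes($fields[1]);
        fwrite($out, "INSERT INTO records (col_a, col_b) VALUES ('$a', '$b');\n");
    }

    fclose($in);
    fclose($out);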

    According to this post, PostgreSQL would likely be faster and would scale better with additional hardware. In fact, this summary of a real benchmark run on PostgreSQL indicates it comes close to Oracle's performance.

    The trouble with eliminating indexing as you insert is that you'll have to build the indexes afterwards anyway. Still, it's probably worth playing with -- rebuilding the indexes all at once at the end may in fact be faster than maintaining them row by row. Here's a post that loosely addresses that, along with some general batch update/insert advice. More information on the delay_key_write directive here.
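
    If you do experiment with suspending the indexes, here's a sketch of the MyISAM approach (the table name is a placeholder):

    PHP Code:
    <?php
    // Sketch: suspend non-unique index maintenance during the bulk load (MyISAM),
    // then rebuild the indexes in a single pass afterwards. Table name is made up.
    $pdo = new PDO('mysql:host=localhost;dbname=test', 'user', 'pass');

    $pdo->exec('ALTER TABLE records DISABLE KEYS');

    // ... run the bulk inserts here ...

    $pdo->exec('ALTER TABLE records ENABLE KEYS');   // indexes rebuilt in one batch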

  5. #5
    Senior Member
    Join Date
    Jun 2005
    Location
    Atlanta, GA
    Posts
    4,146
    Member #
    10263
    Liked
    1 times
    kingmundi -- that blog post seems to be more about reads than about the large volume of inserts transio is doing.

  6. #6
    Member
    Join Date
    Apr 2007
    Posts
    96
    Member #
    15165
    Quote Originally Posted by Shadowfiend
    kingmundi -- that blog post seems to be more about reads than about the large volume of inserts transio is doing.
    True, I hit submit before I had time to put in some verbiage. I went back and edited it a bit.

    Perhaps, though, the nature of this question can't be easily answered. When the number of records starts to approach the millions, my impression from working in the education industry is that a DBA should spend some quality time looking at the nature of the data. Perhaps the table data can be partitioned along a value to make a set of smaller tables?
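
    For example, something roughly like this -- MySQL 5.1 adds native partitioning, while on 5.0 the same idea means maintaining separate per-range tables by hand. All names here are invented:

    PHP Code:
    <?php
    // Hedged sketch of splitting a big table along a value using MySQL 5.1's
    // native range partitioning. Table and column names are placeholders.
    $pdo = new PDO('mysql:host=localhost;dbname=test', 'user', 'pass');

    $pdo->exec("
        CREATE TABLE records (
            id         INT NOT NULL,
            created_at DATE NOT NULL,
            payload    VARCHAR(255)
        )
        PARTITION BY RANGE (YEAR(created_at)) (
            PARTITION p2005 VALUES LESS THAN (2006),
            PARTITION p2006 VALUES LESS THAN (2007),
            PARTITION pmax  VALUES LESS THAN MAXVALUE
        )
    ");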

  7. #7
    WDF Staff smoseley's Avatar
    Join Date
    Mar 2003
    Location
    Boston, MA
    Posts
    9,729
    Member #
    819
    Liked
    205 times
    Quote Originally Posted by Shadowfiend
    Where are you doing the normalization? In MySQL or in PHP? If you're doing it in PHP, it's worth considering writing a script (in PHP, Python, Ruby, Perl, whatever) to just go through and generate all the actual insert statements, and then separately insert the actual data.
    That's what I'm doing right now. I think that's where the problem is arising, because parsing the file and normalizing in PHP prevents me from doing a batch import.

    I'm gonna try switching to a temporary table / batch import, then normalize and map data from table-to-table once it's in. I think that'll significantly improve the import performance.
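
    Roughly what I'm picturing -- the staging and target table names are placeholders, and it assumes LOCAL INFILE is enabled and that `categories.name` has a unique key:

    PHP Code:
    <?php
    // Sketch of the staging-table plan: bulk-load the raw file into a flat
    // import table, then normalize with set-based INSERT ... SELECT statements.
    // All table/column/file names are placeholders.
    $pdo = new PDO('mysql:host=localhost;dbname=test', 'user', 'pass', [
        PDO::ATTR_ERRMODE            => PDO::ERRMODE_EXCEPTION,
        PDO::MYSQL_ATTR_LOCAL_INFILE => true,
    ]);

    // 1. Fast, unnormalized bulk load of the raw file into the staging table.
    $pdo->exec("
        LOAD DATA LOCAL INFILE '/tmp/import.csv'
        INTO TABLE import_staging
        FIELDS TERMINATED BY ',' ENCLOSED BY '\"'
        LINES TERMINATED BY '\\n'
    ");

    // 2. Build the lookup/relationship tables from the distinct values
    //    (assumes a unique key on categories.name).
    $pdo->exec("
        INSERT IGNORE INTO categories (name)
        SELECT DISTINCT category_name FROM import_staging
    ");

    // 3. Map the staged rows into the normalized table in one set-based pass.
    $pdo->exec("
        INSERT INTO records (category_id, col_a, col_b)
        SELECT c.id, s.col_a, s.col_b
        FROM import_staging s
        JOIN categories c ON c.name = s.category_name
    ");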

  8. #8
    Senior Member
    Join Date
    Jun 2005
    Location
    Atlanta, GA
    Posts
    4,146
    Member #
    10263
    Liked
    1 times
    Well, what I had in mind was doing the normalization and everything and dumping the INSERT statements you *would* run to a file, and then importing the file directly. Let us know how your new attempt turns out!

