Register

If this is your first visit, please click the Sign Up now button to begin the process of creating your account so you can begin posting on our forums! The Sign Up process will only take up about a minute of two of your time.

Results 1 to 6 of 6
  1. #1
    Member
    Join Date
    Jun 2003
    Location
    Madison, WI, USA
    Posts
    30
    Member #
    1590
    Hi,

    Some time ago, I wrote a simple page hit counting script that used PHP/MySQL. All I wanted was to track the # of hits on a page, and use a timeout value that prevented a user from refreshing to bump the count. I set the timeout to 15 minutes, so it tracks IPs and checks to see if the same IP has accessed that same page within the timeout period, and if so, does *not* count a hit, but re-sets the timeout. Otherwise, it records a hit.

    It also uses a "page ID" feature, so I could use the same script for as many pages across the site as I'd like. So on page One, for example, I would call it, counter.php?pageid=1. On page Two, counter.php?pageid=2 and so on. Once again, very simple, and when I create a new page, all I need to do is make a new unique number, and it automatically starts it off at zero.

    So all of that sounds good, right? Yup.

    However, now there's a problem, one I hadn't previously foreseen. Now that it has been running for awhile, it is having huge performance issues. Observe:

    Code:
    mysql> select count(*) from counter;
    +----------+
    | count(*) |
    +----------+
    |   807245 |
    +----------+
    1 row in set (0.00 sec)
    Counting the entire amount of rows in the database is still an efficient query. However, here is what happens when we look at a particular page:

    Code:
    mysql> select count(*) from counter where pageid=0;
    +----------+
    | count(*) |
    +----------+
    |   453526 |
    +----------+
    1 row in set (6.01 sec)
    Ouch. It's killing my page-load times, and it's in dire need of some rethinking. One option, of course, is to create a separate table for each page I want to use, but that's not really an option I'd like.

    My next idea is to create a second table, and use it as a counter cache. When a user visits a counted page, the new process would be something like:

    - browser records IP/host and pageid
    - script checks database for matching IP/timestamp
    - if the IP has not visited the page within the timeout period, increment the count in the cache table, and insert a new row into the data table with the ip/timestamp info
    - grab the cached counter for this page, and display

    So if anybody actually read this far (haha), what do you think? Is there a better way to do this that I'm overlooking? I honestly have not looked at other counter scripts too much, mainly because they were full of fancy user interfaces and other garbage that I don't need or want. I just want something that's simple, effective, and fast, and that I can add new page counts to simply by putting a different number on a page.

    Any feedback is welcomed, of course.
    futureal@rctech.net | R/C Tech

  2.  

  3. #2
    WDF Staff smoseley's Avatar
    Join Date
    Mar 2003
    Location
    Boston, MA
    Posts
    9,729
    Member #
    819
    Liked
    205 times
    Try putting an CLUSTERED INDEX on pageid and querying for COUNT(pageid) instead of COUNT(*). That should improve performance significantly.

    Also, why are you using a timeout instead of inserting the Session ID into the database and cross-referencing that way?

    Finally, there are some issues on pipeline performance that have to be discussed. A "pipeline" is a type of table that's used primarily for recording transactions. All it does is stack up records. So all you should do to it in production is INSERTs. Use an INT field for your page_id. Do not create an AUTO_INCREMENT id. There's no need for it in a pipeline table since you won't be selecting against it. You should not have a primary key on this table either. Do not SELECT to check data prior to INSERT. That could kill you. Instead, store the datetime of the transaction so that you can parse your data when reporting against it. You don't need to know at the time someone's visiting whether they've visited before.

  4. #3
    WDF Staff smoseley's Avatar
    Join Date
    Mar 2003
    Location
    Boston, MA
    Posts
    9,729
    Member #
    819
    Liked
    205 times
    Final thought. Why call your counter page like this: "counter.php?pageid=1" ? This will require that you open an iFrame or dialog for each page to use the counter. Instead, if your site is already in php, you could just <?php include 'counter.php' ?> and have a function like this:
    PHP Code:
    function addHit(scriptName) {

    Then, you could just call addHit() with the script's name (gotten from the SERVER_VARIABLES collection) and blammo, you're done.

    Don't submit the pageId from the page. Then you have to create a "pages" table, and decipher which page goes with which id.

    In fact, having to join with this table to get your reports after the fact will slow you down immensely.

    You should probably store the pageid AND the pagename in the counter table (denormalizing the data) to optimize it for reporting.

    You should be able to run a count on 800,000 records in under a second with proper indexing, though.

    Hope that helps !!!!

    Steve

  5. #4
    Member
    Join Date
    Jun 2003
    Location
    Madison, WI, USA
    Posts
    30
    Member #
    1590
    I'm not terribly familiar with the different types of indexing on MySQL tables (such as what you mentioned); I'll read up a bit more on it.

    So what you are suggesting is, optimize the process for an insert-only table, rather than selecting to check data and then updating/inserting as I am currently doing. So on the whole, reducing the process to two queries:

    Query 1: Insert new row for this page hit
    Query 2: Count applicable rows for whichever page and display

    Am I following, or am I lost?

    As for calling the script, I am currently calling it as a server side include, such as:

    <!--#include virtual="/counter/procounter.php?id=0&show=yes"-->

    I did not create a pages table; I just added the "pageid" column to the regular hit table. Which is where I was coming up with my queries, something like (not a real query, just off the top of my head):

    select count(*) from table where pageid=0;

    Since I was calling the script along with the pageid, I never worried about storing the page name at all.
    futureal@rctech.net | R/C Tech

  6. #5
    Member
    Join Date
    Jun 2003
    Location
    Madison, WI, USA
    Posts
    30
    Member #
    1590
    Oh, and in regard to sessionid's, I'm not sure what you mean there. I know PHP includes all sorts of session-related functions, but I'm not terribly familiar with how they work. If you point me in the right direction, I'm sure I can figure it out.

    Thanks for all the help!
    futureal@rctech.net | R/C Tech

  7. #6
    Member
    Join Date
    Jun 2003
    Location
    Madison, WI, USA
    Posts
    30
    Member #
    1590
    One other question related to what transio posted, if anybody has the time.

    I did some research on clustered indexes, and it seems that they either are not supported or not mentioned in the MySQL docs for MyISAM tables. I'm not terribly familiar with other table types (in this case, InnoDB).

    How difficult is it to "port" a table from one type to another? What else might I have to take into consideration?

    If anybody can help, thanks!
    futureal@rctech.net | R/C Tech


Remove Ads

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •  

Search tags for this page

php mysql page hit

Click on a term to search for related topics.
All times are GMT -6. The time now is 10:20 PM.
Powered by vBulletin® Version 4.2.3
Copyright © 2019 vBulletin Solutions, Inc. All rights reserved.
vBulletin Skin By: PurevB.com