  1. #1
    Junior Member
    Join Date
    Feb 2009
    Posts
    3
    Member #
    18349
    I need a method for scraping some statistical data out of tables on a website and placing it on another website that I wish to optimize for the iPhone. This information gets updated frequently, so it can't be a one-time thing; it has to be pulled every time the iPhone-optimized page is opened. I don't have access to the data source, so I won't be able to get the information from there. Does anyone have any ideas for how to go about that?

    I have played around a little bit with Google Spreadsheet which has a simple ImportHTML() function so I imagine there should be a fairly easy option out there for doing it onto a website.

    The ImportHTML() function/formula went something like:

    =ImportHTML("http://url", "table", 8)

    The 8 means the 8th table in the page that it pulls the data from. It then filled the spreadsheet with all of the data, laid out as it's displayed in the table.
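
    For comparison, here is a rough sketch of what an ImportHTML()-style lookup might look like in a server-side script. It is written in Python purely for illustration (the thread does not settle on a language), it uses only the standard library, and the names TableCollector and import_html_table are made up for this example. It assumes the tables are not nested inside one another.

```python
from html.parser import HTMLParser


class TableCollector(HTMLParser):
    """Collect the text of every <table> on a page as a list of rows.

    Assumption: tables are not nested; a nested table would be counted
    as a separate table and would interrupt the rows of its parent.
    """

    def __init__(self):
        super().__init__()
        self.tables = []   # one list of rows per <table>, in document order
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables.append([])
        elif tag == "tr" and self.tables:
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.tables[-1].append(self._row)
            self._row = None

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)


def import_html_table(html, index):
    """Return the index-th table (1-based, like the formula) as rows of strings."""
    collector = TableCollector()
    collector.feed(html)
    collector.close()
    return collector.tables[index - 1]
```

    With the page source in a string, import_html_table(page, 8) would return the rows of the 8th table, each row being a list of cell strings.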

  3. #2
    Senior Member
    Join Date
    Jun 2005
    Location
    Atlanta, GA
    Posts
    4,146
    Member #
    10263
    Liked
    1 times
    What language are you using to parse it? If you're using Ruby, the Hpricot and Nokogiri gems are both useful for scraping pages, and they can be used with scRUBYt! for a nicer syntax.

    Other languages vary in their support (Perl has WWW::Mechanize, for example), but just about every language out there has an XML parser that will work if the page you're looking to scrape is well-formed XML. If it's poorly formed XML, the aforementioned libraries will all work, but some languages don't have quite as sleek support for regular SGML-style HTML tag soup.
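
    To illustrate the difference between a strict XML parser and a tolerant HTML parser, here is a small sketch in Python (chosen only as an example language; the standard library has both kinds of parser). The TAG_SOUP sample and the helper names are invented for this example. The strict parser rejects the unclosed tags, while the tolerant one still yields the cell text.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Hypothetical sample: unclosed <td> and <tr>, plus a bare <br>.
TAG_SOUP = "<table><tr><td>Lions<td>3<br></table>"


def parses_as_xml(markup):
    """Return True only if the markup is well-formed XML."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


class CellText(HTMLParser):
    """Gather the text of each table cell, tolerating missing end tags."""

    def __init__(self):
        super().__init__()
        self.cells = []
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag in ("td", "th"):
            # A new cell implicitly closes the previous one.
            self.cells.append("")
            self._in_cell = True

    def handle_endtag(self, tag):
        if tag in ("td", "th", "tr", "table"):
            self._in_cell = False

    def handle_data(self, data):
        if self._in_cell:
            self.cells[-1] += data


def cell_texts(markup):
    """Return the stripped text of every cell found in the markup."""
    parser = CellText()
    parser.feed(markup)
    parser.close()
    return [cell.strip() for cell in parser.cells]
```

    So parses_as_xml(TAG_SOUP) reports failure, while cell_texts(TAG_SOUP) still recovers the two cells.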

  4. #3
    Junior Member
    Join Date
    Feb 2009
    Posts
    3
    Member #
    18349
    Thanks for the quick reply.

    Well, I don't have any particular language that I'm trying to do it in; it's more a matter of languages I'm trying not to do it in. The website I will be making is going to be just a plain and simple mobile-optimized website for the iPhone, for an app I'm making. No prettiness, just plain data that I can format on the screen. I'm not really proficient in any web-based language other than HTML and a little CSS, so I'm looking for the simplest thing that I can implement without too much trouble. I imagine I'd prefer something in JavaScript, because I'm hosting the pages on a MobileMe account, and I don't believe that will allow for PHP, Perl, etc.

  5. #4
    Senior Member
    Join Date
    Jun 2005
    Location
    Atlanta, GA
    Posts
    4,146
    Member #
    10263
    Liked
    1 times
    Actually, JavaScript is a perfectly fair choice, with a little intermediary action. Basically, you will need something server-side, in some language, that can feed you the table section to work with. You can use AJAX to call your site and request the table contents. Server-side, you will need a script that fetches the page, strips everything before and after the body, and returns what's left. Then the JavaScript can set the innerHTML of a hidden element (hidden via the CSS rule display: none) to the returned data.
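
    As a rough idea of the server-side half, here is a minimal sketch of the "fetch the page, strip everything outside the body, return it" step. It is in Python only as an example (any server-side language would do), and extract_body and fetch_body are hypothetical names. The regular expression is deliberately crude and assumes the page has a single, ordinary body element.

```python
import re
from urllib.request import urlopen

# Crude on purpose: assumes one <body>...</body> pair and no stray
# "<body" text inside comments or scripts.
_BODY = re.compile(r"<body\b[^>]*>(.*?)</body\s*>", re.IGNORECASE | re.DOTALL)


def extract_body(html):
    """Return the markup between <body> and </body>, or the input unchanged."""
    match = _BODY.search(html)
    return match.group(1).strip() if match else html


def fetch_body(url, timeout=10):
    """Download a page and return only the contents of its body."""
    with urlopen(url, timeout=timeout) as response:
        # Fall back to UTF-8 when the server does not declare a charset.
        charset = response.headers.get_content_charset() or "utf-8"
        html = response.read().decode(charset, errors="replace")
    return extract_body(html)
```

    The string returned by fetch_body is what the AJAX call would receive and drop into the hidden element.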

    Once the HTML becomes a part of the document body, you can use Javascript to parse it the way you do anything else on the page (you can ask specific questions about that when you get to it).

    The biggest problem, of course, is going to be getting that page data. Unfortunately, you can't just embed an iframe in the page, because your JavaScript will not be able to drill into it due to the browser's same-origin policy. You also won't be able to make an AJAX request to another domain, for the same reason. So I'm thinking some intermediary script will actually be necessary, meaning hosting this on a MobileMe account may not work.

    If you're willing to update the information often manually (or have a script do so), then you can actually have a script that runs away from MobileMe fetch the page, strip the uninteresting stuff out, and then upload the resulting file to MobileMe so that it can be fetched via AJAX.

  6. #5
    Junior Member
    Join Date
    Feb 2009
    Posts
    3
    Member #
    18349
    Well, I'm going to be doing this for schedules and rosters for 16 different sports, so I won't really have time for this to be a manual process every time the information gets updated.

    If it's going to be more difficult with JavaScript, then I guess it wouldn't be out of the question to look into a different host that would allow me to use Ruby or Perl.

  7. #6
    Senior Member
    Join Date
    Jun 2005
    Location
    Atlanta, GA
    Posts
    4,146
    Member #
    10263
    Liked
    1 times
    Well, for the AJAX-JS based solution, you actually only really need a PHP host that can handle what will probably be relatively low traffic. Host most of your stuff on MobileMe and then hand off just that one request to PHP, which can do the fetch from the other site and the parsing.

    The advantage of this is that a really cheap shared PHP host is.. Well... Really cheap :-D
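
    A sketch of that one hand-off endpoint, again in Python as a stand-in for whatever the cheap host actually runs (PHP in the suggestion above). One caveat: if the endpoint lives on a different domain from the MobileMe pages, the browser's cross-domain restriction mentioned earlier applies to this request as well. The sketch assumes a browser that honors the Access-Control-Allow-Origin response header; make_app, fetch_fragment and allowed_origin are hypothetical names.

```python
def make_app(fetch_fragment, allowed_origin):
    """Build a WSGI app that returns the scraped fragment for one AJAX request.

    fetch_fragment: zero-argument callable returning the HTML to send back
                    (for example, a function that fetches and trims the page).
    allowed_origin: origin of the site hosting the JavaScript, sent in the
                    Access-Control-Allow-Origin header.
    """

    def app(environ, start_response):
        try:
            body = fetch_fragment().encode("utf-8")
            status = "200 OK"
        except Exception:
            # The upstream site could not be fetched or parsed.
            body = b"upstream fetch failed"
            status = "502 Bad Gateway"
        start_response(status, [
            ("Content-Type", "text/html; charset=utf-8"),
            ("Content-Length", str(len(body))),
            ("Access-Control-Allow-Origin", allowed_origin),
        ])
        return [body]

    return app
```

    For a local trial, the resulting app can be served with wsgiref.simple_server.make_server from the standard library; a real host would plug it into its own WSGI setup.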

