  1. #1
    Junior Member
    Join Date
    Feb 2015
    Posts
    1
    Member #
    41753

    Website template suggestions for newspaper site with 10k searchable .pdf files

    Hi, thanks for reading.

    I am creating a website for an archived newspaper. I have approximately 10k .pdf files and am looking for suggestions on site design. I have a domain name registered and hosting provided (GoDaddy), but I have been unable to figure out what I thought should be a simple solution.

    I'm looking for a front page, then "By Year", "By Month" and "By Issue" menus. I would also like to incorporate a search page that returns results for specified criteria based on the full text of the PDF files.

    I am looking for any template suggestions, design advice, etc. I tried Joomla and can't seem to figure out the PDF part. I'm willing to restart from scratch. Thanks so much - Paul

  3. #2
    Unpaid WDF Intern TheGAME1264's Avatar
    Join Date
    Dec 2002
    Location
    Not from USA
    Posts
    14,483
    Member #
    425
    Liked
    2783 times
    Since the content isn't likely to change, you'd probably be best off extracting the text from the PDFs, storing it in a database, and then searching the text in the database. You can display the PDFs based on the text returned, since the text returned will be identical to what's in the PDFs...assuming they're formatted correctly.

    How exactly you do this will depend on what programming languages you're comfortable with and what you want to use for a CMS. There's a tool called Sphider for PHP that allegedly will do what I've outlined, although since I'm not a PHP (and by extension not a Joomla) guy, I've never tried it.
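
    A minimal sketch of the extract-store-search idea above, written in Python with SQLite purely for illustration (the thread does not settle on a language or database). It assumes the text has already been pulled out of each PDF by some separate tool; the table name `issue_text` and the functions `build_index` and `search` are made up for this example.

```python
import sqlite3


def build_index(conn, pages):
    """Store (pdf_filename, extracted_text) pairs in a simple table.

    Assumption: the text was extracted from each PDF beforehand.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS issue_text ("
        "filename TEXT PRIMARY KEY, body TEXT NOT NULL)"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO issue_text (filename, body) VALUES (?, ?)",
        pages,
    )
    conn.commit()


def search(conn, term):
    """Return the filenames of PDFs whose stored text contains `term`."""
    # Escape LIKE wildcards so the user's input is matched literally.
    escaped = (
        term.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")
    )
    rows = conn.execute(
        "SELECT filename FROM issue_text "
        "WHERE body LIKE ? ESCAPE '\\' ORDER BY filename",
        (f"%{escaped}%",),
    )
    return [row[0] for row in rows]
```

    The search page would then link to each returned filename. A plain substring match like this is the simplest option; with 10k issues a full-text index would likely be worth looking at, if the host's database supports one.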

  4. #3
    WDF Staff mlseim's Avatar
    Join Date
    Apr 2004
    Location
    Cottage Grove, Minnesota
    Posts
    7,720
    Member #
    5580
    Liked
    718 times
    PHP can serve a PDF file in such a way that the browser shows the user an "open or save as" dialog box.
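
    For reference, that dialog is normally triggered by the HTTP response headers rather than anything PHP-specific. Here is a rough sketch of the headers involved, shown in Python only as an illustration; in PHP the same values would be sent with header() calls before the file contents. The function name `pdf_download_headers` is invented for this example.

```python
def pdf_download_headers(filename, size_bytes, inline=False):
    """Build response headers for sending a PDF to the browser.

    "attachment" asks the browser to offer a save/open dialog;
    "inline" asks it to display the PDF in the window if it can.
    """
    disposition = "inline" if inline else "attachment"
    # Strip characters that would break the quoted filename or the header.
    safe_name = filename.replace('"', "").replace("\r", "").replace("\n", "")
    return {
        "Content-Type": "application/pdf",
        "Content-Disposition": f'{disposition}; filename="{safe_name}"',
        "Content-Length": str(size_bytes),
    }
```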

    If you renamed your PDF files so the filenames were relevant to the issue, that would make it easy.

    Example: Issue_13_2015_02.pdf

    Issue 13
    year 2015
    month 02

    Now, a search can be done directly on the filenames, using wildcards ( * ) and PHP.

    Doing it this way would not require a database at all unless the actual content of each paper needs to be searched.
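
    A small sketch of the filename approach, assuming every file follows the Issue_<issue>_<year>_<month>.pdf pattern from the example above. It is written in Python for illustration; PHP's glob() accepts the same kind of wildcard pattern. The helper names (`parse_name`, `find_issues`, `group_by_year_month`) are made up here.

```python
import fnmatch
import re
from collections import defaultdict

# Assumed naming scheme: Issue_<issue>_<year>_<month>.pdf
NAME_RE = re.compile(r"^Issue_(\d+)_(\d{4})_(\d{2})\.pdf$")


def parse_name(filename):
    """Split a filename into issue, year and month, or return None."""
    match = NAME_RE.match(filename)
    if match is None:
        return None
    issue, year, month = match.groups()
    return {"issue": int(issue), "year": int(year), "month": int(month)}


def find_issues(filenames, issue="*", year="*", month="*"):
    """Wildcard search: any part left as "*" matches anything."""
    pattern = f"Issue_{issue}_{year}_{month}.pdf"
    return sorted(fnmatch.filter(filenames, pattern))


def group_by_year_month(filenames):
    """Build a year -> month -> [filenames] mapping for the menus."""
    menu = defaultdict(lambda: defaultdict(list))
    for name in sorted(filenames):
        info = parse_name(name)
        if info is not None:
            menu[info["year"]][info["month"]].append(name)
    return menu
```

    On a real site the list of filenames would come from a directory listing of the PDF folder. The grouping function is one way to feed the "By Year" and "By Month" menus from the original question without a database.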

    I know of no way to actually load and search the text content of a PDF file ... at least not with PHP. If there were a server-side script or class to search PDF content, it would require a lot of CPU processing. Maybe more than a shared webhost like GoDaddy would allow? That's a lot of file content to search.

    EDIT:
    Never heard of Sphider, that might be interesting ... look into that.
    I saw this old post from 2005: http://php.livejournal.com/295413.html

    Sphider may require something installed on the server (Apache), which GoDaddy won't support. It also looks like memory usage and PHP configuration would be a deal-breaker on GoDaddy.
    Last edited by mlseim; Feb 09th, 2015 at 03:07 PM.


