  #1
    AlphaMare (WDF Staff) | Joined Oct 2009 | Montreal, Canada | 4,570 posts | Member #20277 | Liked 878 times

    Question: Stripping tags from code

    OK - this is a fairly strange question, but does anyone have an idea on how to quickly strip all the tags from a coded HTML page?

    I have a new client who has come to me to fix a mess that was created by someone else in some strange software (not DW - even worse for bloat and unnecessary inline coding, if you can imagine that).

    The site consists of mainly text, and this software has inserted so much inline styling crap (spans, <em>, stuff like that) that it is taking me forever to cut out all the tags so I can copy and paste the text into a decent page.

    I did ask if they had the original text files, but they don't - all they have are the HTML files with all these really cruddy tags scattered throughout the paragraphs and even in each sentence. I have tried search and replace, but almost every tag has some unique content that prevents the efficient use of that.
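    Ordinary search-and-replace stalls because every tag carries different attributes, but a regular-expression replace can match a tag regardless of what is inside it: the pattern <[^>]*> replaced with nothing removes them all in one pass, and editors with a regex search mode (Notepad++ has one) accept it directly. Below is a minimal JavaScript sketch of the same idea. The function name stripAllTags is hypothetical, only a handful of entities are decoded, and because a regex is not an HTML parser it assumes reasonably well-formed markup (a > inside an attribute value, for example, will trip it up).

```javascript
// Hypothetical helper: remove every tag from an HTML string and return plain text.
// Caveat: regular expressions do not truly parse HTML; this is a rough cleanup tool.
function stripAllTags(html) {
  return html
    // Drop comments and the contents of <script>/<style>, which are not page text.
    .replace(/<!--[\s\S]*?-->/g, "")
    .replace(/<(script|style)\b[^>]*>[\s\S]*?<\/\1>/gi, "")
    // Remove every remaining tag, whatever attributes it carries.
    .replace(/<[^>]*>/g, "")
    // Decode a few common entities (&amp; last, so "&amp;lt;" is not double-decoded).
    .replace(/&nbsp;/g, " ")
    .replace(/&lt;/g, "<")
    .replace(/&gt;/g, ">")
    .replace(/&quot;/g, '"')
    .replace(/&#39;/g, "'")
    .replace(/&amp;/g, "&")
    // Collapse runs of whitespace left behind by the removed markup.
    .replace(/\s+/g, " ")
    .trim();
}

const sample =
  '<p class="x1"><span style="font-weight:bold">Hello</span>, <em>world</em> &amp; friends</p>';
console.log(stripAllTags(sample)); // Hello, world & friends
```

    Because the tags are deleted in place, the text keeps its source order; the trade-off is that paragraph breaks are flattened into single spaces.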

    I really don't want to have to re-type the whole thing, but that may actually be faster than hunting down the tags and removing them.

    Anybody have any ideas about how to speed things up?
    Good design should never say "Look at me!"
    It should say "Look at this." ~ David Craib


    http://digitalinsite.ca ~ my current site . . info@digitalinsite.ca ~ my email

    If you feel that someone's post helped you fix your problem, answered your question, or just made you feel better, feel free to "Like" their post. The "Like" link is at the bottom right of each post, alongside the "reply" link. And if you are being helped here, try to help someone else - pass it on!

  #2
    Ronald Roe (Senior Member) | Joined Mar 2011 | Oklahoma City | 3,141 posts | Member #27197 | Liked 959 times
    What about HTML Tidy?

    EDIT: Here's a Stackoverflow article and another HTML cleaning app that may be of use.
    Last edited by Ronald Roe; Jan 09th, 2014 at 01:43 PM.
    Ron Roe
    Web Developer
    "If every app were designed using the same design template, oh wait...Bootstrap."

  #3
    AlphaMare (WDF Staff)
    HTML Tidy helps, but there are still a lot of tags left to deal with. I'm really looking for something that will strip ALL of the tags and leave plain text.

  #4
    TheGAME1264 (Unpaid WDF Intern) | Joined Dec 2002 | Not from USA | 14,483 posts | Member #425 | Liked 2783 times
    What about the PHP strip_tags() function?
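    PHP's strip_tags() also accepts an optional second argument listing tags to keep, which is handy here: the <p> markers could survive while the <span>/<em> clutter goes. The other code in this thread is JavaScript, so here is a rough JavaScript analogue of that idea, not PHP's actual implementation. The name stripTagsExcept is hypothetical, and unlike PHP (which leaves allowed tags untouched, attributes included) this version also drops attributes from the tags it keeps, so inline styles go too.

```javascript
// Hypothetical JavaScript analogue of PHP's strip_tags($html, $allowed):
// tags named in `allowed` are kept (rewritten without attributes), all others removed.
// Note: comments such as <!-- ... --> are not matched and pass through unchanged.
function stripTagsExcept(html, allowed) {
  const keep = new Set(allowed.map((name) => name.toLowerCase()));
  // Match an opening or closing tag, capturing the optional slash and the tag name.
  return html.replace(/<(\/?)([a-zA-Z][a-zA-Z0-9]*)\b[^>]*>/g, (tag, slash, name) => {
    const lower = name.toLowerCase();
    // Keep an allowed tag in bare form, e.g. <p style="..."> becomes <p>.
    return keep.has(lower) ? `<${slash}${lower}>` : "";
  });
}

const messy = '<p style="margin:0"><span class="s1">One</span> <em>line</em>.</p>';
console.log(stripTagsExcept(messy, ["p"])); // <p>One line.</p>
```

    Keeping bare <p> tags like this gives markup that can be pasted straight into an HTML-aware editor while the inline styling is discarded.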
    If I've helped you out in any way, please pay it forward. My wife and I are walking for Autism Speaks. Please donate, and thanks.

    If someone helped you out, be sure to "Like" their post and/or help them in kind. The "Like" link is on the bottom right of each post, beside the "Share" link.

    My stuff (well, some of it): My bowling alley site | Canadian Postal Code Info (beta)

  #5
    AlphaMare (WDF Staff)
    Thanks for the suggestion - I'll look into that.
    But right now, given that I have not written any PHP in ages, it is looking like it'll be faster to just re-type the whole f-ing thing.

  #6
    Ronald Roe (Senior Member)
    Well, the output is exactly what you're asking for (just the text), but I ran up this little bad boy with jQuery right fast:
    Code:
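    // Paste into the browser's JS console on a page that already loads jQuery.
    // Each element inside <body> is replaced by its own children, so the text
    // stays where it was and only the tags around it disappear.
    $("body *").each(function () {
      $(this).contents().unwrap();
    });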
    You can just paste that in the JS console. It works, I suppose, but perhaps a little...too stripped down.

    EDIT: Tried to create a bookmarklet, but the forum software isn't parsing the URL properly. If you set the target of a bookmark in your browser to the following code, you can create a bookmarklet that'll strip the tags with one click.

    Code:
    javascript:(function(){document.body.appendChild(document.createElement('script')).src='http://cdn.roedesigns.com/js/stripper.js';})();
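    The bookmarklet above loads its code from a script hosted at cdn.roedesigns.com. As a hedged alternative, the same unwrap idea can be written with plain DOM calls (querySelectorAll and Element.replaceWith), so the bookmarklet needs neither jQuery nor a network request. In the sketch below, unwrapAllInPage and toBookmarklet are hypothetical names; the first function is meant to run inside a browser page, while the second only builds the javascript: URL string to use as a bookmark's target.

```javascript
// Runs in the browser page: replace every element inside <body> with its own
// child nodes. querySelectorAll returns a static list in document order, so
// nested elements are still visited after their parent has been unwrapped.
// Childless elements such as <img> or <input> are left in place.
function unwrapAllInPage() {
  document.querySelectorAll("body *").forEach(function (el) {
    if (el.firstChild) {
      el.replaceWith(...el.childNodes);
    }
  });
}

// Hypothetical helper: turn a function into a one-click bookmarklet URL.
// Percent-encoding keeps characters such as % and spaces from being misread.
function toBookmarklet(fn) {
  return "javascript:" + encodeURIComponent("(" + fn.toString() + ")();");
}

console.log(toBookmarklet(unwrapAllInPage));
```

    As with the jQuery version, images and form inputs stay on the page, and the text of any script tags is left behind as plain text.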
    Last edited by Ronald Roe; Jan 09th, 2014 at 05:23 PM.

  #7
    AlphaMare (WDF Staff)
    Thankee, both of you! (Ron, Game)

    I actually found this - it works to get ALL the tags out, and even though the paragraph order ends up a bit catawampus due to nested divs, it makes things much easier.
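    One way to keep paragraphs apart, and in the order they appear in the file, is to strip the tags in place after turning block-level tags into blank lines. A minimal sketch, assuming the content is divided with <p>, <div>, <br>, <li> or heading tags; htmlToParagraphs is a hypothetical name, only two entities are decoded, and the usual regex caveats apply (it is not a real HTML parser, and <pre> formatting gets flattened).

```javascript
// Hypothetical helper: HTML in, plain text out, one paragraph per block element,
// in the same order the text appears in the source file.
function htmlToParagraphs(html) {
  return html
    // Source line breaks carry no meaning in HTML, so flatten all whitespace first.
    .replace(/\s+/g, " ")
    // Comments and script/style contents are not page text.
    .replace(/<!--.*?-->/g, "")
    .replace(/<(script|style)\b[^>]*>.*?<\/\1>/gi, "")
    // A <br> becomes a single line break.
    .replace(/<br\s*\/?>/gi, "\n")
    // Block-level tags, opening or closing, mark a paragraph boundary.
    .replace(/<\/?(p|div|li|h[1-6]|blockquote|tr)\b[^>]*>/gi, "\n\n")
    // Every other tag (span, em, font, ...) is removed outright.
    .replace(/<[^>]*>/g, "")
    // Decode two common entities, then squeeze repeated spaces.
    .replace(/&nbsp;/g, " ")
    .replace(/&amp;/g, "&")
    .replace(/ {2,}/g, " ")
    // Split on blank lines, trim each line, and drop empty paragraphs.
    .split(/\n\s*\n/)
    .map((para) => para.split("\n").map((line) => line.trim()).join("\n").trim())
    .filter((para) => para.length > 0)
    .join("\n\n");
}

const page =
  '<div class="w"><div><p><span style="color:red">First</span> para.</p></div>\n' +
  '<p>Second <em>para</em>,<br>next line.</p></div>';
console.log(htmlToParagraphs(page));
// First para.
//
// Second para,
// next line.
```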

    I've got most of the pages done now...

    Ron - When I have a minute I'll get back to you and get you to tell me how to run that javascript on my local machine...

  #8
    Ronald Roe (Senior Member)
    Quote (originally posted by AlphaMare):
    "Ron - When I have a minute I'll get back to you and get you to tell me how to run that javascript on my local machine..."

    If the site has jQuery, you can just pull up the dev tools and paste the code into the JS console.

    If not, copy one of your browser's bookmarks and change the copied bookmark's target to the bookmarklet code I posted. Then, just click the newly created bookmark when you're on the page.

    I have to warn you, though. Every tag that contains text will be removed. Images and forms will still be there, anything inside script tags will be there, and there will be no formatting of any kind. If people find something like this useful outside your use case, which is the only one I've ever heard of, I'll take some time to make it more robust and actually present useful, formatted text.

  #9
    AlphaMare (WDF Staff)
    Quote (originally posted by Ronald Roe):
    "If the site has jQuery, you can just pull up the dev tools and paste the code into the JS console.

    If not, copy one of your browser's bookmarks and change the copied bookmark's target to the bookmarklet code I posted. Then, just click the newly created bookmark when you're on the page."

    OMG! Facepalm! I should have explained that I have not got this online yet, not even loading in WAMP - I'm just editing it in Notepad++ (and a bit in Word) to get some semblance of the content organized into paragraphs etc.

    Then it's going to be a cut-and-paste into WP (I heard that groan all the way here in Montreal, Game...) - that's the easiest way for this client to get a quick site up that they can maintain themselves. They're a non-profit and have just about no budget, but it's a good cause and I'm not really busy right now so I've decided to give them 2 free days to put together a site. The theme is a purchased one left over from another job a few months ago, and all I have to do is tweak the CSS and arrange the content areas so it suits the topic of the site.

  #10
    Ronald Roe (Senior Member)
    Quote (originally posted by AlphaMare):
    "OMG! Facepalm! I should have explained that I have not got this online yet, not even loading in WAMP - I'm just editing it in Notepad++ (and a bit in Word) to get some semblance of the content organized into paragraphs etc."

    The bookmarklet should still work.


All times are GMT -6.