Thursday, September 06, 2007

Converting a web page to plain text, such as code samples in HTML

I came across a very interesting article on using the Lynx or Linemode console web browsers to convert an HTML-formatted page to plain text. This comes up more often than I would like, since I often find code snippets online in HTML form that have been syntax highlighted. However, if I just copy them to the clipboard and then try to paste into vim in a terminal, the spacing gets all messed up. I'm not sure if I'm doing something wrong there, but this seems to be a good alternative.

For example, this page, PHP_GNUPlot.htm, shows some PHP code I want to use internally. It's syntax highlighted. I'd prefer to have a plain text version that I can scp to my web development box and just start testing.

This is where lynx comes in. Right on my web development box, I run the command:

lynx -dump "some-URL" > my-text
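
If you do this often, the command above can be wrapped in a small shell function. This is only a rough sketch: page_to_text is a made-up helper name, the URL in the usage comment is a placeholder, and the -nolist flag is my own addition, on the assumption that you don't want the numbered list of links lynx appends to a dump.

```shell
#!/bin/sh
# Sketch: dump a web page to plain text with lynx.
# page_to_text is a hypothetical helper name, not part of lynx.
page_to_text() {
    url=$1
    out=$2
    if [ -z "$url" ] || [ -z "$out" ]; then
        echo "usage: page_to_text URL OUTPUT-FILE" >&2
        return 2
    fi
    if ! command -v lynx >/dev/null 2>&1; then
        echo "lynx is not installed" >&2
        return 1
    fi
    # -dump   : write the rendered page to stdout and exit
    # -nolist : leave out the numbered list of links at the end of the dump
    lynx -dump -nolist "$url" > "$out"
}

# Example (placeholder URL and file name):
# page_to_text "http://example.com/PHP_GNUPlot.htm" gnuplot.txt
```

The dump contains the text of the whole page, not just the code sample, so you will probably still need to trim the surrounding text in an editor.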

Update: I forgot that in Firefox, IE, and Mosaic you can just go to File/Save As to save the page as plain text, which solves the problem right there. Either way, it's still nice to have a way to do this strictly from the terminal, so I'll leave this article up.



Resources:

http://www.w3.org/Tools/html2things.html
