How to render an HTML file offline?
I have a collection of HTML files gathered from a website using wget. Each file name is of the form details.php?id=100419&cid=13%0d, where id and cid vary. Portions of the HTML files contain articles in an Asian language (Unicode text). My intention is to extract the Asian-language text only. Dumping the rendered HTML using a command-line browser is the first step I have thought of, since it would eliminate most of the frills.
The problem is that I cannot dump the rendered HTML from a local file (using, say, w3m -dump). Dumping works only if I point the browser (at the command line) at a properly formed URL: http://<blah-blah>/<filename>.
But that way I would have to spend time downloading the files once again from the web. How do I get around this, and what other tools could I use?
w3m -dump <filename>
complains, saying: w3m: can't load details.php?id=100419&cid=13%0d.
file <filename>
shows: details.php?id=100419&cid=13%0d: Non-ISO extended-ASCII HTML document text, long lines, CRLF, CR, LF, NEL line terminators
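One possible way around this, sketched in Python, is to skip the browser and pull the text out of the saved files directly. The file is opened by its path on disk, so the ? and & in the name are never interpreted as part of a URL. This is only a rough sketch under assumptions: the candidate encodings below are guesses (the post does not say which Asian language or encoding is involved), the names TextExtractor, decode_with_fallback and non_ascii_text are made up for illustration, and "contains a non-ASCII character" is a crude stand-in for "is Asian-language text".

```python
import sys
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects visible text chunks, skipping <script> and <style> contents."""

    SKIPPED = {"script", "style"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if self._skip_depth == 0 and text:
            self.chunks.append(text)


def decode_with_fallback(raw, candidates=("utf-8", "euc-kr", "shift_jis")):
    """Try each candidate encoding in order (the list is an assumption).

    A legacy encoding can decode the wrong bytes without raising an error,
    so the candidate list should be adjusted to the actual site.
    """
    for encoding in candidates:
        try:
            return raw.decode(encoding)
        except UnicodeDecodeError:
            continue
    # Last resort: keep going, replacing undecodable bytes.
    return raw.decode("utf-8", errors="replace")


def non_ascii_text(html):
    """Return the text chunks that contain at least one non-ASCII character."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return [c for c in parser.chunks if any(ord(ch) > 127 for ch in c)]


if __name__ == "__main__":
    # File names are passed as plain paths; quote them in the shell.
    for path in sys.argv[1:]:
        with open(path, "rb") as fh:
            html = decode_with_fallback(fh.read())
        for chunk in non_ascii_text(html):
            print(chunk)
```

If this were saved as, say, extract_text.py (a hypothetical name), it could be run as python3 extract_text.py 'details.php?id=100419&cid=13%0d', with the file name in single quotes so the shell does not interpret the ? and &. The filter will also keep stray chunks that merely contain symbols such as ©, so the output may still need some cleanup.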