[RndTbl] Scrape active web page

Dan Martin ummar143 at shaw.ca
Thu Mar 22 15:04:53 CDT 2012


Problem: A web page / application presents the information I need, but in the wrong format.

The user interacts with the web page using check boxes and data fields. When the information appears complete, the user presses a "submit" button and the information is formatted as a PDF to be printed (call this format 1).

This printout is needed, but the same information (possibly with a few additions) must also be printed in a different format (format 2).

The solution I envision: run this web app in the browser as usual and interact with it until the information is correct. The user then launches a simple app that finds the open web page and scrapes it, producing output in format 2. The user can then press the submit button on the page to produce format 1.
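For illustration, here is a minimal Python sketch of the scrape-and-reformat step only. It assumes the page's current state has already been captured as HTML text in which the user's entries appear as `value` and `checked` attributes. Getting that snapshot out of the running browser is exactly the open question here and is not shown. Note that text typed into a field normally changes the DOM `value` property, not the attribute, so a plain "view source" copy may not contain it. The field names, sample markup, and the "name: value" layout used for format 2 are hypothetical, and only `<input>` elements are handled (no `<select>`, `<textarea>`, or radio buttons).

```python
from html.parser import HTMLParser


class FormFieldScraper(HTMLParser):
    """Collect name -> value pairs from <input> elements in an HTML snapshot.

    Assumption: the snapshot's attributes reflect the user's current entries
    (value="..." for text fields, a bare 'checked' attribute for ticked boxes).
    """

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        a = dict(attrs)
        name = a.get("name")
        if not name:
            return  # unnamed inputs (e.g. a submit button) carry no form data
        kind = (a.get("type") or "text").lower()
        if kind == "checkbox":
            # A bare 'checked' attribute parses as ("checked", None), so test key presence.
            self.fields[name] = "checked" in a
        elif kind in ("text", "hidden", "date", "number"):
            self.fields[name] = a.get("value") or ""


def scrape(html_text):
    """Return the form fields found in html_text."""
    parser = FormFieldScraper()
    parser.feed(html_text)
    parser.close()
    return parser.fields


def render_format2(fields):
    """Render fields as a sorted 'name: value' text report (hypothetical format 2)."""
    lines = []
    for name in sorted(fields):
        value = fields[name]
        if isinstance(value, bool):
            value = "yes" if value else "no"
        lines.append(f"{name}: {value}")
    return "\n".join(lines)


if __name__ == "__main__":
    snapshot = """
    <form>
      <input type="text" name="name" value="J. Smith">
      <input type="checkbox" name="urgent" checked>
      <input type="checkbox" name="follow_up">
      <input type="submit" value="Submit">
    </form>
    """
    print(render_format2(scrape(snapshot)))
```

Keeping the parsing (`scrape`) separate from the layout (`render_format2`) means format 2 could be changed, or written out as something other than plain text, without touching the scraping code.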

At home on a Unix-based system, I would do this with FireWatir and Ruby, starting an older version of Firefox with the -jssh option. At work, this would be awkward.

I saw a demonstration of PHP doing both the scraping and the presentation, but it seems to be server-oriented. Running Apache or other server software is out of the question. I need a client app that is simple to install and runs on Windows.

A programmable browser would be ideal.  Does anyone know of one that is multi-platform and can be installed without special services / privileges?  Has anyone used XUL for something like this?

Any suggestions are appreciated.

-Dan

Dan Martin
GP Hospital Practitioner
Computer Scientist
ummar143 at shaw.ca
(204) 831-1746
answering machine always on



