Wednesday, January 7, 2009

Small addition to John Resig's Bringing the Browser to the Server

After reading John Resig's article (http://ejohn.org/projects/bringing-the-browser-to-the-server), which was written over a year ago, I wanted to try out some of the things he demonstrated.
By combining Mozilla's Rhino project with a few JavaScript files, John was able to do
a number of useful things, like automated JavaScript testing and web screen scraping.
He also outlined some pseudocode for creating a web app environment.
The one thing he ran out of time for was integrating an HTML parser into this setup.
At his suggestion, I have integrated the NekoHTML parser into it.
To use this setup, follow these steps:

1. Get the jQuery source code via SVN at http://jqueryjs.googlecode.com/svn/trunk/. Important: get REVISION 2302.
2. Get Rhino (source code optional), site: http://www.mozilla.org/rhino/
3. Get the NekoHTML source code, site: http://nekohtml.sourceforge.net/
4. Edit /jquery/jquery/build/runtest/env.js (starting at line 135):
window.DOMDocument = function(file){
    this._file = file;

    // OLD: parse the input with the standard Java XML parser
    // this._dom = Packages.javax.xml.parsers.
    //     DocumentBuilderFactory.newInstance()
    //     .newDocumentBuilder().parse(file);

    // NEW: parse the input with the NekoHTML DOM parser instead
    var parser = new Packages.org.cyberneko.html.parsers.DOMParser();
    var source = new Packages.org.apache.xerces.xni.parser.XMLInputSource(null, null, null, file, "UTF8");
    parser.parse(source);
    this._dom = parser.getDocument();

    // Register the wrapper so the Java DOM node maps back to this object
    if ( !obj_nodes.containsKey( this._dom ) )
        obj_nodes.put( this._dom, this );
};

5. Run your JavaScript file via the Rhino Debugger app from the command line:
java -cp build/nekohtml.jar;build/nekohtmlXni.jar;build/xml-apis.jar;build/xercesImpl.jar;build/xercesSamples.jar;build/js.jar org.mozilla.javascript.tools.debugger.Main filename.js
(The ; classpath separator shown here is for Windows. On Unix-like systems, separate the jars with : instead.)
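
For reference, here is a rough sketch of what filename.js might look like. It is not taken from John's article or from the jQuery repository: the helper name withPage, the paths to env.js, the test page and the built jquery.js, and the "a" selector are all assumptions for illustration, so adjust them to your checkout. It relies only on load() and print() from the Rhino shell and on the window.location / window.onload behavior that env.js emulates.

```javascript
// Hypothetical driver script, e.g. saved as filename.js.
// All paths and the selector below are assumptions for illustration.

// Load a page through env.js, then run `callback` once the page has loaded.
function withPage(envPath, pageUrl, callback) {
  load(envPath);              // Rhino shell built-in; env.js defines `window`
  window.onload = callback;   // register the handler before navigating
  window.location = pageUrl;  // env.js fetches the page and builds the DOM
}

// Only start when running under Rhino, where load() exists.
if (typeof load === "function") {
  withPage("build/runtest/env.js", "test/index.html", function () {
    load("dist/jquery.js");   // assumed location of the built jQuery file
    // Print the text of every link found on the page.
    jQuery("a").each(function () {
      print(jQuery(this).text());
    });
  });
}
```

Setting window.onload before window.location is a cautious choice: the handler is already in place whether env.js ends up loading the page synchronously or on a background thread.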

Note: Some further updates to env.js will be necessary to get it to run the HTML parser outside the Rhino Debugger app.