Using Sarissa to Scrape HTML with Ajax
I've been playing around with more disruptive technology lately: Bookmarklets. What is a Bookmarklet? That's where you bookmark a hunk of JavaScript rather than a URL. It allows you to execute JavaScript within the context of the currently loaded page.
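To make that concrete, here is a minimal sketch of how a snippet of JavaScript gets packaged into a bookmark URL. The helper name makeBookmarklet is made up for illustration; the only real requirement is the javascript: scheme in front of the code.

```javascript
// Wrap a snippet of JavaScript so it can be saved as a bookmark URL.
// makeBookmarklet is a hypothetical helper name, not part of any library.
function makeBookmarklet(source) {
  // The immediately invoked function keeps the snippet's variables out of
  // the page's global scope. The trailing "void 0" makes the whole
  // expression evaluate to undefined, so the browser does not replace the
  // page with the snippet's return value.
  var wrapped = "(function(){" + source + "})();void 0;";
  // Encode characters that are not safe inside a URL.
  return "javascript:" + encodeURIComponent(wrapped);
}

// Example: a bookmarklet that shows the title of whatever page is loaded.
var href = makeBookmarklet("alert(document.title);");
```

Saving the resulting string as the address of a bookmark and clicking it runs the snippet against the page you happen to be viewing.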
That makes for a different sort of BJAX -- instead of Browser extensions and Ajax, we have Bookmarklets and Ajax. Why disruptive? They let you manipulate a third party's website or application. I'll be publishing some cool Bookmarklet stuff in a few days -- an example that will make use of the dynamic script tag. If an online service, like Yahoo, provides JSON Web services that can be run as scripts, you can use their own services to manipulate their own site. For example, you could include weather information on the Yahoo movie showtimes page by means of a Bookmarklet and Yahoo's weather Web service. That way you never forget to bring an umbrella to the movies if it's raining.
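The dynamic script tag trick might look roughly like this. The function name loadJsonp and the "callback" query parameter are assumptions for illustration; a real service documents its own URL and parameter name.

```javascript
// Load a JSON service by adding a <script> element to the page.
// `doc` is the page's document. The "callback" parameter name is an
// assumption; each service chooses its own.
function loadJsonp(doc, serviceUrl, callbackName) {
  var separator = serviceUrl.indexOf("?") === -1 ? "?" : "&";
  var script = doc.createElement("script");
  script.type = "text/javascript";
  script.src = serviceUrl + separator + "callback=" +
    encodeURIComponent(callbackName);
  // Appending the element makes the browser fetch and run the script,
  // which is expected to call the named global function with the data.
  doc.getElementsByTagName("head")[0].appendChild(script);
  return script;
}
```

In a Bookmarklet you would first define a global function, say showWeather, and then call loadJsonp(document, url, "showWeather") with the service's URL. Because the response is loaded as a script rather than through XMLHttpRequest, it can come from a different domain than the page.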
But what if your target site isn't as friendly or as accommodating as Yahoo? Well, these sites do generally still provide structured data services over HTTP -- their HTML webpages, dynamic or otherwise. Once you've used your Bookmarklet to insert the necessary JavaScript into the third-party webpage, you can use a simple XMLHttpRequest to download other HTML from the site and, after appropriate processing, insert it into the target page. It's that "appropriate processing" that can be a bit difficult. Websites generally don't return their HTML with a Content-Type of text/xml, and thus you won't find a nicely parsed XML DOM sitting in responseXML. You'll have to pry the data out of responseText instead.
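Here is a rough sketch of that download step. The names fetchMarkup and extractBody are hypothetical, and the createXhr parameter is only there so the function can be tried outside a browser. Since the script runs inside the third-party page, the request is same-origin as long as it targets that same site.

```javascript
// Fetch a page from the current site and hand its raw markup to a callback.
// `createXhr` is a hypothetical factory parameter; when it is omitted the
// browser's own XMLHttpRequest is used.
function fetchMarkup(url, onLoaded, createXhr) {
  var xhr = createXhr ? createXhr() : new XMLHttpRequest();
  xhr.open("GET", url, true);
  xhr.onreadystatechange = function () {
    // readyState 4 means the response has arrived in full.
    if (xhr.readyState === 4 && xhr.status === 200) {
      // responseXML is usually empty for text/html, so use the raw text.
      onLoaded(xhr.responseText);
    }
  };
  xhr.send(null);
}

// Pull the contents of <body> out of the markup with a regular expression.
// This is a rough heuristic, not an HTML parser; if no <body> element is
// found, the markup is returned unchanged.
function extractBody(markup) {
  var match = /<body[^>]*>([\s\S]*?)<\/body>/i.exec(markup);
  return match ? match[1] : markup;
}
```

A regular expression is the bluntest possible tool for the "appropriate processing" step, and it only gets you a string, not a DOM you can query.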
There are enough cross-browser differences and sticky wickets to make this kind of XML processing unpleasant. Enter Sarissa, a very handy cross-browser JavaScript library for XML processing. It allows you to do things like the following:
    // xhr is a completed XMLHttpRequest; elem is the element in the page to fill.
    var serializer = new XMLSerializer();
    var doc = (new DOMParser()).parseFromString(xhr.responseText, "text/xml");
    var content = doc.getElementsByTagName("div")[0];
    elem.innerHTML = serializer.serializeToString(content);
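XPath can also be used here to select elements. One caveat: parsing with "text/xml" only works when the page is well-formed XML, such as valid XHTML; anything less yields a parse error instead of a usable DOM. Nothing earth-shattering here, but enough convenience for XML processing to let you focus on writing functionality rather than cross-browser support.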
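As a sketch of the XPath route, the function below uses the standard DOM Level 3 XPath call, document.evaluate, rather than any Sarissa-specific method. The name selectFirst is hypothetical, and it assumes the parsed document supports evaluate, which older versions of Internet Explorer do not.

```javascript
// Return the first node matching an XPath expression, or null if none does.
// `doc` is assumed to be a parsed XML document that supports
// DOM Level 3 XPath (document.evaluate).
function selectFirst(doc, expression) {
  // 9 is the numeric value of XPathResult.FIRST_ORDERED_NODE_TYPE.
  var FIRST_ORDERED_NODE_TYPE = 9;
  var result = doc.evaluate(expression, doc, null, FIRST_ORDERED_NODE_TYPE, null);
  return result.singleNodeValue;
}
```

With the parsed document from the snippet above, selectFirst(doc, "//div") would stand in for the getElementsByTagName call. If the page declares the XHTML namespace, an unprefixed name like div will not match, so you would need a namespace resolver or an expression such as //*[local-name()='div'].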
Topics: Javascript Libraries
Comments: 1 so far
hi, can u provide some working example!
it will be very useful.
Comment by sagar, Wednesday, November 29, 2006 @ 1:32 am