A Case For Validating HTML

My 2-year-old Samsung LCD monitor died on me a few weeks ago, and little did I know that this would lead me to investigate a rendering problem and dig through Firefox's HTML parser code.  But that's what I did, and the experience reinforced my belief that the HTML markup used on websites needs to be validated.

The story begins simply enough, with my LCD monitor going black on me.  I first wanted to check whether it was still under warranty, so I diligently went to Samsung's website.  I created an account and then looked up my specific model and serial number.  Fortunately, my monitor still had a few months' worth of warranty remaining, so after registering the monitor I proceeded to create a service request for it online.  That's when I started having problems.  Upon clicking the Product Service Request link I was presented with... nothing!

[Screenshot: Samsung Service Request Page]

The page should have displayed a service request form similar to the one below:

[Screenshot: Samsung Good Form]

The first screenshot was taken in Firefox 3.6 and the second in IE 8.  So I temporarily put aside my quest to have my monitor serviced and began investigating why Firefox was not displaying the service request form.  I turned to my trusted Firebug and examined the console, where I saw the error: document.repairForm is undefined.  I then examined the HTML within Firebug and noticed that the page I was loading contained an IFRAME, and that its source loaded a page with a script in the HEAD and a hidden form with ID repairForm in the BODY.  Examining the script, I saw the following code:

	document.repairForm.submit();

So the error made sense now.  Because the script was in the HEAD, it was parsed and executed before the rest of the IFRAME page had loaded and the DOM had been built.  Since it referenced the form before the form existed in the DOM, the script threw an error and the form was never submitted.  Submitting that form is what retrieves a second page into the IFRAME: in this case, the service request page where a customer fills out a form for their particular Samsung device.
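To illustrate the timing problem, here is a minimal sketch of a submit call that does not depend on where the script ends up in the document.  The helper name submitWhenReady is my own invention and is not from Samsung's page; it simply waits for the DOM to be built before looking up the form by its ID.

```javascript
// Hypothetical helper (not from Samsung's page): submit a form by ID,
// but only after the document has finished parsing.
function submitWhenReady(doc, formId) {
  function trySubmit() {
    const form = doc.getElementById(formId);
    if (!form) {
      return false; // the form really is missing; nothing to submit
    }
    form.submit();
    return true;
  }

  if (doc.readyState === "loading") {
    // The parser is still running: wait until the DOM has been built.
    doc.addEventListener("DOMContentLoaded", trySubmit);
    return false;
  }
  return trySubmit();
}

// In a browser this would be called as below; the guard keeps the
// snippet harmless when it is run outside a browser.
if (typeof document !== "undefined") {
  submitWhenReady(document, "repairForm");
}
```

This is written in present-day JavaScript.  IE 8 has neither addEventListener nor DOMContentLoaded, so a page from that era would have needed a fallback such as a window onload handler.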

So that explained the behavior in Firefox, but why wasn't I seeing the same thing in IE?  Loading the same page in IE, I noticed that it kept the script after the form, so by the time the script executed the form was already available in the DOM.  Examining the raw HTML source for the IFRAME in both Firefox and IE, I confirmed that the web server was sending an incomplete page with no HTML, HEAD, or BODY tags.  I also confirmed that, in that source, the form markup did in fact appear before the script code.  So why was Firefox changing the document order of the elements and causing the page to break?

I was curious to know what Firefox did when it was loading a page, so I proceeded to download the source code for v3.6.9 and set up my development environment to compile a debug version.  By stepping through the code as it processed a page similar in structure to my problem page, I saw that under certain conditions scripts could be rearranged relative to their original position in the source document if the HEAD was implied.  In this particular case, since the form contained only hidden input elements, it was considered misplaced and would not trigger a BODY to be created.  The script following it would trigger a HEAD to be created and would be moved there (recall that the original source did not contain a HEAD or BODY element, and therefore those would need to be inferred).  Once Firefox reached the end of the document, it would insert the misplaced content (i.e. the form previously seen) into a BODY element it would create.
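The reordering is easier to follow as a toy model.  The sketch below is not Firefox's parser code; it only mimics the three rules I described above, and the function name, the element "kinds", and the handling of any other content (which I assume opens the BODY and flushes the held-back content first) are all my own simplifications.

```javascript
// Toy model only: mimics the reordering described above, not Gecko's real parser.
// Each input element is { kind, name }, listed in source order.
function buildImpliedDocument(elements) {
  const head = [];
  const body = [];
  const misplaced = []; // content held back until a BODY exists
  let bodyOpen = false;

  for (const el of elements) {
    if (bodyOpen) {
      body.push(el.name);
    } else if (el.kind === "script") {
      // A script seen before any BODY triggers the implied HEAD and lands there.
      head.push(el.name);
    } else if (el.kind === "hidden-only-form") {
      // A form with only hidden inputs is "misplaced" and does not open the BODY.
      misplaced.push(el.name);
    } else {
      // Assumption: any other content opens the BODY, held-back content first.
      bodyOpen = true;
      body.push(...misplaced, el.name);
      misplaced.length = 0;
    }
  }

  // End of document: whatever is still held back goes into a created BODY.
  body.push(...misplaced);
  return { head, body };
}

// Source order on the problem page: form first, then script.
const result = buildImpliedDocument([
  { kind: "hidden-only-form", name: "repairForm" },
  { kind: "script", name: "submitScript" },
]);
console.log(result); // the script ends up in head, the form in body
```

With the script now in the HEAD and the form in the BODY, the script runs before the form exists, which is exactly the failure I saw in Firebug.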

I could have saved myself a bit of trouble by searching Firefox's bug database.  It turns out this particular problem had already been identified and logged under the following bug entry: http://bugzilla.mozilla.org/show_bug.cgi?id=178258  It was still interesting to look under Firefox's hood to investigate this problem.  One could argue that the problem lies with the browser; however, looking through the code I've come to appreciate how complex parsing a document can be, especially one that does not comply with the standard.  There are simply too many edge cases to guard against, and leaving it to the browser to infer structure is not prudent and can lead to problems like the one on this particular website.  I've since notified the site's webmaster, but have not heard anything yet.  I checked the website recently, and it was still broken at the service request page.
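Even a crude check would have flagged this page before it shipped.  The sketch below is not a validator and is no substitute for one; it is a hypothetical regex-based helper of my own that only reports when a page omits its DOCTYPE or its HTML, HEAD, and BODY tags, which is precisely the structure the browser was left to infer here.  The sample fragment is made up to resemble the page described above.

```javascript
// Crude sketch: reports which structural pieces are absent from an HTML string.
// It is a regex check for illustration, not a real validator.
function findMissingStructure(html) {
  const missing = [];
  if (!/^\s*<!doctype\s+html/i.test(html)) {
    missing.push("doctype");
  }
  for (const tag of ["html", "head", "body"]) {
    // Match "<tag>" or "<tag attr=...>", but not longer names such as "<header>".
    const openTag = new RegExp("<" + tag + "(\\s[^>]*)?>", "i");
    if (!openTag.test(html)) {
      missing.push(tag);
    }
  }
  return missing;
}

// Hypothetical fragment shaped like the one described above.
const fragment =
  '<form id="repairForm"><input type="hidden" name="model"></form>' +
  "<script>document.repairForm.submit();</script>";

console.log(findMissingStructure(fragment)); // lists doctype, html, head and body
```

A real validator checks far more than this, so treat the helper as a smoke test at best.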

If you'd like me to investigate a particular problem you're having, just drop me a line.  Until then, keep IT simple!