Writer now supports XHTML* (emphasis on the asterisk)
Here’s a post for the markup experts out there.
For the latest version of Windows Live Writer, we’ve put a bit of work into generating more XHTML-friendly markup. What, specifically, have we done?
First of all, we created a separate code path for outputting XHTML from the WYSIWYG editor. You can specify on a per-blog basis whether you want XHTML or HTML. Also, when you add a blog (or “Update Account Configuration” on an existing blog), we’ll use the doctype on your blog homepage to choose a default code path; if the string “xhtml” appears in your doctype, you’ll get XHTML by default.
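The default-selection rule above can be sketched in a few lines. This is only an illustration of the rule as described, not Writer's actual code: the function name `default_markup_mode`, the regular expression, and the case-insensitive match are all assumptions.

```python
import re

# Hypothetical helper illustrating the rule described above; not Writer's real code.
# Matches the first <!DOCTYPE ...> declaration (assumes no internal subset containing ">").
DOCTYPE_RE = re.compile(r"<!DOCTYPE\b[^>]*>", re.IGNORECASE)


def default_markup_mode(homepage_html: str) -> str:
    """Return "xhtml" if the homepage doctype mentions XHTML, otherwise "html"."""
    match = DOCTYPE_RE.search(homepage_html)
    # Assumption: the "xhtml" substring check ignores case.
    if match and "xhtml" in match.group(0).lower():
        return "xhtml"
    return "html"
```

A page with no doctype at all falls through to the HTML code path in this sketch; the per-blog setting would still override whatever default is chosen.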
So what do we do differently when the XHTML code path is used? Only the following.
Ensure that post bodies contain well-formed XML fragments (usually).
This addresses the top complaint about WLW-generated markup: that <br> and <img> tags are not “closed”. This is really the only cardinal sin in XHTML–that is, well-formedness errors are the only ones that are supposed to be brutally enforced by compliant user-agents, and so far the same seems to be true for XHTML-native blog servers.
Why “usually”? If you publish while in HTML Code view, we don’t touch the markup–we assume you know what you are doing. If you (or your plug-ins) insert elements that don’t actually exist in the HTML spec (such as <lj-cut>), we don’t attempt to fix them up.
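If you want to check a post body yourself, one rough way is to hand it to an XML parser. The sketch below uses Python's standard `xml.etree.ElementTree`; the helper name `is_well_formed_fragment` and the dummy `<root>` wrapper are assumptions for illustration and have nothing to do with how Writer does it internally.

```python
import xml.etree.ElementTree as ET


def is_well_formed_fragment(fragment: str) -> bool:
    """Return True if the post body parses as a well-formed XML fragment."""
    try:
        # Wrap in a dummy root so a body with several top-level elements is accepted.
        ET.fromstring("<root>" + fragment + "</root>")
        return True
    except ET.ParseError:
        return False


print(is_well_formed_fragment("<p>Hello<br></p>"))    # unclosed <br>
print(is_well_formed_fragment("<p>Hello<br /></p>"))  # closed <br />
```

An unclosed `<br>` or `<img>` makes the parser fail, which is the kind of error a strict XHTML consumer would reject.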
Make a reasonable attempt to generate validating Transitional XHTML markup.
Anywhere we were spuriously generating non-validating Transitional XHTML, we fixed it. By “spuriously” I mean places where the invalid XHTML was not dictated by HTML compatibility concerns and making the fix did not involve serious amounts of design or implementation work. (Our built-in table editor is one big area where we punted, such is the complexity of that body of code.)
Use numeric, not named, entities.
Mozilla behaves thusly, as should all compliant XHTML processors:
Externally defined character entities other than the five pre-defined ones (&lt;, &gt;, &amp;, &quot; and &apos;) are only supported if the document references a public identifier for which there is a mapping in Mozilla’s pseudo-DTD catalog and the document has not been declared standalone.
So we are careful to avoid any other entities. (Actually, &apos; is not a valid entity in HTML, so we avoid that one too.)
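For the curious, here is a minimal sketch of that kind of rewrite, using the HTML 4 entity table in Python's standard `html.entities` module. The function name `to_numeric_entities` is hypothetical, and the choice of decimal (rather than hexadecimal) references is an assumption; this is not a description of Writer's implementation.

```python
import re
from html.entities import name2codepoint

# The four named entities that are safe in both HTML and XML are left alone.
KEEP = {"amp", "lt", "gt", "quot"}
ENTITY_RE = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")


def to_numeric_entities(markup: str) -> str:
    """Rewrite named character entities as decimal numeric references."""
    def replace(match):
        name = match.group(1)
        if name in KEEP:
            return match.group(0)
        if name == "apos":
            # &apos; is defined in XML but not in HTML 4, so use the numeric form.
            return "&#39;"
        codepoint = name2codepoint.get(name)
        if codepoint is None:
            return match.group(0)  # unknown name: leave it untouched
        return "&#%d;" % codepoint
    return ENTITY_RE.sub(replace, markup)


print(to_numeric_entities("a&nbsp;b &copy; &amp; &apos;"))
```

Numeric references need no DTD lookup, so they survive in any XML processor regardless of which entities it happens to know by name.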
Some minor whitespace differences.
This is just an implementation detail, but we break/indent differently in XHTML than in HTML. The XHTML indenting looks a bit prettier.
I know of at least one bug in the XHTML mode that is not present in HTML mode: line breaks and other whitespace are not appropriately preserved within <pre> and <xmp> tags. That one will have to be fixed in our next release.
We don’t have any illusions that the steps we’ve taken so far will satisfy all of the hardcore XHTML-philes out there. If you’re a markup snob and you know it, I encourage you to give the latest version of Writer a spin and let us know exactly what you think. (Preferably in our MSN Group, as it’s much easier to have a discussion there than in my blog comments.)