What is the least amount of HTML you can serve to a browser?

UIScript DevBlog, Part 12

Bence Meszaros
6 min read · Sep 2, 2024
tl;dr

Intro

One key feature of UIScript is that we can remove all HTML and CSS code from our projects and focus solely on JS. Well, almost. While CSS can, in theory, be fully eliminated, HTML cannot. This is because browsers can only bootstrap a page from an HTML file; they cannot load and execute a standalone JS file without an HTML host. If you open a JS file directly in a browser (or any other script file, for that matter), it will show up as raw text, just like a regular txt file would.

To put it simply, there has to be at least one HTML file in our project. This is a hard rule set by the browsers, an axiom if you will, that we cannot circumvent.

But if we are forced to embed our JS application into an HTML file, then what is the least amount of HTML we can serve to the browser?

The basics

Web developers are typically taught that a valid HTML file consists of four required parts: a doctype declaration, an html element, a head element and a body element, like so:

<!doctype html>
<html>
<head></head>
<body></body>
</html>

This isn’t actually true. According to the HTML Living Standard § 13.1.2.4 (Optional tags), the start and end tags of the head, the body and even the html element can all be omitted (more precisely, they may be omitted only under certain conditions, but precision is not exactly the strong suit of this spec).

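So the following is a file that every browser parses into a complete document (a strict conformance checker will still ask for a title element, but that does not affect parsing):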

<!doctype html>

When the browser parses this file, it inserts an html, a head and a body element automatically, so we always end up with the same skeleton as in the previous example. Omitting the tags is a perfectly valid and standard simplification.
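The inserted skeleton can be observed from JS. The snippet below is only a minimal sketch: describeSkeleton is a hypothetical helper name (not part of UIScript), and it takes the document as a parameter so that it can also be exercised with a stand-in object outside the browser.

```javascript
// Hypothetical helper: reports which skeleton elements exist on a document.
// It only reads standard Document properties (documentElement, head, body).
function describeSkeleton(doc) {
  return {
    root: doc.documentElement ? doc.documentElement.tagName : null,
    hasHead: doc.head != null,
    hasBody: doc.body != null,
  };
}

// In a browser, parse the one-line file and inspect what the parser built.
// DOMParser is a browser API and is not available in plain Node, hence the guard.
if (typeof DOMParser !== "undefined") {
  const parsed = new DOMParser().parseFromString("<!doctype html>", "text/html");
  console.log(describeSkeleton(parsed));
}
```

Run in a browser console, this should report an HTML root element with both a head and a body present, even though the source contains none of those tags.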

The doctype

This raises the question: can we go even further and remove the doctype declaration as well? Well, according to HTML Living Standard § 13.1.1, a DOCTYPE is a required preamble, so it is officially mandatory. According to MDN, however, its sole purpose is to prevent the browser from switching into quirks mode.

Quirks mode is the non-standard rendering mode that browsers maintain for legacy reasons. In this mode, our document, including everything we create through JS, is laid out according to rules that emulate the bugs of 1990s browsers, causing issues that range from subtly different sizing and table styling to outright broken layouts.
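Which mode a page ended up in can be checked from JS through the standard document.compatMode property. A minimal sketch follows; renderingMode is a hypothetical helper name used only for illustration.

```javascript
// Hypothetical helper: maps Document.compatMode to a readable label.
// "BackCompat" means quirks mode; "CSS1Compat" is reported in every other case.
function renderingMode(doc) {
  return doc.compatMode === "BackCompat" ? "quirks" : "standards";
}

// Only log when a real browser document is available.
if (typeof document !== "undefined") {
  console.log(renderingMode(document));
}
```

A file that starts with the doctype should log "standards", while the same file without it should log "quirks".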

So, basically, browsers accept so much legacy garbage and broken HTML that you have to explicitly notify the browser when you want to serve a fully standard document. Instead of opting in to legacy mode for old, pre-HTML5 documents, we need to opt out of legacy mode every single time we are serving a proper, standard HTML5 document. If you think this is beyond absurd, you are quite right.

Note: According to w3techs.com, HTML5 is used by 93.5% of all websites and by 97.2% of all websites that use HTML. So, every single time you create a new HTML document, just remember that the reason you are forced to include this boilerplate is so that the remaining 6.5% of websites don't have to update their 30-year-old code, not even with a single legacy flag.

This leads to yet another question. What exactly is the point of file extensions, MIME types, the Content-type HTTP header and the type HTML attribute, if none of them can be used to differentiate between HTML5 and other document types and we still need to write the document type into the actual document itself? Have you ever considered how absurd it is to serve HTML5 documents using HTTP with a ton of metadata fields but still needing to mess up your actual content with a stupid declaration?

So, apparently none of these was acceptable:

hello-world.html5
hello-world.legacyhtml

Nor this:

Content-type: text/html5
Content-type: text/legacyhtml

Or this:

<html type="text/html5">
<html type="text/legacyhtml">

Or just introducing a new HTML attribute, or putting this info into the <head> element along with the rest of the document metadata using the type attribute, or using a new attribute, or a new element, or a new element with a type attribute or a new element with a new attribute.

No, they wanted to reinvent the wheel with the document type as well and introduce a new tag, which is not an element, but a node, which is a preamble but also a declaration, which is a completely unique construct within the language and we should only use <!doctype html> anyway, because other values aren’t parsed by the browser as actual websites.

Splendid.

Including scripts

We have removed as much HTML from our projects as possible. Now we need to execute our scripts somehow. To do that, we can simply add them to the minimized HTML file:

<!DOCTYPE html>
<script defer src="index.js"></script>

That’s it. When the browser parses this HTML file, it inserts the html, head and body elements, moves this script tag into the head automatically and fetches and executes the index.js file. It is, unfortunately, an extra round trip, but as we discussed before, we cannot bypass the HTML host.

Notice that we also added the defer attribute to the <script> element. This is needed because the script ends up in the head, and when the browser executes it there, the HTML body does not exist yet, so the script cannot access it. With the defer flag, we tell the browser to execute the script only after it has finished parsing the document and building the HTML skeleton. But if our script doesn't need the body element, we can skip this attribute.
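If we would rather not rely on the attribute, the script itself can wait for the body. The following is a minimal sketch under that assumption; whenBodyReady is a hypothetical helper name, and it uses only the standard readyState property and DOMContentLoaded event.

```javascript
// Hypothetical helper: runs the callback as soon as the document body is available.
function whenBodyReady(doc, callback) {
  if (doc.readyState === "loading") {
    // Still parsing: the body may not exist yet, so wait until parsing has finished.
    doc.addEventListener("DOMContentLoaded", () => callback(doc.body), { once: true });
  } else {
    // Parsing is done (this is also the state a deferred script runs in).
    callback(doc.body);
  }
}

// Only touch the page when a real browser document is available.
if (typeof document !== "undefined") {
  whenBodyReady(document, (body) => body.append("Hello World!"));
}
```

With defer, the callback runs immediately; without it, the callback is postponed until the parser has built the skeleton. Either way the script stays free of any assumption about how it was included.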

Notice also that the closing </script> tag is mandatory, even when the src attribute is used. The reason is that the HTML5 committee wanted new HTML5 websites to run in old, legacy browsers as well, but those browsers had no concept of self-closing script tags and treated everything after an opening script tag as JS code. There had been several proposals to resolve this in a far more elegant way, but all of them were ultimately rejected, just like the entirety of XHTML. As a result, the web will carry this (and much, much more) bloat until the end of time.

For some time, I was trying to rationalize this limitation as having some sort of app manifest, a list of assets that my project needs, but HTML is spectacularly bad at this too. Linking style sheets, bitmaps, vector graphics, fonts, scripts and various other files all work with a different element, with a different tag or both. How awesome would it be to have a manifest like this in HTML:

<!DOCTYPE html>
<script url="index.js">
<style url="index.css">
<image url="profile-picture.png">
<vector url="logo.svg">
<data url="user-from-database.json">
<font url="Inter-Regular.otf">

This idea was short-lived, and I migrated this "manifest" concept to JS as well.
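To give a rough idea of what a manifest in JS could look like, here is a purely hypothetical sketch. It is not the UIScript implementation: the manifest shape and the names toElementSpec and loadManifest are assumptions made for this example, and only three of the asset types from the wishlist are covered.

```javascript
// Hypothetical manifest: the file names mirror the wishlist above.
const manifest = [
  { type: "style", url: "index.css" },
  { type: "font", url: "Inter-Regular.otf" },
  { type: "image", url: "profile-picture.png" },
];

// Translates one manifest entry into the element that HTML actually requires.
function toElementSpec(asset) {
  switch (asset.type) {
    case "style":
      return { tag: "link", attrs: { rel: "stylesheet", href: asset.url } };
    case "font":
      // Font preloads need the crossorigin attribute, even for same-origin files.
      return { tag: "link", attrs: { rel: "preload", as: "font", href: asset.url, crossorigin: "" } };
    case "image":
      return { tag: "link", attrs: { rel: "preload", as: "image", href: asset.url } };
    default:
      throw new Error(`Unknown asset type: ${asset.type}`);
  }
}

// Creates the elements and appends them to the head of the given document.
function loadManifest(doc, assets) {
  for (const asset of assets) {
    const { tag, attrs } = toElementSpec(asset);
    const element = doc.createElement(tag);
    for (const [name, value] of Object.entries(attrs)) {
      element.setAttribute(name, value);
    }
    doc.head.append(element);
  }
}

// Only touch the page when a real browser document is available.
if (typeof document !== "undefined") {
  loadManifest(document, manifest);
}
```

The uniform { type, url } shape lives in JS, and the inconsistencies of HTML are confined to a single translation function.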

What about JS-in-HTML?

Technically, we could write all of our JS code inside <script> tags instead of linking it from an external file, saving one round trip to the server as well as an extra project file, but then the code would be tightly coupled to HTML and wouldn’t be able to run on its own, in a different environment.

Another “sinister” argument against this is that the goal is to fully separate JS from HTML so that one day browsers could load and execute JS code completely independently, without any HTML host.

And finally, it would also look much uglier, because the syntaxes of the two languages would mix. But that’s just my subjective opinion. You are free to decide if this is what you prefer:

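<!doctype html>
<script>
// The defer attribute has no effect on inline classic scripts,
// so this code runs as soon as the parser reaches it.
let welcome = "Hello World!";
</script>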

Conclusion

Technically, browsers will accept even a completely empty file, although it renders in quirks mode.

But if we want a standard HTML5 file, we need at least this:

<!doctype html>

We can add a script tag, but this won’t work in any browser:

<!doctype html><script src="ui.js">

Unless we also include a closing tag:

<!doctype html><script src="ui.js"></script>

And set defer on the script tag to execute after building the skeleton:

<!doctype html><script defer src="ui.js"></script>

That’s a wrap

Thank you for reading this article and making it to the end. If you like what I am doing please consider clapping, commenting and sharing my work with your friends, colleagues and other web enthusiasts. Your support helps me tremendously.

Thank you again, and have a great day.

⬅️ Week 11 — Frankenstein’s display property

Written by Bence Meszaros

Lead Software Engineer, Fillun & Decketts
