Understanding digital line breaks

Carriage return, line feed, newline, <br>, hard and soft breaks and all the line break mumbo-jumbo you can think of.

Bence Meszaros
5 min read · Jun 14, 2021
Image by Syed Ali on Unsplash.

Probably the most confusing thing when it comes to digital word processing is that there is no line break character. Even though a line break character would make the most sense today, this is the only thing that is missing. So what do we have instead? Well, a bunch of legacy concepts.

Carriage return and line feed

Digital text editors were the natural evolution of typewriters and as such, they borrowed heavily from their predecessors, including the way they handled line breaks.

When a typewriter reaches the end of the line, it has to perform two specific tasks before any more characters can be added: it has to go back to the beginning of the line (carriage return) and it has to move the paper up one line (line feed).

Early character sets like ASCII decided to keep these concepts separate and defined an individual carriage return (CR) and a line feed (LF) character instead of a unified line break character. We can argue that there are some exotic edge cases where these carriage movements might be useful separately in the digital world, but they are far less common than their combination — the actual line breaks.

And so, in the absence of a true line break character, early systems chose different ways to denote their line breaks. Classic Mac OS chose CR, Unix chose LF and Windows chose CRLF. But I am fairly sure that there were other systems out there using LFCR just to watch the world burn.
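To make the three conventions concrete, here is a minimal Python sketch that counts each kind of line break in a string and rewrites them all to a single convention. The function names `count_line_breaks` and `normalize_line_breaks` are made up for this illustration; only the standard `re` module is used.

```python
import re

# CRLF is listed first so a Windows break is matched as one unit,
# not as a lone CR followed by a lone LF.
_LINE_BREAK = re.compile(r"\r\n|\r|\n")


def count_line_breaks(text: str) -> dict:
    """Count how many CRLF, lone CR and lone LF breaks appear in text."""
    names = {"\r\n": "CRLF", "\r": "CR", "\n": "LF"}
    counts = {"CRLF": 0, "CR": 0, "LF": 0}
    for match in _LINE_BREAK.finditer(text):
        counts[names[match.group()]] += 1
    return counts


def normalize_line_breaks(text: str, target: str = "\n") -> str:
    """Replace every CRLF, CR or LF with target (LF by default)."""
    # A function is used as the replacement so target is inserted literally.
    return _LINE_BREAK.sub(lambda _match: target, text)


mixed = "unix\nwindows\r\nclassic mac\rend"
print(count_line_breaks(mixed))
print(repr(normalize_line_breaks(mixed)))
```

The order of the alternatives in the pattern is the only subtle part: matching CRLF before CR keeps a Windows line break from being counted twice.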

Today, all major platforms understand both LF and CRLF as line breaks, but the situation is still chaotic. Just take a look at any keyboard. Since there is no line break character, there is no line break key. Instead, US keyboards use the term return, while ISO keyboards use the line feed sign (↵) to denote the line break functionality.

Newline

Newline is another name for line feed. The confusion stems from the fact that while carriage return is usually represented with the \r escape sequence in coding languages (as in return), line feed is represented with the \n escape sequence (as in newline), and not with \f (feed), or even \l (line).
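A quick way to see what these escape sequences stand for is to print their code points. The short Python sketch below does that; the `CONTROL_ESCAPES` table is just a helper made up for this example. It also shows that `\f` is already taken: in Python, as in C, it means form feed, not line feed.

```python
# Each entry maps the escape as written in source code to a label
# and the actual one-character string it produces.
CONTROL_ESCAPES = {
    r"\r": ("carriage return (CR)", "\r"),
    r"\n": ("line feed (LF)", "\n"),
    r"\f": ("form feed (FF)", "\f"),
}

for written, (label, char) in CONTROL_ESCAPES.items():
    # ord() gives the code point of the single character.
    print(f"{written} is {label}, code point {ord(char)} (hex {ord(char):02X})")
```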

Even more confusing is that newline became a colloquial term for the system-specific line break character(s), referring to LF on macOS and CRLF on Windows.

The <br> tag

This confusion only grew with the invention of the web, or more precisely the invention of HTML. The initial idea was that line breaks are only used to format HTML code inside an editor, and so browsers were designed to never show them as line breaks on screen: they are collapsed into ordinary whitespace. But as the web quickly started to evolve, it became clear that this idea doesn’t work and there are in fact legitimate reasons to preserve line breaks on screen.

But instead of rethinking the stripping mechanism, a new <br> tag was added and the Unix-Mac-Windows CR-LF-CRLF incompatibility issue was recreated. Only this time it was the <br> tag versus the system default line break character(s) that had to be converted back and forth to preserve formatting.
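The back-and-forth conversion described above might look something like this in Python. This is only a sketch under simple assumptions: the input is plain text or an HTML fragment whose only tag is `<br>`, and the function names `text_to_html` and `html_to_text` are invented for the example. A real converter would need an actual HTML parser.

```python
import html
import re

_LINE_BREAK = re.compile(r"\r\n|\r|\n")
# Accepts <br>, <br/>, <br /> and upper-case variants.
_BR_TAG = re.compile(r"<br\s*/?>", re.IGNORECASE)


def text_to_html(text: str) -> str:
    """Escape plain text, then turn each line break into a <br> tag."""
    return _LINE_BREAK.sub("<br>", html.escape(text))


def html_to_text(fragment: str) -> str:
    """Turn <br> tags back into LF, then unescape entities.

    Assumes the fragment contains no tags other than <br>.
    """
    return html.unescape(_BR_TAG.sub("\n", fragment))


note = "first line\r\nsecond line & more"
as_html = text_to_html(note)
print(as_html)
print(repr(html_to_text(as_html)))
```

Note that the round trip is lossy in one respect: whatever convention the original text used, it comes back with LF only.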

Hard and soft breaks

The <br> tag obviously wasn’t a real solution and eventually the stripping mechanism had to be redesigned. Ironically, this was done using CSS (or more precisely the white-space CSS property), yet another new tool on top of all this mess. And as you’ve guessed, this also introduced new issues.

Take a look at this example:

<p>This is| a visible text.</p>

Imagine that this is a web-based text editor and the cursor is the vertical bar. What would happen if you were to hit enter?

This?

<p>This is</p>
<p> a visible text.</p>

Or this?

<p>This is
a visible text.</p>

If p tags have a top margin, these two will look quite different on screen. So how can we resolve this ambiguity? With the introduction of yet another new concept: hard and soft breaks. In some editors (e.g., Squarespace or Medium) if you simply hit enter, it will add a hard break (first example) and if you hit shift+enter, it will add a soft break (second example).
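Here is a minimal Python sketch of the two behaviours, assuming the paragraph is plain text without nested markup and the cursor is a character index. The names `hard_break` and `soft_break` are made up for this example and do not come from any particular editor.

```python
def hard_break(text: str, cursor: int) -> str:
    """Enter: split the paragraph into two separate <p> elements."""
    return f"<p>{text[:cursor]}</p>\n<p>{text[cursor:]}</p>"


def soft_break(text: str, cursor: int) -> str:
    """Shift+Enter: stay in the same <p> and insert a line break character."""
    # The text after the cursor is kept as-is, including a leading space.
    return f"<p>{text[:cursor]}\n{text[cursor:]}</p>"


paragraph = "This is a visible text."
cursor = len("This is")  # the cursor sits right after "This is"

print(hard_break(paragraph, cursor))
print(soft_break(paragraph, cursor))
```

Whether the soft break actually shows up on screen still depends on the white-space property of the paragraph, which is exactly the kind of hidden coupling that makes the distinction so awkward.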

Needless to say, this is again a bad idea born from necessity that just adds more complexity to an already overcomplicated but actually very, very simple problem. But if everything was a bad idea so far, what would be the ideal solution?

Line break character

Unfortunately, as I said before, there is no universal line break character. But let’s imagine for a second that there is one and forget about everything we’ve talked about. What would it look like to use it?

If you think about it, storing text on a computer doesn’t even need line breaks. Wrapping text is just a visual necessity because the medium — whether it is a piece of clay, a sheet of paper or a digital screen — always has size limits. As long as our text is just a stream of characters there is no need for them.

So what happens if we output this stream to a display? Characters begin to fill the screen in a straight line until there is no more available space. At that point, the computer inserts a line break character and the stream continues on the second line. This process is repeated automatically whenever the characters exhaust the available space. This is what happens in any modern word processor or with inline boxes in HTML. Let’s call these characters implicit line breaks.

But what if we want to hard-code some of our own line breaks? It is easy too. Line break characters are just like regular characters and they can be inserted anywhere. The software will understand them and wrap the text accordingly. We can call these explicit line breaks.

And that’s it. Whenever the available space is exhausted, the software automatically inserts a line break and whenever it encounters a line break (explicit or implicit), it wraps the text accordingly. After all, every line break is exactly the same: they look the same and they work the same, regardless of the platform, the protocol or the software being used. Only if we deconstruct them into line feed and carriage return can they become ambiguous.
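The whole model fits in a few lines of Python. In the sketch below, LF stands in for the imaginary universal line break character, the display is assumed to be `width` characters wide, and wrapping happens per character rather than per word to keep things simple. The function name `lay_out` is invented for this example.

```python
from typing import List, Tuple


def lay_out(text: str, width: int) -> List[Tuple[str, str]]:
    """Return (line, kind) pairs, where kind names the break that ended the line.

    kind is "explicit" for a break typed by the user, "implicit" for a
    break inserted because the line ran out of space, and "end" for the
    last line of the text.
    """
    if width < 1:
        raise ValueError("width must be at least 1")

    lines = []
    segments = text.split("\n")  # explicit breaks split the stream first
    for index, segment in enumerate(segments):
        is_last_segment = index == len(segments) - 1
        # Cut the segment into display-sized pieces; an empty segment
        # still produces one (blank) line.
        chunks = [segment[i:i + width] for i in range(0, len(segment), width)] or [""]
        for position, chunk in enumerate(chunks):
            if position < len(chunks) - 1:
                kind = "implicit"
            elif not is_last_segment:
                kind = "explicit"
            else:
                kind = "end"
            lines.append((chunk, kind))
    return lines


for line, kind in lay_out("abcdefgh\nij", width=3):
    print(f"{line!r:8} {kind}")
```

Both kinds of break end a line in exactly the same way; the only thing the sketch keeps track of is who inserted them, the user or the display.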

Conclusion

Text is one-dimensional and in itself doesn’t need line breaks. Line breaks are only needed to display text on a medium and even then, we can recognize that every line break works exactly the same way, as long as they aren’t separated into line feed and carriage return. The only difference we should focus on is whether a line break was inserted explicitly by the user or implicitly by the display software.

Unfortunately there is no universal line break character that would achieve this simplicity and instead we are stuck with legacy concepts coming from the era of typewriters, old printers and the first web browsers. The best we can do is stick to using only the CRLF character sequence and/or the LF character to denote line breaks and avoid everything else, especially those solutions that are more abstract than characters. It’s not easy to do on the web, but fortunately not impossible either.

Written by Bence Meszaros

Lead Software Engineer, Fillun & Decketts
