
Em-Dashes and You

Why should you dismiss any writing that looks like real effort was put in?

May 15, 2025

Writing has a long history. Typesetting, the practice of carefully arranging characters on a written document, is considerably newer than writing yet by now also has a long history. Where letters are what generally signify the sounds that are made when speaking words, marks are used to dictate pacing and sentence structure. I may not vocalize a comma when I speak, but I certainly speak one nonetheless. One radically simple yet very important mark is the dash, of which contemporary writers generally are familiar with three varieties.

There’s your run-of-the-mill dash, which computer systems also know as the “minus” sign when writing mathematical formulæ. There is the en-dash, often used for number ranges such as specifying that I am 90–98% fed up with terrible media literacy advice. Finally—and I’ll put some in this sentence to illustrate what it is—we have the em-dash, the panacea for any writer who cannot help but make asides yet who really doesn’t want to litter her prose with unseemly parentheses.

This article is not about dashes. It’s not about the ordinary dash, or the en-dash, and it isn’t really about that which has recently been declared public enemy number one among paranoid readers who wish to avoid reading text regurgitated by a large language model: the em-dash.

I am alarmed by how easily a certain line of advice is spreading: that would-be skeptical readers should assume any text containing em-dashes must necessarily originate from ChatGPT or Copilot or whatever else¹ the kids are using nowadays.

To put it bluntly, this advice is nonsense. It ignores how LLMs are trained, how they generate text, and the lengths to which LLM-enabled spammers will go to evade detection.

Many document-authoring applications will automatically convert dashes to en- or em-dashes when contextually appropriate. Other computer systems may have their own means of typing these dashes. In my case, I hit “alt car” (alternate character) on my keyboard followed by the dash key three times to get an em-dash, or two dashes and a period to get the happy medium-sized en-dash. Regardless of how the characters are generated, documents bearing them find their way onto the Internet where they are then scraped by tech companies and fed into a machine learning algorithm which produces the large language model that powers ChatGPT and its ilk.

In other words, the em-dash needed to already be out there before ChatGPT could ever hope to output it. People needed to write things which contained em-dashes and post them online.
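As a rough illustration, here is a toy word-level Markov chain in Python. It is nowhere near a real LLM, and the corpora and function names are invented for this sketch, but it shows the same dependency: a model built from text can only emit tokens that appeared in the text it was built from.

```python
import random
from collections import defaultdict

EM_DASH = "\u2014"  # the em-dash character, written as an escape

def train(corpus):
    """Record which token follows which, over whitespace-separated tokens."""
    follows = defaultdict(list)
    for sentence in corpus:
        tokens = sentence.split()
        for current, nxt in zip(tokens, tokens[1:]):
            follows[current].append(nxt)
    return follows

def vocabulary(follows):
    """Every token the model could ever produce after the starting token."""
    vocab = set(follows)
    for successors in follows.values():
        vocab.update(successors)
    return vocab

def generate(follows, start, length=8, seed=0):
    """Sample a short token sequence by repeatedly picking an observed successor."""
    rng = random.Random(seed)
    out = [start]
    for _ in range(length - 1):
        options = follows.get(out[-1])
        if not options:  # no observed successor: stop early
            break
        out.append(rng.choice(options))
    return " ".join(out)

# Hypothetical training text: one corpus without em-dashes, one with a single
# human-written sentence that uses one.
plain = ["I like dashes - they are fine", "they are fine - I like them"]
fancy = plain + [f"I like dashes {EM_DASH} they are cool"]

plain_model = train(plain)
fancy_model = train(fancy)

print(EM_DASH in vocabulary(plain_model))  # False
print(EM_DASH in vocabulary(fancy_model))  # True
```

Trained on `plain`, no amount of sampling will ever produce the character. Add one sentence from a person who uses it, and it becomes a possible output.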

The large language model is a statistical model of textual language, from words² to sentence fragments, entire sentences, and more. Text which is output by a large language model represents a statistically likely response to whichever prompt has been given. The statistical nature of the LLM is the key to understanding its weakness. Even if all text that went into training the model was verified to be true, the text which the LLM produces will only ever be a statistical approximation of writings which reflect the truth. The model may easily omit a “not” where one should be, or emit a “not” where one should not be, which the model simply can’t catch because it is not made to reason. This aspect of LLMs is why so many are rightly skeptical of, and understandably paranoid about, reading text which was generated by a computer program powered by a statistical model, as opposed to a person’s unique insights which have been informed by their own experiences.

The source of conflict is this: because em-dashes are more likely to appear in certain contexts—namely, documents hosted online or on websites which can afford to pay an editor who would care about such things—the thinking goes that their presence elsewhere is an indicator that nefarious LLMs are afoot. Like all heuristics, it’s bound to make mistakes. What’s particularly irksome about this one, however, is that by the time the advice makes its way into the open, spammers will have already caught wind and adjusted their methods accordingly. While the spammers update their methods, the advice continues to circulate, repeated by tired and impatient readers looking for any way to minimize their odds of finding themselves at the wrong end of the slop pipeline.

Someone like me—who intentionally uses em-dashes because they’re cool—faces quite a conundrum. I can either continue to write in such a way that my prose is more readable, or I can lose some sophistication. This conundrum is why I’ve written this post. It is my hope that at least a few eyes gaze upon this text and reconsider whether it’s worthwhile to reproduce a bad heuristic.

I really don’t want to be lumped in with the scam du jour simply because I don’t fear the power of the em-dash.

¹ I don’t care.
² It has been pointed out to me that LLMs do not actually use bigrams or trigrams and instead use a whole-word English tokenizer. The phrasing here has been updated to reflect this. Thank you, d@nny disc@.

Victoria Lacroix
