An Amazon product review with ellipses … like this … and a lot of extra punctuation??!? That is, like, so likely to be sincere. I totally mean that, except that I don't: Those two tics are among the telltale signs of hostile irony that, when seen together, allowed a computer algorithm to correctly detect sarcasm in more than three-quarters of the examples it analyzed, as described in this paper (pdf).
Before you say "like I care," know that the paper, by Oren Tsur, Dmitry Davidov and Ari Rappoport, is a breakthrough in artificial intelligence. Because sarcasm has always been, you know, so easy for computers to grasp. How are they supposed to "get" it when the explicit meaning of an utterance is actually the opposite of what the speaker intended? On every level (social setting, meaning, grammar, intonation), sarcasm demands so much mental processing and detailed knowledge that even people have a tough time with it. (A couple of new punctuation marks, one of which is the thumbnail illustration for this post, have been proposed to help human readers out.) No wonder some experts were surprised that an artificial intelligence could score even 75 percent correct on sarcastic phrases.
As reported in The Jerusalem Post, the three authors' SASI, or "Semi-supervised Algorithm for Sarcasm Identification," is trained to recognize patterns linked to sarcastic meaning, such as "…", excess punctuation marks (!*?!!), lots of CAPITAL LETTERS for emphasis, and certain key phrases like "I guess" (as in "I guess I'm naive: I didn't think we needed computers to figure this out"). Alone, each indicator isn't a very good predictor, but when several occur together, then, like, duh: sarcasm's likely in the air.
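To make the "weak cues add up" idea concrete, here is a toy sketch in Python. It is not SASI, which learns its sarcastic patterns from labeled Amazon reviews; it only checks for the four surface cues named above and flags a text when at least two of them co-occur. The phrase list, the regular expressions and the two-cue threshold are all assumptions made for illustration.

```python
import re

# Hypothetical key phrases, for illustration only. SASI learns its
# patterns from labeled reviews rather than using a fixed list like this.
KEY_PHRASES = ("i guess", "silly me")


def sarcasm_cues(text: str) -> dict:
    """Check a text for the surface cues the article describes."""
    lowered = text.lower()
    return {
        # A Unicode ellipsis or three ASCII dots
        "ellipsis": "…" in text or "..." in text,
        # Two or more ! or ? in a row, e.g. "??!?"
        "excess_punctuation": re.search(r"[!?]{2,}", text) is not None,
        # A word of two or more capital letters, e.g. "LOVE"
        "capitals": re.search(r"\b[A-Z]{2,}\b", text) is not None,
        # Any of the hypothetical key phrases, case-insensitively
        "key_phrase": any(p in lowered for p in KEY_PHRASES),
    }


def looks_sarcastic(text: str, threshold: int = 2) -> bool:
    """Flag a text only when several cues co-occur (threshold is an assumption)."""
    return sum(sarcasm_cues(text).values()) >= threshold
```

For example, `looks_sarcastic("I guess I'm naive… I LOVE it!!")` trips all four cues and returns `True`, while `looks_sarcastic("I guess it works.")` trips only the key-phrase cue and returns `False`. Requiring cues to co-occur is what keeps any single tic, on its own, from triggering a false alarm.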
The researchers developed their taxonomy and their training examples from a collection of 66,000 product reviews on Amazon.com. The experience led them to some interesting conclusions about what provokes sarcasm. If a product is heavily hyped, expensive and simple, they say, reviews are much more likely to be snarky. Presumably that's because heavy advertising and a high price make people outraged, while a simple function makes it easier to tell if a product is a failure, and easier to sneer at ("Silly me, the Kindle and the Sony eBook can't read these protected formats. Great!").