Amazon.com recently introduced “Text Stats” for many of the books they sell; these stats purportedly record the readability and complexity of a book. Laura Kinsale found out, somewhat to her consternation, that Shadowheart scored as both very easy and less complex compared to other books. Being an enterprising woman, she looked up the stats for another one of her books.
For My Lady’s Heart, with its notorious Middle English dialogue, which you either love because you’re a nerd or hate because it interferes with your readability? It scores just about the same. Now, that’s just mind-boggling.
After being alerted to this, I did some digging around to see how other books compare.
Pride and Prejudice scores as somewhat harder to read and somewhat more complex than the two Kinsale medievals. OK, that’s a fair cop.
Let’s try a more modern novel, then—a nice, big meaty one. Like, say, The Corrections. Turns out it scores as a bit easier than P&P, but still harder than a Kinsale.
At that point, inspiration hit: let’s try some books that are quite notoriously difficult to read. The ones that cause college students to gnash their teeth and grip their heads in agony. So I looked up the stats for your favorite wet fart connoisseur and mine, James Joyce. Specifically, Ulysses. According to the magic numbers, it’s easier to read than both The Corrections and Pride & Prejudice—which brings up the question: IN WHICH FREAKING UNIVERSE would that be true?
(Answer: probably in the same universe where all those non-Muslim sheikhs kidnap shy British secretaries for nefarious erotic purposes.)
But the kicker, the one book whose stats made me laugh and laugh, was none other than The Sound and the Fury. It’s apparently MUCH easier to read than Shadowheart and For My Lady’s Heart. In fact, it scores as being so easy, I’m surprised kindergarten teachers aren’t replacing Dick and Jane with The Sound and the Fury.
I’m thinking they might need to work a little bit more on the algorithms that determine ease of reading and complexity.


Sound and Fury is easy? Cool, dude.
It seems to me that perhaps, just perhaps, the “largest retailer of books” is staffed by folks who don’t read much? Because, erm, wouldn’t you want to test this algorithm out *before* unleashing it on your drooling public?
There’s a couple of textbooks I just have to go enter into this thing….
I couldn’t find the feature on Amazon and didn’t have the patience to keep looking for it, so I couldn’t test it out myself; HOWEVER, I have sort of a theory about how it might work. Some 15 years ago, my mom came across something like an IQ test that analyzed writing samples and spit out the writer’s relative IQ. From what I was able to find out, it took into account the length of sentences in number of words, the number of syllables in each word, and the ratio of polysyllabic words to the monosyllabic ones in a single sentence, and it somehow averaged it all out and came up with a number.
My mother’s sentences were shorter and simpler and used shorter, simpler words, so she scored less than I did … because I sometimes write freaky, long-winded crap like the paragraph above.
Anyway, I *think* Amazon’s text stats feature might work the same way. I don’t have a copy of The Sound and the Fury on me, but I’m re-reading passages from it online, and I see that though many of the sentences are long, most of the words in any one sentence are one or two syllables long—so the sentences are less dense, which might be considered easier to read.
It’s just an uneducated guess, though. I haven’t seen the Amazon feature in action myself, and I haven’t seen the analog writing test in over a decade.
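For anyone who wants to poke at the guess above, here is a minimal sketch of that kind of counting in Python. Everything in it is an assumption for illustration: the function names `count_syllables` and `text_stats` are made up, the syllable counter is a crude vowel-group heuristic (it overcounts words like "looked"), and nothing here is known to be what Amazon or that old writing test actually does.

```python
import re


def count_syllables(word):
    """Rough syllable estimate: count runs of consecutive vowels.

    This is a heuristic, not a dictionary lookup, so silent vowels
    (as in "fence" or "looked") get counted as extra syllables.
    """
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))


def text_stats(text):
    """Return the three crude measures described in the comment above."""
    # Treat any run of ., ! or ? as the end of a sentence.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one word")

    syllables = [count_syllables(w) for w in words]
    # "Polysyllabic" is taken here to mean three or more syllables (an assumption).
    polysyllabic = sum(1 for n in syllables if n >= 3)

    return {
        "words_per_sentence": len(words) / len(sentences),
        "syllables_per_word": sum(syllables) / len(words),
        "polysyllable_share": polysyllabic / len(words),
    }


if __name__ == "__main__":
    sample = "The words are short. The sentences can still run on and on and on."
    print(text_stats(sample))
```

A passage built from long runs of one- and two-syllable words comes out with a low syllables-per-word figure however tangled the narration is, which would fit the Faulkner result.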
Silly me. I should have tried some of those links you provided.
Candy, maybe you were looking at the stats upside down. Try flipping your computer over.
No, really, I’m glad to see that the whole thing is just royally f’ed up instead of some kind of system where books coded as romance novels are automatically given Dick and Jane level status whereas Great Literary Works are rated so far above mere mortals only the gods can comprehend their greatness.
I do read literary stuff. I’m not a snob. Honest. Although, I think I’ll pick up that The Sound and The Fury for my beach read this summer.
You’re exactly right, April: the algorithms that determine readability and complexity go only by rather crude measures such as number of syllables and length of sentences. They don’t take into account, say, archaic language, difficult sentence structures, difficult concepts, complicated narrative structures, theme, complex subject matter, etc., all of which impact a book’s readability. Books like The Ghost Road, The English Patient and White Noise score as being easy reads, when I don’t think these books were particularly easy to read at all.
And all this still doesn’t provide any sort of a gauge for whether a book is any good or not, of course.
LOL, no no no no, it wasn’t a blow to my ego!
I don’t WANT my books to be difficult, but some people say they are…so that’s why I was perplexed to find them on the easy end of the scale.
So now I can just say, look, they’re easier to read than government manuals when I get a complaint. 😉
Laura: DOH! See, I approached this with the whole “Oh, man, people are going to use this as a tool to prove that romance novels are easy to read and therefore fit only for cretins” angle, which in turn informed the way I read and interpreted your post. See, kids? Reading is a DYNAMIC process! Apologies for the misreading. I’ll strike out the text in the entry.
It’s a shame this method isn’t applied to technical texts. If I’d known the difficulty in reading Smith/Van Ness/Abbott’s Thermodynamics, I’d have cheated on the “open book final test” instead of checking it out in the library and wasting precious time looking for a definition of Entropy.
Oh, wait. Maybe it would be rated as Easier/Fewer.
How do you check this? I’m dying to feed “Finnegans Wake” and “The Golden Bowl” into the maw of whatever processor calculates these statistics. Those are the two that have defeated me & thrown me down to the mat repeatedly.
O.K., back to my job, where our goal is consistently attaining the fourth-grade reading level …
Oh sure, The Sound and the Fury is easy to read.
The problem is, by the time you finish it, you’re clinically insane.
“I looked through the fence and they were hitting. And she was hitting. And he was hitting …”
Yes, I see how a simplistic formula better suited for selecting ESL textbooks would determine this as an “easy book.”
I often wish Amazon would just chill the F out and stick to selling books. All their other endeavours are distinctly marked by conjecture and bullshit.
(damn I’m ill-humoured today.)
It looks to me like they’re using a subset of the Flesch-Kincaid “readability” tests.
They’re stupid; they were designed to score kids’ books for level of difficulty. The formulae are tailored to support assumptions about the nature of reading, and how kids read, with a few little options added after the fact to make the scores attractive to businesses. Essentially, Amazon is counting letters, words and punctuation.
Same thing MS Word does when it measures “readability.”
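For reference, the two standard Flesch formulas as they are usually published, written out as a small Python sketch. The function names are made up for illustration, and whether Amazon uses exactly these formulas is the commenter's guess, not something confirmed here. Both take only three counts: words, sentences, and syllables.

```python
def flesch_reading_ease(words, sentences, syllables):
    """Flesch Reading Ease: higher scores mean 'easier' text."""
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)


def flesch_kincaid_grade(words, sentences, syllables):
    """Flesch-Kincaid Grade Level: an approximate U.S. school grade."""
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59


if __name__ == "__main__":
    # Hand-counted figures for a short three-sentence passage
    # (17 words, 3 sentences, 20 syllables); the counts are illustrative.
    print(flesch_reading_ease(17, 3, 20))
    print(flesch_kincaid_grade(17, 3, 20))
```

Since the only inputs are sentence length and syllables per word, short sentences made of short words score as easy no matter what they are saying.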
This sounds like the Accelerated Reader list, the children’s librarians’ nemesis. Teachers out there may be familiar with the program, which uses algorithms to measure the number of words, syllables, letters and/or whatever to generate an appropriate reading level for a children’s or young adult book. Then the children are assigned to read books on their level or above, and after reading they have to take a test on the book. If they pass, they get a certain number of points, and then, though I’m sure it varies from classroom to classroom and school to school, usually there’s some sort of reward system in place. Sounds good, right? The kids are challenging themselves, and they actually have to read in depth to pass the test, right?
In theory it’s not a bad idea, but in reality you have a situation where a lot of books are graded inappropriately. The Old Man and the Sea might have the same grade level as a Berenstain Bears book. Don’t quote me on that b/c I’m not at work to check, but those bears are given a ridiculous level, and then kids want 5 Berenstain Bears books instead of something that might be more interesting and appropriate to them, age-wise. There’s also this slavish adherence to grade level. “I must have a 4.5.” “But this book is really good, you’ll like it, and I’ve given it to kids who don’t really like to read and…yadda yadda…” “But is it a 4.5?” “Well, no, it’s actually…” “Nope, it’s gotta be a 4.5, how about one of those Berenstain Bears books?” It’s limiting, to say the least.
Sorry if you use the AR program and you’re offended, I just hate it so much, b/c it so often prevents me from getting kids the kind of books that would make them better readers by virtue of the fact that they’re interesting and well-written. C’est la vie. Rant endeth.
That’s interesting, Devon, ‘cause I never knew how that was done. I do remember back in 1963 arguing with my 3rd grade teacher because I wanted to check a book out from the school library that she said was over my reading level. I even remember the book and found it in a search a few moments ago:
“Gay-Neck: The Story of a Pigeon
Dhan Gopal Mukerji, illustrated by Boris Artzybasheff—The story of the training of a carrier pigeon and his experiences as an allied messenger in World War I.”
Wowza, how weird is that, to remember that book over 40 years later? Anyway, I finally prevailed, after promising to do a report on the book to show I understood it. After that she never argued with me again about what I wanted to read.
Keep fighting to push kids to stretch their reading boundaries past the safe point. They’ll thank you for it later.
I checked out the text stats on Foucault’s Pendulum, and its numbers are, IMO, in no way indicative of how challenging it is to read. Okay, so a ninth grade student would be able to read most of the individual words in it—the English ones, anyway—but there are no guarantees that she’d have any clue what the hell was going on. 🙂
Machine analysis of text is much like machine language translation. The engine can identify words, look them up in a list, and maybe—if you’re lucky—differentiate between the same spelling used in different contexts. But understand what it’s reading? Hah.
Oh. My. Sweet. Jesus.
I don’t want to live in a universe where James Joyce is considered easier to read than Jane Austen. I may as well live in a country where I don’t know how to speak the language.
What are you talking about Nicole? I’m relieved that all this time everyone was lying and that “Ulysses” is a snap. I was wrong to fear it all this time so I’m gonna go pick it up right now (and it won’t even be the annotated edition because it’s so easy!).
Yeah, I can see how the amazon eezyreadulator would conclude that Ulysses was an easy read. Lots of repetition of monosyllabic words like ‘I’ and ‘yes’. Suitable for kids with a reading age of 5. ‘Course, Joyce didn’t bother much with punctuation, but then neither do many small children, so that seems fair.
*sigh* This is what happens when you try applying math to art. I’ll bet there’s an algorithm that rates paintings, too.
For some complexity ratings that are done by people who’ve actually read the books, check StoryCode. Though browsing through there, it appears that the “readability rating” has occasionally been confused with whether the person doing the coding liked the book or not. Still, I find it a much better marriage of math & art.
There’s an historical with Middle English dialogue????
Rushes out to buy…
Jackie Collins’ “Hollywood Divorces” – a book which, I believe, gave me brain damage because it’s so mind bogglingly idiotic – appears to be one of the easiest novels in the English language. Good, now I’m happy.