Tuesday, July 08, 2003 :::
Some Israeli computer scientists have written software that determines the sex of an author based on a sample of writing. They claim 80% success.
For example, Koppel's group found that the single biggest difference is that women are far more likely than men to use personal pronouns: "I", "you", "she", "myself", or "yourself" and the like. Men, in contrast, are more likely to use determiners ("a," "the," "that," and "these") as well as cardinal numbers and quantifiers like "more" or "some." As one of the papers published by Koppel's group notes, men are also more likely to use "post-head noun modification with an of phrase": phrases like "garden of roses."
It seems surreal, even spooky, that such seemingly throwaway words would be so revealing of our identity. But text-analysis experts have long relied on these little parts of speech. When you or I write a text, we pay close attention to how we use the main topic-specific words, such as, in this article, the words "computer" and "program" and "gender." But we don't pay much attention to how we employ basic parts of speech, which means we're far more likely to use them in unconscious but revealing patterns. Years ago, Donald Foster, a professor of English at Vassar College, unmasked Joe Klein as the author of the anonymous book "Primary Colors," partly by paying attention to words like "the" and "and," and to quirks in the use of punctuation. "They're like fingerprints," says Foster.
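To make the idea concrete, here is a rough sketch in Python of the kind of measurement being described: what fraction of a text's words fall into each of these "throwaway" categories. This is not Koppel's algorithm. The word lists are just the handful of examples quoted above, and the function name and category labels are my own invention. A real classifier would use far more features and a trained model.

```python
import re

# Hypothetical word lists: only the examples quoted in the article,
# not the feature set used in the actual paper.
CATEGORIES = {
    "pronouns": {"i", "you", "she", "myself", "yourself"},
    "determiners": {"a", "the", "that", "these"},
    "quantifiers": {"more", "some"},
}

def function_word_rates(text):
    """Return, per category, the fraction of word tokens in that category."""
    # Lowercase the text and pull out alphabetic tokens,
    # keeping simple contractions like "don't" together.
    tokens = re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())
    total = len(tokens)
    rates = {}
    for name, words in CATEGORIES.items():
        hits = sum(1 for tok in tokens if tok in words)
        # Guard against empty input to avoid dividing by zero.
        rates[name] = hits / total if total else 0.0
    return rates

print(function_word_rates("I think you and she saw the garden of roses"))
```

Comparing these rates across many writing samples by known authors is, presumably, where a statistical difference like the one reported would show up. A single sentence tells you nothing.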
There's an interesting sideline to this story, too:
When the group submitted its first paper to the prestigious journal Proceedings of the National Academy of Sciences, the referees rejected it "on ideological grounds," [lead author] Koppel maintains. "They said, 'Hey, what do you mean? You're trying to make some claim about men and women being different, and we don't know if that's true. That's just the kind of thing that people are saying in order to oppress women!' And I said 'Hey, I'm just reporting the numbers.'"
...
Critics charge that experiments in gender-prediction don't discover inalienable male/female differences; rather, they help to create and exaggerate such differences. "You find what you're looking for. And that leads to this sneaking suspicion that it's all hardwired, instead of cultural," argues Janet Bing, a linguist at Old Dominion University in Norfolk, Va. She adds: "This whole rush to categorization usually works against women." Bing further notes that gays, lesbians, or transgendered people don't fit neatly into simple social definitions of male or female gender. Would Koppel's algorithm work as well if it analyzed a collection of books written mainly by them?
Koppel enthusiastically agrees it's an interesting question, but "we haven't run that experiment, so we don't know." In the end, he's hoping his group's data will keep critics at bay. "I'm just reporting the numbers," he adds, "but you can't be careful enough."
The article also mentions Deborah Tannen, who says her own (less scientific) explorations of "men's magazines" and "women's magazines" found a much bigger difference in writing between the two genres than between pieces within the same genre written by authors of different sexes. In other words, the audience a piece is written for matters more than the sex of the author does.
Anyway, the paper is available for download at Moshe Koppel's website.
::: posted by Steven at 10:33 AM