[extropy-chat] The Gender Genie - analyzing writing styles

Amara Graps amara at amara.com
Mon Dec 1 14:42:56 UTC 2003


Ciao,

I ran across this online program that tries to analyze the gender
of the writer of a writing selection. I am writing quite a lot these
days, and I was curious about what the "gender genie" would
calculate for my gender.

http://www.bookblog.net/gender/genie.html

"According to Koppel and Argamon, the algorithm should predict the
gender of the author approximately 80% of the time."

However, in my case, it was wrong 100% of the time. Other people
apparently saw similar errors, because when I entered my results
into their database, I saw:

----------------
"accuracy results

Am I right?
yes
	107582 (67.93%)
no
	50800 (32.07%)
158382 total responses since September 13, 2003"
-----------------
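
As a quick sanity check on the quoted tallies, here is a minimal
Python sketch that recomputes the total and the two percentages from
the raw counts. The variable names are mine, not the site's.

```python
# Tallies copied from the quoted "accuracy results" box.
yes_votes = 107582
no_votes = 50800
total = yes_votes + no_votes

# Percentages rounded to two decimals, as the site displays them.
yes_pct = round(100 * yes_votes / total, 2)
no_pct = round(100 * no_votes / total, 2)

print(total, yes_pct, no_pct)
```

The self-reported hit rate works out to roughly 68%, noticeably below
the 80% quoted from Koppel and Argamon.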

In order to test whether it was a fluke, I tried six different
essays, and got the following scores:


Eternal City Grapsody #1 - Mythology for Transhumans
http://www.transhumanism.com/articles_more.php?id=P84_0_4_0_C
Words: 1767
Female Score: 1922
Male Score: 3452

Eternal City Grapsody #2 - Of Snakes and Immortality
http://www.transhumanism.com/articles_more.php?id=P89_0_4_0_C
Words: 1402
Female Score: 1618
Male Score: 2048

Eternal City Grapsody #3 - The Pause that Refreshes
http://www.transhumanism.com/articles_more.php?id=P94_0_4_0_C
Words: 1016
Female Score: 1188
Male Score: 1585

Eternal City Grapsody #4 - Scales of Man: Adapting Technology to
Transhumans
http://www.transhumanism.com/articles_more.php?id=P380_0_4_0_C
Words: 1682
Female Score: 1746
Male Score: 2777

Eternal City Grapsody #5 - Parmigianino's Golden Transformations
http://www.transhumanism.com/articles_more.php?id=P551_0_4_0_C
Words: 1790
Female Score: 1239
Male Score: 2356

Eternal City Grapsody #6 - Tricksters: Synchronicity, Dirt, and Laughter
http://www.transhumanism.com/articles_more.php?id=P877_0_4_0_C
Words: 1678
Female Score: 2019
Male Score: 2068


The last one was a different style for me, a kind of
giggling-out-loud style, as if speaking to a group of close friends,
but the algorithm still calculated me to be male.
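
The six results can be tabulated with a small Python sketch. It
assumes the Genie simply picks whichever score is higher (consistent
with the verdicts above, but not confirmed by the site), and it treats
the six essays as independent trials, which is a strong assumption for
six pieces by one author. All names here are hypothetical.

```python
# (essay number, words, female score, male score), copied from the list above.
results = [
    (1, 1767, 1922, 3452),
    (2, 1402, 1618, 2048),
    (3, 1016, 1188, 1585),
    (4, 1682, 1746, 2777),
    (5, 1790, 1239, 2356),
    (6, 1678, 2019, 2068),
]

def verdict(female, male):
    # Assumed rule: the higher score wins.
    return "male" if male > female else "female"

verdicts = [verdict(f, m) for _, _, f, m in results]

# Male-minus-female margin per essay; the smallest margin is the closest call.
margins = {n: m - f for n, _, f, m in results}
closest = min(margins, key=margins.get)

# Chance of six misses in a row at the claimed 80% accuracy,
# assuming independent trials (see the caveat above).
p_six_misses = (1 - 0.80) ** 6

print(verdicts, closest, margins[closest], p_six_misses)
```

Essay #6 is the closest call, with only 49 points between the two
scores, while the other five are separated by several hundred points
or more.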


I am sad about these results and this algorithm, I must say,
for a range of reasons, including:

* Are women 'themselves' when they write, or are they adapting
to the style of the Internet?

* The algorithm seems to have a pretty narrow definition of
gendered writing style.


How did the algorithm work? The following is a note about that.

Amara



=======================================

http://www.nature.com/nsu/030714/030714-13.html

Computer program detects author gender

Simple algorithm suggests words and syntax bear sex and genre stamp.
18 July 2003

PHILIP BALL

A new computer program can tell whether a book was written by a man
or a woman. The simple scan of key words and syntax is around 80%
accurate on both fiction and non-fiction [1,2].

The program's success seems to confirm the stereotypical perception
of differences in male and female language use. Crudely put, men
talk more about objects, and women more about relationships.

Female writers use more pronouns (I, you, she, their, myself), say
the program's developers, Moshe Koppel of Bar-Ilan University in
Ramat Gan, Israel, and colleagues. Males prefer words that identify
or determine nouns (a, the, that) and words that quantify them (one,
two, more).

So this article would already, through sentences such as this, have
probably betrayed its author as male: there is a prevalence of
plural pronouns (they, them), indicating the male tendency to
categorize rather than personalize.

If I were female, the researchers imply, I'd be more likely to write
sentences like this, which assume that you and I share common
knowledge or engage us in a direct relationship. These differing
styles have previously been called 'informational' and 'involved',
respectively.
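
To make the idea concrete, here is a toy Python sketch of a
keyword-counting scorer. It uses only the example words named in the
article and gives every word a weight of 1. The actual feature set and
weights used by Koppel and colleagues (and by the Gender Genie) are
not given here, so this is an illustration of the general approach,
not their algorithm. The names toy_scores, toy_guess, INVOLVED and
INFORMATIONAL are hypothetical.

```python
import re
from collections import Counter

# Example words from the article; unit weights are an assumption.
INVOLVED = {"i", "you", "she", "their", "myself"}            # pronouns
INFORMATIONAL = {"a", "the", "that", "one", "two", "more"}   # determiners, quantifiers

def toy_scores(text):
    """Return (involved, informational) keyword counts for a text."""
    words = Counter(re.findall(r"[a-z']+", text.lower()))
    involved = sum(words[w] for w in INVOLVED)
    informational = sum(words[w] for w in INFORMATIONAL)
    return involved, informational

def toy_guess(text):
    """Guess 'female' or 'male' from whichever count is higher."""
    involved, informational = toy_scores(text)
    if involved == informational:
        return "undecided"
    return "female" if involved > informational else "male"

print(toy_scores("I think you and she lost their way"))
print(toy_guess("The one that got more than two"))
```

With so few features, essay-style prose full of "the" and "a" will
lean "informational" almost regardless of who wrote it.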

Koppel and colleagues trained their algorithm on a few test cases to
identify the most prevalent fingerprints of gender and of fiction
and non-fiction. They then set it searching for these fingerprints
in 566 English-language works in a variety of genres, ranging from A
Guide to Prague to A. S. Byatt's novel Possession - which,
intriguingly, the programme misclassified by gender, along with
Kazuo Ishiguro's The Remains of the Day.

Strikingly, the distinctions between male and female writers are
much the same as those that, even more clearly, differentiate
non-fiction and fiction. The programme can tell these two genres
apart with 98% accuracy. This is perhaps unsurprising, given that
non-fiction is more informational and fiction more involved.

Most of the works studied were published after 1975. The Israeli
team now intends to probe whether the differences extend further
back in time - and so whether George Eliot was wasting her time
disguising herself with a male nom de plume - and also whether they
occur in other languages.

References

1. Koppel, M., Argamon, S. & Shimoni, A. R. Automatically
   categorizing written texts by author gender. Literary and
   Linguistic Computing, in the press (2003).

2. Argamon, S., Koppel, M., Fine, J. & Shimoni, A. R. Gender, genre,
   and writing style in formal written texts. Text, in the press
   (2003).

========================================
-- 

***********************************************************************
Amara Graps, PhD             email: amara at amara.com
Computational Physics        vita:  ftp://ftp.amara.com/pub/resume.txt
Multiplex Answers            URL:   http://www.amara.com/
***********************************************************************
"Life shrinks or expands in proportion to one's courage."
       --Anais Nin


