[ExI] newspeak again

Eric Messick eric at m056832107.syzygy.com
Wed Nov 14 01:25:01 UTC 2012


Patrick McLaren <patrickkmclaren at gmail.com> wrote:
>I do something like this in a data mining application I have been working
>on, to observe how grammar changes over time as news sources cover topics
>repeatedly in a series of articles.
>
>Typically, in order to find "unusual" words, or "interesting" words, I plot
>probability_of_at_least_one_occurrence vs. total_occurrence_in_article,
>then use a nonlinear classifier to grab all words with a significant
>occurrence that hug the y-axis.
>
>What I found was that breaking stories contain a large amount of new
>information, while subsequent stories contain almost none in comparison.
>Although this is perfectly obvious in hindsight...
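The statistics Patrick describes could be computed along these lines. This is my own minimal sketch, not his actual pipeline: it estimates probability_of_at_least_one_occurrence empirically as document frequency over a corpus, and pairs it with total_occurrence_in_article for one article; the classifier step is omitted.

```python
from collections import Counter

def word_doc_prob(articles):
    """Empirical probability_of_at_least_one_occurrence per word:
    the fraction of articles in which the word appears at least once."""
    n = len(articles)
    doc_freq = Counter()
    for text in articles:
        doc_freq.update(set(text.lower().split()))
    return {w: c / n for w, c in doc_freq.items()}

def article_points(article, doc_prob):
    """For one article, produce the scatter points
    (probability_of_at_least_one_occurrence, total_occurrence_in_article)
    for each word -- the plane a classifier would then partition."""
    counts = Counter(article.lower().split())
    return {w: (doc_prob.get(w, 0.0), c) for w, c in counts.items()}
```

A word with many occurrences in the article but a low corpus-wide probability would sit near one axis of this plot; which axis it "hugs" depends on the orientation chosen, so I have not hard-coded that here.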

So, are subsequent stories adding little or no new information on top
of the breaking story, or do they tend to have less information about
the story than the breaking story?

In other words, suppose you had three datasets: P, the past background
information (everything leading up to the event you're analyzing); B,
the breaking story; and S, a subsequent story:

  Would the novelty of B with respect to P be greater than the novelty
  of S with respect to P?

or

  Is it just that S contains little novelty with respect to P + B?

The former might be interesting.  The latter, as you say, seems pretty
obvious.
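To make the two questions concrete, here is a toy novelty measure of my own devising (set-overlap on vocabulary, a crude stand-in for whatever information measure the analysis actually uses):

```python
def novelty(story, reference_texts):
    """Fraction of the story's distinct words that are absent from the
    combined vocabulary of the reference texts. A crude proxy for 'new
    information'; a language-model cross-entropy would be more faithful."""
    story_vocab = set(story.lower().split())
    ref_vocab = set()
    for text in reference_texts:
        ref_vocab.update(text.lower().split())
    if not story_vocab:
        return 0.0
    return len(story_vocab - ref_vocab) / len(story_vocab)

# The first question compares novelty(B, [P]) with novelty(S, [P]);
# the second asks whether novelty(S, [P, B]) is small.
```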

-eric
