# [ExI] Reverse discrimination

rex rex at nosyntax.net
Mon Sep 14 20:50:21 UTC 2015

```
In the US, a disproportionate number of employees from one group in a job
has been _prima facie_ evidence of discrimination since the 1971 Griggs v.
Duke Power SCOTUS decision.

The analysis below shows that disproportionate representation may not only
NOT be evidence of discrimination in the obvious direction; it may instead
be evidence of _reverse_ discrimination.

It goes like this: the employer screens applicants from two groups using
some criterion thought to be correlated with job performance (height,
say). Applicants who meet a minimum height are hired.

If the two groups (matched on everything else) have the same distribution
of scores (same mean and same spread), the expected fraction of hires from
Group A equals the fraction of Group A in the population. For example,
suppose the population is 20% Group A and 80% Group B, and 500 people are
hired.

If the groups are equally matched on the test, the expected result is 100
hires from Group A and 400 from Group B. Suppose the actual numbers hired
are 120 from Group A and 380 from Group B. Is this result consistent with
the null hypothesis that there is no discrimination? If not, is it
consistent with Group A being discriminated against? (There are far fewer
Group A than Group B employees, after all.)

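The R code below uses Monte Carlo simulation to estimate the range in which
the number of Group A hires should fall 95% of the time if there is no
discrimination. (Strictly, this is a range of likely outcomes under the
null hypothesis rather than a confidence interval for an unknown
parameter, but it is called a confidence interval, or CI, below.)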

We assume heights are normally distributed. Real heights are not exactly
normal, but the goal is to illustrate the method, and the deviations from
normality are unlikely to be large enough to change the conclusion.

Data from:

Assume female mean height is 165 cm (SD 6) and male mean height is 176 cm
(SD 7). (Change these to suit the local population.)

Suppose Sandblowers Co has 500 employees, hired from an applicant pool that
is half female (Group A) and half male (Group B), the two groups matched on
everything except height. The sole criterion for hiring is a minimum
height of 170 cm. Male heights have mean 176 cm (SD 7); female heights
have mean 165 cm (SD 6).

120 employees are female. If employees were selected by the 170 cm minimum
height alone, would this number suggest that discrimination has taken
place? For, or against, females?

We can simulate the hiring process by drawing applicants, each equally
likely to come from either group, and accepting those who meet the minimum
height. We keep drawing until 500 people have passed and then count the
people from each group.

The R code below simulates selecting 500 employees, 1000 times over.
valA is a 1000 x 500 matrix of group A heights, and valB is a 1000 x 500
matrix of group B heights. Both are initialized to NA. When an employee is
hired, the next NA in that group's row is replaced with the employee's
height. At the end, the number of non-NA entries in each row is the number
of people from that group hired in that hiring cycle. Since the cycle is
run 1000 times, the mean of these counts estimates what should happen if
there is no discrimination, and their standard deviation estimates how
much the count varies by chance.

Using the given values, the mean number in group A is 100.0 and the
standard deviation is 9.16. As the attached graph shows, the distribution
is approximately normal, so a 95% CI for the number in group A is
100.0 +/- 1.96*9.16, i.e. (82.04, 117.95). (Monte Carlo estimates vary
slightly from run to run, so the figures recorded in the code below differ
a little.) The observed number of group A employees (120) lies outside the
CI, so we reject the null hypothesis that hiring is by height alone.
Because a count this high would arise by chance less than 2.5% of the time
under that hypothesis, the result suggests that group A is favored in the
hiring process.

Even though there are far more employees in group B (380) than in group A
(120), the result suggests that it is group B that may be discriminated
against.

It's important to note that a result outside the 95% confidence
interval does not _prove_ there is discrimination. All it does is show
that the observation is inconsistent with the null hypothesis. In
fact, such a result is expected in about 5% of companies that do _not_
have discriminatory hiring practices.

See attached graph for height distribution of employees.

Paraphrasing Churchill, most who stumble over this (perhaps
astonishing) truth will pick themselves up and hurry off as if nothing
ever happened.

options(width=160)
N1    = 500                         #number in sample (employees, etc)
reps  = 1000                        #number of repetitions
meanA = 165                         #mean value of group A
sdA   = 6                           #SD of group A
meanB = 176                         #mean value of group B
sdB   = 7                           #SD of group B
minC  = 170                         #minimum criterion (height, etc)
valA = matrix(data = NA, nrow = reps, ncol = N1, byrow = TRUE)
valB = matrix(data = NA, nrow = reps, ncol = N1, byrow = TRUE)

for (i in 1:reps){
  cnt = cntA = cntB = 0             #count of those who qualify
  ###################### > draw individuals from population <#####################
  while (cnt < N1){                 #draw until we have N1 who pass criterion
    mf = rnorm(1)                   #positive (probability 1/2): group A draw, else B
    if (mf > 0){                    #drew from group A
      ap = rnorm(1, meanA, sdA)     #random A height (it's much faster to generate
                                    #vector rands, but the code is easier to read this way)
      if (ap >= minC){              #met criterion
        cntA = cntA + 1             #bump group A count
        cnt = cnt + 1               #bump total count
        valA[i, cntA] = ap          #record A value (height, etc)
      }
    }else{                          #drew from group B
      ap = rnorm(1, meanB, sdB)     #random B (height, etc)
      if (ap >= minC){              #met criterion
        cntB = cntB + 1             #bump group B count
        cnt = cnt + 1               #bump total count
        valB[i, cntB] = ap          #record B value (height, etc)
      }
    }
  }
}
aA = nA = nB = aB = numeric(reps)   #per-run mean height (aA, aB) and count (nA, nB) of hires
for (i in 1:reps){
  aA[i] = mean(na.omit(valA[i,]))
  nA[i] = length(na.omit(valA[i,]))
  aB[i] = mean(na.omit(valB[i,]))
  nB[i] = length(na.omit(valB[i,]))
}

mean(nA)
# 100.469
sd(nA)
# 8.784658
mean(aA)
# 173.3475
sd(aA)
# 0.2929211

mean(nB)
# 399.531
sd(nB)
# 8.860406
mean(aB)
# 178.4038
sd(aB)
# 0.2725339

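#95% CI for the number of As (normal approximation), from this run:
mean(nA) + c(-1.96, 1.96) * sd(nA)
#The figures recorded from one run were:
#(100.8 - 1.96*8.907, 100.8 + 1.96*8.907) = (83.34, 118.26)
#Exact values vary from run to run because no random seed is set.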

plot(function(x) dnorm(x, mean=100, sd=9), 60, 150, main = "Density of Group A", xlab='Number', ylim=c(0, 0.06))
lines(density(nA))

print.noquote(c('Mean, SD for A & B', mean(aA), sd(aA), mean(aB), sd(aB)))
noquote(c('Observed As, Bs, fraction A', nA, nB, nA/(nA+nB)))

plot(density(na.omit(valA[1,])), xlim=c(165,195), col='red',
     main= 'Fractions of A & B Meeting Height Criterion', xlab='Height (cm)')
lines(density(na.omit(valB[1,])),col='blue')
abline(v=170)

-------------- next part --------------
A non-text attachment was scrubbed...
Name: groups-0.05.png
Type: image/png
Size: 46255 bytes
URL: <http://lists.extropy.org/pipermail/extropy-chat/attachments/20150914/2ebf9258/attachment.png>
```
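The introductory example (20% Group A, 500 hires, 120 observed from Group A) can also be checked without simulation. The sketch below assumes that, under the null hypothesis, each hire is independently from Group A with probability 0.2, so the Group A count is binomial; it uses base R's `pbinom` and `binom.test`, and the variable names are illustrative rather than taken from the post.

```r
# Sketch for the introductory example, assuming each of the 500 hires is
# independently from Group A with probability 0.2 (the null hypothesis).
n_hired <- 500    # total hires
p_A     <- 0.20   # Group A share of the population
obs_A   <- 120    # observed Group A hires

expected_A <- n_hired * p_A                     # expected Group A hires: 100
sd_A       <- sqrt(n_hired * p_A * (1 - p_A))   # binomial SD: sqrt(80), about 8.94

# One-sided: chance of 120 or more Group A hires if hiring ignores group
p_upper <- pbinom(obs_A - 1, n_hired, p_A, lower.tail = FALSE)

# Two-sided exact binomial test of the same null hypothesis
bt <- binom.test(obs_A, n_hired, p = p_A)

cat("expected:", expected_A, " SD:", round(sd_A, 2), "\n")
cat("P(X >= 120):", signif(p_upper, 3), "\n")
cat("two-sided p-value:", signif(bt$p.value, 3), "\n")
```

Both tail probabilities fall below the conventional 5% level, so 120 Group A hires out of 500 would be an unusually high count under this null hypothesis.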
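The null expectation that the simulation estimates can also be computed directly. Under the post's assumptions (normal heights, each applicant equally likely to come from either group, a 170 cm cutoff), each person who passes the cutoff is in Group A with probability pass_A / (pass_A + pass_B), so the number of Group A hires among 500 is binomial. This is a sketch in base R; `pass_A`, `pass_B`, `frac_A`, `exp_A` and `sd_nA` are illustrative names, not from the post.

```r
# Analytic check of the simulation's null expectation, assuming normal
# heights and an applicant pool that is half Group A, half Group B.
meanA <- 165; sdA <- 6     # Group A (female) heights, cm
meanB <- 176; sdB <- 7     # Group B (male) heights, cm
minC  <- 170               # minimum height, cm
N1    <- 500               # number hired

pass_A <- pnorm(minC, meanA, sdA, lower.tail = FALSE)  # share of A applicants >= 170 cm
pass_B <- pnorm(minC, meanB, sdB, lower.tail = FALSE)  # share of B applicants >= 170 cm

# With equal numbers of A and B applicants, each hire is an A with probability
frac_A <- pass_A / (pass_A + pass_B)

exp_A <- N1 * frac_A                        # expected Group A hires
sd_nA <- sqrt(N1 * frac_A * (1 - frac_A))   # binomial SD of Group A hires
round(c(pass_A = pass_A, pass_B = pass_B, frac_A = frac_A,
        expected = exp_A, sd = sd_nA), 3)
```

The expected count comes out at about 100.5 with an SD of about 9, in line with the simulated mean and SD up to Monte Carlo noise.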
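A comment in the post's simulation loop notes that generating random numbers as vectors is much faster than drawing one at a time. One way to do that is sketched below, under the same assumptions as the post (each applicant equally likely to come from either group, and the first N1 applicants to pass are hired). `hire_cycle` and its `batch` argument are hypothetical names, and unlike the post's loop this version returns only the counts, not the heights.

```r
# Hypothetical vectorized version of one hiring cycle (not the post's code).
# Assumes each applicant is equally likely to come from Group A or Group B
# and that the first N1 applicants to meet the minimum are hired.
hire_cycle <- function(N1 = 500, meanA = 165, sdA = 6,
                       meanB = 176, sdB = 7, minC = 170, batch = 2000) {
  isA <- logical(0)                         # group of each passing applicant, in order
  while (length(isA) < N1) {
    grpA   <- runif(batch) < 0.5            # TRUE = drawn from Group A
    height <- ifelse(grpA, rnorm(batch, meanA, sdA),
                           rnorm(batch, meanB, sdB))
    isA    <- c(isA, grpA[height >= minC])  # keep only those who meet the minimum
  }
  isA <- isA[seq_len(N1)]                   # the first N1 who passed are hired
  c(nA = sum(isA), nB = N1 - sum(isA))
}

set.seed(1)                                 # arbitrary seed, for reproducibility
sims <- replicate(1000, hire_cycle())       # 2 x 1000 matrix of counts
nA   <- sims["nA", ]
c(mean = mean(nA), sd = sd(nA))
```

With 1000 cycles the mean and SD of `nA` should agree with the post's loop to within simulation noise.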
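The post's 95% CI and its "less than a 5% chance" reasoning can be computed directly from the simulated counts. So that the sketch runs on its own, it regenerates a stand-in for `nA` with `rbinom`. Under the post's model this is equivalent, because each successful applicant is independently in Group A with the same probability. Substitute the `nA` vector from the post's code if you have it. The sketch also shows a quantile-based interval, which does not rely on the normal approximation, and a simulated one-sided p-value.

```r
# Stand-in for the post's simulated counts nA (assumption: binomial model
# of Group A hires; replace with the post's nA if available).
pass_A <- pnorm(170, 165, 6, lower.tail = FALSE)   # share of A applicants >= 170 cm
pass_B <- pnorm(170, 176, 7, lower.tail = FALSE)   # share of B applicants >= 170 cm
set.seed(2)                                        # arbitrary seed
nA <- rbinom(1000, size = 500, prob = pass_A / (pass_A + pass_B))

obs <- 120   # observed Group A (female) employees

# Normal-approximation interval, as in the post: mean +/- 1.96 SD
ci_normal <- mean(nA) + c(-1, 1) * qnorm(0.975) * sd(nA)

# Distribution-free alternative: the central 95% of the simulated counts
ci_quant <- quantile(nA, c(0.025, 0.975))

# Simulated one-sided p-value: how often chance alone gives >= obs
p_sim <- mean(nA >= obs)

print(ci_normal)
print(ci_quant)
print(p_sim)
```

With only 1000 runs the simulated tail probability is coarse; more repetitions give a steadier estimate.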
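The post's caution that about 5% of companies with no discriminatory hiring will still fall outside the 95% interval can be checked by simulating many fair employers. This sketch again models the Group A count as binomial under the post's assumptions. It takes the interval from exact binomial quantiles via `qbinom`. Because the counts are discrete, the flagged share should come out close to, and typically a little under, 5%.

```r
# Sketch: how often is a fair employer (hiring by height alone) flagged?
# Assumes the Group A count among 500 hires is binomial, as in the model.
pA <- pnorm(170, 165, 6, lower.tail = FALSE)   # share of A applicants >= 170 cm
pB <- pnorm(170, 176, 7, lower.tail = FALSE)   # share of B applicants >= 170 cm
frac_A <- pA / (pA + pB)                       # per-hire probability of an A

lo <- qbinom(0.025, 500, frac_A)   # lower end of the central 95% range
hi <- qbinom(0.975, 500, frac_A)   # upper end of the central 95% range

set.seed(3)                                          # arbitrary seed
companies <- rbinom(1e5, 500, frac_A)                # Group A counts at 100,000 fair employers
outside   <- mean(companies < lo | companies > hi)   # share falling outside the range
c(lo = lo, hi = hi, outside = outside)
```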