Channel: Minitab | Minitab

Online Dating: Is She in the Driver's Seat? A Test of 2 Proportions


The other day I was having lunch with my friend, Bob. Bob was sharing some observations from his recent foray into the world of online dating. 

"For some reason," Bob mused, "lots of women post pictures of themselves in their cars. They take a selfie while they're behind the wheel."

Not Bob

"Huh," I replied. "Sounds like online dating is more dangerous than I thought." 

"What? No, not while the car is moving...while it's parked."

"Oh. That makes more sense. Did you post a car selfie?" I asked.

"No. I don't think guys do that. But I guess I don't really know." 

Realizing we were running late, we discontinued our conversation to concentrate on slurping down our remaining ramen. 

I didn't think much of it, so I was surprised the next day when I got an email from Bob explaining that he had investigated the car-selfie phenomenon a little more and now guessed that only about 5% of guys posted them, while about 25% of women did. He wondered if this difference could just be chance.

I wrote back to say that we could use the 2 Proportions test in Minitab Statistical Software to find out.

His reply was swift and brief, "Great! Let me know what you find out." 

Power and Sample Size for the 2 Proportions Test

My reply was somewhat less swift, and somewhat less brief. I explained that I needed to know how many profiles of men and women he looked at, and how many of each included car selfies. And if only about 5% of men post car selfies, it would take a pretty big sample to confirm that. On average, you'd expect to sample about 20 men's profiles before you found just one with a car selfie. And you'd need more than just 1 or 2 such instances before you could be satisfied that you've got a reasonable estimate of the true proportion. 
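The "about 20 profiles" figure is the mean of a geometric distribution, 1/p. Here is a minimal Python sketch of that arithmetic, assuming each profile independently shows a car selfie with probability p. The function names are illustrative and not part of the original analysis.

```python
def expected_profiles_until_first(p: float) -> float:
    """Mean number of profiles viewed up to and including the first car selfie."""
    return 1.0 / p


def prob_no_selfies(p: float, n: int) -> float:
    """Chance that n independent profiles contain no car selfie at all."""
    return (1.0 - p) ** n


# Bob's guesstimate for men: about 5% post a car selfie.
p_men = 0.05
expected_wait = expected_profiles_until_first(p_men)
chance_of_none_in_20 = prob_no_selfies(p_men, 20)

print(f"Expected profiles until the first car selfie: {expected_wait:.0f}")
print(f"Chance of seeing none in 20 profiles: {chance_of_none_in_20:.2f}")
```

The second number shows why a handful of profiles is not enough: even after 20 profiles there is a sizable chance of having seen no car selfies at all.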

I was about to hit Send when I realized that I could very quickly do a sample size calculation in Minitab Statistical Software and take some of the guesswork out of it. I chose Stat > Power and Sample Size > 2 Proportions and filled out the dialog box like so:

Power and Sample Size for 2 Proportions dialog box

For the Comparison proportion and the Baseline proportion, I used Bob's guesstimates of 5% (0.05) and 25% (0.25). I entered power values of 0.9 and 0.95 and left the sample size box blank. This tells Minitab to calculate the sample sizes that will give us the desired power if the real proportions are what we think they are.  

Power and Sample Size

Test for Two Proportions

Testing comparison p = baseline p (versus ≠)
Calculating power for baseline p = 0.25
α = 0.05

              Sample  Target
Comparison p    Size   Power  Actual Power
        0.05      65    0.90      0.900529
        0.05      80    0.95      0.950371

The sample size is for each group.

As I had suspected, the samples need to be pretty large in order to have the desired amount of power. I told Bob that in order to have a 90% chance of detecting the hypothesized difference, he'll need to sample 65 men and 65 women. To have a 95% chance, he'll need to sample about 80 of each.   
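For readers without Minitab, the same sample sizes can be approximated with the textbook normal-approximation formula for a two-sided test of two proportions. This is a sketch under that assumption. The post does not say which formula Minitab uses internally, and the function name is hypothetical.

```python
from math import ceil, sqrt
from statistics import NormalDist


def sample_size_two_proportions(p_comparison, p_baseline, power, alpha=0.05):
    """Per-group n for a two-sided two-proportion test (normal approximation)."""
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)  # critical value of the test
    z_power = std_normal.inv_cdf(power)          # quantile for the target power
    p_bar = (p_comparison + p_baseline) / 2      # common proportion under H0
    sd_null = sqrt(2 * p_bar * (1 - p_bar))
    sd_alt = sqrt(p_comparison * (1 - p_comparison)
                  + p_baseline * (1 - p_baseline))
    n = ((z_alpha * sd_null + z_power * sd_alt)
         / (p_comparison - p_baseline)) ** 2
    return ceil(n)  # round up to a whole profile


# Bob's guesstimates: 5% of men and 25% of women post car selfies.
for target_power in (0.90, 0.95):
    n = sample_size_two_proportions(0.05, 0.25, target_power)
    print(f"power {target_power:.2f}: {n} profiles per group")
```

With Bob's guesstimates this formula gives 65 and 80 profiles per group, in line with the Minitab output.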

I wasn't sure whether Bob was up to the task of looking through 130 or so user profiles and recording how many included car selfies. So I was surprised when I got his call the next day.

"51 women and 75 men," he said.

I requested clarification, "Huh?"

"On the dating site," Bob explained. "In my location and age range, there are 51 women and 75 men including me."

"And I thought it was because you talk while you're chewing," I remarked (out loud, apparently).

"What?"

"The ratio," I said. "That's probably why you're not doing so well...too much competition."

"Oh, yeah. Definitely. Anyway, is that enough data?" 

I thought about it. Combined, that made 126 observations. Pretty close to the 130 we wanted. And the rarer an event is, the more observations it takes to estimate its proportion with good relative precision. Our guesstimated proportion for men (0.05) is much smaller than our guesstimated proportion for women (0.25). So having more observations from men than women probably won't hurt us.
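One way to check the hunch that 75 men and 51 women is enough is to approximate the power of the test with unequal group sizes. The sketch below uses the usual normal approximation and Bob's guesstimated proportions. It is an illustrative calculation with a hypothetical function name, not something taken from the original analysis.

```python
from math import sqrt
from statistics import NormalDist


def approx_power(p1, n1, p2, n2, alpha=0.05):
    """Approximate power of a two-sided two-proportion z-test, unequal n allowed."""
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)
    # Standard error under H0 uses the sample-size-weighted common proportion.
    p_bar = (n1 * p1 + n2 * p2) / (n1 + n2)
    se_null = sqrt(p_bar * (1 - p_bar) * (1 / n1 + 1 / n2))
    # Standard error under the alternative uses the separate proportions.
    se_alt = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    diff = abs(p1 - p2)
    upper_tail = 1 - std_normal.cdf((z_alpha * se_null - diff) / se_alt)
    lower_tail = std_normal.cdf((-z_alpha * se_null - diff) / se_alt)
    return upper_tail + lower_tail


# 75 men at a guessed 5%, 51 women at a guessed 25%.
power_available = approx_power(0.05, 75, 0.25, 51)
# Reference point: the balanced design of 65 per group.
power_balanced = approx_power(0.05, 65, 0.25, 65)

print(f"Approximate power with 75 men and 51 women: {power_available:.3f}")
print(f"Approximate power with 65 per group:        {power_balanced:.3f}")
```

Comparing the two printed values shows how much power is given up by working with the profiles that happen to be available rather than a balanced 65 and 65.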

I summed this up for Bob. "Probably."

"Great. I'll get back to you," he replied.

Performing the 2 Proportions Test with Collected Data

Upon my arrival at work the next day, I was greeted by an email from Bob with the subject line "Car selfies". The message explained that of the 75 men, 6 had posted car selfies. And of the 51 women, 12 had posted car selfies. Definitely sounded like Bob was on to something. I opened Minitab Statistical Software to confirm. I chose Stat > Basic Statistics > 2 Proportions, and entered the summary data that Bob had conveyed. In this scenario, each profile with a car selfie is considered an 'event'. And the number of trials is simply the number of profiles that were sampled:

Two-Sample Proportion dialog box

The results were quick and confirmatory:

Test and CI for Two Proportions

Sample   X   N  Sample p
1        6  75  0.080000
2       12  51  0.235294

Difference = p (1) - p (2)
Estimate for difference:  -0.155294
95% CI for difference:  (-0.286910, -0.0236787)
Test for difference = 0 (vs ≠ 0):  Z = -2.31  P-Value = 0.021

Fisher’s exact test: P-Value = 0.019
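The Z statistic, p-value, and confidence interval in this output can be reproduced by hand with a short Python sketch. It assumes the standard error is built from the two separate sample proportions (unpooled). That choice is inferred from the reported numbers rather than stated in the post, and the function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z(x1, n1, x2, n2, conf_level=0.95):
    """Two-sided z-test and confidence interval for p1 - p2 (unpooled SE)."""
    p1, p2 = x1 / n1, x2 / n2
    diff = p1 - p2
    # Unpooled standard error: each group contributes its own variance.
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    z = diff / se
    p_value = 2 * NormalDist().cdf(-abs(z))
    margin = NormalDist().inv_cdf(1 - (1 - conf_level) / 2) * se
    return diff, z, p_value, (diff - margin, diff + margin)


# Sample 1: 6 of 75 men; sample 2: 12 of 51 women.
diff, z, p_value, ci = two_proportion_z(6, 75, 12, 51)
print(f"Estimate for difference: {diff:.6f}")
print(f"95% CI for difference: ({ci[0]:.6f}, {ci[1]:.6f})")
print(f"Z = {z:.2f}  P-Value = {p_value:.3f}")
```

A pooled standard error is another common choice for the test statistic. It would give a somewhat different Z, so treat the match with the Minitab output as a check on the arithmetic rather than a statement about Minitab's defaults.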

I called Bob to give him the good news that his intuitions seemed to be supported by the data. Only about 8% of men in the sample posted a car selfie, but about 24% of women did. Fisher's exact p-value of 0.019 tells us that, if men and women were really equally likely to post car selfies, there would be only about a 1.9% chance of observing a difference this big in samples of this size.
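Fisher's exact test can also be sketched in a few lines of Python. This version conditions on the table margins and sums the hypergeometric probabilities of every table that is no more likely than the observed one. That is one common definition of the two-sided p-value, and it is an assumption that Minitab uses the same one. The function name is hypothetical.

```python
from math import comb


def fisher_exact_two_sided(x1, n1, x2, n2):
    """Two-sided Fisher's exact p-value for a 2x2 table of events by group."""
    total_events = x1 + x2
    denom = comb(n1 + n2, total_events)

    def prob(k):
        # Probability that k of the events fall in group 1, margins fixed.
        return comb(n1, k) * comb(n2, total_events - k) / denom

    p_observed = prob(x1)
    k_min = max(0, total_events - n2)
    k_max = min(n1, total_events)
    # Sum every table at least as extreme (no more probable) than the observed
    # one; the small tolerance guards against floating-point ties.
    return sum(prob(k) for k in range(k_min, k_max + 1)
               if prob(k) <= p_observed * (1 + 1e-9))


# 6 of 75 men and 12 of 51 women posted car selfies.
p_fisher = fisher_exact_two_sided(6, 75, 12, 51)
print(f"Fisher's exact test: P-Value = {p_fisher:.3f}")
```

Because the test enumerates every possible table exactly, it does not rely on the normal approximation, which is useful when one of the event counts is as small as 6.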

Risks of Overgeneralization (and Clowning Around)

"I knew it!" he exclaimed, with a bit more excitement than I thought was warranted.

"I don't think you're in line for a Nobel prize or anything," I remarked (out loud, apparently).

"What?"

"With a sample that size, it's probably not a chance happening...but," I cautioned, "is it possible that the women just post more photos?"

"I don't think so, why?"

"Well, I guess I assumed that men and women post approximately the same number of photos. But if women post a lot more photos, then you'd probably expect to see more car selfies as well as more photos of women standing under trees, sitting on benches, hugging circus clowns, ..."

"Circus clowns?" He seemed genuinely concerned. "I don't think I've seen any circus clowns. I don't know if I'd date a woman who hugged clowns. I hate clowns." 

"You're a clown, Bob, so that seems unethical." I remarked (out loud, apparently).

"What?"

"I said it's just a hypothetical. The point is, if men tend to post only 1 or 2 photos, and women post 7 or 8, then you can make up almost any category you want, and you'll probably see more such photos from the women, simply because they post more pictures."

Once he got over the circus clown thing, it occurred to him that he could go back and record the number of photos that were posted by each person. Using Minitab, I created the following dotplot, which made it clear that there was no systematic difference in the number of pictures posted by women and men.

Dotplot of car selfies for men and women
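The concern about photo counts can be made concrete with a little probability. Suppose, purely for illustration, that every photo has the same small chance of falling into some category, regardless of who posts it. The per-photo probability and the photo counts below are hypothetical values, not data from Bob's sample.

```python
def prob_at_least_one(per_photo_prob: float, num_photos: int) -> float:
    """Chance that a profile with num_photos photos has at least one in the category."""
    return 1 - (1 - per_photo_prob) ** num_photos


# Hypothetical: the same 3% per-photo chance for everyone,
# but 2 photos on a typical man's profile versus 8 on a woman's.
per_photo_prob = 0.03
p_profile_men = prob_at_least_one(per_photo_prob, 2)
p_profile_women = prob_at_least_one(per_photo_prob, 8)

print(f"Profiles with at least one such photo, 2 photos: {p_profile_men:.3f}")
print(f"Profiles with at least one such photo, 8 photos: {p_profile_women:.3f}")
```

Under these made-up numbers the profile-level proportions differ even though the per-photo behavior is identical, which is why checking the number of photos per person was worth the extra effort.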

Epilogue

Although the results of this little test were significant, it's important not to overgeneralize such findings. They might be indicative of the men and women in Bob's demographic and location, but the findings among other age groups or in different locations could be quite different.

For example, car selfies might be less common in urban locations simply because more people use public transportation and fewer people own cars. And it is likely that you would find far fewer car selfies on a dating site that is targeted to the circus clown demographic ("Clown Harmony"?) simply because clown cars are notoriously overcrowded, making it practically impossible to take a good selfie.

Car selfie reprinted in its original form under Creative Commons License 2.0.

