by Matthew Barsalou, guest blogger
A good way to begin researching a topic is with exploratory data analysis (EDA). In his 1977 book Exploratory Data Analysis, John Tukey suggested using EDA to collect and analyze data—not to confirm a hypothesis, but to form a hypothesis that could later be confirmed through other methods.
In some cases, EDA can even eliminate the need for a more in-depth hypothesis test. Here's a case in point.
When I heard about the new Star Trek movie, I started to complain to anybody who would listen (which was not many people) that director J. J. Abrams had used such a young cast in the 2009 Star Trek film.
With a tentative hypothesis of “the new Star Trek films use very young actors and actresses compared to the older Star Trek series,” I decided to look into this further. The first thing I did was collect data to use later in boxplots, which are a part of Tukey’s EDA.
Collecting Data for the Exploratory Analysis
I needed to determine the age at which each main Star Trek actor first appeared. Before I started looking up ages, however, I needed a method to decide whom to count as a main character in each series. To select the actors, I went to www.StarTrek.com and noted which characters were listed for each Star Trek series. This way I avoided biasing my results by hand-picking older or younger crew members who may not have had as much relevance as others.
The tables below list each actor, the character played, and the episode or movie in which the character first appeared, along with the actor's year of birth as given in the Internet Movie Database. To estimate each person's age, I subtracted the year of birth from the year of first appearance. These rough calculations could be off by a year, because the month of birth and the month of first appearance were not considered.
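The age calculation described above can be sketched in a few lines of Python. This is only an illustration: the function name `rough_age` is made up here, and the two sample rows are copied from Table 1.

```python
def rough_age(year_of_birth: int, year_of_first_appearance: int) -> int:
    """Approximate age at first appearance, using years only.

    Months are ignored, so the result can be off by one year.
    """
    return year_of_first_appearance - year_of_birth


# Two rows from Table 1: (name, year of birth, year of first appearance).
rows = [
    ("William Shatner", 1931, 1966),
    ("Walter Koenig", 1936, 1967),
]

for name, born, first_appeared in rows:
    print(f"{name}: about {rough_age(born, first_appeared)} (+/- 1 year)")
```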
Table 1: Star Trek: The Original Series
| Name | Character | First appeared in | Year of birth | Year of first appearance | Age (+/- 1 year) |
| --- | --- | --- | --- | --- | --- |
| William Shatner | James T. Kirk | The Man Trap | 1931 | 1966 | 35 |
| Leonard Nimoy | Spock | The Man Trap | 1931 | 1966 | 35 |
| DeForest Kelley | Leonard “Bones” McCoy | The Man Trap | 1920 | 1966 | 46 |
| James Doohan | Montgomery “Scotty” Scott | The Man Trap | 1920 | 1966 | 46 |
| George Takei | Sulu | The Man Trap | 1937 | 1966 | 29 |
| Nichelle Nichols | Uhura | The Man Trap | 1932 | 1966 | 34 |
| Walter Koenig | Pavel Andreievich Chekov | Amok Time | 1936 | 1967 | 31 |
Table 2: Star Trek: The Next Generation
| Name | Character | First appeared in | Year of birth | Year of first appearance | Age |
| --- | --- | --- | --- | --- | --- |
| Patrick Stewart | Jean-Luc Picard | Encounter at Farpoint | 1940 | 1987 | 47 |
| Jonathan Frakes | Will Riker | Encounter at Farpoint | 1952 | 1987 | 35 |
| Brent Spiner | Data | Encounter at Farpoint | 1949 | 1987 | 38 |
| LeVar Burton | Geordi La Forge | Encounter at Farpoint | 1957 | 1987 | 30 |
| Michael Dorn | Worf | Encounter at Farpoint | 1952 | 1987 | 35 |
| Marina Sirtis | Deanna Troi | Encounter at Farpoint | 1955 | 1987 | 32 |
| Gates McFadden | Beverly Crusher | Encounter at Farpoint | 1949 | 1987 | 38 |
| Wil Wheaton | Wesley Crusher | Encounter at Farpoint | 1972 | 1987 | 15 |
Table 3: Star Trek: Deep Space Nine
| Name | Character | First appeared in | Year of birth | Year of first appearance | Age |
| --- | --- | --- | --- | --- | --- |
| Avery Brooks | Benjamin Sisko | Emissary | 1948 | 1993 | 45 |
| Nana Visitor | Kira Nerys | Emissary | 1957 | 1993 | 36 |
| Rene Auberjonois | Odo | Emissary | 1940 | 1993 | 53 |
| Alexander Siddig | Julian Bashir | Emissary | 1965 | 1993 | 28 |
| Colm Meaney | Miles O’Brien | Emissary | 1953 | 1993 | 40 |
| Terry Farrell | Jadzia Dax | Emissary | 1963 | 1993 | 30 |
| Armin Shimerman | Quark | Emissary | 1949 | 1993 | 44 |
| Cirroc Lofton | Jake Sisko | Emissary | 1978 | 1993 | 15 |
| Michael Dorn | Worf | The Way of the Warrior | 1952 | 1995 | 43 |
| Nicole de Boer | Ezri Dax | Image in the Sand | 1970 | 1998 | 28 |
Table 4: Star Trek: Voyager
| Name | Character | First appeared in | Year of birth | Year of first appearance | Age |
| --- | --- | --- | --- | --- | --- |
| Kate Mulgrew | Kathryn Janeway | Caretaker | 1955 | 1995 | 40 |
| Robert Beltran | Chakotay | Caretaker | 1953 | 1995 | 42 |
| Tim Russ | Tuvok | Caretaker | 1956 | 1995 | 39 |
| Robert Duncan McNeill | Tom Paris | Caretaker | 1964 | 1995 | 31 |
| Roxann Dawson | B’Elanna Torres | Caretaker | 1958 | 1995 | 37 |
| Garrett Wang | Harry Kim | Caretaker | 1968 | 1995 | 27 |
| Robert Picardo | The Doctor | Caretaker | 1953 | 1995 | 42 |
| Ethan Phillips | Neelix | Caretaker | 1955 | 1995 | 40 |
| Jennifer Lien | Kes | Caretaker | 1974 | 1995 | 21 |
| Jeri Ryan | Seven of Nine | Scorpion: Part 2 | 1968 | 1997 | 29 |
Table 5: Star Trek: Enterprise
| Name | Character | First appeared in | Year of birth | Year of first appearance | Age |
| --- | --- | --- | --- | --- | --- |
| Scott Bakula | Jonathan Archer | Broken Bow: Part 1 | 1954 | 2001 | 47 |
| Jolene Blalock | T’Pol | Broken Bow: Part 1 | 1975 | 2001 | 26 |
| Connor Trinneer | Charles “Trip” Tucker III | Broken Bow: Part 1 | 1969 | 2001 | 32 |
| Dominic Keating | Malcolm Reed | Broken Bow: Part 1 | 1962 | 2001 | 39 |
| John Billingsley | Phlox | Broken Bow: Part 1 | 1960 | 2001 | 41 |
| Linda Park | Hoshi Sato | Broken Bow: Part 1 | 1978 | 2001 | 23 |
| Anthony Montgomery | Travis Mayweather | Broken Bow: Part 1 | 1971 | 2001 | 30 |
Table 6: Star Trek (2009)
| Name | Character | First appeared in | Year of birth | Year of first appearance | Age |
| --- | --- | --- | --- | --- | --- |
| Chris Pine | James T. Kirk | Star Trek (2009) | 1980 | 2009 | 29 |
| Zachary Quinto | Spock | Star Trek (2009) | 1977 | 2009 | 32 |
| Karl Urban | Leonard “Bones” McCoy | Star Trek (2009) | 1972 | 2009 | 37 |
| Zoe Saldana | Nyota Uhura | Star Trek (2009) | 1978 | 2009 | 31 |
| Simon Pegg | Montgomery “Scotty” Scott | Star Trek (2009) | 1970 | 2009 | 39 |
| John Cho | Hikaru Sulu | Star Trek (2009) | 1972 | 2009 | 37 |
| Anton Yelchin | Pavel Andreievich Chekov | Star Trek (2009) | 1989 | 2009 | 30 |
EDA: Interpreting the Data with a Boxplot
Simply looking at the results in Tables 1 through 6 led me to suspect that my hypothesis may have been incorrect, but I still proceeded to create a Minitab boxplot with the data.
The boxplot depicts the ages of the actors and actresses in each Star Trek series as well as in the 2009 reboot. The rectangular boxes represent the middle 50% of each data set, and the vertical lines on top of the boxes represent the upper 25% of the data. The vertical lines below the boxes represent the lower 25% of the data, except in the case of outliers. Outliers are unusually large or small observations and are represented by an asterisk. There is only one outlier in this boxplot: Wil Wheaton as Wesley Crusher in Star Trek: TNG.
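To make the box and the outlier concrete, here is a rough Python sketch using the TNG ages from Table 2. It assumes the common Tukey convention that a point counts as an outlier when it lies more than 1.5 times the interquartile range beyond the box; the post does not say which rule produced the asterisk, and quartile values can differ slightly between software packages. The function name `tukey_fences` is hypothetical.

```python
from statistics import quantiles

# Ages as listed in Table 2 (Star Trek: The Next Generation).
tng_ages = [47, 35, 38, 30, 35, 32, 38, 15]


def tukey_fences(ages, k=1.5):
    """Return (q1, q3, lower_fence, upper_fence) for a list of ages."""
    q1, _median, q3 = quantiles(ages, n=4)  # default "exclusive" method
    iqr = q3 - q1  # height of the box: the middle 50% of the data
    return q1, q3, q1 - k * iqr, q3 + k * iqr


q1, q3, low, high = tukey_fences(tng_ages)
# Anything outside the fences would be drawn as an asterisk.
outliers = [age for age in tng_ages if age < low or age > high]

print(f"box: {q1} to {q3}, fences: {low} to {high}")
print("outliers:", outliers)
```

Under these assumptions the only age outside the fences is 15, the value listed for Wesley Crusher's actor, which agrees with the single asterisk described above.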
The symbol that looks like a plus sign inside of a small circle is used to represent the average of the data set. The average age of actors and actresses in the 2009 reboot is 33.57 years, and this is just slightly lower than Star Trek: TNG, which had an average of 33.75 years of age. The highest average age was for Star Trek: TOS with an average of 36.57.
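The averages quoted above can be reproduced from the Age columns with a short Python sketch. The ages are copied as listed in Tables 1, 2, and 6; the dictionary name `ages_by_series` is just for illustration.

```python
from statistics import mean

# Ages as listed in the Age column of Tables 1, 2, and 6.
ages_by_series = {
    "TOS": [35, 35, 46, 46, 29, 34, 31],
    "TNG": [47, 35, 38, 30, 35, 32, 38, 15],
    "Star Trek (2009)": [29, 32, 37, 31, 39, 37, 30],
}

# Mean age per series, rounded to two decimals as in the text.
averages = {
    series: round(mean(ages), 2) for series, ages in ages_by_series.items()
}

for series, average in averages.items():
    print(f"{series}: {average}")
```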
What truly stands out in the boxplot is the spread of the data. The actors' ages in the reboot were spread over a narrower range than in any of the other series. This makes sense, as it would not be plausible to use actors or actresses in their 50s or 60s to portray people who are still attending Starfleet Academy.
The hypothesis that started all of this was “the new Star Trek films use very young actors and actresses compared to the older Star Trek series,” but a look at the boxplots in Figure 1 shows that this may not be the case. In fact, there is no reason to proceed to confirmation testing, because my hypothesis can be discarded at this point.
It looks like I owe director J. J. Abrams an apology.
Exploratory Data Analysis Raises New Questions
Even a hypothesis that was discarded after performing EDA can lead to the...um...next generation of hypotheses, and new insights. In this case, my new hypothesis could be, “The actors and actresses in Star Trek are not getting younger; I am getting older.” The new hypothesis could also be explored with EDA prior to moving on to more robust methods.
However, in this case, I will not investigate my new hypothesis. I would rather just change the subject.
Photo of Star Trek figures by Miguel Bernas, used under Creative Commons 2.0 license.