Keynote Slide Deck

Getting Started: Growing a Nose for Research
  1. Nobody cares about your grandmother

Almost any time a professor in intro soc relates a research finding about human behavior, the hand of some eager first year sitting in the front row shoots up and she says "I don't think that's true, my grandmother…." And as gently as possible the professor says "nobody cares about your grandmother." The reason we don't care about your grandmother is the same reason you are so proud to bring her up as an example: she's unusual, famously so, in exactly the respect you brought her up to illustrate. What you told us may well be true of your grandmother, but it's probably not true of female senior citizens in general.

Grandmothers are the bête noire of intro soc classes, but being distracted by interesting and exceptional nearby examples is a gigantic problem in research in general.

Grandmothers are related to "availability bias" and the "availability heuristic" in which we put too much weight on information that's ready to hand, easy to recall, or recently encountered.

And grandmothers represent another thorn in the side of good research: anecdote. When you know something and you want to communicate it to others, one of the most effective ways to do it is to tell a story.

Malcolm Gladwell famously illustrated the research finding that expertise takes about 10,000 hours of practice with a delightful narrative about the Beatles playing in Hamburg night after night. That's a very effective way to illustrate the theory. But by itself it's not evidence for the theory.

This points us toward really important point number two.

Suppose I said "well, OK, but let's also talk about the Rolling Stones, and Ryan Lewis & Macklemore." And suppose I could name ten more.

Question: Doesn't that go a long way to support the idea that ten thousand hours of practice yields success? Or that ten thousand hours is necessary for success? What's wrong with my saying that?

Do NOT Sample on the Dependent Variable!

Whenever we study the world we are interested in the connection between things we can observe. Do middle-aged men tend to buy bigger cars? What sorts of ads appeal to 20-something females with money? What kind of product placements yield the most increased sales? Which colors in logos are most effective in getting people to take a second look at serious stuff? At frivolous stuff?

In each of these questions there is an implicit cause and an implicit effect. Which is which in these examples?
Question | Cause | Effect
Do middle-aged men tend to buy bigger cars? | |
What sorts of ads appeal to 20-something females with money? | |
What kind of product placements yield the most increased sales? | |
Which colors in logos are most effective in getting people to take a second look at serious stuff? | |
At frivolous stuff? | |

We call the effect part the "dependent variable" and the cause part the "independent variable." Why? We think about both as things we can measure:

If we measure or observe people's age and sex/gender, we can think of each as a variable - if we look at 100 people we'll get different values for different people. It's something that can vary.

It's INDEPENDENT because in this situation we do not think that the kind of car a person decides to buy CAUSES their age and gender identity. Quite the other way round, the logic is that their automobile preferences DEPEND on their age and gender.

So let's go back to the awesome musicians. What was the EFFECT variable? Being an awesome musician. But how is this a variable? The variable is "how good of a music group are they?" There are folks out there who are crappy musicians and there are folks out there who are excellent musicians. Hence, it's a variable. And we are calling it the DEPENDENT variable since we think that how good you are DEPENDS on how much you practice.

And what was the INDEPENDENT variable? How much time they practiced.

But how did we choose our examples? We went down the list of some awesome musicians and groups: the Beatles, the Rolling Stones, and Macklemore. We chose as our examples cases that all had the same value of the DEPENDENT variable. And then we "measured" the independent variable for each of these.

What did we leave out? Bad musicians! In fact, we did our "research" backwards - if we are really interested in whether ten thousand hours is the secret to success, we need to select our examples from different values of the INDEPENDENT variable: we want to find some groups that didn't practice much at all, some that practiced 1000 hours, some that practiced 2000 hours, and so on.

We SAMPLE - select the cases we want to study - from values of the INDEPENDENT variable.
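Here is a minimal sketch of why the list of awesome bands proves nothing. All of the numbers are invented for illustration: they describe a hypothetical world in which practice has no effect at all on being awesome, and almost everybody practices a lot anyway.

```python
# Hypothetical, made-up data: each band is (practiced_10k_hours, is_awesome).
# In this invented world practice has NO effect on being awesome:
# 90% of bands practice 10,000+ hours whether or not they make it.
bands = (
    [(True, True)] * 9      # awesome, practiced a lot
    + [(False, True)] * 1   # awesome, did not practice a lot
    + [(True, False)] * 81  # not awesome, practiced a lot
    + [(False, False)] * 9  # not awesome, did not practice a lot
)


def share_who_practiced(cases):
    """Fraction of the given bands that practiced 10,000+ hours."""
    return sum(1 for practiced, _ in cases if practiced) / len(cases)


def share_awesome(cases):
    """Fraction of the given bands that turned out awesome."""
    return sum(1 for _, awesome in cases if awesome) / len(cases)


# Sampling on the DEPENDENT variable: look only at awesome bands.
awesome_only = [b for b in bands if b[1]]
print("Awesome bands that practiced 10k hours:", share_who_practiced(awesome_only))

# Sampling on the INDEPENDENT variable: compare groups by amount of practice.
practiced = [b for b in bands if b[0]]
did_not = [b for b in bands if not b[0]]
print("Awesome among those who practiced:     ", share_awesome(practiced))
print("Awesome among those who did not:       ", share_awesome(did_not))
```

In this made-up data, 90% of the awesome bands put in their ten thousand hours, which sounds like strong support. But the share of awesome bands is 10% among those who practiced and 10% among those who did not. You can only see that once the bad musicians are in the sample.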

The most classic example of this is heroin use and marijuana use. Suppose I live in a state that legalizes marijuana smoking and I wonder if that's going to lead to more heroin use. To find out, I interview 1000 heroin users about whether or not they smoked marijuana before they started using heroin.

What could be wrong with that?

Right. The fact that almost all heroin users previously used marijuana is useless information. Why? Because we are sampling on the dependent variable.

We should have sampled on marijuana use. Find 500 people and ask whether or not they used marijuana in the past. Then see how many of the ones who say yes went on to heroin and how many did not, and compare this to the non-marijuana users. What will we find out? Probably that you almost have to smoke dope before you shoot heroin but that almost all dope smokers do NOT become heroin users.
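The two ways of asking the question can be put side by side in a few lines. The counts below are hypothetical, invented purely to show the arithmetic. They are not real drug-use statistics.

```python
# Hypothetical counts for 10,000 people, invented purely for illustration.
# Each person is (used_marijuana, used_heroin).
people = (
    [(True, True)] * 19        # used both
    + [(False, True)] * 1      # heroin without marijuana first
    + [(True, False)] * 3981   # marijuana only
    + [(False, False)] * 5999  # neither
)

# Sampling on the DEPENDENT variable: interview heroin users only.
heroin_users = [p for p in people if p[1]]
share_mj_among_heroin = sum(1 for mj, _ in heroin_users if mj) / len(heroin_users)


def heroin_rate(group):
    """Fraction of the given group that went on to use heroin."""
    return sum(1 for _, heroin in group if heroin) / len(group)


# Sampling on the INDEPENDENT variable: split by marijuana use, then compare.
mj_users = [p for p in people if p[0]]
non_users = [p for p in people if not p[0]]

print(f"Heroin users who used marijuana first: {share_mj_among_heroin:.0%}")
print(f"Marijuana users who went on to heroin: {heroin_rate(mj_users):.2%}")
print(f"Non-users who went on to heroin:       {heroin_rate(non_users):.2%}")
```

With these made-up counts, 19 of the 20 heroin users smoked marijuana first, yet only 19 of the 4,000 marijuana users (about half a percent) ever used heroin. The first number is what the interviews with heroin users would give us. The second and third are the ones that answer the question we actually asked.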

We used an important word along the way above: "sampling." It's one of the two or three most important informational inventions of the past five hundred years.

Statistical sampling: most important invention of last 500 years

Suppose you have arranged with a factory in Shenzhen, China to manufacture your new thingamajig. You take quality seriously and so you want to make sure these things are built to last. The way to test them is to bend them in half over and over again until they break - and luckily you have a machine to do this. The problem is that testing one of your thingamajigs this way destroys it. So, how do you run your quality tests? How many do you have to bend to the point of breaking to know what the quality of the rest of them is? The answer: you take a random sample.
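A rough sketch of the idea, under stated assumptions: the production run of 10,000 pieces, the average of 500 bends to failure, and the sample size of 50 are all made up so that we can compare the sample's answer with the "truth," which in real life you would never get to see.

```python
import random

random.seed(42)  # fixed seed so the sketch is repeatable

# Hypothetical production run: bends-until-break for 10,000 thingamajigs.
# In real life these numbers are unknown; we invent them to check the method.
production_run = [random.gauss(500, 50) for _ in range(10_000)]

# Destroy only a random sample of 50 instead of the whole run.
tested = random.sample(production_run, 50)

estimate = sum(tested) / len(tested)
truth = sum(production_run) / len(production_run)

print(f"Estimated bends to failure from 50 tests: {estimate:.0f}")
print(f"Actual average across all 10,000:         {truth:.0f}")
```

The point of `random.sample` is that every piece has the same chance of being picked. If the factory chose which 50 to hand you, the estimate could be off in ways that no amount of math would fix.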

There is a whole science and lots of math that helps us figure out how to do sampling right and you might have to learn some of that somewhere along the way. For today, I just want to call your attention to the problem that good sampling helps you avoid: selection bias.

Selection Bias Invalidates a Lot of Research

Selection bias happens when the method you use to pick out what you study has an effect on what you find.

How is each of the following selection bias?

A German friend wants to study American high schools so she interviews people in our first year cohort.
I want to characterize the "typical outfits" worn by USC students so I take photos of people selected at random out on the crossway each morning at 9:00 for a week.
I do a tally of the different kinds of things my Instagram friends post pictures of.
I interview 40 Southern California public school teachers about how they teach students about what they think about Adobe products.
I interview teachers who use Adobe products about how easy they find them to use.
I want to read three books about cancer so I go to Amazon, find one, and then buy two others that are recommended under "people also bought…."

If you find yourself thinking "selection bias is hard to distinguish from sampling on the dependent variable," you are right. The latter can be thought of as one type of the former.
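To make one of the cases above concrete, here is a small sketch of the "interview teachers who use the product" design. The ratings are invented for illustration, and they assume (hypothetically) that teachers who find a product hard to use are less likely to be using it.

```python
# Hypothetical ease-of-use ratings (1 = hard, 5 = easy), invented for illustration.
# Each teacher is (uses_the_product, rating).
teachers = (
    [(True, 5)] * 20
    + [(True, 4)] * 15
    + [(True, 3)] * 5
    + [(False, 3)] * 10
    + [(False, 2)] * 30
    + [(False, 1)] * 20
)


def mean_rating(group):
    """Average ease-of-use rating for the given teachers."""
    return sum(rating for _, rating in group) / len(group)


# The biased design: only current users get interviewed.
users_only = [t for t in teachers if t[0]]

print("Average rating, users only:  ", mean_rating(users_only))
print("Average rating, all teachers:", mean_rating(teachers))
```

In this made-up data the users rate the product 4.375 on average while teachers as a whole rate it 2.85. Nothing about the interviews was dishonest. The way we picked whom to interview decided the answer.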

And selection bias brings us to our last big idea: distributions.

Distributions are EVERYTHING

When I say "distribution" or "distributed" I mean that some characteristic that I care about comes in different amounts, flavors, styles, or values - it is a variable - and that the different values occur with different frequencies. Frequency just means how often you find a given value. So, in an ordinary group of humans you will find a few extremely tall folks and a few extremely short folks - the extremes have low frequency - with the majority in the middle of the pack height-wise.

Suppose, though, that there were two conferences happening on campus at the same time. On the second floor is a meeting of former professional basketball players and on the third floor is a meeting of former jockeys. When we look at the distribution of height in the food court at lunch time we would find a very high frequency of tall people and a very high frequency of short people and a relatively low frequency of medium-sized people.
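The two shapes can be sketched with a quick simulation. The heights, group sizes, and bin cutoffs here are assumptions chosen for illustration, not measurements of real basketball players or jockeys.

```python
import random

random.seed(7)  # fixed seed so the sketch is repeatable

# Hypothetical heights in centimeters, invented for illustration.
ordinary_group = [random.gauss(170, 8) for _ in range(300)]
food_court = (
    [random.gauss(200, 5) for _ in range(150)]    # former basketball players
    + [random.gauss(155, 4) for _ in range(150)]  # former jockeys
)


def frequencies(heights, short_below=160, tall_from=180):
    """Count how many heights fall in the short, medium, and tall bins."""
    counts = {"short": 0, "medium": 0, "tall": 0}
    for h in heights:
        if h < short_below:
            counts["short"] += 1
        elif h < tall_from:
            counts["medium"] += 1
        else:
            counts["tall"] += 1
    return counts


def show(label, heights):
    """Print a rough text histogram: one # per 10 people in each bin."""
    print(label)
    for bin_name, count in frequencies(heights).items():
        print(f"  {bin_name:<7}{'#' * (count // 10)} {count}")


show("Ordinary group:", ordinary_group)
show("Food court during the two conferences:", food_court)
```

Both groups have 300 people, and their average heights are not wildly different. The average hides the fact that almost nobody in the food court is actually of average height, which is why the shape matters more than the average.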

It is critical, as you do any kind of research, to be mindful of the shape of the distribution of anything you are learning about. Whenever you encounter a particular case you should be wondering what it tells you about the distribution.

If you interview one cancer patient about their experience, where does that experience lie amidst the distribution of all experiences?

If you read a book about a topic, where does what the book has to say fit in the distribution of things people say about this topic?

And distributions come in more than one dimension. Last time, Professor Thomas showed you two "window" diagrams - more generally we call these "attribute spaces." A two dimensional attribute space takes two distributions and overlays them.

One way outcomes are distributed is along the variable of desirability. Another dimension along which outcomes can be distributed is "expectedness." If we put these dimensions together we get an attribute space and we can ask how cases are distributed in two dimensions.
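A two-dimensional attribute space of this kind can be sketched as a 2x2 table of counts. The seven outcomes below are hypothetical, made up only to fill the cells.

```python
from collections import Counter

# Hypothetical outcomes, each coded on two dimensions (invented for illustration).
outcomes = [
    ("desirable", "expected"),
    ("desirable", "expected"),
    ("desirable", "unexpected"),
    ("undesirable", "expected"),
    ("undesirable", "unexpected"),
    ("undesirable", "unexpected"),
    ("undesirable", "unexpected"),
]

# Each distinct (desirability, expectedness) pair is one cell of the space.
cells = Counter(outcomes)

# Print the 2x2 attribute space as a small table of counts.
print(f"{'':12}{'expected':>12}{'unexpected':>12}")
for desirability in ("desirable", "undesirable"):
    row = [cells[(desirability, e)] for e in ("expected", "unexpected")]
    print(f"{desirability:12}{row[0]:>12}{row[1]:>12}")
```

Reading across a row or down a column gives back the one-dimensional distributions. The cells show something neither of those shows alone: here, most of the undesirable outcomes were also unexpected.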

Or suppose you want to know which placement of a new button called "like" gets more people to click on it? How do you test the different placements? The answer: you run a random trial.
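A minimal sketch of such a trial, with assumptions stated up front: the two placements ("top" and "bottom"), the 10,000 visitors, and the underlying click rates are all hypothetical. In a real trial the click rates are exactly what you don't know.

```python
import random

random.seed(1)  # fixed seed so the sketch is repeatable

# Hypothetical true click rates, invented so we can check that the method works.
TRUE_CLICK_RATE = {"top": 0.10, "bottom": 0.20}

clicks = {"top": 0, "bottom": 0}
visitors = {"top": 0, "bottom": 0}

for _ in range(10_000):
    # Random assignment: chance, not the visitor or the designer, picks the placement.
    placement = random.choice(["top", "bottom"])
    visitors[placement] += 1
    # Simulate whether this visitor clicks the button.
    if random.random() < TRUE_CLICK_RATE[placement]:
        clicks[placement] += 1

observed = {p: clicks[p] / visitors[p] for p in visitors}

for p in observed:
    print(f"{p:>6}: {visitors[p]} visitors, click rate {observed[p]:.1%}")
```

Because chance decides who sees which placement, the two groups of visitors should be alike in everything except the button. That is what lets us credit the difference in clicks to the placement rather than to who happened to show up.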

Top N Things You Need to Know about Research to not be Dangerous

  1. Method matters for a reason


  1. Selection bias
  2. DV and IV
  3. The cases by variables rectangle
  4. Distributions are EVERYTHING
  5. Sampling on the dependent variable
  6. Association as Information
  7. Measurement: nominal, ordinal, interval, ratio
  8. Validity, Accuracy, Reliability, Precision
  9. Observations vs. Interpretations
  10. Cultural domain analysis
  11. Types of human data: internal states, external states, behavior, artifacts, environment
  12. The power of 2x2 tables
  13. What is an attribute space and why is it amazing?
  14. Multidimensional scaling - your new best friend

Validity, accuracy, reliability, and precision exercise based on card sorts. Groups get a handful of cards and try to ascertain what the differences are. Then explain the concepts and why they matter.
Sampling: the most important invention of the last 500 years.
Fundamental model of rows and columns, cases and variables.
Why sampling on the dependent variable ruins your work.
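The rows-and-columns model can be sketched in a few lines. The four people and their values are made up for illustration, reusing the age, gender, and car-size example from earlier.

```python
# Hypothetical data, invented for illustration: each row is a case (a person),
# each column is a variable measured on every case.
variables = ["age", "gender", "car_size"]
cases = [
    {"age": 52, "gender": "M", "car_size": "large"},
    {"age": 23, "gender": "F", "car_size": "small"},
    {"age": 47, "gender": "M", "car_size": "large"},
    {"age": 35, "gender": "F", "car_size": "medium"},
]

# Print the rectangle: one line per case, one column per variable.
print("".join(f"{name:>10}" for name in variables))
for case in cases:
    print("".join(f"{str(case[name]):>10}" for name in variables))

# Reading down a single column gives the values of one variable across cases.
ages = [case["age"] for case in cases]
print("Ages:", ages)
```

Every case has a value for every variable, which is what makes it a rectangle. Sampling decides which rows get in. Measurement decides what goes in the columns.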

I measure my body weight by looking at the body impression I leave on my couch.
I grade students in this class by how much I like their shoes.
I'm worried about my blood pressure so I take pictures of myself to see how red my nose looks.
Brand popularity measured by Instagram followers in a world where 50% of all followers are bots.
We use a single standardized test to measure the quality of human buildings.

My tin measuring cup is so dented that I'm pretty sure what it says is a pint isn't.
Measure the width of this room by stepping it off.
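One way to see the difference between these last two examples is to simulate two kinds of measurement error. This is only a sketch: the true value of 100, the 10% shortfall, and the amount of scatter are assumptions picked for illustration, and the two functions are hypothetical stand-ins for the dented cup and the pacing.

```python
import random
import statistics

random.seed(3)  # fixed seed so the sketch is repeatable

TRUE_VALUE = 100.0  # the real quantity, in made-up units


def dented_cup_reading():
    """Hypothetical instrument that reads about 10% low, every single time."""
    return TRUE_VALUE * 0.9 + random.gauss(0, 0.5)


def paced_off_reading():
    """Hypothetical instrument that is right on average but varies a lot."""
    return TRUE_VALUE + random.gauss(0, 8)


for name, take_reading in [("dented cup", dented_cup_reading),
                           ("pacing it off", paced_off_reading)]:
    readings = [take_reading() for _ in range(200)]
    print(f"{name:>14}: average {statistics.mean(readings):6.1f}, "
          f"spread {statistics.stdev(readings):5.1f}")
```

The first instrument gives nearly the same answer every time, and the answer is wrong. The second is right on average, but any single reading can be well off. Taking more readings helps with the second problem and does nothing for the first.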


From H. Russell Bernard, Research Methods in Anthropology, 4th edition


The five major kinds of variables are:

  1. Internal states. These include attitudes, beliefs, values, and perceptions. Cognition is an internal state.
  2. External states. These include characteristics of people, such as age, wealth, health status, height, weight, gender, and so on.
  3. Behavior. This covers what people eat, who they communicate with, how much they work and play—in short, everything that people do and much of what social scientists are interested in understanding.
  4. Artifacts. This includes all the physical residue from human behavior—radioactive waste, tomato slices, sneakers, arrowheads, computer disks, Viagra, skyscrapers—everything.
  5. Environment. This includes physical and social environmental characteristics. The amount of rainfall, the amount of biomass per square kilometer, location on a river or ocean front—these are physical features that influence human thought and behavior. Humans also live in a social environment. Living under a democratic vs. an authoritarian regime or working in an organization that tolerates or does not tolerate sexual harassment are examples of social environments that have consequences for what people think and how they behave.

Keep in mind that category (3) includes both reported behavior and actual behavior. A great deal of research has shown that about a third to a half of everything people report about their behavior is not true (Bernard et al. 1984). If you want to know what people eat, for example, asking them is not a good way to find out (Basiotis et al. 1987; Johnson et al. 1996). If you ask people how many times a year they go to church, you’re likely to get highly exaggerated data (Hadaway et al. 1993, 1998).

Some of the difference between what people say they do and what they actually do is the result of out-and-out lying. Most of the difference, though, is the result of the fact that people can’t hang on to the level of detail about their behavior that is called for when they are confronted by social scientists asking them how many times they did this or that in the last month. What people think about their behavior may be what you’re interested in, but that’s a different matter.

Most social research focuses on internal states and on reported behavior. But the study of humanity can be much richer, once you get the hang of putting together these five kinds of variables and conjuring up potential relations. Here are some examples of studies for each of the cells in table 3.1.

Aaron's List
Did you have any thoughts on what you would like to cover in D&M on Thursday?

I was thinking I would just run through a typical creative process that I would execute when pursuing a project. Something along the lines of:

  1. Establishing project subject:
    1. Client criteria.
    2. Inspiration, interest.
  2. Research techniques:
    1. Libraries.
    2. Online Archives, Databases.
    3. Journals, Newspapers, Blogs, Forums.
  3. Information Management:
    1. Google Drive organization.
    2. Social bookmarks.
    3. Tumblr process book.
    4. Pinterest.
  4. Information Analysis:
    1. Synopsis, Statistics.
    2. Morphological Analysis.
    3. Word clouds, Treemaps, Network graphs.