
This is an intermediate-level course. If you know what a two-sample t-test is, but you never tried a three-, four-, six- or nine-sample t-test, then this course may be for you. If you have ever concluded that a factor had no effect, just on the basis of a non-significant p-value, then this course is definitely for you. If you ever wondered how evidence in favour of the null hypothesis can be collected (whereas significance testing can only reject null hypotheses), then this course is for you.
I will assume you can work with the statistical software R, especially in the user interface provided by RStudio. If you have worked with SPSS instead, then this is a good time to learn R. You can find many introductory R courses on the internet, including my own Methoden en Technieken course in Dutch, which is also an introductory statistics course, or Daniel Navarro's course in English. So please have both R and RStudio installed on your laptop computer, which you then bring to the lectures.
And here is the program:
Have you ever written something like "The Dutch subjects improved significantly during the training (p = 0.01), whereas the English subjects did not improve (p = 0.60)"? Then this course is for you. The example just mentioned is a simple case of the most common fallacy in published work in our field, statistical inference from comparing p-values: conference proceedings are full of it, but the fallacy also abounds in journal articles by the leaders of the field. If you don't know what is wrong with it, then today you will learn; if you know it's wrong but think you have to do it because everybody does it, then today you'll learn that not everybody does it and that you can avoid it too; if you know it's wrong but think you have to do it because otherwise your results cannot be published, then you're thinking in the same way as some of our leaders, but this week you will learn many ways to publish your results without cheating with statistical inference.
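The contrast between the two approaches can be sketched in a few lines of R, with hypothetical improvement scores (all names and numbers below are made up for illustration):

```r
# A minimal sketch with hypothetical data: rather than running one test per
# group and then comparing the two p-values, test the group difference directly.
set.seed(1)
n <- 20
dutchImprovement   <- rnorm(n, mean = 1.0, sd = 1)   # hypothetical scores
englishImprovement <- rnorm(n, mean = 0.5, sd = 1)   # hypothetical scores

# The fallacy: two separate tests, then "significant versus not significant"
t.test(dutchImprovement)      # tests whether the Dutch improved
t.test(englishImprovement)    # tests whether the English improved

# The direct test: did the Dutch improve MORE than the English?
t.test(dutchImprovement, englishImprovement)
```

The point is that "significant in one group, not significant in the other" does not itself test whether the two groups differ; only the last test addresses that question.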
For a review of how often this problem occurs in psychology, read Erroneous analyses of interactions in neuroscience: a problem of significance by Sander Nieuwenhuis, Birte Forstmann & Eric-Jan Wagenmakers (2011, Nature Neuroscience).
You can read here the presentation of today's lecture about comparing p-values.
One of the problems for obtaining good p-values is the low number of participants in many studies in linguistics. However, there are ways to obtain better p-values by choosing a sensitive design: many of you will be familiar with repeated-measures designs, which keep a major source of variability, namely the participant, constant across many measurements. One can also obtain better p-values by choosing a sensitive analysis method: much data in linguistics is of a discrete nature (e.g. correctness scores) rather than of a continuous nature (e.g. durations), and for discrete data the workhorse analysis method should not be a straightforward analysis of variance (e.g. a linear model on the scores), but logistic regression with participant as a random factor. After this course you will find this an easy concept.
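In R, such a logistic regression with a random participant factor can be fitted with the lme4 package; the sketch below uses hypothetical trial-level data (the variable names and numbers are invented for illustration):

```r
# A minimal sketch, assuming the lme4 package and hypothetical data with one
# row per trial: a binary correctness score, a condition, and a participant.
library(lme4)
set.seed(1)
d <- data.frame(
  participant = factor(rep(1:10, each = 20)),            # 10 participants
  condition   = factor(rep(c("easy", "hard"), times = 100)),
  correct     = rbinom(200, 1, 0.7)                      # binary scores
)

# Logistic regression on the raw binary scores,
# with participant as a random factor:
model <- glmer(correct ~ condition + (1 | participant),
               data = d, family = binomial)
summary(model)
```

The model works on the raw 0/1 scores, rather than on per-participant percentages fed into an analysis of variance.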
For an account of tricks to raise p-values, and how to avoid that, read False-positive psychology: Undisclosed flexibility in data collection and analysis allows presenting anything as significant by Joseph P. Simmons, Leif D. Nelson & Uri Simonsohn (2011, Psychological Science).
You can read here the presentation of today's lecture about confidence intervals.
Did you ever split up your participants into a "young" and an "old" group, using their median age as a criterion? This is just one way to convert continuous data into discrete data, and it is dubious. The method does allow you to use "Anova" with age as a binary factor, but also raises suspicions as to whether your binning of the age data might have been meant to improve your p-value. It is better to keep age in the model as a continuous factor, and this is not more difficult than binning.
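To see that the continuous model really is no more work than the median split, compare the two in R (the data below are hypothetical, invented for illustration):

```r
# A minimal sketch with hypothetical data: keeping age continuous
# is no harder than binning it at the median.
set.seed(1)
d <- data.frame(age = runif(40, min = 20, max = 80))
d$score <- 50 + 0.3 * d$age + rnorm(40, sd = 5)   # hypothetical scores

# Dubious: a median split into "young" and "old"
d$ageGroup <- factor(ifelse(d$age < median(d$age), "young", "old"))
binnedModel <- lm(score ~ ageGroup, data = d)

# Better: age as a continuous predictor, in one line of the same length
continuousModel <- lm(score ~ age, data = d)
summary(continuousModel)
```

The continuous model also uses more of the information in the data, since it does not throw away the differences in age within each group.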
For a proposal to perform more honest kinds of research, read An agenda for purely confirmatory research by Eric-Jan Wagenmakers, Ruud Wetzels, Denny Borsboom, Han van der Maas & Rogier Kievit (2012, Perspectives on Psychological Science).
You can read here the presentations of today's lecture about testing until significant and binning.
If you would like to be able to accept a null hypothesis as probably true, you cannot use p-value testing, because with p-value testing you can only accept the alternative hypothesis (if p < 0.05) or fail to reject the null hypothesis (if p > 0.05). Instead, you need a method that takes two hypotheses equally seriously and compares their likelihoods given the data. Today you'll learn the jargon of Bayesian inference, and how to apply these methods to otherwise hopeless experimental results.
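One simple way to compare the two hypotheses in base R is the BIC approximation to the Bayes factor proposed by Wagenmakers (2007): BF01 = exp((BIC1 − BIC0) / 2), where model 0 is the null and model 1 the alternative. The sketch below uses hypothetical data in which the null is in fact true:

```r
# A minimal sketch with hypothetical data, using the BIC approximation to the
# Bayes factor: BF01 = exp((BIC1 - BIC0) / 2), with model 0 the null
# hypothesis and model 1 the alternative.
set.seed(1)
d <- data.frame(
  group = factor(rep(c("A", "B"), each = 25)),
  score = rnorm(50)                 # no real group effect in these data
)
nullModel <- lm(score ~ 1, data = d)       # model 0: no group effect
altModel  <- lm(score ~ group, data = d)   # model 1: a group effect

bf01 <- exp((BIC(altModel) - BIC(nullModel)) / 2)
bf01   # values above 1 express evidence in favour of the null hypothesis
```

Unlike a p-value above 0.05, a BF01 well above 1 is positive evidence that the null hypothesis describes the data better than the alternative does.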
You can read here the presentation of today's lecture about Bayesian statistics.
If you follow all of the above advice for your gigantic dataset, you'll often find that you end up creating a giant generalized linear mixed-effects model. You build this model overnight, only to find that the parameters "fail to converge" (in R) or come out as all zeroes (in SPSS). In such cases, your research questions can guide you toward simplification. For instance, if your research question is about the difference between two populations of speakers, one can typically collapse many cells of your data table into one value per speaker, computed in any interesting way that matches your specific research question. This technique, which subsumes, but is not limited to, "contrasts" for repeated measures, has pleasingly wide validity.
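The collapsing step can be sketched in base R with hypothetical data (all names and numbers below are invented for illustration): compute one value per speaker, then compare the two populations with a simple test.

```r
# A minimal sketch with hypothetical data: when a giant mixed-effects model
# will not converge, a question about two speaker populations can often be
# answered by first collapsing the data to one value per speaker.
set.seed(1)
d <- data.frame(
  speaker    = factor(rep(1:20, each = 30)),             # 20 speakers, 30 trials
  population = factor(rep(c("L1", "L2"), each = 300)),   # 10 speakers each
  correct    = rbinom(600, 1, 0.8)                       # binary scores
)

# Collapse: one proportion correct per speaker
perSpeaker <- aggregate(correct ~ speaker + population, data = d, FUN = mean)

# Then a simple two-sample comparison between the populations
t.test(correct ~ population, data = perSpeaker)
```

The per-speaker value need not be a mean: any summary that matches the research question (a slope, a difference between conditions, a threshold) will do, and the resulting small table is easy to analyse.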
You can read here the presentation of today's lecture about models.
Go to Paul Boersma’s home page