Worksheet 8

Published

March 2, 2026

Packages

library(tidyverse)
library(car)
library(lme4)

Neurocognition in individuals with schizophrenia

A study was carried out to evaluate patterns and levels of performance on neurocognitive measures among individuals with schizophrenia and schizoaffective disorder using a well-validated, comprehensive neurocognitive battery specifically designed for individuals with psychosis.

The main interest was in determining how well these measures distinguished among all groups and whether there were variables that distinguished between the schizophrenia and schizoaffective groups. Age and sex were also measured for each individual, but we ignore those in our analysis.

Variables of interest, all quantitative except for the first one:

  • Dx Diagnostic group, categorical with levels Schizophrenia, Schizoaffective, and Control
  • Speed Speed of processing score
  • Attention Attention/Vigilance score
  • Memory Working memory score
  • Verbal Verbal Learning score
  • Visual Visual Learning score
  • ProbSolv Reasoning/Problem Solving score
  • SocialCog Social Cognition score

The clinical sample comprised 116 male and female patients who had a diagnosis of schizophrenia (\(n = 70\)) or schizoaffective disorder (\(n = 46\)) confirmed by a standard test.

Non-psychiatric control participants (\(n = 146\)) were screened for medical and psychiatric illness and history of substance abuse. Patients were recruited from three outpatient clinics in Hamilton, Ontario, Canada. Control participants were recruited through local newspaper and online classified advertisements for paid research participation.

The data are in http://ritsokiguess.site/datafiles/NeuroCog.csv.

  1. Read in and display (some of) the data.

Solution

The usual, exactly:

my_url <- "http://ritsokiguess.site/datafiles/NeuroCog.csv"
NeuroCog <- read_csv(my_url)
Rows: 242 Columns: 11
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (2): Dx, Sex
dbl (9): id, Speed, Attention, Memory, Verbal, Visual, ProbSolv, SocialCog, Age

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
NeuroCog

\(\blacksquare\)

  2. Why is this dataset suitable for a multivariate ANOVA analysis? Explain briefly.

Solution

All the quantitative test scores are response variables (they are all outcomes of the study). There is only one explanatory variable, Dx, with three levels. With more than one response variable and one or more categorical explanatory variables, a MANOVA is a sensible way to proceed.

\(\blacksquare\)

  3. Create a suitable response variable for a MANOVA. Show the first few rows of your response variable. NOTE: if you display it all, all 242 rows will be displayed. Be careful.

Solution

There are two ways to go here. The way you’ll probably think of is the “base R” way:

y <- with(NeuroCog, cbind(Speed, Attention, Memory, Verbal, Visual, ProbSolv, SocialCog))
head(y)
     Speed Attention Memory Verbal Visual ProbSolv SocialCog
[1,]    19         9     19     33     24       39        28
[2,]     8        25     15     28     24       40        37
[3,]    14        23     15     20     13       32        24
[4,]     7        18     14     34     16       31        36
[5,]    21         9     35     28     29       45        28
[6,]    31        10     26     29     21       33        28

The problem with this way is that you have to name all seven of the response variables. The other way you can go is the “tidyverse” way, where you use tidyverse tools to select the columns you want without naming them, and then you turn it into a matrix at the end:

NeuroCog %>% 
  select(Speed:SocialCog) %>% 
  as.matrix() -> response
head(response)
     Speed Attention Memory Verbal Visual ProbSolv SocialCog
[1,]    19         9     19     33     24       39        28
[2,]     8        25     15     28     24       40        37
[3,]    14        23     15     20     13       32        24
[4,]     7        18     14     34     16       31        36
[5,]    21         9     35     28     29       45        28
[6,]    31        10     26     29     21       33        28

The reason for the head is that what I called y and response are both R matrix objects, not dataframes. If you display a matrix, it shows you the whole thing, no matter how many rows it has, so you need to use head, or something like this:

response %>% as_tibble() %>% slice(1:10)

to display only enough of it to verify that you have the right thing. (The last one turns response back into a dataframe to use slice.)

\(\blacksquare\)

  4. Run a suitable MANOVA using the manova command, displaying the output.

Solution

This is straightforward, because it is just like aov except with the multi-column response that you made:

neuro.1 <- manova(response ~ Dx, data = NeuroCog)
summary(neuro.1)
           Df  Pillai approx F num Df den Df    Pr(>F)    
Dx          2 0.34795   7.0406     14    468 2.956e-13 ***
Residuals 239                                             
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

\(\blacksquare\)

  5. Run a suitable MANOVA using Manova from the car package, displaying the results.

Solution

Load car first, and then go through the two-step process to get the analysis:

neuro.2 <- lm(response ~ Dx, data = NeuroCog)
neuro.3 <- Manova(neuro.2)
summary(neuro.3)

Type II MANOVA Tests:

Sum of squares and products for error:
              Speed Attention   Memory    Verbal    Visual  ProbSolv SocialCog
Speed     26951.463 18528.553 18905.32 11987.542 13615.554 11973.889  6644.373
Attention 18528.553 38318.045 21237.57 14631.133 13714.366  9205.879 13537.946
Memory    18905.318 21237.570 32337.99 14250.909 16211.156 13144.151 11766.618
Verbal    11987.542 14631.133 14250.91 20416.547 12810.992  7260.057  7000.052
Visual    13615.554 13714.366 16211.16 12810.992 26566.679 11753.637  6761.313
ProbSolv  11973.889  9205.879 13144.15  7260.057 11753.637 19861.366  5372.382
SocialCog  6644.373 13537.946 11766.62  7000.052  6761.313  5372.382 33805.811

------------------------------------------
 
Term: Dx 

Sum of squares and products for the hypothesis:
             Speed Attention   Memory   Verbal   Visual ProbSolv SocialCog
Speed     8360.293  6813.042 5600.521 6238.566 5555.768 5865.127  6130.585
Attention 6813.042  5579.079 4581.959 5105.181 4528.576 4742.708  5141.087
Memory    5600.521  4581.959 3763.704 4193.298 3722.463 3904.419  4203.457
Verbal    6238.566  5105.181 4193.298 4671.981 4146.595 4347.563  4688.898
Visual    5555.768  4528.576 3722.463 4146.595 3692.081 3896.223  4079.538
ProbSolv  5865.127  4742.708 3904.419 4347.563 3896.223 4165.345  4101.841
SocialCog 6130.585  5141.087 4203.457 4688.898 4079.538 4101.841  5277.131

Multivariate Tests: Dx
                 Df test stat  approx F num Df den Df     Pr(>F)    
Pillai            2 0.3479501  7.040631     14    468 2.9559e-13 ***
Wilks             2 0.6610440  7.653800     14    466 1.3276e-14 ***
Hotelling-Lawley  2 0.4991527  8.271673     14    464 5.9508e-16 ***
Roy               2 0.4702174 15.718695      7    234 < 2.22e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

\(\blacksquare\)

  6. What are you able to conclude from your analyses? (The conclusion should be the same for both of them.)

Solution

The null hypothesis says that all the test scores have the same mean on all the diagnoses. This is clearly rejected, so the null is false in some way: there are test scores with different means on some of the diagnoses (or, it is not true that all the test scores are the same for all the diagnoses). If you phrase your answer in terms of the null hypothesis, make sure it is clear which part is the null hypothesis and which part is your conclusion, since they are not the same.

Your conclusion will necessarily be vague, because we have no idea which diagnoses are different on which tests. To assess that properly, we need to do a discriminant analysis, which is coming up later. We will do an informal analysis with a graph.
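As a preview of that (a sketch only, not part of this worksheet's required analysis, and assuming the MASS package, which is where lda lives), the discriminant analysis might look something like this:

```r
# Sketch only: a discriminant analysis to see which combinations of
# test scores separate the diagnosis groups. Assumes MASS is installed.
# (Loading MASS masks dplyr's select, something to watch for.)
library(MASS)
NeuroCog.lda <- lda(Dx ~ Speed + Attention + Memory + Verbal +
                      Visual + ProbSolv + SocialCog, data = NeuroCog)
NeuroCog.lda$scaling
```

The scaling coefficients would indicate which test scores contribute most to distinguishing the groups.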

\(\blacksquare\)

  7. Carry out Box’s M test. What do you conclude from it?

Solution

Load (and install if you have to) MVTests, and then:

library(MVTests)

Attaching package: 'MVTests'
The following object is masked from 'package:datasets':

    iris
summary(BoxM(response, NeuroCog$Dx))
       Box's M Test 

Chi-Squared Value = 77.84483 , df = 56  and p-value: 0.0284 

This is significant at α=0.05, but recall that we do not reject equal spreads (across groups for each response variable) until this P-value goes below 0.001, so this says that the spreads are not unequal enough to be a problem.

\(\blacksquare\)

  8. Make boxplots of all seven response variables against diagnosis (Dx), on one ggplot. The best answer will display the graphs so that they are easy to read. Hint: the idea is the same as plotting residuals against all of the explanatory variables in a regression.

Solution

First, we need the data “longer”, with all the test scores in one column and a second column labelling which test they were:

NeuroCog %>% 
  pivot_longer(Speed:SocialCog, names_to = "test_name", values_to = "test_score") 

Having done that, we can now draw facetted boxplots, one for each response variable:

NeuroCog %>% 
  pivot_longer(Speed:SocialCog, names_to = "test_name", values_to = "test_score") %>% 
  ggplot(aes(x = Dx, y = test_score)) + geom_boxplot() +
  facet_wrap(~test_name)

The scores are all on the same scale (0-100, presumably), so you can use the same scale for all the facets, or (equally good) use scales = "free" to give each facet its own scale, since you may not know until you draw the graph whether the tests really are all on the same scale.
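For reference, the free-scales version differs from the previous code only in the facet_wrap line:

```r
NeuroCog %>% 
  pivot_longer(Speed:SocialCog, names_to = "test_name", values_to = "test_score") %>% 
  ggplot(aes(x = Dx, y = test_score)) + geom_boxplot() +
  facet_wrap(~test_name, scales = "free")
```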

Three points (when this was on an assignment) for getting this far. The last point was for noticing that the diagnosis names are long and overwrite each other on the \(x\)-axes. The standard way to work around this is to put them on the \(y\)-axis, which means flipping the \(x\) and \(y\) axes around using coord_flip and making the boxes go sideways:

NeuroCog %>% 
  pivot_longer(Speed:SocialCog, names_to = "test_name", values_to = "test_score") %>% 
  ggplot(aes(x = Dx, y = test_score)) + geom_boxplot() +
  facet_wrap(~test_name) + coord_flip()

The long diagnosis names take up a lot of plot real estate, but you can compare the boxes well enough (next part), so this is all right.

Another approach is to display the facets in two columns rather than three, which might give enough room for the diagnosis labels to be seen:

NeuroCog %>% 
  pivot_longer(Speed:SocialCog, names_to = "test_name", values_to = "test_score") %>% 
  ggplot(aes(x = Dx, y = test_score)) + geom_boxplot() +
  facet_wrap(~test_name, ncol = 2)

Extra: you could even combine the coord-flipping and the two columns:

NeuroCog %>% 
  pivot_longer(Speed:SocialCog, names_to = "test_name", values_to = "test_score") %>% 
  ggplot(aes(x = Dx, y = test_score)) + geom_boxplot() +
  facet_wrap(~test_name, scales = "free", ncol = 2) + coord_flip()

which gives more room to see the boxplots.

\(\blacksquare\)

  9. Looking at your boxplots, why do you think your MANOVA came out significant, and what do your boxplots tell you about the relative test scores for patients with diagnoses of schizophrenia or schizoaffective?

Solution

The significant MANOVA says that somewhere, there must be at least one response variable where the scores on average differ among the three groups. What seems to be happening here is that the scores are consistently higher for the control group than the other two groups, across all (or almost all, depending on how you see it) the response variables.

The researchers were interested in comparing the schizophrenia and schizoaffective patients, and as I see it, the scores for those two groups are very similar across all the response variables, given the amount of variability there is (quite a lot). My guess is that there is actually no significant difference anywhere between these two groups.

Extra: what these boxplots don’t show us is whether there are any multivariate differences between these two groups (as we saw with the seed yield and seed weight in lecture). Maybe you have a good way to make a seven-dimensional plot, but I don’t think I do!
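One partial workaround (my suggestion here, not part of the original analysis, and assuming you have the GGally package) is a scatterplot matrix of a few of the scores at a time, coloured by diagnosis:

```r
# Sketch: pairwise views of three of the seven scores, coloured by group.
# GGally's ggpairs is assumed; pick whichever subset of scores you like.
library(GGally)
ggpairs(NeuroCog, columns = c("Speed", "Memory", "SocialCog"),
        mapping = aes(colour = Dx))
```

This still only shows two dimensions at a time, but it at least hints at whether any pair of scores separates the two patient groups.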

\(\blacksquare\)

The questions below are on repeated measures. We likely won’t get to the material for questions 14 and 15 this week; if we don’t, leave these questions for next week. (Assignment 6 won’t go any further than we go in lecture.)

You have intelligent rats

Each of 12 students trained rats to run a maze, and the number of successful runs (out of 50 attempts) was recorded on each of 5 days. Some of the students were told that the rats they were training were “bright” (intelligent) and some of them were told that their rats were “dull” (not intelligent). In actual fact, all the rats were from the same source, and none of them were any more intelligent than the others. Did what the students were told about their rats make a difference to the number of mazes the rats successfully ran, and, if so, was the effect consistent over time? The same group of rats was trained by the same student throughout this experiment, so it makes sense to treat the data as repeated measures.

The data are in http://ritsokiguess.site/datafiles/intelligent-rats.csv. The columns of interest to us are:

  • Student: a numerical identifier for each student
  • Treatment: what the student was told about their rats
  • Day1 through Day5: the number of successful runs on each day.

There are some other variables that will not concern us here.

  10. Read in and display (some of) the data.

Nothing unexpected here:

my_url <- "http://ritsokiguess.site/datafiles/intelligent-rats.csv"
maze <- read_csv(my_url)
Rows: 12 Columns: 12
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr  (1): Treatment
dbl (11): Student, PriorExp, Block, Day1, Day2, Day3, Day4, Day5, Relax, Suc...

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
maze
  11. Set up and run an appropriate repeated-measures ANOVA. What do you conclude from it? (Hint: set up the response variable, and go through the steps required to run the Manova. Obtain the appropriate P-values, describing your process.)

There’s actually a fair bit to do here. The first bit is to grab the five Day columns and make a response matrix out of them, either this way:

maze %>% select(starts_with("Day")) %>% 
  as.matrix() -> response
response
      Day1 Day2 Day3 Day4 Day5
 [1,]   10   13   12   12    9
 [2,]   18   20   21   23   30
 [3,]   16   14   20   20   20
 [4,]   11   20   27   17   16
 [5,]   27   24   26   28   27
 [6,]   16   14   14   13   15
 [7,]    8   10   21   10   20
 [8,]   13   20   18   20   22
 [9,]   24   23   27   29   27
[10,]   13   12   23   17   16
[11,]    9   15   24   12   11
[12,]   19   17   23   33   29

or this way:

response1 <- with(maze, cbind(Day1, Day2, Day3, Day4, Day5))
response1
      Day1 Day2 Day3 Day4 Day5
 [1,]   10   13   12   12    9
 [2,]   18   20   21   23   30
 [3,]   16   14   20   20   20
 [4,]   11   20   27   17   16
 [5,]   27   24   26   28   27
 [6,]   16   14   14   13   15
 [7,]    8   10   21   10   20
 [8,]   13   20   18   20   22
 [9,]   24   23   27   29   27
[10,]   13   12   23   17   16
[11,]    9   15   24   12   11
[12,]   19   17   23   33   29

The second way is more like what I did in class, but also more typing. The as.matrix turns a dataframe into a matrix, which is what the response needs to be in the end.

Next, run lm, set up the time-dependence stuff, run Manova, and look at the results. This is basically cookie-cutter code: take what appears in the lecture notes and copy and paste it, changing a few names:

maze.1 <- lm(response ~ Treatment, data = maze)
times <- colnames(response)
times.df <- data.frame(times = factor(times))
maze.2 <- Manova(maze.1, idata = times.df, idesign = ~ times)
summary(maze.2)
Warning in summary.Anova.mlm(maze.2): HF eps > 1 treated as 1

Type II Repeated Measures MANOVA Tests:

------------------------------------------
 
Term: (Intercept) 

 Response transformation matrix:
     (Intercept)
Day1           1
Day2           1
Day3           1
Day4           1
Day5           1

Sum of squares and products for the hypothesis:
            (Intercept)
(Intercept)    104160.3

Multivariate Tests: (Intercept)
                 Df test stat approx F num Df den Df     Pr(>F)    
Pillai            1   0.97802 444.8761      1     10 1.2754e-09 ***
Wilks             1   0.02198 444.8761      1     10 1.2754e-09 ***
Hotelling-Lawley  1  44.48761 444.8761      1     10 1.2754e-09 ***
Roy               1  44.48761 444.8761      1     10 1.2754e-09 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

------------------------------------------
 
Term: Treatment 

 Response transformation matrix:
     (Intercept)
Day1           1
Day2           1
Day3           1
Day4           1
Day5           1

Sum of squares and products for the hypothesis:
            (Intercept)
(Intercept)    4720.333

Multivariate Tests: Treatment
                 Df test stat approx F num Df den Df    Pr(>F)   
Pillai            1 0.6684447 20.16088      1     10 0.0011608 **
Wilks             1 0.3315553 20.16088      1     10 0.0011608 **
Hotelling-Lawley  1 2.0160877 20.16088      1     10 0.0011608 **
Roy               1 2.0160877 20.16088      1     10 0.0011608 **
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

------------------------------------------
 
Term: times 

 Response transformation matrix:
     times1 times2 times3 times4
Day1      1      0      0      0
Day2      0      1      0      0
Day3      0      0      1      0
Day4      0      0      0      1
Day5     -1     -1     -1     -1

Sum of squares and products for the hypothesis:
          times1    times2     times3    times4
times1 280.33333 193.33333 -67.666667 38.666667
times2 193.33333 133.33333 -46.666667 26.666667
times3 -67.66667 -46.66667  16.333333 -9.333333
times4  38.66667  26.66667  -9.333333  5.333333

Multivariate Tests: times
                 Df test stat approx F num Df den Df   Pr(>F)  
Pillai            1 0.6827637 3.766392      4      7 0.060953 .
Wilks             1 0.3172363 3.766392      4      7 0.060953 .
Hotelling-Lawley  1 2.1522241 3.766392      4      7 0.060953 .
Roy               1 2.1522241 3.766392      4      7 0.060953 .
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

------------------------------------------
 
Term: Treatment:times 

 Response transformation matrix:
     times1 times2 times3 times4
Day1      1      0      0      0
Day2      0      1      0      0
Day3      0      0      1      0
Day4      0      0      0      1
Day5     -1     -1     -1     -1

Sum of squares and products for the hypothesis:
       times1    times2 times3     times4
times1     27  51.00000     81  -6.000000
times2     51  96.33333    153 -11.333333
times3     81 153.00000    243 -18.000000
times4     -6 -11.33333    -18   1.333333

Multivariate Tests: Treatment:times
                 Df test stat approx F num Df den Df   Pr(>F)  
Pillai            1 0.6538624 3.305793      4      7 0.080236 .
Wilks             1 0.3461376 3.305793      4      7 0.080236 .
Hotelling-Lawley  1 1.8890244 3.305793      4      7 0.080236 .
Roy               1 1.8890244 3.305793      4      7 0.080236 .
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Univariate Type II Repeated-Measures ANOVA Assuming Sphericity

                 Sum Sq num Df Error SS den Df  F value    Pr(>F)    
(Intercept)     20832.1      1   468.27     10 444.8761 1.275e-09 ***
Treatment         944.1      1   468.27     10  20.1609 0.0011608 ** 
times             294.3      4   411.07     40   7.1586 0.0001909 ***
Treatment:times   194.3      4   411.07     40   4.7259 0.0032260 ** 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1


Mauchly Tests for Sphericity

                Test statistic p-value
times                  0.69814 0.96424
Treatment:times        0.69814 0.96424


Greenhouse-Geisser and Huynh-Feldt Corrections
 for Departure from Sphericity

                 GG eps Pr(>F[GG])    
times           0.86937  0.0004321 ***
Treatment:times 0.86937  0.0052136 ** 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

                  HF eps   Pr(>F[HF])
times           1.389507 0.0001909378
Treatment:times 1.389507 0.0032260394

Finally, go down to near the bottom and check the sphericity tests. Sphericity is by no means rejected (P-value 0.964), so look at the “Univariate Type II Repeated-Measures ANOVA Assuming Sphericity” just above that. The interaction is significant, with a P-value of 0.0032, so that is our finding. In the context of the data, that means that there is an effect of treatment on the number of successful runs, but that effect depends on which day you are looking at. (This is as far as we can go for now.)

  12. Draw a suitable interaction plot (for treatment and time). How does this clarify your conclusions of the previous part? (Hint: you’ll need longer data. If you want to be consistent with me, use Day for the names of the days, and runs for the numbers of runs.)

The hint says that the first thing we need to do is to make “longer” data, with each observation on one row:1

maze %>% 
  pivot_longer(starts_with("Day"), names_to = "Day", values_to = "runs") -> maze_long
maze_long

Then work out the mean number of runs for each combination of Treatment and Day:

maze_long %>% 
  group_by(Treatment, Day) %>% 
  summarize(mean_runs = mean(runs)) -> means 
`summarise()` has grouped output by 'Treatment'. You can override using the
`.groups` argument.
means

And then make the interaction plot, bearing in mind that you need a colour and a group and they need to be the same:

ggplot(means, aes(x = Day, y = mean_runs, colour = Treatment, group = Treatment)) + 
  geom_point() + geom_line()

(You don’t need to display the intervening dataframes, though you probably should for yourself while you’re working on this.)

When the students were told that their rats were “bright”, the number of successful runs on average increased over time. But for the students told that their rats were “dull”, the mean number of successful runs increased up to day 3, peaked there, and decreased afterwards. You might also say that the traces over time are not parallel, and talk about how they are not.

This difference in pattern over time was the reason for the significant interaction.

Extra: I also drew a spaghetti plot, which uses the same long data that we just made, but with colour and group now different from each other. The lines on this one need to join observations that come from the same student, so that’s what goes in group:

ggplot(maze_long, aes(x = Day, y = runs, colour = Treatment, group = Student)) + 
  geom_point() + geom_line()

The pattern is not so clear here (which is why I had you draw the interaction plot instead), but if you look carefully, you can see that several of the students with the supposedly “dull” rats had the number of successful runs peak at day 3, while the students with “bright” rats saw the number of runs on a more or less upward trajectory. There is some variability here, but the different patterns over time were consistent enough to make the interaction strongly significant.

The red traces are mostly above the blue ones (except perhaps at day 3), which is probably the reason there is a significant treatment main effect, but we would do better to look at simple effects over time to further understand the significant interaction (the treatment effect will probably not be consistent over time, as a consequence of the significant interaction).

  13. For each time point (day), run a simple effects analysis: that is, using an ordinary aov, test whether the number of successful runs depends on whether the rat was labelled as “dull” or “bright”.

I think I airily talked about how you can use simple effects to understand an interaction in this kind of analysis, but never actually showed you an example. There are two ways you might go:

  • fix time and look at the effect of treatment at each time
  • fix treatment and look at the effect of time within that treatment

In this example, and perhaps more generally, the first option makes more sense to me: we care more about the treatment effect, while the time is a sort of “block” within which to observe the treatment effect. The first option is also easier to do, because by focusing on only one time point at a time, there is no time dependence left and each student gives you only one observation at that time (the number of runs their rats made at that time). Hence, the analysis at each time point is only an ordinary aov.2

Last year, I had this as an Extra on an assignment (where students had to do the MANOVA-based approach above). In my solutions then, I used the longer dataframe that you made above to draw the interaction plot with, but thinking about it now, it actually goes more easily with the original wider data: pick out the day of interest, see whether the number of mazes completed that day depends on treatment, and then repeat for the other days.

Here’s day 1:

d.1 <- aov(Day1 ~ Treatment, data = maze)
summary(d.1)
            Df Sum Sq Mean Sq F value  Pr(>F)   
Treatment    1  208.3  208.33   11.81 0.00636 **
Residuals   10  176.3   17.63                   
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

There is a treatment effect (P-value 0.006) on Day 1, which we know (from the interaction plot) is because the supposedly bright rats performed better than the dull ones. This is all we need to do for Day 1, because there were only two treatments, and we now know that they were significantly different in terms of mazes run (and we know from our graph which way around the difference is). If there had been three treatments, say a third one in which the students were told nothing about their rats, then we would run TukeyHSD to find out which treatments differed from which on Day 1. But there is no need for that here.
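For completeness, TukeyHSD runs directly on the aov fit; with only two treatments it would merely restate the comparison we already have, but with three or more it would show all the pairwise differences:

```r
# With more than two treatments, this would list every pairwise
# treatment comparison with Tukey-adjusted P-values.
TukeyHSD(d.1)
```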

We would expect something similar to happen on Day 2. Copy, paste, and edit:

d.2 <- aov(Day2 ~ Treatment, data = maze)
summary(d.2)
            Df Sum Sq Mean Sq F value Pr(>F)  
Treatment    1  96.33   96.33   7.565 0.0205 *
Residuals   10 127.33   12.73                 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Still significant (P-value 0.020), though not as strongly so. The bright rats (so-called) again did better than the dull ones.

Day 3 is where the significance of the treatment effect might go away, since this is where the dull rats peaked. Paste again (it is probably still on your clipboard):

d.3 <- aov(Day3 ~ Treatment, data = maze)
summary(d.3)
            Df Sum Sq Mean Sq F value Pr(>F)
Treatment    1  16.33   16.33   0.691  0.425
Residuals   10 236.33   23.63               

The P-value is 0.425, which is nowhere near significance, so there is no treatment effect on Day 3.

After that, you would expect bright to be better again, since the dull rats peaked and the bright ones continued to improve after Day 3:

d.4 <- aov(Day4 ~ Treatment, data = maze)
summary(d.4)
            Df Sum Sq Mean Sq F value   Pr(>F)    
Treatment    1    432   432.0   23.61 0.000663 ***
Residuals   10    183    18.3                     
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
d.5 <- aov(Day5 ~ Treatment, data = maze)
summary(d.5)
            Df Sum Sq Mean Sq F value   Pr(>F)    
Treatment    1  385.3   385.3   24.65 0.000566 ***
Residuals   10  156.3    15.6                     
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

And so it proves, with P-values 0.00066 and 0.00057. Thus, there is a significant treatment effect on all the days except Day 3. This (small) inconsistency of pattern is what drives the significant interaction; if the interaction had not been significant, there would have been a consistent treatment effect over all five days.3

  14. Do a mixed model analysis (hint: use the long data that you made for your interaction plot.)

Let’s remind ourselves of how the longer data looked:

maze_long

Make sure you have lme4 installed and loaded.

The response variable is runs. That depends on Treatment and time (Day) and their interaction. In addition, you see that the first five measurements in the dataframe are from the rats of the same student (student 1), and because they are trained by the same student, we expect the numbers of runs from those rats to be correlated. This is incorporated in the analysis by adding a “random effect” for Student. In the jargon of mixed models, this is more precisely called a “random intercept”: it says that the mean number of runs is moved up or down for the particular student that trained those rats, the same for each day:

maze_long.1 <- lmer(runs ~ Treatment * Day + (1|Student), data = maze_long)

Then we run drop1 to see whether we can get rid of any of the fixed (non-random) effects. The test is Chisq because this model is fitted by maximum likelihood, not least squares:

drop1(maze_long.1, test = "Chisq")

The interaction between Treatment and Day is significant, so that is our finding: this is where we stop.

Extras:

  • The 1 in 1|Student is the R notation for “intercept”, so 1|Student means “a different intercept for each Student”.
  • Random effects, such as the random intercepts for students here, never get tested. We are allowing for the possibility that the students are different from each other, which is something we can estimate because we have repeated observations on the same students. This is not of immediate interest to us, because the students are the ones the original researchers could get: in principle, a random sample of all possible students, and so we don’t care especially much about these students. You might be reminded of “blocks” in a randomized blocks analysis: we don’t care about the blocks themselves, but we expect them to be different from each other and thus to affect the outcome in addition to whatever treatments we are interested in.
  • You could also have a random effect of Student that increases over time, a so-called “random slope”. To do that, you pull out the number from Day and store it in, say, Day_number, and then write your random effect as Day_number|Student, which will introduce a random effect for each student that is a linear function of time.
  • Having repeated observations on the same subjects is new. In C32, we didn’t have that, because each subject only gave us one observation. For example, in a two-sample experiment, the subjects are divided at random and each gets only one of treatment or control. The exception is a matched-pairs experiment, where each subject has a “before” and an “after” (or something like that), and we rig up one observation for each subject by taking the difference between before and after.
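The random-slope idea from the bullet points might be coded like this (a sketch only, not fitted here; Day_number is a name I made up, and parse_number is from readr, part of the tidyverse):

```r
# Sketch: a random intercept and a random slope (in time) for each student.
maze_long %>% 
  mutate(Day_number = parse_number(Day)) -> maze_long2
maze_long.2 <- lmer(runs ~ Treatment * Day + (Day_number | Student),
                    data = maze_long2)
```

The term (Day_number | Student) gives each student both their own intercept and their own linear adjustment over time.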
  15. Compare your findings from the mixed model to the ones from the Manova analysis you did earlier.

The findings are exactly the same, albeit with a smaller P-value: there is a significant interaction between Treatment and Day, so the effect of treatment is different for the different days.

If you want to follow this up by doing simple effects of treatment for each day, the analysis is exactly the same as the one you did before, because when you look at each day separately, there are no repeated measures.4

Footnotes

  1. The definition of “tidy data” gets a bit blurry for repeated-measures data. Is “one observation” one value of a number of mazes run, with another column saying which student and day it was (longer), or is it the observations for all five days for the same student (wider)? You could make a case for either answer.↩︎

  2. The second option is still repeated measures because you are still looking at all five time points.↩︎

  3. The significant treatment and time effects that showed up in part (b), which we are not really supposed to interpret, say that on average there is a treatment effect over all days (that is, when you average up over all five days, the bright rats do better), and there is an effect over time on average regardless of treatment (probably driven by rats on both treatments improving over the first three days, ignoring what happened after that).↩︎

  4. If you wanted to condition on treatment and look for a time effect for each treatment, that would still be repeated measures, and to be consistent here, you would use a mixed model for that as well.↩︎