Worksheet 9

Published

March 9, 2026

Packages

library(tidyverse)
library(lme4)
library(MASS, exclude = "select")

You have intelligent rats

(We analyzed these data before; this time, we are going to run a mixed model.)

Each of 12 students trained rats to run a maze, and the number of successful runs (out of 50 attempts) was recorded on each of 5 days. Some of the students were told that the rats they were training were “bright” (intelligent), and the others were told that their rats were “dull” (not intelligent). In fact, all the rats came from the same source, and none of them was any more intelligent than the others. Did what the students were told about their rats make a difference to the number of mazes the rats successfully ran and, if so, was the effect consistent over time? The same group of rats was trained by the same student throughout the experiment, so it makes sense to treat the data as repeated measures.

The data are in http://ritsokiguess.site/datafiles/intelligent-rats.csv. The columns of interest to us are:

Student: a numerical identifier for each student
Treatment: what the student was told about their rats
Day1 through Day5: the number of successful runs on each day.

There are some other variables that will not concern us here.

Read in and display (some of) the data.

Draw a suitable interaction plot (for treatment and time). How does this clarify your conclusions of the previous part? (Hint: you’ll need longer data. If you want to be consistent with me, use Day for the names of the days, and runs for the numbers of runs. This is a repeat of the last worksheet, but you will need the longer data for what follows.)
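As a rough sketch of the reshaping the hint describes, the code below builds a small made-up wide dataframe (these are not the rat data; `toy_wide` and its values are invented for illustration), pivots it longer with the column names Day and runs, and plots the treatment means over time.

```r
library(tidyverse)

# Made-up wide data with the same layout as the worksheet's columns
# (NOT the real rat data; values are simulated purely for illustration)
set.seed(1)
toy_wide <- tibble(
  Student = 1:6,
  Treatment = rep(c("bright", "dull"), each = 3),
  Day1 = rpois(6, 20),
  Day2 = rpois(6, 25),
  Day3 = rpois(6, 30)
)

# One row per student per day: day names go to Day, counts go to runs
toy_long <- toy_wide %>%
  pivot_longer(starts_with("Day"), names_to = "Day", values_to = "runs")

# Interaction plot: mean runs for each treatment-day combination,
# joined by lines within each treatment
interaction_plot <- toy_long %>%
  group_by(Treatment, Day) %>%
  summarize(mean_runs = mean(runs), .groups = "drop") %>%
  ggplot(aes(x = Day, y = mean_runs, colour = Treatment, group = Treatment)) +
  geom_point() +
  geom_line()
interaction_plot
```

Roughly parallel lines would suggest no treatment-by-time interaction; lines that diverge or cross would suggest one.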

Do a mixed model analysis. (Hint: use the long data that you made for your interaction plot.)
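One possible shape for such an analysis, sketched on simulated long data (the dataframe `toy_long`, its effect sizes, and the model name `fit_full` are all assumptions, not the worksheet's data or a prescribed solution): a random intercept for each student, fixed effects for treatment, day, and their interaction, and `drop1` to test whether the interaction can be removed.

```r
library(tidyverse)
library(lme4)

# Simulated long data: 8 students, 4 days each (illustration only)
set.seed(2)
toy_long <- expand_grid(Student = factor(1:8), Day = paste0("Day", 1:4)) %>%
  mutate(
    Treatment = if_else(as.integer(Student) <= 4, "bright", "dull"),
    # each student gets their own random shift, shared across their days
    student_effect = rnorm(8, 0, 3)[as.integer(Student)],
    runs = 20 + 2 * parse_number(Day) + student_effect + rnorm(n(), 0, 2)
  )

# Random intercept per student; fixed effects for treatment, day, interaction
fit_full <- lmer(runs ~ Treatment * Day + (1 | Student), data = toy_long)

# Likelihood-ratio test for the term(s) that can be dropped
# (here, the Treatment:Day interaction)
drop1(fit_full, test = "Chisq")
```

If the interaction is not significant, a natural next step is to refit without it and test the main effects in the same way.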

Compare your findings from the mixed model to the ones from the Manova analysis you did earlier (on last week’s worksheet).

Diabetes

According to the Mayo Clinic,

Diabetes mellitus refers to a group of diseases that affect how the body uses blood sugar (glucose). Glucose is an important source of energy for the cells that make up the muscles and tissues. It’s also the brain’s main source of fuel. The main cause of diabetes varies by type. But no matter what type of diabetes you have, it can lead to excess sugar in the blood. Too much sugar in the blood can lead to serious health problems.

The data in http://ritsokiguess.site/datafiles/diabetes1.csv are from 145 non-obese adult patients classified into three groups (types of diabetes): “normal”, “overt”, and “chemical”. For each patient, five other variables were also recorded:

rw: relative weight, the ratio of actual weight to ideal weight for the person’s height.
fpg: fasting plasma glucose
glucose: area under plasma glucose curve after 3-hour glucose tolerance test
insulin: area under plasma insulin curve after 3-hour glucose tolerance test
sspg: steady-state plasma glucose

These variables are recorded here as \(z\)-scores (they were originally measured on vastly different scales).

Our aim is to investigate any association between the five measured variables and the diabetes type (in the column group).

Read in and display (some of) the data.

Using manova, demonstrate that the group has some kind of effect on the other variables.
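The general pattern can be sketched with R's built-in `iris` data as a stand-in (three groups and four quantitative variables; this is not the diabetes data, and the names `response` and `iris_manova` are just for illustration): bind the quantitative variables into a matrix response and model it against the grouping variable.

```r
# Stand-in data: built-in iris (NOT the diabetes data)
# Response is a matrix with one column per quantitative variable
response <- with(iris, cbind(Sepal.Length, Sepal.Width, Petal.Length, Petal.Width))

# Multivariate analysis of variance with the grouping variable as predictor
iris_manova <- manova(response ~ Species, data = iris)

# Default summary reports Pillai's trace and an approximate F-test
summary(iris_manova)
```

A small P-value here says that the groups differ somehow on the response variables taken together, without saying how.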

Run a discriminant analysis and display the results.
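A minimal sketch of the mechanics, again using the built-in `iris` data as a stand-in rather than the diabetes data (`iris_lda` is a hypothetical name): `lda` from MASS takes the grouping variable on the left of the formula and the quantitative variables on the right.

```r
library(MASS, exclude = "select")

# Stand-in data: built-in iris (NOT the diabetes data)
# Grouping variable on the left, quantitative variables on the right
iris_lda <- lda(Species ~ ., data = iris)

# Printing shows prior probabilities, group means, coefficients of the
# linear discriminants, and the proportion of trace
iris_lda
```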

Comment briefly on the relative importance of the linear discriminants.
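The "proportion of trace" that the printed output reports can also be computed by hand, which may make clear what it measures. The sketch below uses the built-in `iris` data as a stand-in (not the diabetes data; `iris_lda` and `prop_trace` are illustrative names) and squares the `svd` component of the fitted object.

```r
library(MASS, exclude = "select")

# Stand-in data: built-in iris (NOT the diabetes data)
iris_lda <- lda(Species ~ ., data = iris)

# svd holds one value per linear discriminant; the squared values,
# scaled to sum to 1, give each discriminant's share of the
# between-group separation
prop_trace <- iris_lda$svd^2 / sum(iris_lda$svd^2)
prop_trace
```

A first value close to 1 would mean that LD1 does almost all of the separating and LD2 adds little.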

Which two of the original quantitative variables play the largest role in LD1? What kind of values on those variables would make the LD1 score large (very positive)?
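To see where the relevant numbers live, here is a short sketch on the built-in `iris` data (a stand-in, not the diabetes data; `iris_lda` is an illustrative name). The `scaling` component holds the coefficients of the linear discriminants; a variable with a coefficient far from zero plays a large role, and the sign says whether high or low values push the score up.

```r
library(MASS, exclude = "select")

# Stand-in data: built-in iris (NOT the diabetes data)
iris_lda <- lda(Species ~ ., data = iris)

# One row per quantitative variable, one column per linear discriminant.
# A large positive coefficient means high values of that variable raise
# the score; a large negative one means low values raise it.
iris_lda$scaling
```

Comparing coefficients this way is most straightforward when the variables are on comparable scales, as standardized variables are.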

Obtain and save a dataframe containing the predicted group memberships, posterior probabilities, and discriminant scores for each individual, along with the original data. Display (some of) your dataframe.
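One way this can be done, sketched on the built-in `iris` data as a stand-in (not the diabetes data; `iris_lda` and `iris_out` are illustrative names): `predict` on a fitted `lda` object returns the predicted class, the posterior probabilities, and the discriminant scores, and `cbind` glues them onto the original dataframe.

```r
library(MASS, exclude = "select")

# Stand-in data: built-in iris (NOT the diabetes data)
iris_lda <- lda(Species ~ ., data = iris)

# predict() returns a list with components class, posterior, and x;
# cbind() adds them as extra columns next to the original data
iris_out <- cbind(iris, predict(iris_lda))

# Display some of the result
head(iris_out)
```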

Obtain a table counting the number of individuals who actually had each type of diabetes, cross-classified by the type of diabetes they were predicted to have. Does the classification appear to be good or bad? Explain briefly.
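A sketch of the cross-classification on the built-in `iris` data (a stand-in, not the diabetes data; `iris_lda` and `tab` are illustrative names): tabulate the observed group against the predicted group, so that correct classifications fall on the diagonal.

```r
library(MASS, exclude = "select")

# Stand-in data: built-in iris (NOT the diabetes data)
iris_lda <- lda(Species ~ ., data = iris)

# Rows are the actual groups, columns the predicted groups;
# counts on the diagonal are individuals classified correctly
tab <- table(obs = iris$Species, pred = predict(iris_lda)$class)
tab

# Overall proportion classified correctly
sum(diag(tab)) / sum(tab)
```

Because the same data are used to fit the model and to classify, this proportion tends to be optimistic.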

Find an individual who was misclassified (it doesn’t matter which one). For your chosen individual, was the misclassification clear-cut or a close thing? Explain briefly.
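A sketch of how misclassified individuals might be pulled out, using the built-in `iris` data as a stand-in (not the diabetes data; the object names are illustrative): keep the rows where the actual and predicted groups differ, and look at their posterior probabilities.

```r
library(tidyverse)
library(MASS, exclude = "select")

# Stand-in data: built-in iris (NOT the diabetes data)
iris_lda <- lda(Species ~ ., data = iris)
iris_out <- cbind(iris, predict(iris_lda))

# Rows where the predicted group differs from the actual one,
# showing only the actual group, predicted group, and posteriors
misclassified <- iris_out %>%
  filter(as.character(Species) != as.character(class)) %>%
  select(Species, class, starts_with("posterior"))
misclassified
```

A posterior probability near 1 for the wrong group is a clear-cut misclassification; two posterior probabilities close to each other indicate a near miss.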

Make a plot of LD1 and LD2 scores for each individual, distinguished by the group they belong to. There are too many points on this plot to label individually.
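The plot can be sketched as follows on the built-in `iris` data (a stand-in, not the diabetes data; `scores` and `score_plot` are illustrative names): take the discriminant scores from `predict`, attach the actual group, and colour the points by group.

```r
library(tidyverse)
library(MASS, exclude = "select")

# Stand-in data: built-in iris (NOT the diabetes data)
iris_lda <- lda(Species ~ ., data = iris)

# The x component of the predictions holds the LD1 and LD2 scores
scores <- as_tibble(predict(iris_lda)$x) %>%
  mutate(Species = iris$Species)

# Scatterplot of the scores, coloured by actual group
score_plot <- ggplot(scores, aes(x = LD1, y = LD2, colour = Species)) +
  geom_point()
score_plot
```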

Which group is on the right on your plot? What does that say about this group’s values on the original quantitative variables?