Worksheet 5

Published

January 30, 2026

Packages

library(tidyverse)
library(marginaleffects)
library(survival)

Hypernephroma

Hypernephroma, also known as renal cell carcinoma or kidney cancer, is a type of cancer that starts in the kidneys. It’s one of the most common types of kidney cancer in adults. Source.

The 33 patients in http://ritsokiguess.site/datafiles/lee_hypernephroma.csv, from a study carried out in the 1970s, all had hypernephroma and were all treated with chemotherapy, immunotherapy, and hormonal therapy. (That is to say, in this study all the patients got the same treatment, so treatment is not one of the explanatory variables.) There are a lot of columns in our data, among them:

  • age of patient in years
  • gender of patient, noted as F or M.
  • date of treatment_start, as text (month - day - year)
  • date of treatment_end (last followup or date of death), as text
  • status of patient when last seen
  • the last five columns are the results of skin tests taken at the start of treatment.

The researchers wanted to see whether any of the skin test results, as well as the age and gender of the patient, helped in predicting survival time after the start of treatment.

  1. Read in and display (some of) the data.
  2. Convert the treatment start and treatment end dates into actual dates.
  3. Work out the number of days between the start and the end of the treatment. Check that your result is indeed a number of days. Turn it, if necessary, into an actual number (with as.numeric), for plots later.
  4. Create a suitable response variable y for a Cox proportional-hazards model, and display it. (You don’t need to save it.) Does it distinguish correctly between patients whose treatment_end was their date of death, and the patients who were still alive at this point?
  5. Fit a Cox proportional-hazards model, predicting survival time from age, gender, and the five skin test results. Display the summary of the model. (Hint: copy your Surv from above into your modelling function.)
  6. (2 points) Use step to remove explanatory variables that do not help to predict survival time. Save and display the model that comes out of step. (Some of the explanatory variables will only be significant at 0.10, not 0.05. Keep those.)
  7. Plot predicted survival probabilities over time for five representative ages. Hint: your procedure will use representative values for the other variables, so you do not need to supply values for those or indeed choose the values for age. You might get a warning (that you can ignore), here and on the other similar plot.
  8. Describe the effect of increasing age on your plot, and explain briefly how this is consistent with the summary output from your model.
  9. Repeat the previous two questions for mumps skin test values.
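
As a warm-up for the date-conversion and response-variable steps above, here is a minimal sketch on a small made-up data frame, not the study data. The rows in toy, the dash-separated month-day-year text, and the status values "dead" and "alive" are all assumptions for illustration; check how the real columns are actually coded before adapting any of this.

```r
library(tidyverse)
library(survival)

# Toy data, invented for illustration only (NOT the hypernephroma data).
# Assumed: dates stored as month-day-year text; status coded "dead"/"alive".
toy <- tribble(
  ~treatment_start, ~treatment_end, ~status, ~age,
  "1-10-1974",      "3-11-1974",    "dead",  62,
  "2-1-1974",       "2-1-1975",     "alive", 48,
  "5-20-1974",      "11-16-1974",   "dead",  71
)

toy <- toy %>%
  mutate(
    # mdy() parses text written in month, day, year order into Date values
    start = lubridate::mdy(treatment_start),
    end = lubridate::mdy(treatment_end),
    # subtracting two Dates gives a difftime in days; as.numeric drops the units
    days = as.numeric(end - start)
  )

toy

# Response for a Cox model: the time, plus a logical that is TRUE when the
# event (death) was observed. Censored (still alive) times print with a "+".
y <- with(toy, Surv(days, status == "dead"))
y
```

In the printed response, a time followed by + marks a patient who was still alive when last seen, which is the thing to look for when checking that the real response variable separates deaths from censored observations.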