Causal Epistemology


Q: How do we estimate the causal effect of a treatment T on the healthcare outcome Y of a patient?

FIG. 1:
Association vs. Causation.
Adapted from Peters et al. (2017)
P1: Association is generally a good guide to causation.
P2: However, association is no guarantee of causation.

P1 and P2 are normally taken for granted in statistics primers.

There are two ways to determine the causal effect of one variable on another:
OPTION 1: Experimentation and measurement of causal effects
OPTION 2: Observation and estimation of causal effects relative to the observational dataset
Randomized Controlled Trials (RCTs) are supported by Fisher's (1925) theory of experimental design


RCTs rely on randomization (e.g. allocating members of a sample population into either the Treatment Group or the Control Group with the toss of a fair coin) to eliminate bias
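
The coin-toss allocation described above can be sketched in a few lines of Python. This is only an illustration: the function name `allocate`, the patient IDs, and the `seed` parameter are assumptions introduced here, not part of any trial protocol.

```python
import random

def allocate(patient_ids, seed=None):
    """Assign each patient to an arm by an independent fair coin toss.

    `seed` is a hypothetical convenience so the allocation can be reproduced.
    """
    rng = random.Random(seed)
    groups = {"treatment": [], "control": []}
    for pid in patient_ids:
        # rng.random() is uniform on [0, 1), so each arm has probability 0.5
        arm = "treatment" if rng.random() < 0.5 else "control"
        groups[arm].append(pid)
    return groups

# Illustrative sample of 20 patients with made-up IDs 1..20
groups = allocate(range(1, 21), seed=0)
```

Because each toss is independent of every patient characteristic, measured or not, the two arms are comparable on average; note that simple coin tossing does not guarantee equal group sizes.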

However, it is often infeasible or impossible to conduct RCTs
∴ We may have to consider OPTIONS 2A-2C instead

FIG. 2:
Randomized Controlled Trials (RCTs)
(OPTION 1)

FIG. 3:
The Logic of Counterfactuals
(OPTION 2A)
The Logic of Counterfactuals has been developed by philosophers:
i) Robert Stalnaker (1968)
ii) David Lewis (1973)

The Logic of Counterfactuals relies on the Semantics of Possible Worlds
The Rubin Causal Model or Potential Outcomes Approach has been developed by statisticians like:
i) Donald Rubin (1974)
ii) James Robins (1986)
iii) Miguel Hernán and James Robins (2020)

The Rubin Causal Model or Potential Outcomes Approach relies on a set of Identifiability Conditions:
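a) The Consistency Condition: the values of treatment under comparison correspond to well-defined interventions, so that a patient's observed outcome equals their potential outcome under the treatment actually received
b) The Exchangeability Condition: the conditional probability of receiving each value of treatment T depends only on the measured covariates L
c) The Positivity Condition: the probability of receiving every value of treatment, conditional on each value of L that occurs in the population, is positive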
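
In potential-outcomes notation, these three conditions are commonly written as below. This is a compact summary rather than a full statement of the assumptions; here Y^t denotes the outcome a patient would have under treatment value t.

```latex
\begin{align*}
\text{Consistency:} \quad & T = t \implies Y^{t} = Y \\
\text{Exchangeability:} \quad & Y^{t} \perp\!\!\!\perp T \mid L \quad \text{for all } t \\
\text{Positivity:} \quad & \Pr[T = t \mid L = l] > 0 \quad \text{for all } t \text{ and all } l \text{ with } \Pr[L = l] > 0
\end{align*}
```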

FIG. 4:
The Potential Outcomes Approach
(OPTION 2B)

FIG. 5:
The Do-Calculus Approach
(OPTION 2C)
The Do-Calculus Approach has emerged from computer science as a result of the groundbreaking work of Judea Pearl (2000)

The Do-Calculus Approach relies on such axioms as the Causal Markov Condition and the Causal Faithfulness Condition:
a) The Causal Markov Condition: for every node W in a set of nodes V, W is independent of its non-effects, given its parents
b) The Causal Faithfulness Condition: if all and only those conditional independence relations true in the probability distribution P are entailed by the Causal Markov Condition applied to a graph G, then P and G are faithful to one another
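
For a directed acyclic graph, the Causal Markov Condition is equivalent to saying that the joint distribution factorizes over each node given its parents. The second line below works this out for a three-node graph with arrows L → T, L → Y, and T → Y; that graph is assumed here purely for illustration.

```latex
\begin{align*}
P(v_1, \dots, v_n) &= \prod_{i=1}^{n} P\bigl(v_i \mid \mathrm{pa}(v_i)\bigr) \\
P(l, t, y) &= P(l)\, P(t \mid l)\, P(y \mid t, l)
\end{align*}
```
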
Whether we go for OPTION 2A, OPTION 2B, or OPTION 2C, we will still have to adjust for confounders

Confounders are a set of measured covariates listed under node L (see FIG. 6) that affect the values of both treatment T and healthcare outcome Y
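
One common way to adjust for a measured confounder L is standardization: estimate the risk under treatment value t as the sum over l of Pr[Y=1 | T=t, L=l] × Pr[L=l]. The sketch below assumes binary L, T, and Y and uses illustrative counts. The helper names (`make`, `crude_risk`, `standardized_risk`) are introduced here for the example and do not come from our project code.

```python
from collections import Counter

def make(l, t, n, deaths):
    """Build n illustrative records (L, T, Y) for one stratum, `deaths` with Y=1."""
    return [(l, t, 1)] * deaths + [(l, t, 0)] * (n - deaths)

def crude_risk(records, t):
    """Unadjusted Pr[Y=1 | T=t]."""
    outcomes = [y for (_, ti, y) in records if ti == t]
    return sum(outcomes) / len(outcomes)

def standardized_risk(records, t):
    """Sum over l of Pr[Y=1 | T=t, L=l] * Pr[L=l]."""
    n = len(records)
    stratum_sizes = Counter(l for (l, _, _) in records)
    risk = 0.0
    for l, n_l in stratum_sizes.items():
        outcomes = [y for (li, ti, y) in records if li == l and ti == t]
        if not outcomes:
            # No patient with T=t in this stratum: positivity fails
            raise ValueError(f"no patients with T={t} in stratum L={l}")
        risk += (sum(outcomes) / len(outcomes)) * (n_l / n)
    return risk

# Illustrative data: sicker patients (L=1) are more likely to be treated
records = (make(0, 0, 4, 1) + make(0, 1, 4, 1)
           + make(1, 0, 3, 2) + make(1, 1, 9, 6))

crude_difference = crude_risk(records, 1) - crude_risk(records, 0)
adjusted_difference = standardized_risk(records, 1) - standardized_risk(records, 0)
```

In these made-up counts the crude comparison suggests that treatment raises the risk of death, while the standardized risks are the same in both arms: the apparent association is produced entirely by L.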

In our Medical AI project, we rely on the domain-specific expertise of our medical collaborators to identify possible confounders

FIG. 6:
Dealing with Confounders
EXTRA: A Causal Calculator in Python and Excel (Developed from OPTION 2B)

Source: Tsuriel
    The input (integer) values of the following variables have been randomized:
  1. The number of patients who will die if treated (randomized between 1-200);
  2. The number of patients who will survive if treated (randomized between 1-200);
  3. The number of patients who will die if untreated (randomized between 1 and the total number of patients)
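
The randomization of the three inputs could look roughly like the following. This is a sketch, not the calculator's actual code: the function name `draw_inputs` and the `seed` parameter are hypothetical, and it assumes that the total number of patients is the sum of the first two inputs.

```python
import random

def draw_inputs(seed=None):
    """Draw one set of randomized integer inputs (hypothetical helper)."""
    rng = random.Random(seed)
    die_if_treated = rng.randint(1, 200)      # randint includes both endpoints
    survive_if_treated = rng.randint(1, 200)
    # Assumption: every patient either dies or survives if treated,
    # so the total is the sum of the two counts above.
    total = die_if_treated + survive_if_treated
    die_if_untreated = rng.randint(1, total)
    return {
        "die_if_treated": die_if_treated,
        "survive_if_treated": survive_if_treated,
        "die_if_untreated": die_if_untreated,
        "total": total,
    }

inputs = draw_inputs(seed=42)
```
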
    Our causal calculator computes the following measures as output values:
  1. The causal risk difference;
  2. The causal risk ratio;
  3. The causal odds ratio
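
Given the three input counts, the output measures follow from the two counterfactual risks. The sketch below is one way to compute them and is not the calculator's actual code. The function name `causal_measures` is hypothetical, and the code assumes that each input is at least 1 and that the total number of patients is the number who die if treated plus the number who survive if treated.

```python
def causal_measures(die_if_treated, survive_if_treated, die_if_untreated):
    """Return the causal risk difference, risk ratio, and odds ratio.

    Assumes all three counts are >= 1 and die_if_untreated <= total.
    """
    total = die_if_treated + survive_if_treated
    survive_if_untreated = total - die_if_untreated

    risk_treated = die_if_treated / total      # Pr[Y^{t=1} = 1]
    risk_untreated = die_if_untreated / total  # Pr[Y^{t=0} = 1]

    return {
        "risk_difference": risk_treated - risk_untreated,
        "risk_ratio": risk_treated / risk_untreated,
        # Cross-product form of (odds if treated) / (odds if untreated);
        # it evaluates to 0.0 when no one survives if untreated.
        "odds_ratio": (die_if_treated * survive_if_untreated)
                      / (survive_if_treated * die_if_untreated),
    }

# Illustrative counts: 20 patients, 10 die if treated, 5 die if untreated
measures = causal_measures(10, 10, 5)
```

With these illustrative counts the risks are 0.5 and 0.25, so the risk difference is 0.25, the risk ratio is 2, and the odds ratio is 3.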

    A causal effect is present when any one of the following equivalent conditions holds:
  1. The causal risk difference ≠ 0;
  2. The causal risk ratio ≠ 1;
  3. The causal odds ratio ≠ 1

    For more on how these measures are computed, see Hernán (2004)
    How our causal calculator works:
  STEP 1: Download and save this template file as 'example.xlsx'
  STEP 2: Access the code on Google Colab and make sure that you are able to view it in playground mode
  STEP 3: Press 'Ctrl + S' and select 'Save a Copy in Drive'
  STEP 4: Select the Folder icon (on the left-hand side of Google Colab) and upload 'example.xlsx'
  STEP 5: Run the code
  STEP 6: Select the Folder icon (on the left-hand side of Google Colab) and download 'results.xlsx'
  STEP 7: Open the downloaded 'results.xlsx' and enable editing to view the results
  STEP 8: Repeat STEPS 5-7 to get different sets of output values from different sets of randomized input values