FFTrees

An R package to create, visualize, and evaluate fast-and-frugal decision trees

Nathaniel Phillips, Economic Psychology, University of Basel, Switzerland
useR! 2017, Brussels

How can people make good decisions based on limited, noisy information?

Limited Time. Limited information. How can one make good decisions?


plot of chunk unnamed-chunk-1

plot of chunk unnamed-chunk-4

Heart Attack Diagnosis

  • How do doctors make decisions? Experience. Intuition. Clinical judgment

plot of chunk unnamed-chunk-5

  • In a Michigan hospital, doctors sent 90% of patients to the ICU, although only 25% were actually having a heart attack.

Emergency Room Solution: a fast-and-frugal tree (FFT)

  • A fast-and-frugal decision tree (FFT) developed by Green & Mehr (1997).
  • Tree cut the false-alarm rate in half
  • Tree is transparent, easy to modify, and accepted by physicians (unlike regression).

What is a fast-and-frugal decision tree (FFT)?

plot of chunk unnamed-chunk-6

Green & Mehr (1997) "What alters physicians' decisions to admit to the coronary care unit?"

Fast-and-Frugal Decision Tree (FFT)

  • An FFT is a decision tree with exactly two branches from each node, where one, or both, of the branches are exit branches (Martignon et al., 2008)

plot of chunk unnamed-chunk-7

Examples of FFTs

plot of chunk unnamed-chunk-8

Why an FFT?

No more entities should be presumed to exist than are absolutely necessary Willam of Occam

plot of chunk unnamed-chunk-9

FFTrees

  • A package to create, visualize, and evaluate (dead-simple) fast-and-frugal decision trees.
install.packages("FFTrees")
library("FFTrees")
    a      
   / \     
0   b  
     / \   
    0   1  
 FFTrees v1.3.2

plot of chunk unnamed-chunk-11

Example: Heart Disease

age sex cp trestbps chol fbs restecg thalach exang oldpeak slope ca thal diagnosis
63 1 ta 145 233 1 hypertrophy 150 0 2.3 down 0 fd 0
67 1 a 160 286 0 hypertrophy 108 1 1.5 flat 3 normal 1
67 1 a 120 229 0 hypertrophy 129 1 2.6 flat 2 rd 1
37 1 np 130 250 0 normal 187 0 3.5 down 0 normal 0
41 0 aa 130 204 0 hypertrophy 172 0 1.4 up 0 normal 0
56 1 aa 120 236 0 normal 178 0 0.8 up 0 normal 0

Goal: Predict binary diagnosis classification

The FFTrees() function

# Step 1: Install and load FFTrees (v.1.3.2)
install.packages("FFTrees")
library("FFTrees")

# Step 2: Create FFTs
heart.fft <- FFTrees(formula = diagnosis ~.,
                     data = heart.train,
                     data.test = heart.test,
                     main = "Heart Disease",
                     decision.labels = c("Low-Risk", "High-Risk"))

Print an FFTrees object

heart.fft
Heart Disease
7 FFTs predicting diagnosis (Low-Risk v High-Risk)
FFT #1 uses 3 cues: {thal,cp,ca}

                   train   test
cases       :n    150.00 153.00
speed       :mcu    1.74   1.73
frugality   :pci    0.88   0.88
accuracy    :acc    0.80   0.82
weighted    :wacc   0.80   0.82
sensitivity :sens   0.82   0.88
specificity :spec   0.79   0.76

pars: algorithm = 'ifan', goal = 'wacc', goal.chase = 'bacc', sens.w = 0.5

Print a tree "in words"

inwords(heart.fft)
[1] "If thal = {rd,fd}, predict High-Risk"            
[2] "If cp != {a}, predict Low-Risk"  
[3] "If ca <= 0, predict Low-Risk, otherwise, if ca > 0, predict High-Risk"
cue definition Possible values
thal: thallium scintigraphy How well blood flows to the heart normal (n)fixed defect (fd), or reversible defect (rd)
cp: chest pain type Type of chest pain typical angina (ta), atypical angina (aa), non-anginal pain (np), or asymptomatic (a).
ca: number of major vessels colored by flourosopy 0, 1, 2, 3
plot(heart.fft, stats = FALSE, data = "test")

plot of chunk unnamed-chunk-18

plot(heart.fft, data = "test")  # Training data

plot of chunk unnamed-chunk-19

plot of chunk unnamed-chunk-20

plot(heart.fft, data = "test", tree = 6)   # Testing data, tree 6

plot of chunk unnamed-chunk-21

plot(heart.fft, data = "test", tree = 7)   # Testing data, tree 7

plot of chunk unnamed-chunk-22

Prediction Simulation

  • 10 datasets from the UCI Machine Learning Database.
  • 50\% Training, 50\% Testing
  • FFtrees vs rpart, regression, random forests...
  • Criterion: Balanced accuracy
    • mean(sensitivity, specificity)

plot of chunk unnamed-chunk-23

Speed and frugality

plot of chunk unnamed-chunk-24

Speed and frugality

plot of chunk unnamed-chunk-25

Training Balanced Accuracy (all datasets)

plot of chunk unnamed-chunk-26

Testing Balanced Accuracy (all datasets)

plot of chunk unnamed-chunk-27

plot of chunk unnamed-chunk-28

plot of chunk unnamed-chunk-29

plot of chunk unnamed-chunk-30

plot of chunk unnamed-chunk-31

Additional features

plot of chunk unnamed-chunk-32

Define an FFT manually

# Create an FFT manually
FFTrees(formula = diagnosis ~.,
data = heart.train,
my.tree = "If chol > 350, predict True. 
           If cp != {a}, predict False. 
           If age <= 35, predict False.
           Otherwise, predict True")

plot of chunk unnamed-chunk-35

Include cue costs

plot of chunk unnamed-chunk-36

FFTrees(formula = diagnosis ~.,
 data = heart.train)

Include cue costs

plot of chunk unnamed-chunk-38

FFTrees(formula = diagnosis ~.,
 data = heart.train)

Original FFT: Cost = $120, Accuracy = 80%

plot of chunk unnamed-chunk-40

Include cue costs

plot of chunk unnamed-chunk-41

FFTrees(formula, data,
 cost.cues = heart.cost,
 cost.outcomes = c(0, 200, 100, 0))

Cheap FFT: Cost = $1.56, Accuracy = 75%

plot of chunk unnamed-chunk-44

When should I consider an FFT?

When should I consider an FFT?

plot of chunk unnamed-chunk-45

Publication and Collaborators

Publication

FFTrees: A toolbox to create, visualize and evaluate fast-and-frugal decision trees. (in press). Judgment and Decision Making.

Collaborators

  • Wolfgang Gaissmaier (University of Konstanz)
  • Hansjoerg Neth (University of Konstanz)
  • Jan Woike (MPI for Human Development)

plot of chunk unnamed-chunk-46

FFTrees Unfriendly Data

  • Many cues, weak validity, ind errors

plot of chunk unnamed-chunk-49

FFTrees Friendly Data

  • Few cues with high validity, dep errors.

plot of chunk unnamed-chunk-50

plot(heart.fft, what = "cues")

plot of chunk unnamed-chunk-51

Cue importance

  • As calculated by randomForest

plot of chunk unnamed-chunk-52

Tree Building Algorithm

  1. For each cue (aka, feature), calculate a threshold that maximizes goal.chase (default: balanced accuracy)
  2. Rank order cues by goal.chase
  3. Select the top max.levels (default: 4)
  4. Create a "fan" of all possible trees with all possible exit directions.
  5. Select the tree that maximizes goal (default: balanced accuracy)

plot of chunk unnamed-chunk-53

plot of chunk unnamed-chunk-54

Conclusion

plot of chunk unnamed-chunk-55

Conclusion

plot of chunk unnamed-chunk-56

Conclusion

plot of chunk unnamed-chunk-57

How accurate can FFTs be?

plot of chunk unnamed-chunk-58

How accurate can FFTs be?

plot of chunk unnamed-chunk-59

Forest Fires (Training)

plot of chunk unnamed-chunk-60

Forest Fires (Testing)

plot of chunk unnamed-chunk-61

Forest Fires (Testing)

plot of chunk unnamed-chunk-62

Conclusion

Why use FFTrees?

  • See how, and how well, a crazy simple, transparent fast-and-frugal tree can make sense of your data.
  • You might be surprised by how well it works, and generate new insights.
library("FFTrees")
    a      
   / \     
0   b  
     / \   
    0   1  
 FFTrees v1.3.2

plot of chunk unnamed-chunk-64

Create a forest of FFTs

heart.fff <- FFForest(formula = diagnosis ~., data = heartdisease)

plot of chunk unnamed-chunk-66