The vis2rfi package ("Visualising 2Way Random Forest Interactions") builds on the results obtained from forestFloor: it extracts the calculated feature contributions for any pair of important random forest variables and then uses plot_ly to construct an interactive 3d surface of the pair's interaction effect.

Details

The main function, vis2rfi, uses subsampling to handle larger data sets, i.e. those with more cases (data points) to plot, which would otherwise produce a 3d visualization that is difficult to view and to manipulate.

Issues With 3d Surface Depictions

When constructing 3d visualizations with thousands of data points, the following issues might arise:

  • Trends or curvature in the resulting surface are hard to discern. Viewing slices of the 3d surface can help reveal any potential interaction effect.

  • The sheer volume of data might either crash the session or leave it unresponsive.

  • The 3d surface in an interactive display becomes more difficult to manipulate, e.g. to rotate to a different viewing angle or to zoom in or out.

  • The computational time required to construct and manipulate a 3d surface increases. In practice, one wants to manipulate the surface quickly in order to identify any interaction effect that is present.

To tackle these issues, subsampling is used together with smoothing to construct an approximation to the "all data" interaction effect more quickly and without significant loss of information.
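
The idea of combining subsampling with smoothing can be sketched as follows. This is only an illustration in Python using the standard library, not the package's R implementation: the function names, the synthetic data and the Gaussian kernel smoother (a simple stand-in, not a GAM) are all assumptions made for the example. Each row is a hypothetical (x, y, z) triple, where z plays the role of a feature contribution.

```python
import math
import random

def random_subsample(rows, n, seed=0):
    """Draw n rows without replacement (all rows if n exceeds the data size)."""
    rng = random.Random(seed)
    return rng.sample(rows, min(n, len(rows)))

def kernel_smooth(rows, x, y, bandwidth):
    """Gaussian-kernel weighted mean of z at (x, y); a stand-in smoother, not a GAM."""
    num = den = 0.0
    for xi, yi, zi in rows:
        w = math.exp(-((xi - x) ** 2 + (yi - y) ** 2) / (2 * bandwidth ** 2))
        num += w * zi
        den += w
    return num / den if den > 0 else float("nan")

def regular_grid(values, n):
    """n equally spaced points spanning the observed range (n >= 2)."""
    lo, hi = min(values), max(values)
    return [lo + (hi - lo) * i / (n - 1) for i in range(n)]

def smoothed_surface(rows, n_grid=10, bandwidth=0.2):
    """Evaluate the smoother on a regular XY grid; gz[j][i] is the value at (gx[i], gy[j])."""
    gx = regular_grid([r[0] for r in rows], n_grid)
    gy = regular_grid([r[1] for r in rows], n_grid)
    gz = [[kernel_smooth(rows, x, y, bandwidth) for x in gx] for y in gy]
    return gx, gy, gz

# Synthetic "all data": 5000 cases with a multiplicative interaction plus noise.
rng = random.Random(1)
all_rows = []
for _ in range(5000):
    x, y = rng.random(), rng.random()
    all_rows.append((x, y, x * y + rng.gauss(0, 0.05)))

sub = random_subsample(all_rows, 300)   # plot 300 cases instead of 5000
gx, gy, gz = smoothed_surface(sub)      # 10 x 10 grid of smoothed values
print(len(sub), len(gz), len(gz[0]))
```

The grid returned by the hypothetical smoothed_surface helper is what would be passed to a surface plot; the subsample size, grid size and bandwidth are arbitrary values chosen for the illustration.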

Subsampling Approaches

  • Random Subsampling :

    Take a random sample of the cases in the observed data set and use it to construct the 3d interaction effect surface for a pair of important random forest variables.

  • Random Subsampling And GAM Smoothing :

    As above, but additionally fit a feature contribution smoother to the subsample, then resample from this smoother and construct the 3d surface on a more regular grid in the XY plane than the one observed in the subsample. This yields a smoother 3d interaction effect surface.

  • Random Subsampling With Maximum Dissimilarity Sampling :

    Split a hypothetical random sample of cases into two parts: the base part and the pool part.

    Fill the base part with a portion (say 67%) of the hypothetical random sample of observed cases, and fill the remaining pool part (say 33%) with other observed cases that are maximally separated (in terms of Euclidean distance) from the cases in the base part.

    In this way, a more diverse range of feature contribution values can be obtained, and hence a more accurate 3d surface representation of the 2-way interaction effect can be constructed. This approach is explained very clearly by Kuhn in the example section of the documentation for maxDissim.

  • Random Subsampling With Maximum Dissimilarity Sampling And GAM Smoothing :

    A combination of the options above: this option seeks a more representative subsample together with a smoother approximation of the 3d interaction effect surface.
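
The base/pool split used by the maximum dissimilarity options can be sketched as follows. This is a minimal illustration in Python using the standard library, not the package's R code and not the maxDissim implementation: the function name, its arguments and the greedy "farthest from the nearest selected case" rule are assumptions made for the example. Cases are represented as hypothetical numeric coordinate tuples, and any scaling of the variables before computing distances is left out.

```python
import math
import random

def max_dissim_subsample(points, n_total, base_frac=0.67, seed=0):
    """Return indices of a random base part plus a maximally dissimilar pool part."""
    rng = random.Random(seed)
    n_total = min(n_total, len(points))
    n_base = max(1, round(base_frac * n_total))

    # Base part: a plain random sample of the cases.
    base = rng.sample(range(len(points)), n_base)
    chosen = set(base)

    # For every remaining case, the Euclidean distance to its nearest selected case.
    nearest = {
        i: min(math.dist(points[i], points[b]) for b in base)
        for i in range(len(points))
        if i not in chosen
    }

    # Pool part: repeatedly add the case farthest from everything selected so far.
    selected = list(base)
    while len(selected) < n_total and nearest:
        best = max(nearest, key=nearest.get)
        selected.append(best)
        del nearest[best]
        for i in nearest:
            d = math.dist(points[i], points[best])
            if d < nearest[i]:
                nearest[i] = d
    return selected

# Synthetic cases: 2000 points in the unit square plus one extreme case (index 2000).
rng = random.Random(2)
cases = [(rng.random(), rng.random()) for _ in range(2000)]
cases.append((50.0, 50.0))

picked = max_dissim_subsample(cases, n_total=30)  # 20 base cases + 10 pool cases
print(len(picked), 2000 in picked)
```

In this sketch the extreme case always ends up in the subsample, either by chance in the base part or as the first pool selection, which is the behaviour that makes the resulting subsample cover a wider range than a purely random one. For the option that adds GAM smoothing, the selected cases would then be passed to a smoother before the surface is drawn.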

See also