Random forest machine learning

8/15/2023

Before we experiment with the caret package, which provides access to a variety of machine learning algorithms, in this module we will explore the randomForest package to implement the random forest (RF) algorithm specifically. I will step through the process of creating a spatial prediction to map the likelihood or probability of palustrine forested/palustrine scrub/shrub (PFO/PSS) wetland occurrence using a variety of terrain variables.

The provided training.csv table contains 1,000 examples of PFO/PSS wetlands and 1,000 not PFO/PSS examples. The validation.csv file contains a separate, non-overlapping set of 2,000 examples equally split between PFO/PSS wetlands and not PFO/PSS wetlands. I have also provided a raster stack (predictors.img) of all the predictor variables so that a spatial prediction can be produced. Since PFO/PSS wetlands should only occur in woody or forested extents, I have also provided a binary raster mask (for_mask.img) to subset the final result to only these areas. The link at the bottom of the page provides the example data and R Markdown file used to generate this module.

Here is a brief description of all the provided variables.

- cost: Distance from water bodies weighted by slope.
- ctmi: Compound topographic moisture index (CTMI).
- diss_5: Terrain dissection (11 x 11 window).
- diss_10: Terrain dissection (21 x 21 window).
- diss_20: Terrain dissection (41 x 41 window).
- diss_25: Terrain dissection (51 x 51 window).
- rough_5: Terrain roughness (11 x 11 window).
- rough_10: Terrain roughness (21 x 21 window).
- rough_20: Terrain roughness (41 x 41 window).
- rough_25: Terrain roughness (51 x 51 window).

The data provided in this example are a subset of the data used in this publication:

Predicting palustrine wetland probability using random forest machine learning and digital elevation data-derived terrain variables, Photogrammetric Engineering & Remote Sensing, 82(6): 437-447. (16)82038-8.

As always, to get started I am reading in all of the needed packages and data. The tables are read in using read.csv() and the raster data are loaded using the raster() function from the raster package. I created the training data by extracting raster data values at point locations using GIS software. This process could also be completed in R using methods we have already discussed. I then rename the raster bands to match the column names from the table. Note that the band order doesn’t matter; however, the correct name must be associated with the correct variable or band.

Using the importance() function from the randomForest package, we can obtain the importance estimates for each variable relative to each class and also overall, using the OOB mean decrease in accuracy and/or mean decrease in Gini measures. In the second code block, I have plotted the importance measures. Higher values indicate a greater importance in the model. Note that random forest-based importance measures have been shown to be biased if predictor variables are highly correlated, variables are measured on different scales, a mix of continuous and categorical variables is used, and/or categorical variables have varying numbers of levels.
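The importance() workflow described above can be sketched on a small synthetic table. This is only a minimal illustration, not the module's original code: the data frame `train`, its `class` column, the simulated values, and the `ntree` setting are all assumptions standing in for training.csv, and only three of the predictor names are borrowed from the variable list.

```r
library(randomForest)

set.seed(42)

# Synthetic stand-in for training.csv (hypothetical values, not the real data).
n <- 200
train <- data.frame(
  class  = factor(rep(c("wet", "not"), each = n / 2)),
  cost   = c(rnorm(n / 2, mean = 2), rnorm(n / 2, mean = 5)),  # informative
  ctmi   = c(rnorm(n / 2, mean = 8), rnorm(n / 2, mean = 6)),  # informative
  diss_5 = rnorm(n)                                            # pure noise
)

# importance = TRUE is required to compute the OOB permutation-based
# mean decrease in accuracy in addition to the mean decrease in Gini.
rf_model <- randomForest(class ~ ., data = train, ntree = 501,
                         importance = TRUE)

# One row per predictor; columns are the per-class measures followed by
# MeanDecreaseAccuracy and MeanDecreaseGini.
imp <- importance(rf_model)
print(imp)

# Dot charts of both overall measures.
varImpPlot(rf_model)
```

To pull out a single measure, importance(rf_model, type = 1) returns only the mean decrease in accuracy and type = 2 returns only the mean decrease in Gini. In this sketch the noise variable should rank last, but with real terrain variables that are strongly correlated (for example, the same metric at several window sizes) the rankings deserve the caution noted above.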
