PR

This article explains how to perform biodiversity index and community structure analysis (cluster analysis/NMDS) using the statistical analysis language "R"!

This article explains how to perform biodiversity index and community structure analysis (cluster analysis/NMDS) using the statistical analysis language "R"! ecosystem
This article explains how to perform biodiversity index and community structure analysis (cluster analysis/NMDS) using the statistical analysis language "R"!

If you want to calculate biodiversity indices or perform community structure analysis (cluster analysis/ NMDS ), the best approach is to use the free statistical programming language 'R' . However, when I was a student, there was little documentation available, and writing code was a struggle. Variable assignment (definition) symbols, argument notation, and function naming all varied considerably from site to site, showing a great deal of inconsistency. Now, with the standardization of notation and the emergence of excellent generative AI like ChatGPT, the language has become much easier to read. Nevertheless, in this article, I would like to share an example of community structure analysis I performed using soil animal identification data.

Sponsored Link

*This website is originally in Japanese. Other languages are automatically translated and may contain errors in scientific names or technical terms.

Example of soil animal analysis at a specific location

The following are the identification results of soil animal samples extracted using the Tullgren method at four locations over two seasons. I was responsible for identifying all species except for beetles. We will use this data to calculate the biodiversity index and analyze the community structure.

The following is raw data; in reality, the analysis would involve aggregating the number of individuals of each entered species for each location and survey day, but this will be omitted here.

Input type namesexpopulationSurvey dateSurvey location
Armored bead mite 2MayA
Flower mites 2MayA
Xylobates sp. 21MayA
Flat mite 1MayA
Asian giant mite 1MayA
Dwarf mites 20MayA
Cottony bead mites 3MayA
Family Scaridae sp. 8MayA
Yamato spider mites 2MayA
Stag beetle mites 3MayA
Himehesoirekodani 2MayA
Ghostly Tick 1MayA
Tokyo lily mite 3MayA
Veigaia sp. 9MayA
Parasitus sp. 2MayA
Family Tineidae sp. 9MayA
Podocinum sp. 1MayA
Pachylaelaps sp. 1MayA
Epicrius sp. 6MayA
Order Tinea, spp. 24MayA
Striped springtail 1MayA
Blue leafhopper springtail 2MayA
Small springtail 7MayA
Japanese white springtail 1MayA
Sacrum purpureus 3MayA
Yoshii forsom springtail 1MayA
Order Protaenia, sp. 2MayA
False ants 6MayA
Plum pine ant 1MayA
Teranishi's ant 1MayA
Centipede 2MayA
Takami's centipede1MayA
Tajimagahara Hitofushimukade1MayA
Monotarsobius sp.1MayA
Fan millipede 1MayA
Mycosiidae sp. 1MayA
Variegated woodlouse 1MayA
Long-legged woodlice 8MayA
Family Tineidae sp. 1MayA
Dwarf mites 2MayA
Family Lumidae sp. 10MayA
Blue leafhopper springtail 1MayB
Striped springtail 3MayB
Small spiny springtail 9MayB
Sacrum purpureus 15MayB
Kotsunoari 11MayB
Soldier fly family sp. 1MayB
Centipede 1MayB
Flat mite 1MayB
Stag beetle mites 1MayB
Cottony bead mites 2MayB
Podocinum sp. 1MayB
Order Tinea, spp. 3MayB
Thrips (Amblyzolacera sp.). 1MayB
Ardisia japonica scale insect 1MayB
A species very similar to Watanabe's centipede♀♂2MayB
Monotarsobius sp.4MayB
Centipede 3MayB
Cottony bead mites 23MayB
Family Scaridae sp. 2MayB
Stag beetle mites 7MayB
Pale-colored umbilical cord mite 1MayB
Xylobates sp. 5MayB
Otafuku Daruma Mite 1MayB
Podocinum sp. 2MayB
Holaspina sp. 4MayB
Pachylaelaps sp.4MayB
Parasitus sp. 1MayB
Order Tinea, spp. 47MayB
Abrolophus sp. 1MayB
Flower mites 1MayC
Armored bead mite 1MayC
Cottony bead mites 10MayC
Xylobates sp. 46MayC
Pale-colored umbilical cord mite 1MayC
Alameirekodani 1MayC
Yamato spider mites 3MayC
Dwarf mites 2MayC
Beak mites 1MayC
The smooth-skinned mite (Tricholoma rhodopolium) 1MayC
Podocinum sp. 3MayC
Family Tineidae sp. 5MayC
Veigaia sp. 1MayC
*Holostaspella* genus sp. 4MayC
Holaspina sp. 2MayC
Order Tinea, spp. 12MayC
Order Mites sp. 1MayC
Order Protaenia, sp. 2MayC
Sacrum purpureus 2MayC
Blue leafhopper springtail 4MayC
Striped springtail 3MayC
Northern scale ant 1MayC
False ants 16MayC
Teranishi's ant 1MayC
Kotsunoari 26MayC
Lepidoptera sp. 1MayC
Variegated woodlouse 5MayC
Long-legged woodlice 1MayC
Musashikamukade 1MayC
Fan millipede 2MayC
Mycosiidae sp. 2MayC
Monotarsobius sp.6MayC
A species very similar to *Sakayori* centipede.1MayC
Bothropolys sp. 1MayC
Pheretima sp. 1MayC
Family Lumidae sp. 1MayC
Two-eyed springtail 3MayD
Sacrum purpureus 2MayD
Japanese white springtail 3MayD
Family Stridae sp. 4MayD
Scale ants 1MayD
Kotsunoari 29MayD
Ameiro ant 1MayD
Soldier fly family sp. 3MayD
Family Tineidae sp. 3MayD
Family Tineidae sp. 8MayD
*Holostaspella* genus sp. 4MayD
Order Tinea, spp. 10MayD
Xylobates sp. 11MayD
Lasioseius sp.♂♀2MayD
Southern spider mites 6MayD
Tokyo lily mite 9MayD
Cottony bead mites 1MayD
Himehesoirekodani 1MayD
Aphid mite 1MayD
Centipede 1MayD
Mycosiidae sp. 9MayD
Goshichinagasumukade 1MayD
Monotarsobius sp. 1MayD
Variegated woodlouse 1MayD
Long-legged woodlice 9MayD
Pheretima sp. 1MayD
Family Lumidae sp. 1MayD
Kotsunoari 1AugustA
Ardisia japonica scale insect. 3AugustA
Mecistocephalus sp. 1AugustA
Prolamnonyx sp. 1AugustA
Variegated woodlouse 3AugustA
Southern spider mites 3AugustA
Flower mites 1AugustA
Asian giant mite 2AugustA
Two-spotted mites 1AugustA
Ghostly Tick 1AugustA
Stag beetle mites 2AugustA
The smooth-skinned mite (Tricholoma rhodopolium) 1AugustA
Dwarf mites 1AugustA
Cottony bead mites 1AugustA
Family Scaridae sp. 1AugustA
Xylobates sp. 1AugustA
Podocinum sp. 2AugustA
Order Tinea, spp. 4AugustA
Pheretima sp. 1AugustA
Striped springtail 1AugustA
Blue leafhopper springtail 12AugustA
White springtail 22AugustA
Small long springtail 2AugustA
Pseudachorutes genus 2AugustA
False ants 4AugustA
Ameiro ant 4AugustA
Northern scale ant 3AugustA
Flat-tailed scale ants 3AugustA
Kotsunoari 4AugustA
Family Pentatomidae sp. larva 1AugustA
Two-spotted mites 1AugustA
Centipede 3AugustB
Kirifuri Hitofushi Mukade1AugustB
A species very similar to Watanabe's centipede1AugustB
Eggplant centipede1AugustB
Monotarsobius sp. 3AugustB
Pill bug 1AugustB
Xylobates sp. 1AugustB
Thick-legged mites 10AugustB
Tokyo lily mite 1AugustB
Holaspina sp. 3AugustB
Podocinum sp. 1AugustB
Order Tinea, spp. 5AugustB
Japanese poppy 1AugustB
Order Protaenia, sp. 1AugustB
Striped springtail 1AugustB
Himeforsom springtail 49AugustB
Blue leafhopper springtail 2AugustB
Two-eyed springtail 2AugustB
Yoshii forsom springtail 5AugustB
Small spiny springtail 2AugustB
Sacrum purpureus 1AugustB
Northern scale ant 1AugustB
False ants 1AugustB
Umematsu ant 1AugustB
Kotsunoari 69AugustB
Scale ants 1AugustB
Elateridae sp. 1AugustB
Thick-legged mites 1AugustB
Ardisia japonica scale insect. 3AugustC
Centipede 5AugustC
Monotarsobius sp.4AugustC
Mecistocephalus sp. 2AugustC
Hirozu Gymnocalycium 3AugustC
Variegated woodlouse 6AugustC
Family Tineidae sp. 6AugustC
*Holostaspella* genus sp. 4AugustC
Order Tinea, spp. 6AugustC
Xylobates sp. 8AugustC
Flower mites 1AugustC
Asian giant mite 6AugustC
Aphid mite 1AugustC
Order Protaenia, sp. 8AugustC
Small long springtail 13AugustC
Striped springtail 2AugustC
Blue leafhopper springtail 2AugustC
Sacrum purpureus 2AugustC
Yellow-tailed ant 1AugustC
Kotsunoari 5AugustC
Northern scale ant 2AugustC
Mealybug family sp. 2AugustD
Kotsunoari 1AugustD
Mycosiidae sp. 1AugustD
Fan millipede 3AugustD
genus *Haga millipede* 2AugustD
Hirozu Gymnocalycium 2AugustD
Monotarsobius sp. 1AugustD
Variegated woodlouse 9AugustD
Xylobates sp. 1AugustD
Yamato spider mites 4AugustD
Lasioseius sp. 3AugustD
Cybaeus sp. 2AugustD
Blue leafhopper springtail 19AugustD
Yoshii forsom springtail 2AugustD
Himeforsom springtail 6AugustD
Small springtail 2AugustD
Pseudachorutes sp. 4AugustD
Order Protaenia, sp. 1AugustD
Japanese subterranean termite 19AugustD
Kotsunoari 5AugustD
Elateridae sp. larva 1AugustD
Soil animal community data for a certain region | © 2021-2026 Ecological Information Kenichi Ikeda

How can I perform crowd structure analysis in R?

The above data can be used to perform a crowd structure analysis in R. The complete code is provided below. This example uses R 4.3.2 . Basic information is based on Takada (2018). Please also refer to Doi and Okamura (2011).

required_packages <- c("here", "vegan", "labdsv") Installed packages if they are not already installed installed_packages <- rownames(installed.packages()) for (pkg in required_packages) { if (!pkg %in% installed_packages) { install.packages(pkg) } } Load packages laapply(required_packages, library, character.only = TRUE) Specify the directory getwd() script_directory <- here("./Example of laboratory analysis and community structure analysis of soil animals") setwd(script_directory) Read analysis data soil_data <- read.csv("Soil animal population data.csv", header = TRUE, row.names = 1, fileEncoding = "UTF-8") Calculate biodiversity shannon_diversity <- diversity(soil_data, index = "shannon") shannon_diversity simpson_diversity <- diversity(soil_data, index = "simpson") simpson_diversity renyi(soil_data) renyi_diversity <- exp(renyi(soil_data)) print(renyi_diversity) # Multidimensional scaling (MDS) analysis of metadata soil_data_matrix <- data.matrix(soil_data) # Convert to matrix mds_result <- metaMDS(soil_data_matrix, dist = "bray", autotransform = FALSE) mds_result ordipointlabel(mds_result, display = "sites", cex = 1, col = "black", pch = 16) # Cluster analysis veg_dist <- vegdist(soil_data_matrix, method = "bray") veg_dist hierarchical_clustering <- hclust(veg_dist, method = "ward.D2") hierarchical_clustering plot(hierarchical_clustering, hang = -1) # Evaluating the number of clusters average_silhouette <- numeric(nrow(soil_data)) for (i in 2:(nrow(soil_data) - 1)) { silhouette_vals <- silhouette(cutree(hierarchical_clustering, k = i), veg_dist) average_silhouette[i] <- mean(silhouette_vals[, 3]) } average_silhouette # clustering result optimal_cluster <- cutree(hierarchical_clustering, k = which.max(average_silhouette)) # Silhouette plot silhouette_vals <- silhouette(optimal_cluster, veg_dist) rownames(silhouette_vals) <- row.names(soil_data) plot(silhouette_vals) # Clustering using PAM method and identification of important species pam_result <- pam(veg_dist, k = which.max(average_silhouette), diss = TRUE) ind_val_analysis <- indval(soil_data, pam_result$clustering, numitr = 5000) significant_groups <- ind_val_analysis$maxcls[ind_val_analysis$pval <= 0.05] significant_species <- ind_val_analysis$indcls[ind_val_analysis$pval <= 0.05] p_values <- ind_val_analysis$pval[ind_val_analysis$pval <= 0.05] frequencies <- apply(soil_data > 0, 2, sum)[ind_val_analysis$pval <= 0.05] significant_results <- data.frame(group = significant_groups, indval = significant_species, pvalue = p_values, freq = frequencies) significant_results_summary <- significant_results[order(significant_results$group, -significant_results$indval), ] significant_results_summary

How do I load data files?

First, install the packages we will be using. This step is unnecessary if you have already installed them.

required_packages <- c("here", "labdsv", "vegan") install.packages(required_packages)

install.packages(required_packages)Using the following code instead will skip packages that are already installed.

installed_packages <- rownames(installed.packages()) for (pkg in required_packages) { if (!pkg %in% installed_packages) { install.packages(pkg) } }

While you can load them individually if you're not used to it, the code above allows you to reuse the package names as a vector when loading.

install.packages(here) install.packages(labdsv) install.packages(vegan)

Now, let's load the package.

lapply(required_packages, library, character.only = TRUE)

Similarly, if you're not used to it, you can read them individually.

library(here) library(labdsv) library(vegan)

Next, specify the directory. You can specify the directory using an absolute path, but it's easier to specify the currently used workspace as the working directory.herePackagehere()I am using a function.here()The function allows you to share files on cloud storage such as Google Drive, or to anotherPCIt's convenient because the path remains valid even after migrating to a newer version.

However, it's important to note that the here() function retrieves the workspace path, so depending on your environment (in my case, VS Code), it may not specify the path above the currently open .R file. Therefore, I have to manually enter the path name one level up here. This is something that could be improved.

getwd() script_directory <- here("./Example of laboratory analysis and community structure analysis of soil animals") setwd(script_directory)

and,.csvThe file will be read. One thing to note is....csvThe character encoding of the file isUTFIt must be -8. This specification seems to have been added in a recent version, and I had trouble when trying to run it with older R code. (This might be because IVS (This might be because it's being executed in code.)

On PCs running Windows , creating a .csv file from an .xlsx file using Microsoft Excel or similar software will result in a .csv file with Shift- JIS ( ANSI ) character encoding.

Therefore, when you use "Save As," choose the option that says " CSV UTF -8" instead of just " CSV ." This is a very easy mistake to make.

While it's technically possible to read .csv files using Shift- JIS encoding by setting the argument of read.csv() function to " fileEncoding = "Shift-JIS" ", Shift- JIS is an older standard, and UTF -8 is currently the backward-compatible character encoding. Therefore, using .csv files in their original format is not recommended.

In the following example, we've specified " fileEncoding = "UTF-8" just to be safe.

soil_data <- read.csv("Soil Animal Population Data.csv", header = TRUE, row.names = 1, fileEncoding = "UTF-8")

Data loading is now complete.

Common problems when reading .csv files include the orientation of columns and rows, and the presence or absence of labels.

The .csv file being imported in this example actually has the columns and rows reversed compared to the table above (the "Gender" row has also been removed). If you are using Microsoft Excel , you can easily reverse the rows and columns by right-clicking and selecting "Transpose" from the "Paste Options".

Also, if both the column and row have labels, specify " header = TRUE, row.names = 1 ".

How can I calculate Shannon's diversity index and Simpson's diversity index using R?

First, let's calculate the various biodiversity indices.

To calculate and view the Shannon diversity index, use the following code.

shannon_diversity <- diversity(soil_data, index = "shannon") shannon_diversity

The following values were obtained this time.

A(May)B (May)C(May)D (May)A (August)B (August)C (August)D (August)
3.0814132.7227612.7395022.7013952.8206751.9581882.8295262.496044
Shannon's diversity index obtained from soil animal community data in a certain region | © 2021-2026 Ecological Information Kenichi Ikeda

To calculate and view the Simpson diversity index, use the following code.

simpson_diversity <- diversity(soil_data, index = "simpson") simpson_diversity

The following values were obtained this time.

A(May)B (May)C(May)D (May)A (August)B (August)C (August)D (August)
0.93199030.90616340.88243650.89744330.89550170.74206090.93037040.8756084
Simpson's diversity index obtained from soil animal community data in a certain region | © 2021-2026 Ecological Information Kenichi Ikeda

In all diversity indices, A (May) showed the highest diversity, while B (August) showed the lowest. Generally, one might expect insects to increase in number and population in August, but this was a surprising result.

How do I plot NMDS using R?

Next, let's perform Nonmetric Multidimensional Scaling ( NMDS ), a type of multivariate analysis. NMDS is a technique that places data into a low-dimensional space to reflect the similarity or distance between individual data points within a dataset.

In short, in this case, the closer the plots are in the diagram, the closer the biological community structure is.

To plot non-metric multidimensional scaling in R, use the following code:

soil_data_matrix <- data.matrix(soil_data) # Convert to matrix mds_result <- metaMDS(soil_data_matrix, dist = "bray", autotransform = FALSE) mds_result ordipointlabel(mds_result, display = "sites", cex = 1, col = "black", pch = 16)

As a result, the following figure was obtained.

Results of Non-metric Multidimensional Scaling (NMDS) analysis obtained from soil animal community data in a certain region
Results of Non-metric Multidimensional Scaling (NMDS) analysis obtained from soil animal community data in a certain region | © 2021-2026 Ecological Information Kenichi Ikeda

The NMDS suggests seasonal influences on soil animal communities. While soil animals are generally considered less susceptible to seasonal changes, the increase in insects during the summer months likely played a role in this study. Site-specific influences are also evident, with site D showing significantly different results. Although there is no environmental information available for site D in this data, it is possible that the environment there undergoes significant seasonal changes. If quantitative environmental data were available, it would be possible to plot a more comprehensive evaluation of those influences.

How do I perform cluster analysis using R?

Next is cluster analysis. Cluster analysis is a statistical method that groups individual data points in a dataset based on similar characteristics or attributes, allowing us to understand the patterns and structures within the data. In this case, we will group the points in order of their closeness in terms of crowd structure. Cluster analysis and plotting can be performed using the following code.

This analysis uses the Bray-Curtis dissimilarity index.

veg_dist <- vegdist(soil_data_matrix, method = "bray") veg_dist hierarchical_clustering <- hclust(veg_dist, method = "ward.D2") hierarchical_clustering plot(hierarchical_clustering, hang = -1)

As a result, the following figure was obtained.

Results of cluster analysis obtained from soil animal community data in a certain region
Results of cluster analysis obtained from soil animal community data in a certain region | © 2021-2026 Ecological Information Kenichi Ikeda

Silhouette analysis is performed using the following code. This method evaluates the results of clustering and measures how densely each data point is within its cluster. Silhouette analysis is one method for evaluating the validity of clustering and is useful in determining the number and shape of clusters.

In short, it's a handy tool that helps you avoid arbitrary group divisions and find meaningful criteria for dividing groups.

silhouette_vals <- silhouette(optimal_cluster, veg_dist) rownames(silhouette_vals) <- row.names(soil_data) plot(silhouette_vals)

As a result, the following diagram was obtained, and it was determined that dividing the data into two clusters (Cluster A (August) and Cluster D (August) and the other clusters) was optimal.

Results of silhouette analysis obtained from soil animal community data in a certain region
Silhouette analysis results obtained from soil animal community data in a certain region | © 2021-2026 Ecological Information Kenichi Ikeda

cutree() function allows you to determine which group each location belongs to.

A(May)B (May)C(May)D (May)A (August)B (August)C (August)D (August)
11112112
Group affiliations at each location obtained from soil animal community data in a certain region | © 2021-2026 Ecological Information Kenichi Ikeda

In this particular cluster analysis example, there is very little information available about the ecology of each soil animal species, and environmental information such as the vegetation at the site is also lacking, making it difficult to draw conclusions. However, if this information is also included, a deeper analysis will be possible.

How can I perform indicator type analysis using R?

Finally, we perform an indicator species analysis based on cluster analysis. This can be done with the following code. Using the PAM method, we determine the optimal number of clusters, cluster the data, and identify important species for each cluster based on the clustering results. Then, we collect information on the identified important species (cluster, species, p-value, frequency of occurrence), summarize the important species for each cluster, and summarize the results.

pam_result <- pam(veg_dist, k = which.max(average_silhouette), diss = TRUE) ind_val_analysis <- indval(soil_data, pam_result$clustering, numitr = 5000) significant_groups <- ind_val_analysis$maxcls[ind_val_analysis$pval <= 0.05] significant_species <- ind_val_analysis$indcls[ind_val_analysis$pval <= 0.05] p_values <- ind_val_analysis$pval[ind_val_analysis$pval <= 0.05] frequencies <- apply(soil_data > 0, 2, sum)[ind_val_analysis$pval <= 0.05] significant_results <- data.frame(group = significant_groups, indval = significant_species, pvalue = p_values, freq = frequencies) significant_results_summary <- significant_results[order(significant_results$group, -significant_results$indval), ] significant_results_summary

Running this code will produce the following table.

groupindvalpvaluefreq
Sacrum purpureus110.03786
Pseudachorutes sp.210.03742
Results of indicator species analysis obtained from soil animal community data in a certain region | © 2021-2026 Ecological Information Kenichi Ikeda

This revealed that clusters A (August) and D (August) are indicated by the genus Pseudachorutes , while the other clusters are indicated by Pseudachorutes. Unfortunately, there is a lack of ecological information on these species as well, but a deeper analysis would be possible if we were to perform the analysis on species whose ecology is well known.

These analyses are possible for various biological communities, including insects, vertebrates, plants, aquatic organisms, and bacteria.

References

Doi, Hideyuki & Okamura, Hiroshi. (2011). Similarity and its applications for biological community analysis: Calculation, graphing, and testing of similarity using R. Journal of the Ecological Society of Japan , 61 (1), 3-20. https://doi.org/10.18960/seitai.61.1_3

Takada, Yoshitake. (2018). Analysis of community composition using R. Fisheries Research and Education Agency, Japan Sea Fisheries Research Institute . https://jsnfri.fra.affrc.go.jp/gunshu/index.html

Copied title and URL