Document complémentaire au module 7 du cours SDD II de 2025-2026. Distribué sous licence CC BY-NC-SA 4.0.
Veuillez vous référer au cours en ligne pour les explications et les interprétations de cette analyse.
Installer un environnement R adéquat pour reproduire cette analyse.
# Chargement du dialecte SciViews::R avec le module d'exploration des données
SciViews::R("explore", lang = "fr")
# Lecture du jeu de données depuis le package mlbench
pima <- read("PimaIndiansDiabetes2", package = "mlbench")
pima## # A data.trame: [768 × 9]
## pregnant glucose pressure triceps insulin mass pedigree age diabetes
## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <fct>
## 1 6 148 72 35 NA 33.6 0.627 50 pos
## 2 1 85 66 29 NA 26.6 0.351 31 neg
## 3 8 183 64 NA NA 23.3 0.672 32 pos
## 4 1 89 66 23 94 28.1 0.167 21 neg
## 5 0 137 40 35 168 43.1 2.29 33 pos
## 6 5 116 74 NA NA 25.6 0.201 30 neg
## 7 3 78 50 32 88 31 0.248 26 pos
## 8 10 115 NA NA NA 35.3 0.134 29 neg
## 9 2 197 70 45 543 30.5 0.158 53 pos
## 10 8 125 96 NA NA NA 0.232 54 pos
## # ℹ 758 more rows
## # A data.trame: [392 × 9]
## pregnant glucose pressure triceps insulin mass pedigree age diabetes
## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <fct>
## 1 1 89 66 23 94 28.1 0.167 21 neg
## 2 0 137 40 35 168 43.1 2.29 33 pos
## 3 3 78 50 32 88 31 0.248 26 pos
## 4 2 197 70 45 543 30.5 0.158 53 pos
## 5 1 189 60 23 846 30.1 0.398 59 pos
## 6 5 166 72 19 175 25.8 0.587 51 pos
## 7 0 118 84 47 230 45.8 0.551 31 pos
## 8 1 103 30 38 83 43.3 0.183 33 neg
## 9 1 115 70 30 96 34.6 0.529 32 pos
## 10 3 126 88 41 235 39.3 0.704 27 neg
## # ℹ 382 more rows
| Name | pima |
| Number of rows | 392 |
| Number of columns | 9 |
| _______________________ | |
| Column type frequency: | |
| factor | 1 |
| numeric | 8 |
| ________________________ | |
| Group variables | None |
Variable type: factor
| skim_variable | n_missing | complete_rate | ordered | n_unique | top_counts |
|---|---|---|---|---|---|
| diabetes | 0 | 1 | FALSE | 2 | neg: 262, pos: 130 |
Variable type: numeric
| skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
|---|---|---|---|---|---|---|---|---|---|---|
| pregnant | 0 | 1 | 3.30 | 3.21 | 0.00 | 1.00 | 2.00 | 5.00 | 17.00 | ▇▂▂▁▁ |
| glucose | 0 | 1 | 122.63 | 30.86 | 56.00 | 99.00 | 119.00 | 143.00 | 198.00 | ▂▇▇▃▂ |
| pressure | 0 | 1 | 70.66 | 12.50 | 24.00 | 62.00 | 70.00 | 78.00 | 110.00 | ▁▂▇▆▁ |
| triceps | 0 | 1 | 29.15 | 10.52 | 7.00 | 21.00 | 29.00 | 37.00 | 63.00 | ▅▇▇▃▁ |
| insulin | 0 | 1 | 156.06 | 118.84 | 14.00 | 76.75 | 125.50 | 190.00 | 846.00 | ▇▂▁▁▁ |
| mass | 0 | 1 | 33.09 | 7.03 | 18.20 | 28.40 | 33.20 | 37.10 | 67.10 | ▃▇▃▁▁ |
| pedigree | 0 | 1 | 0.52 | 0.35 | 0.09 | 0.27 | 0.45 | 0.69 | 2.42 | ▇▃▁▁▁ |
| age | 0 | 1 | 30.86 | 10.20 | 21.00 | 23.00 | 27.00 | 36.00 | 81.00 | ▇▂▁▁▁ |
# Calcul de la matrice de corrélation, tableau formaté et graphique
pima_cor <- correlation(pima[, 2:8])
tabularise(pima_cor, digits = 2)Matrice de coefficients de corrélation de Pearson r | |||||||
|---|---|---|---|---|---|---|---|
| glucose | pressure | triceps | insulin | mass | pedigree | age |
glucose | 1.000 | 0.2100 | 0.199 | 0.5812 | 0.2095 | 0.140 | 0.3436 |
pressure | 0.210 | 1.0000 | 0.233 | 0.0985 | 0.3044 | -0.016 | 0.3000 |
triceps | 0.199 | 0.2326 | 1.000 | 0.1822 | 0.6644 | 0.160 | 0.1678 |
insulin | 0.581 | 0.0985 | 0.182 | 1.0000 | 0.2264 | 0.136 | 0.2171 |
mass | 0.210 | 0.3044 | 0.664 | 0.2264 | 1.0000 | 0.159 | 0.0698 |
pedigree | 0.140 | -0.0160 | 0.160 | 0.1359 | 0.1588 | 1.000 | 0.0850 |
age | 0.344 | 0.3000 | 0.168 | 0.2171 | 0.0698 | 0.085 | 1.0000 |
# ACP sur pima avec standardisation des données
pima_pca <- pca(data = pima, ~ glucose + pressure + triceps + insulin + mass +
pedigree + age, scale = TRUE)
# Idem, mais avec élimination des colonnes non utilisées auparavant
pima %>.%
sselect(., glucose:age) %>.%
pca(., scale = TRUE) %->%
pima_pca
# Résumé de notre ACP
summary(pima_pca)## Importance of components (eigenvalues):
## PC1 PC2 PC3 PC4 PC5 PC6 PC7
## Variance 2.412 1.288 1.074 0.878 0.6389 0.399 0.3098
## Proportion of Variance 0.345 0.184 0.153 0.126 0.0913 0.057 0.0443
## Cumulative Proportion 0.345 0.529 0.682 0.807 0.8988 0.956 1.0000
##
## Loadings (eigenvectors, rotation matrix):
## PC1 PC2 PC3 PC4 PC5 PC6 PC7
## glucose 0.441 0.455 -0.198 -0.736
## pressure 0.329 -0.101 -0.613 0.206 -0.654 0.171
## triceps 0.439 -0.488 0.367 0.644
## insulin 0.402 0.418 0.263 -0.388 -0.123 0.642 0.129
## mass 0.446 -0.506 -0.181 -0.711
## pedigree 0.198 0.625 0.711 -0.251
## age 0.325 0.337 -0.384 0.471 0.592 0.168 -0.179
# Graphique de l'espace des variables pour les deux premiers axes de l'ACP
chart$loadings(pima_pca, choices = c(1, 2))# Graphique de l'espace des variables pour les axes PC1 et PC3
chart$loadings(pima_pca, choices = c(1, 3))# Graphique de l'espace des variables pour les axes PC2 et PC3
chart$loadings(pima_pca, choices = c(2, 3))# Graphique de l'espace des individus pour les deux premeirs axes de l'ACP
chart$scores(pima_pca, choices = c(1, 2), aspect.ratio = 3/5)# Idem, mais avec labels et ellipses
chart$scores(pima_pca, choices = c(1, 2),
labels = pima$diabetes) +
stat_ellipse()# Graphique de l'espace des individus pour PC1 et PC3 + labels et ellipses
chart$scores(pima_pca, choices = c(1, 3),
labels = pima$diabetes) +
stat_ellipse()# Graphique de l'espace des individus pour PC2 et PC3 + labels et ellipses
chart$scores(pima_pca, choices = c(2, 3),
labels = pima$diabetes) +
stat_ellipse()# Récupération des données depuis le package data.io
urchin <- read("urchin_bio", package = "data.io", lang = "FR")
urchin## # A data.trame: [421 × 19]
## origin diameter1 diameter2 height buoyant_weight weight solid_parts integuments dry_integuments digestive_tract
## <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Pêcherie 9.9 10.2 5 NA 0.522 0.478 0.366 NA 0.0525
## 2 Pêcherie 10.5 10.6 5.7 NA 0.642 0.589 0.445 NA 0.0482
## 3 Pêcherie 10.8 10.8 5.2 NA 0.734 0.677 0.533 NA 0.0758
## 4 Pêcherie 9.6 9.3 4.6 NA 0.370 0.344 0.266 NA 0.0442
## 5 Pêcherie 10.4 10.7 4.8 NA 0.610 0.559 0.406 NA 0.0743
## 6 Pêcherie 10.5 11.1 5 NA 0.610 0.551 0.427 NA 0.0492
## 7 Pêcherie 11 11 5.2 NA 0.672 0.605 0.452 NA 0.0734
## 8 Pêcherie 11.1 11.2 5.7 NA 0.703 0.628 0.484 NA 0.06
## 9 Pêcherie 9.4 9.2 4.6 NA 0.413 0.375 0.277 NA 0.0345
## 10 Pêcherie 10.1 9.5 4.7 NA 0.449 0.398 0.212 NA 0.0441
## # ℹ 411 more rows
## # ℹ 9 more variables: dry_digestive_tract <dbl>, gonads <dbl>, dry_gonads <dbl>, skeleton <dbl>, lantern <dbl>, test <dbl>,
## # spines <dbl>, maturity <int>, sex <fct>
# Elimination des variables inutilisées et des données manquantes
urchin %>.%
sselect(., -(skeleton:spines), -sex) %>.%
sdrop_na(.) ->
urchin2
urchin2## # A data.trame: [319 × 14]
## origin diameter1 diameter2 height buoyant_weight weight solid_parts integuments dry_integuments digestive_tract
## <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Pêcherie 16.7 16.8 8.4 0.588 2.58 2.04 1.77 1.06 0.0644
## 2 Pêcherie 19.9 20 9.2 1.10 4.26 3.66 3.11 1.95 0.217
## 3 Pêcherie 19.9 19.2 8.5 0.629 2.93 2.43 2.13 1.17 0.115
## 4 Pêcherie 19.3 19.8 10.2 0.781 3.71 3.09 2.52 1.45 0.179
## 5 Pêcherie 18.8 20 9.3 0.761 3.59 2.99 2.4 1.41 0.282
## 6 Pêcherie 21.5 20.9 9.6 1.13 4.98 4.42 3.47 2.04 0.466
## 7 Pêcherie 17.4 16.5 7.8 0.477 2.33 1.97 1.57 0.89 0.0785
## 8 Pêcherie 21 21.2 10.8 1.23 5.4 4.55 3.81 2.21 0.358
## 9 Pêcherie 17.8 18.8 8.6 0.548 2.58 2.07 1.71 1.05 0.120
## 10 Pêcherie 19.7 19.6 9.7 0.862 3.59 3.08 2.51 1.59 0.154
## # ℹ 309 more rows
## # ℹ 4 more variables: dry_digestive_tract <dbl>, gonads <dbl>, dry_gonads <dbl>, maturity <int>
| Name | urchin2 |
| Number of rows | 319 |
| Number of columns | 14 |
| _______________________ | |
| Column type frequency: | |
| factor | 1 |
| numeric | 13 |
| ________________________ | |
| Group variables | None |
Variable type: factor
| skim_variable | n_missing | complete_rate | ordered | n_unique | top_counts |
|---|---|---|---|---|---|
| origin | 0 | 1 | FALSE | 2 | Cul: 188, Pêc: 131 |
Variable type: numeric
| skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
|---|---|---|---|---|---|---|---|---|---|---|
| diameter1 | 0 | 1 | 32.78 | 11.71 | 14.60 | 23.25 | 31.50 | 39.65 | 65.60 | ▇▇▅▃▁ |
| diameter2 | 0 | 1 | 32.71 | 11.67 | 15.00 | 23.45 | 31.60 | 39.60 | 65.60 | ▇▇▆▃▁ |
| height | 0 | 1 | 16.78 | 6.25 | 7.30 | 11.10 | 16.20 | 21.50 | 32.20 | ▇▆▆▅▂ |
| buoyant_weight | 0 | 1 | 4.27 | 3.84 | 0.31 | 1.35 | 3.18 | 5.67 | 17.73 | ▇▃▁▁▁ |
| weight | 0 | 1 | 21.80 | 21.37 | 1.61 | 6.08 | 15.25 | 28.14 | 100.51 | ▇▂▁▁▁ |
| solid_parts | 0 | 1 | 16.52 | 15.27 | 1.46 | 4.96 | 11.73 | 21.69 | 73.14 | ▇▃▁▁▁ |
| integuments | 0 | 1 | 12.32 | 10.64 | 1.09 | 4.00 | 9.40 | 16.08 | 47.22 | ▇▃▂▁▁ |
| dry_integuments | 0 | 1 | 7.16 | 6.30 | 0.58 | 2.22 | 5.42 | 9.42 | 28.80 | ▇▃▂▁▁ |
| digestive_tract | 0 | 1 | 1.90 | 2.03 | 0.03 | 0.45 | 1.21 | 2.54 | 10.37 | ▇▂▁▁▁ |
| dry_digestive_tract | 0 | 1 | 0.23 | 0.21 | 0.01 | 0.07 | 0.17 | 0.31 | 1.02 | ▇▃▂▁▁ |
| gonads | 0 | 1 | 1.72 | 2.65 | 0.00 | 0.10 | 0.63 | 2.20 | 15.93 | ▇▁▁▁▁ |
| dry_gonads | 0 | 1 | 0.51 | 0.82 | 0.00 | 0.03 | 0.17 | 0.64 | 5.00 | ▇▁▁▁▁ |
| maturity | 0 | 1 | 0.37 | 0.71 | 0.00 | 0.00 | 0.00 | 0.00 | 2.00 | ▇▁▁▁▂ |
# Matrice de corrélation, tableau formaté et graphique
urchin2_cor <- correlation(urchin2[, 2:13])
tabularise(urchin2_cor)Matrice de coefficients de corrélation de Pearson r | ||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| diameter1 | diameter2 | height | buoyant_weight | weight | solid_parts | integuments | dry_integuments | digestive_tract | dry_digestive_tract | gonads | dry_gonads |
diameter1 | 1.000 | 0.998 | 0.976 | 0.952 | 0.956 | 0.956 | 0.968 | 0.956 | 0.910 | 0.932 | 0.798 | 0.788 |
diameter2 | 0.998 | 1.000 | 0.974 | 0.952 | 0.957 | 0.957 | 0.969 | 0.956 | 0.912 | 0.933 | 0.798 | 0.788 |
height | 0.976 | 0.974 | 1.000 | 0.928 | 0.924 | 0.926 | 0.944 | 0.934 | 0.875 | 0.909 | 0.759 | 0.747 |
buoyant_weight | 0.952 | 0.952 | 0.928 | 1.000 | 0.986 | 0.992 | 0.994 | 0.999 | 0.921 | 0.939 | 0.882 | 0.868 |
weight | 0.956 | 0.957 | 0.924 | 0.986 | 1.000 | 0.994 | 0.991 | 0.986 | 0.952 | 0.955 | 0.882 | 0.875 |
solid_parts | 0.956 | 0.957 | 0.926 | 0.992 | 0.994 | 1.000 | 0.994 | 0.991 | 0.947 | 0.955 | 0.907 | 0.897 |
integuments | 0.968 | 0.969 | 0.944 | 0.994 | 0.991 | 0.994 | 1.000 | 0.996 | 0.934 | 0.946 | 0.867 | 0.854 |
dry_integuments | 0.956 | 0.956 | 0.934 | 0.999 | 0.986 | 0.991 | 0.996 | 1.000 | 0.920 | 0.938 | 0.874 | 0.860 |
digestive_tract | 0.910 | 0.912 | 0.875 | 0.921 | 0.952 | 0.947 | 0.934 | 0.920 | 1.000 | 0.980 | 0.806 | 0.807 |
dry_digestive_tract | 0.932 | 0.933 | 0.909 | 0.939 | 0.955 | 0.955 | 0.946 | 0.938 | 0.980 | 1.000 | 0.816 | 0.818 |
gonads | 0.798 | 0.798 | 0.759 | 0.882 | 0.882 | 0.907 | 0.867 | 0.874 | 0.806 | 0.816 | 1.000 | 0.991 |
dry_gonads | 0.788 | 0.788 | 0.747 | 0.868 | 0.875 | 0.897 | 0.854 | 0.860 | 0.807 | 0.818 | 0.991 | 1.000 |
# Graphique de la masse totle en fonction du diamètre 1
chart(data = urchin2, weight ~ diameter1) +
geom_point()# Graphique de la masse totle en fonction du diamètre 1, avec double log
chart(data = urchin2, log(weight) ~ log(diameter1)) +
geom_point()# ACP sur données transformées log(x + 1) et standardisation
urchin2 %>.%
sselect(., -origin, -maturity) %>.% # Élimine les variables non quantitatives
log1p(.) %>.% # Transforme toutes les autres en log(x + 1)
pca(data = ., ~., scale = TRUE) -> # Effectue l'ACP après standardisation
urchin2_pca
# Résumé de l'ACP
summary(urchin2_pca)## Importance of components (eigenvalues):
## PC1 PC2 PC3 PC4 PC5 PC6 PC7 PC8 PC9 PC10 PC11 PC12
## Variance 11.219 0.5010 0.1813 0.03862 0.02601 0.01657 0.00931 0.00336 0.00210 0.00108 0.00082 0.00034
## Proportion of Variance 0.935 0.0418 0.0151 0.00322 0.00217 0.00138 0.00078 0.00028 0.00017 0.00009 0.00007 0.00003
## Cumulative Proportion 0.935 0.9767 0.9918 0.99503 0.99720 0.99858 0.99936 0.99964 0.99981 0.99990 0.99997 1.00000
##
## Loadings (eigenvectors, rotation matrix):
## PC1 PC2 PC3 PC4 PC5 PC6 PC7 PC8 PC9 PC10 PC11 PC12
## diameter1 0.295 -0.162 -0.441 0.237 0.174 -0.177 0.242 -0.706
## diameter2 0.295 -0.166 -0.449 0.249 0.178 -0.157 0.124 0.688 -0.263
## height 0.291 -0.218 0.154 -0.106 -0.902
## buoyant_weight 0.296 0.120 0.509 0.124 0.530 0.265 0.504
## weight 0.296 -0.149 -0.145 0.116 0.912
## solid_parts 0.297 -0.106 0.127 -0.234 -0.594 -0.216 0.638
## integuments 0.296 -0.159 0.157 0.153 0.125 -0.396 -0.148 -0.702 0.371
## dry_integuments 0.296 -0.115 0.160 0.455 0.114 0.105 -0.128 -0.133 -0.774
## digestive_tract 0.288 -0.571 -0.187 -0.485 0.519 0.201
## dry_digestive_tract 0.283 -0.702 0.217 0.278 -0.513 -0.161
## gonads 0.271 0.568 0.226 -0.575 -0.430 0.143
## dry_gonads 0.259 0.697 -0.104 0.465 0.453 -0.111
# Graphique de l'espace des variables de l'ACP pour les deux premiers axes (par défaut)
chart$loadings(urchin2_pca)# Nouvelle ACP sur données divisées par la masse immergée
urchin2 %>.%
sselect(., -origin, -maturity, -buoyant_weight) %>.% # Élimination des variables inutiles
(. / urchin2$buoyant_weight) %>.% # Division par buoyant_weight
log1p(.) -> # Transformation log(x + 1)
urchin3
head(urchin3)## # A data.trame: [6 × 11]
## diameter1 diameter2 height weight solid_parts integuments dry_integuments digestive_tract dry_digestive_tract gonads
## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 3.38 3.39 2.73 1.68 1.50 1.39 1.03 0.104 0.0267 0.00914
## 2 2.95 2.96 2.24 1.59 1.47 1.35 1.02 0.181 0.0437 0.0402
## 3 3.49 3.45 2.68 1.73 1.58 1.48 1.05 0.168 0.0361 0
## 4 3.25 3.27 2.64 1.75 1.60 1.44 1.05 0.206 0.0476 0.0232
## 5 3.25 3.31 2.58 1.74 1.60 1.42 1.05 0.315 0.0661 0.0289
## 6 3.00 2.97 2.25 1.69 1.59 1.41 1.03 0.346 0.0554 0.0166
## # ℹ 1 more variable: dry_gonads <dbl>
# ACP avec standardisation
urchin3_pca <- pca(data = urchin3, ~., scale = TRUE)
# Résumé de l'ACP
summary(urchin3_pca)## Importance of components (eigenvalues):
## PC1 PC2 PC3 PC4 PC5 PC6 PC7 PC8 PC9 PC10 PC11
## Variance 4.687 3.353 1.273 0.9666 0.3668 0.1724 0.10547 0.04761 0.01943 0.00834 0.00068
## Proportion of Variance 0.426 0.305 0.116 0.0879 0.0333 0.0157 0.00959 0.00433 0.00177 0.00076 0.00006
## Cumulative Proportion 0.426 0.731 0.847 0.9345 0.9678 0.9835 0.99308 0.99741 0.99918 0.99994 1.00000
##
## Loadings (eigenvectors, rotation matrix):
## PC1 PC2 PC3 PC4 PC5 PC6 PC7 PC8 PC9 PC10 PC11
## diameter1 -0.425 0.145 0.297 0.101 0.101 -0.400 -0.714
## diameter2 -0.425 -0.101 0.143 0.296 0.103 -0.425 0.700
## height -0.427 0.131 0.300 0.141 -0.197 0.790
## weight 0.189 -0.428 -0.216 0.497 0.672 0.145 -0.109
## solid_parts -0.495 0.254 0.117 -0.335 0.245 -0.662 0.223
## integuments -0.142 -0.463 0.152 -0.283 0.165 -0.345 -0.667 0.266
## dry_integuments -0.259 -0.242 0.173 -0.533 -0.669 0.161 0.277
## digestive_tract 0.214 -0.370 -0.448 0.102 -0.388 0.462 0.468 -0.148
## dry_digestive_tract -0.360 -0.485 0.401 -0.429 0.338 -0.382 -0.155
## gonads 0.374 0.438 0.257 -0.223 -0.725 -0.130
## dry_gonads 0.371 0.440 0.269 -0.162 0.122 0.446 0.589
# Graphique de l'espace des variables de l'ACP pour les deux premiers axes
chart$loadings(urchin3_pca, choices = c(1, 2))# Graphique de l'espace des variables de l'ACP pour les PC2 et PC3
chart$loadings(urchin3_pca, choices = c(2, 3))# Graphique de l'espace des individus pour PC1 et PC2 avec couleur et ellipses
chart$scores(urchin3_pca, choices = c(1, 2),
col = urchin2$origin, labels = urchin2$maturity, aspect.ratio = 3/5) +
theme(legend.position = "right") +
stat_ellipse()# Graphique de l'espace des individus pour PC2 et PC3 avec couleur et ellipses
chart$scores(urchin3_pca, choices = c(2, 3),
col = urchin2$origin, labels = urchin2$maturity, aspect.ratio = 3/5) +
theme(legend.position = "right") +
stat_ellipse()# Visu 2D : graphique de glucose en fonction d'insuline, couleur par diabète
chart(data = pima, glucose ~ insulin %col=% diabetes) +
geom_point()# Préparation du Quarto/R Markdown pour widget RGL
options(rgl.printRglwidget = TRUE)
library(rgl)
# Need this too now? library(rglwidget)
knitr::knit_hooks$set(webgl = hook_webgl)
# Visu 3D : nuage de points en 3 dimensions avec RGL
rgl::plot3d(pima$insulin, pima$glucose, pima$triceps,
col = as.integer(pima$diabetes))
# Lecture des données depuis le package ca
wg <- read("wg93", package = "ca")
# Tableau présentant les premières et dernières données de wg
tabularise$headtail(wg)A | B | C | D | sex | age | edu |
|---|---|---|---|---|---|---|
2 | 3 | 4 | 3 | 2 | 2 | 3 |
3 | 4 | 2 | 3 | 1 | 3 | 4 |
2 | 3 | 2 | 4 | 2 | 3 | 2 |
2 | 2 | 2 | 2 | 1 | 2 | 3 |
3 | 3 | 3 | 3 | 1 | 5 | 2 |
... | ... | ... | ... | ... | ... | ... |
1 | 3 | 4 | 5 | 1 | 2 | 2 |
3 | 4 | 2 | 4 | 1 | 2 | 3 |
1 | 2 | 2 | 2 | 2 | 3 | 3 |
3 | 4 | 2 | 3 | 1 | 2 | 2 |
1 | 2 | 2 | 2 | 2 | 3 | 6 |
Premières et dernières 5 lignes d'un total de 871 | ||||||
# Recodage des niveaux des variables de wg
wg %>.%
smutate(.,
A = recode(A, `1` = "++", `2` = "+", `3` = "0", `4` = "-", `5` = "--"),
B = recode(B, `1` = "++", `2` = "+", `3` = "0", `4` = "-", `5` = "--"),
C = recode(C, `1` = "++", `2` = "+", `3` = "0", `4` = "-", `5` = "--"),
D = recode(D, `1` = "++", `2` = "+", `3` = "0", `4` = "-", `5` = "--"),
sex = recode(sex, `1` = "H", `2` = "F"),
age = recode(age, `1` = "18-24", `2` = "25-34", `3` = "35-44",
`4` = "45-54", `5` = "55-64", `6` = "65+"),
edu = recode(edu, `1` = "primaire", `2` = "sec. part", `3` = "secondaire",
`4` = "univ. part", `5` = "univ. cycle 1", `6` = "univ. cycle 2")
) -> wg
# Tableau présentant les premières et dernières données de wg réencodé
tabularise$headtail(wg)A | B | C | D | sex | age | edu |
|---|---|---|---|---|---|---|
+ | 0 | - | 0 | F | 25-34 | secondaire |
0 | - | + | 0 | H | 35-44 | univ. part |
+ | 0 | + | - | F | 35-44 | sec. part |
+ | + | + | + | H | 25-34 | secondaire |
0 | 0 | 0 | 0 | H | 55-64 | sec. part |
... | ... | ... | ... | ... | ... | ... |
++ | 0 | - | -- | H | 25-34 | sec. part |
0 | - | + | - | H | 25-34 | secondaire |
++ | + | + | + | F | 35-44 | secondaire |
0 | - | + | 0 | H | 25-34 | sec. part |
++ | + | + | + | F | 35-44 | univ. cycle 2 |
Premières et dernières 5 lignes d'un total de 871 | ||||||
## edu
## B primaire sec. part secondaire univ. part univ. cycle 1 univ. cycle 2
## ++ 6 34 19 6 4 2
## + 10 93 47 12 5 7
## 0 11 95 55 18 11 15
## - 7 112 82 37 16 27
## -- 4 44 39 21 13 19
##
## Pearson's Chi-squared test
##
## data: wg$B and wg$edu
## X-squared = 42.764, df = 20, p-value = 0.002196
# AFC question B versus edu(cation) du jeu de données wg
wg_b_edu <- ca(data = wg, ~ B + edu)
wg_b_edu##
## Principal inertias (eigenvalues):
## 1 2 3 4
## Value 0.043989 0.004191 0.000914 4e-06
## Percentage 89.59% 8.54% 1.86% 0.01%
##
##
## Rows:
## ++ + 0 - --
## Mass 0.081515 0.199770 0.235362 0.322618 0.160735
## ChiDist 0.286891 0.274685 0.095627 0.141176 0.341388
## Inertia 0.006709 0.015073 0.002152 0.006430 0.018733
## Dim. 1 1.140162 1.293374 0.406059 -0.591115 -1.593838
## Dim. 2 2.211930 -0.641674 0.409771 -0.976000 1.034695
##
##
## Columns:
## primaire sec. part secondaire univ. part univ. cycle 1 univ. cycle 2
## Mass 0.043628 0.433984 0.277842 0.107922 0.056257 0.080367
## ChiDist 0.427367 0.164446 0.036882 0.279471 0.341163 0.417941
## Inertia 0.007968 0.011736 0.000378 0.008429 0.006548 0.014038
## Dim. 1 1.722505 0.773101 -0.115183 -1.302120 -1.428782 -1.962905
## Dim. 2 3.523982 -0.381160 -0.336612 -0.235984 2.516238 -0.135512
##
## Principal inertias (eigenvalues):
##
## dim value % cum% scree plot
## 1 0.043989 89.6 89.6 **********************
## 2 0.004191 8.5 98.1 **
## 3 0.000914 1.9 100.0
## 4 4e-06000 0.0 100.0
## -------- -----
## Total: 0.049097 100.0
# Graphique biplot de l'AFC : question B en turquoise, edu(cation) en rouge
chart$biplot(wg_b_edu, choices = c(1, 2))# Lecture des données depuis le package vegan (mite en anglais = acarien)
mite <- read("mite", package = "vegan")
# Description générale des données
skimr::skim(mite)| Name | mite |
| Number of rows | 70 |
| Number of columns | 35 |
| _______________________ | |
| Column type frequency: | |
| numeric | 35 |
| ________________________ | |
| Group variables | None |
Variable type: numeric
| skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
|---|---|---|---|---|---|---|---|---|---|---|
| Brachy | 0 | 1 | 8.73 | 10.08 | 0 | 3.00 | 4.5 | 11.75 | 42 | ▇▁▂▁▁ |
| PHTH | 0 | 1 | 1.27 | 2.17 | 0 | 0.00 | 0.0 | 2.00 | 8 | ▇▂▁▁▁ |
| HPAV | 0 | 1 | 8.51 | 7.56 | 0 | 4.00 | 6.5 | 12.00 | 37 | ▇▅▂▁▁ |
| RARD | 0 | 1 | 1.21 | 2.78 | 0 | 0.00 | 0.0 | 1.00 | 13 | ▇▁▁▁▁ |
| SSTR | 0 | 1 | 0.31 | 0.97 | 0 | 0.00 | 0.0 | 0.00 | 6 | ▇▁▁▁▁ |
| Protopl | 0 | 1 | 0.37 | 1.61 | 0 | 0.00 | 0.0 | 0.00 | 13 | ▇▁▁▁▁ |
| MEGR | 0 | 1 | 2.19 | 3.62 | 0 | 0.00 | 1.0 | 3.00 | 17 | ▇▁▁▁▁ |
| MPRO | 0 | 1 | 0.16 | 0.47 | 0 | 0.00 | 0.0 | 0.00 | 2 | ▇▁▁▁▁ |
| TVIE | 0 | 1 | 0.83 | 1.47 | 0 | 0.00 | 0.0 | 1.00 | 7 | ▇▁▁▁▁ |
| HMIN | 0 | 1 | 4.91 | 8.47 | 0 | 0.00 | 0.0 | 4.75 | 36 | ▇▁▁▁▁ |
| HMIN2 | 0 | 1 | 1.96 | 3.92 | 0 | 0.00 | 0.0 | 2.75 | 20 | ▇▁▁▁▁ |
| NPRA | 0 | 1 | 1.89 | 2.37 | 0 | 0.00 | 1.0 | 2.75 | 10 | ▇▂▁▁▁ |
| TVEL | 0 | 1 | 9.06 | 10.93 | 0 | 0.00 | 3.0 | 19.00 | 42 | ▇▂▃▁▁ |
| ONOV | 0 | 1 | 17.27 | 18.05 | 0 | 5.00 | 10.5 | 24.25 | 73 | ▇▂▁▁▁ |
| SUCT | 0 | 1 | 16.96 | 13.89 | 0 | 7.25 | 13.5 | 24.00 | 63 | ▇▅▂▁▁ |
| LCIL | 0 | 1 | 35.26 | 88.85 | 0 | 1.25 | 13.0 | 44.00 | 723 | ▇▁▁▁▁ |
| Oribatl1 | 0 | 1 | 1.89 | 3.43 | 0 | 0.00 | 0.0 | 2.75 | 17 | ▇▁▁▁▁ |
| Ceratoz1 | 0 | 1 | 1.29 | 1.46 | 0 | 0.00 | 1.0 | 2.00 | 5 | ▇▂▁▁▁ |
| PWIL | 0 | 1 | 1.09 | 1.71 | 0 | 0.00 | 0.0 | 1.00 | 8 | ▇▂▁▁▁ |
| Galumna1 | 0 | 1 | 0.96 | 1.73 | 0 | 0.00 | 0.0 | 1.00 | 8 | ▇▁▁▁▁ |
| Stgncrs2 | 0 | 1 | 0.73 | 1.83 | 0 | 0.00 | 0.0 | 0.00 | 9 | ▇▁▁▁▁ |
| HRUF | 0 | 1 | 0.23 | 0.62 | 0 | 0.00 | 0.0 | 0.00 | 3 | ▇▁▁▁▁ |
| Trhypch1 | 0 | 1 | 2.61 | 6.14 | 0 | 0.00 | 0.0 | 2.00 | 29 | ▇▁▁▁▁ |
| PPEL | 0 | 1 | 0.17 | 0.54 | 0 | 0.00 | 0.0 | 0.00 | 3 | ▇▁▁▁▁ |
| NCOR | 0 | 1 | 1.13 | 1.65 | 0 | 0.00 | 0.5 | 1.75 | 7 | ▇▁▁▁▁ |
| SLAT | 0 | 1 | 0.40 | 1.23 | 0 | 0.00 | 0.0 | 0.00 | 8 | ▇▁▁▁▁ |
| FSET | 0 | 1 | 1.86 | 3.18 | 0 | 0.00 | 0.0 | 2.00 | 12 | ▇▁▁▁▁ |
| Lepidzts | 0 | 1 | 0.17 | 0.54 | 0 | 0.00 | 0.0 | 0.00 | 3 | ▇▁▁▁▁ |
| Eupelops | 0 | 1 | 0.64 | 0.99 | 0 | 0.00 | 0.0 | 1.00 | 4 | ▇▃▁▁▁ |
| Miniglmn | 0 | 1 | 0.24 | 0.79 | 0 | 0.00 | 0.0 | 0.00 | 5 | ▇▁▁▁▁ |
| LRUG | 0 | 1 | 10.43 | 12.66 | 0 | 0.00 | 4.5 | 17.75 | 57 | ▇▃▂▁▁ |
| PLAG2 | 0 | 1 | 0.80 | 1.79 | 0 | 0.00 | 0.0 | 1.00 | 9 | ▇▁▁▁▁ |
| Ceratoz3 | 0 | 1 | 1.30 | 2.20 | 0 | 0.00 | 0.0 | 2.00 | 9 | ▇▂▁▁▁ |
| Oppiminu | 0 | 1 | 1.11 | 1.84 | 0 | 0.00 | 0.0 | 1.75 | 9 | ▇▂▁▁▁ |
| Trimalc2 | 0 | 1 | 2.07 | 5.79 | 0 | 0.00 | 0.0 | 0.00 | 33 | ▇▁▁▁▁ |
## [1] 140 268 186 286 199 209 162 126 123 166 216 213 177 269 100 97 90 118 118 184 117 172 81 80 123 120 173 111 111 96
## [31] 130 93 136 194 111 133 139 189 94 157 81 140 148 60 158 154 121 113 107 148 91 112 145 49 58 108 8 121 90 127
## [61] 42 13 86 88 112 116 781 111 184 121
## Brachy PHTH HPAV RARD SSTR Protopl MEGR MPRO TVIE HMIN HMIN2 NPRA TVEL
## 611 89 596 85 22 26 153 11 58 344 137 132 634
## ONOV SUCT LCIL Oribatl1 Ceratoz1 PWIL Galumna1 Stgncrs2 HRUF Trhypch1 PPEL NCOR SLAT
## 1209 1187 2468 132 90 76 67 51 16 183 12 79 28
## FSET Lepidzts Eupelops Miniglmn LRUG PLAG2 Ceratoz3 Oppiminu Trimalc2
## 130 12 45 17 730 56 91 78 145
# Boites à moustaches parallèles pour mite
mite %>.%
spivot_longer(., everything(), names_to = "species", values_to = "n") %>.%
chart(., species ~ n) +
geom_boxplot() +
labs(x = "Espèces", y = "Observations")# Transformation log(x + 1) pour mite
mite2 <- log1p(as_dtf(mite)) # Utilisons un data.frame pour les noms des lignes
# Ajouter le numéro des stations explicitement comme nom des lignes
rownames(mite2) <- 1:nrow(mite2)
# Boites à moustaches parallèles pour mite transformé log(x + 1)
mite2 %>.%
spivot_longer(., everything(), names_to = "species", values_to = "log_n_1") %>.%
chart(., species ~ log_n_1) +
geom_boxplot() +
labs(x = "Espèces", y = "Logarithme des observations")# Autre visualisation des données tranformées log(x + 1) de mite
mite2 %>.%
smutate(., station = 1:nrow(mite2)) %>.%
spivot_longer(., Brachy:Trimalc2, names_to = "species", values_to = "n") %>.%
chart(., species ~ station %fill=% n) +
geom_raster()# AFC sur données transformées log(x + 1) de mite
mite2_ca <- ca(mite2)
# Résumé de l'AFC
summary(mite2_ca, scree = TRUE, rows = FALSE, columns = FALSE)##
## Principal inertias (eigenvalues):
##
## dim value % cum% scree plot
## 1 0.366209 31.5 31.5 ********
## 2 0.132783 11.4 42.9 ***
## 3 0.072315 6.2 49.1 **
## 4 0.065787 5.7 54.7 *
## 5 0.055872 4.8 59.5 *
## 6 0.048122 4.1 63.7 *
## 7 0.041834 3.6 67.3 *
## 8 0.039072 3.4 70.6 *
## 9 0.032183 2.8 73.4 *
## 10 0.031300 2.7 76.1 *
## 11 0.028809 2.5 78.6 *
## 12 0.026949 2.3 80.9 *
## 13 0.024701 2.1 83.0 *
## 14 0.022729 2.0 84.9
## 15 0.020618 1.8 86.7
## 16 0.018029 1.5 88.3
## 17 0.016836 1.4 89.7
## 18 0.014850 1.3 91.0
## 19 0.014217 1.2 92.2
## 20 0.012553 1.1 93.3
## 21 0.010879 0.9 94.2
## 22 0.010412 0.9 95.1
## 23 0.009703 0.8 96.0
## 24 0.008187 0.7 96.7
## 25 0.007294 0.6 97.3
## 26 0.006689 0.6 97.9
## 27 0.005343 0.5 98.3
## 28 0.004576 0.4 98.7
## 29 0.004020 0.3 99.1
## 30 0.003828 0.3 99.4
## 31 0.002732 0.2 99.6
## 32 0.001955 0.2 99.8
## 33 0.001656 0.1 99.9
## 34 0.000776 0.1 100.0
## -------- -----
## Total: 1.163821 100.0
# Biplot de l'AFC : en turquoise les station, et en rouge, les espèces d'acariens
chart$biplot(mite2_ca, choices = c(1, 2))# Même biplot, mais avec labels plus lisibles grâce à l'option repel = TRUE
chart$biplot(mite2_ca, choices = c(1, 2), repel = TRUE)