Supplementary document for module 6 of the SDD I course, 2025-2026. Distributed under the CC BY-NC-SA 4.0 license.
Please refer to the online course for explanations and interpretations of this analysis.
Install a suitable R environment to reproduce this analysis.
# Initialise the SciViews::R dialect with the inference module
SciViews::R("infer", lang = "fr")
## In statistics, this is called **sampling without replacement**. The result is very different if the first individual drawn at random is put back into the population and can be drawn again at the second or third draw (**sampling with replacement**). Note also that, for an infinite or very large population, both types of sampling are equivalent to sampling **with** replacement, because removing one individual from an infinite population does not fundamentally change its size, and hence the subsequent probabilities.
# Four equiprobable outcomes (1 to 4), each with probability 1/4, drawn as vertical segments
dtx(Portée = 1:4, Probabilité = 1/4) %>.%
  chart(., aes()) +
  geom_segment(aes(x = Portée, xend = Portée, y = 0, yend = Probabilité)) +
  ylab("Probabilité")
## For continuous probabilities, the probability of any single event is **always zero**. We can only compute the probability that one of several events occurs (that is, events falling within an interval).
# Lower-tail probability for the quantile 2.5 of the uniform distribution [0, 4]
punif(2.5, min = 0, max = 4, lower.tail = TRUE)
## [1] 0.625
# Probability between 2 and 2.5 for the uniform distribution [0, 4]
punif(2.5, min = 0, max = 4, lower.tail = TRUE) -
  punif(2.0, min = 0, max = 4, lower.tail = TRUE)
## [1] 0.125
# Quantile for a probability of 1/3 of the uniform distribution [0, 4]
qunif(1/3, min = 0, max = 4, lower.tail = TRUE)
## [1] 1.333333
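As a cross-check of the values above, here is a minimal sketch in base R only (no SciViews functions). It assumes the same uniform distribution on [0, 4]; the names `a`, `b`, `p_point`, `p_interval`, `p_area` and `q_third` are introduced here for illustration only. It shows that a single value has zero probability, that an interval probability is an area under the density, and that the quantile of a uniform distribution follows from a + p(b - a).

```r
# Bounds of the uniform distribution U(a, b) used in this section
a <- 0
b <- 4

# A single value carries no probability: P(2.5 <= X <= 2.5)
p_point <- punif(2.5, min = a, max = b) - punif(2.5, min = a, max = b)

# Probability of the interval [2, 2.5] from the cumulative distribution
p_interval <- punif(2.5, min = a, max = b) - punif(2, min = a, max = b)

# Same probability computed as the area under the density between 2 and 2.5
p_area <- integrate(dunif, lower = 2, upper = 2.5, min = a, max = b)$value

# Closed form for the quantile of U(a, b): a + p * (b - a)
q_third <- a + (1/3) * (b - a)

c(p_point = p_point, p_interval = p_interval, p_area = p_area, q_third = q_third)
```

The interval probability (2.5 - 2) / (4 - 0) and the quantile 4/3 agree with the `punif()` and `qunif()` results printed above.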
## <distribution[1]>
## [1] U(0, 4)
## [1] 0.75
## [1] 0.75
## [1] 1.333333
## [1] 1.333333
## [1] 0.25 0.25 0.25
## [1] 0.25 0.25 0.25
## [1] 3.5515874 3.6412791 1.4006474 0.2001286 3.8390097
## [1] 3.5515874 3.6412791 1.4006474 0.2001286 3.8390097
# Initialise the pseudo-random number generator (for reproducibility)
set.seed(946)
# Generate 10 numbers from the uniform distribution [0, 1]
runif(10, min = 0, max = 1) # The same series of 10 numbers at every run
## [1] 0.6378020 0.7524999 0.5593599 0.6688387 0.8989262 0.5300384 0.1520689 0.9031163 0.2693327 0.6738862
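The same seeded generator can illustrate the earlier distinction between sampling with and without replacement. The sketch below uses base R `sample()` on a hypothetical population of four individuals; the object names and the large-population figures (`N`, `K`) are assumptions made for this example, not part of the original analysis.

```r
# Hypothetical population of four individuals, labelled 1 to 4
population <- 1:4

# Without replacement: an individual can be drawn at most once
set.seed(946)
no_repl <- sample(population, size = 3, replace = FALSE)

# With replacement: the same individual may be drawn several times
set.seed(946)
with_repl <- sample(population, size = 3, replace = TRUE)

# Re-seeding reproduces exactly the same draw
set.seed(946)
no_repl_again <- sample(population, size = 3, replace = FALSE)
identical(no_repl, no_repl_again)

# Probability that two successive draws both fall in a marked quarter of a
# very large population (N individuals, K of them marked)
N <- 4e6
K <- 1e6
p_without <- (K / N) * ((K - 1) / (N - 1)) # without replacement
p_with <- (K / N)^2                        # with replacement
c(p_without = p_without, p_with = p_with)
```

For a population this large the two probabilities differ by less than one in a million, which is the sense in which both sampling schemes become equivalent.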
## [1] "U(0, 4)"
## l u
## 1 0 4
## [1] 2
## [1] 1.333333
## <support_region[1]>
## [1] [0,4]
# Plot of the distribution U
chart(U) +
  geom_funfill(fun = dfun(U), from = 1, to = 3) +
  annotate("text", x = 2, y = 0.10, label = "P[1, 3]", col = "red")
# Plot of the cumulative distribution function of U
chart$cumulative(U) +
  geom_funfill(fun = cdfun(U), from = 1, to = 3)
# Normal distribution N(12, 1.5^2)
N1 <- dist_normal(mu = 12, sigma = 1.5) # Arguments mu =, sigma =
N1 # Caution: printed as N(mu, variance); 1.5^2 = 2.25, shown rounded as 2.2
## <distribution[1]>
## [1] N(12, 2.2)
## [1] 14.46728
## [1] 14.46728
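The call that produced 14.46728 is not shown in this extract. That value is consistent with the 0.95 quantile of N(12, 1.5^2), so the base R sketch below assumes this is what was computed. It also makes the standard deviation versus variance distinction explicit; the names `mu`, `sigma`, `variance`, `q95` and `q95_manual` are illustrative.

```r
mu <- 12
sigma <- 1.5          # standard deviation, as passed to sigma =
variance <- sigma^2   # 2.25, which the printed N(12, 2.2) shows rounded

# 0.95 quantile of N(12, 1.5^2); base R qnorm() takes the standard deviation
q95 <- qnorm(0.95, mean = mu, sd = sigma)

# The same quantile derived from the standard normal: mu + z * sigma
q95_manual <- mu + qnorm(0.95) * sigma

round(c(q95 = q95, q95_manual = q95_manual), 5)
```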
# Artificial dataset
set.seed(653643)
df <- dtx(
  x = rnorm(100),
  y1 = x + rnorm(100, sd = 0.2),
  y2 = rnorm(100),
  y3 = -x + rnorm(100, sd = 0.2))
# Plots
pl <- list(
  chart(data = df, y1 ~ x) + geom_point(),
  chart(data = df, y2 ~ x) + geom_point(),
  chart(data = df, y3 ~ x) + geom_point()
)
combine_charts(pl, ncol = 3L)
## [1] 0.9336631
## [1] 0.004064163
## [1] -0.9230542
## [1] 93.36631
## [1] 0.4064163
## [1] -92.30542
## [1] 0.9781819
## [1] 0.9781819
## [1] 0.003939552
## [1] 0.003939552
## [1] -0.979511
## [1] -0.979511
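The outputs above are printed without the calls that produced them. Their pattern (a second group exactly 100 times the first, then values bounded by 1 in absolute value) is what one obtains from a covariance, the same covariance after a change of units, and a correlation. A minimal base R sketch of that relationship follows, on freshly simulated data with an arbitrary seed, so the numbers will differ from those above.

```r
set.seed(42) # arbitrary seed chosen for this sketch
x <- rnorm(100)
y <- x + rnorm(100, sd = 0.2)

# Covariance depends on the measurement units...
cov_xy <- cov(x, y)
cov_scaled <- cov(10 * x, 10 * y) # both variables expressed in units 10 times smaller

# ...whereas the correlation coefficient does not
cor_xy <- cor(x, y)
cor_scaled <- cor(10 * x, 10 * y)

# Pearson's r is the covariance standardised by both standard deviations
cor_manual <- cov_xy / (sd(x) * sd(y))

c(cov_xy = cov_xy, cov_scaled = cov_scaled, cor_xy = cor_xy, cor_scaled = cor_scaled)
```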
| Diameter at 1.4 m [m] | Height [m] | Wood volume [m^3] |
|---|---|---|
| 0.211 | 21.3 | 0.292 |
| 0.218 | 19.8 | 0.292 |
| 0.224 | 19.2 | 0.289 |
| 0.267 | 21.9 | 0.464 |
| 0.272 | 24.7 | 0.532 |
| ... | ... | ... |
| 0.444 | 25.0 | 1.577 |
| 0.455 | 24.4 | 1.651 |
| 0.457 | 24.4 | 1.458 |
| 0.457 | 24.4 | 1.444 |
| 0.523 | 26.5 | 2.180 |
First and last 5 rows of a total of 31
Matrix of Pearson correlation coefficients r
| | diameter | height | volume |
|---|---|---|---|
| diameter | 1.000 | 0.519 | 0.967 |
| height | 0.519 | 1.000 | 0.597 |
| volume | 0.967 | 0.597 | 1.000 |
## Matrix of Pearson's product-moment correlation:
## (calculation uses everything)
## d h v
## diameter 1
## height . 1
## volume B . 1
## attr(,"legend")
## [1] 0 ' ' 0.3 '.' 0.6 ',' 0.8 '+' 0.9 '*' 0.95 'B' 1
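A comparable matrix can be sketched in base R with the built-in `datasets::trees`. This is an assumption-laden stand-in: the built-in table uses imperial units and the column names `Girth`, `Height` and `Volume`, whereas the analysis above uses a metric version with `diameter`, `height` and `volume`. Pearson's r does not depend on units, so the coefficients should be close to those shown, up to rounding introduced by the unit conversion.

```r
# Built-in dataset: Girth (inches), Height (feet), Volume (cubic feet)
trees_base <- datasets::trees

# Pearson correlation matrix (the default method of cor())
r_mat <- cor(trees_base, method = "pearson")
round(r_mat, 3)

# A correlation matrix is symmetric, with ones on the diagonal
isSymmetric(r_mat)
all.equal(unname(diag(r_mat)), rep(1, 3))
```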
# Another correlation example, using the zooplankton data
zoo <- read("zooplankton", package = "data.io")
zoo %>.%
  sselect(., size:density) %>.%
  correlation(.) ->
  zoo_cor
plot(zoo_cor)
## # A data.trame: [6 × 8]
## x1 x2 x3 x4 y1 y2 y3 y4
## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 10 10 10 8 8.04 9.14 7.46 6.58
## 2 8 8 8 8 6.95 8.14 6.77 5.76
## 3 13 13 13 8 7.58 8.74 12.7 7.71
## 4 9 9 9 8 8.81 8.77 7.11 8.84
## 5 11 11 11 8 8.33 9.26 7.81 8.47
## 6 14 14 14 8 9.96 8.1 8.84 7.04
# Two sets of Anscombe variables
ans_x <- anscombe[, 1:4]
ans_y <- anscombe[, 5:8]
# Descriptive statistics of X
fmean(ans_x)
## x1 x2 x3 x4
## 9 9 9 9
## x1 x2 x3 x4
## 11 11 11 11
## x1 x2 x3 x4
## 3.316625 3.316625 3.316625 3.316625
## y1 y2 y3 y4
## 7.500909 7.500909 7.500000 7.500909
## y1 y2 y3 y4
## 4.127269 4.127629 4.122620 4.123249
## y1 y2 y3 y4
## 2.031568 2.031657 2.030424 2.030579
## [1] 0.8164205 0.8162365 0.8162867 0.8165214
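Only the first call (`fmean()`) is shown for the summaries above. As a hedged sketch, the same kind of summary can be obtained in base R from `datasets::anscombe`, using `sapply()` and `mapply()` in place of the SciViews helpers; which functions the original script used for the variances, standard deviations and correlations is not known from this extract.

```r
ans_x <- datasets::anscombe[, 1:4]
ans_y <- datasets::anscombe[, 5:8]

# Column-wise means and variances of the four x and y variables
x_mean <- sapply(ans_x, mean)
x_var <- sapply(ans_x, var)
y_mean <- sapply(ans_y, mean)
y_var <- sapply(ans_y, var)

# Pearson correlation within each pair (x1, y1), ..., (x4, y4)
r_pairs <- mapply(cor, ans_x, ans_y)

round(rbind(x_mean, x_var, y_mean, y_var, r_pairs), 3)
```

The four pairs share nearly identical means, variances and correlations, which is why the summaries alone cannot distinguish them.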
# Anscombe plots
pl <- list(
chart(data = anscombe, y1 ~ x1) + geom_point(),
chart(data = anscombe, y2 ~ x2) + geom_point(),
chart(data = anscombe, y3 ~ x3) + geom_point(),
chart(data = anscombe, y4 ~ x4) + geom_point()
)
combine_charts(pl)
| | diameter | height | volume |
|---|---|---|---|
| diameter | 0.00634 | 0.0805 | 0.0358 |
| height | 0.08051 | 3.8001 | 0.5414 |
| volume | 0.03584 | 0.5414 | 0.2166 |
## Matrix of Spearman's rank correlation rho:
## (calculation uses everything)
## diameter height volume
## diameter 1.000 0.441 0.955
## height 0.441 1.000 0.579
## volume 0.955 0.579 1.000
## Matrix of Kendall's rank correlation tau:
## (calculation uses everything)
## diameter height volume
## diameter 1.000 0.317 0.830
## height 0.317 1.000 0.450
## volume 0.830 0.450 1.000
## Matrix of Pearson's product-moment correlation:
## (calculation uses everything)
## diameter height volume
## diameter 1.000 0.519 0.967
## height 0.519 1.000 0.597
## volume 0.967 0.597 1.000
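To relate the three coefficients printed above, here is a minimal base R sketch on the built-in `datasets::trees` (imperial units, columns `Girth` and `Height`, used here as an assumed stand-in for `diameter` and `height`). It shows that Spearman's rho is Pearson's r computed on ranks, and that rank-based coefficients are unchanged by a monotonic transformation such as a logarithm.

```r
x <- datasets::trees$Girth
y <- datasets::trees$Height

r_pearson <- cor(x, y, method = "pearson")
r_spearman <- cor(x, y, method = "spearman")
r_kendall <- cor(x, y, method = "kendall")

# Spearman's rho is Pearson's r computed on the ranks (ties get average ranks)
r_on_ranks <- cor(rank(x), rank(y))

# Rank-based coefficients are invariant under a monotonic transformation
r_spearman_log <- cor(log(x), log(y), method = "spearman")
r_kendall_log <- cor(log(x), log(y), method = "kendall")

round(c(pearson = r_pearson, spearman = r_spearman, kendall = r_kendall), 3)
```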
# Correlation test for trees, variables diameter and volume
cor.test(data = trees, ~ diameter + volume, alternative = "greater")
##
## Pearson's product-moment correlation
##
## data: diameter and volume
## t = 20.44, df = 29, p-value < 2.2e-16
## alternative hypothesis: true correlation is greater than 0
## 95 percent confidence interval:
## 0.9394172 1.0000000
## sample estimates:
## cor
## 0.9670023
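The test statistic and the lower confidence bound reported above can be recomputed by hand from the sample correlation and the sample size. The sketch below uses only base R and takes r = 0.9670023 and n = 31 (df + 2) from the output; the variable names are illustrative. It relies on the standard t statistic for Pearson's r and on Fisher's z transformation for the one-sided interval.

```r
r <- 0.9670023 # sample correlation reported in the output above
n <- 31        # number of trees (df = n - 2 = 29)

# t statistic for H0: rho = 0
t_obs <- r * sqrt(n - 2) / sqrt(1 - r^2)

# One-sided p-value for the alternative "greater"
p_value <- pt(t_obs, df = n - 2, lower.tail = FALSE)

# One-sided 95% lower bound via Fisher's z transformation
z <- atanh(r)
lower <- tanh(z - qnorm(0.95) / sqrt(n - 3))

c(t_obs = t_obs, p_value = p_value, lower = lower)
```

Because the alternative is one-sided, the upper bound of the interval is fixed at 1 and only the lower bound is estimated.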
# Same test, with the table formatted by tabularise()
cor.test(data = trees, ~ diameter + volume, alternative = "greater") |> tabularise()
| Pearson coefficient r (95% CI) | Observed t value | Df | Value under H0 | p value | |
|---|---|---|---|---|---|
| 1.0 (0.9-1.0) | 20.4 | 29 | 0 | 4.55·10^-19 | *** |
0 <= '***' < 0.001 < '**' < 0.01 < '*' < 0.05
# Correlation test between diameter and height for trees
cor.test(data = trees, ~ diameter + height, alternative = "greater") |> tabularise()
| Pearson coefficient r (95% CI) | Observed t value | Df | Value under H0 | p value | |
|---|---|---|---|---|---|
| 0.5 (0.3-1.0) | 3.27 | 29 | 0 | 0.0014 | ** |
0 <= '***' < 0.001 < '**' < 0.01 < '*' < 0.05
# Same test, but with the Spearman correlation
trees_cor_test <- cor.test(data = trees, ~ diameter + height,
  alternative = "greater", method = "spearman")
## Warning in cor.test.default(x = mf[[1L]], y = mf[[2L]], ...): Cannot compute exact p-value with ties
| Spearman coefficient | Observed S value | Value under H0 | p value | |
|---|---|---|---|---|
| 0.441 | 2773 | 0 | 0.00653 | ** |
0 <= '***' < 0.001 < '**' < 0.01 < '*' < 0.05
# Same test, but with the Kendall correlation
trees_cor_test <- cor.test(data = trees, ~ diameter + height,
  alternative = "greater", method = "kendall")
## Warning in cor.test.default(x = mf[[1L]], y = mf[[2L]], ...): Cannot compute exact p-value with ties
| Kendall coefficient | Observed Z value | Value under H0 | p value | |
|---|---|---|---|---|
| 0.317 | 2.46 | 0 | 0.007 | ** |
0 <= '***' < 0.001 < '**' < 0.01 < '*' < 0.05
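The "Cannot compute exact p-value with ties" warnings above arise because the data contain tied values, so `cor.test()` falls back to an asymptotic p-value. One possible way to make that choice explicit is to request the approximation directly with `exact = FALSE`. The sketch below does this in base R on the built-in `datasets::trees` (columns `Girth` and `Height`, an assumed stand-in for `diameter` and `height`); it is an illustration, not the original script's approach.

```r
# Spearman test with the asymptotic p-value requested explicitly
sp <- cor.test(~ Girth + Height, data = datasets::trees,
  alternative = "greater", method = "spearman", exact = FALSE)

# Kendall test, likewise using the normal approximation
kd <- cor.test(~ Girth + Height, data = datasets::trees,
  alternative = "greater", method = "kendall", exact = FALSE)

# Estimates and one-sided p-values
c(rho = unname(sp$estimate), p_rho = sp$p.value,
  tau = unname(kd$estimate), p_tau = kd$p.value)
```

The p-values are the same asymptotic ones that the default call reports after emitting the warning; only the warning disappears.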