# 回归分析

R语言回归分析相关流程

### 数据预处理

* 中心化标准化

  ```
  data <- as.data.frame(scale(data, center = T, scale = T))    # 标准化
  data <- as.data.frame(scale(data, center = T, scale = F))    # 中心化
  ```

### 变量相关性检验

* 使用`Hmisc`

  ```
  library(Hmisc)
  rcorr(as.matrix(data), type = 'pearson')
  R <- rcorr(as.matrix(data), type = 'pearson')$r  # 相关系数阵
  P <- rcorr(as.matrix(data), type = 'pearson')$P  # 相关性检验的P值
  ```
* 使用`psych`

  ```
  library(psych)
  corr.test(data,use="complete")
  ```
* 热力图

  ```
  library(Hmisc)
  R <- rcorr(as.matrix(data), type = 'pearson')$r # 相关阵
  library(corrplot)
  corrplot(R, method="color", addCoef.col='grey', tl.col='black') # 热力图
  ```

  ![image-20221124123849236](https://euclid-picgo.oss-cn-shenzhen.aliyuncs.com/image/image-20221124123849236.png)
* 使用ggplot绘制热图

  ```
  library(Hmisc)
  R <- rcorr(as.matrix(data), type = 'pearson')$r # 相关阵
  library(reshape2)  
  melted_r <- melt(R)  # 将相关系数矩阵拉直
  library(ggplot2)
  ggplot(data = melted_r, aes(x=Var1,y=Var2,fill = value)) + # ggplot2 基本图层
    geom_raster() + # 绘制热力图
    scale_fill_gradient2(low = rgb(10,8,32,max = 255),mid = rgb(165,24,90,max = 255),
                         high = rgb(249,228,209,max = 255),limit = c(0,1),
                         name="Pearson\nCorrelation") + 
    theme( # 调整图表样式
      axis.title = element_blank(), # 删除ggplot2的坐标标签
      panel.background = element_blank(), # 删除ggplot2的灰色背景
      panel.grid.major = element_blank(), # 删除ggplot2的面板网格
      panel.border = element_blank(),
      axis.ticks = element_blank() # 删除ggplot2的轴刻度
    ) + 
    geom_text(aes(Var2,Var1,label = round(value,2)),
              color = '#228B22',size = 2) # 加上相关系数数值
  ```

  ![image-20221124124310441](https://euclid-picgo.oss-cn-shenzhen.aliyuncs.com/image/image-20221124124310441.png)

### 偏相关系数

在多要素所构成的系统中，当研究某一个要素对另一个要素的影响或相关程度时，把其他要素的影响视作常数（保持不变），即暂时不考虑其他要素影响，单独研究两个要素之间的相互关系的密切程度，所得数值结果为偏相关系数

* 计算偏相关系数并绘图

  ```
  library(corpcor)
  cor2pcor(R) # 计算得到偏相关阵
  library(corrplot)
  corrplot(cor2pcor(R), method="color", addCoef.col='grey', tl.col='black') # 热力图
  ```

  ![偏相关系数图](https://euclid-picgo.oss-cn-shenzhen.aliyuncs.com/image/202211261537897.png)

### 建立模型

* 最小二乘回归

  ```
  model <- lm(y~x1+x2+x3+x4+x5, data)     # 最小二乘回归
  #model <- lm(y~0+x1+x2+x3+x4+x5, data)   # 无截距项最小二乘回归
  fitted(model)                           # 拟合值
  model$coefficients                      # 回归系数
  model$residuals                         # 残差
  ```
* 计算杠杆值

  ```
  hii <- hatvalues(mo)                                # 杠杆值
  ```

  ![查看杠杆值(按照X大小顺序排序)](https://euclid-picgo.oss-cn-shenzhen.aliyuncs.com/image/image-20221124103158877.png)
* 计算模型残差

  ```
  SSE <- sum(mo$residuals^2)                          # 残差平方和
  sqrt(SSE/(length(data$x)-2))
  ```

### 回归模型检验

* 显著性检验

  ```
  summary(model)$r.squared         # R-square
  summary(model)$adj.r.squared     # Adjusted R-square
  sum((result - mean(result))^2)   # SSR
  sum((data$y - result)^2)         # SSE
  sum((data$y - mean(data$y))^2)   # SST, and SST = SSR + SSE
  plot(fitted(model), model$residuals, pch = 19, xlab = "fitted values",ylab = "residuals") # 残差图
  ```

  ![image-20221124124627132](https://euclid-picgo.oss-cn-shenzhen.aliyuncs.com/image/image-20221124124627132.png)

### 模型分析

* 点预测

  ```
  mo <- lm(y~x,data)                                                  # 最小二乘回归
  none_interval <- predict.lm(mo, data, interval="none")              # 计算点预测值
  ```
* 区间预测

  ```
  confidence_interval <- predict.lm(mo, data, interval="confidence")  # 计算置信区间预测值
  prediction_interval <- predict.lm(mo, data, interval="prediction")  # 计算预测区间预测值
  ```
* 绘制区间图

  ```
  # 对齐数据顺序
  data_order <- data[order(data$x),]
  confidence_interval_order <- confidence_interval[order(data$x),]
  prediction_interval_order <- prediction_interval[order(data$x),]
  ```

  开始绘图

  ```
  plot(data$x, data$y, xlab="x", ylab="y", pch = 16)  # 绘制轮廓
  polygon(c(data_order$x,rev(data_order$x)),c(prediction_interval_order[,2],rev(prediction_interval_order[,3])),col = rgb(221,234,243,max = 255),border = NA)
  polygon(c(data_order$x,rev(data_order$x)),c(confidence_interval_order[,2],rev(confidence_interval_order[,3])),col = "gray",border = NA)  # 其中polygon为绘制多边形的函数，rev为排序函数
  abline(mo, col="red", lwd=2)  
  points(data$x, data$y, pch = 16) 
  ```

  ![置信区间和预测区间](https://euclid-picgo.oss-cn-shenzhen.aliyuncs.com/image/image-20221124103249152.png)

### 预测相关

* 新值预测

  ```
  new <- data.frame(x = 3.5)  # 生成新的数据
  predict.lm(mo, new, interval="prediction")  # 点预测+预测上下界
  predict.lm(mo, new, interval="confidence")  # 点预测+置信上下界
  ```


---

# Agent Instructions: Querying This Documentation

If you need additional information that is not directly available in this page, you can query the documentation dynamically by asking a question.

Perform an HTTP GET request on the current page URL with the `ask` query parameter:

```
GET https://euclid-jie.gitbook.io/euclid_jie_s_book/r-yu-yan-cao-zuo/r-hui-gui-fen-xi.md?ask=<question>
```

The question should be specific, self-contained, and written in natural language.
The response will contain a direct answer to the question and relevant excerpts and sources from the documentation.

Use this mechanism when the answer is not explicitly present in the current page, you need clarification or additional context, or you want to retrieve related documentation sections.
