课程:MA6514 Machine Learning and Data Science(南洋理工大学 NTU)
来源 PDF:
- MA6514 2022-2023 Semester 2(April/May 2023)
- MA6514 2024-2025 Semester 2(April/May 2025)
- MA6514 2025-2026 Semester 1(November/December 2025)
三份均为闭卷考试,时间 3 小时,含 4 道必答大题。
一、整体对比 / Overall Comparison
| 项目 / Item | 2022-2023 S2 | 2024-2025 S2 | 2025-2026 S1 |
|---|---|---|---|
| 应用背景 / Main scenario | 智能制造、Industry 4.0、航运集装箱 / Smart manufacturing, Industry 4.0, shipping containers | MNC 咨询、航运集装箱、自动驾驶 / MNC consulting, shipping containers, self-driving cars | 智造企业实习、预测性维护、迷宫导航 / Manufacturing internship, predictive maintenance, maze navigation |
| 高频方法 / Repeated methods | PCA、SVM、Bayesian、ANN | PCA、LASSO、SVM、Bayesian、EM、RL | PCA、Mercer、K-Fold、Bayesian mixture、EM、RL |
| 代码考查 / Coding focus | PyMC3 贝叶斯回归、Pandas 清洗 / PyMC3 Bayesian regression, Pandas cleaning | PyMC 代码解释、DQN/SARSA 伪码 / PyMC code reading, DQN/SARSA pseudocode | PCA 预处理、PyMC mixture、RL 场景建模 / PCA preprocessing, PyMC mixture, RL task design |
| 新增重点 / New emphasis | 经典基础题 / Classical fundamentals | SARSA、DQN、MAR 缺失值、EM 填补 / SARSA, DQN, MAR missingness, EM imputation | K-Fold、Mercer 定理、预测性维护与 RL 设计 / K-Fold, Mercer’s theorem, predictive maintenance, RL design |
三套试卷共享的知识骨架是:数据科学基础、降维、监督/非监督学习、贝叶斯建模、神经网络、强化学习。
The shared knowledge backbone across all three papers is data science fundamentals, dimensionality reduction, supervised and unsupervised learning, Bayesian modeling, neural networks, and reinforcement learning.
二、2022-2023 Semester 2(April/May 2023)题目与参考答案 / Questions and Suggested Answers
Q1. Smart Manufacturing vs Intelligent Manufacturing
Q1.1 概念区分 / Conceptual distinction
题目 / Question
描述 Smart Manufacturing 与 Intelligent Manufacturing,并说明两者是否有区别。
Describe Smart Manufacturing and Intelligent Manufacturing, and explain whether they are different.
参考答案 / Suggested Answer
- 智能制造(Smart Manufacturing)强调传感器、网络、自动化和数据分析的整合,使生产过程透明、互联、柔性。 / Smart manufacturing emphasizes the integration of sensors, connectivity, automation, and analytics to make production transparent, connected, and flexible.
- 智能化制造(Intelligent Manufacturing)是在此基础上进一步加入 AI、机器学习和专家系统,使系统具备自感知、自学习、自决策、自优化能力。 / Intelligent manufacturing builds on this by adding AI, machine learning, and expert systems so that the system can sense, learn, decide, and optimize autonomously.
- 二者不是完全对立的概念,更像层级关系:Smart 更偏数字化和互联,Intelligent 更偏自主决策与认知能力。 / They are not opposing ideas but hierarchical ones: smart manufacturing focuses more on digitalization and connectivity, while intelligent manufacturing focuses more on autonomous decision-making and cognition.
Q1.2 Industry 4.0 的四项最显著收益 / Four most significant benefits of Industry 4.0
题目 / Question
列出并解释 Industry 4.0 的 4 项最显著收益,并给出排序。
List, explain, and rank the four most significant benefits of Industry 4.0.
参考答案 / Suggested Answer
- 生产效率提升 / Higher productivity:实时监控、自动调度和流程优化能减少等待、返工和停机。 / Real-time monitoring, automated scheduling, and process optimization reduce waiting time, rework, and downtime.
- 质量一致性提升 / Better quality consistency:在线检测和数据闭环能更快发现偏差并稳定产品质量。 / Inline inspection and closed-loop data analysis detect deviations early and stabilize product quality.
- 预测性维护 / Predictive maintenance:通过振动、温度、功率等数据提前发现故障征兆,减少突发停机。 / Vibration, temperature, and power data can reveal failure signals early and reduce unplanned downtime.
- 柔性与定制化 / Flexibility and customization:可支持小批量、多品种、按需生产。 / It supports small-batch, high-mix, and on-demand production.
Q1.3 Big Data 定义及其在制造中的意义 / Definition of Big Data and its role in manufacturing
题目 / Question
解释 Big Data 的定义,并说明 Smart/Intelligent Manufacturing 为什么依赖大数据。
Explain the definition of Big Data and why smart/intelligent manufacturing depends on it.
参考答案 / Suggested Answer
- Big Data 是指传统工具无法在合理时间内高效采集、存储、处理和分析的大规模、高速度、多样化数据。 / Big Data refers to data so large, fast, and diverse that conventional tools cannot efficiently capture, store, process, or analyze it in reasonable time.
- 常用 5V 描述:Volume、Velocity、Variety、Veracity、Value。 / It is commonly described by the 5Vs: volume, velocity, variety, veracity, and value.
- 制造场景中,设备传感器会持续生成海量时序数据;若没有大数据技术,就无法进行预测性维护、质量追踪、需求预测和排程优化。 / In manufacturing, sensors continuously generate massive time-series data; without big-data technologies, predictive maintenance, quality tracing, demand forecasting, and scheduling optimization cannot be done effectively.
Q2. PCA、LASSO 与 SVM 综合题 / Integrated PCA, LASSO, and SVM question
Q2.1 用词频矩阵做 PCA 鉴定共同作者 / Use PCA on a word-frequency matrix to infer common authorship
题目 / Question
三篇历史散文给出词频矩阵,要求用 PCA 判断是否可能出自共同作者。
Three historical essays are represented by a word-frequency matrix. Use PCA to judge whether they may share a common author.
参考答案 / Suggested Answer
- 先把每篇文章表示成词频向量,再对矩阵做中心化和标准化。 / Represent each essay as a word-frequency vector, then center and standardize the matrix.
- 做 PCA 后观察前两主成分上的投影位置;如果三篇文章在主成分空间中非常接近,说明它们共享相似的写作风格。 / After PCA, inspect the projections on the first two principal components; if the three essays are close in that space, they likely share a similar writing style.
- 若第一主成分解释大部分方差,而三篇文章沿该主成分得分相近,则可推断存在共同作者的可能。 / If the first principal component explains most variance and the essays have similar scores on it, common authorship is plausible.
Q2.2 PCA 预处理中的归一化步骤 / Normalization steps in PCA preprocessing
题目 / Question
描述 PCA 的数据预处理归一化步骤。
Describe the normalization steps used before PCA.
参考答案 / Suggested Answer
- 对每个特征减去均值完成中心化。 / Subtract the mean of each feature to center the data.
- 如果各特征量纲不同,再除以标准差完成标准化。 / If features have different scales, divide by the standard deviation to standardize them.
- 得到标准化矩阵后再计算协方差矩阵或直接做 SVD。 / Use the standardized matrix to compute the covariance matrix or directly apply SVD.
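下面给出一个最小示例(假设 X 为 NumPy 数组;此处数据为随机生成,仅作演示),验证“标准化 -> 协方差矩阵特征分解”与“直接 SVD”得到一致的方差结构。 / A minimal sketch (assuming X is a NumPy array; the data here is randomly generated for illustration only) verifying that "standardize -> eigendecomposition of the covariance matrix" and "direct SVD" recover the same variance structure.

```python
import numpy as np

X = np.random.default_rng(0).normal(size=(100, 5))    # 示例数据,仅作演示
X_std = (X - X.mean(axis=0)) / X.std(axis=0)          # 中心化 + 标准化

cov = np.cov(X_std, rowvar=False)                     # 协方差矩阵 (d×d)
eigvals, eigvecs = np.linalg.eigh(cov)                # 特征值分解(升序返回)

U, S, Vt = np.linalg.svd(X_std, full_matrices=False)  # 直接对数据矩阵做 SVD
# 奇异值与特征值的关系:S**2 / (n-1) 与协方差矩阵特征值一致
print(np.allclose(np.sort(S**2 / (len(X_std) - 1)), np.sort(eigvals)))
```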
Q2.3 根据 83% 和 16% 方差解释共同作者 / Infer common authorship from 83% and 16% explained variance
题目 / Question
已知前两主成分解释 83% 和 16% 的方差,推断共同作者身份。
Given that the first two principal components explain 83% and 16% of the variance, infer whether there is a common author.
参考答案 / Suggested Answer
- 前两主成分已解释 99% 的总方差,说明二维表示几乎保留了全部信息。 / The first two principal components explain 99% of the total variance, so a 2D representation preserves almost all information.
- 如果三篇文章在 PC1 和 PC2 的得分聚集或呈现相似模式,可以合理认为它们来自同一作者或同一风格群体。 / If the essays cluster or show similar score patterns on PC1 and PC2, it is reasonable to infer a common author or a common stylistic group.
- 其中 PC1 反映主要文风,PC2 反映次要差异,如题材或年代。 / PC1 reflects the dominant writing style, while PC2 captures secondary differences such as topic or era.
Q2.4 Scree Plot 的画法与用途 / How to draw and use a scree plot
题目 / Question
绘制 Scree Plot,并说明其用途。
Draw a scree plot and explain its purpose.
参考答案 / Suggested Answer
- 横轴是主成分编号,纵轴是每个主成分的特征值或方差解释率。 / The x-axis is the component index, and the y-axis is the eigenvalue or explained variance ratio.
- 通过“肘部法”选择主成分数;若题目给出累计方差阈值,也可选累计解释率达到 95% 的最小主成分个数。 / Use the elbow method to choose the number of components, or select the smallest number of components that reaches a target cumulative variance such as 95%.
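一个最小的 Scree Plot 画法示例(数据与 8 维特征均为假设,仅演示画图逻辑): / A minimal scree-plot sketch (the data and its 8 features are assumed, for plotting logic only):

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA

X = np.random.default_rng(1).normal(size=(200, 8))   # 示例数据
pca = PCA().fit(X)

ratios = pca.explained_variance_ratio_
plt.plot(range(1, len(ratios) + 1), ratios, marker='o', label='per-component')
plt.plot(range(1, len(ratios) + 1), np.cumsum(ratios), marker='s', label='cumulative')
plt.axhline(0.95, color='r', ls='--')                # 95% 累计方差阈值
plt.xlabel('Principal component'); plt.ylabel('Explained variance ratio')
plt.legend(); plt.show()
```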
Q2.5 词频矩阵清洗脚本解释 / Explain the word-frequency cleaning script
题目 / Question
解释给出的 Python 词频矩阵清洗脚本主体。
Explain the core logic of the Python script that cleans the word-frequency matrix.
参考答案 / Suggested Answer
- 先读取文本并按空格分词。 / First read the text and split it into tokens.
- 再统一大小写、去标点、删除无效字符。 / Then normalize case, remove punctuation, and discard invalid tokens.
- 用 `Counter` 统计每篇文章的词频,再构造统一词汇表形成矩阵。 / Use `Counter` to count word frequencies for each essay, then build a common vocabulary and convert it into a matrix.
- 该脚本的目的,是把原始文本转化为可用于 PCA 的数值特征矩阵。 / The purpose of the script is to convert raw text into a numerical feature matrix suitable for PCA.
Q2.6 降维、Pearson 相关系数与非监督降维的区别 / Distinguish dimensionality reduction, Pearson correlation, and unsupervised dimensionality reduction
题目 / Question
辨析 Dimensionality Reduction、Pearson 相关系数与非监督降维。
Differentiate dimensionality reduction, Pearson correlation, and unsupervised dimensionality reduction.
参考答案 / Suggested Answer
- 降维是把高维特征压缩到低维表示的总称。 / Dimensionality reduction is the general process of compressing high-dimensional features into a lower-dimensional representation.
- Pearson 相关系数只刻画两个变量之间的线性相关,不会直接生成新特征。 / Pearson correlation only measures linear association between two variables and does not directly create new features.
- PCA 属于非监督降维方法,因为它不依赖标签,只利用数据的方差结构提取主成分。 / PCA is an unsupervised dimensionality-reduction method because it does not use labels and only exploits the variance structure of the data.
Q2.7 LASSO 与降维的关系 / Relationship between LASSO and dimensionality reduction
题目 / Question
说明 LASSO 特征选择与降维的关系。
Explain the relationship between LASSO feature selection and dimensionality reduction.
参考答案 / Suggested Answer
- LASSO 通过 L1 正则化把部分系数压缩到 0,从而保留少量关键特征。 / LASSO uses L1 regularization to shrink some coefficients exactly to zero, keeping only a small set of important features.
- 因此 LASSO 可视为“特征选择式降维”,而 PCA 是“特征提取式降维”。 / Therefore, LASSO can be viewed as feature-selection-based reduction, while PCA is feature-extraction-based reduction.
- LASSO 更容易解释,因为保留的是原始变量;PCA 更擅长压缩冗余信息。 / LASSO is more interpretable because it keeps original variables, whereas PCA is better at compressing redundant information.
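下面是一个 LASSO 特征选择的最小示例(数据与 alpha=0.1 均为假设值),演示 L1 正则如何把无关系数压为 0: / A minimal LASSO feature-selection sketch (the data and alpha=0.1 are assumed values) showing how L1 regularization zeroes out irrelevant coefficients:

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(2)
X = rng.normal(size=(100, 10))
y = 3 * X[:, 0] - 2 * X[:, 3] + rng.normal(scale=0.1, size=100)  # 只有 2 个真实相关特征

lasso = Lasso(alpha=0.1).fit(StandardScaler().fit_transform(X), y)
selected = np.flatnonzero(lasso.coef_)   # 系数非零的原始特征索引
print(selected)                          # 预期大致为 [0, 3]
```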
Q2.8 SVM 核技巧的重要性 / Importance of the kernel trick in SVM
题目 / Question
说明 SVM 核技巧的重要性及其计算意义。
Explain the importance and computational meaning of the kernel trick in SVM.
参考答案 / Suggested Answer
- 核技巧用核函数直接计算高维特征空间中的内积,无需显式构造映射后的高维向量。 / The kernel trick uses a kernel function to compute inner products in a high-dimensional feature space without explicitly constructing the mapped vectors.
- 它使 SVM 能处理非线性可分数据,例如通过 RBF 核把原本不可分的数据映射到可分空间。 / It allows SVM to handle nonlinearly separable data, for example by using an RBF kernel to map data into a separable space.
- 计算上更高效,也避免了高维空间显式建模带来的维度灾难。 / It is computationally more efficient and avoids the curse of dimensionality associated with explicit high-dimensional representations.
Q3. SVM 三样本求解与贝叶斯回归 / Three-point SVM and Bayesian regression
Q3.1 SVM 对偶目标、等式约束与决策边界 / Dual objective, equality constraint, and decision boundary
题目 / Question
给定三点训练样本和 RBF 核,写出对偶目标函数,引入等式约束,推导新拉格朗日,并写出决策边界。
For three training points with an RBF kernel, write the dual objective, incorporate the equality constraint, derive the new Lagrangian, and write the decision boundary.
参考答案 / Suggested Answer
- 对偶目标函数为 / The dual objective is
$$
L(\alpha)=\sum_{i=1}^{3}\alpha_i-\frac12\sum_{i=1}^{3}\sum_{j=1}^{3}\alpha_i\alpha_j y_i y_j k(x_i,x_j)
$$
- 约束条件为 $\alpha_i\ge 0$ 且 $\sum_i \alpha_i y_i=0$。 / The constraints are $\alpha_i \ge 0$ and $\sum_i \alpha_i y_i = 0$.
- 若用 $\delta$ 表示等式约束乘子,则新拉格朗日可写为 / Using $\delta$ as the multiplier for the equality constraint, the augmented Lagrangian can be written as
$$
\mathcal{L}(\alpha,\delta)=\sum_i\alpha_i-\frac12\sum_i\sum_j\alpha_i\alpha_j y_i y_j k(x_i,x_j)+\delta\sum_i\alpha_i y_i
$$
- 决策边界为 / The decision boundary is
$$
f(x)=\operatorname{sign}\left(\sum_i \alpha_i y_i k(x_i,x)+b\right)
$$
- 只有支持向量对应的 $\alpha_i>0$ 会真正影响分类边界。 / Only support vectors with $\alpha_i>0$ contribute to the final boundary.
Q3.2 HPD 与 ROPE / HPD versus ROPE
题目 / Question
解释 HPD 区间与 ROPE 区间的定义和关系。
Explain the definitions of HPD and ROPE and their relationship.
参考答案 / Suggested Answer
- HPD 是后验概率密度最高的区间,表示参数最可能落入的高密度区域。 / HPD is the highest-posterior-density interval, representing the most probable region of the parameter.
- ROPE 是“实际等效区间”,表示效果虽不一定严格为 0,但在实际应用上可视为无差异。 / ROPE is the region of practical equivalence, meaning effects that may not be exactly zero but are negligible in practice.
- 若 HPD 完全落在 ROPE 内,通常认为效果在实践上可忽略;若 HPD 与 ROPE 完全不重叠,则说明有实际显著差异。 / If the HPD lies entirely inside the ROPE, the effect is practically negligible; if they do not overlap at all, the effect is practically meaningful.
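一个用后验样本检查 HPD 与 ROPE 关系的最小示例(`arviz.hdi` 为 ArviZ 的真实接口;后验样本与 ROPE 区间 [-0.1, 0.1] 均为假设): / A minimal sketch checking the HPD-ROPE relationship from posterior samples (`arviz.hdi` is a real ArviZ API; the samples and the ROPE [-0.1, 0.1] are assumed):

```python
import numpy as np
import arviz as az

posterior = np.random.default_rng(3).normal(loc=0.03, scale=0.02, size=4000)  # 假想的后验样本
hdi_low, hdi_high = az.hdi(posterior, hdi_prob=0.95)   # 95% HPD 区间

rope = (-0.1, 0.1)                                      # 假设的实际等效区间
if rope[0] <= hdi_low and hdi_high <= rope[1]:
    print('HPD 完全落在 ROPE 内:效果在实践上可忽略')
elif hdi_high < rope[0] or hdi_low > rope[1]:
    print('HPD 与 ROPE 不重叠:存在实际显著差异')
else:
    print('部分重叠:证据不充分,暂不决策')
```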
Q3.3 24 行 PyMC3 贝叶斯回归代码解释 / Explain the 24-line PyMC3 Bayesian regression code
题目 / Question
给出 24 行 Bayesian 回归 PyMC3 代码,要求逐行解释含义。
Given a 24-line PyMC3 Bayesian regression script, explain it line by line.
参考答案 / Suggested Answer
- `import` 部分负责导入 PyMC、NumPy、Pandas 等库。 / The `import` section loads PyMC, NumPy, Pandas, and other libraries.
- 数据读取部分把 Excel 或 CSV 中的自变量和因变量提取成数组。 / The data-loading section extracts the predictor and response arrays from Excel or CSV files.
- `pm.Model()` 创建模型上下文;其内部声明先验、确定性节点和似然。 / `pm.Model()` creates the model context in which priors, deterministic nodes, and the likelihood are declared.
- `pm.Normal` 常用于截距和系数先验;`pm.HalfNormal` 常用于标准差先验,因为标准差必须非负。 / `pm.Normal` is usually used for intercept and coefficient priors, while `pm.HalfNormal` is used for the standard deviation because it must be nonnegative.
- `pm.Deterministic('mu', ...)` 记录线性预测值。 / `pm.Deterministic('mu', ...)` records the linear predictor.
- `observed=y` 把观测值接入模型,形成似然项。 / `observed=y` attaches the observed data and turns the distribution into a likelihood term.
- `pm.sample(...)` 用 MCMC 从后验分布采样,得到参数的后验估计。 / `pm.sample(...)` draws MCMC samples from the posterior and yields posterior estimates of the parameters.
Q4. 集装箱航运数据、贝叶斯模型与神经网络 / Shipping data, Bayesian model, and neural network
Q4.1 Pandas 加载多表 Excel 数据 / Load multi-sheet Excel data with Pandas
题目 / Question
用 Pandas 加载多表 Excel 数据,形成训练集。
Use Pandas to load multi-sheet Excel data and form a training set.
参考答案 / Suggested Answer
```python
import pandas as pd

x1 = pd.read_excel('ship.xlsx', sheet_name='X1')
x2 = pd.read_excel('ship.xlsx', sheet_name='X2')
y = pd.read_excel('ship.xlsx', sheet_name='Y')
df = pd.concat([x1, x2, y], axis=1)
```
- 核心思路是分别读取各工作表,再按列拼接成统一训练表。 / The key idea is to read each sheet separately and concatenate them column-wise into one training table.
Q4.2 数据清洗函数主体 / Main body of a data-cleaning function
题目 / Question
写出一段清洗函数代码的主体。
Write the main body of a data-cleaning function.
参考答案 / Suggested Answer
```python
def clean_df(df):
    df = df.drop_duplicates()
    df = df.dropna(subset=['target'])
    num_cols = df.select_dtypes(include='number').columns
    df[num_cols] = df[num_cols].fillna(df[num_cols].median())
    return df
```
- 典型步骤包括去重、删除关键标签缺失、对数值列做中位数填补。 / Typical steps include deduplication, dropping rows with missing targets, and median imputation for numeric columns.
Q4.3 贝叶斯模型代码主体 / Core structure of the Bayesian model
题目 / Question
用 pm.Normal、pm.HalfNormal、pm.Deterministic 写出贝叶斯模型主体。
Write the core Bayesian model using pm.Normal, pm.HalfNormal, and pm.Deterministic.
参考答案 / Suggested Answer
```python
import pymc as pm

with pm.Model() as model:
    alpha = pm.Normal('alpha', mu=0, sigma=1)
    beta = pm.Normal('beta', mu=0, sigma=2)
    sigma = pm.HalfNormal('sigma', sigma=1)
    mu = pm.Deterministic('mu', alpha + beta * x)
    y_obs = pm.Normal('y_obs', mu=mu, sigma=sigma, observed=y)
    trace = pm.sample(2000, tune=1000, target_accept=0.95)
```
- `alpha` 和 `beta` 是先验,`sigma` 是噪声尺度先验,`mu` 是确定性预测值,`y_obs` 是似然。 / `alpha` and `beta` are priors, `sigma` is the prior for the noise scale, `mu` is the deterministic prediction, and `y_obs` is the likelihood.
Q4.4 Bagging、Pasting、Voting 与方差控制 / Bagging, Pasting, Voting, and variance control
题目 / Question
比较 Bagging vs Pasting、Variance Reduction vs Overfitting、Bagging vs Voting。
Compare Bagging vs Pasting, variance reduction vs overfitting, and Bagging vs Voting.
参考答案 / Suggested Answer
- Bagging 是有放回抽样训练多个同类模型;Pasting 是无放回抽样。 / Bagging trains multiple similar models on bootstrap samples with replacement, while Pasting samples without replacement.
- Bagging 的主要作用是降低方差、缓解过拟合,尤其适用于高方差基学习器,如决策树。 / Bagging mainly reduces variance and mitigates overfitting, especially for high-variance base learners such as decision trees.
- Voting 通常指多个模型的投票集成,模型可以同质也可以异质;Bagging 通常更强调重采样训练。 / Voting refers to combining predictions from multiple models, which may be homogeneous or heterogeneous; Bagging specifically emphasizes resampling-based training.
Q4.5 2-3-2 ANN 前向传播与 MSE / Forward pass and MSE for a 2-3-2 ANN
题目 / Question
对 2-3-2 人工神经网络进行前向计算,并计算 MSE,已知预测值 $\hat{y}=[0.475, 0.505]$。
Carry out the forward computation for a 2-3-2 neural network and compute the MSE, given the prediction $\hat{y}=[0.475, 0.505]$.
参考答案 / Suggested Answer
- 隐藏层先算 $z^{(1)}=W^{(1)}x$,再过 Sigmoid 得 $o^{(1)}=\sigma(z^{(1)})$。 / The hidden layer first computes $z^{(1)}=W^{(1)}x$, then applies the sigmoid to get $o^{(1)}=\sigma(z^{(1)})$.
- 输出层再算 $z^{(2)}=W^{(2)}o^{(1)}$,最后得到 $\hat{y}=o^{(2)}=\sigma(z^{(2)})$。 / The output layer computes $z^{(2)}=W^{(2)}o^{(1)}$, and the final prediction is $\hat{y}=o^{(2)}=\sigma(z^{(2)})$.
- 若真实标签是 $y=[1,0]$,则 / If the true label is $y=[1,0]$, then
$$
\text{MSE}=\frac12\left[(1-0.475)^2+(0-0.505)^2\right]\approx 0.2653
$$
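一个可运行的 2-3-2 前向传播数值示意(权重矩阵为随机假设值,考试中应代入试卷给定的 W): / A runnable numerical sketch of the 2-3-2 forward pass (the weight matrices here are assumed random values; in the exam, substitute the given W):

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

x = np.array([7.220, 6.204])
W1 = np.random.default_rng(4).normal(scale=0.1, size=(3, 2))  # 假设的 3×2 隐藏层权重
W2 = np.random.default_rng(5).normal(scale=0.1, size=(2, 3))  # 假设的 2×3 输出层权重

o1 = sigmoid(W1 @ x)        # 隐藏层输出 o^(1)
y_hat = sigmoid(W2 @ o1)    # 网络预测 ŷ = o^(2)

y = np.array([1.0, 0.0])
mse = 0.5 * np.sum((y - y_hat) ** 2)   # 与题中 MSE = ½[(1-ŷ₁)² + (0-ŷ₂)²] 一致
print(y_hat, mse)
```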
三、2024-2025 Semester 2(April/May 2025)题目与参考答案 / Questions and Suggested Answers
Q1. MNC 数据科学咨询 / MNC data-science consulting
Q1.1 数据科学与 Industry 4.0 的三大核心 / Three cores of data science and Industry 4.0
题目 / Question
列出数据科学/Industry 4.0 的三大核心。
List the three core elements of data science and Industry 4.0.
参考答案 / Suggested Answer
- 数据 / Data:高质量、可获取、可治理的数据是基础。 / Data: high-quality, accessible, governable data is the foundation.
- 模型 / Models:统计与机器学习模型用于描述、预测和优化。 / Models: statistical and machine-learning models are used for description, prediction, and optimization.
- 业务决策 / Decision-making:模型结果必须转化成流程优化、成本下降和管理决策。 / Decision-making: model outputs must be translated into process optimization, cost reduction, and management action.
Q1.2 数据科学广义三要素 / Three broad elements of data science
题目 / Question
简述数据科学广义三要素。
Briefly explain the three broad elements of data science.
参考答案 / Suggested Answer
- 领域知识 / Domain knowledge:帮助理解问题背景和约束。 / Domain knowledge helps interpret business context and constraints.
- 数学与统计 / Math and statistics:用于建模、推断和验证。 / Math and statistics are used for modeling, inference, and validation.
- 计算与编程 / Computing and programming:用于数据处理、训练模型和部署分析流程。 / Computing and programming are used to process data, train models, and deploy analytical workflows.
Q1.3 Industry 4.0 五大关键技术 / Five key technologies of Industry 4.0
题目 / Question
总结 Industry 4.0 五大关键技术,并说明在 MNC 的落地。
Summarize the five key technologies of Industry 4.0 and explain how they can be implemented in an MNC.
参考答案 / Suggested Answer
- IoT / 物联网:用于设备、产品和供应链节点的实时连接。 / Used for real-time connectivity of machines, products, and supply-chain nodes.
- Big Data & Analytics / 大数据分析:用于需求预测、质量监控和客户洞察。 / Used for demand forecasting, quality monitoring, and customer insight.
- Cloud/Edge Computing / 云与边缘计算:提供集中算力和现场低时延响应。 / Provides centralized compute and low-latency on-site response.
- AI & ML / 人工智能与机器学习:用于预测性维护、异常检测和智能客服。 / Used for predictive maintenance, anomaly detection, and intelligent customer service.
- CPS / 网络物理系统:把物理资产和数字模型联通,实现闭环控制。 / Connects physical assets and digital models for closed-loop control.
Q1.4 执行摘要 / Executive summary
题目 / Question
写一份执行摘要:为什么上述五项技术最关键。
Write an executive summary explaining why those five technologies are the most important.
参考答案 / Suggested Answer
- 这五项技术共同覆盖“连接、采集、分析、决策、执行”完整链路。 / These five technologies together cover the full loop of connectivity, data capture, analysis, decision, and execution.
- 对大型跨国企业而言,它们既能提升运营效率,也能增强全球供应链韧性和客户响应速度。 / For an MNC, they improve operational efficiency while strengthening global supply-chain resilience and customer responsiveness.
- 它们的组合价值高于单点部署,因为 Industry 4.0 的本质是端到端的数据闭环。 / Their combined value is greater than isolated deployment because the essence of Industry 4.0 is end-to-end data closure.
Q2. PCA、LASSO、SVM 与 ANN 反向传播 / PCA, LASSO, SVM, and ANN backpropagation
Q2.1 PCA 做作者鉴定 / PCA for authorship identification
题目 / Question
在作者鉴定任务中说明 PCA 的作用。
Explain the role of PCA in an authorship-identification task.
参考答案 / Suggested Answer
- PCA 用于把高维词频矩阵压缩到低维主成分空间,保留主要文风差异。 / PCA compresses the high-dimensional word-frequency matrix into a low-dimensional component space while preserving major stylistic variation.
- 如果样本在主成分空间聚类明显,则说明作者风格存在可分结构。 / If samples cluster clearly in component space, it suggests separable authorship styles.
Q2.2 Scree Plot 与主成分选择 / Scree plot and component selection
题目 / Question
说明 Scree Plot 如何辅助主成分选择。
Explain how a scree plot helps choose the number of principal components.
参考答案 / Suggested Answer
- Scree Plot 显示每个主成分的方差贡献率或累计贡献率。 / A scree plot shows the explained variance ratio or cumulative variance for each principal component.
- 实务中常选肘部位置,或选累计解释率达到 90% 到 95% 的最小主成分数。 / In practice, one selects the elbow point or the smallest number of components reaching 90% to 95% cumulative variance.
Q2.3 特征值分解 vs SVD / Eigendecomposition versus SVD
题目 / Question
比较特征值分解与 SVD 的差异。
Compare eigendecomposition and SVD.
参考答案 / Suggested Answer
- 特征值分解通常作用于方阵,PCA 中常用于协方差矩阵。 / Eigendecomposition is usually applied to square matrices, and in PCA it is often applied to the covariance matrix.
- SVD 适用于任意矩阵,数值稳定性更好,也无需显式构造 $X^TX$。 / SVD applies to any matrix, is numerically more stable, and does not require explicitly building $X^TX$.
- 在 PCA 里,两者都能得到主方向;SVD 更常用于实际计算。 / In PCA, both can recover principal directions, but SVD is more commonly used in practice.
Q2.4 Kernel Trick 的意义 / Significance of the kernel trick
题目 / Question
说明 Kernel Trick 在 SVM 中的意义。
Explain the meaning of the kernel trick in SVM.
参考答案 / Suggested Answer
- 核技巧允许模型在隐式高维空间中做线性分类,从而解决原空间中的非线性问题。 / The kernel trick allows linear separation in an implicit high-dimensional space, thereby solving nonlinear problems in the original space.
- 它避免显式构造高维特征映射,提升了计算可行性。 / It avoids explicit high-dimensional feature construction and improves computational feasibility.
Q2.5 LASSO 与特征降维 / LASSO and feature reduction
题目 / Question
说明 LASSO 与特征降维的关系。
Explain the relationship between LASSO and feature reduction.
参考答案 / Suggested Answer
- LASSO 通过 L1 正则让部分权重变成 0,从而删除不重要特征。 / LASSO uses L1 regularization to drive some coefficients to zero and remove unimportant features.
- 它不生成新的主成分,而是保留原变量,因此更可解释。 / It does not create new principal components but keeps original variables, so it is more interpretable.
Q2.6 降维是否适用于其它机器学习算法 / Is dimensionality reduction useful for other ML algorithms?
题目 / Question
降维是否适用于其它机器学习算法?
Is dimensionality reduction useful for other machine-learning algorithms?
参考答案 / Suggested Answer
- 是。降维可以减少噪声、缓解多重共线性、降低训练成本,并改善泛化。 / Yes. Dimensionality reduction can reduce noise, alleviate multicollinearity, lower training cost, and improve generalization.
- 它常用于回归、聚类、分类和可视化,但要注意可能牺牲可解释性。 / It is useful for regression, clustering, classification, and visualization, but it may reduce interpretability.
Q2.7 ANN 反向传播链式求导 / Chain rule in ANN backpropagation
题目 / Question
给出输出层反向传播的偏导链,并解释各项意义。
Write the chain of partial derivatives for backpropagation at the output layer and explain each term.
参考答案 / Suggested Answer
- 可写为 / It can be written as
$$
\frac{\partial L}{\partial w}=\frac{\partial L}{\partial o}\cdot\frac{\partial o}{\partial z}\cdot\frac{\partial z}{\partial w}
$$
- 第一项是预测误差,第二项是激活函数的局部梯度,第三项是上一层输出。 / The first term is the prediction error, the second is the local gradient of the activation function, and the third is the previous layer's output.
- 若激活函数为 Sigmoid,则 $\frac{\partial o}{\partial z}=o(1-o)$。 / If the activation is sigmoid, then $\frac{\partial o}{\partial z}=o(1-o)$.
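下面用具体数值演示这条链式乘积(各变量取值均为假设): / The chain product illustrated with concrete numbers (all values are assumed):

```python
import numpy as np

o_prev = np.array([0.6, 0.7, 0.55])   # 上一层输出 o^(1)(假设值)
w = np.array([0.1, -0.2, 0.3])        # 输出单元权重(假设值)
y = 1.0                               # 真实标签

z = w @ o_prev
o = 1 / (1 + np.exp(-z))              # Sigmoid 输出
dL_do = -(y - o)                      # ∂L/∂o:预测误差(MSE 对输出求导)
do_dz = o * (1 - o)                   # ∂o/∂z:Sigmoid 局部梯度
dz_dw = o_prev                        # ∂z/∂w:上一层输出
grad = dL_do * do_dz * dz_dw          # 链式法则三项相乘
print(grad)
```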
Q2.8 K-Means vs EM / K-Means versus EM
题目 / Question
比较 K-Means 与 EM 的相似性和差异。
Compare the similarities and differences between K-Means and EM.
参考答案 / Suggested Answer
- 相同点:两者都采用迭代优化,反复更新“分配”和“参数”。 / Similarity: both are iterative optimization methods that repeatedly update assignments and parameters.
- 不同点:K-Means 是硬分配,只更新质心;EM 是软分配,更新簇概率、均值、协方差等参数。 / Difference: K-Means uses hard assignment and updates centroids only; EM uses soft assignment and updates cluster probabilities, means, and covariances.
- K-Means 更快更简单,EM 更灵活,能处理更复杂分布。 / K-Means is faster and simpler; EM is more flexible and can model more complex distributions.
Q2.9 用 EM 做缺失值填补 / Use EM for missing-value imputation
题目 / Question
说明用 EM 做缺失值填补的关键步骤。
Explain the key steps of using EM for missing-value imputation.
参考答案 / Suggested Answer
- E-step:在当前参数下,估计缺失值的条件期望。 / E-step: estimate the conditional expectation of missing values under current parameters.
- M-step:把填补后的数据视为完整数据,重新估计模型参数。 / M-step: treat the imputed data as complete and re-estimate model parameters.
- 重复迭代直到似然或参数收敛。 / Repeat until the likelihood or parameters converge.
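一个二维高斯数据下 EM 式填补的简化示意(并非完整 EM 实现,仅演示 E/M 交替;数据与缺失率为人工构造): / A simplified 2-D Gaussian sketch of EM-style imputation (not a full EM implementation, only the E/M alternation; data and missing rate are synthetic):

```python
import numpy as np

rng = np.random.default_rng(6)
X = rng.multivariate_normal([0, 0], [[1, 0.8], [0.8, 1]], size=200)
X[rng.random(X.shape) < 0.1] = np.nan                       # 人工制造约 10% 缺失

X_fill = np.where(np.isnan(X), np.nanmean(X, axis=0), X)    # 初始均值填补
for _ in range(20):
    mu = X_fill.mean(axis=0)                                # 重估参数(对应 M 步)
    cov = np.cov(X_fill, rowvar=False)
    for i, j in zip(*np.where(np.isnan(X))):                # 条件期望填补(对应 E 步)
        k = 1 - j                                           # 另一列(此处仅 2 维,便于演示)
        X_fill[i, j] = mu[j] + cov[j, k] / cov[k, k] * (X_fill[i, k] - mu[k])
print(mu, cov)
```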
Q2.10 MAR 缺失样本如何处理 / How to handle MAR missing samples
题目 / Question
在汽车属性数据中若缺失机制为 MAR,应如何处理样本?
In an automobile-attribute dataset, if the missingness mechanism is MAR, how should the samples be handled?
参考答案 / Suggested Answer
- 不应直接整行删除,否则会损失信息并可能引入偏差。 / Rows should not be dropped blindly because information is lost and bias may be introduced.
- 应优先使用多重插补、EM 或基于模型的填补方法。 / Prefer multiple imputation, EM, or model-based imputation methods.
- 同时要检查缺失是否与已观测变量相关,并把这种关系纳入建模。 / Also check whether missingness depends on observed variables and incorporate that relationship into the model.
Q3. 集装箱航运 PyMC 模型代码解释 / Shipping PyMC model code explanation
Q3.1 pm.Normal、pm.HalfNormal 与 pm.sample / pm.Normal, pm.HalfNormal, and pm.sample
题目 / Question
解释 pm.Normal、pm.HalfNormal、pm.sample(2000, observed=y_3) 等 24 行代码。
Explain a 24-line code block involving pm.Normal, pm.HalfNormal, and pm.sample(2000, observed=y_3).
参考答案 / Suggested Answer
- `pm.Normal` 往往表示系数或截距的先验,也可表示观测模型。 / `pm.Normal` often defines priors for coefficients or intercepts, and can also define the observation model.
- `pm.HalfNormal` 常用于标准差、方差尺度等必须为正的参数。 / `pm.HalfNormal` is typically used for strictly positive parameters such as standard deviations.
- `observed=y_3` 表示把真实数据挂接到随机变量上,使其成为似然函数。 / `observed=y_3` attaches real data to the random variable and turns it into a likelihood.
- `pm.sample(2000)` 表示从后验分布中抽取 2000 个样本,用于估计参数不确定性。 / `pm.sample(2000)` draws 2000 posterior samples to estimate parameter uncertainty.
Q3.2 Deterministic 变量 mu / The deterministic variable mu
题目 / Question
解读 Deterministic 变量 `mu` 的用途。
Explain the role of the Deterministic variable mu.
参考答案 / Suggested Answer
- `mu` 记录模型中的确定性中间量,例如线性回归中的 $\mu=\alpha+\beta x$。 / `mu` records a deterministic intermediate quantity, such as $\mu=\alpha+\beta x$ in linear regression.
- 它不是额外噪声项,而是帮助我们追踪和解释模型结构。 / It is not an additional noise term; it helps us track and interpret the model structure.
Q3.3 Bayesian Inference 与 Negative Binomial / Bayesian inference and Negative Binomial
题目 / Question
解释 Bayesian Inference、Negative Binomial 的用途及其参数含义。
Explain Bayesian inference, the purpose of the Negative Binomial, and the meanings of its parameters.
参考答案 / Suggested Answer
- 贝叶斯推断是把先验与似然结合,得到后验分布,再基于后验做估计与决策。 / Bayesian inference combines the prior and likelihood to obtain the posterior, which is then used for estimation and decision-making.
- Negative Binomial 常用于过度离散的计数数据,因为它允许方差大于均值。 / The Negative Binomial is often used for overdispersed count data because it allows the variance to exceed the mean.
- 其参数通常可理解为均值相关参数和离散度参数。 / Its parameters are typically interpreted as a mean-related parameter and a dispersion-related parameter.
Q3.4 Sampling divergence / Sampling divergence
题目 / Question
说明 sampling divergence 出现的场景与防范方法。
Explain when sampling divergence occurs and how to prevent it.
参考答案 / Suggested Answer
- 当后验几何形状过于尖锐、尺度差异大或步长不合适时,HMC/NUTS 可能出现 divergence。 / Divergences may occur in HMC/NUTS when the posterior geometry is sharp, badly scaled, or incompatible with the integration step size.
- 防范方法包括:提高 `target_accept`、增加 `tune`、做非中心化重参数化、收紧先验、重新缩放数据。 / Preventive actions include increasing `target_accept`, adding more tuning steps, using non-centered reparameterization, tightening priors, and rescaling data.
Q3.5 Regression 中两种 Shrinkage 模型 / Two shrinkage models in regression
题目 / Question
列举岭回归与 LASSO 的优劣。
State the strengths and weaknesses of Ridge regression and LASSO.
参考答案 / Suggested Answer
- 岭回归(L2)适合多数特征都有贡献、且存在多重共线性的情况;缺点是不会把系数压到 0。 / Ridge regression (L2) is suitable when most features contribute and multicollinearity exists; its weakness is that it does not set coefficients exactly to zero.
- LASSO(L1)能自动做特征选择,模型更稀疏;缺点是在强相关特征下可能不稳定。 / LASSO (L1) performs automatic feature selection and yields sparse models; its weakness is instability when features are strongly correlated.
Q4. 自动驾驶、DQN 与 SARSA / Self-driving, DQN, and SARSA
Q4.1 DQN 训练伪码解释 / Explain DQN training pseudocode
题目 / Question
给出 DQN 训练伪码,要求解释初始化、经验回放、目标值和反向传播。
Given DQN pseudocode, explain initialization, experience replay, target computation, and backpropagation.
参考答案 / Suggested Answer
- 初始化主网络、目标网络和 replay buffer。 / Initialize the online network, target network, and replay buffer.
- 与环境交互,按 $\varepsilon$-greedy 选择动作,把 $(s,a,r,s')$ 存入 buffer。 / Interact with the environment, choose actions by $\varepsilon$-greedy, and store $(s,a,r,s')$ in the buffer.
- 从 buffer 随机采样一个 batch,计算目标 / Randomly sample a batch from the buffer and compute the target
$$
y=r+\gamma \max_{a'}Q_{\theta^-}(s',a')
$$
- 最小化平方误差损失,反向传播更新主网络;目标网络定期同步。 / Minimize the squared-error loss, backpropagate through the online network, and periodically sync the target network.
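一个极简的目标值计算示意(纯 Python/NumPy,用数组代替 Q 网络;缓冲区内容为假设): / A minimal target-computation sketch (plain Python/NumPy, with an array standing in for the Q network; buffer contents are assumed):

```python
import random
from collections import deque
import numpy as np

buffer = deque(maxlen=10000)       # 经验回放缓冲区
gamma = 0.99

# 与环境交互后存入转移 (s, a, r, s', done);此处两条为假设数据
buffer.append((0, 1, 1.0, 2, False))
buffer.append((2, 0, -1.0, 3, True))

Q_target = np.zeros((4, 2))        # 目标网络 Q_{θ⁻},此处用表格示意
batch = random.sample(buffer, 2)   # 随机采样打破样本相关性
for s, a, r, s_next, done in batch:
    y = r if done else r + gamma * Q_target[s_next].max()  # y = r + γ max_{a'} Q_{θ⁻}(s',a')
    # 随后最小化 (y - Q_θ(s,a))² 并反向传播;目标网络定期同步 θ⁻ ← θ
    print(s, a, y)
```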
Q4.2 自动驾驶 SDC 的传感器到动作映射 / Mapping SDC sensors to actions
题目 / Question
说明自动驾驶汽车的传感器信息如何映射到动作,如变道、避障、启停和倒退。
Explain how self-driving-car sensor information maps to actions such as lane change, obstacle avoidance, start-stop, and reverse.
参考答案 / Suggested Answer
- 摄像头、激光雷达、雷达和超声波提供车道、障碍物、距离和速度信息。 / Cameras, LiDAR, radar, and ultrasonic sensors provide lane, obstacle, distance, and speed information.
- 感知模块把这些信息转成状态表示,策略网络根据状态输出动作价值或动作概率。 / The perception module converts them into a state representation, and the policy network outputs action values or action probabilities.
- 例如检测到前方障碍且左侧安全时,系统可输出“变道”;若前后距离过小则输出“制动或停车”。 / For example, if there is an obstacle ahead and the left lane is safe, the system may output “change lane”; if distance becomes too small, it may output “brake or stop.”
Q4.3 SARSA 算法步骤与参数角色 / SARSA steps and parameter roles
题目 / Question
给出 SARSA 的算法步骤,解释 $Q(s,u)$、$r(i,u)$、$r_{\max}$、$\ell(\cdot)$ 的角色,并讨论学习率 $\alpha$。
State the steps of SARSA, explain the roles of $Q(s,u)$, $r(i,u)$, $r_{\max}$, and $\ell(\cdot)$, and discuss the learning rate $\alpha$.
参考答案 / Suggested Answer
- SARSA 是 on-policy 方法,更新式为 / SARSA is an on-policy method with update
$$
Q(s,u)\leftarrow Q(s,u)+\alpha\left[r+\gamma Q(s',u')-Q(s,u)\right]
$$
- $Q(s,u)$ 是状态动作价值;$r(i,u)$ 是即时奖励。 / $Q(s,u)$ is the state-action value; $r(i,u)$ is the immediate reward.
- 若题中 $r_{\max}$ 表示最大迭代步数或回合长度,则它用于控制 episode 终止。 / If $r_{\max}$ denotes the maximum number of steps or episode length, it controls episode termination.
- 若 $\ell(\cdot)$ 表示损失、惩罚或衰减函数,则其作用是调节更新强度或探索策略。 / If $\ell(\cdot)$ denotes a loss, penalty, or decay function, its role is to regulate update strength or exploration.
- $\alpha$ 通常应随训练逐渐衰减:前期大一些便于学习,后期小一些便于收敛稳定。 / $\alpha$ should usually decay over time: larger early for learning speed, smaller later for stable convergence.
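一个表格型 SARSA 单步更新的最小示意(状态数、动作数与环境反馈均为假设): / A minimal tabular SARSA single-step update (state/action counts and the environment feedback are assumed):

```python
import numpy as np

Q = np.zeros((10, 4))               # 10 个状态、4 个动作的 Q 表
alpha, gamma, eps = 0.1, 0.9, 0.2

def eps_greedy(s):
    if np.random.random() < eps:
        return np.random.randint(4)  # 探索
    return int(Q[s].argmax())        # 利用

s, a = 0, eps_greedy(0)
r, s_next = -1.0, 1                  # 来自环境的假设反馈
a_next = eps_greedy(s_next)          # on-policy:用实际将执行的动作 a'
Q[s, a] += alpha * (r + gamma * Q[s_next, a_next] - Q[s, a])
```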
Q4.4 DQN vs SARSA / DQN versus SARSA
题目 / Question
比较 DQN 与 SARSA 在经验回放和学习方式上的差异。
Compare DQN and SARSA in experience replay and learning style.
参考答案 / Suggested Answer
- DQN 是 off-policy,通常配合经验回放和目标网络,适合高维状态空间。 / DQN is off-policy, usually with experience replay and a target network, and is suitable for high-dimensional state spaces.
- SARSA 是 on-policy,通常在表格型或较小状态空间中使用,不强调 replay buffer。 / SARSA is on-policy, often used in tabular or small state spaces, and does not typically rely on replay buffers.
- SARSA 更保守,因为它用实际采取的动作更新;DQN 更偏向估计最优动作价值。 / SARSA is more conservative because it updates using the action actually taken, whereas DQN focuses on estimating the optimal action value.
四、2025-2026 Semester 1(November/December 2025)题目与参考答案 / Questions and Suggested Answers
Q1. 中型智造企业实习场景 / Internship in a manufacturing company
Q1.1 Data Science vs Big Data / 数据科学与大数据
题目 / Question
向实习生解释 Data Science 与 Big Data 的异同。
Explain the similarities and differences between Data Science and Big Data to an intern.
参考答案 / Suggested Answer
- Data Science 更关注从数据中提炼洞察、建模预测和支持决策。 / Data Science focuses on extracting insights, building predictive models, and supporting decisions.
- Big Data 更关注大规模数据的采集、存储、传输和计算基础设施。 / Big Data focuses on large-scale data collection, storage, transport, and computation infrastructure.
- 二者关系可概括为:Big Data 提供数据基础,Data Science 提供分析能力。 / Their relationship can be summarized as follows: Big Data provides the data foundation, while Data Science provides the analytical capability.
Q1.2 数据分析中的四大挑战 / Four key challenges in data analytics
题目 / Question
介绍数据分析中的四大挑战。
Introduce four major challenges in data analytics.
参考答案 / Suggested Answer
- 数据质量 / Data quality:缺失、噪声、异常和不一致。 / Missingness, noise, outliers, and inconsistency.
- 规模与速度 / Scale and speed:数据体量大、更新快、需要实时处理。 / Large volume, fast updates, and the need for real-time processing.
- 隐私与合规 / Privacy and compliance:涉及安全、权限和法规要求。 / Security, access control, and regulatory constraints.
- 可解释性 / Interpretability:模型结果必须被业务方理解和信任。 / Model outputs must be understandable and trusted by stakeholders.
Q1.3 五项关键趋势并排序 / Five key trends and ranking
题目 / Question
列举数据分析中最关键的五项趋势,并给出排序。
List and rank the five most important trends in data analytics.
参考答案 / Suggested Answer
- AI 驱动自动化 / AI-driven automation
- 实时分析 / Real-time analytics
- 云边协同 / Cloud-edge integration
- 可解释 AI / Explainable AI
- 数据治理与隐私保护 / Data governance and privacy
排序理由:前两项直接影响业务价值产出速度,后三项决定系统能否规模化、可信化和长期落地。
Reason for the ranking: the first two directly affect how fast business value is created, while the latter three determine whether the system can scale, remain trustworthy, and be sustainably deployed.
Q2. 可靠性工程、PCA 与预测性维护 / Reliability engineering, PCA, and predictive maintenance
Q2.1 用 Pandas + PCA 填补缺失值 / Use Pandas + PCA to impute missing values
题目 / Question
在 CNC 传感器数据中,使用 Pandas + PCA 填补缺失值。
Use Pandas + PCA to impute missing values in CNC sensor data.
参考答案 / Suggested Answer
- 先用 Pandas 做基础清洗,如删除明显错误行、统一列名、初步填补缺失值。 / First use Pandas for basic cleaning, such as removing clearly wrong rows, unifying column names, and performing initial imputation.
- 再对数值特征标准化并做 PCA,用主成分结构重构原始数据,从而对缺失位置进行更合理估计。 / Then standardize numeric features and apply PCA, using the component structure to reconstruct the original data and obtain more informed estimates for missing entries.
Q2.2 何时用 Pandas,何时用 PCA / When to use Pandas and when to use PCA
题目 / Question
讨论何时应使用 Pandas,何时应使用 PCA。
Discuss when Pandas should be used and when PCA should be used.
参考答案 / Suggested Answer
- Pandas 适合数据加载、清洗、筛选、简单填补和表格操作。 / Pandas is suitable for loading, cleaning, filtering, simple imputation, and tabular manipulation.
- PCA 适合处理多变量相关结构、降噪、降维和基于低维结构的重构。 / PCA is suitable for exploiting multivariate correlation structure, denoising, dimensionality reduction, and low-dimensional reconstruction.
- 简单缺失可先用 Pandas;若变量强相关且需要保留整体结构,可进一步用 PCA。 / Simple missingness can first be handled in Pandas; if variables are strongly correlated and structure should be preserved, PCA can then be applied.
Q2.3 95% 方差阈值的 PCA 伪码 / PCA pseudocode with a 95% variance threshold
题目 / Question
写出“预处理 -> PCA 降维 -> 选择至少保留 95% 方差的主成分”的伪码。
Write pseudocode for “preprocess -> PCA -> choose the number of components preserving at least 95% variance.”
参考答案 / Suggested Answer
```python
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

X = df[num_cols].fillna(df[num_cols].median())
X = StandardScaler().fit_transform(X)
pca = PCA(n_components=0.95)
X_pca = pca.fit_transform(X)
```
- `n_components=0.95` 表示自动选择累计解释率至少为 95% 的最小主成分数。 / `n_components=0.95` means automatically choosing the smallest number of components whose cumulative explained variance is at least 95%.
Q2.4 哪些传感器最关键 / Which sensors are most important
题目 / Question
根据代码结构,推荐哪些传感器变量最关键用于预测性维护。
Based on the code structure, recommend which sensor variables are most important for predictive maintenance.
参考答案 / Suggested Answer
- 轴承温度、主轴转速和功率通常最关键,因为它们直接反映机械负载、摩擦和热异常。 / Bearing temperature, spindle speed, and power are usually the most important because they directly reflect mechanical load, friction, and thermal anomalies.
- 若题目含 x/y/z 三轴振动,也应视为关键变量,因为振动往往是故障最早的信号之一。 / If x/y/z vibration channels are present, they should also be treated as critical, because vibration is often one of the earliest fault signals.
Q3. 监督/非监督学习、Mercer 定理与 K-Fold / Supervised vs unsupervised learning, Mercer’s theorem, and K-Fold
Q3.1 监督学习 vs 非监督学习与过拟合 / Supervised vs unsupervised learning and overfitting
题目 / Question
以一个 ML 算法说明 supervised vs unsupervised 在过拟合/欠拟合上的差异。
Use one ML algorithm to explain the difference between supervised and unsupervised learning in terms of overfitting and underfitting.
参考答案 / Suggested Answer
- 以决策树为例,监督学习中若树太深,会过度拟合训练标签。 / Take decision trees as an example: in supervised learning, a tree that is too deep may overfit the training labels.
- 在非监督学习中,以 K-Means 为例,若簇数过多,也会把噪声当结构;若簇数过少,则会欠拟合真实分群。 / In unsupervised learning, for example in K-Means, too many clusters treat noise as structure, while too few clusters underfit the real grouping.
- 因此二者都会面临偏差-方差权衡,只是监督学习围绕标签误差,非监督学习围绕结构描述误差。 / Both therefore face a bias-variance trade-off, but supervised learning centers on label error whereas unsupervised learning centers on structural representation error.
Q3.2 过拟合检测与解决 / Detecting and solving overfitting
题目 / Question
说明过拟合的检测与解决手段,可配图或公式。
Explain how to detect and mitigate overfitting, optionally with a figure or formula.
参考答案 / Suggested Answer
- 检测方法:训练误差很低但验证误差很高,或 learning curve 显示明显泛化差距。 / Detection: training error is very low while validation error is high, or the learning curve shows a clear generalization gap.
- 解决方法:交叉验证、正则化、早停、降维、增加数据、数据增强、简化模型。 / Remedies: cross-validation, regularization, early stopping, dimensionality reduction, more data, data augmentation, and model simplification.
Q3.3 Mercer 定理 / Mercer’s theorem
题目 / Question
说明 Mercer 定理的核心内容与核函数有效性条件。
State the core content of Mercer’s theorem and the validity condition for a kernel function.
参考答案 / Suggested Answer
- Mercer 定理的关键点是:若核函数对应的 Gram 矩阵对任意有限样本集都半正定,则该核函数可被视为某个特征空间中的内积。 / The key point of Mercer’s theorem is that if the kernel’s Gram matrix is positive semidefinite for any finite sample set, the kernel can be viewed as an inner product in some feature space.
- 这保证了 SVM 对偶优化问题是凸的,并且隐式映射是合法的。 / This guarantees that the SVM dual optimization problem is convex and that the implicit mapping is valid.
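可以用数值方法直观验证这一条件:对一组样本构造 RBF 核的 Gram 矩阵并检查其特征值非负(样本与 γ=0.5 均为假设): / The condition can be checked numerically: build the RBF Gram matrix on some samples and verify its eigenvalues are nonnegative (the samples and γ=0.5 are assumed):

```python
import numpy as np

rng = np.random.default_rng(7)
X = rng.normal(size=(30, 3))
gamma = 0.5

sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
K = np.exp(-gamma * sq_dists)        # RBF Gram 矩阵 K_ij = k(x_i, x_j)

eigvals = np.linalg.eigvalsh(K)
print(eigvals.min() >= -1e-10)       # 半正定 ⇒ 全部特征值 ≥ 0(允许数值容差)
```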
Q3.4 核技巧如何映射到特征空间 / How the kernel trick maps into feature space
题目 / Question
说明核技巧如何把数据映射到特征空间。
Explain how the kernel trick maps data into feature space.
参考答案 / Suggested Answer
- 核技巧并不显式计算 $\phi(x)$,而是直接用 $k(x,x')=\phi(x)^T\phi(x')$ 替代内积。 / The kernel trick does not compute $\phi(x)$ explicitly; instead it uses $k(x,x')=\phi(x)^T\phi(x')$ directly.
- 这样模型可在隐式高维空间中学习非线性边界,而计算仍在原始输入之间进行。 / This allows the model to learn nonlinear boundaries in an implicit high-dimensional space while computation stays in terms of the original inputs.
Q3.5 K-Fold 交叉验证 / K-Fold cross-validation
题目 / Question
说明 K-Fold 在参数调优、回归评分和泛化能力解释中的作用。
Explain the role of K-Fold in hyperparameter tuning, regression scoring, and generalization assessment.
参考答案 / Suggested Answer
- 将数据分成 K 折,每次用 K-1 折训练、1 折验证,循环 K 次并取平均分数。 / Split the data into K folds, train on K-1 folds and validate on the remaining fold, repeat K times, and average the scores.
- 参数调优时,对每组超参数计算平均验证分数,选择最优组合。 / For tuning, compute the average validation score for each hyperparameter setting and choose the best one.
- 回归任务常用 MSE、RMSE、MAE、$R^2$ 作为评分指标。 / Regression tasks commonly use MSE, RMSE, MAE, and $R^2$ as evaluation metrics.
- 若交叉验证分数稳定且接近测试表现,说明模型泛化较好;若训练分数远高于交叉验证分数,则多半过拟合。 / If the cross-validation score is stable and close to test performance, generalization is good; if training performance is much higher than cross-validation performance, the model is likely overfitting.
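一个 K-Fold 调参的最小示例(以 Ridge 的 alpha 为超参数;数据与候选取值均为假设): / A minimal K-Fold tuning sketch (Ridge's alpha as the hyperparameter; the data and candidate grid are assumed):

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(8)
X = rng.normal(size=(120, 5))
y = X @ np.array([1.0, 0.5, 0.0, -2.0, 0.0]) + rng.normal(scale=0.3, size=120)

for alpha in [0.01, 0.1, 1.0, 10.0]:
    scores = cross_val_score(Ridge(alpha=alpha), X, y, cv=5, scoring='r2')
    print(alpha, scores.mean())      # 取 5 折平均 R²;选最优 alpha 后在全量训练集重训
```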
Q4. 航运贝叶斯模型、EM 与强化学习 / Shipping Bayesian model, EM, and reinforcement learning
Q4.1 Dirichlet 的意义 / Meaning of the Dirichlet prior
题目 / Question
解释贝叶斯混合模型中 Dirichlet 先验的意义。
Explain the meaning of the Dirichlet prior in a Bayesian mixture model.
参考答案 / Suggested Answer
- Dirichlet 用于给混合权重建先验,保证各权重非负且总和为 1。 / The Dirichlet prior is used for mixture weights, ensuring that all weights are nonnegative and sum to 1.
- 若参数设为全 1,则相当于对各成分给出均匀先验。 / If all concentration parameters are 1, it corresponds to a uniform prior over the components.
Q4.2 Normal Mixture 的贝叶斯公式与动机 / Bayesian form and motivation of a Normal Mixture
题目 / Question
解释 Normal Mixture 的公式与使用动机。
Explain the formula and motivation of a Normal Mixture.
参考答案 / Suggested Answer
- Normal Mixture 可写为 / A Normal Mixture can be written as
$$
p(y)=\sum_{k=1}^{K}\pi_k\,\mathcal{N}(y\mid \mu_k,\sigma_k^2)
$$
- 它适合多峰数据或由多个潜在群体混合生成的数据。 / It is suitable for multimodal data or data generated by multiple latent groups.
- 在航运或设备状态场景中,不同运行状态可能对应不同分布,因此单一正态不够灵活。 / In shipping or equipment-state scenarios, different operating states may correspond to different distributions, so a single Gaussian is not flexible enough.
Q4.3 EM 算法与给定损失公式 / EM algorithm and the given loss formula
题目 / Question
说明 EM 处理 MAR 数据时 E-step、M-step 的含义,并结合给定损失与梯度更新公式解释。
Explain the meanings of the E-step and M-step in EM for MAR data, and relate them to the given loss and gradient update formula.
参考答案 / Suggested Answer
- E-step 根据当前参数估计隐变量或缺失值的后验期望。 / The E-step estimates the posterior expectations of latent variables or missing values under current parameters.
- M-step 在这些期望的基础上最大化期望完全数据对数似然,更新参数。 / The M-step maximizes the expected complete-data log-likelihood based on those expectations.
- 若题中给出 / If the question gives
$$
L(\theta)=\frac1n\sum_i (y_i-Q^{\text{net}}(s_i,u_i))^2,\qquad \theta\leftarrow\theta-\alpha\nabla_\theta L(\theta)
$$
则可理解为:先基于当前模型形成目标,再用梯度下降更新网络参数。 / this can be interpreted as first forming targets under the current model and then updating network parameters by gradient descent.
Q4.4 异常值检测 / Outlier detection
题目 / Question
列举两个判定 outlier 的标准,并给出一个机器学习检测流程。
List two criteria for identifying outliers and give one machine-learning detection workflow.
参考答案 / Suggested Answer
- 标准一:3-sigma 规则,超出均值 $\pm 3\sigma$ 的点可视为异常。 / Criterion 1: the 3-sigma rule, where points beyond the mean $\pm 3\sigma$ are treated as abnormal.
- 标准二:IQR 规则,落在 $[Q_1-1.5\,\mathrm{IQR},\ Q_3+1.5\,\mathrm{IQR}]$ 之外的点可视为异常。 / Criterion 2: the IQR rule, where points outside $[Q_1-1.5\,\mathrm{IQR},\ Q_3+1.5\,\mathrm{IQR}]$ are treated as abnormal.
- 一个 ML 流程可以是:标准化数据 -> 训练 Isolation Forest -> 输出异常分数 -> 设定阈值 -> 人工复核。 / One ML workflow is: standardize data -> train an Isolation Forest -> output anomaly scores -> set a threshold -> manually review flagged points.
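该流程的一个最小实现示意(数据为人工构造;contamination=0.03 为假设参数): / A minimal sketch of this workflow (synthetic data; contamination=0.03 is an assumed parameter):

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(9)
X = np.vstack([rng.normal(size=(200, 2)),
               rng.normal(loc=6, size=(5, 2))])   # 含 5 个人工异常点

X_std = StandardScaler().fit_transform(X)          # 标准化
iso = IsolationForest(n_estimators=100, contamination=0.03,
                      random_state=0).fit(X_std)   # 训练 Isolation Forest
scores = iso.score_samples(X_std)                  # 分数越低越异常
flags = iso.predict(X_std)                         # -1 为异常,+1 为正常;随后人工复核
print(np.where(flags == -1)[0])
```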
Q4.5 迷宫导航中的强化学习设计 / Reinforcement-learning design for maze navigation
题目 / Question
解释 RL agent 决策流程,并在 10x10 迷宫中识别 agent、environment、states、actions、rewards,最后推荐算法。
Explain the RL agent decision process, identify the agent, environment, states, actions, and rewards in a 10x10 maze, and recommend an algorithm.
参考答案 / Suggested Answer
- 决策流程是:观察状态 -> 选动作 -> 获得奖励与新状态 -> 更新价值函数或策略。 / The decision loop is: observe state -> choose action -> receive reward and next state -> update the value function or policy.
- 在 10x10 迷宫中,agent 是移动体;environment 是迷宫;state 是当前位置;action 是上下左右;reward 可设为到终点 +100、撞墙 -10、每步 -1。 / In a 10x10 maze, the agent is the moving entity, the environment is the maze, the state is the current position, the actions are up/down/left/right, and rewards can be +100 for reaching the goal, -10 for hitting a wall, and -1 per step.
- 若状态空间较小且离散,优先推荐 Q-learning 或 SARSA;若观测高维或环境复杂,再考虑 DQN。 / If the state space is small and discrete, Q-learning or SARSA is preferred; if observations are high-dimensional or the environment is complex, DQN should be considered.
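一个对应的表格型 Q-learning 最小示意(简化为无内部墙壁的 10x10 网格;奖励设定沿用上文): / A minimal tabular Q-learning sketch for this setup (simplified to a 10x10 grid without interior walls; rewards as above):

```python
import numpy as np

N, gamma, alpha, eps = 10, 0.9, 0.1, 0.2
Q = np.zeros((N * N, 4))                       # 状态 = 位置编号,动作 = 上下左右
moves = [(-1, 0), (1, 0), (0, -1), (0, 1)]

def step(s, a):
    i, j = divmod(s, N)
    ni, nj = i + moves[a][0], j + moves[a][1]
    if not (0 <= ni < N and 0 <= nj < N):
        return s, -10.0, False                 # 撞墙 -10,原地不动
    s2 = ni * N + nj
    if s2 == N * N - 1:
        return s2, 100.0, True                 # 到达终点 +100
    return s2, -1.0, False                     # 每步 -1,鼓励最短路径

for episode in range(500):                     # 训练若干回合
    s, done, steps = 0, False, 0
    while not done and steps < 10_000:         # 步数上限,防止早期回合过长
        a = np.random.randint(4) if np.random.random() < eps else int(Q[s].argmax())
        s2, r, done = step(s, a)
        Q[s, a] += alpha * (r + gamma * Q[s2].max() - Q[s, a])
        s, steps = s2, steps + 1
```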
五、共同核心考点 / Shared Core Topics
1. 数据科学基础 / Data-science fundamentals
定义 Data Science、Big Data、Industry 4.0,并能结合业务场景作答。
Define Data Science, Big Data, and Industry 4.0, and explain them in business context.
2. PCA 与降维 / PCA and dimensionality reduction
熟悉标准化、协方差矩阵、特征值分解、Scree Plot、95% 方差阈值。
Be comfortable with standardization, covariance matrices, eigendecomposition, scree plots, and the 95% variance threshold.
3. 监督学习 / Supervised learning
SVM、Kernel Trick、Mercer 定理、Ridge、LASSO、K-Fold。
SVM, kernel trick, Mercer’s theorem, Ridge, LASSO, and K-Fold.
4. 非监督学习 / Unsupervised learning
K-Means、EM、MAR 缺失值填补、异常值检测。
K-Means, EM, MAR missing-value imputation, and outlier detection.
5. 贝叶斯建模 / Bayesian modeling
先验、似然、后验、PyMC 代码阅读、HPD、ROPE、sampling divergence。
Priors, likelihood, posterior, PyMC code reading, HPD, ROPE, and sampling divergence.
6. 神经网络 / Neural networks
前向传播、Sigmoid、MSE、反向传播链式求导。
Forward propagation, sigmoid, MSE, and the chain rule in backpropagation.
7. 强化学习 / Reinforcement learning
Q-learning、SARSA、DQN、经验回放、target network、场景建模。
Q-learning, SARSA, DQN, experience replay, target network, and task modeling.
六、速记公式与答题模板 / Formula Bank and Exam Templates
1. PCA / 主成分分析
$$
\Sigma=\frac{1}{n-1}X^TX,\qquad \Sigma v_i=\lambda_i v_i,\qquad Z=XV_k
$$
- 答题模板 / Answer template:
“先中心化/标准化,再做协方差矩阵或 SVD,按方差贡献率选主成分,最后解释投影结果。”
“First center/standardize, then compute the covariance matrix or SVD, choose components by explained variance, and finally interpret the projections.”
2. SVM / 支持向量机
$$
\max_\alpha \sum_i\alpha_i-\frac12\sum_i\sum_j\alpha_i\alpha_j y_i y_j k(x_i,x_j),\qquad
f(x)=\operatorname{sign}\left(\sum_i\alpha_i y_i k(x_i,x)+b\right)
$$
- 答题模板 / Answer template:
“写出目标函数、约束条件、核函数作用,再给决策边界。”
“State the objective function, the constraints, the role of the kernel, and then give the decision boundary.”
3. 神经网络 / Neural network
$$
\hat{y}=\sigma(W^{(2)}\sigma(W^{(1)}x)),\qquad \text{MSE}=\frac12\sum_i(y_i-\hat{y}_i)^2
$$
$$
\frac{\partial L}{\partial w}=\frac{\partial L}{\partial o}\cdot\frac{\partial o}{\partial z}\cdot\frac{\partial z}{\partial w}
$$
- 答题模板 / Answer template:
“先前向传播,再写损失函数,最后用链式法则解释梯度来源。”
“Do the forward pass first, then write the loss function, and finally explain the gradient via the chain rule.”
4. 贝叶斯建模 / Bayesian modeling
$$
p(\theta\mid y)\propto p(y\mid \theta)\,p(\theta)
$$
- 答题模板 / Answer template:
“先说明先验,再说明似然,最后解释后验与采样。”
“Explain the prior first, then the likelihood, and finally the posterior and sampling.”
5. EM 算法 / EM algorithm
- E-step:估计隐变量或缺失值期望。 / Estimate expectations of latent variables or missing values.
- M-step:最大化期望完全数据对数似然。 / Maximize the expected complete-data log-likelihood.
6. 强化学习 / Reinforcement learning
$$
Q(s,a)\leftarrow Q(s,a)+\alpha\left[r+\gamma \max_{a'}Q(s',a')-Q(s,a)\right]
$$
$$
Q(s,a)\leftarrow Q(s,a)+\alpha\left[r+\gamma Q(s',a')-Q(s,a)\right]
$$
- 第一条是 Q-learning,第二条是 SARSA。 / The first update is Q-learning, and the second is SARSA.
7. K-Fold / 交叉验证
- K 折切分 -> 轮流验证 -> 求平均分 -> 选最优参数 -> 全量重训。
Split into K folds -> validate in turns -> average scores -> choose the best hyperparameters -> retrain on the full training set.
七、复习建议 / Revision Tips
1. 先背题型,再背句式 / Memorize question types before memorizing wording
先把“概念题、对比题、公式题、代码题、场景题”分类,再分别记答案骨架。
First classify questions into conceptual, comparison, formula, coding, and scenario-based types, then memorize the answer skeleton for each type.
2. PyMC 代码题必须会“分组解释” / Grouped explanation is essential for PyMC questions
用“导入 -> 读数 -> 先验 -> 确定性节点 -> 似然 -> 采样 -> 诊断”的顺序答。
Answer in the order of imports -> data loading -> priors -> deterministic nodes -> likelihood -> sampling -> diagnostics.
3. PCA、SVM、EM、RL 要会写一句定义 + 一句公式 + 一句应用 / For PCA, SVM, EM, and RL, know one definition, one formula, and one application sentence
这类题最容易用“三句法”拿稳基础分。
These topics are easiest to secure with a “three-sentence method.”
4. 对比题优先表格式思维 / Use tabular thinking for comparison questions
Ridge vs LASSO、K-Means vs EM、DQN vs SARSA 都适合按“相同点 + 3 个不同点 + 应用场景”作答。
Ridge vs LASSO, K-Means vs EM, and DQN vs SARSA are best answered as “one similarity + three differences + use case.”
5. 场景题不要只讲算法 / Do not discuss only the algorithm in scenario questions
要把业务背景、数据来源、变量定义和决策价值一起写出来。
Always include the business context, data source, variable definition, and decision value.
附录:核心考点参考答案
A. 概念类高频题参考答案
A1. Smart Manufacturing vs Intelligent Manufacturing
- Smart Manufacturing(智能制造):广义概念,指整合传感器、云计算、数据分析、自动化技术,使制造过程能够快速响应需求变化、最小化环境影响、优化生产效率。强调"数据驱动 + 柔性响应"。
- Intelligent Manufacturing(智能化制造):在 Smart Manufacturing 基础上,进一步引入 人工智能(AI)、机器学习、专家系统 来替代或辅助人工决策,具备 自感知、自学习、自决策、自执行 能力。强调"AI 驱动 + 自主决策"。
- 核心区别:Smart 是 数字化 + 自动化,Intelligent 是 Smart + 自主智能决策;前者侧重数据流通与响应速度,后者侧重认知能力与主动优化。
A2. Big Data 定义(5 V 模型)
Big Data 指无法用传统工具在合理时间内捕获、存储、管理和处理的数据集合,具备 5V 特性:
- Volume 体量大(TB/PB 级)
- Velocity 速度快(实时/流式)
- Variety 多样性(结构化、半结构化、非结构化)
- Veracity 真实性(数据质量、不确定性)
- Value 价值密度低(海量原始数据中提炼价值)
Smart/Intelligent Manufacturing 依赖大数据的原因:传感器 7×24 小时产生海量时序数据,只有通过大数据技术才能提取故障预测、质量优化、需求预测等价值。
A3. Industry 4.0 的五大关键技术
- Cyber-Physical Systems (CPS):物理设备与数字孪生模型双向联动
- Internet of Things (IoT):设备互联、实时数据采集
- Big Data & Analytics:从海量数据中提取洞察、驱动决策
- Cloud Computing / Edge Computing:弹性算力、边缘实时响应
- AI & Machine Learning:预测性维护、质量检测、优化调度
(扩展选项:AR/VR、3D 打印、机器人协作、区块链、5G)
A4. Industry 4.0 的收益
- 生产效率提升(减少停机、优化调度)
- 质量一致性(实时 SPC、AI 质检)
- 柔性与定制化(小批量、按需生产)
- 预测性维护(减少 40–50% 意外停机)
- 供应链可见性(端到端追溯)
- 节能减排(能源管理与优化)
A5. Data Science vs Big Data 差异
| 维度 | Data Science | Big Data |
|---|---|---|
| 关注点 | 方法与模型:统计、ML、可视化 | 数据本身:存储、采集、规模 |
| 目标 | 从数据中 发现洞察/预测 | 能够 处理海量异构数据 |
| 工具 | Python、R、Scikit-learn、PyMC | Hadoop、Spark、Kafka、NoSQL |
| 产出 | 模型、报告、决策建议 | 数据管道、湖仓架构 |
二者相互依存:Big Data 提供"燃料",Data Science 提供"引擎"。
A6. 数据分析四大挑战
- 数据质量(缺失、噪声、异构、不一致)
- 数据规模与算力(存储成本、实时处理)
- 隐私与合规(PDPA、GDPR、数据脱敏)
- 可解释性与信任(黑箱模型、业务采信)
(可扩展:数据孤岛、人才短缺、模型漂移)
B. PCA / 降维参考答案
B1. PCA 预处理步骤
- 中心化:每列减去均值 $x_i \leftarrow x_i - \bar{x}$
- 标准化/归一化:除以标准差(当各特征量纲不一致时必要)
- 计算协方差矩阵 $\Sigma = \frac{1}{n-1} X^T X$
- 特征值分解:$\Sigma = V\Lambda V^T$,得到特征值 $\lambda_i$ 与特征向量 $v_i$
- 按特征值降序排列 主成分,选取累计方差 ≥ 阈值(如 95%)的前 k 个
- 投影:$Z = X V_k$,得到降维数据
B2. 词频矩阵 PCA 分析共同作者
- 3 篇散文 × 词频向量 → PCA 得到前两主成分;
- 若 第一主成分占 83%,说明 3 篇文章沿某一方向(共同文风特征)高度相关;
- 若 3 篇散文的第一主成分得分 非常接近,则三者可能出自 同一作者(共同用词习惯形成同一主方向)。
- 第二主成分占 16% 代表次要差异(题材/年代)。
B3. Scree Plot(碎石图)
- 横轴:主成分编号 1,2,3,…
- 纵轴:对应特征值或方差百分比
- 用法:选择 “肘部” 处的主成分数,或对应累计方差 ≥ 95% 的最小 k 值。
B4. 特征值分解 vs SVD
- Eigendecomposition:$\Sigma = V\Lambda V^T$,仅适用于 方阵 + 对称;
- SVD:$X = U\Sigma V^T$,适用于 任意矩阵;
- SVD 数值更稳定(无需显式构造 $X^TX$,避免条件数平方放大);
- 当 $\Sigma = X^TX/(n-1)$ 时,SVD 的 $V$ 即为 PCA 主方向,奇异值 $\sigma_i^2/(n-1) = \lambda_i$。
B5. LASSO 与特征降维
- LASSO:L1 正则 $\min_\beta \Vert y-X\beta\Vert_2^2 + \lambda\Vert\beta\Vert_1$,使部分 $\beta_j=0$,实现 变量选择;
- 与 PCA 区别:PCA 创造新的线性组合(黑箱特征),LASSO 保留原特征但剔除不重要的(可解释);
- LASSO 可看作“硬”降维(特征选择),PCA 是“软”降维(特征提取)。
B6. Ridge vs LASSO
| 维度 | Ridge (L2) | LASSO (L1) |
|---|---|---|
| 正则项 | $\lambda\Vert\beta\Vert_2^2$ | $\lambda\Vert\beta\Vert_1$ |
| 解的稀疏性 | 系数收缩但不为 0 | 部分系数为 0(稀疏) |
| 特征选择 | 否 | 是 |
| 适用 | 所有特征都相关、多重共线性 | 特征众多但只有少数真正相关 |
| 几何直觉 | 圆形约束 | 菱形约束(角点导致稀疏) |
C. SVM / Kernel 参考答案
C1. SVM 对偶问题
- 原问题:$\min \frac{1}{2}\Vert w\Vert^2$,s.t. $y_i(w^Tx_i+b) \ge 1$
- 对偶:$\max_\alpha \sum\alpha_i - \frac{1}{2}\sum_i\sum_j \alpha_i\alpha_j y_i y_j k(x_i,x_j)$
- 约束:$\alpha_i \ge 0, \sum\alpha_i y_i = 0$
- 决策边界:$f(x) = \text{sign}\left(\sum_i \alpha_i y_i k(x_i,x) + b\right)$
- 仅支持向量($\alpha_i > 0$)参与决策。
C2. Kernel Trick
- 思想:用 $k(x,x') = \phi(x)^T\phi(x')$ 替代显式映射 $\phi$;
- 好处:无需在高维/无限维空间显式计算内积,避免维数灾难;
- 常见核:线性、多项式 $(x^Tx'+c)^d$、RBF $\exp(-\gamma\Vert x-x'\Vert^2)$、Sigmoid;
- 计算降维意义:原本 $\phi(x)$ 可能是高维甚至无限维,但 $k(x,x')$ 只是一个标量内积,计算复杂度从 $O(D)$ 降为 $O(d)$($d$ 为原特征维度)。
C3. Mercer 定理
- 核函数 $k(x,x')$ 能写成某个内积 $\phi(x)^T\phi(x')$ 的 充要条件 是:
- 对任意有限点集 $x_1,\dots,x_n$,对应的 Gram 矩阵 $K_{ij}=k(x_i,x_j)$ 半正定(PSD)。
- 意义:保证存在隐式特征空间,SVM 对偶问题为凸规划,存在全局最优。
- 实用:验证自定义核合法性(RBF、多项式都满足 Mercer)。
D. 神经网络 / 反向传播参考答案
D1. 2-3-2 ANN 前向 + MSE(2022-2023 Q4)
给定 $X=[7.220, 6.204]$、权重 $W^{(1)} \in \mathbb{R}^{3\times2}$、$W^{(2)} \in \mathbb{R}^{2\times3}$:
- 隐藏层:$z^{(1)} = W^{(1)} X$,$o^{(1)} = \sigma(z^{(1)})$
- 输出层:$z^{(2)} = W^{(2)} o^{(1)}$,$o^{(2)} = \sigma(z^{(2)})$
- MSE:
给定 $y=[1,0]$,$\hat{y}=[0.475, 0.505]$:
$$
\text{MSE} = \frac{1}{2}\left[(1-0.475)^2 + (0-0.505)^2\right] = \frac{1}{2}\left[0.2756 + 0.2550\right] \approx 0.2653
$$
(若使用 $\frac{1}{n}$ 的定义,此处 $n=2$,两种写法数值相同。)
D2. 反向传播链式求导
以输出层为例:
$$
\frac{\partial L}{\partial w^{(2)}} = \frac{\partial L}{\partial o_2} \cdot \frac{\partial o_2}{\partial z^{(2)}} \cdot \frac{\partial z^{(2)}}{\partial w^{(2)}}
$$
- $\frac{\partial L}{\partial o_2} = -(y-o_2)$(MSE 对输出求导)
- $\frac{\partial o_2}{\partial z^{(2)}} = o_2(1-o_2)$(Sigmoid 导数)
- $\frac{\partial z^{(2)}}{\partial w^{(2)}} = o_1$(前一层激活值)
三项意义:
- 预测误差信号;
- 激活函数对净输入的局部敏感度;
- 权重所乘的输入值。
乘积即为“误差 × 局部梯度 × 输入”,沿层反向传递更新参数。
D3. Sigmoid 函数
$\sigma(x) = \frac{1}{1+e^{-x}}$,$\sigma'(x) = \sigma(x)(1-\sigma(x))$;把任意实数映射到 (0,1),常作为二分类输出或旧式隐藏层激活。
E. 贝叶斯统计 / PyMC 参考答案
E1. PyMC 贝叶斯回归代码典型结构
```python
with pm.Model() as ship_model:
    alpha = pm.Normal('a', mu=0, sigma=1)             # 截距先验
    beta = pm.Normal('b', mu=0, sigma=2)              # 系数先验
    sigma = pm.HalfNormal('e', sigma=1)               # 噪声标准差(正值)
    mu = pm.Deterministic('mu', alpha + beta * x_3)   # 确定性变换
    y_obs = pm.Normal('y_pred', mu=mu, sigma=sigma,
                      observed=y_3)                   # 似然函数
    trace = pm.sample(2000)                           # MCMC 采样后验
```
逐行含义:
- `pm.Normal(..., sigma=...)`:先验分布(参数未知值的初始信念)
- `pm.HalfNormal`:只允许正值,常用于尺度参数/标准差
- `pm.Deterministic`:封装中间确定性计算,便于跟踪
- `observed=y_3`:挂载实际观测数据 → 变为似然
- `pm.sample(2000)`:NUTS/Metropolis 采样 2000 次后验样本
E2. HPD vs ROPE
- HPD(Highest Posterior Density)区间:后验密度最高的区域,使包含概率 = 95%(或指定水平)的最小区间;
- ROPE(Region of Practical Equivalence):研究者定义的“实际无差异”区间(例如 $\beta \in [-0.1, 0.1]$ 视为无效应);
- 关系:
- HPD ⊂ ROPE ⇒ 接受零效应;
- HPD ∩ ROPE = ∅ ⇒ 拒绝零效应;
- 部分重叠 ⇒ 证据不充分,暂不决策。
E3. Sampling Divergence(采样发散)
- 现象:NUTS/HMC 采样过程中,哈密顿能量误差超出阈值,表明后验几何复杂、步长过大;
- 后果:后验估计有偏,诊断图(trace plot / pair plot)出现明显漂移;
- 解决:
  - 增大 `target_accept`(0.9 → 0.95 → 0.99)
  - 重参数化(如 non-centered parameterization)
  - 收紧先验
  - 增大 `tune`(burn-in)
E4. Dirichlet + Normal Mixture(2025-2026 Q4)
```python
p = pm.Dirichlet('p', a=np.ones(K))                    # K 类别混合权重
means = pm.Normal('means', mu=X.mean(), sigma=10, shape=K)
sd = pm.HalfNormal('sd', sigma=10)
y = pm.NormalMixture('y', w=p, mu=means, sigma=sd, observed=X)
```
- Dirichlet:混合权重先验,保证 $\sum p_k = 1, p_k \ge 0$;$\alpha=\mathbf{1}$ 等价于均匀先验;
- NormalMixture:$p(x) = \sum_k p_k \mathcal{N}(x;\mu_k,\sigma)$,可刻画多峰分布;
- 动机:单一正态无法拟合 NORM/WARN/FAIL 三类状态,混合模型自然对应“多机器状态聚类”。
E5. Negative Binomial 的参数
$\text{NB}(y; r, p)$:
- $r$(或 $\alpha$):离散参数(控制方差/均值比)
- $p$(或 $\mu$):成功概率/均值
- 相对 Poisson 的优势:允许 过度离散(variance > mean),更适合计数型集装箱/故障数据。
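一个数值小验证(scipy 的 `nbinom`/`poisson` 为真实接口;r=5、p=0.3 为假设参数):

```python
from scipy.stats import nbinom, poisson

r, p = 5, 0.3
mean, var = nbinom.stats(r, p, moments='mv')
print(mean, var)                           # 方差 > 均值:过度离散
print(poisson.stats(mean, moments='mv'))   # 对照:Poisson 的方差 = 均值
```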
F. 集成学习 / 无监督学习 参考答案
F1. Bagging vs Pasting vs Voting
| 维度 | Bagging | Pasting | Voting |
|---|---|---|---|
| 采样 | 有放回 Bootstrap | 无放回 | 不采样 |
| 模型 | 同类模型多个副本 | 同类模型多个副本 | 不同类型模型 |
| 合并 | 平均/多数投票 | 平均/多数投票 | 硬投票/软投票 |
| 典型 | Random Forest | — | Stacking 前置 |
- Variance Reduction:Bagging 通过平均 N 个模型将方差降至约 $1/N$(相关时下降较少),显著缓解过拟合。
- Overfitting:单棵决策树易过拟合;Bagging 聚合后方差 ↓,偏差几乎不变,泛化提升。
F2. K-Means vs EM
| 维度 | K-Means | EM (GMM) |
|---|---|---|
| 簇形状 | 球形(欧氏距离) | 椭圆(协方差矩阵) |
| 归属 | 硬分配(每点只属于一簇) | 软分配(概率 $\gamma_{ik}$) |
| 参数 | 质心 $\mu_k$ | $\mu_k, \Sigma_k, \pi_k$ |
| 算法 | 交替:分配 → 更新质心 | E-step(求责任) + M-step(更新参数) |
| 优化目标 | SSE | 最大似然 |
K-Means 可视为 GMM 在 $\Sigma_k=\sigma^2 I, \sigma\to 0$ 的特例。
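一个最小对比示例(数据为人工构造的两簇高斯):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(11)
X = np.vstack([rng.normal(0, 1, (100, 2)), rng.normal(4, 1, (100, 2))])

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
gmm = GaussianMixture(n_components=2, random_state=0).fit(X)

print(km.labels_[:5])                     # 硬分配:每点一个簇标签
print(gmm.predict_proba(X)[:5].round(3))  # 软分配:每点的责任概率 γ_ik
```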
F3. EM 算法用于 MAR 缺失值填补
- 前提:数据 Missing At Random (MAR),缺失仅依赖观测变量。
- E-step:基于当前参数 $\theta^{(t)}$,计算缺失值的期望 $\mathbb{E}[x_\text{miss}|x_\text{obs}, \theta^{(t)}]$;
- M-step:用填补后的完整数据最大化似然,更新参数 $\theta^{(t+1)}$;
- 迭代至收敛,最终既得到模型参数,也得到缺失值估计。
与简单删除对比:EM 利用了所有观测的边际信息,比 listwise deletion 更高效,偏差更小。
F4. 异常值检测(两个判定标准)
- 统计法:
- $|x - \mu| > 3\sigma$(3-sigma 法)
- IQR 法:超出 $Q_1 - 1.5\text{IQR}$ 或 $Q_3 + 1.5\text{IQR}$
- 基于模型:
- Isolation Forest:孤立路径短的点即异常;
- LOF(Local Outlier Factor):局部密度显著低于邻居;
- One-Class SVM:只学"正常"边界,外部为异常。
检测流程示例(Isolation Forest):
1. 划分特征、标准化
2. 训练 IsolationForest(n_estimators=100)
3. 预测 score → 分位数阈值判定
4. 可视化(PCA → 2D 散点 + 异常标红)
5. 领域专家复核 → 保留 / 修正 / 剔除
G. 强化学习参考答案
G1. RL Agent 决策流程
- 观察状态 $s_t$
- 依据策略 $\pi(a|s)$ 选择动作(如 ε-greedy)
- 环境返回 奖励 $r_t$ 和新状态 $s_{t+1}$
- 更新 价值函数/策略:$Q(s,a) \leftarrow Q(s,a) + \alpha[r + \gamma \max_{a'}Q(s',a') - Q(s,a)]$
- 重复至收敛
核心要素:MDP $(\mathcal{S}, \mathcal{A}, P, R, \gamma)$。
G2. SARSA vs DQN vs Q-Learning
| 维度 | Q-Learning | SARSA | DQN |
|---|---|---|---|
| 更新目标 | $\max_{a'} Q(s',a')$ | $Q(s', a')$(实际执行动作) | $\max_{a'} Q_{\text{target}}(s',a')$ |
| 策略 | Off-policy | On-policy | Off-policy |
| 函数近似 | 表格 | 表格 | 神经网络 + 经验回放 + target net |
| 适用 | 小型离散 | 小型离散 | 高维连续观测(像素、传感器) |
G3. 10×10 迷宫 RL 建模
- Agent:机器人
- Environment:10×10 网格(含固定墙壁/障碍)
- States:机器人位置 $(i,j)$,$|\mathcal{S}|=100$(部分不可达)
- Actions:{上, 下, 左, 右}
- Rewards:到达终点 +100;撞墙 -10;每步 -1(鼓励最短路径)
- 推荐算法:表格 Q-Learning 或 SARSA
- 状态空间仅 100,无需深度网络;
- Q-Learning 收敛快、可求最优;
- SARSA 更保守(在学习阶段安全避障)。
- 若障碍动态变化 → 改用 DQN 以捕捉更丰富的上下文。
G4. DQN 核心组件
- Q-Network $Q_\theta$:预测每个动作的 Q 值
- Target Network $Q_{\theta^-}$:周期性同步,稳定目标
- Experience Replay:$(s,a,r,s')$ 存入缓冲区,随机采 batch 训练 → 打破样本相关性
- ε-greedy:探索 vs 利用
- 损失:$L(\theta) = \mathbb{E}\left[(r + \gamma \max_{a'} Q_{\theta^-}(s',a') - Q_\theta(s,a))^2\right]$
- 参数更新:$\theta \leftarrow \theta - \alpha \nabla_\theta L$
G5. SARSA 算法要素
- $Q(s,u)$:状态-动作价值表
- $r(i,u)$:即时奖励
- $r_\text{max}$:最大迭代步数(episode 长度)
- $\alpha$:学习率(通常 0.1–0.5,随训练衰减)
- ε-greedy 中的 ε 随训练 逐步衰减(exploration → exploitation)
- 更新:$Q(s,u) \leftarrow Q(s,u) + \alpha[r + \gamma Q(s',u') - Q(s,u)]$
- 与 Q-Learning 差别:$u'$ 是实际采取的动作,而非 $\max$。
H. K-Fold 交叉验证参考答案
H1. K-Fold 基本流程
- 将训练集均分 K 折(常见 K=5 或 10)
- 对每一折:用其余 K-1 折训练、该折验证、记录性能
- 求 K 折平均性能,作为该超参数下的估计
- 选择平均性能最优的超参数 → 用全部训练集重新训练 → 在独立测试集评估
H2. 回归模型的性能度量
- MSE:$\frac{1}{n}\sum(y_i - \hat{y}_i)^2$(对大误差敏感)
- RMSE:$\sqrt{\text{MSE}}$(与 $y$ 同量纲)
- MAE:$\frac{1}{n}\sum|y_i-\hat{y}_i|$(对离群点稳健)
- R²:$1 - \frac{\sum(y_i-\hat{y}_i)^2}{\sum(y_i-\bar{y})^2}$(解释方差比例)
推荐:当离群点多 → MAE;量纲报告 → RMSE;对比不同模型 → R²。
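四个指标的数值小例(y 与 ŷ 均为假设值):

```python
import numpy as np

y = np.array([3.0, 5.0, 7.0, 9.0])
y_hat = np.array([2.8, 5.3, 6.5, 9.4])

mse = np.mean((y - y_hat) ** 2)                                   # 对大误差敏感
rmse = np.sqrt(mse)                                               # 与 y 同量纲
mae = np.mean(np.abs(y - y_hat))                                  # 对离群点稳健
r2 = 1 - ((y - y_hat) ** 2).sum() / ((y - y.mean()) ** 2).sum()   # 解释方差比例
print(mse, rmse, mae, r2)
```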
H3. Interpreting generalization
- The cross-validation score reflects expected performance on unseen data, i.e. an estimate of the generalization error;
- Averaging over K folds reduces the variance of a single split and lowers the estimate's uncertainty;
- If training score ≫ CV score → overfitting; the reverse → underfitting;
- Use learning curves / validation curves to diagnose the bias-variance trade-off (sketch below).
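A short learning-curve sketch for this diagnosis (data and model are placeholders):

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import learning_curve

X, y = make_regression(n_samples=300, n_features=10, noise=10, random_state=0)
sizes, train_scores, cv_scores = learning_curve(
    Ridge(alpha=1.0), X, y, cv=5, train_sizes=np.linspace(0.1, 1.0, 5))

print(train_scores.mean(axis=1))  # training score per training-set size
print(cv_scores.mean(axis=1))     # CV score; a large persistent gap suggests overfitting
```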
I. Python / Pandas Reference Code Snippets
I1. CNC predictive maintenance (2025-2026 Q2), code sketch
```python
import pandas as pd, numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler
import matplotlib.pyplot as plt

# 1. Load + preprocess
df = pd.concat([pd.read_csv(f'cnc_{i}.csv') for i in range(1, 13)])
df = df.dropna(subset=['machine_state'])  # keep only labeled rows
num_cols = ['x', 'y', 'z', 'br', 'spindle_RPM', 'power_kW']
df[num_cols] = df[num_cols].fillna(df[num_cols].median())

# 2. Standardize
scaler = StandardScaler()
X = scaler.fit_transform(df[num_cols])

# 3. PCA down to 95% variance
pca = PCA(n_components=0.95)
X_pca = pca.fit_transform(X)
print(f'Kept {pca.n_components_} components, cumulative variance {pca.explained_variance_ratio_.sum():.3f}')

# 4. Scree plot
plt.plot(np.cumsum(pca.explained_variance_ratio_), marker='o')
plt.axhline(0.95, color='r', ls='--'); plt.show()

# 5. Loading matrix -> most important sensors
loadings = pd.DataFrame(pca.components_.T, index=num_cols)
print(loadings.abs().sum(axis=1).sort_values(ascending=False))
```
Note: spindle_RPM and the bearing temperature are usually the most sensitive to the FAIL state (largest absolute loadings).
I2. Cleaning a word-frequency matrix (2022-2023 Q2)

```python
import numpy as np
from collections import Counter

def clean_words(words):
    # strip surrounding punctuation first, then keep alphabetic tokens, lowercased
    # (checking isalpha() before stripping would wrongly drop tokens like "word.")
    cleaned = [w.strip('.,!?').lower() for w in words]
    return [w for w in cleaned if w.isalpha()]

essays = []
for i, path in enumerate(file_paths):  # file_paths: list of essay file paths (assumed given)
    with open(path) as f:
        raw = f.read().split()
    essay = clean_words(raw)
    essays.append(Counter(essay))  # word-frequency dict per essay

# Unified vocabulary -> build the matrix
vocab = sorted(set().union(*essays))
matrix = np.array([[e.get(w, 0) for w in vocab] for e in essays])
```
J. Overall Answering Strategy
| Scenario | Approach |
|---|---|
| Concept questions (5–10 marks) | Definition + 2–3 characteristics + 1 application example |
| Code explanation (line by line) | Group the lines: imports → data loading → preprocessing → model → sampling/training → diagnostics |
| Algorithm-step questions | Three parts: initialization → iteration (with formulas) → termination |
| Comparison questions | Present as a table with at least 3 differences + 1 commonality |
| Scenario-modeling questions (e.g. RL) | Five elements: states / actions / rewards / algorithm / evaluation |
| Mathematical derivations | Write the formula → substitute the numbers → keep 3–4 decimal places → state the conclusion |
Appendix K: Original Code-Walkthrough Section (restored)
This part is kept as a restored version of the original code-walkthrough area. The sections above are supplementary; what follows preserves the original revision-note style of detailed code questions.
K1. PyMC Bayesian Linear Regression (likelihood of appearing: very high)
Background
All three papers include a line-by-line PyMC code explanation every year. The scenario is "modeling the relationship between the container count y_3 and some variable x_3 for shipping".
Full code
```python
import pymc as pm                           # line 1
import numpy as np                          # line 2
import pandas as pd                         # line 3

# line 4: load the data
df = pd.read_excel('shipping.xlsx', sheet_name='container')
x_3 = df['month'].values                    # line 5: predictor (month)
y_3 = df['container_count'].values          # line 6: response (container count)

# line 7: build the Bayesian model
with pm.Model() as ship_model:
    # lines 8-10: priors
    alpha = pm.Normal('alpha', mu=0, sigma=10)
    beta = pm.Normal('beta', mu=0, sigma=10)
    sigma = pm.HalfNormal('sigma', sigma=1)
    # line 11: deterministic transform (linear predictor)
    mu = pm.Deterministic('mu', alpha + beta * x_3)
    # line 12: likelihood
    y_obs = pm.Normal('y_obs', mu=mu, sigma=sigma, observed=y_3)
    # line 13: MCMC sampling of the posterior
    trace = pm.sample(2000, tune=1000, target_accept=0.95)

# line 14: posterior summary
print(pm.summary(trace, var_names=['alpha', 'beta', 'sigma']))
```
Line-by-line explanation for beginners
| Line | Code | Python syntax | Purpose / mathematical meaning |
|---|---|---|---|
| 1 | import pymc as pm | import ... as aliases the library | Imports the Bayesian modeling library PyMC, called via pm. afterwards |
| 2 | import numpy as np | Same as above | NumPy for numerical array operations |
| 3 | import pandas as pd | Same as above | Pandas for tabular data handling |
| 4 | pd.read_excel(...) | Function call returning a DataFrame | Reads the Excel file; sheet_name selects the worksheet |
| 5-6 | df['col'].values | Select a column; .values converts it to a NumPy array | Turns DataFrame columns into arrays for the model |
| 7 | with pm.Model() as ship_model: | with context manager | Creates the model container; every pm.xxx inside is added to it automatically |
| 8 | pm.Normal('alpha', mu=0, sigma=10) | Constructs a prior | Prior on the intercept α: N(0, 10²), i.e. "initial belief is 0, with large uncertainty" |
| 9 | pm.Normal('beta', ...) | Same as above | Prior on the slope β, i.e. the regression coefficient |
| 10 | pm.HalfNormal('sigma', sigma=1) | Half-normal distribution | Prior on the noise standard deviation σ; restricted to positive values (a standard deviation cannot be negative) |
| 11 | pm.Deterministic('mu', alpha + beta*x_3) | Deterministic node | Records the intermediate quantity μ = α + βx; not a random variable but a deterministic computation |
| 12 | pm.Normal('y_obs', mu=mu, sigma=sigma, observed=y_3) | Likelihood | Key step: observed=y_3 attaches the data, turning this node into the likelihood $P(y \mid \alpha, \beta, \sigma)$ |
| 13 | pm.sample(2000, tune=1000, target_accept=0.95) | MCMC sampling | Draws 2000 posterior samples with NUTS; tune is warm-up; a high target_accept reduces divergences |
| 14 | pm.summary(trace, ...) | Posterior summary | Prints each parameter's mean, standard deviation, 94% HDI interval, etc. |
Exam variations
- Changing pm.HalfNormal to pm.Normal: what goes wrong? Answer: σ could take negative values, violating the non-negativity constraint on a standard deviation.
- Omitting observed=...: the node is no longer a likelihood, just another prior, so the model cannot "learn" from the data.
- Divergences appear: see the sampling-divergence section earlier, e.g. raise target_accept, reparameterize, or tighten the priors (a minimal sketch follows).
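A sketch of those remedies applied to ship_model above (the specific values here are illustrative, not prescriptive):

```python
# Common remedies when pm.sample reports divergences
with ship_model:
    # Longer tuning and a higher target_accept take smaller, safer steps
    trace = pm.sample(2000, tune=2000, target_accept=0.99)

# Other options: tighten overly wide priors, e.g. pm.Normal('beta', mu=0, sigma=1),
# or use a non-centered parameterization in hierarchical models.
```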
K2. PCA + Scree Plot (likelihood of appearing: very high)
Background
PCA appears in all three papers. The scenarios range from "author-attribution word-frequency matrices" to "CNC sensor predictive maintenance".
Full code
```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

# Step 1: load the data
df = pd.read_csv('cnc_sensors.csv')
features = ['temp_x', 'temp_y', 'temp_z', 'bearing_temp',
            'spindle_RPM', 'power_kW']
X = df[features].values  # shape: (n_samples, 6)

# Step 2: handle missing values (column-wise median imputation)
X = pd.DataFrame(X).fillna(pd.DataFrame(X).median()).values

# Step 3: standardize (essential!)
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# Step 4: PCA down to 95% variance
pca = PCA(n_components=0.95)
X_pca = pca.fit_transform(X_scaled)
print(f'Original dimension: {X.shape[1]}')
print(f'After reduction: {X_pca.shape[1]}')
print(f'Variance ratio per component: {pca.explained_variance_ratio_}')
print(f'Cumulative variance: {pca.explained_variance_ratio_.sum():.4f}')

# Step 5: scree plot
plt.figure(figsize=(8, 5))
plt.plot(range(1, len(pca.explained_variance_ratio_) + 1),
         np.cumsum(pca.explained_variance_ratio_),
         marker='o', linestyle='--')
plt.axhline(y=0.95, color='r', linestyle='-', label='95% threshold')
plt.xlabel('Principal component index')
plt.ylabel('Cumulative variance ratio')
plt.title('Scree Plot')
plt.legend()
plt.grid(True)
plt.show()

# Step 6: loading analysis (which sensors matter most)
loadings = pd.DataFrame(
    pca.components_.T,
    index=features,
    columns=[f'PC{i+1}' for i in range(pca.n_components_)]
)
print('Loading matrix:')
print(loadings)
print('\nTotal importance per feature:')
print(loadings.abs().sum(axis=1).sort_values(ascending=False))
```
Line-by-line explanation for beginners
| Code snippet | Meaning |
|---|---|
| df[features].values | Selects the listed columns from the DataFrame and converts them to a 2D NumPy array |
| .fillna(...median()) | Fills NaN missing values with the median (more robust to outliers than the mean) |
| StandardScaler().fit_transform(X) | Mandatory standardization: subtract the mean and divide by the standard deviation per column so all features share one scale. PCA is scale-sensitive; without this, large-valued features dominate |
| PCA(n_components=0.95) | A float means "keep the fewest components whose cumulative variance is ≥ 95%"; an integer means "keep n components" |
| pca.fit_transform(X_scaled) | Fit + transform, returning the reduced data |
| pca.explained_variance_ratio_ | The variance fraction explained by each component (e.g. [0.65, 0.20, 0.10]) |
| np.cumsum(...) | Cumulative sum, used for the scree plot |
| pca.components_ | The principal-component matrix, shape (n_components, n_features); its transpose is called the "loadings" |
| .abs().sum(axis=1) | Sum of absolute values per feature across all components, a rough measure of overall importance |
Exam variations
- Why must we standardize? Temperature is in °C and spindle speed in rpm; the values differ by orders of magnitude, so without standardization RPM would dominate the variance.
- How to pick k from the scree plot? The elbow method (where the curve's slope flattens noticeably) or the 95% variance threshold.
- PCA vs LASSO? PCA is feature extraction (new combined features); LASSO is feature selection (keeps original features, drops those with β=0); see the sketch below.
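A small sketch of the LASSO side of that contrast, on synthetic data where only 3 of 6 features are informative (all values illustrative):

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso

X, y = make_regression(n_samples=100, n_features=6, n_informative=3,
                       noise=1.0, random_state=0)
lasso = Lasso(alpha=1.0).fit(X, y)
print(lasso.coef_)   # some coefficients driven exactly to 0 -> feature selection
```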
K3. Neural-Network Forward Pass + MSE (likelihood of appearing: high)
Background
2022-2023 Q4 asked for the forward pass and MSE of a 2-3-2 ANN.
Full code
```python
import numpy as np

# Input: 2-dimensional
X = np.array([7.220, 6.204])

# Weight matrices
# W1: 3x2 (3 hidden neurons, each receiving 2 inputs)
W1 = np.array([
    [0.1, 0.2],
    [0.3, 0.4],
    [0.5, 0.6],
])
# W2: 2x3 (2 output neurons, each receiving 3 hidden activations)
W2 = np.array([
    [0.1, 0.2, 0.3],
    [0.4, 0.5, 0.6],
])

# True labels
y = np.array([1.0, 0.0])

# Sigmoid activation function
def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# --- Forward pass ---
# Layer 1
z1 = W1 @ X        # (3,2) @ (2,) = (3,)
o1 = sigmoid(z1)   # hidden activations
# Layer 2
z2 = W2 @ o1       # (2,3) @ (3,) = (2,)
o2 = sigmoid(z2)   # output activations (predictions)

# MSE loss
mse = 0.5 * np.sum((y - o2) ** 2)
print(f'z1 = {z1}')
print(f'o1 = {o1}')
print(f'z2 = {z2}')
print(f'prediction o2 = {o2}')
print(f'MSE = {mse:.4f}')
```
Line-by-line explanation for beginners
| Code | Mathematical meaning |
|---|---|
| np.array([...]) | Creates a 1D/2D array (vector/matrix) |
| W1 @ X | @ is NumPy's matrix-multiplication operator, equivalent to np.dot(W1, X) |
| z1 = W1 @ X | Linear transform: $z^{(1)}_i = \sum_j W^{(1)}_{ij} x_j$ |
| sigmoid(z1) | Nonlinear activation: $\sigma(z) = \frac{1}{1+e^{-z}}$, squashing reals into (0,1) |
| 0.5 * np.sum((y - o2)**2) | MSE: $L = \frac{1}{2}\sum(y_i - \hat{y}_i)^2$ |
Exam variations
- ReLU instead of sigmoid: o1 = np.maximum(0, z1)
- Mini-batches: X becomes (batch_size, 2) and the forward pass changes to X @ W1.T
- Backpropagation: write out the chain-rule gradients, e.g. dL/dW2 built from (o2 - y) * o2 * (1 - o2) and o1 (a sketch follows).
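A sketch of one backpropagation step, continuing from the variables of the 2-3-2 forward pass above (the learning rate is an arbitrary illustration):

```python
# dL/dz2 for the sigmoid output and L = 0.5 * sum((y - o2)^2)
delta2 = (o2 - y) * o2 * (1 - o2)
dW2 = np.outer(delta2, o1)                  # dL/dW2, shape (2, 3)

# Backpropagate through W2 into the hidden layer
delta1 = (W2.T @ delta2) * o1 * (1 - o1)
dW1 = np.outer(delta1, X)                   # dL/dW1, shape (3, 2)

lr = 0.5                                    # learning rate (hypothetical)
W2 = W2 - lr * dW2
W1 = W1 - lr * dW1
```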
K4. SVM with an RBF Kernel (restored and completed)
Background
2022-2023 Q3 had a hand calculation with "three samples + an RBF kernel". A code version is kept here to tie together the kernel function, the support vectors, and the decision boundary.
Full code
```python
import numpy as np
from sklearn.svm import SVC
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import classification_report, accuracy_score

# Step 1: build a 2D binary-classification dataset
X, y = make_classification(
    n_samples=300,
    n_features=2,
    n_informative=2,
    n_redundant=0,
    n_clusters_per_class=1,
    class_sep=1.2,
    random_state=42
)

# Step 2: train/test split
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Step 3: standardize
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Step 4: train an RBF-kernel SVM
svm = SVC(kernel='rbf', C=1.0, gamma='scale')
svm.fit(X_train, y_train)

# Step 5: predict
y_pred = svm.predict(X_test)

# Step 6: evaluate
print('accuracy =', accuracy_score(y_test, y_pred))
print(classification_report(y_test, y_pred))

# Step 7: inspect the support vectors
print('support_vectors =', svm.support_vectors_.shape[0])
print('dual_coef =', svm.dual_coef_)
print('intercept =', svm.intercept_)
```
Line-by-line explanation for beginners
| Code snippet | Meaning |
|---|---|
| make_classification(...) | Generates a synthetic dataset for classification practice |
| train_test_split(...) | Splits into training and test sets: learn on the former, check generalization on the latter |
| StandardScaler() | Standardizes the input features so no large-scaled variable dominates the distance computation |
| SVC(kernel='rbf', C=1.0, gamma='scale') | Builds a support-vector classifier with an RBF kernel |
| svm.fit(X_train, y_train) | Fits the decision boundary on the training data |
| svm.predict(X_test) | Predicts classes on the test set |
| accuracy_score(...) | Computes the classification accuracy |
| classification_report(...) | Reports precision, recall, f1-score, etc. |
| svm.support_vectors_ | Returns the support vectors that actually determine the boundary |
| svm.dual_coef_ | Coefficients of the dual problem, corresponding to the lecture's $\alpha_i y_i$ |
| svm.intercept_ | The bias term b of the separating hyperplane |
How to write this in the exam
- If the question is about the meaning of a kernel: first write $k(x,x') = \phi(x)^\top \phi(x')$, then explain that it avoids the explicit mapping.
- If the question is about the RBF kernel: write $$k(x,x')=\exp\left(-\gamma \|x-x'\|^2\right)$$ and note that the closer two samples are, the larger the kernel value.
- If the question is about the decision boundary: write $$f(x)=\operatorname{sign}\left(\sum_i \alpha_i y_i k(x_i,x)+b\right)$$
- If the question asks why we standardize: because SVMs, especially with the RBF kernel, rely on distances, and inconsistent scales distort the distance structure.
考点变形
**C变大**:更强调训练集分类正确,边界更“硬”,过拟合风险上升。**gamma变大**:RBF 核作用范围变小,边界会更弯曲。- 线性核 vs RBF 核:线性核适合近似线性可分;RBF 核更适合非线性边界。
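A sketch of tuning C and gamma jointly by cross-validation, reusing SVC and X_train/y_train from the block above (the grid values are hypothetical):

```python
from sklearn.model_selection import GridSearchCV

param_grid = {'C': [0.1, 1, 10, 100], 'gamma': [0.01, 0.1, 1, 10]}
gs = GridSearchCV(SVC(kernel='rbf'), param_grid, cv=5)
gs.fit(X_train, y_train)
print(gs.best_params_, gs.best_score_)   # best C/gamma trade-off found by 5-fold CV
```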