Date: 2023-10-05 07:48

Week 2 Homework
Homework
(Level 1: high-level understanding) (10 pt)
• Which of the following can describe SVM? (select all that apply)
  • It is a supervised learning algorithm.
  • It is an unsupervised learning algorithm.
  • It can be used to solve a classification problem.
  • The algorithm is based on the concept of finding a hyperplane that separates
    the data with a maximal margin, in order to classify future data points with
    more confidence.
• Which of the following problems is SVM best suited for? (select one)
  • Predict housing prices based on location, size, and other factors.
  • Classify images, such as identifying the contents of a photograph.
  • Group customers according to their purchasing behavior.
  • Extract important features from a dataset, allowing for more accurate analysis.
Homework
(Level 1: high-level understanding) (5 pt)
• What is the purpose of the training set, validation set, and testing set
  in machine learning? (Match one purpose to each set.)
  • To evaluate the performance of the model on unseen data
  • To fine-tune the model's parameters and learn the underlying patterns in the
    data
  • To evaluate the performance of the model during the training process
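As a minimal sketch of how the three sets are typically carved out of one labeled dataset, the snippet below shuffles sample indices and partitions them into disjoint subsets. The 60/20/20 proportions, the function name `three_way_split`, and the seed are illustrative assumptions, not part of the assignment.

```python
import numpy as np


def three_way_split(n_samples, val_fraction=0.2, test_fraction=0.2, seed=0):
    """Return disjoint index arrays (train, validation, test) for n_samples items."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_samples)          # shuffle once so the subsets are random
    n_test = int(round(test_fraction * n_samples))
    n_val = int(round(val_fraction * n_samples))
    test_idx = idx[:n_test]
    val_idx = idx[n_test:n_test + n_val]
    train_idx = idx[n_test + n_val:]          # everything left over is used for training
    return train_idx, val_idx, test_idx


train_idx, val_idx, test_idx = three_way_split(100)
print(len(train_idx), len(val_idx), len(test_idx))
```

Because the three index arrays never overlap, no sample used for fitting can leak into either evaluation set.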
Homework
(Level 1: high-level understanding) (5 pt)
• Which data-related ML problem does each of the following scenarios illustrate?
  1. We are trying to predict the price of a house, but the training data includes
     irrelevant information such as the color of the house or the name of the previous
     owner.
  2. We are trying to predict the success of a new product based on customer data,
     but we only have a small amount of historical customer data.
  3. We are trying to predict customer behavior, but the data contains a lot of missing
     values or errors.
  4. We are trying to predict customer behavior, but the training data only includes
     data from a single demographic group.
• The choices are (one each):
  a. insufficient quantity of training data
  b. nonrepresentative training data
  c. poor-quality data
  d. irrelevant features
Homework
(Level 2: manual exercise) (20 pt)
• Given the following dataset of labeled points in two dimensions, use a
  support vector machine to find the best separating hyperplane.
  (Note: please use a hard margin. High-school geometry should be
  sufficient; there is no need to solve the quadratic programming (QP) optimization problem.)
  • Positive samples: {(3,3), (4,4)}
  • Negative samples: {(2,1), (1,2)}
• Use the constructed hyperplane to predict the class of a new data
  point at (2, 2.4).
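One way to sanity-check a hand-derived answer is to fit a linear SVM numerically and compare. The sketch below uses scikit-learn's `SVC` with a linear kernel. Since `SVC` has no true hard-margin mode, a very large `C` (here 1e6, an arbitrary choice) is assumed to approximate one on this separable dataset.

```python
import numpy as np
from sklearn.svm import SVC

# The four labeled points from the exercise.
X = np.array([[3, 3], [4, 4], [2, 1], [1, 2]], dtype=float)
y = np.array([1, 1, -1, -1])

# Assumption: a very large C makes slack so expensive that the fit
# behaves like a hard-margin SVM when the data are separable.
clf = SVC(kernel="linear", C=1e6)
clf.fit(X, y)

w = clf.coef_[0]                          # normal vector of the hyperplane
b = clf.intercept_[0]                     # offset, so the boundary is w . x + b = 0
margin_width = 2.0 / np.linalg.norm(w)    # distance between the two margin lines

print("w =", w, "b =", b)
print("support vectors:\n", clf.support_vectors_)
print("margin width =", margin_width)

new_point = np.array([[2.0, 2.4]])
print("decision value:", clf.decision_function(new_point)[0])
print("predicted class:", clf.predict(new_point)[0])
```

The fitted `w` and `b` are scaled so that the support vectors have decision values of +1 or -1. A hand-derived line may differ by a constant factor and still describe the same boundary.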
Homework
(Level 3: extension of the basic algorithm) (20 pt)
• There may be outliers or noise in the data from real-world applications.
• To address this issue, a soft margin can be used in a modified optimization
  problem, known as a soft-margin SVM:
  • Objective: $\min \; \frac{1}{2} \lVert w \rVert^2 + C \sum_i \xi_i$
  • Constraint: $y_i (w \cdot x_i + b) \ge 1 - \xi_i$ and $\xi_i \ge 0$
  • $\xi_i$ is the slack, which allows $x_i$ to be inside the margin.
  • SVM without the slacks is known as hard-margin SVM.
• Where is $x_i$ relative to the margin when its $\xi_i$ value is 0?
• Where is $x_i$ relative to the margin when $0 < \xi_i \le 1$?
• Where is $x_i$ relative to the margin when $\xi_i > 1$?
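For a fixed hyperplane, the smallest slack that satisfies the soft-margin constraint is $\xi_i = \max(0, 1 - y_i (w \cdot x_i + b))$. The sketch below evaluates this for a handful of points so that the slack can be compared with each point's position. The hyperplane $x_1 + x_2 - 3 = 0$ and the sample points are hypothetical values chosen for illustration only.

```python
import numpy as np


def slack(w, b, X, y):
    """Smallest xi_i >= 0 satisfying y_i * (w . x_i + b) >= 1 - xi_i."""
    functional_margin = y * (X @ w + b)
    return np.maximum(0.0, 1.0 - functional_margin)


# Hypothetical hyperplane and positive-class points (not from the assignment).
w = np.array([1.0, 1.0])
b = -3.0
X = np.array([[3.0, 2.0], [2.0, 2.0], [2.0, 1.5], [1.5, 1.5], [1.0, 1.0]])
y = np.ones(len(X))

xi = slack(w, b, X, y)
for point, value, s in zip(X, y * (X @ w + b), xi):
    # Print y * f(x) next to the slack so the three regimes can be compared.
    print(f"x = {point}, y*f(x) = {value:+.2f}, xi = {s:.2f}")
```

Reading the printed `y*f(x)` column against the `xi` column shows how the slack grows as a point moves from the correct side of its margin line toward, and then past, the decision boundary.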
Homework
(Level 4: computer-based exercise) (20 pt)
• (Use HW2-4.ipynb as the template.)
• This is related to HW1-4.
• Use the popular scikit-learn package for SVM.
• Assuming (0, 1, 1) is the ground truth of the decision boundary, create 40 unique
  samples (20 positive and 20 negative).
• First, evenly split the 40 samples into two sets: training samples and
  testing samples.
• Second, train a hard-margin SVM using 100% of the training samples, and test its accuracy on the unseen testing samples. (Repeat 10 times and report the average accuracy.)
• Third, train a hard-margin SVM using 60% of the training samples (e.g., 6 positive and 6
  negative samples), and test its accuracy on the unseen testing samples. (Repeat 10 times
  and report the average accuracy.)
• Compare your results for PLA vs. SVM. What do you observe?
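The experiment above could be sketched as follows. This is not the HW2-4.ipynb template, and it rests on several assumptions: (0, 1, 1) is read as (w0, w1, w2), giving the boundary x1 + x2 = 0; points are drawn uniformly from [-1, 1]^2; each repetition draws a fresh dataset; and a hard margin is approximated by `SVC` with a very large `C`. The helper names are hypothetical.

```python
import numpy as np
from sklearn.svm import SVC

W_TRUE = np.array([0.0, 1.0, 1.0])  # assumed layout (w0, w1, w2): boundary x1 + x2 = 0


def make_samples(rng, n_per_class=20):
    """Draw points in [-1, 1]^2 until each class has n_per_class samples.

    Continuous uniform draws are distinct with probability 1, so the samples are unique.
    """
    pos, neg = [], []
    while len(pos) < n_per_class or len(neg) < n_per_class:
        x = rng.uniform(-1.0, 1.0, size=2)
        score = W_TRUE[0] + W_TRUE[1:] @ x
        if score > 0 and len(pos) < n_per_class:
            pos.append(x)
        elif score < 0 and len(neg) < n_per_class:
            neg.append(x)
    X = np.vstack(pos + neg)
    y = np.array([1] * n_per_class + [-1] * n_per_class)
    return X, y


def run_trial(rng, train_fraction):
    """One repetition: split evenly per class, train on a fraction, score on the test half."""
    X, y = make_samples(rng)
    pos_idx = rng.permutation(np.flatnonzero(y == 1))
    neg_idx = rng.permutation(np.flatnonzero(y == -1))
    half = len(pos_idx) // 2                      # 10 per class in each half
    n_used = int(round(train_fraction * half))    # e.g. 6 per class for 60%
    train_idx = np.concatenate([pos_idx[:n_used], neg_idx[:n_used]])
    test_idx = np.concatenate([pos_idx[half:], neg_idx[half:]])
    clf = SVC(kernel="linear", C=1e6)             # large C approximates a hard margin
    clf.fit(X[train_idx], y[train_idx])
    return clf.score(X[test_idx], y[test_idx])


def average_accuracy(train_fraction, n_repeats=10, seed=0):
    rng = np.random.default_rng(seed)
    return float(np.mean([run_trial(rng, train_fraction) for _ in range(n_repeats)]))


acc_full = average_accuracy(1.0)
acc_partial = average_accuracy(0.6)
print(f"100% of training samples: {acc_full:.3f}")
print(f" 60% of training samples: {acc_partial:.3f}")
```

Splitting per class keeps both halves balanced, which matches the "6 positive and 6 negative" example in the assignment. A plain random split would be an equally reasonable reading of "evenly split".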
Homework
(Level 5: computer-based exercise) (20 pt)
• (Use HW2-5.ipynb as the template.)
• Use LIBSVM (https://www.csie.ntu.edu.tw/~cjlin/libsvm/)
  • You need to install libsvm-official. (In Colab, to install a new Python package, put “!” in front of the command line.)
• Follow “A Practical Guide to Support Vector Classification”:
  https://www.csie.ntu.edu.tw/~cjlin/papers/guide/guide.pdf
• Use the A.1 Astroparticle Physics dataset: svmguide1 and svmguide1.t
• Use the radial basis function (RBF) kernel (i.e., t = 2).
• First, train the model without scaling the dataset, with γ = 2 and C = 32, then report your prediction accuracy on the testing data.
• Second, scale the datasets with default parameters, train the model, and then report your prediction
  accuracy on the testing data.
  • What do you observe? Why?
• Third, change C to 2, 8, 32, 128, and 512, repeat the model training (using the scaled datasets), and report
  the prediction accuracy on the training data and on the testing data.
  • What do you observe across different values of C? Why?
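The shape of this experiment can be previewed without the svmguide1 files. The sketch below is a stand-in, not the assignment's method: it uses scikit-learn's `SVC` (which is built on LIBSVM) instead of the libsvm-official package, and a synthetic two-feature dataset instead of svmguide1, so its numbers will not match the guide. It also assumes the scaled run keeps γ = 2 and C = 32 to isolate the effect of scaling, and it uses `MinMaxScaler` with range [-1, 1] to mirror the default range of LIBSVM's scaling tool.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)


def make_data(n):
    """Hypothetical stand-in data: two informative features on very different scales."""
    big = rng.uniform(0.0, 1000.0, size=n)
    small = rng.uniform(0.0, 1.0, size=n)
    X = np.column_stack([big, small])
    y = np.where(big / 1000.0 + small > 1.0, 1, -1)
    return X, y


X_train, y_train = make_data(400)
X_test, y_test = make_data(400)

# Step 1: RBF kernel on the raw, unscaled features.
raw_model = SVC(kernel="rbf", gamma=2.0, C=32.0).fit(X_train, y_train)
acc_raw = raw_model.score(X_test, y_test)

# Step 2: scale to [-1, 1] using ranges learned from the training set only,
# then apply the same transform to the testing set.
scaler = MinMaxScaler(feature_range=(-1, 1)).fit(X_train)
Xs_train = scaler.transform(X_train)
Xs_test = scaler.transform(X_test)
scaled_model = SVC(kernel="rbf", gamma=2.0, C=32.0).fit(Xs_train, y_train)
acc_scaled = scaled_model.score(Xs_test, y_test)

print(f"unscaled test accuracy: {acc_raw:.3f}")
print(f"scaled test accuracy:   {acc_scaled:.3f}")

# Step 3: sweep C on the scaled data and record training and testing accuracy.
results = []
for C in (2, 8, 32, 128, 512):
    model = SVC(kernel="rbf", gamma=2.0, C=C).fit(Xs_train, y_train)
    results.append((C, model.score(Xs_train, y_train), model.score(Xs_test, y_test)))
    print(f"C = {C:>3}: train = {results[-1][1]:.3f}, test = {results[-1][2]:.3f}")
```

Fitting the scaler on the training set and reusing it on the testing set follows the guide's advice to scale both sets with the same factors.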
