Machine Learning for Big Data: Exercises for Lecture 2

(k-NN, Linear and Logistic Regression)

Question 1 (requires some calculation, which is probably best dealt with by implementing a small

program). The following represents a 2-dimensional data set with two numerical features and a

binary class label.

1. Do you recognise any issues with the data set? If so, apply some preprocessing before proceeding

to the next part of the question.

2. Consider now the unlabelled point

x = (x1, x2) = (4.41, 25.0),

which we would like to classify. Perform a k-nearest neighbour search for k = 1, 3, 5 and use

majority vote to determine if x should get label -1 or +1. If you like, you can also experiment

with different distance functions.
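
The two steps above (preprocessing, then k-nearest-neighbour voting) can be sketched in a few lines of Python. This is a minimal sketch, not a reference solution: it assumes that the preprocessing step is feature standardisation and that the distance is Euclidean, both of which are choices rather than requirements of the exercise. The names `fit_scaler`, `scale` and `knn_predict` are introduced here for illustration, and the toy points are hypothetical placeholders, not the data set of the exercise.

```python
import math
from collections import Counter


def fit_scaler(points):
    """Compute the per-feature mean and (population) standard deviation."""
    n = len(points)
    dims = range(len(points[0]))
    means = [sum(p[d] for p in points) / n for d in dims]
    stds = [math.sqrt(sum((p[d] - means[d]) ** 2 for p in points) / n) for d in dims]
    return means, stds


def scale(point, means, stds):
    """Standardise one point; a constant feature (std 0) is mapped to 0."""
    return tuple((v - m) / s if s > 0 else 0.0 for v, m, s in zip(point, means, stds))


def knn_predict(train_points, train_labels, query, k):
    """Majority vote among the k training points closest to query (Euclidean)."""
    order = sorted(range(len(train_points)),
                   key=lambda i: math.dist(train_points[i], query))
    votes = Counter(train_labels[i] for i in order[:k])
    return votes.most_common(1)[0][0]


# Hypothetical toy data, NOT the data set from the exercise.
toy_points = [(1.0, 10.0), (1.2, 12.0), (0.8, 11.0),
              (3.0, 30.0), (3.2, 28.0), (2.9, 31.0)]
toy_labels = [-1, -1, -1, +1, +1, +1]
toy_query = (2.8, 29.0)

# Scale the query with the statistics of the training data, not its own.
means, stds = fit_scaler(toy_points)
scaled_points = [scale(p, means, stds) for p in toy_points]
scaled_query = scale(toy_query, means, stds)

for k in (1, 3, 5):
    print(k, knn_predict(scaled_points, toy_labels, scaled_query, k))
```

With two classes and odd k a vote cannot tie, which is one reason the values k = 1, 3, 5 are convenient. To experiment with another distance function, replace `math.dist` in the sort key, for example by a sum of absolute coordinate differences.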

Question 2. Consider the following data set, which records several phone calls of clients to a company.

1. Fit the data to a linear regression model and compute the R^2-value.

2. Consider a client that calls the company for 90 minutes. Can you estimate the number of

purchased items?

3. Somebody tells you that a recent client purchased 3 items, but you do not know the length of the

call. Can you estimate the length of the call?
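
The three parts of Question 2 can be sketched as follows. This is a minimal sketch under stated assumptions: the explanatory variable is taken to be the call length in minutes and the response the number of purchased items, the helper `fit_line` is a name introduced here, and the values in `toy_minutes` and `toy_items` are hypothetical placeholders, not the data set of the exercise.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = a + b * x; returns (a, b, r_squared)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx                      # slope
    a = mean_y - b * mean_x            # intercept
    ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return a, b, 1.0 - ss_res / ss_tot


# Hypothetical toy data, NOT the data set from the exercise.
toy_minutes = [10, 20, 30, 40]
toy_items = [1, 2, 2, 3]

# Part 1: fit items as a function of minutes and report R^2.
a, b, r2 = fit_line(toy_minutes, toy_items)

# Part 2: predict the number of items for a 90-minute call.
items_at_90 = a + b * 90

# Part 3, option A: invert the fitted line to get minutes from items.
minutes_inverted = (3 - a) / b

# Part 3, option B: fit a second regression of minutes on items.
c, d, _ = fit_line(toy_items, toy_minutes)
minutes_regressed = c + d * 3

print(a, b, r2, items_at_90, minutes_inverted, minutes_regressed)
```

The two options for part 3 generally give different answers, because least squares minimises errors in the response variable only; which one is more appropriate is part of the question. If 90 minutes lies outside the range of observed call lengths, the prediction in part 2 is an extrapolation and should be read with corresponding caution.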

Question 3. A medical study examined 300 people with high blood pressure and 200 people with low

blood pressure. During the period of the study, 30 people in the low-blood-pressure group and

100 in the high-blood-pressure group suffered from cardiovascular disease (heart disease).

1. Sketch what the data set looks like (i.e., explanatory variables and outcomes).

2. Apply logistic regression.

3. Consider now the classification problem. How would you classify a person with high blood pressure?

How could you try to refine your prediction?
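
One possible way to set up Question 3 is sketched below. It assumes a single binary explanatory variable x (0 = low blood pressure, 1 = high blood pressure) and the model P(disease | x) = sigmoid(b0 + b1 * x); the variable names and the decision threshold of 0.5 are choices made for this sketch. With two parameters and only two groups, the maximum-likelihood fit reproduces the observed group proportions, so the coefficients can be read off from the counts without an iterative solver.

```python
import math


def logit(p):
    """Log-odds of a probability p with 0 < p < 1."""
    return math.log(p / (1.0 - p))


def sigmoid(z):
    """Inverse of logit: maps a log-odds value back to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


# Counts taken from the study description.
n_low, sick_low = 200, 30
n_high, sick_high = 300, 100

# Closed-form maximum-likelihood estimates for one binary explanatory variable.
b0 = logit(sick_low / n_low)             # log-odds of disease in the low group
b1 = logit(sick_high / n_high) - b0      # log odds ratio, high vs. low

p_low = sigmoid(b0)                      # fitted P(disease | low blood pressure)
p_high = sigmoid(b0 + b1)                # fitted P(disease | high blood pressure)
odds_ratio = math.exp(b1)

# Classification with an assumed threshold of 0.5 on the fitted probability.
label_high = 1 if p_high >= 0.5 else 0

print(b0, b1, p_low, p_high, odds_ratio, label_high)
```

Note that the fitted probability for the high-blood-pressure group equals the observed proportion 100/300, which is below 0.5, so the predicted label under this threshold depends only on the group proportion and not on any individual detail.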
