
Date: 2020-05-05 10:56

RMHI Assignment
Your name goes here
Hi students! This document is your assignment for RMHI.
TO DO
1. Replace Your name goes here above with your name and student ID. Don’t delete the quotation marks!
2. Save this document as studentID-assignment.Rmd.
3. While we encourage collaboration in tutorials and learning in general, you should not be collaborating with anybody AT ALL for this assignment. That means sharing code privately or publicly or even talking about what you’re answering for different problems will effectively be collusion. You should be completing it independently, with no help from any other person in any capacity. Of course, as always, you are free to use any of the resources from the class to help you, and you’re also free to google (as long as you aren’t asking anybody specific questions about this assignment).
4. A plagiarism check is enabled and you can check the similarity report on your submission. However, please understand that we will not be naively looking at the overall % figure: with this sort of assignment a certain amount of overlap is inevitable, so don’t freak out if you get what looks like a high % score. Probably most people will. We will be using the plagiarism check for the parts of the assignment where we’d expect some variability, and to give a general sense of the overall gestalt.
5. Complete all of the problems below. Do not change any of the arguments to the code chunks, like the names of the code chunks or where it says echo=FALSE or whatever. If a problem asks you to display a dataframe/tibble or whatever so it shows up in the knitted version, make sure that you do, as the marker cannot evaluate it without seeing it, and if they can’t see it then you may lose points! Remember that to display a dataframe/tibble (or any variable) you just type its name on a line of its own within the R code.
6. I’ve structured this so that, as much as possible, questions do not build on each other. That means that if, say, you can’t get Q5 then you can still get Q6. Try to do all of them.
7. Go for partial credit! Many of these questions have some form of partial credit possible. What that means is that if it is asking for some R code, break down the problem into pieces. Even if you can only do some of the pieces, or do them part of the way, that will be worth something. [Note that there is no question-by-question rubric available because designing one would mean giving away the answers. In general we will give full credit for responses that correctly address all of the parts of the question.]
8. If the question is a short answer question (SAQ), it specifies a word count. Unfortunately there is no way that I know of in R Markdown to do a word count. What you will have to do is type up your answer in Word and then cut and paste it into the R Markdown file and then put the word count in brackets at the end. I know that’s annoying; sorry. Anything else I thought of, like specifying a number of sentences or having no limit, was worse in terms of equity across students. SAQs are also worth partial credit and are generally asking for some thoughtful interpretation. If it is based on a previous graph or analysis you’ve done, if you made the graph wrong but analysed it well, you can still get most or all points for the SAQ part. The word counts I’ve specified in each question are upper limits and designed so that you can answer completely and correctly within that word count, sometimes substantially within (that is, I’ve generally tried to overestimate how much you’ll need so if you feel you’ve answered it fully without using all of the words allocated, that is fine and probably to be expected in many cases).
9. There is no word count for code chunks. Word count only applies to the SAQs. There is thus also no total word count for the assignment as a whole; you really need to only specify that for the SAQs.
10. You’ll be turning in the knitted output of this R Markdown file. We prefer that you knit to Word but if you can’t get Word to knit then pdf or html is okay. In the worst case, you can turn in the completed Rmd file.
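As a side note on the word counts mentioned in the list above, a rough count can also be obtained in the R console. The following is only a sketch: it assumes words are separated by whitespace, the helper count_words and the sample text are hypothetical, and the figure may differ slightly from what Word reports.

```r
# Hypothetical helper: count whitespace-separated words in a string.
count_words <- function(text) {
  trimmed <- trimws(text)
  if (nchar(trimmed) == 0) {
    return(0L)
  }
  # Split on runs of whitespace and count the pieces.
  length(strsplit(trimmed, "\\s+")[[1]])
}

# Made-up example text, not taken from the assignment.
sample_answer <- "This is a short made-up answer."
n_words <- count_words(sample_answer)

# Typing a variable's name on a line of its own displays it.
n_words
```

Hyphenated words such as "made-up" count as one word here, which is one place where other tools may disagree.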
LFB’s story
For this assignment, we’re going to go back to meet up with LFB and hear her story. As you’ll recall, in Tutorial 4 she, Doggie, and Flopsy went on a mission to Otherland to steal some of their data. The mission was successful but LFB went missing! In your assignment we get to see what happened to LFB.

LFB is standing as lookout as Doggie and Flopsy enter the building. She is squinting through the darkness trying to see when she hears a rustle. Then another one, then another one, coming ever closer. Not wanting to raise the alarm prematurely, LFB holds still, but when she hears another rustle only meters from where she is, she whistles, giving the signal.
At the sound, the rustling bushes part and LFB has only a hasty glimpse of two very large shapes before they are on her! Two large hairy hands grab her and she has time for one more whistle, as loud and shrill as she can make it, before the hands cover her mouth and she is taken away.
Q1 [2% of total mark]
LFB is, quite naturally, terrified. Shadow taught her a good calming mantra to use in situations like this: imagine she is writing R code using operators and logical connectives. (Yes, both she and Shadow are strange). Unfortunately she is a little too scared to be good at it. Help her out here. In the code chunk below, use the operators and logical connectives to ask whether 5 squared is less than 20 OR the square root of 25 is equal to 5 (or both). That is, your code should return TRUE if either 5 squared is less than 20 OR the square root of 25 is equal to 5, or both are true.
# insert your code here
Q2 [2% of total mark]
After a short and frightening journey through the dark, LFB is taken into a building where the people holding her have a short, terse whispered conversation. She is unceremoniously put into a small room that looks like a library, but has very large chairs and books. She has to jump just to get up to the chair and reach the door handle. It is locked.
After about a half an hour of worry, the door opens and seven people come in. The first is an enormous bear, larger than anybody LFB has ever seen; she suspects that is who grabbed her with the very large hands. He is followed by an owl, a small unicorn with a rainbow mane, a hippo, and a cute penguin carrying a snake. The entire group is trailed by what looks, to LFB’s astonishment, to be a sentient guitar (that’s right, a musical instrument that can walk and talk). This is a very strange place, thinks LFB.
The seven strangers introduce themselves to LFB. Their names are in the vector called names below, along with their species. (a) Is it a character vector, numeric vector, or logical vector? (b) In the chunk below, using one additional line of R code, select the 2nd and 6th items (i.e., the owl and penguin) out of names and assign them to a new vector called birds.
names <- c(bear="super size", owl="big wol", unicorn="rainbow", hippo="hugo",
snake = "sissily", penguin="little blue", guitar="kevin")

# insert your code here
ANSWER: (a) Put your answer here.
LFB is feeling a little calmer now that it appears nobody is going to try to kill her on sight. Still, they seem rather suspicious (not that she blames them, really).
“What’s your name?” the giant bear, Super Size, asks.
“LFB,” she answers, trembling.
“What kind of name is LFB?” asks Kevin, the guitar.
LFB bites her tongue and narrowly avoids asking what kind of guitar is named Kevin, and just says “It stands for Lovable Fluffy Bunny. My mum named me.”
The unicorn fluffs her tail and says “It’s a lovely name. I like it,” and glares at Kevin.
“What are you doing here?” the giant owl asks.
LFB tells everyone the whole story – how they fear they are running out of food, and they wanted to see if the Others were stealing it (at this point LFB trembles a little bit more) or were having similar problems. As she gets into the story, she can’t help but notice that most of her listeners seem stunned. The penguin (Little Blue, LFB reminds herself) whispers enthusiastically to the unicorn (Rainbow, LFB thinks) several times during her explanation. When she stops, there is a long silence.
“How do we know you’re telling the truth?” the snake, Sissily, finally asks.
“I… don’t know,” LFB says. “I am, I swear.”
“Rainbow and I have an idea,” says the penguin. “We can give her the TRUST scale, and that will help.”
“Uh... what is that?” LFB asks.
“You don’t have that?” Kevin asks with some incredulity. “Even better, it means you won’t be able to fool the test.”
“T.R.U.S.T. stands for Trust, Rightness, Uprightness, and Straightforwardness Test,” says Sissily. “It’s got lots of questions, all of them designed to measure how trustworthy you are.”
“All right,” says LFB, a bit nervous. (She’s always been fond of standardised tests, but never have the stakes been so high!)
Q3 [3% of total mark]
Consider the T.R.U.S.T. scale. In 100 words or less, explain what (a) the construct being measured, (b) the measure, and (c) the observation(s) are.
ANSWER: Put your answer here. [Word count: ]
Q4 [2% of total mark]
The dt dataset, which has been loaded for you already, contains LFB’s scores on the T.R.U.S.T. scale. Each row consists of one of the fifty questions. The column lfb contains her score on that question and the column meanNorm contains the mean scores that most people get on that question. In general, a higher score on a question means one is more trustworthy according to that question; scores for all questions have a minimum possible of 1 and a maximum possible of 20.
Select just the first half of the dataset (i.e., the first 25 rows) and assign it to a variable called firsthalf.
# insert your code here
Q5 [10% of total mark]
(a) Using dt, report the mean, median, mode, standard deviation, and IQR of both LFB’s scores and the normed scores (in meanNorm) across all 50 questions. (b) In 100 words or less, would you say LFB is generally more trustworthy than the average or less? Justify your answer with reference to at least three of these reported values, stating what each shows.
Note: Your code chunk should include all of the code you used to calculate these answers, but you should additionally answer in the blank answer spaces below (we want to see not just that you can run the code but that you can extract the relevant information).
# insert your code here
(a) ANSWER:
Mean: LFB: ___ meanNorm: ___
Median: LFB: ___ meanNorm: ___
Mode: LFB: ___ meanNorm: ___
Standard deviation: LFB: ___ meanNorm: ___
IQR low end: LFB: ___ meanNorm: ___
IQR high end: LFB: ___ meanNorm: ___
ANSWER: (b) Put your answer here. [Word count: ]
Q6 [4% of total mark]
Make a new variable in dt called diff which is lfb’s scores on each row minus the meanNorm on that row (thus a positive number in diff means that LFB scored higher on that question than the norm). On how many questions did LFB score equal to or below the norm? [Note: in class we covered several ways to make a new variable. For this question, any of them will work.]
# insert your code here
ANSWER: Put your answer here.
Upon seeing the results of LFB’s test, most of the Others (aside from Kevin and Sissily) are tentatively willing to trust her story. After a long, whispered conference amongst themselves, Rainbow steps forward and unties LFB.
“We’ve been having food problems ourselves,” she confides quietly. “We haven’t known what to do about it, and are pretty worried.”
“Maybe I could help?” LFB offers. “I mean, I don’t know much, but perhaps if we compare problems we’ll be able to figure out what’s going on. I’ll tell you what I know about our situation too.”
LFB shares her survey data and the Others share their food data that you went over in the tutorials, and everyone agrees that there is a problem.
“The thing is,” Super Size observes (everyone is now very companionable and speaking frankly), “I fear that this is having a lot of bad indirect effects on everyone’s health and wellness.”
“Do you have any data about that?” LFB asks, curious. “We’ve found that doing surveys has been really helpful for understanding what’s going on.”
There is a long silence, and then Little Blue volunteers: “I’ve been asking people questions about their feelings and hunger for a project of my own. I could show you the data.”
Everyone is enthusiastic, and Little Blue loads her data up. It is in the datafile called do which has already been loaded for you too. The dataset contains the following columns:
name: the name of the person being surveyed
species: the species of the person being surveyed
size: the size of the person being surveyed (small, medium, large, enormous)
time: time point 1, 2, or 3 (Little Blue asked each of the questions three times, so this variable indicates which time the answer corresponds to. This means that each person contributed three rows to the dataset, one for each time point)
hunger: rating of their current level of hunger on a scale of 1-10 (10 is high)
fear: rating of their current level of fear on a scale of 1-10 (10 is high)
anxiety: rating of their current level of anxiety on a scale of 1-10 (10 is high)
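To make this layout concrete, here is a small made-up data frame with the same columns. It is only a sketch: do_toy and every name and value in it are invented for illustration and are not Little Blue’s data.

```r
# Made-up stand-in for the do dataset: two people, three time points each.
do_toy <- data.frame(
  name    = rep(c("alex", "sam"), each = 3),
  species = rep(c("bear", "bunny"), each = 3),
  size    = rep(c("enormous", "small"), each = 3),
  time    = rep(1:3, times = 2),
  hunger  = c(3, 5, 7, 2, 4, 6),
  fear    = c(2, 3, 5, 4, 4, 6),
  anxiety = c(1, 4, 6, 3, 5, 5),
  stringsAsFactors = FALSE
)

# One row per person per time point.
do_toy

# Each person should appear exactly three times, once per time point.
rows_per_person <- table(do_toy$name)
rows_per_person
```

Because every person contributes three rows, any count of people or species taken over all rows would be three times too large, which is why it matters which rows are used when counting.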
“How far apart were the three time points?” Rainbow asks after they all take a look.
“Six months each,” says Little Blue. “Time point 3 was just this week, time point 2 was six months ago, and time point 1 was a year ago.”
Q7 [6% of total marks]
First let’s get a sense of what species and sizes are represented in this dataset. (a) So that you’re not triple counting everyone, make a smaller dataset called dunique which contains only the rows from time point 3. (b) Using dunique, make a table that shows the cross-tabulation of species by size. Use the kable() function to make it look nice, with an appropriate caption.
# insert your code here
LFB looks at the chart. “There is a sentient string in Otherland?” she asks incredulously.
Kevin looks up, miffed. “That’s my best friend, Kevin Clark,” he says. “What, do you think a string can’t be intelligent? Or a guitar?”
“No, no, just curious,” LFB backpedals hastily. “All good.”
Rainbow whispers to her, “We don’t understand it either. Just go with it.”
Super Size clears his enormous throat. “Ahem. So now you have a sense of the kinds of species in our dataset. That’s reasonably representative of Otherland, I would say.”
Sissily nods. “Yes. Mostly birds, bears, and bunnies, with a bunch of other things too.”
“Alright, so let’s have a look at the data,” LFB says eagerly.
Q8 [7% of total mark]
As a first pass, let’s just see how hunger, fear, and anxiety are changing over time. Use your grouping and summary functions to group by time and calculate the mnHunger (the mean hunger rating for each of the three time intervals), mnFear (the mean fear rating for each of the three time intervals), and mnAnxiety (the mean anxiety ratings for each of the three time intervals). Put the result in a new tibble called dtime, and display dtime. Does it look to you like hunger, fear, and anxiety are going up between time 1 and time 3?
# insert your code here
ANSWER: Put your answer here.
Q9 [8% of total mark]
Summary statistics are useful, but you learn more from figures. As a first step, let’s graph how hunger, fear, and anxiety change with time. To do this, you’ll need to use the dataset doLong, which is basically just the do dataset that I used the pivot_longer() function to put into long form. (In a subsequent problem you’ll do this yourself, but I didn’t want your ability to graph this to depend on your ability to make the long form dataset, so just use doLong for this). Anyway, you’ll note that doLong has a column called question which has three possible values (hunger, fear, anxiety) corresponding to the question that was asked of that person at that time. There is also a column called rating which has the answer (the rating) that person gave to that question.
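As an illustration of what “long form” means, the sketch below reshapes a tiny made-up wide table by hand in base R. The names wide_toy and long_toy and all of the values are invented, and this deliberately does not use pivot_longer(), so it shows the shape of the result rather than the method used to build doLong.

```r
# Made-up wide table: one row per person and time, one column per question.
wide_toy <- data.frame(
  name    = c("alex", "alex", "sam", "sam"),
  time    = c(1, 2, 1, 2),
  hunger  = c(3, 5, 2, 4),
  fear    = c(2, 3, 4, 4),
  anxiety = c(1, 4, 3, 5),
  stringsAsFactors = FALSE
)

questions <- c("hunger", "fear", "anxiety")

# Long form: one row per person, time, and question, with the answer in rating.
long_toy <- data.frame(
  name     = rep(wide_toy$name, times = length(questions)),
  time     = rep(wide_toy$time, times = length(questions)),
  question = rep(questions, each = nrow(wide_toy)),
  rating   = unlist(wide_toy[questions], use.names = FALSE),
  stringsAsFactors = FALSE
)

long_toy
```

The four wide rows become twelve long rows, one for each combination of row and question, which is the layout that lets a plot use question as a facet and rating as the y axis.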
Using doLong, make a boxplot with three facets (anxiety, fear, and hunger) where the x axis is time (1, 2, or 3) and the y axis is the rating. Make the colour of the boxes change depending on time point, use a nice theme, and remember to give your figure a title and axis labels. Finally, remove the legend (since it is redundant information) and add the data points themselves using geom_jitter.
# insert your code here
Q10 [4% of total marks]
The previous problem used the tibble doLong which was created and loaded for you. However, you have the knowledge to create your own. Do so. Hint: use do as the starting tibble and the pivot_longer function. Call your version myDoLong and display it so it shows up when you knit the document.
# insert your code here
Q11 [6% of total marks]
Now let’s also see how hunger changes with time and species size. To do this, we’re going to create a barplot, which means that we need a tibble that has the necessary summary information. That tibble has been loaded for you already (it is called dtsize) but this problem asks you to make it yourself. Call your version mydtSize and use the group_by() and summarise() functions on the dataset do to make your tibble so it is exactly the same as dtsize (hint: as a first step, take a look at the rows and columns of dtsize to figure out what you need to do). Display your tibble so it shows up when you knit the document.
# insert your code here
Q12 [10% of total marks]
Now we can make the barplot. Using dtsize, create a barplot that plots time on the x axis, mnHunger on the y axis, and the size of people in different facets as well as corresponding to different coloured bars. You should also make sure that: (a) your plot has a title and axis labels; (b) your plot uses a theme; (c) you show error bars corresponding to sderrHunger; (d) you make the bars semi-transparent; (e) you plot the individual datapoints (from do) using geom_jitter(); and (f) you use RColorBrewer to set the colours of both the bars and the individual datapoints according to a palette of your choice.
# insert your code here
Q13 [6% of total marks]
Make a plot of your own, with the goal of learning something new about the data that hasn’t been shown by the previous plots. Requirements: (a) it needs to involve a geom other than geom_col() or geom_boxplot(); (b) it needs an informative title and axis label; (c) you should use a theme; (d) it should involve more than one facet; (e) it should use colour.
# insert your code here
Q14 [6% of total marks]
Refer to your plot in question Q13. In 150 words or less, explain what it shows and what you think this reveals about this dataset.
ANSWER: Put your answer here. [Word count: ]
Q15 [6% of total marks]
The distribution of responses to the hunger question (in do) represents the sample distribution and its mean is the sample mean. Using it as an example, what would be the sampling distribution of the mean? Calculate and report the standard deviation of the sampling distribution of the mean. [Hint: the function nrow() will give you the number of rows in the dataset.] How is the sampling distribution of the mean related to the 95% confidence interval? Use 200 words or less to answer these questions.
# insert your code here
ANSWER: Put your answer here. [Word count: ]
“Wow, this is really great data,” says LFB. “It’s really lucky that you were here, Little Blue. What were the odds that of everybody in Otherland, I’d run into the one person who has been collecting the data that we need?”
Little Blue shrugs modestly. “It might not be that surprising,” she says. “A lot of people here in Otherland like doing surveys on this sort of thing. Maybe around 5%. It doesn’t seem extremely unlikely that in a random group of seven people, you’d find one person who had this data.”
Q16 [4% of total marks]
Is Little Blue correct? Calculate the probability of finding exactly one person with the relevant dataset out of a random collection of seven people, assuming that 5% of the total population has such a dataset.
# insert your code here
“What we really want to know is if any of this is significant,” says Super Size. “Are the levels of fear, anxiety, and hunger really different between time 1 and time 3?”
“To do that we need to run a statistical test,” says Sissily.
Q17 [5% of total marks]
Consider the hunger question. Define both the null and alternate statistical hypotheses you would use here to answer Super Size’s research question. If the p-value of your test was 0.0231, what would you conclude? Explain why, referring to a definition of p-value as you do so. Use 250 words or less to answer these questions.
ANSWER: Put your answer here. [Word count: ]
Q18 [3% of total marks]
Suppose you got a p-value of 0.083. Assuming an alpha of 0.05, what type of error (I or II) are you in danger of committing? Explain your answer in 200 words or less.
ANSWER: Put your answer here. [Word count: ]
Q19 [4% of total marks]
“I don’t like statistical tests,” Kevin says grumpily. “There’s so much chance for error, especially Type I error. I don’t understand why we don’t just set alpha really low, like 0.01 or even zero, and have a much lower chance of error.”
What would you say to Kevin, in 200 words or less?
ANSWER: Put your answer here. [Word count: ]
Q20 [2% of total marks]
This one is a freebie - any answer is fine as long as you answer it. What is your favourite character in Bunnyland, and why?
ANSWER: Put your answer here.
