Assignment Task
Questions
Part One – small questions
1. Use the nycflights13 package and the flights data frame to answer the following questions:
1.1 Which month had the highest proportion of canceled flights (treat a flight as canceled when the arr_delay variable is NA)? Which month had the lowest? Plot the proportion of canceled flights for each month and interpret any seasonal patterns.
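A minimal sketch for 1.1, assuming the dplyr and ggplot2 packages are installed alongside nycflights13. It follows the question's definition (a missing arr_delay counts as a cancellation) and computes the share of such flights per month. The object names cancel_by_month, highest, lowest and p are illustrative.

```r
library(nycflights13)
library(dplyr)
library(ggplot2)

# A flight counts as canceled when arr_delay is NA, as the question defines it
cancel_by_month <- flights %>%
  group_by(month) %>%
  summarise(prop_canceled = mean(is.na(arr_delay)), .groups = "drop")

# Month(s) with the highest and lowest proportion (ties are kept)
highest <- slice_max(cancel_by_month, prop_canceled, n = 1)
lowest  <- slice_min(cancel_by_month, prop_canceled, n = 1)
print(highest)
print(lowest)

# Bar chart of the monthly proportion, to inspect seasonal patterns
p <- ggplot(cancel_by_month, aes(x = factor(month), y = prop_canceled)) +
  geom_col() +
  labs(x = "Month (2013)", y = "Proportion canceled (arr_delay is NA)")
print(p)
```

Note that arr_delay can also be missing for flights that departed but were diverted, so this proportion may slightly overstate true cancellations; the sketch simply follows the definition given in the question.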
1.2 Which plane (identified by the tailnum variable) flew the most times from the New York City airports (JFK, LGA, or EWR) in 2013? Plot that plane's number of trips per week.
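A sketch for 1.2, assuming dplyr and ggplot2 are available. Two choices here are assumptions rather than part of the question: only flights with a non-missing arr_delay are counted as trips (matching the cancellation rule in 1.1), and weeks are numbered with base R's "%U" format (weeks starting on Sunday). The names flown, top_plane and weekly are illustrative.

```r
library(nycflights13)
library(dplyr)
library(ggplot2)

# Assumption: a "trip" is a flight that actually flew (arr_delay not NA).
# flights only contains 2013 departures from EWR, JFK and LGA.
flown <- flights %>%
  filter(!is.na(tailnum), !is.na(arr_delay))

# Tail number with the most trips
top_plane <- flown %>%
  count(tailnum, sort = TRUE) %>%
  slice(1) %>%
  pull(tailnum)

# Week of the year (00-53, Sunday as first day) taken from the scheduled hour
weekly <- flown %>%
  filter(tailnum == top_plane) %>%
  mutate(week = as.integer(format(time_hour, "%U"))) %>%
  count(week)

p <- ggplot(weekly, aes(x = week, y = n)) +
  geom_line() +
  geom_point() +
  labs(x = "Week of 2013", y = "Trips",
       title = paste("Weekly trips for", top_plane))
print(p)
```

Weeks with no trips do not appear as rows in weekly; tidyr::complete() could fill them in with zeros if a continuous series is preferred.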
2. Use the Lahman package and the Teams data frame to answer the following questions:
2.1 Define two new variables in the Teams data frame: batting average (BA) and slugging percentage (SLG). Batting average is the ratio of hits (H) to at-bats (AB); slugging percentage is total bases divided by at-bats. Total bases count 1 for each single, 2 for each double, 3 for each triple, and 4 for each home run.
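A sketch of the two definitions in 2.1, assuming dplyr is installed. The Teams data frame has no singles column, so singles are derived as hits minus doubles (X2B), triples (X3B) and home runs (HR); X1B and TB are helper columns introduced here for readability.

```r
library(Lahman)
library(dplyr)

Teams <- Teams %>%
  mutate(
    BA  = H / AB,                             # batting average
    X1B = H - X2B - X3B - HR,                 # singles: hits that were not extra-base hits
    TB  = X1B + 2 * X2B + 3 * X3B + 4 * HR,   # total bases
    SLG = TB / AB                             # slugging percentage
  )

head(select(Teams, yearID, teamID, BA, SLG))
```

Algebraically, total bases simplify to H + X2B + 2 * X3B + 3 * HR, which is the shorter form used in later sketches.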
2.2 Plot a time series of SLG since 1954, conditioned on league (lgID). Is slugging percentage typically higher in the American League (AL) or the National League (NL)? (8 points)
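One possible sketch for 2.2, assuming dplyr and ggplot2 are installed. It computes a league-wide SLG per season by pooling total bases and at-bats across teams; averaging team-level SLG values instead would be an equally reasonable choice. The names slg_by_league and p are illustrative.

```r
library(Lahman)
library(dplyr)
library(ggplot2)

# League-wide SLG per season: pooled total bases over pooled at-bats
slg_by_league <- Teams %>%
  filter(yearID >= 1954) %>%
  group_by(yearID, lgID) %>%
  summarise(SLG = sum(H + X2B + 2 * X3B + 3 * HR) / sum(AB), .groups = "drop")

# One line per league, so the two series can be compared directly
p <- ggplot(slg_by_league, aes(x = yearID, y = SLG, color = lgID)) +
  geom_line() +
  labs(x = "Season", y = "Slugging percentage", color = "League")
print(p)

# Average of the seasonal values for each league
slg_by_league %>%
  group_by(lgID) %>%
  summarise(mean_SLG = mean(SLG))
```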
2.3 Display the top 15 teams ranked by slugging percentage in MLB history. Repeat this using only teams since 1969.
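A sketch for 2.3, assuming dplyr is installed. Each row of Teams is one team in one season, so the ranking is over team-seasons. The helper top_slg() is a hypothetical name introduced here so the same ranking can be reused for the post-1969 subset.

```r
library(Lahman)
library(dplyr)

# Hypothetical helper: compute SLG and return the n highest team-seasons
top_slg <- function(teams, n = 15) {
  teams %>%
    mutate(SLG = (H + X2B + 2 * X3B + 3 * HR) / AB) %>%
    arrange(desc(SLG)) %>%
    select(yearID, teamID, name, lgID, SLG) %>%
    head(n)
}

top15_all  <- top_slg(Teams)                          # all of MLB history
top15_1969 <- top_slg(filter(Teams, yearID >= 1969))  # seasons since 1969
print(top15_all)
print(top15_1969)
```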
2.4 Create a factor called election that divides yearID into four-year blocks corresponding to U.S. presidential terms (from 1788 to 2017). During which term were the most home runs hit? (Hint: use the seq function.)
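A sketch for 2.4 using seq() and cut(), assuming dplyr is installed. Two details are choices made here, not part of the question: the last break is pushed past the final season in the data so that recent years do not become NA, and explicit labels such as "1996-1999" are supplied because cut()'s default labels would print four-digit years in scientific notation (its dig.lab argument defaults to 3). The names term_breaks and hr_by_term are illustrative.

```r
library(Lahman)
library(dplyr)

# Four-year blocks starting in 1788, extended past the last season in Teams
term_breaks <- seq(1788, max(Teams$yearID) + 4, by = 4)
term_labels <- paste(head(term_breaks, -1), tail(term_breaks, -1) - 1, sep = "-")

# right = FALSE makes each block [start, start + 4), e.g. 1996-1999
Teams <- Teams %>%
  mutate(election = cut(yearID, breaks = term_breaks,
                        labels = term_labels, right = FALSE))

# Total home runs per term, largest first
hr_by_term <- Teams %>%
  group_by(election) %>%
  summarise(HR = sum(HR), .groups = "drop") %>%
  arrange(desc(HR))

head(hr_by_term, 1)
```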
3. Use the storms data frame from the nasaweather package:
Create a scatterplot of wind against pressure, using color to distinguish the type of storm. Because the sample is comparatively large, many points in the scatterplot overlap. How would you improve the visualization?
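One way to reduce overplotting for Question 3, sketched with ggplot2: semi-transparent, slightly jittered points, plus one panel per storm type. The data are referenced as nasaweather::storms explicitly because dplyr also ships a different data set named storms. The alpha and jitter settings are illustrative, not tuned values.

```r
library(ggplot2)

# Explicit namespace avoids picking up dplyr's unrelated `storms` data set
storms_nw <- nasaweather::storms

# Transparency and a small jitter reveal dense regions; faceting by type
# separates groups that would otherwise sit on top of each other
p <- ggplot(storms_nw, aes(x = pressure, y = wind, color = type)) +
  geom_jitter(alpha = 0.3, width = 0.5, height = 0.5) +
  facet_wrap(~ type) +
  labs(x = "Pressure", y = "Wind", color = "Storm type")
print(p)
```

Other common options are two-dimensional binning (geom_bin2d(), or geom_hex() if the hexbin package is installed) or density contours with geom_density_2d().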
4. Suppose you roll two fair dice, and a success is defined as getting a total of 6. If you roll the pair of dice independently eight times:
4.1 What is the probability of observing exactly five successes (five totals of 6)? (Calculate by hand.)
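The hand calculation in 4.1 can be set up as a binomial model, with X the number of successes in the eight independent rolls. Five of the 36 equally likely outcomes of two dice sum to 6, which gives the success probability.

```latex
p = \Pr(\text{total} = 6)
  = \frac{\lvert\{(1,5),(2,4),(3,3),(4,2),(5,1)\}\rvert}{36}
  = \frac{5}{36},
\qquad X \sim \operatorname{Binomial}\!\left(n = 8,\; p = \tfrac{5}{36}\right)

\Pr(X = 5) = \binom{8}{5}\left(\frac{5}{36}\right)^{5}\left(\frac{31}{36}\right)^{3}
           = 56 \cdot \frac{3125}{60466176} \cdot \frac{29791}{46656}
           \approx 0.00185
```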
4.2 Letting X be the number of successes, use R to confirm your hand-calculated value of Pr(X = 5) for the dice-roll example.
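A short check for 4.2 using base R's dbinom(), with the closed-form binomial expression computed alongside for comparison. The names p_success, p_five and p_five_manual are illustrative.

```r
# Probability that two fair dice sum to 6
p_success <- 5 / 36

# Pr(X = 5) for X ~ Binomial(8, 5/36)
p_five <- dbinom(5, size = 8, prob = p_success)

# The same quantity written out from the binomial formula
p_five_manual <- choose(8, 5) * p_success^5 * (1 - p_success)^3

print(p_five)
print(p_five_manual)
```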
4.3 Plot the full probability mass function of X for this dice-rolling example.
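A base R sketch for 4.3 that draws the PMF of X ~ Binomial(8, 5/36) as vertical spikes at x = 0, 1, ..., 8; a ggplot2 bar chart would serve equally well.

```r
# Support and probabilities of X ~ Binomial(8, 5/36)
x   <- 0:8
pmf <- dbinom(x, size = 8, prob = 5 / 36)

# Spike plot: one vertical line per possible value, topped with a point
plot(x, pmf, type = "h", lwd = 4,
     xlab = "Number of successes (x)", ylab = "Pr(X = x)",
     main = "PMF of X ~ Binomial(8, 5/36)")
points(x, pmf, pch = 19)
```

Because n * p = 8 * 5/36 is about 1.1, most of the probability mass sits at x = 0, 1 and 2, and the mode is at x = 1.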
Part Two – small projects
COVID-19 Pandemic
The COVID-19 outbreak was first identified in December 2019 in Wuhan, China. The WHO declared the outbreak a Public Health Emergency of International Concern on 30 January 2020 and a pandemic on 11 March 2020 (Wikipedia). Organizations worldwide have been collecting data so that governments can monitor and learn from this pandemic. You will use the dataset ‘time_series_covid_19_confirmed.csv’ from the LMS to explore the COVID-19 data.
Note: Details of this data set are available at https://www.kaggle.com/sudalairajkumar/novel-corona-virus-2019-dataset#time_series_covid_19_confirmed.csv
Your data analysis should include, but is not limited to, answers to the following questions:
1. Create two graphs that display the latest number of confirmed COVID-19 cases for the top 10 and bottom 10 countries, respectively. Consider how to improve the quality and aesthetics of your visualizations.
2. Visualize the confirmed cases worldwide from January to March.
3. Visualize the confirmed cases of COVID-19 in China and in the rest of the world from January to March. Can you relate the main changes observed in the plot to landmark events, such as the WHO declaring a pandemic?