Data Science Story Telling with R

class: center, middle, inverse, title-slide

# Data Science Story Telling with R
## klikR
### Tatjana Kecojevic
### 24 Nov 2018

---

background-image: url(https://upload.wikimedia.org/wikipedia/commons/c/c1/Rlogo.png)
???

Image credit: [Wikimedia Commons](https://commons.wikimedia.org/wiki/File:Rlogo.png)
---
class: inverse, center, middle

# R Workshop: part I
## Hi, Zdravo, Ciao! Welcome to the Data Science Story Telling with R! Let me introduce you to our team:

- Hi, I'm Tanja! A Data Scientist at [DataTeka](https://datateka.com/). 
- Hi, I'm Zeljko! I work at [InfostudHub](https://www.infostudhub.rs). 
---
## How the day is planned

- There will be a little bit of instruction, and a few exercises, then some more instruction, and some more exercises, some reading, some more exercises, ...

- The goal for the day is to work with your team and mentor to make a web app for looking at data.

- We are going to learn about the software R, and the language of data analysis. There are a lot of things to learn, and it's OK if you can't remember it all. The most important thing is to have fun and play, break things and fix them, and try out new stuff!

- We will have breaks whenever you feel you want them - there are snacks, drinks and pizzas 🍕😋.

### CODE of CONDUCT:

1. Be positive
2. Be inclusive
3. Ask for help, and give some help

---

## MATERIALS for WORKSHOP:

To download the workshop materials, please go to:
<https://github.com/DataTeka/klikrws>

---
class: center, middle
# How do we do it? 🤔

### Steps of a typical data science project:
<img src="images/Program_HW.png" width="500px" />
---
class: inverse, center, middle

# Get Started 🤫😴
<img src="images/George_Desk.gif" width="600px" />

---
## Write R Code 😇🎶

To start using **R** you need to:

1) Install [R](https://cran.r-project.org/) [(and RStudio)](https://www.rstudio.com/products/rstudio/download/#download)

2) Launch it and set your working directory, which tells R where to find all of your files.

- **On a Mac**, it would look like this:
`setwd("~/Documents/DS_Story")`

- **On a PC**, it might look like this:
`setwd("C:/Documents/DS_Story")`

3) Start writing **R** code!
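If you want to check or change the working directory from code, a minimal sketch could look like the one below. It uses the temporary folder returned by `tempdir()` instead of the `DS_Story` folder, so that it runs on any computer; the file name `example.txt` is a hypothetical example.

```r
# Where is R currently looking for files?
old_wd <- getwd()
print(old_wd)

# Switch to a temporary folder (it exists on every OS), instead of DS_Story
setwd(tempdir())
print(getwd())

# Relative paths are now resolved inside the new working directory
writeLines("hello", "example.txt")
print(file.exists("example.txt"))                         # TRUE
print(file.exists(file.path(tempdir(), "example.txt")))   # TRUE: the same file

# setwd() invisibly returns the previous directory, so we can always go back
setwd(old_wd)
```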

**Tip**💡:
- When you start working on new R code or a new R project in [RStudio IDE](https://support.rstudio.com/hc/en-us/sections/200107586-Using-the-RStudio-IDE), use 
***File -> New Project***.
This sets the project's folder as your working directory and saves all your files in it. The next time you open the project, RStudio sets the project's directory as the working directory again. Projects help you with much [more](https://support.rstudio.com/hc/en-us/articles/200526207-Using-Projects).

---
class: center, middle

## [RStudio IDE Cheatsheet](https://www.rstudio.com/wp-content/uploads/2016/01/rstudio-IDE-cheatsheet.pdf)

***Top Left:*** Code Editor;
--

***Bottom Left:*** R Console;
--

***Top Right:*** Environment
--

***Bottom Right:*** Plots and Files
---
# Dataset

Today we will examine [Olympic games data](https://www.kaggle.com/heesoo37/120-years-of-olympic-history-athletes-and-results) that includes all the Games from Athens 1896 to Rio 2016.
The file `athlete_events.csv` contains **271,116** rows and **15** columns. Each row corresponds to an individual athlete competing in an individual Olympic event (athlete-events).

**Note** 💡: there are 15 columns, each of which we call a **variable**. 
---
class: inverse, center, middle
  
  # Let's get introduced to some basic statistical concepts 🧐
---
## What will I learn in Part I?

During Part I of the workshop you will be introduced to some basic R syntax and a set of methods for exploring data with `R`, with the **objectives** of:

- summarising and understanding the main features of the variables contained within the data, and

- investigating the nature of any linkages between the variables that may exist.

The starting point is to understand **what data is**.
- What is the **population**?
- Why do we use **samples**?

So, where do I start?

- **Do I understand the problem** under investigation and are the objectives of the investigation clear? *The only way to obtain this information is to ask questions, and keep asking questions until satisfactory answers have been obtained.*

- Do I understand exactly **what each variable is measuring/recording?**
---
# Describing Variables

A starting point is to examine the characteristics of each individual variable in the data set.

The way to proceed depends upon the type of variable being examined.

**Classification of variable types**

The variables can be one of two broad types:

- Attribute variables

- Measured variables

.pull-left[
**attribute**

gender

days in a week
]
.pull-right[
**measured**

age

weight
]
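In R, the two broad types usually map onto two column classes: attribute variables are stored as factors (or text), measured variables as numbers. The toy data frame below is hypothetical and only illustrates the idea; the values are made up.

```r
# Hypothetical toy data: two attribute and two measured variables
athletes <- data.frame(
  gender = factor(c("F", "M", "M", "F")),                  # attribute
  day    = factor(c("Mon", "Tue", "Mon", "Sun"),
                  levels = c("Mon", "Tue", "Wed", "Thu",
                             "Fri", "Sat", "Sun")),        # attribute
  age    = c(21, 24, 34, 23),                              # measured
  weight = c(62, 80, 95, 58)                               # measured
)

# Attribute variables are stored as factors, measured variables as numbers
sapply(athletes, class)

# Attribute variables: count how often each category occurs
table(athletes$gender)

# Measured variables: arithmetic summaries make sense
mean(athletes$age)
```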
---
## The Concept of Statistical Distribution

**The concept of the statistical distribution is central to statistical analysis.**

This concept relates to the population and conceptually assumes that we have perfect information: the exact composition of the population is known.

.pull-left[
**attribute:**
![](DSSR_files/figure-html/unnamed-chunk-6-1.png)
]

.pull-right[
**measured:**
![](DSSR_files/figure-html/unnamed-chunk-7-1.png)
]
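In practice we rarely know the population distribution, so we look at the distribution of a sample instead. A minimal sketch, using simulated (made-up) data rather than the Olympic data: for an attribute variable the distribution is the proportion of observations in each category, and for a measured variable it is how the values spread across intervals. The sample size, mean and standard deviation are arbitrary assumptions.

```r
set.seed(1)  # make the random sample reproducible

# Simulated (hypothetical) sample of 1,000 athletes
sex    <- sample(c("F", "M"), size = 1000, replace = TRUE)
height <- rnorm(1000, mean = 175, sd = 10)

# Attribute variable: proportion of observations in each category
prop.table(table(sex))

# Measured variable: number of observations in each height interval
h <- hist(height, plot = FALSE)
data.frame(from = head(h$breaks, -1), to = tail(h$breaks, -1), count = h$counts)
```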
---
class: center, middle
## Summary Statistics

![](DSSR_files/figure-html/unnamed-chunk-8-1.png)
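The most common summary statistics describe the centre and the spread of a measured variable. A short sketch on a handful of made-up heights:

```r
# Hypothetical heights (cm) of seven athletes
height <- c(160, 165, 170, 175, 180, 185, 190)

mean(height)      # centre: arithmetic average, 175
median(height)    # centre: middle value, 175
sd(height)        # spread: standard deviation
IQR(height)       # spread: interquartile range (Q3 - Q1), 15
quantile(height)  # min, Q1, median, Q3, max

# summary() returns most of the above in one go
summary(height)
```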
---
## Investigating relationships between variables

One of the key steps required of the Data Analyst is to investigate the relationship between variables. This requires a further **classification of the variables** contained within the data, as either a **response** variable or an **explanatory** variable.

A **response** variable is a variable that measures either directly or indirectly the objectives of the analysis.

An **explanatory** variable is a variable that may influence the response variable.
---
class: center, middle
## Bivariate Relationships
<img src="images/RelationshipMatrix.png" width="500px" />
---
class: center, middle
## DA Methodology
<img src="images/DaMethodology.png" width="600px" />

Note that the 'Further Data Analysis' stage may or may not be required, depending on the outcome of the 'Initial Data Analysis' at stage 1. 
---
class: center, middle
## Measured vs Attribute (2 levels)
<img src="images/MvAMethodology.png" width="700px" />
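For a measured response and a two-level attribute, the initial data analysis typically compares a summary (e.g. the mean) and the distribution of the measured variable across the two groups. A minimal sketch on hypothetical toy data (the values are invented for illustration):

```r
# Hypothetical toy sample: weight (measured) and sex (attribute, 2 levels)
df <- data.frame(
  weight = c(60, 65, 70, 80, 85, 90),
  sex    = factor(c("F", "F", "F", "M", "M", "M"))
)

# Compare a summary of the measured variable across the two levels
tapply(df$weight, df$sex, mean)

# ...and compare the two distributions visually
boxplot(weight ~ sex, data = df,
        xlab = "Sex", ylab = "Weight (kg)")
```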
---
class: center, middle
## Measured vs Measured
<img src="images/MvMMethodology.png" width="700px" />
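For two measured variables, the initial data analysis usually starts with a scatter plot and a correlation coefficient. A sketch on made-up values:

```r
# Hypothetical toy sample of two measured variables
age    <- c(20, 22, 25, 28, 30)
weight <- c(60, 63, 70, 74, 80)

# Scatter plot: look at the direction, form and strength of the relationship
plot(age, weight, xlab = "Age", ylab = "Weight (kg)")

# Pearson correlation coefficient: always between -1 and 1
cor(age, weight)
```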
---
## Further Data Analysis

If the '**Initial Data Analysis**' is *inconclusive* then '**Further Data Analysis**' is required.

The 'Further Data Analysis' is a procedure that enables a decision to be made, based on the sample evidence, between two outcomes:  
- There is no relationship
- There is a relationship

These statistical procedures are called **hypothesis tests**, which essentially *provide a decision rule for choosing between one of the two outcomes*: "There is no relationship" or "There is a relationship" based on the sample evidence.

All hypothesis tests are carried out in four stages:
- Stage 1: Specifying the hypotheses.

- Stage 2: Defining the test parameters and the decision rule.

- Stage 3: Examining the sample evidence.

- Stage 4: The conclusions.
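As an illustration, the four stages can be mapped onto a two-sample t-test, which is one common test for a measured response and a two-level attribute. This is a sketch under stated assumptions: the data are hypothetical, the 5% significance level is a conventional choice rather than something prescribed here, and `t.test()` runs the Welch version of the test by default.

```r
# Hypothetical weights (kg) of two groups of athletes
group_a <- c(60, 62, 65, 63, 61, 64)
group_b <- c(80, 78, 82, 85, 79, 81)

# Stage 1: hypotheses
#   H0: there is no relationship (the two group means are equal)
#   H1: there is a relationship  (the two group means differ)

# Stage 2: test parameters and decision rule
alpha <- 0.05  # significance level: reject H0 if the p-value < alpha

# Stage 3: examine the sample evidence (Welch two-sample t-test)
res <- t.test(group_a, group_b)
res$p.value

# Stage 4: conclusion
if (res$p.value < alpha) {
  cat("Reject H0: there is a relationship\n")
} else {
  cat("Do not reject H0: no evidence of a relationship\n")
}
```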

---
class: inverse, center, middle

# How do we do it in R? Part II 🤓 
## klikR
  
---
## Before the Tidyverse, there is Base R!
When you download and install **R** for the first time, you are installing **the Base R** software. **Base R** contains most of the functions you’ll use on a daily basis: `mean()`, `subset()`...
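A quick taste of these two base R functions, on a small hypothetical data frame (the names and values are made up):

```r
# Hypothetical toy data frame (not the workshop data)
athletes <- data.frame(
  name   = c("Ana", "Ben", "Cara", "Dan"),
  sex    = c("F", "M", "F", "M"),
  height = c(168, 181, 175, 190)
)

mean(athletes$height)                                     # 178.5
subset(athletes, sex == "F")                              # rows where sex is "F"
subset(athletes, height > 180, select = c(name, height))  # rows and columns
```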

To learn about **R**'s basic operations, data structures and base functions you could look at one of the R-Ladies Manchester's handouts: [Introduction to base R](https://tanjakec.github.io/blog/introduction-to-r/).

If you want to access data and code written by other people, you’ll need to install it as a **package**. An **R package** is a bundle of functions (code), data, documentation, vignettes (examples), stored in one neat place.
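A common pattern for getting a package is sketched below, using `dplyr` as the example. A package only needs to be installed once per computer, but it has to be loaded with `library()` in every new R session; `requireNamespace()` is used here to skip the installation when the package is already there.

```r
# Install the package only if it is not installed yet (needed once per computer)
if (!requireNamespace("dplyr", quietly = TRUE)) {
  install.packages("dplyr")
}

# Load it in every new R session to use its functions directly
library(dplyr)

# Alternatively, call a single function as package::function()
dplyr::n_distinct(c("a", "b", "a"))   # 2
packageVersion("dplyr")
```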

"In **R**, the fundamental unit of shareable code is the package." [Hadley Wickham](http://r-pkgs.had.co.nz/intro.html)  
---
## The verse! 😇🎶
An opinionated collection of **R packages** for data science.

[`install.packages("tidyverse")`](https://www.tidyverse.org/)

[`library(tidyverse)`](https://www.tidyverse.org/packages/)

- Have you tried learning data science by reading books?

📖📘 [**R for Data Science**](http://r4ds.had.co.nz/) by Garrett Grolemund & Hadley Wickham

- Have you tried learning data science by posting your questions and discussing them with other people in the R community?

👥💻📊📈🗣 [**RStudio Community**](https://community.rstudio.com/)
---
## The `dplyr` Package 🗜🛠🔩⚙️
It provides a “grammar” (the verbs) for data manipulation and for operating on data frames. The **key operator and the essential verbs** are:

- `%>%`: **the “pipe” operator** used to connect multiple verb actions together into a pipeline.

- `select()`: return a subset of the columns of a data frame.

- `mutate()`: add new variables/columns or transform existing variables.

- `filter()`: extract a subset of rows from a data frame based on logical conditions.

- `arrange()`: reorder rows of a data frame according to single or multiple variables.

- `summarise()` / `summarize()`: reduces each group to a single row by calculating aggregate measures. 
---
## The Olympic Games Data

A historical dataset on the modern Olympic Games, including all the Games from Athens 1896 to Rio 2016.

The main data frame, `olympic`, has **271,116 rows** and **15 variables**:
- **ID** - Unique number for each athlete
- **Name** - Athlete's name
- **Sex** - M or F
- **Age** - Integer
- **Height** - In centimeters
- **Weight** - In kilograms
- **Team** - Team name
- **NOC** - National Olympic Committee 3-letter code
- **Games** - Year and season
- **Year** - Integer
- **Season** - Summer or Winter
- **City** - Host city
- **Sport** - Sport
- **Event** - Event
- **Medal** - Gold, Silver, Bronze, or NA

---

## The Olympic Games Data

```r
# import csv data file into R
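# stringsAsFactors = TRUE keeps text columns as factors (as in the output below) in R >= 4.0.0
olympic <- read.csv("data/athlete_events.csv", stringsAsFactors = TRUE)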
olympic[1:5,]
```

```
## ID Name Sex Age Height Weight Team NOC
## 1 1 A Dijiang M 24 180 80 China CHN
## 2 2 A Lamusi M 23 170 60 China CHN
## 3 3 Gunnar Nielsen Aaby M 24 NA NA Denmark DEN
## 4 4 Edgar Lindenau Aabye M 34 NA NA Denmark/Sweden DEN
## 5 5 Christine Jacoba Aaftink F 21 185 82 Netherlands NED
## Games Year Season City Sport
## 1 1992 Summer 1992 Summer Barcelona Basketball
## 2 2012 Summer 2012 Summer London Judo
## 3 1920 Summer 1920 Summer Antwerpen Football
## 4 1900 Summer 1900 Summer Paris Tug-Of-War
## 5 1988 Winter 1988 Winter Calgary Speed Skating
## Event Medal
## 1 Basketball Men's Basketball <NA>
## 2 Judo Men's Extra-Lightweight <NA>
## 3 Football Men's Football <NA>
## 4 Tug-Of-War Men's Tug-Of-War Gold
## 5 Speed Skating Women's 500 metres <NA>
```

**Note** 💡: we are printing only the first 5 rows, and there are 15 columns!
---
## Setting up the Working Environment 💡

Install the packages you will be working with!

```r
install.packages("dplyr", repos = "https://cloud.r-project.org")
install.packages("ggplot2", repos = "https://cloud.r-project.org")
install.packages("DT", repos = "https://cloud.r-project.org")
```

And now we're ready to start practicing Elaine's Dance!!! 😃🎵🎶

<img src="images/ElainDanceI.png" width="300px" style="display: block; margin: auto;" />
---
## First look at the data: `dim()` & `head()`

```r
dim(olympic)
```

```
## [1] 271116     15
```

```r
head(olympic, n = 3)
```

```
## ID Name Sex Age Height Weight Team NOC Games
## 1 1 A Dijiang M 24 180 80 China CHN 1992 Summer
## 2 2 A Lamusi M 23 170 60 China CHN 2012 Summer
## 3 3 Gunnar Nielsen Aaby M 24 NA NA Denmark DEN 1920 Summer
## Year Season City Sport Event Medal
## 1 1992 Summer Barcelona Basketball Basketball Men's Basketball <NA>
## 2 2012 Summer London Judo Judo Men's Extra-Lightweight <NA>
## 3 1920 Summer Antwerpen Football Football Men's Football <NA>
```

This is hard to read...?! 😕

---
## Examine the structure of the data: `str()`

```r
str(olympic) 
```

```
## 'data.frame':	271116 obs. of  15 variables:
##  $ ID    : int  1 2 3 4 5 5 5 5 5 5 ...
##  $ Name  : Factor w/ 134732 levels "  Gabrielle Marie \"Gabby\" Adcock (White-)",..: 8 9 44318 29412 21469 21469 21469 21469 21469 21469 ...
##  $ Sex   : Factor w/ 2 levels "F","M": 2 2 2 2 1 1 1 1 1 1 ...
##  $ Age   : int  24 23 24 34 21 21 25 25 27 27 ...
##  $ Height: int  180 170 NA NA 185 185 185 185 185 185 ...
##  $ Weight: num  80 60 NA NA 82 82 82 82 82 82 ...
##  $ Team  : Factor w/ 1184 levels "30. Februar",..: 199 199 273 278 705 705 705 705 705 705 ...
##  $ NOC   : Factor w/ 230 levels "AFG","AHO","ALB",..: 42 42 56 56 146 146 146 146 146 146 ...
##  $ Games : Factor w/ 51 levels "1896 Summer",..: 38 49 7 2 37 37 39 39 40 40 ...
##  $ Year  : int  1992 2012 1920 1900 1988 1988 1992 1992 1994 1994 ...
##  $ Season: Factor w/ 2 levels "Summer","Winter": 1 1 1 1 2 2 2 2 2 2 ...
##  $ City  : Factor w/ 42 levels "Albertville",..: 6 18 3 27 9 9 1 1 17 17 ...
##  $ Sport : Factor w/ 66 levels "Aeronautics",..: 9 33 25 62 54 54 54 54 54 54 ...
##  $ Event : Factor w/ 765 levels "Aeronautics Mixed Aeronautics",..: 160 398 349 710 623 619 623 619 623 619 ...
##  $ Medal : Factor w/ 3 levels "Bronze","Gold",..: NA NA NA 2 NA NA NA NA NA NA ...
```

The **output could look messy** and it might not fit the screen when dealing with a big data set that has lots of variables! 🤪
---
## Do it in a tidy way: `glimpse()`

```r
suppressPackageStartupMessages(library(dplyr))
glimpse(olympic) 
```

```
## Observations: 271,116
## Variables: 15
## $ ID <int> 1, 2, 3, 4, 5, 5, 5, 5, 5, 5, 6, 6, 6, 6, 6, 6, 6, 6, 7...
## $ Name <fct> A Dijiang, A Lamusi, Gunnar Nielsen Aaby, Edgar Lindena...
## $ Sex <fct> M, M, M, M, F, F, F, F, F, F, M, M, M, M, M, M, M, M, M...
## $ Age <int> 24, 23, 24, 34, 21, 21, 25, 25, 27, 27, 31, 31, 31, 31,...
## $ Height <int> 180, 170, NA, NA, 185, 185, 185, 185, 185, 185, 188, 18...
## $ Weight <dbl> 80, 60, NA, NA, 82, 82, 82, 82, 82, 82, 75, 75, 75, 75,...
## $ Team <fct> China, China, Denmark, Denmark/Sweden, Netherlands, Net...
## $ NOC <fct> CHN, CHN, DEN, DEN, NED, NED, NED, NED, NED, NED, USA, ...
## $ Games <fct> 1992 Summer, 2012 Summer, 1920 Summer, 1900 Summer, 198...
## $ Year <int> 1992, 2012, 1920, 1900, 1988, 1988, 1992, 1992, 1994, 1...
## $ Season <fct> Summer, Summer, Summer, Summer, Winter, Winter, Winter,...
## $ City <fct> Barcelona, London, Antwerpen, Paris, Calgary, Calgary, ...
## $ Sport <fct> Basketball, Judo, Football, Tug-Of-War, Speed Skating, ...
## $ Event <fct> Basketball Men's Basketball, Judo Men's Extra-Lightweig...
## $ Medal <fct> NA, NA, NA, Gold, NA, NA, NA, NA, NA, NA, NA, NA, NA, N...
```
Ahhh... this 👀 better! 😅

---
## The pipe operator: `%>%` ⛓⛓⛓

**Left Hand Side (LHS)** `%>%` **Right Hand Side (RHS)**
<pre>
x %>% f(y)

# is the same as

f(x, y)
</pre>

The "pipe" passes the **result** of the **LHS** as the first argument of the **function** on the **RHS**.

`%>%` is very practical for chaining together multiple `dplyr` functions in a sequence of operations.
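A small sketch of this equivalence, on a hypothetical toy data frame: the nested call and the pipeline below run exactly the same steps, but the pipeline reads in the order the steps happen.

```r
library(dplyr)

# x %>% f() is the same call as f(x)
identical(c(1, 4, 9) %>% sqrt(), sqrt(c(1, 4, 9)))   # TRUE

# Hypothetical toy data frame
df <- data.frame(x = c(3, 1, 2), y = c("c", "a", "b"))

# Nested calls are read from the inside out...
nested <- arrange(filter(df, x > 1), x)

# ...while a pipeline reads from left to right, one step per line
piped <- df %>%
  filter(x > 1) %>%
  arrange(x)

identical(nested, piped)   # TRUE
piped
```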
---
## Pick variables by their names: `select()`
<img src="images/select().png" width="450px" />

- `starts_with("X")` every name that starts with "X".

- `ends_with("X")` every name that ends with "X".

- `contains("X")` every name that contains "X".

- `matches("X")` every name that matches "X", where "X" can be a regular expression.

- `num_range("x", 1:5)` the variables named x1, x2, x3, x4, x5.

- `one_of(x)` every name that appears in `x`, which should be a character vector (superseded by `all_of()` / `any_of()` in recent versions of tidyselect).
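The helpers not used in the exercise solutions can be tried on a tiny made-up data frame (the column names below are hypothetical). Note that `contains()` and `matches()`, like `starts_with()` and `ends_with()`, ignore case by default.

```r
library(dplyr)

# Hypothetical toy data frame with made-up column names
df <- data.frame(x1 = 1, x2 = 2, x3 = 3,
                 Height = 180, Weight = 80, Sport = "Judo")

names(select(df, contains("eight")))             # "Height" "Weight"
names(select(df, matches("^x[12]$")))            # "x1" "x2"
names(select(df, num_range("x", 1:3)))           # "x1" "x2" "x3"
names(select(df, one_of(c("Sport", "Height"))))  # "Sport" "Height"
```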

---
## Select your variables

Use the `olympic` data frame to select the variables that

1) end with the letter `t`;

2) start with the letter `S`. Then try to do the same selection using base R.

Check out all the [`select()`](https://dplyr.tidyverse.org/reference/select_helpers.html) options that are available.

---
## Solutions:

```r
end_t <- select(olympic, ends_with("t"))
head(end_t, n = 1)
```

```
##   Height Weight      Sport                       Event
## 1    180     80 Basketball Basketball Men's Basketball
```

```r
beg_S <- select(olympic, starts_with("S"))
head(beg_S, n = 1)
```

```
##   Sex Season      Sport
## 1   M Summer Basketball
```
Of course, all of this could also be done in **base R**, for example:

```r
beg_S_base <- olympic[c("Sex", "Season", "Sport")]
head(beg_S_base, n = 1) 
```

```
## Sex Season Sport
## 1 M Summer Basketball
```
but it's less intuitive and often requires more typing. 
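Instead of typing the names, base R can also pick columns by pattern with `startsWith()`, `endsWith()` or `grepl()` applied to `names()`. A sketch on a hypothetical data frame that borrows some of the `olympic` column names; note that these base functions are case-sensitive, whereas `starts_with()` and `ends_with()` ignore case by default.

```r
# Hypothetical toy data frame using some of the olympic column names
df <- data.frame(ID = 1, Sex = "M", Height = 180, Weight = 80,
                 Season = "Summer", Sport = "Judo")

# Base R version of starts_with("S"): a logical vector over the column names
df[, startsWith(names(df), "S"), drop = FALSE]

# Base R version of ends_with("t")
df[, endsWith(names(df), "t"), drop = FALSE]

# A regular expression also works, similar to matches()
df[, grepl("^S", names(df)), drop = FALSE]
```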
---
## Create new variables from existing variables: `mutate()`
<img src="images/mutate().png" width="400px" />

`mutate()` lets you add new columns to a data frame. For example, `mutate(df, z = x * y)` adds to `df` a new column, `z`, which is the product of the columns `x` and `y`.
To look at the athletes' body mass index (BMI), we can create a new column `BMI`. BMI is expressed in kg/m², calculated as weight in kilograms divided by the square of height in metres.
**Note** 💡: the variable `Height` is in centimeters, so we divide it by 100 first!

```r
olympic <- mutate(olympic, BMI = Weight / (Height/100)^2) 
head(olympic, n = 1)
```

```
## ID Name Sex Age Height Weight Team NOC Games Year Season
## 1 1 A Dijiang M 24 180 80 China CHN 1992 Summer 1992 Summer
## City Sport Event Medal BMI
## 1 Barcelona Basketball Basketball Men's Basketball <NA> 24.69136
```
Check [here](https://dplyr.tidyverse.org/reference/mutate.html) for more functionalities with mutate.
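`mutate()` can create several columns in one call, and a column created earlier in the call can be used straight away. A sketch on two made-up athletes (the intermediate column `Height_m` and the rounded `BMI_r` are hypothetical names chosen for this example):

```r
library(dplyr)

# Hypothetical toy data with the same units as the olympic data
df <- data.frame(Height = c(180, 170),   # centimeters
                 Weight = c(80, 60))     # kilograms

df <- mutate(df,
             Height_m = Height / 100,         # new column: height in metres
             BMI      = Weight / Height_m^2,  # uses the column created above
             BMI_r    = round(BMI, 1))        # transforms a new column
df
```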
---

## Pick observations by their values: `filter()`
<img src="images/filter().png" width="450px" />

There is a set of logical operators in **R** that you can use inside `filter()`:

- `x < y`: `TRUE` if `x` is less than `y`
- `x <= y`: `TRUE` if `x` is less than or equal to `y`
- `x == y`: `TRUE` if `x` equals `y`
- `x != y`: `TRUE` if `x` does not equal `y`
- `x >= y`: `TRUE` if `x` is greater than or equal to `y`
- `x > y`: `TRUE` if `x` is greater than `y`
- `x %in% c(a, b, c)`: `TRUE` if `x` is in the vector `c(a, b, c)`
- `is.na(x)`: Is `NA`
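- `!is.na(x)`: Is not `NA`
- `cond1 & cond2`: `TRUE` if both conditions are `TRUE`
- `cond1 | cond2`: `TRUE` if at least one of the conditions is `TRUE`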
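A sketch of these operators inside `filter()`, on a hypothetical toy data frame that includes missing values (team names and years are made up). It also shows one behaviour worth knowing: `filter()` keeps only rows where the condition is `TRUE`, so rows where it evaluates to `NA` are dropped.

```r
library(dplyr)

# Hypothetical toy data, including missing values (NA)
df <- data.frame(Team  = c("Serbia", "Italy", "Serbia", "China"),
                 Year  = c(1996, 2008, 2016, 2012),
                 Medal = c(NA, "Gold", "Silver", NA))

filter(df, Team == "Serbia" & Year >= 2000)  # both conditions (AND)
filter(df, Team %in% c("Italy", "China"))    # any of several values
filter(df, !is.na(Medal))                    # only rows with a medal

# Rows where Medal is NA give an NA condition, so they are dropped too
filter(df, Medal == "Gold")
```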
---
## Filter your data:

Use the `olympic` data frame to filter:

1) only Serbian teams and save it as `olympicSR`

2) only Serbian teams from 2000 onward and save it as `olympicSR21c`

3) athletes whose weight is over 100 kg and whose height is over 2 m.

Don't forget to **use `==` instead of `=`**, and don't forget the **quotes** `""` around text values!
---
## Solutions:

```r
olympicSR <- filter(olympic, Team == "Serbia") 
dim(olympicSR)
```

```
## [1] 388  16
```

```r
olympicSR21c <- filter(olympicSR, Year >= 2000)
dim(olympicSR21c)
```

```
## [1] 386  16
```

```r
big_athlete <- filter(olympic, Weight > 100 & Height > 200)
dim(big_athlete)
```

```
## [1] 894 16
```
---
## Reorder the rows: `arrange()`
`arrange()` reorders the rows of a **d**ata **f**rame (df) according to one or more of its variables/columns.

- If you pass `arrange()` a character variable, **R** will rearrange the rows in alphabetical order according to values of the variable.

- If you pass a factor variable, **R** will rearrange the rows according to the order of the levels in your factor (running `levels()` on the variable reveals this order).
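The difference between the two cases can be seen on a tiny made-up example: the same medal names stored once as text and once as a factor whose levels are ordered Gold, Silver, Bronze (a level order chosen for this illustration).

```r
library(dplyr)

# Hypothetical toy data: the same medals stored in two ways
df <- data.frame(Medal_chr = c("Silver", "Gold", "Bronze"))
df$Medal_fct <- factor(df$Medal_chr, levels = c("Gold", "Silver", "Bronze"))

arrange(df, Medal_chr)$Medal_chr        # alphabetical: "Bronze" "Gold" "Silver"
levels(df$Medal_fct)                    # "Gold" "Silver" "Bronze"
arrange(df, Medal_fct)$Medal_chr        # level order: "Gold" "Silver" "Bronze"
arrange(df, desc(Medal_fct))$Medal_chr  # reversed: "Bronze" "Silver" "Gold"
```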
---
## Arranging your data
1) Arrange Serbian athletes in the `olympicSR21c` data frame by `Height` in ascending and descending order.

2) Using the `olympicSR` data frame:
  - Find the youngest athlete.
  
  - Find the heaviest athlete.
---
## Solution 1):

```r
olympicSR21c_hs <- arrange(olympicSR21c, Height)
head(olympicSR21c_hs, 2)
```

```
## ID Name Sex Age Height Weight Team NOC Games Year
## 1 81094 Olivera Moldovan F 23 158 62 Serbia SRB 2012 Summer 2012
## 2 81094 Olivera Moldovan F 27 158 62 Serbia SRB 2016 Summer 2016
## Season City Sport
## 1 Summer London Canoeing
## 2 Summer Rio de Janeiro Canoeing
## Event Medal BMI
## 1 Canoeing Women's Kayak Doubles, 500 metres <NA> 24.83576
## 2 Canoeing Women's Kayak Singles, 200 metres <NA> 24.83576
```

```r
*olympicSR21c_ht <- arrange(olympicSR21c, desc(Height))
head(olympicSR21c_ht, 2)
```

```
##       ID               Name Sex Age Height Weight   Team NOC       Games
## 1  98227 Miroslav Raduljica   M  28    213    130 Serbia SRB 2016 Summer
## 2 115246     Vladimir timac   M  28    211    112 Serbia SRB 2016 Summer
##   Year Season           City      Sport                       Event  Medal
## 1 2016 Summer Rio de Janeiro Basketball Basketball Men's Basketball Silver
## 2 2016 Summer Rio de Janeiro Basketball Basketball Men's Basketball Silver
##        BMI
## 1 28.65393
## 2 25.15667
```
---
## Solution 2):

```r
head(arrange(olympicSR, Age), 5)
```

```
## ID Name Sex Age Height Weight Team NOC
## 1 23792 Anja Crevar F 16 164 49 Serbia SRB
## 2 23792 Anja Crevar F 16 164 49 Serbia SRB
## 3 89864 Milica Ostoji F 16 172 60 Serbia SRB
## 4 54201 Tatjana Jelaa (-Mirkovi ) F 17 178 85 Serbia SRB
## 5 80027 Duan Miloevi M 17 171 62 Serbia SRB
## Games Year Season City Sport
## 1 2016 Summer 2016 Summer Rio de Janeiro Swimming
## 2 2016 Summer 2016 Summer Rio de Janeiro Swimming
## 3 2008 Summer 2008 Summer Beijing Swimming
## 4 2008 Summer 2008 Summer Beijing Athletics
## 5 1912 Summer 1912 Summer Stockholm Athletics
## Event Medal BMI
## 1 Swimming Women's 200 metres Individual Medley <NA> 18.21832
## 2 Swimming Women's 400 metres Individual Medley <NA> 18.21832
## 3 Swimming Women's 200 metres Freestyle <NA> 20.28123
## 4 Athletics Women's Javelin Throw <NA> 26.82742
## 5 Athletics Men's 100 metres <NA> 21.20311
```
---

```r
head(arrange(olympicSR, desc(Weight)), 5)
```

```
## ID Name Sex Age Height Weight Team NOC Games
## 1 62130 Asmir Kolainac M 23 187 140 Serbia SRB 2008 Summer
## 2 62130 Asmir Kolainac M 27 187 140 Serbia SRB 2012 Summer
## 3 62130 Asmir Kolainac M 31 187 140 Serbia SRB 2016 Summer
## 4 98227 Miroslav Raduljica M 28 213 130 Serbia SRB 2016 Summer
## 5 106231 Dejan Savi M 33 190 120 Serbia SRB 2008 Summer
## Year Season City Sport Event Medal
## 1 2008 Summer Beijing Athletics Athletics Men's Shot Put <NA>
## 2 2012 Summer London Athletics Athletics Men's Shot Put <NA>
## 3 2016 Summer Rio de Janeiro Athletics Athletics Men's Shot Put <NA>
## 4 2016 Summer Rio de Janeiro Basketball Basketball Men's Basketball Silver
## 5 2008 Summer Beijing Water Polo Water Polo Men's Water Polo Bronze
## BMI
## 1 40.03546
## 2 40.03546
## 3 40.03546
## 4 28.65393
## 5 33.24100
```
---
## Collapse many values down to a single summary: `summarise()`
<img src="images/summarise().png" width="450px" />

- uses the same syntax as `mutate()`, but whereas `mutate()` adds an entire new column, `summarise()` returns a single row (one row per group for grouped data).

- builds a new dataset that contains only the summarising statistics.

Use `summarise()`:

1) to print out a summary of the `olympicSR` data frame containing two variables: `max_Age` and `max_BMI`.

2) to print out a summary of the `olympicSR` data frame containing two variables: `mean_Age` and `mean_BMI`.

Explore more about [`summarise()`](https://dplyr.tidyverse.org/reference/summarise.html).
---
## Solution: Summarise your data

```r
summarise(olympicSR, max_Age = max(Age), max_BMI = max(BMI))
```

```
##   max_Age  max_BMI
## 1      46 40.03546
```

```r
summarise(olympicSR, mean_Age = mean(Age), mean_BMI = mean(BMI))
```

```
##   mean_Age mean_BMI
## 1 26.38918 23.34068
```
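Judging by the non-missing results above, the Serbian rows have no missing `Age` or `BMI` values. The full `olympic` data does contain `NA`s (see the `Height` and `Weight` columns in the `str()` output), and then `max()` and `mean()` return `NA` unless you pass `na.rm = TRUE`. A sketch on hypothetical toy data:

```r
library(dplyr)

# Hypothetical toy data with missing values, like some rows of olympic
df <- data.frame(Age = c(21, NA, 30, 27),
                 BMI = c(22.5, 24.0, NA, 23.5))

# Without na.rm, a single NA makes the summary NA
summarise(df, mean_Age = mean(Age))

# na.rm = TRUE drops missing values before summarising;
# n() counts rows, sum(!is.na(x)) counts non-missing values
s <- summarise(df,
               mean_Age = mean(Age, na.rm = TRUE),
               max_BMI  = max(BMI, na.rm = TRUE),
               n_rows   = n(),
               n_Age    = sum(!is.na(Age)))
s
```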
---
class: inverse, center, middle

## Let's `%>%` all up!

Confer with your team members.

What relationship do you expect to see between:

`Age` and `Height` of the athletes?
  
  `Age` and `BMI`?

---

<img src="images/pipe_short_cut.png" width="750px" style="display: block; margin: auto;" />
---
**Do you know what this code does?**

```r
olympicSR_pipe <- olympic %>%
 filter(Team == "Serbia" & Year > 2000) %>%
 mutate(BMI = Weight / (Height/100)^2)
plot(olympicSR_pipe$Age, olympicSR_pipe$Height, cex = 0.5, col = "red")
```

<img src="images/Cosmo.jpg" width="250px" style="display: block; margin: auto;" />
---
<img src="DSSR_files/figure-html/unnamed-chunk-38-1.png" style="display: block; margin: auto;" />
---
class: inverse, center, middle

## We have learnt all of Elaine's moves!!! 😃🎵🎶

<img src="images/ElainDanceII.png" width="300px" />
---
class: inverse, center, middle

## Can we make it look better? ggplot: part III 😁
## klikR

---
class: inverse, center, middle

# "The simple graph has brought more information to the data analyst’s mind than any other device."
John Tukey
---
## Grammar of graphics
It lets you specify the building blocks of a plot and combine them to create the graphical display you want. There are eight building blocks:

- data

- aesthetic mapping

- geometric object

- statistical transformations

- scales

- coordinate system

- position adjustments

- faceting
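To see where each building block goes, here is a sketch that labels all eight in one plot. It uses R's built-in `mtcars` data as a stand-in for the workshop data, and the particular choices (jitter width, axis limits, faceting by `am`) are arbitrary examples.

```r
library(ggplot2)

# Built-in mtcars data stands in for the workshop data here
p <- ggplot(data = mtcars,                                  # 1. data
            mapping = aes(x = wt, y = mpg,                  # 2. aesthetic mapping
                          colour = factor(cyl))) +
  geom_point(position = position_jitter(width = 0.05,       # 3. geometric object
                                        height = 0)) +      # 7. position adjustment
  stat_smooth(method = "lm", formula = y ~ x, se = FALSE) + # 4. statistical transformation
  scale_y_continuous(name = "Miles per gallon") +           # 5. scales
  coord_cartesian(xlim = c(1, 6)) +                         # 6. coordinate system
  facet_wrap(~ am)                                          # 8. faceting

print(p)
```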
---
## ggplot()
1. "Initialise" a plot with `ggplot()`
2. Add layers with `geom_` functions

```r
library(ggplot2)
ggplot(olympicSR_pipe, aes(x = Age, y = Height)) +
  geom_point(col ="red")
```

<img src="DSSR_files/figure-html/unnamed-chunk-40-1.png" style="display: block; margin: auto;" />
**Tip**: You can use the following code template to make graphs with [`ggplot2`](https://ggplot2.tidyverse.org):

```r
ggplot(data = <DATA>, mapping = aes(<MAPPINGS>)) +
 <GEOM_FUNCTION>()
```
---
# ggplot() gallery
Run the following code to see what graphs it produces.

```r
ggplot(data = olympic, mapping = aes(x = Height)) +
  geom_histogram(binwidth = 10)
#
ggplot(data = olympic, mapping = aes(x = Height)) +
  geom_density()
#
ggplot(data = olympic, mapping = aes(x = Season, color = Sex)) +
  geom_bar()
#
ggplot(data = olympic, mapping = aes(x = Sex, fill = Season)) +
  geom_bar()
```

You can see a nice list of all kinds of `ggplot`s at <http://r-statistics.co/Top50-Ggplot2-Visualizations-MasterList-R-Code.html>

---
## Confer with your neighbours:

**Does the BMI of the athletes depend upon their Age?**
`$$y = \hat{\beta}_0 + \hat{\beta}_1 x + e$$`
Run this code in your console to fit a model of `BMI` (the response) against `Age` (the explanatory variable).

Pay attention to spelling, capitalization, and
parentheses!

```r
m1 <- lm(olympic$BMI ~ olympic$Age)
summary(m1)
```
---
**Can you answer the question using the output of the fitted model?**

```r
m1 <- lm(olympic$BMI ~ olympic$Age)
summary(m1)
```

```
## 
## Call:
## lm(formula = olympic$BMI ~ olympic$Age)
## 
## Residuals:
## Min 1Q Median 3Q Max 
## -14.301 -1.790 -0.232 1.410 41.587 
## 
## Coefficients:
## Estimate Std. Error t value Pr(>|t|) 
## (Intercept) 19.880164 0.029279 679.0 <2e-16 ***
## olympic$Age 0.115906 0.001142 101.5 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.842 on 206163 degrees of freedom
## (64951 observations deleted due to missingness)
## Multiple R-squared: 0.04762,	Adjusted R-squared: 0.04762 
## F-statistic: 1.031e+04 on 1 and 206163 DF, p-value: < 2.2e-16
```
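The same numbers can also be pulled out of the fitted model programmatically. A sketch on a small hypothetical data set, which also shows the `data =` argument as an alternative to the `olympic$BMI ~ olympic$Age` style:

```r
# Hypothetical toy data (not the olympic data)
df <- data.frame(x = 1:6,
                 y = c(5.1, 7.9, 11.2, 13.8, 17.1, 20.0))

# The data argument lets you write the formula with bare column names
fit <- lm(y ~ x, data = df)

coef(fit)                  # intercept (beta_0 hat) and slope (beta_1 hat)
summary(fit)$coefficients  # estimates, std. errors, t values, p-values
summary(fit)$r.squared     # share of the variation in y explained by x
confint(fit)               # 95% confidence intervals for the coefficients
```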
---
## Your turn!

Use `olympic` data.

**Does the Weight depend upon Age?**

1) The data set is big, so let us use a random sample of 10,000 rows (tip: `sample_n(df, n)`, or `slice_sample(df, n = 10000)` in dplyr 1.0.0 and later)

2) Produce a scatter plot: what does it tell you?

3) Fit a regression model: is there a relationship? How strong is it?
Is the relationship linear? What conclusion(s) can you draw?

4) What other questions could you ask? Can you answer them?
---
## Possible Solution Q1 & Q2: sample and scatter plot

```r
sam_olymp <- sample_n(olympic, 10000) 
ggplot(sam_olymp, aes(x = Age, y = Weight)) +
  geom_point(alpha = 0.2, shape = 21, fill = "blue", colour = "black", size = 5) +
  geom_smooth(method = "lm", se = FALSE, col = "maroon3") +
  geom_smooth(method = "loess", se = FALSE, col = "limegreen") 
```

<img src="DSSR_files/figure-html/unnamed-chunk-45-1.png" style="display: block; margin: auto;" />
---
## Possible Solution Q3: simple regression model

```r
my.model <- lm(sam_olymp$Weight ~ sam_olymp$Age)
summary(my.model)
```

```
## 
## Call:
## lm(formula = sam_olymp$Weight ~ sam_olymp$Age)
## 
## Residuals:
## Min 1Q Median 3Q Max 
## -43.643 -9.922 -1.358 8.078 143.205 
## 
## Coefficients:
## Estimate Std. Error t value Pr(>|t|) 
## (Intercept) 56.70828 0.74080 76.55 <2e-16 ***
## sam_olymp$Age 0.56346 0.02883 19.54 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 13.92 on 7734 degrees of freedom
## (2264 observations deleted due to missingness)
## Multiple R-squared: 0.04706,	Adjusted R-squared: 0.04693 
## F-statistic: 381.9 on 1 and 7734 DF, p-value: < 2.2e-16
```
---
## Adding layers to your `ggplot()`

```r
ggplot(sam_olymp, aes(x = Age, y = Weight)) +
  geom_point(alpha = 0.2, shape = 21, fill = "blue", colour = "black", size = 5) +
  geom_smooth(method = "lm", se = FALSE, col = "maroon3") +
  geom_smooth(method = "loess", se = FALSE, col = "limegreen") +
  labs(title = "Age vs Weight", x = "Age", y = "Weight") +
  theme(legend.position = "none",
        # `linewidth` replaced `size` for lines in ggplot2 3.4.0
        panel.border = element_rect(fill = NA, colour = "black", linewidth = .75),
        plot.title = element_text(hjust = 0.5)) +
  # annotate() draws each label once; x and y are in data units (Age, Weight),
  # so adjust these positions to where the lines fall in your sample
  annotate("text", x = 60, y = 125, label = "regression line", col = "maroon3") +
  annotate("text", x = 60, y = 75, label = "smooth line", col = "limegreen")
```
---
## Voila
<img src="DSSR_files/figure-html/unnamed-chunk-48-1.png" style="display: block; margin: auto;" />
---
## **There is a challenge:**

- `dplyr`'s `group_by()` function enables you to group your data. It does not split the data frame into separate data frames; instead, it tags each row with its group, so that the verbs that follow (e.g. `summarise()`) are applied to each group separately.
- `datatable()` from the `DT` package displays an R data object as an interactive HTML table that can be filtered, sorted, etc.
- `boxplot()` function produces boxplot(s) of the given (grouped) values.

Knowing about the `group_by()` and `DT::datatable()` functions, could we find out the number of medals won by each team?

```r
olympic %>% 
  filter(!is.na(Medal)) %>% 
  group_by(Team, Medal) %>% 
  summarize(cases=n()) %>% 
  DT::datatable()
```

Could you find the number of medals won by each team at the last Games, in Rio?
**Hint**💡: Games in Rio were in 2016!
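As a warm-up for the challenge, here is a sketch of `group_by()` + `summarise()` (and the `count()` shortcut) on a hypothetical toy medal table; the teams and medals are made up, and the `.groups` argument requires dplyr 1.0.0 or later.

```r
library(dplyr)

# Hypothetical toy medal table (not the real olympic data)
medals <- data.frame(
  Team  = c("Serbia", "Serbia", "Italy", "Italy", "Italy", "China"),
  Medal = c("Silver", "Bronze", "Gold", "Gold", NA, "Gold")
)

# group_by() + summarise(): one row per Team/Medal combination
medals %>%
  filter(!is.na(Medal)) %>%
  group_by(Team, Medal) %>%
  summarise(cases = n(), .groups = "drop")

# count() is a shortcut for group_by() + summarise(n = n())
res <- medals %>%
  filter(!is.na(Medal)) %>%
  count(Team, Medal)
res
```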
---
## Possible Solution: 
<div id="htmlwidget-e2df4af3bbd1a6ec195c" style="width:100%;height:auto;" class="datatables html-widget"></div>
<script type="application/json" data-for="htmlwidget-e2df4af3bbd1a6ec195c">{"x":{"filter":"none","data":[["1","2","3","4","5","6","7","8","9","10","11","12","13","14","15","16","17","18","19","20","21","22","23","24","25","26","27","28","29","30","31","32","33","34","35","36","37","38","39","40","41","42","43","44","45","46","47","48","49","50","51","52","53","54","55","56","57","58","59","60","61","62","63","64","65","66","67","68","69","70","71","72","73","74","75","76","77","78","79","80","81","82","83","84","85","86","87","88","89","90","91","92","93","94","95","96","97","98","99","100","101","102","103","104","105","106","107","108","109","110","111","112","113","114","115","116","117","118","119","120","121","122","123","124","125","126","127","128","129","130","131","132","133","134","135","136","137","138","139","140","141","142","143","144","145","146","147","148","149","150","151","152","153","154","155","156","157","158","159","160","161","162","163","164","165","166","167","168","169","170","171","172","173","174","175","176","177","178","179","180","181","182","183","184","185","186","187","188","189","190","191","192","193","194","195","196","197","198","199","200","201","202","203","204","205","206","207","208","209","210","211","212","213","214","215","216","217","218","219","220","221","222","223","224","225","226","227","228","229","230","231","232","233","234","235","236","237","238","239","240","241","242","243","244","245","246","247","248","249","250","251","252","253","254","255","256","257","258","259","260","261","262","263","264","265","266","267","268","269","270","271","272","273","274","275","276","277","278","279","280","281","282","283","284","285","286","287","288","289","290","291","292","293","294","295","296","297","298","299","300","301","302","303","304","305","306","307","308","309","310","311","312","313","314","315","316","317","318","319","320","321","322","323","324","325","326","327","328","329","330","331","332","333","334
","335","336","337","338","339","340","341","342","343","344","345","346","347","348","349","350","351","352","353","354","355","356","357","358","359","360","361","362","363","364","365","366","367","368","369","370","371","372","373","374","375","376","377","378","379","380","381","382","383","384","385","386","387","388","389","390","391","392","393","394","395","396","397","398","399","400","401","402","403","404","405","406","407","408","409","410","411","412","413","414","415","416","417","418","419","420","421","422","423","424","425","426","427","428","429","430","431","432","433","434","435","436","437","438","439","440","441","442","443","444","445","446","447","448","449","450","451","452","453","454","455","456","457","458","459","460","461","462","463","464","465","466","467","468","469","470","471","472","473","474","475","476","477","478","479","480","481","482","483","484","485","486","487","488","489","490","491","492","493","494","495","496","497","498","499","500","501","502","503","504","505","506","507","508","509","510","511","512","513","514","515","516","517","518","519","520","521","522","523","524","525","526","527","528","529","530","531","532","533","534","535","536","537","538","539","540","541","542","543","544","545","546","547","548","549","550","551","552","553","554","555","556","557","558","559","560","561","562","563","564","565","566","567","568","569","570","571","572","573","574","575","576","577","578","579","580","581","582","583","584","585","586","587","588","589","590","591","592","593","594","595","596","597","598","599","600","601","602","603","604","605","606","607","608","609","610","611","612","613","614","615","616","617","618","619","620","621","622","623","624","625","626","627","628","629","630","631","632","633","634","635","636","637","638","639","640","641","642","643","644","645","646","647","648","649","650","651","652","653","654","655","656","657","658","659","660","661","662","663","664","665","666","667",
"668","669","670","671","672","673","674","675","676","677","678","679","680","681","682","683","684","685","686","687","688","689","690","691","692","693","694","695","696","697","698","699","700","701","702","703","704","705","706","707","708","709","710","711","712","713","714","715","716","717","718","719","720","721","722","723","724","725","726","727","728","729","730","731","732","733","734","735","736","737","738","739","740","741","742","743","744","745","746","747","748","749","750","751","752","753","754","755","756","757","758","759","760","761","762","763","764","765","766","767","768","769","770","771","772","773","774","775","776","777","778","779","780","781","782","783"],["A North American Team","Afghanistan","Algeria","Algeria","Algeria","Ali-Baba II","Amateur Athletic Association","Amstel Amsterdam","Ancora","Angelita","Antwerpia V","Aphrodite","Argentina","Argentina","Argentina","Argonaut Rowing Club","Armenia","Armenia","Armenia","Aschenbrodel","Aschenbrodel","Atalanta Boat Club-1","Atalanta Boat Club-2","Atlanta","Australasia","Australasia","Australasia","Australia","Australia","Australia","Australia-1","Australia-1","Australia/Great Britain","Austria","Austria","Austria","Austria-1","Austria-1","Austria-1","Austria-2","Austria-2","Azerbaijan","Azerbaijan","Azerbaijan","Baby-1","Baby-1","Bagatelle Polo Club, Paris","Bahamas","Bahamas","Bahamas","Bahrain","Bahrain","Bahrain","Ballerina IV","Barbados","Barion/Bari-2","Barrenjoey","Beatrijs III-1","Belarus","Belarus","Belarus","Belgium","Belgium","Belgium","Belgium-1","Belgium-1","Bem II","Bera","Berliner Ruderclub","Berliner Ruderverein von 1876-2","Bermuda","Bingo","Bissbi","BLO Polo Club, Rugby","Bluebottle","Bohemia","Bohemia","Bohemia/Great Britain","Bona Fide","Bonaparte","Bonzo","Boreas-2","Boston Archers","Botswana","Brazil","Brazil","Brazil","Brazil-1","Brazil-1","Brazil-1","Brazil-2","Brazil-2","Brussels Swimming and Water Polo Club","Brynhild-2","Bucintoro Venezia","Bucintoro 
Venezia-1","Bulgaria","Bulgaria","Bulgaria","Buraddoo","Burundi","Burundi","Cambridge University Boat Club-2","Cameroon","Cameroon","Cameroon","Camille","Canada","Canada","Canada","Canada-1","Canada-1","Canada-1","Canada-2","Caprice","Carabinier-15","Central Turnverein, Chicago","Century Boat Club-1","Cercle de l'Aviron Roubaix-4","Chicago Athletic Association","Chicago Athletic Association-2","Chile","Chile","Chile","China","China","China","China-1","China-1","China-1","China-2","China-2","China-2","China-3","Chinese Taipei","Chinese Taipei","Chinese Taipei","Christian Brothers' College-1","Chuckles","Cicely-1","Cincinnati Archers","Cincinnati Archers","Clearwater","Club Nautique de Lyon-2","Cobweb-1","Colombia","Colombia","Colombia","Comanche","Complex II","Cornwall","Costa Rica","Costa Rica","Costa Rica","Cote d'Ivoire","Cote d'Ivoire","Cote d'Ivoire","Crabe II-1","Crabe II-4","Croatia","Croatia","Croatia","Cuba","Cuba","Cuba","Cyprus","Czech Republic","Czech Republic","Czech Republic","Czech Republic-1","Czech Republic-1","Czechoslovakia","Czechoslovakia","Czechoslovakia","Czechoslovakia-1","Denmark","Denmark","Denmark","Denmark-1","Denmark-2","Denmark/Sweden","Deutscher Schwimm Verband Berlin","Devon and Somerset Wanderers","Digby","Djibouti","Djinn","Dominican Republic","Dominican Republic","Dominican Republic","Don Schufro","Dormy-1","East Germany","East Germany","East Germany","East Germany-1","East Germany-1","East Germany-1","East Germany-2","East Germany-2","East Germany-2","Ecuador","Ecuador","Edelweiss II-1","Egypt","Egypt","Egypt","Eleda","Elisabeth V","Elisabeth X","Elsie","Elvis Va","Emily","Encore","England","England-1","England-1","Eritrea","Erna Signe","Espadarte","Esterel-1","Estonia","Estonia","Estonia","Ethiopia","Ethiopia","Ethiopia","Ethnikos Gymnastikos Syllogos","Falcon IV","Fantlet-7","Favorite Hammonia-3","Favorite-1","Femur-1","Fiji","Finland","Finland","Finland","Formosa","Fornebo","Foxhunters 
Hurlingham","France","France","France","France-1","France-1","France-1","France-2","France-3","France-3","France/Great Britain","Frankfurt Club","Frimousse","Gabon","Gallant","Gallia II","Galt Football Club","Gem","Gem IV","Georgia","Georgia","Georgia","Germania II","Germania Ruder Club, Hamburg-2","Germany","Germany","Germany","Germany-1","Germany-1","Germany-1","Germany-2","Germany-2","Germany-2","Ghana","Ghana","Gitana-2","Gitana-2","Glider","Great Britain","Great Britain","Great Britain","Great Britain-1","Great Britain-1","Great Britain-1","Great Britain-2","Great Britain-2","Great Britain-2","Great Britain-3","Great Britain-3","Great Britain/Germany","Greece","Greece","Greece","Greece-1","Greece-2","Grenada","Grenada","Guatemala","Gustel X","Guyana","Guyoni","Gwendoline-2","Gyrinus-1","Haiti","Haiti","Heatherbell","Heira II","Hera-1","Heroine","Hi-Hi","Hilarius","Hojwa","Hollandia","Hong Kong","Hong Kong","Hong Kong-2","Humbug V","Hungary","Hungary","Hungary","Hungary-1","Hungary-1","Hurlingham-2","Iceland","Iceland","Independent Rowing Club-3","India","India","India","Individual Olympic Athletes","Individual Olympic Athletes","Individual Olympic Athletes","Indonesia","Indonesia","Indonesia","Indonesia-1","Indonesia-1","Indonesia-1","Iran","Iran","Iran","Iraq","Ireland","Ireland","Ireland","Ireland-2","Ireland-3","Irene","Israel","Israel","Israel","Italia","Italy","Italy","Italy","Italy-1","Italy-1","Italy-1","Italy-2","Italy-2","Jamaica","Jamaica","Jamaica","Japan","Japan","Japan","Japan-1","Jest","Jo","Jordan","Joy","Jupiter","K Division Metropolitan Police Team-3","Kathleen","Kazakhstan","Kazakhstan","Kazakhstan","Kenya","Kenya","Kenya","Kerstin-1","Kitty-1","Kosovo","Kristiania Roklub-1","Kullan","Kurush II","Kuwait","Kyrgyzstan","Kyrgyzstan","L'Aile VI","Lady C","Lalage","Large boat, Central Naval Prep School \"Poros\"-1","Latvia","Latvia","Latvia","Latvia-1","Latvia-1","Laurea-1","Leander Club #1-1","Leander Club #2-2","Leander Club-1","Leander 
Club-2","Lebanon","Lebanon","Lerina","Lerina","Libellule de Paris","Liechtenstein","Liechtenstein","Liechtenstein","Life boat naval ship \"Spetsai\"-1","Lithuania","Lithuania","Lithuania","Liverpool Police Team-2","Llanoria","London City Police-1","Lucky Girl-1","Ludwigshafener Ruder Verein-1","Ludwigshafener Ruderverein","Lully II","Luxembourg","Luxembourg","Lyn-2","Ma'Lindo","Mac Miche","Macedonia","Macky VI","Magda IX","Magdalen College Boat Club-1","Malaysia","Malaysia","Malaysia-1","Malaysia-2","Margaret","Marinai della nave da guerra \"Varese\"","Marinai della nave da guerra \"Varese\"","Marmi II-1","Martha-1","Martha-1","Mascotte","Mauritius","May Be","Merope","Merope III","Mexico","Mexico","Mexico","Mignon-3","Milwaukee Athletic Club-1","Minerva Amsterdam","Minerva Amsterdam","Minerva Amsterdam","Minotaur","Missouri Athletic Club-3","Mohawk Indians-2","Moldova","Moldova","Monaco","Mongolia","Mongolia","Mongolia","Montenegro","Morocco","Morocco","Morocco","Moseley Wanderers","Mosk II","Mouchette-2","Mound City Rowing Club-2","Mozambique","Mozambique","Mutafo","Nadine","Namibia","Namoussa","Nepal","Netherlands","Netherlands","Netherlands","Netherlands Antilles","Netherlands-1","New College, Oxford-2","New York Athletic Club","New York Athletic Club #1-1","New York Athletic Club-1","New York Turnverein, New York","New Zealand","New Zealand","New Zealand","Niger","Niger","Nigeria","Nigeria","Nigeria","Nina","Nina Claire-2","Nirefs","Norna","North Korea","North Korea","North Korea","North Korea-1","Norway","Norway","Norway","Nrnberg","Nurdug II","Nykjbings paa Falster","Olle","Omas Helliniki P. 
S.","Oranje","Ormsund Roklub-2","Osborne Swimming Club, Manchester","Pakistan","Pakistan","Pakistan","Pan","Panama","Panama","Pandora","Paraguay","Peru","Peru","Phalainis ton Thorichtou \"Hydra\"-2","Phalainis ton Thorichtou \"Hydra\"-2","Philadelphia Turngemeinde, Philadelphia","Philippines","Philippines","Pistoja/Firenze","Poland","Poland","Poland","Poland-1","Polyteknisk Roklub-1","Portugal","Portugal","Portugal","Potomac Archers","Potsdam","Puerto Rico","Puerto Rico","Puerto Rico","Pupilles de Neptune de Lille #2-1","Pupilles de Neptune de Lille-1","Qatar","Qatar","Quand-Mme-2","Racing Club de France","Ralia","Ravenswood Boat Club-2","Roddklubben af 1912-1","Roehampton-1","Romania","Romania","Romania","Romania-1","Rose Pompon","Rostock","Rowing Club Castillon-3","Royal Club Nautique de Gand","Rush V","Rush VII","Russia","Russia","Russia","Russia-1","Russia-1","Russia-1","Russia-2","Russia-2","Salinero","Sans Atout-1","Santa Maria","Sarcelle-3","Satchmo","Saudi Arabia","Saudi Arabia","Scamasaxe-2","Scamasaxe-3","Scotia","Scotland-3","Seawanhaka Boat Club-1","Senegal","Serbia","Serbia","Serbia","Serbia and Montenegro","Serbia and Montenegro","Serbia and Montenegro","Shrew II","Sif","Sildra-1","Silja","Singapore","Singapore","Singapore","Sirene","Skum","Slaghoken","Slaghoken II","Slovakia","Slovakia","Slovakia","Slovenia","Slovenia","Slovenia","Smyrna","Snap","Societ Nautique de la Marne-1","Socit Nautique de Bayonne-1","Socit Nautique de Bayonne-2","Socit Nautique de la Basse Seine-1","Socit Nautique de la Basse Seine-1","Sorais-2","South Africa","South Africa","South Africa","South Korea","South Korea","South Korea","South Korea-1","South Korea-1","South Korea-1","South Korea-2","South Korea-2","South Korea-2","Soviet Union","Soviet Union","Soviet Union","Soviet Union-1","Soviet Union-1","Soviet Union-2","Soviet Union-2","Spain","Spain","Spain","Spain-1","Spain-2","Spain-2","Sri Lanka","St. Louis Amateur Athletic Association","St. 
Louis Southwest Turnverein #1-2","St. Louis Southwest Turnverein #2-3","St. Rose-2","Starita","Stella-2","Sudan","Sunrise","Sunshine","Suriname","Suriname","Sweden","Sweden","Sweden","Sweden-1","Sweden-2","Sweden-3","Swedish Star","Swift","Switzerland","Switzerland","Switzerland","Switzerland-1","Switzerland-1","Switzerland-1","Switzerland-2","Switzerland-2","Switzerland-2","Sylvia","Symphony","Syria","Syria","Syria","Taifun","Tajikistan","Tajikistan","Tajikistan","Tan-Fe-Pah","Tango","Tanzania","Thailand","Thailand","Thailand","Thames Rowing Club","Thessalonki-1","Tip","Togo","Tonga","Tornado","Tornado","Toronto Argonauts","Toronto Argonauts","Trans-Mississippi Golf Association-2","Trinidad and Tobago","Trinidad and Tobago","Trinidad and Tobago","Tritons Lillois-2","Tunisia","Tunisia","Tunisia","Turkey","Turkey","Turkey","Turquoise-1","Tutti V","Uganda","Uganda","Uganda","Ukraine","Ukraine","Ukraine","Ukraine-1","Unified Team","Unified Team","Unified Team","Unified Team-1","Unified Team-2","Unified Team-2","Union des Socits Franais de Sports Athletiques","Union des Socits Franais de Sports Athletiques","United Arab Emirates","United Arab Emirates","United Arab Republic","United Arab Republic","United States","United States","United States","United States Golf Association-3","United States Virgin Islands","United States-1","United States-1","United States-1","United States-2","United States-2","United States-2","United States-3","United States-4","United States/France","United States/Great Britain","Univ. 
of Brussels","Upton Park FC","Uruguay","Uruguay","Uruguay","USFSA","Uzbekistan","Uzbekistan","Uzbekistan","Venezuela","Venezuela","Venezuela","Venilia","Vesper Boat Club","Vietnam","Vietnam","Vinga-1","Vision","Wales-4","Wannsee","Web II","West Germany","West Germany","West Germany","West Germany-1","West Germany-1","West Germany-1","West Germany-2","West Indies Federation","Western Golf Association-1","Western Rowing Club-3","White Lady","Widgeon","Willem-Six","Winnipeg Shamrocks-1","Yugoslavia","Yugoslavia","Yugoslavia","Zambia","Zambia","Zimbabwe","Zimbabwe","Zimbabwe","Zut"],["Bronze","Bronze","Bronze","Gold","Silver","Bronze","Gold","Bronze","Gold","Gold","Bronze","Bronze","Bronze","Gold","Silver","Bronze","Bronze","Gold","Silver","Gold","Silver","Gold","Silver","Gold","Bronze","Gold","Silver","Bronze","Gold","Silver","Bronze","Gold","Bronze","Bronze","Gold","Silver","Bronze","Gold","Silver","Bronze","Silver","Bronze","Gold","Silver","Bronze","Gold","Bronze","Bronze","Gold","Silver","Bronze","Gold","Silver","Bronze","Bronze","Silver","Gold","Gold","Bronze","Gold","Silver","Bronze","Gold","Silver","Bronze","Gold","Bronze","Gold","Bronze","Bronze","Bronze","Bronze","Gold","Silver","Bronze","Bronze","Silver","Bronze","Gold","Gold","Silver","Silver","Bronze","Silver","Bronze","Gold","Silver","Bronze","Gold","Silver","Bronze","Silver","Silver","Silver","Gold","Gold","Bronze","Gold","Silver","Bronze","Gold","Silver","Bronze","Bronze","Gold","Silver","Gold","Bronze","Gold","Silver","Bronze","Gold","Silver","Silver","Bronze","Gold","Bronze","Gold","Gold","Silver","Silver","Bronze","Gold","Silver","Bronze","Gold","Silver","Bronze","Gold","Silver","Bronze","Gold","Silver","Bronze","Bronze","Gold","Silver","Silver","Bronze","Gold","Gold","Silver","Bronze","Silver","Gold","Bronze","Gold","Silver","Silver","Gold","Silver","Bronze","Gold","Silver","Bronze","Gold","Silver","Silver","Bronze","Bronze","Gold","Silver","Bronze","Gold","Silver","Silver","Bronze","Gold","Silver","B
ronze","Silver","Bronze","Gold","Silver","Silver","Bronze","Gold","Silver","Bronze","Bronze","Gold","Gold","Gold","Bronze","Bronze","Silver","Bronze","Gold","Silver","Bronze","Gold","Bronze","Gold","Silver","Bronze","Gold","Silver","Bronze","Gold","Silver","Gold","Silver","Gold","Bronze","Gold","Silver","Gold","Gold","Silver","Silver","Gold","Silver","Silver","Silver","Gold","Silver","Bronze","Silver","Bronze","Gold","Bronze","Gold","Silver","Bronze","Gold","Silver","Bronze","Silver","Gold","Bronze","Silver","Gold","Gold","Bronze","Gold","Silver","Bronze","Silver","Gold","Bronze","Gold","Silver","Bronze","Gold","Silver","Bronze","Bronze","Gold","Silver","Silver","Bronze","Silver","Silver","Bronze","Gold","Gold","Bronze","Bronze","Gold","Silver","Bronze","Gold","Bronze","Gold","Silver","Bronze","Gold","Silver","Bronze","Gold","Silver","Bronze","Silver","Bronze","Silver","Silver","Bronze","Gold","Silver","Bronze","Gold","Silver","Bronze","Gold","Silver","Bronze","Silver","Gold","Bronze","Gold","Silver","Silver","Bronze","Gold","Silver","Silver","Bronze","Bronze","Bronze","Bronze","Gold","Bronze","Silver","Bronze","Gold","Gold","Gold","Silver","Gold","Bronze","Silver","Bronze","Gold","Silver","Bronze","Bronze","Gold","Silver","Bronze","Silver","Silver","Bronze","Silver","Bronze","Bronze","Gold","Silver","Bronze","Gold","Silver","Bronze","Gold","Silver","Bronze","Gold","Silver","Bronze","Gold","Silver","Bronze","Bronze","Gold","Silver","Silver","Silver","Gold","Bronze","Gold","Silver","Gold","Bronze","Gold","Silver","Bronze","Gold","Silver","Bronze","Silver","Bronze","Gold","Silver","Bronze","Gold","Silver","Silver","Gold","Gold","Gold","Silver","Gold","Bronze","Gold","Bronze","Gold","Silver","Bronze","Gold","Silver","Bronze","Gold","Gold","Bronze","Gold","Silver","Bronze","Bronze","Silver","Gold","Silver","Gold","Gold","Bronze","Gold","Silver","Bronze","Silver","Bronze","Gold","Silver","Gold","Silver","Bronze","Silver","Gold","Silver","Bronze","Bronze","Gold","Silver",
"Silver","Bronze","Gold","Silver","Silver","Gold","Gold","Bronze","Bronze","Gold","Silver","Gold","Silver","Silver","Silver","Gold","Bronze","Bronze","Gold","Gold","Bronze","Silver","Silver","Bronze","Bronze","Bronze","Gold","Silver","Bronze","Silver","Silver","Bronze","Bronze","Gold","Silver","Bronze","Gold","Silver","Bronze","Gold","Bronze","Gold","Silver","Gold","Bronze","Bronze","Bronze","Silver","Bronze","Bronze","Gold","Silver","Silver","Bronze","Gold","Silver","Silver","Gold","Silver","Silver","Bronze","Gold","Silver","Silver","Silver","Bronze","Gold","Bronze","Gold","Silver","Silver","Bronze","Silver","Gold","Gold","Gold","Silver","Bronze","Gold","Silver","Bronze","Silver","Bronze","Gold","Silver","Silver","Bronze","Gold","Gold","Bronze","Gold","Silver","Bronze","Bronze","Gold","Silver","Gold","Silver","Gold","Gold","Silver","Gold","Bronze","Gold","Bronze","Gold","Silver","Gold","Bronze","Gold","Gold","Silver","Gold","Silver","Bronze","Silver","Gold","Bronze","Silver","Bronze","Bronze","Gold","Silver","Silver","Bronze","Bronze","Gold","Silver","Gold","Bronze","Bronze","Gold","Silver","Bronze","Bronze","Bronze","Silver","Silver","Silver","Bronze","Silver","Silver","Gold","Bronze","Gold","Silver","Bronze","Silver","Silver","Bronze","Silver","Gold","Silver","Bronze","Gold","Silver","Bronze","Gold","Silver","Gold","Silver","Silver","Silver","Silver","Bronze","Gold","Bronze","Silver","Bronze","Silver","Gold","Bronze","Gold","Silver","Bronze","Gold","Silver","Bronze","Gold","Silver","Bronze","Gold","Gold","Silver","Bronze","Gold","Silver","Gold","Silver","Silver","Gold","Bronze","Gold","Silver","Bronze","Gold","Silver","Silver","Bronze","Silver","Bronze","Bronze","Bronze","Silver","Bronze","Bronze","Gold","Silver","Bronze","Gold","Silver","Bronze","Gold","Silver","Bronze","Gold","Silver","Bronze","Gold","Silver","Gold","Silver","Bronze","Silver","Bronze","Gold","Silver","Silver","Gold","Silver","Silver","Silver","Silver","Bronze","Bronze","Bronze","Bronze","Silver
","Silver","Silver","Bronze","Gold","Bronze","Gold","Silver","Silver","Silver","Bronze","Bronze","Gold","Bronze","Gold","Silver","Bronze","Gold","Silver","Bronze","Gold","Silver","Bronze","Silver","Bronze","Gold","Silver","Gold","Bronze","Gold","Silver","Silver","Silver","Silver","Bronze","Gold","Silver","Silver","Bronze","Silver","Bronze","Silver","Gold","Silver","Bronze","Silver","Silver","Bronze","Gold","Silver","Silver","Bronze","Gold","Silver","Bronze","Gold","Silver","Bronze","Bronze","Bronze","Gold","Silver","Bronze","Gold","Silver","Bronze","Bronze","Gold","Silver","Gold","Bronze","Silver","Gold","Silver","Bronze","Gold","Bronze","Silver","Bronze","Gold","Silver","Bronze","Silver","Bronze","Gold","Silver","Bronze","Gold","Silver","Bronze","Bronze","Silver","Bronze","Bronze","Gold","Bronze","Gold","Silver","Silver","Bronze","Gold","Silver","Bronze","Gold","Silver","Bronze","Gold","Gold","Silver","Silver","Silver","Bronze","Gold","Silver","Bronze","Gold","Silver","Bronze","Gold","Silver","Gold","Bronze","Gold","Bronze","Gold","Bronze","Bronze","Gold","Bronze","Gold","Silver","Bronze","Silver","Bronze","Gold","Silver","Silver"],[4,2,8,5,4,5,5,4,4,12,5,3,91,91,84,6,9,2,5,4,4,2,2,9,5,20,4,511,342,453,2,2,2,150,95,168,4,12,16,2,2,25,7,12,1,1,4,13,12,11,1,1,1,3,1,3,3,3,71,24,44,154,94,161,5,1,2,5,2,9,1,3,4,4,3,10,1,2,3,1,3,2,4,1,185,103,161,2,6,12,4,2,10,1,8,3,144,54,144,3,1,1,9,1,20,1,1,408,422,413,12,16,8,2,4,1,6,4,5,5,11,20,3,9,268,308,325,10,28,14,12,14,8,2,18,3,28,11,2,1,4,4,1,4,5,14,5,8,2,4,15,2,1,1,1,1,1,5,5,37,58,54,116,164,127,1,60,42,32,6,4,182,81,223,2,162,168,223,2,2,6,4,12,1,1,5,2,3,2,1,3,263,369,309,8,24,10,10,4,8,1,1,3,12,7,8,7,3,5,4,1,5,3,1,11,12,1,10,2,2,20,13,12,22,22,9,2,2,1,6,8,1,13,415,198,263,1,4,5,577,455,518,11,4,7,4,2,2,2,15,1,1,5,7,13,2,2,18,8,6,6,5,678,679,627,22,28,20,11,14,4,22,1,5,5,2,572,519,582,4,20,10,14,4,10,4,2,2,62,42,70,6,2,1,1,1,3,1,3,3,6,5,1,7,9,10,4,4,2,3,6,1,1,2,2,365,432,330,6,2,4,2,15,2,40,138,19,3,1,1,9,7,11,4,4,6,29,18,2
1,1,13,9,13,12,4,5,7,1,1,6,484,535,508,14,14,8,4,10,44,38,75,357,247,307,2,2,3,1,2,2,8,2,32,20,25,31,34,41,3,8,1,5,3,2,2,2,1,6,2,5,17,9,3,13,4,6,1,2,2,18,4,2,2,3,3,7,5,2,2,7,48,6,7,8,11,8,5,5,5,5,4,4,5,2,3,1,3,10,4,3,9,2,2,2,17,7,3,4,4,3,1,5,2,2,51,30,26,1,5,9,3,5,3,11,12,5,3,1,14,2,10,14,12,6,5,15,7,10,4,1,1,3,1,4,5,1,390,277,321,1,2,9,5,4,7,6,82,85,56,1,1,46,23,30,7,4,3,4,33,16,16,2,281,299,330,1,3,5,6,8,3,5,7,34,42,45,6,2,1,2,17,1,14,7,17,6,7,3,9,253,117,193,2,5,24,4,7,4,1,6,1,2,8,4,4,1,9,11,5,2,5,4,290,161,200,2,3,1,3,21,3,3,393,366,351,8,22,6,2,10,1,5,6,3,1,5,1,2,2,6,11,2,1,41,15,29,26,12,26,2,4,4,6,4,1,4,2,2,3,3,13,15,19,27,8,13,11,3,2,3,5,3,5,5,52,32,47,159,211,222,18,6,6,8,4,4,677,1058,716,22,4,12,12,136,108,239,2,2,2,2,12,5,5,12,2,3,1,1,2,1,1,507,451,476,2,4,2,2,2,231,144,213,30,20,22,4,8,10,6,2,1,1,1,5,2,1,1,3,3,2,13,9,8,5,11,3,1,1,2,3,9,9,10,17,7,8,5,7,3,3,28,40,27,1,5,2,2,3,98,47,52,2,79,123,69,4,2,2,17,12,1,1,1,1,1233,2474,1512,10,1,30,38,33,20,17,14,2,2,2,2,11,11,30,31,2,13,17,10,7,10,2,3,3,18,1,3,5,4,11,2,3,219,155,184,14,2,10,2,5,10,6,3,2,3,12,93,130,167,1,1,1,17,4,3]],"container":"<table class=\"display\">\n <thead>\n <tr>\n <th> <\/th>\n <th>Team<\/th>\n <th>Medal<\/th>\n <th>cases<\/th>\n <\/tr>\n <\/thead>\n<\/table>","options":{"columnDefs":[{"className":"dt-right","targets":3},{"orderable":false,"targets":0}],"order":[],"autoWidth":false,"orderClasses":false}},"evals":[],"jsHooks":[]}</script>
---
**Exercise:** 💪 Let us visualise the number of female and male athletes from the ex-YU countries available in the data set: "Bosnia and Herzegovina", "Croatia", "Serbia", "Serbia and Montenegro", "Montenegro" and "Slovenia".

First, we need to extract the data we want to present on the graph.

```r
exyu <- olympic %>% 
 filter(Team %in% c("Bosnia and Herzegovina", "Croatia", "Serbia", "Serbia and Montenegro", "Montenegro", "Slovenia")) %>% 
 group_by(Team, Sex) %>% 
* summarize(total = n())
exyu
```

```
## # A tibble: 12 x 3
## # Groups:   Team [?]
##    Team                   Sex   total
##    <fct>                  <fct> <int>
##  1 Bosnia and Herzegovina F        39
##  2 Bosnia and Herzegovina M        95
##  3 Croatia                F       236
##  4 Croatia                M       640
##  5 Montenegro             F        36
##  6 Montenegro             M        58
##  7 Serbia                 F       139
##  8 Serbia                 M       249
##  9 Serbia and Montenegro  F        58
## 10 Serbia and Montenegro  M       263
## 11 Slovenia               F       410
## 12 Slovenia               M       697
```
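The `group_by()` + `summarize(n())` pattern above counts rows per group. dplyr also offers `count()` as a shortcut for the same thing. The sketch below checks that both give the same counts. It uses a small, hypothetical `olympic_toy` data frame in place of the real `olympic` data. The `name = "total"` argument is available in recent versions of dplyr.

```r
library(dplyr)

# Hypothetical toy stand-in for the olympic data (one row per athlete entry)
olympic_toy <- data.frame(
  Team = c("Croatia", "Croatia", "Croatia", "Slovenia", "Slovenia", "Italy"),
  Sex  = c("F", "M", "M", "F", "F", "M"),
  stringsAsFactors = FALSE
)
exyu_teams <- c("Bosnia and Herzegovina", "Croatia", "Serbia",
                "Serbia and Montenegro", "Montenegro", "Slovenia")

# Long form: group, then count the rows in each group
via_summarize <- olympic_toy %>%
  filter(Team %in% exyu_teams) %>%
  group_by(Team, Sex) %>%
  summarize(total = n()) %>%
  ungroup()

# Shortcut: count() groups, counts and ungroups in one step
via_count <- olympic_toy %>%
  filter(Team %in% exyu_teams) %>%
  count(Team, Sex, name = "total")

via_count
```

Unlike `summarize()`, `count()` does not leave `Team` as a grouping variable behind. That grouping is why the output above shows `Groups: Team [?]`.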
---
**How do we plot this?** 🤔

```r
# a bar chart with each team on the x axis and the number of male and female athletes on the y axis
ggplot(data = exyu, aes(x = Team, y = total, fill = Sex)) +
  geom_bar(stat = "identity", position = "dodge", col = "black") +
  # to make it easier to read, flip the x and y coordinates
  coord_flip() +
  # add labels for the x and y axes, a title, a subtitle and a caption
  labs(x = "ex YU country", y = "No of athletes",
       title = "Comparisons of M and F representatives in exYU Teams",
       subtitle = "for klikR workshop",
       caption = "Data from: kaggle - 120 years of Olympic history") +
  # add a border around the plot panel (in ggplot2 >= 3.4.0 use linewidth = 1 instead of size = 1)
  theme(panel.border = element_rect(fill = NA, colour = "black", size = 1)) +
  # enlarge the title and remove the grid lines and axis lines
  theme(plot.title = element_text(size = 14, vjust = 2),
        panel.grid.major = element_blank(),
        panel.grid.minor = element_blank(),
        axis.line = element_blank())
```
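`geom_bar(stat = "identity")` tells ggplot2 to use the `y` values as the bar heights instead of counting rows. ggplot2 has a shorter equivalent for this, `geom_col()`. The sketch below rebuilds the dodged bars with `geom_col()`. It uses a data frame typed in from the `exyu` counts printed earlier, and `ggplot_build()` is used only to inspect the bars that would be drawn.

```r
library(ggplot2)

# Counts copied from the exyu output shown earlier
exyu_counts <- data.frame(
  Team = rep(c("Bosnia and Herzegovina", "Croatia", "Montenegro",
               "Serbia", "Serbia and Montenegro", "Slovenia"), each = 2),
  Sex = rep(c("F", "M"), times = 6),
  total = c(39, 95, 236, 640, 36, 58, 139, 249, 58, 263, 410, 697),
  stringsAsFactors = FALSE
)

# geom_col() is shorthand for geom_bar(stat = "identity"):
# the bar heights come straight from the y aesthetic
p <- ggplot(exyu_counts, aes(x = Team, y = total, fill = Sex)) +
  geom_col(position = "dodge", colour = "black") +
  coord_flip()

# Inspect the computed layer data: one row per bar
built <- ggplot_build(p)
head(built$data[[1]][, c("x", "ymin", "ymax", "fill")])
```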
---
**Our graph!** 😇😎

<img src="DSSR_files/figure-html/unnamed-chunk-53-1.png" style="display: block; margin: auto;" />
---
class: inverse, center, middle

##Let's do Elaine's Dance!!! 😃🎵🎶

<img src="images/Elain_dance.gif" width="500px" />
---
## Useful links:

Cheatsheets:

- [data-wrangling-cheatsheet](https://www.rstudio.com/wp-content/uploads/2015/02/data-wrangling-cheatsheet.pdf)

- [ggplot2-cheatsheet](https://www.rstudio.com/wp-content/uploads/2015/03/ggplot2-cheatsheet.pdf)

Websites:

- [tidyverse, visualization, and manipulation basics](https://www.rstudio.com/resources/webinars/tidyverse-visualization-and-manipulation-basics/)

- [ggplot2, part of the tidyverse](http://ggplot2.tidyverse.org/index.html)

- [Introduction to R graphics with ggplot2](http://tutorials.iq.harvard.edu/R/Rgraphics/Rgraphics.html#introduction)
---
class: inverse, center, middle

#R Workshop: part IV
##klikR
---
#R Markdown 💻📊📈📃

R Markdown enables you to:

- save and execute code and display its output;
- create high-quality reports that can include [LaTeX](https://www.latex-project.org/) equations.

[R Markdown](https://rmarkdown.rstudio.com/) documents are fully reproducible and support many static and dynamic output formats, for example PDF, HTML, MS Word and Beamer.

R Markdown is a variant of [Markdown](https://daringfireball.net/projects/markdown/) with embedded **R code chunks** (delimited by three backticks). It is designed to be used with [knitr](https://yihui.name/knitr/), which makes it easy to create reproducible web-based reports.

To use **R Markdown**, you need to install the `rmarkdown` package from [CRAN](https://cran.r-project.org/) and load it with:

```r
install.packages("rmarkdown", repos = "http://cran.us.r-project.org")
suppressPackageStartupMessages(library(rmarkdown))
```
---
class: middle

You will definitely find the following useful:

- [The R Markdown Cheatsheet](https://ntaback.github.io/UofT_STA130/rmarkdown-2.0.pdf)

- [The R Markdown Reference Guide](https://www.rstudio.com/wp-content/uploads/2015/03/rmarkdown-reference.pdf)
---
#Starting with RMarkdown

**Task 1:**
Open the file `RMarkdown_Intro.Rmd`

- Change the title of the Markdown Document from `My First Markdown Document` to `RMarkdown Introduction`.

-  Click the **"Knit"** button to see the compiled version of your sample code.
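RStudio's **Knit** button calls `rmarkdown::render()`. That function first uses **knitr** to run the R chunks and produce Markdown, and then uses pandoc to convert the Markdown into the final format.

The sketch below reproduces only the knitr step, so it runs without pandoc. The file name `knit_demo.Rmd` is hypothetical, and the file is written to a temporary directory.

```r
library(knitr)

# Hypothetical demo files in a temporary directory
rmd_file <- file.path(tempdir(), "knit_demo.Rmd")
md_file  <- file.path(tempdir(), "knit_demo.md")

# Build the chunk fence (three backticks) programmatically
fence <- strrep("`", 3)

writeLines(c(
  "This is my first R Markdown document.",
  "",
  paste0(fence, "{r}"),
  "summary(cars$speed)",
  fence
), rmd_file)

# knit() runs the R chunks and writes plain Markdown (no pandoc needed)
knit(rmd_file, output = md_file, quiet = TRUE)
md_lines <- readLines(md_file)
cat(md_lines, sep = "\n")

# With pandoc installed, the full "Knit" equivalent would be:
# rmarkdown::render(rmd_file)
```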
---
class: inverse, center, middle

##Congratulations! You’ve just Knitted your `$1^{st}$` Rmd document!!!! 👍😃

<img src="images/kramer_congrats.gif" width="300px" />
---
## Basic Text editing

**Task 2:**
Let’s format this document further by:

- Changing the author of the document to your own name.

- Rewriting the first sentence of the document to say "This is my first R Markdown document."

- Recompiling the document to see your changes.
---
##Adding a link

You can turn a word into a link by surrounding it with **square brackets: [ ]** and then placing the link immediately after it in **parentheses: ( )**, with no space in between, like this:

`[RStudio](https://www.rstudio.com)`

**Task 3:**
Make the word GitHub in the following paragraph link to https://github.com/DataTeka/DSStory
---
#Text formatting

To embed formatting instructions into your document using Markdown, surround the text with:

- one asterisk to make it italic: *italic*;
- two asterisks to make it bold: **bold**;
- backticks to make it monospaced: `monospaced`.

To make an ordered list, place each item on a new line, starting with a number followed by a period and a space:

1. item 1
2. item 2

Note that you need to place a blank line between the list and any paragraph that comes before it.
---
##**Task 4:**

- Make the following paragraph (line #20) in your Rmd document look like this:

The variables can be one of two broad types:

1) **Attribute variable**: has its outcomes described in terms of its characteristics or
attributes;

2) **Measured variable**: has the resulting outcome expressed in numerical terms.

- Make the word Knit in the following paragraph bold.
---
#Embedding the `R` code
To embed an R code chunk, use three backticks:

` ```{r} `

` chunk of code`

` ``` `

**Task 5**: Replace the `cars` data set with the `olympic` data set (but don't forget to read the data!).

You can also embed plots without printing the R code that generates them. To do this, add `echo = FALSE` to the chunk options:

` ```{r, echo=FALSE} `

` chunk of code`

` ``` `
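To see what `echo = FALSE` changes, the sketch below knits the same one-line chunk twice, once with default options and once with `echo = FALSE`. It uses `knitr::knit(text = ...)`, which returns the knitted Markdown as text instead of writing a file. The chunk content (`mean(c(1, 2, 3))`) is just an illustrative stand-in for your plotting code.

```r
library(knitr)

# Build the chunk fence (three backticks) programmatically
fence <- strrep("`", 3)

# The same code in two chunks: default options, and echo = FALSE
shown  <- c(paste0(fence, "{r}"), "mean(c(1, 2, 3))", fence)
hidden <- c(paste0(fence, "{r, echo=FALSE}"), "mean(c(1, 2, 3))", fence)

# knit(text = ...) returns the knitted Markdown as a character vector
out_shown  <- knit(text = shown, quiet = TRUE)
out_hidden <- knit(text = hidden, quiet = TRUE)

# Both contain the result; only out_shown also contains the code
cat(out_shown, sep = "\n")
cat(out_hidden, sep = "\n")
```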

**Task 6**: Replace the base boxplot of mpg vs. cyl with one of the ggplots you created earlier (remember to load the necessary packages!).
---
##Adding **LaTeX** equations

Finally, if you wish to add mathematical equations to your Markdown document, you can easily embed LaTeX math equations into your report.

To display an equation on its own line, surround it with double dollar signs: `$$` `y = a + bx` `$$`. To embed an equation inline within the text, use single dollar signs: `$y = a + bx$`.

**Task 7**: Display the equation on its own line.
---
class: inverse, center, middle

#Congratulations! You now have the basics to start creating your own fabulous dynamic documents!!!! 👍😃

##💻📊📈📃🤓🤪🤩😎
**Useful Links**:

R Markdown:
<http://www.stat.cmu.edu/~cshalizi/rmarkdown/>

RStudio R Markdown:
<https://rmarkdown.rstudio.com>

[RStudio Cheatsheets](https://www.rstudio.com/resources/cheatsheets/)

---

class: center, middle

# Thanks!

[www.datateka.com](https://www.datateka.com)

[tanjakec.github.io](https://tanjakec.github.io)

@DataTeka

@Tatjana_Kec

Slides created via the R package [**xaringan**](https://github.com/yihui/xaringan).

The chakra comes from [remark.js](https://remarkjs.com), [**knitr**](http://yihui.name/knitr), and [R Markdown](https://rmarkdown.rstudio.com).