Welcome

Welcome to this R Shiny application. Use it to explore and analyse the Lifelines dataset.

This Shiny application is completely open source and licensed under the AGPL. You can find all code related to this project on its GitHub page.

Dataset overview

Number of rows:

Number of columns:

Number of datapoints:

Number of NA values:
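
As a rough illustration of how the four figures above relate, here is a minimal Python sketch (not the application's own R code). The toy table, the `overview` helper, and the reading of "datapoints" as rows times columns are all assumptions made for this example.

```python
# Hypothetical toy table: one dict per row, with None standing in for NA.
rows = [
    {"AGE": 34, "GENDER": "F", "KCAL": 2100.0},
    {"AGE": None, "GENDER": "M", "KCAL": 2500.0},
    {"AGE": 58, "GENDER": None, "KCAL": None},
]


def overview(table):
    """Return row, column, datapoint and NA counts for a list of row dicts."""
    n_rows = len(table)
    n_cols = len(table[0]) if table else 0
    # Assumption: a "datapoint" is one cell, so the total is rows * columns.
    n_na = sum(1 for row in table for value in row.values() if value is None)
    return {
        "rows": n_rows,
        "columns": n_cols,
        "datapoints": n_rows * n_cols,
        "na_values": n_na,
    }


summary = overview(rows)
print(summary)
```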

Variable Definitions

Here you may find definitions for all variables in the dataset. For more information, please visit the Lifelines wiki.


Dataset table

Here you may explore the dataset. Select variables to view using the field below.

The table should update as you alter the data using this application.




Plot

Here you may select variables to plot against each other to gain new insights.


Scatterplot

Scatterplots are great for exploring possible correlations!


Boxplot

Boxplots are great for exploring medians, interquartile ranges, and outliers!

Please note that a boxplot looks best when one of the chosen variables is categorical, like "GENDER".
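
To make the quantities behind a boxplot concrete, here is a minimal Python sketch (not the application's own R code). It assumes the common convention of flagging points more than 1.5 times the interquartile range beyond the quartiles. The `box_stats` helper and the sample values are hypothetical.

```python
from statistics import quantiles


def box_stats(values):
    """Return the quartiles, the IQR and the points outside the 1.5 * IQR fences."""
    q1, median, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    outliers = [v for v in values if v < lower_fence or v > upper_fence]
    return {"q1": q1, "median": median, "q3": q3, "iqr": iqr, "outliers": outliers}


# Hypothetical measurements with one extreme value.
stats = box_stats([1, 2, 3, 4, 5, 6, 7, 8, 9, 100])
print(stats)
```

Note that the exact quartile values depend on the interpolation method, so other tools may report slightly different numbers for the same data.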


Barplot

Barplots are great for exploring distributions and checking frequencies!

Missing data

Missing values in a dataset (otherwise known as NA values) can distort summary statistics and cause errors in later analysis steps. It is best to deal with them as early as possible!
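
Two common ways of handling missing values are dropping them and replacing them with the mean of the observed values. The Python sketch below illustrates both. It is only an illustration under assumptions: the options this application actually offers may differ, and the helper names and sample values are hypothetical.

```python
from statistics import mean

# Hypothetical variable with None standing in for NA.
ages = [34, None, 58, 41, None]


def drop_missing(values):
    """Keep only the observed (non-missing) values."""
    return [v for v in values if v is not None]


def impute_mean(values):
    """Replace each missing value with the mean of the observed values."""
    observed = drop_missing(values)
    if not observed:
        raise ValueError("cannot impute: every value is missing")
    fill = mean(observed)
    return [fill if v is None else v for v in values]


complete = drop_missing(ages)
imputed = impute_mean(ages)
print(complete)
print(imputed)
```

Dropping values shrinks the sample, while mean imputation keeps its size but reduces the variable's spread, so the right choice depends on the analysis.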


Dealing with missing data





Viewing missing data




Normalisation

The variables you're working with may have very different ranges. For example, an age variable will usually range between 0 and 100, while something like caloric intake in kcal may range in the thousands. To remedy this difference in range, you may opt to normalise the dataset.

Standard score normalisation: Rescales the given values so that they have a mean of 0 and a standard deviation of 1. The shape of the distribution does not change, so skewed data remain skewed.

Min-Max normalisation: Converts the given values to a range between 0 and 1.
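
The two normalisation methods can be sketched in a few lines of Python (an illustration, not the application's own R code). Using the sample standard deviation with an n - 1 denominator is an assumption, and the function names are hypothetical.

```python
from statistics import mean, stdev


def standard_score(values):
    """Rescale values to mean 0 and (sample) standard deviation 1."""
    mu = mean(values)
    sigma = stdev(values)  # assumption: sample standard deviation (n - 1)
    if sigma == 0:
        raise ValueError("standard deviation is zero; cannot standardise")
    return [(v - mu) / sigma for v in values]


def min_max(values):
    """Rescale values linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("all values are equal; cannot rescale")
    return [(v - lo) / (hi - lo) for v in values]


z_scores = standard_score([2, 4, 4, 4, 5, 5, 7, 9])
scaled = min_max([10, 20, 30])
print(z_scores)
print(scaled)
```

Both methods are linear rescalings, so they change the units of a variable but not the shape of its distribution.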




Data Transformation

Skewed data can cause problems for methods that assume roughly symmetric or normally distributed values. Transforming the data can often reduce the skew. Below you will find a selection of options for transforming this dataset.
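
As an illustration of how a transformation can reduce skew, the Python sketch below compares the skewness of some right-skewed values before and after a natural-log transform. The log transform is used as an assumed example and may not match the options offered here. The `skewness` helper and the sample values are hypothetical.

```python
import math
from statistics import mean


def skewness(values):
    """Moment-based skewness: third central moment over variance to the power 1.5."""
    mu = mean(values)
    n = len(values)
    m2 = sum((v - mu) ** 2 for v in values) / n
    m3 = sum((v - mu) ** 3 for v in values) / n
    if m2 == 0:
        raise ValueError("variance is zero; skewness is undefined")
    return m3 / m2 ** 1.5


# Hypothetical right-skewed, strictly positive values.
raw = [1, 2, 2, 3, 3, 4, 5, 8, 20, 100]
logged = [math.log(v) for v in raw]

print(skewness(raw))
print(skewness(logged))
```

The log transform only works for strictly positive values, and here it reduces the skew without removing it entirely.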


Transform variable


Histogram


QQ plot

Correlations

Here you may select variables and examine their correlation. You may decide a certain variable is redundant, in which case you may opt to remove it from the dataset using the button below.

In the plot below, the size of each circle indicates the strength of the correlation, while the shade of colour indicates whether the correlation is positive (more blue) or negative (more red).
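
To show what a single circle in such a plot summarises, here is a minimal Python sketch of the Pearson correlation coefficient. That the plot uses Pearson's method is an assumption, and the `pearson` and `describe` helpers are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length with at least 2 values")
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("a constant variable has no defined correlation")
    return cov / (sx * sy)


def describe(r):
    """Map a coefficient to the plot encoding: size from |r|, colour from the sign."""
    colour = "blue" if r > 0 else "red" if r < 0 else "neutral"
    return {"size": abs(r), "colour": colour}


r_up = pearson([1, 2, 3, 4], [2, 4, 6, 8])
r_down = pearson([1, 2, 3, 4], [8, 6, 4, 2])
print(describe(r_up))
print(describe(r_down))
```

A coefficient close to 1 or -1 between two variables suggests that one of them may be redundant.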


Delete a variable


Correlation matrix

Please note that the plot will display an error message until you select some variables to compare.