Three Principles of Tidy Data
The three principles of tidy data are that each variable forms a column, each observation forms a row, and each type of observational unit forms a table. Tidying a messy dataset means fixing the common ways in which it breaks these principles. First, we must ensure that column headers are variable names rather than values. If the headers are values, we fix this by “melting,” or stacking, those columns into rows to create a “molten” dataset. Second, we must make sure that multiple variables are not stored in one column. This can be fixed by splitting the combined column into additional variables (columns) as needed. Third, we must ensure that variables are not stored in both rows and columns. This is done by performing a cast operation, usually after first melting the data, which unstacks a column that holds variable names back out into separate columns. Fourth, we must ensure that there are not multiple types of observational units in one table. This may require breaking a dataset down into multiple tables, a process known as normalization. Finally, we must make sure that a single type of observational unit is not spread across multiple tables. If it is, we must read the files into a list of tables, add to each table a new column that records its original file name, and then combine all the tables into a single table.
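The fixes described above can be sketched in plain Python, assuming a "table" is simply a list of dicts with one dict per row. This is a minimal illustration under that assumption, not how any particular tidy tool implements these operations; the function names (`melt`, `cast`, `split_column`, `normalize`, `combine_tables`), the `source_file` column, and the example data are all hypothetical. Libraries such as pandas provide comparable operations (for example `DataFrame.melt` and `DataFrame.pivot`).

```python
# Minimal sketch of the tidying fixes, assuming a "table" is a list of
# dicts (one dict per row). All function and column names are hypothetical.


def melt(rows, id_vars, var_name="variable", value_name="value"):
    """Stack every non-identifier column into (variable, value) rows."""
    molten = []
    for row in rows:
        for col, val in row.items():
            if col in id_vars:
                continue
            new_row = {k: row[k] for k in id_vars}
            new_row[var_name] = col
            new_row[value_name] = val
            molten.append(new_row)
    return molten


def cast(rows, id_vars, var_name="variable", value_name="value"):
    """Inverse of melt: spread the values of var_name back out into columns."""
    grouped = {}
    for row in rows:
        key = tuple(row[k] for k in id_vars)
        out = grouped.setdefault(key, {k: row[k] for k in id_vars})
        out[row[var_name]] = row[value_name]
    return list(grouped.values())


def split_column(rows, column, new_columns, sep):
    """Split a column that stores several variables into separate columns."""
    result = []
    for row in rows:
        new_row = {k: v for k, v in row.items() if k != column}
        new_row.update(zip(new_columns, row[column].split(sep)))
        result.append(new_row)
    return result


def normalize(rows, unit_columns, key):
    """Move columns describing a second observational unit into their own
    table, linked back to the remaining table by `key`."""
    units = {}
    remaining = []
    for row in rows:
        # Assumes the unit columns are constant for each value of `key`.
        units[row[key]] = {c: row[c] for c in [key, *unit_columns]}
        remaining.append({c: v for c, v in row.items() if c not in unit_columns})
    return remaining, list(units.values())


def combine_tables(tables_by_file):
    """Record each table's source file in a new column, then stack them all."""
    combined = []
    for filename, rows in tables_by_file.items():
        for row in rows:
            combined.append({**row, "source_file": filename})
    return combined


# Usage with hypothetical data: the headers "2019" and "2020" are values
# (years), not variable names, so melting turns them into a "year" column.
messy = [
    {"country": "A", "2019": 10, "2020": 12},
    {"country": "B", "2019": 7, "2020": 9},
]
tidy = melt(messy, id_vars=["country"], var_name="year", value_name="count")
```

Casting the result back with `cast(tidy, ["country"], "year", "count")` recovers the original wide layout, which is why melting and casting are described as inverse operations.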
For my current research on cultural effects on women’s autonomy in healthcare decision-making, I may be able to apply tidy data principles to a survey. This could take the form of a table with the variables (columns) country, age, religion, ethnicity, marital status, and response, where each row is one participant. The “response” column records the participant’s answer to the question, “Do you feel like you have autonomy when it comes to your healthcare decision-making?” In this dataset, the answer would be entered as either “y” for yes or “n” for no. With the data in this tidy form, I may be able to more easily identify associations between cultural variables and women’s reported autonomy in healthcare. These associations can also be made easier to see by creating plots or charts from the dataset using tidy tools. For example, three side-by-side bar plots could be created, each with the number of women on the value axis and Yes or No on the category axis, with the bars colored by religion, ethnicity, and country, respectively. Then, using qualitative data collected through interviews, I could identify which aspects of these cultures (beliefs, traditions, et cetera) limit women’s autonomy.
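The proposed survey table and the counts behind the three bar plots can be sketched in plain Python, with one dict per participant. The rows below are hypothetical placeholders, not collected data, and the key names (for example `marital_status`) and the `response_counts` helper are illustrative assumptions. A plotting library would then draw one grouped bar chart per variable from these counts.

```python
from collections import Counter

# Hypothetical placeholder rows, NOT real survey data: one dict per
# participant (one observation per row), one key per variable (column).
survey = [
    {"country": "Country A", "age": 34, "religion": "Religion X",
     "ethnicity": "Group 1", "marital_status": "married", "response": "y"},
    {"country": "Country A", "age": 27, "religion": "Religion Y",
     "ethnicity": "Group 2", "marital_status": "single", "response": "n"},
    {"country": "Country B", "age": 45, "religion": "Religion X",
     "ethnicity": "Group 1", "marital_status": "widowed", "response": "n"},
    {"country": "Country B", "age": 31, "religion": "Religion Y",
     "ethnicity": "Group 2", "marital_status": "married", "response": "y"},
]

VALID_RESPONSES = {"y", "n"}


def response_counts(rows, group_var):
    """Count participants per (response, level of group_var) pair.

    Each count is the height of one bar: the response ('y' or 'n') is the
    category axis and the level of group_var (e.g. a religion) is the color.
    """
    for row in rows:
        if row["response"] not in VALID_RESPONSES:
            raise ValueError(f"unexpected response: {row['response']!r}")
    return Counter((row["response"], row[group_var]) for row in rows)


# One set of counts for each of the three planned bar plots.
for variable in ("religion", "ethnicity", "country"):
    print(variable, dict(response_counts(survey, variable)))
```

Because every participant is one row and every variable is one column, switching the grouping variable is a single argument change rather than a reshaping of the table.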