Chapter 2 Data Preparation
Before we can process and analyze data files in R to generate a report, we first need to collect, collate, and prepare the data for R.
How you set up and define your data is critical to the future value of that data. Implementing a good data management strategy need not be an onerous task; it may simply require the adoption of a number of standard practices by members of a research group, and can have many benefits, including:
- Better usability, and therefore greater value, of data within a research group, along with reduced training time for new staff and less risk of losing key knowledge when staff leave
- Easier error checking and correction, as well as avoidance of misinterpretation caused by ambiguous or unclear data
- Lower time and financial cost of analysis: well-formatted data is easier to import into analysis software, and it simplifies engagement with core analysis facilities and external service providers
- Easier submission to public resources that support research publications, provided the data meets community standards. Researchers who share their data have been shown to be cited more often.

In addition to these benefits, funding bodies such as the ARC now require a data management plan, and researchers and their data may be subject to regulatory and institutional obligations, such as the Code for the Responsible Conduct of Research.