Data Storage and Access (Introduction to SQL)
This course focuses on the technical skills needed to work with datasets originally stored in CSV, JSON, and SQL formats, as well as on data management concerns and the privacy regulations that govern data access. CSV, JSON, and SQL are popular formats that you will encounter regularly.
The course begins with an introduction to data modelling and the structure of databases. The majority of the course is devoted to SQL. You will also learn how to read, create, and manipulate CSV and JSON files in both R and Python, and you will be introduced to principles of reproducibility, data sharing, and ethics. The course also covers professional skills such as documentation and communicating with different stakeholders. Please note that the course draws on textbooks as well as online blog posts and tutorials.
Requirements
This course is designed for learners who hold a degree in a field other than Computer Science or Statistics and who want to strengthen their data science skills for their careers.
Learning Outcomes
- An understanding of the structure of databases
- The ability to save and transport data in CSV and JSON file formats
- Familiarity with the essentials of querying and manipulating data in SQL, and an understanding of how to use Google to find answers to SQL questions
- Familiarity with and appreciation of the legal framework around sharing data
- An understanding of how to analyze data requests and the ability to discuss data with different stakeholders such as analysts and managers
Delivery Format and Schedule
Online for 7 hours/week for 3 weeks (21 hours in total).
2023 Dates
- Monday 9 January, 6pm-8pm: Intro to data modelling and big data I (Data management systems)
- Thursday 12 January, 6pm-8pm: Intro to data modelling and big data II (Data modelling and schema)
- Saturday 14 January, 9am-noon: SQL I (Introducing SQL; data lakes; JOINs and aggregation)
- Monday 16 January, 6pm-8pm: SQL II (Window functions and subqueries; dates and times)
- Thursday 19 January, 6pm-8pm: SQL III (Building datasets for analytics; UNION)
- Saturday 21 January, 9am-noon: SQL IV (Building datasets for machine learning)
- Monday 23 January, 6pm-8pm: Transporting data, reproducibility, ethics, inequity
- Thursday 26 January, 6pm-8pm: Professional skills: Industry case study
- Saturday 28 January, 9am-noon: Data Storage and Access: Review and Practice