Though visual representations of quantitative information were traditionally cast as the final phase of the data analysis pipeline, visualizations can play important roles throughout the analytic process and are critical to the work of the data scientist. Where static outputs and tabular data may render patterns opaque, human visual analysis can reveal a great deal and lead to more robust programming and better data products. For students getting started with data science, visual diagnostics are particularly important for effective machine learning. When all it takes is a few lines of Python to instantiate and fit a predictive model, visual analysis can help navigate the feature selection process, build intuition around model selection, identify common pitfalls like local minima and overfitting, and support hyperparameter tuning to produce more successful predictive models.
In this course, students will learn to deploy a suite of visual tools using Scikit-Learn, Matplotlib, Pandas, Bokeh, and Seaborn to augment the analytic process and support machine learning from preliminary feature analysis through model selection, evaluation, and tuning.
Upon successful completion of the course, students will be able to use visualizations to:
- Summarize and analyze a range of data sets.
- Support feature engineering and feature selection.
- Diagnose common machine learning problems like bias, heteroscedasticity, underfitting, and overtraining.
- Evaluate their machine learning models' performance, stability, and predictive value.
- Steer their predictive models toward more successful results.
Enrollment in this course is restricted. Students must submit an application and be accepted into the Certificate in Data Science in order to register for this course.
Current Georgetown students must create an application using their Georgetown NetID and password. New students will be prompted to create an account.
Course prerequisites include:
- A bachelor's degree or equivalent
- Completion of at least two college-level math courses (e.g., statistics or calculus)
- Successful completion of Data Analysis II: Machine Learning (XBUS-505)
- Basic familiarity with programming or a programming language
- A laptop for class meetings and coursework