ORIE 4741
Course description (from class roster):
Modern data sets, whether collected by scientists, engineers, medical researchers, government, financial firms, social networks, or software companies, are often big, messy, and extremely useful. This course addresses scalable robust methods for learning from big messy data. We'll cover techniques for learning with data that is messy - consisting of real numbers, integers, booleans, categoricals, ordinals, graphs, text, sets, and more, with missing entries and with outliers - and that is big - which means we can only use algorithms whose complexity scales linearly in the size of the data. We will cover techniques for cleaning data, supervised and unsupervised learning, finding similar items, model validation, and feature engineering.
Offered: Fall, Spring.
Prerequisites: MATH 2940, ENGRD 2700, ENGRD 2110/CS 2110, CS 2800 or equivalents.
Is Python used?
Yes. The course used Julia several years ago but has since switched to Python, possibly around the same time the ORIE department as a whole switched. The switch was probably helped by the fact that, for many students, this course was the only time they ever used Julia.
If Python is used, where is it used?
Python is used in homework assignments and labs. (This may change in the future.)
What is Python used for?
Most of the Python is in Jupyter notebooks, which contain demonstrations of algorithms (e.g. gradient descent, the perceptron, PCA) and mathematical concepts (e.g. matrix decompositions). (This may change in the future.)
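To give a feel for the kind of demonstration mentioned above, here is a minimal sketch of gradient descent fitting a line by least squares. It is not taken from the course notebooks: the function name, the toy data, the learning rate, and the step count are all assumptions chosen for illustration, and it uses plain Python rather than any particular library.

```python
def gradient_descent(xs, ys, lr=0.05, steps=2000):
    """Fit y ~ w*x + b by minimizing mean squared error with gradient descent.

    Illustrative sketch only; lr and steps are hand-picked for the toy data below.
    """
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Residuals of the current fit on every data point.
        residuals = [w * x + b - y for x, y in zip(xs, ys)]
        # Gradients of the mean squared error with respect to w and b.
        grad_w = (2.0 / n) * sum(r * x for r, x in zip(residuals, xs))
        grad_b = (2.0 / n) * sum(residuals)
        # Step against the gradient.
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Toy data generated from y = 2x + 1 (hypothetical, noise-free).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
w, b = gradient_descent(xs, ys)
print(round(w, 3), round(b, 3))
```

On this noise-free data the fit should land very close to w = 2 and b = 1. With a learning rate that is too large the iterates diverge instead, which is the sort of behavior a notebook demonstration makes easy to explore.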
What do I need to know?
You should have a decent understanding of Python. At a minimum, you should be familiar with the material covered in CS 1110, and it helps to have some algorithm knowledge at the level of CS 2110. (CS 2110 is in fact a prerequisite for this course.)