Course Description
This course provides hands-on experience developing and deploying foundational machine learning algorithms on real-world datasets for practical applications, including predicting housing prices, document retrieval, product recommendation, and image classification using deep learning. Students will learn the machine learning pipeline end-to-end, including dataset creation, pre- and post-processing, data preparation, and training and evaluating multiple models. At each stage of the ML pipeline, students will focus on real-world challenges, including handling bias in models and datasets.
Course Instructors
Prof. Angelique Taylor
- Instructor
- Office hours: Wednesdays from 2:40PM-3:40PM, Bloomberg 262 and over Zoom (see syllabus)
- Email: amt298@cornell.edu
Tauhid Tanjim
- Teaching Assistant
- Email: tt485@cornell.edu
- Office hours: Mondays from 2:40PM-3:40PM, Bloomberg 262 and over Zoom (see syllabus)
Celine Lee
- Teaching Assistant
- Email: cl923@cornell.edu
- Office hours: Thursdays from 3:00PM-4:00PM, Bloomberg 262 and over Zoom (see syllabus)
Olga Leanos
- Grader
- Email: oea9@cornell.edu
- Office hours: Tuesdays from 12:00PM-1:00PM, Bloomberg 262 and over Zoom (see syllabus)
Kexin Cheang
- Grader
- Email: kc2248@cornell.edu
- Office Hours: None
Course Outcomes
After this course, students will be able to:
- Prepare datasets for an ML task, and train and evaluate ML models
- Understand core challenges of dataset creation, such as handling missing data and bias
- Visualize features in datasets to be used for ML tasks
- Apply, analyze, and identify key differences in regression, classification, clustering, and deep learning algorithms
- Evaluate model quality using appropriate metrics of performance
- Build front- and back-end ML pipelines for analyzing ML performance, and build tools for ML practitioners
Course Format
Lectures are held on Mondays from 1:30PM-2:40PM ET in Bloomberg 61X and focus on hands-on collaborative coding in teams to build end-to-end ML pipelines.
Guest lectures will be given by experts in artificial intelligence (AI) and machine learning (ML) fields.
Prerequisites
CS 2800 or equivalent, linear algebra, probability, and experience programming with Python, or permission of the instructor.
Reading
Géron, Aurélien. Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow. O’Reilly Media, Inc., 2022.
Grading
Final grades are based on homework, class participation, and the final project, weighted as follows:
- Homeworks – 40%
- Class participation – 10%
- Final Project – 50%
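To make the weighting concrete, here is a minimal Python sketch of how the three components combine into a final score. It assumes each component is scored as a percentage from 0 to 100; the `WEIGHTS` dictionary and the `final_score` function are hypothetical names for illustration, and the course's conversion from scores to letter grades is not specified here.

```python
# Hypothetical helper: weights mirror the grading breakdown
# (homework 40%, class participation 10%, final project 50%).
# Component scores are assumed to be percentages in [0, 100].
WEIGHTS = {
    "homework": 0.40,
    "participation": 0.10,
    "final_project": 0.50,
}


def final_score(scores: dict[str, float]) -> float:
    """Return the weighted final score (0-100) from per-component percentages."""
    missing = WEIGHTS.keys() - scores.keys()
    if missing:
        raise ValueError(f"missing component scores: {sorted(missing)}")
    # Weighted sum: 0.40 * homework + 0.10 * participation + 0.50 * final project
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


# Made-up scores, for illustration only.
print(final_score({"homework": 90, "participation": 100, "final_project": 80}))  # prints 86.0
```

Because the final project carries half the weight, a 10-point change in the project score moves the final score by 5 points, while the same change in participation moves it by only 1 point.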
Summary of Course Topics
- Dataset Curation
- Building an End-to-End ML Pipeline
- Regression for Predicting Housing Prices
- Clustering for Document Retrieval
- Classification for Product Recommendation
- Deep Learning for Image Search
Frameworks, Libraries, & Tools
- Scikit-Learn is a free, easy-to-use library that efficiently implements many machine learning algorithms, making it a great entry point for learning ML.
- TensorFlow is an end-to-end ML framework created at Google for loading and processing data, building ML models, using pre-trained models, and deploying large-scale ML applications in production.
- Streamlit is an open-source Python framework for quickly building web applications without any front-end experience.
- This course also uses Python’s main scientific libraries, in particular NumPy, pandas, and Matplotlib.
Final Projects
Final Project Midpoint Report: The submission should describe what you’ve accomplished so far and briefly state what else you plan to do. The format should be the same as that of the final project report, with a maximum length of 3 pages (excluding references). The goal is to make sure that you are on track to finish the final project.
- Motivation: What problem are you tackling? Is this an application or a theoretical result?
- Method: What machine learning techniques are you planning to apply or improve upon and how?
- Preliminary experiments: Describe the experiments that you’ve run, the outcomes, and any error analysis that you’ve done. You should have tried at least one baseline.
- Future work: What else do you plan to do? The goal of the milestone is to make sure you’re on the right track.
As long as you follow these guidelines, you should do well. Please submit the milestone via Gradescope, making sure to submit as a team. (These guidelines are borrowed from CS 5785.)
Students will present their final project in class on May 13th and submit a technical report consistent with industry standards by May 15th at 2:40PM.
Attendance
To succeed in this course, students are expected to attend lectures, participate in discussions, and complete assignments. If you miss a lecture due to illness or an emergency, refer to the recorded lectures to review what you missed.
Seek help early and often to avoid delays in feedback when issues come up while completing assignments.
If you miss a substantial number of classes due to an ongoing illness, please contact Student Disability Services to arrange accommodations and inform the instructor.
Integrity
This course follows Cornell’s policies on academic integrity as outlined in the Academic Integrity Handbook.
Inclusivity
Students are expected to treat their classmates and course staff with respect. Individuals of all cultural backgrounds, genders, and sexual orientations are welcome here. If students encounter incidents that violate these expectations, they are encouraged to inform the instructors so that the issues can be addressed promptly (see Cornell’s Computer Science Community Statement of Values of Inclusion).
Accessibility
We are happy to provide accessibility accommodations for all students. Please contact the course instructors if you need help. The Office of Student Disability Services also offers additional resources.
Late Policy
Students have 6 late days for the semester to use on assignment submissions, including homework and the final project, with a maximum of 2 late days per assignment. After that, the grade will drop by one letter grade per day late. No exceptions.
Assignments are submitted on Gradescope, and the deadlines are as follows:
- Homeworks: Coding assignments on course topics are DUE 2 weeks after they are assigned.
- Final Project Proposal: Propose an FP idea; DUE on April 26th @ 11:59PM.
- Final Project Midpoint Report: Provide updates on project deliverables on May 1st during the class session.
- Final Project Presentation: Present FP in class for 15 minutes with a 3-minute Q&A; scheduled on May 13th @ 2:40PM.
- Final Project Report: Submit FP report; DUE on May 15th @ 2:40PM.
Students have 1 week after assignments are returned to make a regrade request (no exceptions). Send an email to Prof. Taylor, Tauhid Tanjim, and Olga Leanos.
Collaboration Policy and Honor Code
You are expected to work on homework assignments in groups and to write up homework, code, and reports from scratch. In your submission, you must use Peer Assessment to acknowledge all the students you worked with and their contributions to the project. The following are considered honor code violations:
- Looking at the writeup or code of another student outside your team.
- Showing your writeup or code to another student outside your team.
- Discussing homework problems in such detail that your solution (writeup or code) is almost identical to another team’s answer.
- Uploading your writeup or code to a public repository (e.g., GitHub) where other student groups can access it.
When debugging code together, you are only allowed to look at the input-output behavior of each other’s programs (so you should write good test cases!). Remember that even if you did not copy but simply gave your solution to a student outside your team, you are still violating the honor code, so please be careful.
Use of Generative AI
Generative AI (Artificial Intelligence) is now widely available to produce text, images, and other media. Our goal as a community of learners is to explore and understand how these tools may be used to augment human performance. However, keep the following three principles in mind: (1) An AI cannot pass this course; (2) AI contributions must be attributed and true; (3) The use of AI resources must be open and documented.
Generative Artificial Intelligence (AI) models, including ChatGPT, are prohibited in this course.
Failure to document your use of AI tools, as well as any plagiarism, even inadvertently, from the use of AI tools (such as quotations or information that are not properly attributed) constitutes academic misconduct, and may be referred to the Center for Teaching Innovation.
You can find more information about using generative AI and access a secure AI tool provided by Cornell at https://teaching.cornell.edu/generative-artificial-intelligence.