Department of Mathematics

CMSE 890-002: Fast and Memory Efficient Algorithms for Big Data

Instructor: Mark Iwen
Time and Place: Lectures are T Th 8:30 am -- 9:50 am, in Engineering Building 2320
E-mail: markiwen@math.msu.edu
Office: D220 WH
Office Hours: Monday 10:00 am -- 11:00 am, Tuesday 10:30 am -- 11:30 am, and Wednesday 10:00 am -- 11:00 am

Topics covered will include fast linear Johnson-Lindenstrauss (JL) embeddings and their applications, small space algorithms for approximate counting problems, metric space embeddings and their applications, and randomized numerical linear algebra for large scale calculations.

Ethics, Fairness, Profiling, and Privacy:

The algorithmic techniques taught in this class can be used together with large datasets to do potentially unethical things. I am often disturbed by the ideas I myself have and hear about concerning the type of profiling and inference that fast methods applied to the large amounts of personal data available about each and every one of us online could potentially enable. I encourage all students who take this class to keep ethics, privacy, and fairness issues in mind at all times. If you are new to thinking about such issues I have a few recommended videos and articles to watch/read below to help get you started:

A video discussing surveillance capitalism and related issues See Thomas Strohmer's 1W-MINDS talk, April 30, 2020.
A few articles on China's social credit score system in Time, in The Atlantic, and in The Economist.
A couple articles about the NSA collection of US citizen's data in The Guardian and in Forbes.
An article about racism, algorithms, and machine learning in The New Republic.

These issues are complex and nuanced. Citizens of every country, state, and region must be vigilant, informed, and brave enough to ask questions if we want to make sure that technology remains a benefit to ourselves and society at large as opposed to a self-inflicted curse.

Course website for CMSE890-002:

http://math.msu.edu/~markiwen/Teaching/CMSE890/CMSE890_S23.html

The course website has the course syllabus and book.

Textbook:

``Randomized numerical linear algebra: Foundations and algorithms" in Acta Numerica (2020), pp. 403 -- 572, by Per-Gunnar Martinsson and Joel A. Tropp. Here is the version we will use.

The survey is free! Download it and prosper...

We will also use these free notes written up by a student from a prior semester.

Homework/Projects:

Exercises will be assigned in class most days. They will *not* be collected or checked other than as part of the final oral exam for the class (see below) if the student has not chosen to do a project.

Alternatively, students may elect to work on projects (approved in advance by the instructor) instead of completing the homework assignments. A written report on the project will then be required before the end of the semester which will be defended as part of the final oral exam. The written project report will count for all the homework assigned while the student was completing their optional and instructor approved research project.

Students are encouraged to work with their peers on homework assignments. Computational Math is a collaborative discipline and two or three minds are often better than one. However, your final homework solutions must be written up individually in your own words and then kept to help you during the final oral exam. The solutions will be useless to you if you don't understand how they work!

Assigned Homework

HW1: Basic Probability Review Questions

HW2: Problems on Randomized Hashing Methods

HW3: Problems on Randomized Numerical Linear Algebra

HW4: Problems on the Basics of Gaussian Width

HW5: Problems on Gaussian Widths, Covering Numbers, and Linear Systems with Solution Constraints

HW6: Problems on Coherence, Reach, and Sublinear-time Compressive Sensing

Final Oral Exam:

Students must either meet with the instructor in person or via zoom. The instructor will then ask the student to solve several HW problems that were assigned in class, or else to discuss their project (if they chose to do one). The student's HW grade will be based on their participation in the exam, together with the quality of their solutions. The oral exam will be open notes -- it is expected that the student will consult their project report and/or personal homework solutions as part of the exam. In short, the purpose of this exam is to convince the instructor that the student completed ``most'' of the assigned homework (if they did not do a project), or else to describe their project, and summarize their project's conclusions, as well as to answer instructor questions about their project.

The time and place (or link) for the final oral exam will be scheduled with the instructor individually, or in small groups. All exams should take place sometime between May 1 and May 4, 2023, and should be scheduled before April 14. Each oral exam will be 30 minutes in duration for each student. The instructor reserves the right to waive this oral exam requirement, or to shorten its duration, for any student as necessary for scheduling purposes.

Class Participation:

Please ask questions, answer questions, make constructive comments, attend class, and perform in-class coding activities. In particular, participation and performance of in-class activities requires that you regularly attend class.

Grading:

Your final grade will be assigned based on the submitted homework (and/or project report), as well as on your participation (i.e., lecture attendance).

Some of the homework exercises will be challenging -- it is only expected that the submitted homework demonstrates effort. If the student prefers, they can submit their HW assignments at the end of the course in lieu of the oral exam (or to make up for an unsatisfactory oral exam).

.