Overall Rating (4.3 / 5): ★★★★☆
Professor Rating (5 / 5): ★★★★★
Lecture Rating (3.6 / 5): ★★★★☆
Difficulty (1.4 / 5):
Workload: 10 hours/week
Pros:
1. Professors were very active on Piazza and encouraged students to discuss problem sets.
2. Reviews for midterms were made available to students. These combined with quizzes and homework meant the midterm was never a surprise.
3. Course is a great review of/introduction to probability and statistics that should set a strong foundation for continued success in the program (whether it does remains to be determined).
Cons:
1. Statistics lectures didn't really add much on top of the provided notes and lacked many examples.
2. Sometimes, the hints harmed more than helped. I suggest attempting to work the problems before reading the hints.
Detailed Review:
For what it's worth, I scored a 100 in the class, so I felt it was mostly easy. There was one quiz where I had a brain lapse, but it was dropped as part of their generous grading policy. I'm not sure what the distribution of grades was, but it felt like everyone else scored very high, too. This class is a great stepping stone from undergraduate classes to graduate classes. The professors are very responsive to both e-mail and Piazza and welcome questions. I never attended office hours, so I cannot speak to those or the TAs, but I did see a TA response on Piazza occasionally.
Given the disparity of foundational knowledge in the student population, the professors have adapted the course content so that it meets the needs of all of the students. Enrichment opportunities exist if you find the entire experience dull.
Assignments are all graded using a novel multiple choice scheme. This seemed to be a huge source of confusion for the students for a very long time. General rule of thumb: work the problem, get an answer, and then choose the number that is closest to your answer. Don't overthink it beyond this. I thought this method was a clever way of easily grading hundreds of assignments while keeping the students from working backwards from the solutions. The exception is the proof-based questions on the homework. Good news for future generations: we probably found most of the bugs, and they will already be corrected for you.
Textbooks were useful if you knew how to use them. The probability text was less useful, though it did have some information that helped with a few homework assignments; you could likely find the same information in other probability textbooks. The statistics textbook helped me a lot with understanding some statistics principles that weren't thoroughly covered in the lecture/notes. As a supplementary textbook, I often referred to Statistical Inference by Casella and Berger.
Overall Rating (4.3 / 5): ★★★★☆
Professor Rating (4.3 / 5): ★★★★☆
Lecture Rating (4.3 / 5): ★★★★☆
Difficulty (3.6 / 5):
Workload: 15 hours/week
Pros:
1. Doesn't dive too deep into complex topics (so it's recommended for beginners)
2. Introduction to RL and, briefly, NLP
Cons:
1. Final project is tough and might be stressful (subjective)
2. Some lectures might not be used at all in the homeworks/projects (especially the last ones)
Detailed Review:
I had very little exposure to AI before this course, and I was a Python amateur.
Given that background, I initially struggled a little completing HW1 and HW2 while getting accustomed to Python. The later HW3 and HW4 were very challenging.
But looking back now, I am much more comfortable with Python, PyTorch, and DL concepts (especially how to come up with a model).
My Tips:
1. Always go through the grader code for more insight if you get stuck on any homework, and don't worry too much about the final project or about your peers' progress.
2. Most importantly, plan ahead: complete HW1 and HW2 as soon as possible, and try to finish all the homework a month before the course ends.
3. Go for the extra credit / extra assignments (I missed an A by just 0.75 marks)
Overall Rating (4.3 / 5): ★★★★☆
Professor Rating (2.1 / 5): ★★☆☆☆
Lecture Rating (2.9 / 5): ★★★☆☆
Difficulty (3.6 / 5):
Workload: 25 hours/week
Pros:
1. A lot of practical knowledge: you really learn to write parallel algorithms and pick up many languages/tools/approaches
2. Zero exams
3. You get the environment required for each lab, so there's no need to set up a VM.
Cons:
1. Lectures are very, very long. It would be better to have five or six 10-minute sections rather than one 50-65 minute session.
2. The instructions for the assignments are very ambiguous and poorly written. For most of them, you will spend a lot of time just figuring out what you are required to implement.
3. The assistance through Piazza was horrible. Most of my questions went unanswered. Be ready to wait at least two weeks for a response (even on critical topics, like assignments)
Detailed Review:
All the other reviews are right. Expect to put a LOT of effort into this class, and consider taking only this one for the semester (I took NLP alongside it and suffered, even though NLP is a light-workload class). You will come away with practical knowledge in parallel programming and will learn new languages and frameworks (Rust, CUDA, Go, MPI).
It is easy to work on the assignments: although you need specific hardware (like a GPU), you get an environment already configured via Codio, which is not the best, but you can get a lot of value by connecting through VS Code. I really enjoyed assignment 3, which is the most difficult one.
The support from both TAs and teachers was horrible. I saw the teachers answer TWO questions during the course, and both were off-topic rants. The TAs never responded to questions on Piazza, no matter the topic (assignments, grading, lectures, nothing). I asked a question about the instructions of an assignment two months ago and it still hasn't been answered. They need to put more effort into supervising the support.
Overall Rating (4.3 / 5): ★★★★☆
Professor Rating (5 / 5): ★★★★★
Lecture Rating (5 / 5): ★★★★★
Difficulty (4.3 / 5):
Workload: 8 hours/week
Pros:
1. The projects altogether give you a sense of how a hypervisor works from a programming standpoint.
2. The exams are tight for time (tightest of all the courses I have done so far) but fair. The questions test your knowledge and conceptual understanding well.
3. The lectures explain concepts well, and the content is quite interesting and modern. The first half goes through the inner details of virtualizing CPU, storage, etc. The second half looks at the main virtualization technologies and research (Unikernels, containers, Kubernetes, etc.). The slides aren't the best for the second half, but they convey the ideas well.
Cons:
1. Projects are group work but are not structured well for group work. You can implement the various functions separately, but you likely won't understand what's going on unless you do them all. For later projects, that missing understanding will hurt. Also, you typically can't tell if isolated functions are working correctly; if 1 function is incorrectly implemented, all you can tell is that something is wrong. In some cases, everything can seem to work well and you still have mistakes (that you won't know of until they hit you in later projects).
2. Projects are marked pedantically and overly harshly. Missing one or two error checks could cost you half or more of the marks for a given function.
Other notes: Try to make sure your setup works as soon as you can. I ended up having to use the faculty-provided cloud resources after the 2nd project because there was something wrong with my setup causing the 3rd project to fail just for me.
I didn't do AOS before taking this course, so it's definitely not required.
Overall Rating (4.3 / 5): ★★★★☆
Professor Rating (5 / 5): ★★★★★
Lecture Rating (5 / 5): ★★★★★
Difficulty (4.3 / 5):
Workload: 30 hours/week
Background: I work as an ML Scientist at FAANG. I started this program in Fall'2020, and this review was written in Dec'2023.
This review is very specific, so a motivated person can probably find me. Let me save you the trouble, here is my LinkedIn: linkedin.com/in/ardivekar/ . Feel free to reach out with questions. I'm usually also on the MSCSO Slack.
I applied for the Thesis in Dec'2022, and completed the Thesis over 3 semesters (Spring'23 to Spring'24).
My thesis was in the NLP domain, during 2023 when AI went mainstream via ChatGPT, and new LLMs were released every single week. As you can expect, this makes my experience unique. But I expect everyone's thesis experience will be unique since it depends on your personal strengths and interests.
Another thing I did differently from other students is that I framed my own research problem. This is recommended only if you're really dead-set on one idea, like I was. Otherwise it's okay to go with an idea your thesis advisor gives you. But make sure you're eventually doing your own research, not just contributing to someone else's project.
I've written about the process I followed to reach out to MS thesis professors. I've tried to put this together in a step-by-step fashion, starting from the very basics: https://adivekar-utexas.github.io/files/work-with-professors.pdf
If you have questions on the steps to find an advisor, please reach out to me here: linkedin.com/in/ardivekar/
Detailed Review:
This is in response to a question on #thesis-option: "What are the pros and cons of doing a thesis?".
====================
Personally, I've grown a lot as a researcher, both in skills and temperament.
I was an industry researcher before the thesis, and had published ML papers internally (at Amazon we have an annual internal conference with ~1k submissions and acceptance rate of ~30%). I felt like I knew what I was doing, and the thesis would be a breeze. But I had never had the perspective of academic research, where you pit your ideas against the very best work. My thesis also focuses on LLMs, where the "best" changed weekly in 2023.
The last year (2023) has been very humbling. I feel like I better understand what NLP research is about, and the challenges. My advisor is very good, but in the end it's up to me to do the work and push forward the ideas I think are best. This has led me to learn a lot of skills, like how to read a lot of papers fast, how to strategically cut corners to get useful experimental results, and how to orchestrate code that runs LLMs in parallel on a 100+ GPU cluster.
As to cons? Well, like I said it can feel humbling. The thesis itself is a long grind which does not always produce results.
For example, imagine you and your advisor agree that in 2 weeks, you should get results for one set of experiments. You quickly realize 2 weeks is quite a short time to implement this fairly complex idea. But you have to get it done, so you ignore calls from your spouse, push deadlines at work, and hack till 4am at night. And finally, your experiment runs and you get some results. But they are not good results...not enough to publish.
So, you pivot and try a variation of the experiment. This also does not work. In a panic you read a lot of research papers; you get some new ideas, but you also get the feeling that people have been doing work much more in-depth than your own experiments.
You pivot your approach and try again, and again, for 3 months. At the end, you've built an Excel table of results from many, many variations of the same core idea. But none of the results are strongly positive, and you aren't sure why (was it a code bug? data not good enough? issue with the analysis metric? so many options...). Your advisor was initially excited about your project, but now seems less excited and is trying not to show it.
At this point, your self-confidence has been steadily taking a beating for 3 months. You feel like your ideas are not good, that you are stupid, and (most importantly) that you've wasted time which you and your advisor could have given to something more productive or enjoyable.
This is the reality of doing research. To continue pushing forward, you have to be very motivated by the idea you are working on, or just be bull-headed enough (or desperate enough) that you keep working on it regardless.
At different points in time I have experienced all these feelings. What set me straight was, after one set of failed experiments, I attended ACL conference. ACL is the top NLP conference in the world, and in 2023 it was attended by true experts: I saw Geoff Hinton in-person, delivering the keynote (where he was publicly criticized by Emily Bender). I sat next to Chris Manning at a talk, and rode an elevator with Luke Zettlemoyer. I met a director at FAIR from whom I heard about LLaMa-2 a week before it was launched.
But what stuck with me was the vibe. I did not see the bravado and showmanship you get at tech conferences. Everyone there (especially the students) was aware that they did not know some ultimate truth about NLP; they had just explored an idea, sometimes for years, and found something interesting to show. Some of the presentations were very under-whelming and barely seemed like a contribution, but they were being projected to everyone on a 50-foot screen. Most of the authors were nerds (with various levels of grooming) who stammered occasionally, but were excited about their findings; it was impossible to feel intimidated by them.
I left with the impression that while some of the research I saw was truly very deep, most research is ordinary. And thus, I felt confident that with effort and guidance, I could at least produce "ordinary" work (as per the caliber of a conference like ACL). And that hopefully someday, with enough effort and some luck, I might produce very deep work.
The Thesis is the journey which facilitates that feeling.
EDIT in Nov 2024: this story has a happy ending. After 1.5 years of work, the paper resulting from my thesis was accepted at EMNLP 2024 Main (a top-tier NLP venue). You can read the paper here: https://aclanthology.org/2024.emnlp-main.1071/
Overall Rating (4.3 / 5): ★★★★☆
Professor Rating (3.6 / 5): ★★★★☆
Lecture Rating (4.3 / 5): ★★★★☆
Difficulty (4.3 / 5):
Workload: 15 hours/week
Pros:
1. Solid lectures
2. Very helpful team of TAs/LFs
3. Shortened schedule
Cons:
1. Exam requirements are kind of tedious
2. Loose peer grading guidelines
Detailed Review:
For context, I’m an MSDS student, and this is my 4th class after probability, data structures, and regression. I thought all three classes helped prepare me for this course, but if you’re not familiar with machine learning, taking probability as a prerequisite is a must, and you should have a good grasp of linear algebra and calculus, as well as Python for the programming assignments. There is a lot of overlap with other MSDS classes, especially in Liu’s half.
I thought the lectures were solid. Both professors write out their lecture notes (Klivans in front of a classroom of students). The textbook for Klivans' section is pretty dense, though some of it does tie in directly to the homework. Liu provides his own supplementary textbook, which we were allowed to print out for the exam and which more or less tracks with the lectures.
The class was heavily theoretical, with a little bit of application mixed in here and there. Personally, I liked being able to learn how machine learning algorithms work under the hood and coding them out in Python. Parts of the programming assignments were plug-and-chug like others have noted, though other parts involved building algorithms from scratch, which helped me better understand them.
The first couple of homework assignments are really intense, but the class eases up significantly after that. The LFs were also extremely helpful during office hours and very responsive on Ed. Kudos to them, because it was a very large class (around 400, I think?) with students from MSDS and MSAI, and there were many technical kinks from the program transitioning to Canvas.
When it came to peer grading, the staff didn't really have any guidelines other than to be lenient, so peer grades were all over the place for some people, and the Canvas system only did averages instead of medians. LFs were willing to do regrades, with the caveat that they could either add or take away points.
The exams were pretty tough, especially the first one. In addition to that, we had to record ourselves taking them via webcam (understandable) and only use printed notes instead of electronic ones (tedious). The curve for final grades ended up being pretty generous. A raw score in the low 70’s was curved to a B-.
Overall Rating (4.3 / 5): ★★★★☆
Professor Rating (3.6 / 5): ★★★★☆
Lecture Rating (5 / 5): ★★★★★
Difficulty (2.9 / 5):
Workload: 12 hours/week
Pros:
1. Textbook/website is well organized and detailed
2. Instructor and TAs were responsive and helpful
3. Explains well how linear algebra relates to other CS concepts
Cons:
1. Can be time consuming
2. Uneven assignment difficulty
Detailed Review:
I enjoyed this course overall and thought the content was well-presented and extremely relevant for those in this program. I have a strong math background (my undergrad degree is in math), so I felt well prepared for this course despite it being my first course in this program. I think some of the kinks from Spring 2023 have been worked out -- grading was not too slow for Summer 2023 in my opinion, with assignments coming back around 1 week after they were due. TAs were present on Piazza and most questions got answered within a day or so. The textbook and lectures were awesome and the assignments directly related to the chapter content, making them not too difficult to figure out most of the time.
However, because it is a proof based class, the weekly homework could sometimes take a significant amount of time. Some weeks, I spent just 5 hours on the course but other weeks it was up around 20 (the week of midterm 2 was pretty brutal time-wise). The end of chapter assignments were visible all semester, so one could work ahead to mitigate this.
Overall pretty good, just be prepared to have a few very long/more difficult assignments that may overlap with exams and plan accordingly.
Overall Rating (4.3 / 5): ★★★★☆
Professor Rating (3.6 / 5): ★★★★☆
Lecture Rating (3.6 / 5): ★★★★☆
Difficulty (2.9 / 5):
Workload: 8 hours/week
Pros:
1. The assignments follow the lectures closely
Cons:
1. A lot of formality stuff
Detailed Review:
What I really like is that the prof's demos are actually helpful and useful for the assignments. I can apply what I just learned in my homework, and that feels great.
The difficulty of the assignments is medium and doable, but they involve lots of presentations, report writing, and recording. This doesn't have much to do with the technology itself. I understand that as a data scientist, storytelling is a super important skill, but in this course it is a little too much, in my opinion.
Overall Rating (4.3 / 5): ★★★★☆
Professor Rating (5 / 5): ★★★★★
Lecture Rating (4.3 / 5): ★★★★☆
Difficulty (3.6 / 5):
Workload: 12 hours/week
I will try to give as straight a review as I possibly can.
Pros:
1. Good start to the program; lots of detailed information on how to use the edX platform.
2. Lots of very interesting math in the probability section
3. The statistics section teaches some *very* powerful modern techniques right away, such as bootstrapping, which are so powerful they almost make the older theoretical methods seem irrelevant (but it teaches those as well)
4. The statistics section cleverly - maybe inadvertently? - takes you on a tour of just how easy it is to screw up p-value calculations, in subtle ways that you'd never be able to see before taking this course. It is crazy how rounding things to three vs four decimal places can change evaluations of statistical significance
5. The teachers are very knowledgeable on both the history and theory of the topic
Cons:
1. Some of the probability theory proofs weren't that well explained; they often involve just staring at some problem with infinite series until the magic trick to solve it pops into your head
2. I would have preferred the class be built around Python or R rather than the StatKey software, although the latter was a useful pedagogical tool
3. The class focuses exclusively on frequentist statistics. I would have been much happier if some Bayesian stuff had been thrown in there. In general, the statistics professor seems like she is very knowledgeable about some very deep mathematical statistics but was required to keep some of it fairly lightweight for this course.
Detailed review of pros:
I thought this was a good start to the program; it had lots of detailed information on how to use the edX platform, the discussion forums, etc. There are lots of quirks to the edX platform: for instance, if you're in the US, and if a HW is due on some day, it's typically due at 6 AM on that day for the benefit of people in e.g. India, so it's really due the night before. The course explains all of these and other snags in detail. In general it seems built to be the first class you take in the program.
There is lots and lots of very interesting math in the probability section: Markov's inequality, Jensen's inequality, Chernoff bounds, convolutions, etc. Most of this was review for me but I still learned a lot of new and interesting things. Lots of famous problems in standard probability theory are discussed. It was good to patch up my (somewhat non-standard) way of having learned things in the past. People who enjoy math will enjoy this class.
The statistics section dives deep into powerful modern simulation-based techniques such as bootstrapping, and provides a strong foundation in making sure that results are built with proper numerical precision, clearly showing many pitfalls that can happen with rounding errors and the like. The professor has a very good perspective on the central issues with p-values and how to use them properly (and how they are often misused). On some level, the central lesson I got from this part of the course is just how easy it is to inadvertently screw up p-value calculations. It is almost incredible, in fact, just how many ways there are to screw this kind of thing up. There were situations where rounding things to three or four decimal places totally changed statistical significance results; situations where adjusting the value of the null hypothesis by 0.001 changed statistical significance results; situations where internal precision errors in the software to *six* decimal places changed statistical significance results. The professor doesn't really hide her distaste for the typical state of just using p < 0.05 as the cutoff for determining if some statistical result is "significant", and after this class it is very clear why.
In general, if you've always been mystified by the talk about p-values and the complexities thereof, this class will clear all of that up for you.
Detailed review of cons:
I thought some of the proofs and derivations in the probability section weren't that well explained. Some of the proofs involve long algebraic manipulations with infinite series, some of which were difficult for me to grasp. Many of these proofs involve some magic, non-obvious trick that makes it all work out. On the other hand, the problems you are being asked to prove are often very famous, classical probability theory problems, so it is good to get some knowledge of them. You would do well to have some background in real analysis (although if you don't, you'll still probably get by alright).
I would be much happier if the StatKey software that we use were replaced by something standard like, let's say, R, or SciPy in Python, etc. We spent *a ton* of time in the statistics section just learning to use this software. And, worst of all, the software has lots of bizarre "quirks": computing confidence intervals, doing hypothesis tests, etc often involve sequentially clicking several unlabeled buttons such that if you screw one thing up, the entire result is wrong. When doing multiple simulations, some parts of the interface reset, but not others, and you have to remember things like this. This part of the class is basically an exercise in seeing if you can follow directions when using a very poorly-labeled interface and keep all of that stuff in your head correctly and not screw it up - which apparently I can't very well - so the only way I was able to pass this part of class was to end up basically rebuilding all of the functionality myself in Python anyway, just as a way to "sanity-check" my results. However, even this wasn't a perfect solution, as StatKey has some internal precision quirks that often cause things to be rounded incorrectly to three decimal places, and they grade on the StatKey value, not the true value.
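For context on that Python rebuild: a percentile bootstrap of the kind StatKey performs takes only a few lines of standard-library Python. This is a generic sketch of the technique, not StatKey's exact algorithm; the function name `bootstrap_ci` and the sample numbers are made up for illustration:

```python
import random

def bootstrap_ci(data, stat, n_boot=10_000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for an arbitrary statistic."""
    rng = random.Random(seed)
    n = len(data)
    # Resample with replacement n_boot times, computing the statistic each time.
    boots = sorted(stat([rng.choice(data) for _ in range(n)])
                   for _ in range(n_boot))
    # The CI endpoints are the alpha/2 and 1 - alpha/2 percentiles
    # of the bootstrap distribution.
    return boots[int(n_boot * alpha / 2)], boots[int(n_boot * (1 - alpha / 2)) - 1]

sample = [4.2, 5.1, 3.8, 6.0, 4.9, 5.5, 4.4, 5.8, 3.9, 5.2]
mean = lambda xs: sum(xs) / len(xs)
lo, hi = bootstrap_ci(sample, mean)
print(f"95% CI for the mean: ({lo:.2f}, {hi:.2f})")
```

The appeal of the method, as the course emphasizes, is that the same resampling loop works for any statistic (median, standard deviation, a difference of means) with no distributional formula needed.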
However, I would push back on this really being a "con" as it ends up being one of the most important lessons you learn in the entire program. Some of the "problems" are not just problems with the StatKey software, but with the entire p-value procedure. It is *so* easy to get the wrong value for a statistical significance result (!) - making something significant that would be insignificant - because you put "0.33" rather than "0.333333333333..." = 1/3 for the null hypothesis and had a relatively large sample. There were situations where rounding things to three rather than four decimal places changed statistical significance results. The class somewhat-but-not-so-subtly guides you to these realizations by actually walking you through all of the things that can go wrong when you compute stuff. I'm not sure if this was the intended result, but after taking the class I am not surprised at all that there is a "replication crisis" in scientific research publications.
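The 0.33-versus-1/3 pitfall above is easy to reproduce with an ordinary one-sample proportion z-test. Here is a stdlib-Python sketch; the function name and the numbers (n = 10,000, sample proportion 0.34) are made up purely to show the flip:

```python
from math import sqrt, erfc

def two_sided_p(p_hat, p0, n):
    """Two-sided p-value for a one-sample proportion z-test.
    Uses the normal survival function via erfc: 2*sf(z) = erfc(z / sqrt(2))."""
    se = sqrt(p0 * (1 - p0) / n)   # standard error under the null
    z = abs(p_hat - p0) / se
    return erfc(z / sqrt(2))

n, p_hat = 10_000, 0.34
print(two_sided_p(p_hat, 1 / 3, n))  # ~0.157: not significant at 0.05
print(two_sided_p(p_hat, 0.33, n))   # ~0.033: "significant" at 0.05
```

At this sample size, rounding the null from 1/3 to 0.33 shifts the z-statistic by about 0.7, enough to cross the p < 0.05 threshold even though nothing about the data changed.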
The last criticism I have of this class is that it takes a 100% frequentist-based view of things. They really wanted you to get up to speed on the basics of confidence intervals, hypothesis testing, etc, which are standard in most statistical publications and scientific research, as well as modern simulation-based methods for doing these things. The older frequentist methods took a very narrow view of what is possible at times, whereas the modern Bayesian methods are much more powerful and standard in machine learning. The teacher is clearly quite knowledgeable about all forms of mathematical and Bayesian statistics and left a few clues for us who are interested, but mostly stuck to frequentist methods.
Overall Rating (4.3 / 5): ★★★★☆
Professor Rating (5 / 5): ★★★★★
Lecture Rating (5 / 5): ★★★★★
Difficulty (2.9 / 5):
Workload: 16 hours/week
Pros:
1. Engaging homework.
2. Well-presented material.
3. Good support through Piazza.
Cons:
1. Time-consuming final project.
2. Shortened deadlines for summer semester.
3. Randomness of homework grading.
Detailed Review:
From other reviews, going in, I knew that the summer schedule was going to be tight. After the first assignment, every subsequent homework only had a single week to be worked on. I finished my first two assignments early, but even with that I was still having a difficult time keeping up after the third assignment. That's not to say that the homework wasn't interesting; each one had me implementing a deep learning model to solve a particular task, like object detection or image classification.
I felt that the grading was somewhat random, since it was mostly based on how your deep learning model performed on the grader's dataset. There is a wide variety of factors that can determine whether your model performs well or not, all of which are covered in the course material. But I feel luck partly influences whether these factors line up to create a good model. It's a lot of trial and error, which can be time-consuming. If you got a model that performed well on the local grader, then you could usually expect between 95 and 100% on the homework's online grader, depending on chance. These issues probably reflect the real-world applications of deep learning, though, so they're more something to look out for than an actual negative for the course itself.
These issues also affect the final project, which makes up a large portion of your grade. For the final project, your model is put up in a competition against a videogame AI. If your model doesn't do well, you can still get some credit by competing against other students. Again, there was some randomness in this, since you didn't know how the game's AI would react or how well other students did on their final project.
During the semester I took this course, the final project's grader had issues due to the number of students. When the grader was tweaked, some practice competitions got messed up, causing student models to perform better against the game's AI than they were meant to. This was resolved by basing the grade on the model's best performance over all of the practice competitions against the AI. The way the instructors and TAs handled this issue was satisfactory, and hopefully the same approach is applied in future semesters, since it greatly reduced the effect of randomness on the final grade.
Speaking of the instructors and TAs, they were quite active on Piazza. Most issues would be answered within a day of posting, usually with good advice. Office hours were also held, but I never needed to attend them.
The actual course material was also good, with the instructor going in-depth into deep learning topics. There were also quizzes that would test your understanding throughout. Knowledge of linear algebra and probability theory would be useful. Code examples would also be given in some lectures, showing how to actually implement the class material. These code examples were super helpful for the first few homework assignments.