Very good first class for the program
Fall 2023
Overall Rating (4.3 / 5): ★★★★☆
Professor Rating (5 / 5): ★★★★★
Lecture Rating (4.3 / 5): ★★★★☆
Difficulty (3.6 / 5)
Workload: 12 hours/week
I will try to give as straight a review as I possibly can.
Pros:
1. Good start to the program; lots of detailed information on how to use the edX platform.
2. Lots of very interesting math in the probability section.
3. The statistics section teaches *very* powerful modern techniques right away, such as bootstrapping, which almost make the older theoretical methods seem irrelevant (but it teaches those as well).
4. The statistics section cleverly - maybe inadvertently? - takes you on a tour of just how easy it is to screw up p-value calculations, in subtle ways you'd never be able to see before taking this course. It is crazy how rounding things to three vs. four decimal places can change evaluations of statistical significance.
5. The teachers are very knowledgeable on both the history and theory of the topic.
Cons:
1. Some of the probability theory proofs weren't that well explained; they often involve just staring at some problem with infinite series until the magic trick to solve it pops into your head.
2. I would have preferred the class be built around Python or R rather than the StatKey software, although the latter was a useful pedagogical tool.
3. The class focuses exclusively on frequentist statistics. I would have been much happier if some Bayesian stuff had been thrown in there. In general, the statistics professor seems like she is very knowledgeable about some very deep mathematical statistics but was required to keep some of it fairly lightweight for this course.
Detailed review of pros:
I thought this was a good start to the program; it has lots of detailed information on how to use the edX platform, the discussion forums, etc. The platform has plenty of quirks: for instance, if you're in the US and an HW is due on some day, it's typically due at 6 AM that day for the benefit of people in, e.g., India, so it's really due the night before. The course explains all of these and other snags in detail. In general it seems built to be the first class you take in the program.
There is lots and lots of very interesting math in the probability section: Markov's inequality, Jensen's inequality, Chernoff bounds, convolutions, etc. Most of this was review for me but I still learned a lot of new and interesting things. Lots of famous problems in standard probability theory are discussed. It was good to patch up my (somewhat non-standard) way of having learned things in the past. People who enjoy math will enjoy this class.
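As a taste of the flavor of this material: Markov's inequality, P(X >= a) <= E[X]/a for any nonnegative X, is easy to sanity-check by simulation. Here is a quick sketch in Python (my own illustration, not course code):

```python
import random

def markov_check(a, n=100_000, seed=1):
    """Empirically compare P(X >= a) against the Markov bound E[X] / a
    for a nonnegative random variable (here: Exponential with mean 1)."""
    rng = random.Random(seed)
    xs = [rng.expovariate(1.0) for _ in range(n)]
    tail = sum(x >= a for x in xs) / n   # empirical P(X >= a)
    bound = (sum(xs) / n) / a            # Markov bound: sample mean / a
    return tail, bound

tail, bound = markov_check(3)
# For Exp(1), P(X >= 3) = e^-3, roughly 0.05, comfortably under the bound of about 1/3
```

Markov's bound is loose here, which is exactly the point the course makes: it holds for *any* nonnegative distribution, which is why tighter tools like Chernoff bounds exist.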
The statistics section dives deep into powerful modern simulation-based techniques such as bootstrapping, and provides a strong foundation in making sure results are computed with proper numerical precision, clearly showing the many pitfalls that rounding errors can cause. The professor has a very good perspective on the central issues with p-values, how to use them properly, and how they are often misused. On some level, the central lesson I took from this part of the course is just how easy it is to inadvertently screw up a p-value calculation. It is almost incredible, in fact, just how many ways there are to get this wrong. There were situations where rounding to three versus four decimal places totally changed statistical significance results; situations where adjusting the value of the null hypothesis by 0.001 changed statistical significance results; situations where internal precision errors in the software at the *sixth* decimal place changed statistical significance results. The professor doesn't really hide her distaste for the common practice of just using p < 0.05 as the cutoff for declaring a statistical result "significant", and after this class it is very clear why.
In general, if you've always been mystified by the talk about p-values and the complexities thereof, this class will clear all of that up for you.
Detailed review of cons:
I thought some of the proofs and derivations in the probability section weren't that well explained. Several involve long algebraic manipulations with infinite series, some of which I found difficult to grasp. Many of these proofs hinge on some magic, nonobvious trick that makes it all work out. On the other hand, the problems you are asked to prove are often very famous, classical probability theory problems, so it is good to get some exposure to them. You would do well to have some background in real analysis (although if you don't, you'll still probably get by alright).
I would have been much happier if the StatKey software we used had been replaced by something standard: R, say, or SciPy in Python. We spent *a ton* of time in the statistics section just learning to use this software. Worst of all, the software has lots of bizarre quirks: computing confidence intervals, running hypothesis tests, etc. often involves clicking several unlabeled buttons in sequence, such that if you screw one step up, the entire result is wrong. When running multiple simulations, some parts of the interface reset while others don't, and you simply have to remember which. This part of the class is basically an exercise in following directions through a very poorly labeled interface and keeping all of that state straight in your head - which apparently I can't do very well. The only way I passed this part of the class was to rebuild all of the functionality myself in Python, just as a way to sanity-check my results. Even that wasn't a perfect solution: StatKey has internal precision quirks that often cause things to be rounded incorrectly to three decimal places, and they grade on the StatKey value, not the true value.
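For what it's worth, the core of that rebuild is small. A percentile-bootstrap confidence interval, the kind of thing StatKey computes interactively, can be sketched in a few lines of plain Python (a rough illustration of the technique, not StatKey's actual algorithm or my exact code):

```python
import random
import statistics

def bootstrap_ci(data, stat=statistics.mean, n_resamples=5000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for a statistic:
    resample the data with replacement, recompute the statistic each
    time, and take the middle (1 - alpha) fraction of the results."""
    rng = random.Random(seed)
    stats = sorted(
        stat([rng.choice(data) for _ in range(len(data))])
        for _ in range(n_resamples)
    )
    lo = stats[int(n_resamples * alpha / 2)]
    hi = stats[int(n_resamples * (1 - alpha / 2)) - 1]
    return lo, hi
```

With something like this in a notebook, every StatKey answer can be cross-checked in seconds, which is essentially how I got through the section.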
However, I would push back on this really being a "con", as it ends up teaching one of the most important lessons in the entire program. Some of the "problems" are not just problems with the StatKey software, but with the entire p-value procedure. It is *so* easy to get the wrong statistical significance result (!) - flipping a result from insignificant to significant - because you put "0.33" rather than "0.333333333333..." = 1/3 for the null hypothesis and had a relatively large sample, or because you rounded to three rather than four decimal places. The class somewhat-but-not-so-subtly guides you to these realizations by actually walking you through all of the things that can go wrong when you compute these quantities. I'm not sure this was the intended result, but after taking the class I am not surprised at all that there is a "replication crisis" in scientific research publications.
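To make the 0.33-versus-1/3 point concrete, here is a sketch of the standard normal-approximation test of a proportion (my own illustration; the course itself used StatKey's simulation-based tests rather than this exact formula). With a large sample, the rounding of the null alone flips the verdict:

```python
import math

def two_sided_p(successes, n, p0):
    """Two-sided p-value for H0: true proportion = p0,
    using the normal approximation to the binomial."""
    p_hat = successes / n
    se = math.sqrt(p0 * (1 - p0) / n)                  # standard error under H0
    z = (p_hat - p0) / se
    phi = 0.5 * (1 + math.erf(abs(z) / math.sqrt(2)))  # standard normal CDF
    return 2 * (1 - phi)

# 33,333 successes out of 100,000 trials (p_hat = 0.33333):
p_exact   = two_sided_p(33333, 100_000, 1 / 3)  # null stated exactly: ~0.998
p_rounded = two_sided_p(33333, 100_000, 0.33)   # null rounded: ~0.025, "significant"!
```

Same data, same test; the only difference is writing 0.33 instead of 1/3 for the null, and the result crosses the p < 0.05 line.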
The last criticism I have is that the class takes a 100% frequentist view of things. They really want you to get up to speed on the basics of confidence intervals, hypothesis testing, etc., which are standard in most statistical publications and scientific research, as well as on modern simulation-based methods for doing these things. The older frequentist methods take a very narrow view of what is possible at times, whereas the modern Bayesian methods are much more powerful and are standard in machine learning. The professor is clearly quite knowledgeable about all forms of mathematical and Bayesian statistics and left a few clues for those of us who are interested, but mostly stuck to frequentist methods.