Before We Speak with Data

Last week, I began an online statistics class on Coursera, an awesome online education site that offers free classes from institutions around the globe. I highly recommend this site to anyone who is interested in learning things on their own. Coursera’s courses range from statistics to science to literature. The class I’m taking is taught by Professor Andrew Conway from Princeton University. It has helped me refresh the statistics I learned way back as an undergraduate, and it has also let me pick up new skills with data analysis tools like R.

At work, I often need to read data charts, build reliable business models, and glean consumer insights from fluctuations in data trends. This course, like Nate Silver’s book, keeps me grounded in tried-and-true statistics fundamentals, and gives me a good framework for evaluating the causal claims that come out of our own experimental research and product A/B tests.

One of the biggest takeaways so far is to avoid biases when looking at data. Before we jump to a conclusion in any data analysis, we need to check the source of the data sample, distinguish between independent and dependent variables, confirm that test shards are truly randomized, and look for confounds that could skew the results. We also have to keep in mind that experimental research usually supports stronger causal claims than correlational research does. In the business world, we tend to spot correlations between signals, but a correlation alone doesn’t guarantee a causal relationship between two measurements.
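To make the correlation-versus-causation point concrete for myself, here is a small Python sketch. It is a toy simulation, not anything from the course or from real data: the names `confound`, `x_obs`, `x_rand`, and `y` are made up, and I’m assuming a hidden factor drives both measurements while `x` has no effect on `y` at all.

```python
import random
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def simulate(n=5000, seed=0):
    rng = random.Random(seed)
    # Hypothetical hidden factor that influences both measurements.
    confound = [rng.gauss(0, 1) for _ in range(n)]
    # Observational data: x and y both depend on the confound,
    # but x itself has no effect on y.
    x_obs = [c + rng.gauss(0, 1) for c in confound]
    y = [c + rng.gauss(0, 1) for c in confound]
    # Experimental data: x is assigned at random, independent of the confound.
    x_rand = [rng.gauss(0, 1) for _ in range(n)]
    return pearson(x_obs, y), pearson(x_rand, y)


observational_r, randomized_r = simulate()
print(f"observational r = {observational_r:.2f}")
print(f"randomized r    = {randomized_r:.2f}")
```

Under these assumptions, the observational correlation should land near 0.5 even though nothing causal connects `x` and `y`, while the randomized version should sit close to zero. That gap is the reason to check randomization before reading a chart as cause and effect.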

Okay, that’s a wrap on my data learning for this week! It’s getting hard to write any more with a marathon of my favorite show, “Homeland”, playing in the background. Season 3 premieres tonight, and I can’t wait for the crazy-intense drama to unfold!