Monday, December 29, 2014

Wednesday, December 17, 2014

Most Popular Data Mining Algorithms

http://www2.cs.uh.edu/~ceick/DM/10Algorithms-08.pdf

This paper presents the top 10 data mining algorithms identified by the IEEE International Conference on Data Mining (ICDM) in December 2006.

Does anyone know of anything similar that is more recent?

Tuesday, December 16, 2014

Saturday, December 13, 2014

Boosting vs Bagging

This is a paper that compares bagging and boosting on decision trees:

http://home.eng.iastate.edu/~julied/classes/ee547/Handouts/q.aaai96.pdf

The paper shows that both bagging and boosting improve on individual trees, and that boosting usually gives better results than bagging. In some cases, however, boosting fails, probably because of its tendency to get distracted by noisy records.

Note that Quinlan, the author of this paper, is an important name in the field of machine learning. He is the one who introduced C4.5.
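
To make the point about noisy records concrete, here is a minimal sketch of an AdaBoost-style reweighting step. It is not the paper's code. The function names, the 10-record dataset, and the "record 0 is mislabeled" scenario are all hypothetical and only meant as an illustration. Bagging draws every record with the same probability in every round, whereas boosting keeps shifting weight toward the records the previous trees got wrong.

```python
def adaboost_weight_update(weights, misclassified):
    """One AdaBoost.M1-style reweighting round.

    weights: current record weights (they sum to 1).
    misclassified: booleans, True where the current tree was wrong.
    """
    # Weighted error of the current classifier.
    err = sum(w for w, m in zip(weights, misclassified) if m)
    if not 0 < err < 0.5:
        raise ValueError("weighted error must lie in (0, 0.5)")
    # beta < 1 shrinks the weight of correctly classified records.
    beta = err / (1 - err)
    new = [w if m else w * beta for w, m in zip(weights, misclassified)]
    total = sum(new)
    return [w / total for w in new]


def simulate(rounds, n=10):
    """Hypothetical scenario: record 0 is mislabeled, so every tree
    gets it wrong; in round r the tree also misses record r."""
    weights = [1 / n] * n
    for r in range(1, rounds + 1):
        wrong = [i == 0 or i == r for i in range(n)]
        weights = adaboost_weight_update(weights, wrong)
    return weights


if __name__ == "__main__":
    n = 10
    print("bagging: every record is drawn with probability", 1 / n)
    for rounds in range(1, 4):
        w = simulate(rounds, n)
        print("boosting, round", rounds, "weight of noisy record:", round(w[0], 3))
```

In this toy run the weight of the noisy record keeps growing from round to round, so later trees spend more and more effort on a record that cannot be classified correctly.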

Thursday, December 11, 2014

Netflix

Here are a few articles related to the Netflix prize:

Lessons from the Netflix prize challenge (by the contest winners)
The BellKor Solution (2007)
The BellKor Solution (2009)
The Pragmatic Theory Solution (2009)
The Big Chaos Solution (2009)

De-anonymization of the Netflix Dataset (it turned out to be much easier than I thought!)



On Bootstrapping vs Cross-Validation

Here is a link from Jan about the difference between cross-validation and bootstrapping:
http://www.r-bloggers.com/comparing-the-bootstrap-and-cross-validation/

Here is also a link to a very famous paper (published in 1995) that compares bootstrapping and cross-validation:
http://www.cs.iastate.edu/~jtian/cs573/Papers/Kohavi-IJCAI-95.pdf
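
As a rough illustration of how the two resampling schemes differ, here is a small sketch that only generates the train/test indices. The function names and the dataset size are my own assumptions and are not taken from either link. In k-fold cross-validation every record is tested exactly once. In the bootstrap, the training set is drawn with replacement, so it contains on average about 63.2% (that is, 1 - 1/e) of the distinct records, and the records that were never drawn serve as the test set.

```python
import random


def k_fold_splits(n, k, rng):
    """Yield (train, test) index lists for k-fold cross-validation.
    Each record lands in exactly one test fold."""
    idx = list(range(n))
    rng.shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test


def bootstrap_split(n, rng):
    """Draw n records with replacement for training and test on the
    records that were never drawn (the out-of-bag records)."""
    train = [rng.randrange(n) for _ in range(n)]
    drawn = set(train)
    test = [i for i in range(n) if i not in drawn]
    return train, test


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the run is repeatable
    n = 1000
    for train, test in k_fold_splits(n, 10, rng):
        print("k-fold: train", len(train), "test", len(test))
    train, test = bootstrap_split(n, rng)
    print("bootstrap: distinct training records:", len(set(train)) / n)
```

The practical consequence is that a bootstrap training set has the same size as the original data but fewer distinct records, while each k-fold training set holds a fraction (k-1)/k of the records with no repeats.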

Monday, December 8, 2014

Model Ensembles Again!

Here is a good, though somewhat dated, reference that experimentally compares ensemble methods.

Wednesday, December 3, 2014

Model Ensembles

I have added some "borrowed" slides on model ensembles. These slides summarize a good portion of what we have covered in class, but not everything.

Make sure to also review what I write on the board.