The authors of this paper invited ACM KDD Innovation Award and IEEE ICDM Research Contributions Award winners to each nominate up to 10 best-known algorithms in data mining, including the algorithm name, a justification for the nomination, and a representative publication reference. Other IEEE and ACM award winners then voted on the nominations to narrow them down to a top 10 list. These algorithms are used for association analysis, classification, clustering, statistical learning, and much more. You can read the paper here.
Here are the winners:
- C4.5
- The k-Means algorithm
- Support Vector Machines
- The Apriori algorithm
- Expectation-Maximization
- PageRank
- AdaBoost
- k-Nearest Neighbor Classification
- Naive Bayes
- CART (Classification and Regression Trees)
The 2007 paper gives a brief overview of what each method is commonly used for and how it works, along with lots of references. It also describes how these winners were selected in much more detail than I have here.
The exciting thing is that I've seen nearly all of these algorithms used to mine genetic data for complex patterns of genetic and environmental exposures that influence complex disease. See some recent papers at EvoBio and PSB. Further, many of these methods are implemented in several R packages.
Top 10 Algorithms in Data Mining (PDF)
Interesting article, but are you sure that you've linked to the correct file? The PDF that you refer to was published in 2007...
Sir, please answer my question:
Why is the QUEST algorithm called Quick?
Why is the QUEST algorithm called Unbiased?
Why is the QUEST algorithm called Efficient?
Thanks,
best regards.
What about the recently famous symbolic regression? (Google "Introducing Robo-Scientist" if you haven't heard of it.)
Number One is Logistic Regression! All scorecards are based on logistic regression. Furthermore, logistic regression is a simple version of a neural network.
While I wouldn't go so far as to say that Logistic Regression is "Number One" (meant tongue in cheek, no doubt), I was surprised not to see it on the list...
This is a top 10 by popularity, not by efficiency. Hierarchical Bayes and Markov Fields are superior (by far) to Naive Bayes. Also, there's no mention of statistical algorithms such as EM, Logistic Regression, etc.