Nearly everyone in the fields of data mining and business intelligence knows the K-means algorithm. But ever-emerging data with extremely complicated characteristics bring new challenges to this "old" algorithm. This book addresses these challenges and makes novel contributions: establishing theoretical frameworks for K-means distances and K-means based consensus clustering, identifying the "dangerous" uniform effect and zero-value dilemma of K-means, adopting the right measures for cluster validity, and integrating K-means with SVMs for rare class analysis. The book not only enriches clustering and optimization theory, but also provides good guidance for the practical use of K-means, especially for important tasks such as network intrusion detection and credit fraud prediction. The thesis on which this book is based won the "2010 National Excellent Doctoral Dissertation Award", the highest honor, granted to no more than 100 PhD theses per year in China.
Dr. Junjie Wu received his Ph.D. degree in Management Science and Engineering from Tsinghua University, China, in 2008. He also holds a B.E. degree in Civil Engineering from the same university. He is currently an associate professor in the Information Systems Department, School of Economics and Management, Beihang University, China. He is also the Lab Director of the school, the vice director of the Beijing Key Laboratory of Emergency Support Simulation Technologies for City Operations, and an outside research fellow of the Research Center for Contemporary Management, Key Research Institute of Humanities and Social Sciences at Universities, Tsinghua University. His general area of research is data mining and complex networks, with a special interest in solving problems arising from emerging data-intensive applications. He is currently the PI of three NSFC projects and one MOE project, and he also takes an active part in two NSFC major projects and one NSFC key project. As a young scientist, he has published over 40 papers in refereed conference proceedings and journals, such as KDD, ICDM, DMKD, TKDE, TFS, and TSMCB. He has been invited to give talks at Tsinghua University, IBM, Nokia, NLSDE, SI-TECH, etc. He has also been a reviewer for NSFC proposals and for many leading academic journals and international conferences in his area. He is the recipient of two state-level honors, the National Excellent Doctoral Dissertation award (2010) and the New Century Excellent Talents in University award (2011), and was selected for the Microsoft Star-Track program. He is a member of ACM, IEEE, AIS, and CCF.
Cluster Analysis and K-means Clustering: An Introduction.- The Uniform Effect of K-means Clustering.- Generalizing Distance Functions for Fuzzy c-Means Clustering.- Information-Theoretic K-means for Text Clustering.- Selecting External Validation Measures for K-means Clustering.- K-means Based Local Decomposition for Rare Class Analysis.- K-means Based Consensus Clustering.