Journal and Newspaper Article Abstracts Database

A new information theory based algorithm for clustering categorical data

Authors: Do Si Truong, Lam Thanh Hien, Nguyen Thanh Tung
Pages: 259-278
Issue: Vol. 39, No. 3
Document type: Domestic journal
Storage location: 03 Quang Trung
Classification code: 005
Language: English
Keywords: Data mining, clustering, categorical data, information system, normalized variation of information
Subject: Computer science
Abstract:

In this paper, we review two baseline algorithms for clustering categorical data, namely Min-Min Roughness (MMR) and Mean Gain Ratio (MGR), and propose a new algorithm called Minimum Mean Normalized Variation of Information (MMNVI). The MMNVI algorithm uses the mean normalized variation of information of one attribute with respect to the others to find the best clustering attribute, and the entropy of the equivalence classes generated by the selected clustering attribute to perform a binary split of the dataset being clustered. Experimental results on real datasets from UCI indicate that the MMNVI algorithm can be used successfully for clustering categorical data. It produces clustering results that are better than or equivalent to those of the baseline algorithms.
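
The attribute-selection step described in the abstract can be illustrated with a small sketch. This record does not give the paper's exact formulas, so the code below assumes the commonly used definition NVI(X, Y) = 1 - I(X; Y) / H(X, Y) and takes the mean over all other attributes; the function names (`entropy`, `nvi`, `mean_nvi`, `best_clustering_attribute`) and the toy table are hypothetical, and the binary splitting step is not shown.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a sequence of categorical values."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def nvi(x, y):
    """Normalized variation of information between two attributes.

    Assumed definition: 1 - I(X; Y) / H(X, Y), which lies in [0, 1].
    0 means the attributes induce the same partition, 1 means independence.
    """
    h_xy = entropy(list(zip(x, y)))  # joint entropy H(X, Y)
    if h_xy == 0:
        return 0.0  # both attributes are constant
    mutual_info = entropy(x) + entropy(y) - h_xy
    return 1.0 - mutual_info / h_xy


def mean_nvi(table, attr):
    """Average NVI of `attr` against every other attribute in the table."""
    others = [a for a in table if a != attr]
    return sum(nvi(table[attr], table[o]) for o in others) / len(others)


def best_clustering_attribute(table):
    """Pick the attribute with the minimum mean NVI (assumed selection rule)."""
    return min(table, key=lambda a: mean_nvi(table, a))


# Hypothetical toy data: a column-oriented table of categorical attributes.
table = {
    "color": ["red", "red", "blue", "blue"],
    "shape": ["round", "round", "square", "square"],
    "size": ["small", "large", "small", "large"],
}
print(best_clustering_attribute(table))
```

In this toy table, `color` and `shape` induce the same partition of the objects, so each has a lower mean NVI than `size`, which is independent of both.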
