             Publication
Journal Publication
Title of Article Hybrid linear matrix factorization for topic-coherent term clustering 
Date of Acceptance 12 June 2016 
Journal
     Title of Journal Expert Systems with Applications 
     Standard ISI 
     Institute of Journal Elsevier 
     ISBN/ISSN 0957-4174 
     Volume Vol. 62 
     Issue  
     Month Nov.
     Year of Publication 2016 
     Page 358-372 
     Abstract Topic-coherent term clustering is the foundation of document organization, corpus summarization and document classification, and it is especially useful for the emerging problems of big data. However, there is still no term clustering method that both handles high-dimensional data with variable document lengths and numbers of topics and achieves high topic coherence, and this remains a challenging research problem. This paper proposes a hybrid linear matrix factorization method that identifies topic-coherent terms in documents and uses them to form a thesaurus for clustering. The method starts from an analogue of the Karhunen–Loève transformation that maps the full set of principal component analysis (PCA) scores into the factor-coefficient (loading) space of factor analysis (FA). This reduces the high dimensionality of the full set of PCA scores, and topic-coherent terms are then classified by the main FA factors, which may correspond to topics. The Karhunen–Loève transformation reduces the total mean square error, which increases topic coherence. The initial transformation is further optimized through a Karhunen–Loève expansion based on a stochastic Wiener process. This yields optimal topic-coherent bags of terms, which are used to build a more topic-coherent model. The approach is evaluated on the CISI, MedSH and Tweets datasets, which differ in size and number of topics, and it achieves better results than the compared methods.
     Keyword Matrix factorization; Dimensional reduction; Term clustering; Karhunen–Loève transformation 
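The abstract relies on the Karhunen–Loève (KL) expansion of a Wiener process. The sketch below illustrates only that general background and does not reproduce the paper's hybrid factorization method. It uses the standard closed-form KL eigenpairs of the standard Wiener process on [0, 1]. The covariance is min(s, t), the eigenvalues are λ_k = 1/((k − 1/2)²π²), and the eigenfunctions are e_k(t) = √2 sin((k − 1/2)πt). If the expansion is truncated after K terms, the integrated mean square error equals the sum of the discarded eigenvalues. No other K-term orthonormal expansion achieves a smaller error, and this is the error-minimizing property the abstract appeals to. All function names are hypothetical and chosen for illustration.

```python
import math
import random


def kl_eigenpair(k):
    """Return the k-th (1-based) eigenvalue and eigenfunction of the
    Wiener covariance kernel min(s, t) on [0, 1]."""
    omega = (k - 0.5) * math.pi
    lam = 1.0 / omega ** 2

    def e(t):
        # Orthonormal eigenfunction sqrt(2) * sin((k - 1/2) * pi * t)
        return math.sqrt(2.0) * math.sin(omega * t)

    return lam, e


def truncated_covariance(s, t, K):
    """Mercer sum of the first K KL terms; converges to min(s, t) as K grows."""
    total = 0.0
    for k in range(1, K + 1):
        lam, e = kl_eigenpair(k)
        total += lam * e(s) * e(t)
    return total


def truncation_mse(K):
    """Integrated mean square error of keeping K terms: the total variance
    (integral of Var W(t) = t over [0, 1], i.e. 1/2) minus the variance
    captured by the first K eigenvalues, i.e. the sum of the discarded ones."""
    captured = sum(kl_eigenpair(k)[0] for k in range(1, K + 1))
    return 0.5 - captured


def sample_path(ts, K, rng):
    """Approximate a Wiener path by the truncated KL series
    W(t) ~ sum_k sqrt(lam_k) * Z_k * e_k(t), with independent Z_k ~ N(0, 1)."""
    zs = [rng.gauss(0.0, 1.0) for _ in range(K)]
    path = []
    for t in ts:
        w = 0.0
        for k, z in enumerate(zs, start=1):
            lam, e = kl_eigenpair(k)
            w += math.sqrt(lam) * z * e(t)
        path.append(w)
    return path


if __name__ == "__main__":
    # The truncated covariance approaches min(0.3, 0.7) = 0.3 while the error shrinks.
    for K in (1, 5, 50):
        print(K, truncated_covariance(0.3, 0.7, K), truncation_mse(K))
    print(sample_path([0.0, 0.5, 1.0], K=100, rng=random.Random(0)))
```

With K = 1, the first eigenvalue 4/π² ≈ 0.405 already accounts for about 81% of the total integrated variance of 1/2. This is why a few leading KL components can stand in for the whole process. The same reasoning underlies keeping only the leading principal components when reducing dimensionality.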
Author
567020043-0 Mrs. PING LIANG [Main Author]
Science Doctoral Degree

Reviewing Status Independently peer-reviewed
Status Published
Level of Publication International
citation false 
Part of thesis true 
Citation 0