The semantic clustering and labeling project was initiated at the LGI2P research center during Nicolas Fiorini's PhD. The main motivation behind this work is to provide a hierarchical clustering technique that relies on the semantic annotations associated with documents. The approach is generic: the documents can be texts, videos, or even gene sequences.
Semantic clustering, as we propose it, aims to build a hierarchy of semantically labeled clusters. First, according to our evaluation, this clustering is more reliable than classical hierarchical approaches. Second, cluster labels are often needed after clustering to understand which groups have been formed. Because documents are clustered according to their semantic annotations, we propose using those same annotations to label the tree nodes as well.
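To make the idea concrete, here is a minimal toy sketch of clustering documents by their annotations and then labeling each tree node from those same annotations. It is not the project's actual algorithm: Jaccard similarity over plain annotation sets and labeling a node with the annotations its children share are simplifying assumptions, and the function names (jaccard, cluster_and_label) and the sample documents are hypothetical.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity between two sets of annotations."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_and_label(docs):
    """Greedy agglomerative clustering over annotation sets.

    docs maps a document id to a set of annotation strings.
    Returns the root node of a binary tree; each node is a dict with
    'members', 'annots' (all annotations seen below the node),
    'label' (annotations shared by its children) and 'children'.
    """
    # Each leaf starts as its own cluster, labeled with its own annotations.
    nodes = [
        {"members": [doc_id], "annots": set(a), "label": set(a), "children": []}
        for doc_id, a in sorted(docs.items())
    ]
    while len(nodes) > 1:
        # Pick the pair of clusters whose annotation sets are most similar.
        i, j = max(
            combinations(range(len(nodes)), 2),
            key=lambda p: jaccard(nodes[p[0]]["annots"], nodes[p[1]]["annots"]),
        )
        left, right = nodes[i], nodes[j]
        merged = {
            "members": left["members"] + right["members"],
            "annots": left["annots"] | right["annots"],
            # Assumption: a node is labeled by what its children have in common.
            "label": left["label"] & right["label"],
            "children": [left, right],
        }
        nodes = [n for k, n in enumerate(nodes) if k not in (i, j)] + [merged]
    return nodes[0]


# Hypothetical documents of different kinds, described only by annotations.
docs = {
    "d1": {"gene", "cancer"},
    "d2": {"gene", "cancer", "therapy"},
    "d3": {"video", "sport"},
    "d4": {"video", "sport", "football"},
}
root = cluster_and_label(docs)
for child in root["children"]:
    print(child["members"], sorted(child["label"]))
```

In this toy data the two subtrees group d1 with d2 and d3 with d4, and each subtree's label is the set of annotations its documents share. A real implementation would likely replace Jaccard with a similarity measure that exploits the structure of the underlying ontology.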
This work is hosted on GitHub and can be freely downloaded and integrated into your own project.
For more information, please visit: http://sc.nicolasfiorini.info.