Implementation of Weighted Tree Similarity and Cosine Sorensen-Dice Algorithms for Semantic Search in Document Repository Information System

  • Abdurrosyiid Amrullah Universitas Muhammadiyah Gresik
  • Indra Gita Anugrah Universitas Muhammadiyah Gresik
Keywords: Weighted Tree Similarity, Semantic Search, Cosine Similarity, Sorensen Dice Similarity


Document search can follow several approaches, including full-text search, plain metadata search, and semantic search. This study combines the Weighted Tree Similarity algorithm with the Cosine Sorensen-Dice algorithm to compute similarity for semantic search. Document metadata is represented as a tree with labeled nodes, labeled branches, and weighted branches. Similarity between subtree edge labels is computed with Cosine Sorensen-Dice, while the overall similarity of a document is computed with Weighted Tree Similarity. The document metadata taxonomy consists of owner, description, title, disposition content, and type. The result of this research is a document search application that applies taxonomy weights to documents in file storage.
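
The scoring scheme described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not state how the cosine and Sorensen-Dice measures are combined, so averaging them is an assumption made here, as are whitespace tokenization, the one-level tree layout, the function names, and the branch weights in the usage note. Each branch contributes its label similarity multiplied by the average of the two branch weights, which follows the usual weighted tree similarity formulation for a depth-one tree.

```python
import math
from collections import Counter


def cosine(a: str, b: str) -> float:
    """Cosine similarity over word-count vectors (whitespace tokens)."""
    ca, cb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(ca[t] * cb[t] for t in ca)
    norm = math.sqrt(sum(v * v for v in ca.values())) * math.sqrt(
        sum(v * v for v in cb.values())
    )
    return dot / norm if norm else 0.0


def sorensen_dice(a: str, b: str) -> float:
    """Sorensen-Dice coefficient over word sets: 2|A n B| / (|A| + |B|)."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    if not sa and not sb:
        return 0.0
    return 2 * len(sa & sb) / (len(sa) + len(sb))


def label_similarity(a: str, b: str) -> float:
    # ASSUMPTION: the two measures are averaged; the source does not
    # specify how "Cosine Sorensen-Dice" combines them.
    return (cosine(a, b) + sorensen_dice(a, b)) / 2


def tree_similarity(query: dict, doc: dict) -> float:
    """Weighted similarity of two depth-one metadata trees.

    Each tree maps a branch label to a (weight, text) pair. Branches
    present in only one tree contribute nothing to the score.
    """
    total = 0.0
    for branch, (w_query, text_query) in query.items():
        if branch in doc:
            w_doc, text_doc = doc[branch]
            total += label_similarity(text_query, text_doc) * (w_query + w_doc) / 2
    return total
```

A usage example with hypothetical weights over the five taxonomy fields (the weights are illustrative only and are not taken from the study):

```python
query = {
    "owner": (0.2, "finance department"),
    "description": (0.2, "annual budget report"),
    "title": (0.3, "budget report"),
    "disposition_content": (0.2, "forward to director"),
    "type": (0.1, "report"),
}
doc = {
    "owner": (0.2, "finance department"),
    "description": (0.2, "quarterly budget summary"),
    "title": (0.3, "budget summary"),
    "disposition_content": (0.2, "forward to director"),
    "type": (0.1, "report"),
}
score = tree_similarity(query, doc)
```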

Engineering and Technology