Automatic Inference of Terminology Relationships in the Persian Islamic Sciences Thesauruses using Graph Convolutional Networks (GCNs)

Document Type: Original Article


1 Department of Knowledge and Information Science, Kharazmi University, Tehran, Iran

2 Department of Knowledge and Information Studies, Kharazmi University, Tehran, Iran

3 Department of Computer Engineering, Qom University, Qom, Iran

4 Department of Computer Engineering, Kharazmi University, Tehran, Iran


Aim: This research seeks to provide a model for automatically inferring the relationships between terms in the Thesaurus of Islamic Sciences (TIS) using Graph Convolutional Networks (GCNs). By applying recent deep learning algorithms, the study sought to speed up the automatic extraction of key terms and their relations in the thesaurus. It also aimed to increase accuracy and comprehensiveness, reduce costs, and improve the relationships between terms.

Methodology: This research used the GCN method, which is widely used in deep learning. The method exploits the relationship patterns in the graph in addition to the features of each node. For this purpose, terms and the documents containing their profiles are converted into semantic vectors by an embedding model, so that the proximity of two vectors reflects the conceptual proximity of the two input texts. These vectors form the initial values of the graph vertices. At each training step, the model tries to predict the presence or absence of an edge between two nodes by considering their two input vectors and the aggregated vectors of each node's neighbors. The model is trained by backpropagation to minimize its cost function. The dataset consisted of all TIS terms produced from 1993 to the beginning of 2021, treated as a graph in which the terms form the vertices and the relationships between terms form the edges. This graph was given as input to the convolutional network to obtain a model for the automatic inference of relationships. Average precision (AP) and ROC metrics were used to analyze the outputs.
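The edge-prediction step described above can be illustrated with a minimal sketch. It assumes a single graph convolution of the common form ReLU(Â X W), where Â is the symmetrically normalized adjacency matrix with self-loops, followed by a dot-product decoder that turns two node representations into an edge probability. The abstract does not specify the architecture, so the layer form, the decoder, the toy graph, the 2-dimensional embeddings, and the identity weight matrix are all assumptions for illustration. Training by backpropagation is omitted.

```python
import math

def normalized_adjacency(num_nodes, edges):
    """Dense D^-1/2 (A + I) D^-1/2 for an undirected graph."""
    a = [[0.0] * num_nodes for _ in range(num_nodes)]
    for i in range(num_nodes):
        a[i][i] = 1.0  # self-loop keeps each node's own vector
    for u, v in edges:
        a[u][v] = 1.0
        a[v][u] = 1.0
    deg = [sum(row) for row in a]
    return [[a[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(num_nodes)]
            for i in range(num_nodes)]

def matmul(x, y):
    """Plain dense matrix product of two lists of lists."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y)))
             for j in range(len(y[0]))] for i in range(len(x))]

def gcn_layer(a_hat, features, weight):
    """One graph convolution: ReLU(A_hat @ X @ W)."""
    h = matmul(matmul(a_hat, features), weight)
    return [[max(0.0, value) for value in row] for row in h]

def edge_probability(h, u, v):
    """Dot-product decoder: sigmoid of the inner product of two nodes."""
    score = sum(a * b for a, b in zip(h[u], h[v]))
    return 1.0 / (1.0 + math.exp(-score))

# Hypothetical toy data: 4 terms with 2-dimensional embeddings.
# Terms 0, 1, 2 are semantically close; term 3 is unrelated.
embeddings = [[1.0, 0.0], [0.9, 0.1], [0.8, 0.2], [0.0, 1.0]]
known_edges = [(0, 1), (1, 2)]   # existing thesaurus relationships
weight = [[1.0, 0.0], [0.0, 1.0]]  # identity stands in for learned weights

a_hat = normalized_adjacency(len(embeddings), known_edges)
hidden = gcn_layer(a_hat, embeddings, weight)

p_linked = edge_probability(hidden, 0, 1)    # pair with a known relationship
p_unlinked = edge_probability(hidden, 0, 3)  # pair with no relationship
print(p_linked, p_unlinked)
```

In a trained model the weight matrix would be learned so that related pairs score high and unrelated pairs score low; here the ordering comes only from the toy embeddings and the neighborhood averaging.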

Findings: The data is divided into two main parts: the paragraphs of a book, and the indexes of that book. The goal is for the model to learn the relationship between the profiles and the texts and paragraphs related to each profile, and to reveal that relationship. A neural network consists of an input layer, an output layer, and a number of hidden layers. Data enters the network through the input layer, and the output layer produces the result. The hidden layers between them increase the complexity of the network and the accuracy of the model. On the test data, the trained model achieved an average accuracy of 75% and a ROC score of 72%. Because this is the first time in the history of the TIS that such a model has been applied to extracting its key terms and assigning their interrelationships, the obtained results are considered acceptable.
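As a rough illustration of how such scores are computed for edge prediction, the sketch below implements average precision and the area under the ROC curve in plain Python. It assumes the reported figures correspond to these two metrics; the labels and scores are made-up toy values, not data from the study. Tied scores are handled only in the ROC computation.

```python
def average_precision(labels, scores):
    """Mean of precision values taken at the rank of each true edge."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = 0
    total = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            total += hits / rank  # precision at this rank
    return total / hits if hits else 0.0

def roc_auc(labels, scores):
    """Probability that a true edge outscores a non-edge (ties count half)."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))

# Hypothetical test pairs: 1 = relationship exists, 0 = no relationship.
toy_labels = [1, 0, 1, 0]
toy_scores = [0.9, 0.8, 0.7, 0.1]  # model's predicted edge probabilities

ap = average_precision(toy_labels, toy_scores)
auc = roc_auc(toy_labels, toy_scores)
print(ap, auc)
```

Both metrics depend only on how the model ranks candidate term pairs, so they can be reported without choosing a decision threshold.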

Conclusion: Despite the shift of attention from thesauruses to ontologies, thesauruses remain of interest for some collections, including the TIS. Compared with previous research, the method adopted to build the thesaurus was novel and yielded reliable results.


Main Subjects