A complex network with a combination pattern of Persian language layers

Document Type: Original Article


1 Department of Computer and Information Technology, Faculty of Engineering, Qom University, Qom, Iran

2 Member of the academic staff of the Department of Computer Engineering and Information Technology, Faculty of Technology and Engineering, University of Qom, Qom, Iran


This article proposes a method for investigating the compositional patterns and topological structure of the Persian language. The improved method represents Persian text as a co-occurrence network graph within the framework of complex network theory. For the first time, Persian text has been successfully converted into a graph. We constructed an open, unbounded corpus of over two million words using a random forest approach. The resulting network, built with the bigram bag model, contains 3256 nodes and 79705 edges. For comparison with the Persian network, a null model of the same size was generated as an Erdős-Rényi random graph. The comparison is based on the average path length, clustering coefficient, and hierarchy of the two networks. Analysis of these key features shows that the Persian network graph differs from the random network. Its smaller average path length and high clustering coefficient also confirm the small-world property of the Persian language. Moreover, whereas the random network contains only one community, 12 communities were identified in the Persian network. The statistical evidence shows that the Persian network is a scale-free network with a layered composition pattern.
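The comparison described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names are hypothetical, the graph is assumed to be undirected and unweighted, the null model is assumed to be the G(n, m) variant of the Erdős-Rényi graph (same node and edge counts), and the token list is a toy placeholder rather than the Persian corpus.

```python
import random
from collections import deque
from itertools import combinations


def bigram_graph(tokens):
    """Undirected graph in which an edge links each pair of adjacent words."""
    adj = {}
    for a, b in zip(tokens, tokens[1:]):
        adj.setdefault(a, set())
        adj.setdefault(b, set())
        if a != b:  # ignore self-loops from repeated words
            adj[a].add(b)
            adj[b].add(a)
    return adj


def edge_count(adj):
    """Number of undirected edges."""
    return sum(len(nb) for nb in adj.values()) // 2


def gnm_random_graph(nodes, m, seed=0):
    """Assumed null model: m edges drawn uniformly from all node pairs."""
    rng = random.Random(seed)
    nodes = list(nodes)
    adj = {v: set() for v in nodes}
    for a, b in rng.sample(list(combinations(nodes, 2)), m):
        adj[a].add(b)
        adj[b].add(a)
    return adj


def average_clustering(adj):
    """Mean local clustering coefficient (nodes of degree < 2 count as 0)."""
    total = 0.0
    for nb in adj.values():
        k = len(nb)
        if k < 2:
            continue
        links = sum(1 for a, b in combinations(nb, 2) if b in adj[a])
        total += 2.0 * links / (k * (k - 1))
    return total / len(adj)


def average_path_length(adj):
    """Mean shortest-path length over all reachable ordered pairs (BFS)."""
    total = pairs = 0
    for src in adj:
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs if pairs else 0.0


if __name__ == "__main__":
    # Toy placeholder text; a real run would use the tokenized corpus.
    tokens = "the cat sat on the mat and the dog sat on the cat".split()
    text_net = bigram_graph(tokens)
    null_net = gnm_random_graph(text_net, edge_count(text_net))
    for name, net in (("text", text_net), ("null", null_net)):
        print(name, average_clustering(net), average_path_length(net))
```

A small-world network would be expected to show a clustering coefficient well above that of the null model while keeping a short average path length. The pairwise enumeration in `gnm_random_graph` is chosen for clarity; for a network of several thousand nodes a graph library would be a more practical choice.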


Main Subjects