Article type: Research article
Authors
1 Department of Computer and Information Technology, Faculty of Engineering, University of Qom, Qom, Iran
2 Faculty member, Department of Computer Engineering and Information Technology, Faculty of Engineering, University of Qom, Qom, Iran
Abstract
This article proposes a method for investigating the compositional patterns and topological structure of the Persian language. The method represents Persian text as a word co-occurrence network and analyzes it within the framework of complex network theory. For the first time, Persian text was successfully converted into a graph. We constructed an open, unbounded corpus of over two million words using a random forest approach. The resulting network, built with a bigram bag model, contains 3256 nodes and 79705 edges. For comparison, a null model of the same size was generated as an Erdős-Rényi random graph. The comparison is based on the average path length, clustering coefficient, and hierarchy of the two networks. Analysis of these key features shows that the Persian network differs from the random network. Its short average path length and high clustering coefficient confirm the small-world property of the Persian language. In addition, whereas the random network has only one community, 12 communities were identified in the Persian network. The statistics indicate that the Persian network is a scale-free network with a layered composition pattern.
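The comparison described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the toy token list, and the choice of the G(n, m) variant of the Erdős-Rényi model (same node and edge counts) are assumptions made for illustration. It uses only the Python standard library and covers the bigram graph, the null model, the clustering coefficient, and the average path length. Community detection and the degree distribution are left out.

```python
import random
from collections import deque


def build_bigram_graph(tokens):
    """Undirected graph: one node per distinct word, one edge per adjacent word pair."""
    graph = {word: set() for word in tokens}
    for left, right in zip(tokens, tokens[1:]):
        if left != right:  # skip self-loops caused by repeated words
            graph[left].add(right)
            graph[right].add(left)
    return graph


def edge_count(graph):
    """Number of undirected edges (each edge is stored twice)."""
    return sum(len(nbrs) for nbrs in graph.values()) // 2


def random_graph_like(graph, seed=0):
    """G(n, m) random graph with the same node and edge counts (assumed null model)."""
    rng = random.Random(seed)
    nodes = sorted(graph)
    target = edge_count(graph)
    null = {node: set() for node in nodes}
    placed = 0
    while placed < target:
        a, b = rng.sample(nodes, 2)  # two distinct nodes
        if b not in null[a]:
            null[a].add(b)
            null[b].add(a)
            placed += 1
    return null


def average_clustering(graph):
    """Mean local clustering coefficient; nodes with degree < 2 contribute 0."""
    total = 0.0
    for nbrs in graph.values():
        k = len(nbrs)
        if k < 2:
            continue
        # Each link between two neighbours is counted from both ends.
        links = sum(len(graph[n] & nbrs) for n in nbrs) // 2
        total += 2.0 * links / (k * (k - 1))
    return total / len(graph)


def average_path_length(graph):
    """Mean breadth-first-search distance over all reachable ordered node pairs."""
    total, pairs = 0, 0
    for source in graph:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            current = queue.popleft()
            for nbr in graph[current]:
                if nbr not in dist:
                    dist[nbr] = dist[current] + 1
                    queue.append(nbr)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs if pairs else 0.0


# Toy token stream standing in for the corpus (hypothetical data).
tokens = "a b c a c d b".split()
text_graph = build_bigram_graph(tokens)
null_graph = random_graph_like(text_graph, seed=1)

for name, g in (("text", text_graph), ("null", null_graph)):
    print(name, edge_count(g), average_clustering(g), average_path_length(g))
```

On a real corpus, a text network whose clustering coefficient is far above that of the null model while its average path length stays comparably short would be consistent with the small-world behaviour the abstract reports.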