Presenting a Scale-Free Complex Network with a Layered Composition Pattern for the Persian Language

Document Type : Original Article

Authors

1 Ph.D. Student, Department of Computer and Information Technology, Faculty of Technology and Engineering, University of Qom, Qom, Iran.

2 Assistant Professor, Department of Computer Engineering and Information Technology, Faculty of Technology and Engineering, University of Qom, Qom, Iran

Abstract

Purpose: This article proposes a method for investigating the composition patterns and topological structure of the Persian language. The proposed method analyzes Persian text by representing it as a word co-occurrence network within the framework of complex network theory.
Method: A null model of the same size is generated using the Erdős–Rényi random graph for comparison with the Persian network. The comparison is based on the average path length, clustering coefficient, and hierarchy of the two networks. The analysis of these key features shows that the Persian network differs from the random network. The short average path length and high clustering coefficient also confirm that the small-world model applies to the Persian language network.
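The null-model comparison described above can be outlined with NetworkX. The sketch below is illustrative only, not the authors' implementation; the graph variable persian_net and the edge-list file name are hypothetical placeholders.

```python
# Minimal sketch (assumption: the Persian co-occurrence network is available
# as a NetworkX graph); compares it against an Erdos-Renyi G(n, m) null model
# of the same size on average path length (L) and clustering coefficient (C).
import networkx as nx

def small_world_stats(g: nx.Graph) -> tuple[float, float]:
    """Average shortest path length and average clustering coefficient,
    computed on the largest connected component of g."""
    giant = g.subgraph(max(nx.connected_components(g), key=len))
    return nx.average_shortest_path_length(giant), nx.average_clustering(giant)

def compare_with_er_null(g: nx.Graph, seed: int = 42) -> None:
    # Null model: same number of nodes and edges, edges placed at random.
    null = nx.gnm_random_graph(g.number_of_nodes(), g.number_of_edges(), seed=seed)
    l_emp, c_emp = small_world_stats(g)
    l_null, c_null = small_world_stats(null)
    print(f"empirical: L = {l_emp:.3f}  C = {c_emp:.3f}")
    print(f"ER null:   L = {l_null:.3f}  C = {c_null:.3f}")
    # Small-world signature: L comparable to the null model, C much larger.

# Hypothetical usage:
# persian_net = nx.read_edgelist("persian_cooccurrence.edgelist")  # assumed file
# compare_with_er_null(persian_net)
```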
Findings: For the first time, Persian text was successfully converted into a complex network. An open, unbounded set of more than two million words was created using a random forest approach.
Conclusion: The resulting network, designed using the bigram bag-of-words model, contains 3256 nodes and 79705 edges. In addition, unlike the random network, which contains only a single community, 12 communities were identified in the Persian network. Statistical evidence indicates that the Persian network is a scale-free network with a layered composition pattern.
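As a rough illustration of the kind of pipeline summarized above, the sketch below builds a bigram (adjacent word pair) co-occurrence network from tokenized sentences, detects communities, and tabulates the degree distribution. The toy sentences are invented, and greedy modularity maximization is only a stand-in, since the abstract does not name the community detection algorithm used.

```python
# Illustrative sketch, not the authors' pipeline: bigram co-occurrence network,
# community detection, and a degree histogram (a heavy tail is what a
# scale-free network would show). `sentences` is a hypothetical toy input.
from collections import Counter
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities

def build_bigram_network(sentences: list[list[str]]) -> nx.Graph:
    g = nx.Graph()
    for tokens in sentences:
        for w1, w2 in zip(tokens, tokens[1:]):        # adjacent word pairs (bigrams)
            if w1 != w2:
                weight = g[w1][w2]["weight"] + 1 if g.has_edge(w1, w2) else 1
                g.add_edge(w1, w2, weight=weight)
    return g

# Toy tokenized sentences standing in for the two-million-word corpus.
sentences = [
    ["زبان", "فارسی", "یک", "شبکه", "پیچیده", "است"],
    ["شبکه", "پیچیده", "الگوی", "ترکیب", "لایه‌ای", "دارد"],
]
g = build_bigram_network(sentences)

communities = greedy_modularity_communities(g)          # the article reports 12 on the full network
degree_histogram = Counter(dict(g.degree()).values())   # degree -> number of nodes with that degree
print(g.number_of_nodes(), g.number_of_edges(), len(communities))
print(sorted(degree_histogram.items()))
```

On a real corpus, the resulting degree histogram would typically be fit against a power law to support the scale-free claim.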
 

References

Albert, R. & Barabási, A.-L. (2002). Statistical mechanics of complex networks. Reviews of Modern Physics, 74(1): 47–97. https://doi.org/10.1103/RevModPhys.74.47
Bafna, P., Pramod, D. & Vaidya, A. (2016). Document clustering: TF-IDF approach. In: 2016 International Conference on Electrical, Electronics, and Optimization Techniques (ICEEOT).
Barabási, A.-L. & Bonabeau, E. (2003). Scale-Free Networks. Scientific American, 288(5): 60–69. https://doi.org/10.1038/scientificamerican0503-60. PMID: 12701331
Barabási, A.-L. & Albert, R. (1999). Emergence of scaling in random networks. Science, 286(5439): 509–512.
Baran-Gale, J. et al. (2020). Ageing compromises mouse thymus function and remodels epithelial cell differentiation. eLife, 9: e56221.
Bassett, D.S. & Sporns, O. (2017). Network neuroscience. Nature Neuroscience, 20(3): 353–364. https://doi.org/10.1038/nn.4502
Bauer, A., Hoedoro, N. & Schneider, A. (2015). Rule-based Approach to Text Generation in Natural Language-Automated Text Markup Language (ATML3). In: Challenge+DC@RuleML.
Benson, A.R., Gleich, D.F. & Leskovec, J. (2016). Higher-order organization of complex networks. Science, 353(6295): 163-166.
Breuer, A., Elflein, S., Joseph, T., Termöhlen, J., Homoceanu, S. & Fingscheidt, T. (2019). Analysis of the effect of various input representations for LSTM-based trajectory prediction. In: IEEE Intelligent Transportation Systems Conference (ITSC): 2728–2735.
Cancho, R.F.I. & Solé, R.V. (2001). The small world of human language. Proceedings of the Royal Society of London. Series B: Biological Sciences, 268(1482): 2261-2265.
Chen, G. & Lou, Y. (2019). Multi-Language Naming Game. In: Naming Game. Springer: 135-154.
Chen, H., Chen, X. & Liu, H. (2018). How does language change as a lexical network? An investigation based on written Chinese word co-occurrence networks. PloS one, 13(2): e0192545.
Chitradurga, R. & Helmy, A. (2004). Analysis of wired short cuts in wireless sensor networks. In: IEEE/ACS International Conference on Pervasive Services (ICPS): 167–176.
Cinar, I., Koklu, M. & Tasdemir, S. (2020). Classification of Raisin Grains Using Machine Vision and Artificial Intelligence Methods. https://doi.org/10.30855/gmbd.2020.03.03.
da Fontoura Costa, L. (2021). A caleidoscope of datasets represented as networks by the coincidence methodology. URL= https://researchgate.net/publication/356392287_A_Caleidoscope_of_Datasets_Represented_as_Networks_by_the_Coincidence_Methodology
Fornito, A. (2020). An Introduction to Network Neuroscience: How to build, model, and analyse connectomes. OHBM. URL= https://www.pathlms.com/ohbm/courses/12238/sections/15846/video_presentations/13753
Sajjadi, M.B. & Minaei Bidgoli, B. (2018). Persian language knowledge graph system architecture. Journal of Information Processing and Management, 35(2). [in Persian]
Daud, A., Khan, W. & Che, D. (2017). Urdu language processing: a survey. Artificial Intelligence Review, 47(3): 279-311.
Fortunato, S. (2018). Community structure in complex networks. In: EGC.
Fromkin, V., Rodman, R. & Hyams, N. (2018). An Introduction to Language. Cengage Learning.
Gao, Z.-K., Small, M. & Kurths, J. (2017). Complex network analysis of time series. EPL, 116(5): 50001.
Garnham, A. (2017). Artificial Intelligence: An Introduction. Routledge.
Goh, W.P., Luke, K.-K. & Cheong, S.A. (2018). Functional shortcuts in language co-occurrence networks. PloS one, 13(9): e0203025.
Helmy, A. (2003). Small worlds in wireless networks. IEEE Communications Letters, 7(10): 490–492.
Howard, J. & Ruder, S. (2018). Universal language model fine-tuning for text classification. In: Annual Meeting of the Association for Computational Linguistics: 328–339.
Joulin, A. et al. (2016). Bag of tricks for efficient text classification. arXiv preprint arXiv:1607.01759.
Khan, N., Bakht, M.P. & Waga, R.A. (2019). Corpus Construction and Structure Study of Urdu Language using Empirical Laws. Urdu News Headline, Text Classification by Using Different Machine Learning Algorithms.
Kiselev, V.Y., Andrews, T.S. & Hemberg, M. (2019). Challenges in unsupervised clustering of single-cell RNA-seq data. Nat. Rev. Genet., 20: 273–282.
LeCun, Y., Bengio, Y. & Hinton, G. (2015). Deep learning. Nature, 521(7553): 436–444. https://doi.org/10.1038/nature14539
Lin, C., King, J., Bharadwaj, P., Chen, C., Gupta, A., Ding, W. & Prasad, M. (2019). EOG-based eye movement classification and application on HCI baseball game. IEEE Access, 7: 96166–96176.
Lucas, J., Tucker, G., Grosse, R. & Norouzi, M. (2019). Understanding posterior collapse in generative latent variable models. URL= https://openreview.net/pdf?id=r1xaVLUYuE
Newman, M.E.J. (2010). Networks: An Introduction. Oxford University Press.
Newman, M.E.J. & Watts, D.J. (1999). Renormalization group analysis of the small-world network model. Physics Letters A, 263: 341–346.
Paul, G., Cao, F., Huang, Q.T., Wang, H.S., Gu, Q., Zhang, K., Shao, M. & Li, Y. (2018). An EOG-based human-machine interface for wheelchair control. IEEE Transactions on Biomedical Engineering, 65: 2023–2032.
Piryonesi, S.M. & El-Diraby, T.E. (2020). Role of Data Analytics in Infrastructure Asset Management: Overcoming Data Size and Quality Problems. Journal of Transportation Engineering, Part B: Pavements, 146(2).
Robert, C. (2014). Machine learning, a probabilistic perspective. Taylor & Francis.
Russell, S.J. & Norvig, P.  (2016). Artificial intelligence: a modern approach. Malaysia: Pearson Education Limited.
Saberi, M., Khosrowabadi, R., Khatibi, A., Misic, B. & Jafari, G. (2021). Topological impact of negative links on the stability of resting-state brain network. Scientific Reports, 11(1): 2176. https://doi.org/10.1038/s41598-021-81767-7
Siegel, J.S. et al. (2018). Re-emergence of modular brain networks in stroke recovery. Cortex, 101: 44–59.
Stanley, H.E., Amaral, L.A.N., Scala, A. & Barthelemy, M. (2000). Classes of small-world networks. PNAS, 97(21): 11149–11152. https://doi.org/10.1073/pnas.200327197
Strogatz, S.  & Watts, D.J. (1998). Collective dynamics of 'small-world' networks. Nature, 393(6684): 440–442. https://doi.org/10.1038/30918.
Sun, C., Qiu, X., Xu, Y. & Huang, X. (2019). How to fine-tune BERT for text classification? In: China National Conference on Chinese Computational Linguistics: 194–206.
Vijaymeena, M.K. & Kavitha, K. (2016). A survey on similarity measures in text mining. Machine Learning and Applications: An International Journal, 3(1): 19–28.
Wilhelm, T. & Kim, J. (2008). What is a complex graph? Physica A: Statistical Mechanics and its Applications, 387(11): 2637–2652. https://doi.org/10.1016/j.physa.2008.01.015.
Xie, Q., Dai, Z., Hovy, E., Luong, M.T. & Le, Q.V. (2020). Unsupervised data augmentation for consistency training. In: Annual Conference on Neural Information Processing Systems.
Yang, H., Cheng, J., Yang, Z., Zhang, H., Zhang, W., Yang, K. & Chen, X. (2021). A node similarity and community link strength-based community discovery algorithm. Complexity, 2021: 1–17. https://doi.org/10.1155/2021/8848566
Yule, G.U. (2014). The statistical study of literary vocabulary. Cambridge University Press.
Zhang, B., Zhou, W., Cai, H.S., Wang, J., Zhang, Z. & Lei, T. (2020). Ubiquitous depression detection of sleep physiological data by using combination learning and functional networks. IEEE Access. https://doi.org/10.1109/ACCESS.2020.2994985
Zhang, Y., Gan, Z., Fan, K., Chen, Z., Henao, R., Shen, D. et al. (2017). Adversarial feature matching for text generation. URL= https://arxiv.org/pdf/1706.03850.pdf
 