A Constraints Driven PSO Based Approach for Text Summarization

Shrabanti Mandal, Girish Kumar Singh, Anita Pal

Abstract


We live in an increasingly digital and virtual world, and as more communication moves online the volume of electronic data keeps growing. Managing these digital resources efficiently and accurately is a serious challenge. One important solution is text summarization, an application of text mining. A summary represents the gist of a text document; a good summary offers maximum coverage and a high level of diversity within a user-defined size. This paper proposes an extractive approach to summarizing text documents using Particle Swarm Optimization (PSO), a population-based stochastic optimization technique that has many similarities with evolutionary computation techniques such as Genetic Algorithms (GA). The large volume and dimensionality of terms are handled with a term-document matrix, followed by K-means clustering with PSO to obtain an optimal number of concept clusters. A constraint-driven criterion is then applied to select the best one. These key concepts are used to identify the significant gist of the documents for text summarization.
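As an illustration only, the idea of constraint-driven extractive selection with PSO might be sketched as follows. This is not the authors' pipeline: it omits the term-document matrix and K-means stages, and uses a standard binary PSO (sigmoid update rule) over sentence masks instead. The fitness function (cosine coverage minus pairwise redundancy, with a hard word limit), the parameter values, and all function names are assumptions made for this sketch.

```python
import math
import random
from collections import Counter

def tf_vector(sentence):
    """Bag-of-words term frequencies (lowercased, whitespace tokenized)."""
    return Counter(sentence.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def fitness(mask, vecs, doc_vec, lengths, max_words):
    """Assumed objective: coverage minus redundancy.
    Empty or over-length selections violate the constraint and score -1."""
    chosen = [i for i, bit in enumerate(mask) if bit]
    if not chosen or sum(lengths[i] for i in chosen) > max_words:
        return -1.0
    coverage = sum(cosine(vecs[i], doc_vec) for i in chosen)
    redundancy = sum(cosine(vecs[i], vecs[j])
                     for k, i in enumerate(chosen) for j in chosen[k + 1:])
    return coverage - redundancy

def pso_summarize(sentences, max_words, particles=20, iters=60, seed=0):
    """Binary PSO over 0/1 sentence masks; returns the selected sentences."""
    rng = random.Random(seed)
    vecs = [tf_vector(s) for s in sentences]
    doc_vec = sum(vecs, Counter())          # whole-document term frequencies
    lengths = [len(s.split()) for s in sentences]
    n = len(sentences)

    def score(mask):
        return fitness(mask, vecs, doc_vec, lengths, max_words)

    pos = [[rng.randint(0, 1) for _ in range(n)] for _ in range(particles)]
    vel = [[0.0] * n for _ in range(particles)]
    pbest = [p[:] for p in pos]
    pbest_fit = [score(p) for p in pos]
    g = max(range(particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]

    for _ in range(iters):
        for i in range(particles):
            for d in range(n):
                # Inertia plus pulls toward the personal and global bests.
                v = (0.7 * vel[i][d]
                     + 1.5 * rng.random() * (pbest[i][d] - pos[i][d])
                     + 1.5 * rng.random() * (gbest[d] - pos[i][d]))
                vel[i][d] = max(-4.0, min(4.0, v))
                # The sigmoid of the velocity gives the probability of a 1 bit.
                prob = 1.0 / (1.0 + math.exp(-vel[i][d]))
                pos[i][d] = 1 if rng.random() < prob else 0
            f = score(pos[i])
            if f > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], f
                if f > gbest_fit:
                    gbest, gbest_fit = pos[i][:], f

    if gbest_fit < 0:                       # no feasible selection was found
        return []
    return [s for s, bit in zip(sentences, gbest) if bit]
```

Calling `pso_summarize(sentences, max_words=30)` on a list of sentence strings returns a subset whose total word count stays within the limit. Treating the size limit as a hard penalty is one simple way to express a constraint; other formulations, such as repairing infeasible particles, are equally plausible.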

Keywords


Text summarization; Particle swarm optimization; K-means; Fitness function; Cosine measure; ROUGE

DOI: http://dx.doi.org/10.26713%2Fjims.v10i4.891

eISSN 0975-5748; pISSN 0974-875X