Analysis of microarray data by genetic algorithm

Narayan Behera; Shruti Sinha; Ankit K. Srivastava; Suboot Hairat; Neha; Prasanta

doi:10.53730/ijhs.v6nS6.11068

Authors

Narayan Behera
narayanbehera@svyasa.edu.in
Institute of Bioinformatics and Applied Biotechnology, Electronics City, Phase I, Bengaluru – 560100, India | Department of Applied Physics, School of Natural Sciences, Adama Science and Technology University, Adama, P O Box 1888, Ethiopia | SVYASA University, Eknath Bhavan, Kempegowda Nagar, Bengaluru 560019, India
Shruti Sinha Institute of Bioinformatics and Applied Biotechnology, Electronics City, Phase I, Bengaluru – 560100, India
Ankit K. Srivastava Department of Applied Physics, School of Natural Sciences, Adama Science and Technology University, Adama, P O Box 1888, Ethiopia | School of Science, Indrashil University, Mehsana 382740, India
Suboot Hairat Department of Biotechnology, Wachemo University, Hossana, Ethiopia
Neha Department of Biotechnology, Deenbandhu Chhoturam University of Science and Technology, Murthal, Sonipat, India
Prasanta Department of Biotechnology, Manav Rachna International Institute of Research and Studies, Surajkund, Faridabad, Haryana (India) -121004

Keywords:

genetic algorithm, grouping genes, classification of microarray samples, candidate genes for disease, microarray data, entropy, mutual information, optimization

Abstract

Microarray gene expression data are used to understand the actions of thousands of genes. Only a few of these genes have a significant impact on any cancer process. Finding these defective genes by experimental means alone is impractical, so computational techniques are required to locate the relevant genes. A method for identifying cancer candidate genes from microarray data is presented. Clustering of similar genes is necessary to find genes that are co-expressed under different biological conditions, and an optimization process is used to find the few candidate genes for cancer. A genetic algorithm employs the principles of evolution (selection, recombination, and mutation) to solve an optimization problem. Mutual information is used to measure the dependency between genes. Two genes are similar if their expression levels are comparable. Similarity as well as positive and negative correlations between genes are considered while clustering them. The interdependence measure indicates how strongly genes are correlated. The genes responsible for a diseased state have higher interdependence measures; these are defective genes that carry cancer diagnostic information. Publicly available microarray gene expression datasets for gastric cancer and colon cancer are considered here.
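
As an illustration of the dependency measure described in the abstract, the following is a minimal Python sketch of mutual information between two genes and of a normalized interdependence score. It is not the authors' implementation. The abstract does not give the formulas, so the equal-width discretization, the normalization by joint entropy, and all function names (`entropy`, `mutual_information`, `interdependence`, `discretize`) are assumptions made for this example.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete symbols."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def mutual_information(x, y):
    """I(X;Y) = H(X) + H(Y) - H(X,Y) for two equally long discrete sequences."""
    return entropy(x) + entropy(y) - entropy(list(zip(x, y)))


def interdependence(x, y):
    """Mutual information divided by joint entropy (assumed normalization).

    The result lies in [0, 1]: 0 for independent genes, 1 when one gene's
    discretized profile fully determines the other's.
    """
    joint = entropy(list(zip(x, y)))
    return mutual_information(x, y) / joint if joint > 0 else 0.0


def discretize(levels, bins=2):
    """Equal-width binning of continuous expression levels (assumed scheme)."""
    lo, hi = min(levels), max(levels)
    if hi == lo:
        return [0] * len(levels)
    width = (hi - lo) / bins
    return [min(int((v - lo) / width), bins - 1) for v in levels]


# Hypothetical expression levels for three genes across four samples.
gene_a = discretize([0.1, 0.2, 0.9, 0.8])
gene_b = discretize([0.9, 0.8, 0.1, 0.2])  # mirrors gene_a (negative correlation)
gene_c = discretize([0.1, 0.9, 0.1, 0.9])  # unrelated to gene_a

score_ab = interdependence(gene_a, gene_b)
score_ac = interdependence(gene_a, gene_c)
```

In this toy data, `gene_b` is the mirror image of `gene_a`, yet the pair still scores high because mutual information captures dependency regardless of its sign. This is consistent with the abstract's point that both positive and negative correlations are considered when clustering genes.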
