{"id":1834,"date":"2023-04-27T23:03:47","date_gmt":"2023-04-27T23:03:47","guid":{"rendered":"https:\/\/aceday.bushnell.edu\/?p=1834"},"modified":"2025-05-29T16:47:13","modified_gmt":"2025-05-29T16:47:13","slug":"shijo-john","status":"publish","type":"post","link":"https:\/\/aceday.bushnell.edu\/?p=1834","title":{"rendered":"Shijo John"},"content":{"rendered":"\n<p><strong>Creating Synthetic DNA Sequences to improve Deep Learning Network\u2019s accuracy of prediction<\/strong><\/p>\n\n\n\n<p>Advances in DNA sequencing technologies have led to the generation of vast amounts of genomic data that scientists could use to create specialized drugs and even predict disease with minimally invasive techniques. However, processing this data is still a challenging task due to its high dimensionality, complexity, and noise. To achieve high accuracy, deep learning models require well-preprocessed and normalized data. In many cases, a shortage of training and validation data, a lack of data cleaning and encoding requirements, and the presence of imbalanced labeled data make it difficult to apply machine learning to DNA sequence datasets.<\/p>\n\n\n\n<p>These problems can be addressed by generating synthetic DNA sequence data. This presentation proposes an Extract-Transform-Load (ETL) data pipeline to do so. The pipeline applies DNA sequence string cleaning and validation, label encoding, and the Synthetic Minority Over-sampling Technique (SMOTE) algorithm. Our results show that the proposed preprocessing method significantly improves the accuracy of the deep learning models. This study highlights the importance of preprocessing DNA sequences to achieve accurate predictions and provides a valuable resource for researchers working with genomic data and deep learning networks.<\/p>\n\n\n\n<p><strong>SFTE 445 &#8211; Introduction to Machine Learning and AI<\/strong><\/p>\n\n\n\n<p>Dr. 
Ernest Bonat<\/p>\n\n\n\n<p><strong>3pm &#8211; L204<\/strong><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Creating Synthetic DNA Sequences to improve Deep Learning Network\u2019s accuracy of prediction Advances in DNA sequencing technologies have led to the generation of vast amounts of genomic data that scientists could use to create specialized drugs and even predict disease with minimally invasive techniques. However, processing this data is still a challenging task due to &hellip; <a href=\"https:\/\/aceday.bushnell.edu\/?p=1834\" class=\"more-link\">Continue reading<span class=\"screen-reader-text\"> &#8220;Shijo John&#8221;<\/span><\/a><\/p>\n","protected":false},"author":6,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[416],"tags":[1064,1180,424,1246],"class_list":["post-1834","post","type-post","status-publish","format-standard","hentry","category-spring-2023","tag-bonat-e","tag-john-s","tag-sfte","tag-sfte-445"],"_links":{"self":[{"href":"https:\/\/aceday.bushnell.edu\/index.php?rest_route=\/wp\/v2\/posts\/1834","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/aceday.bushnell.edu\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/aceday.bushnell.edu\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/aceday.bushnell.edu\/index.php?rest_route=\/wp\/v2\/users\/6"}],"replies":[{"embeddable":true,"href":"https:\/\/aceday.bushnell.edu\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=1834"}],"version-history":[{"count":3,"href":"https:\/\/aceday.bushnell.edu\/index.php?rest_route=\/wp\/v2\/posts\/1834\/revisions"}],"predecessor-version":[{"id":1963,"href":"https:\/\/aceday.bushnell.edu\/index.php?rest_route=\/wp\/v2\/posts\/1834\/revisions\/1963"}],"wp:attachment":[{"href":"https:\/\/aceday.bushnell.edu\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=1834"}],"wp:term":[{"t
axonomy":"category","embeddable":true,"href":"https:\/\/aceday.bushnell.edu\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=1834"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/aceday.bushnell.edu\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=1834"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}