Molly Haley

Expense Tracker Website

I created an expense tracker website for my capstone project. The inspiration for this came from my own struggles with tracking my expenses, and I thought it would be neat to create a tool that I could use in the future. This website was created in VSCode and implements a full MERN stack: MongoDB, Express, React, and Node.js. My presentation will consist of an explanation of what it means to implement the MERN stack, along with explanations of my database and front-end development process. I will then complete the presentation by showing how my webpage works.

SFTE 499, Senior Capstone

Ernest Bonat

L204

10 – 10:30 AM

Ian Woodcock

Synthetic Generation of Genomic Datasets using Synthetic Data Vault

Many wonder what the mysterious world of coding allows you to do. The first things that come to mind are software UI (User Interface) or UX (User Experience) design, maybe game development, and many other things. But there is one field that may seem hidden from the world, as if tucked away in some virtual underground dungeon. No, I am not taking you to the dark web. I am talking about data analysis and machine learning. Python is a popular programming language for manipulating datasets such as Excel spreadsheets, which can hold anything from customers’ personal information to a store’s sales statistics for its items. We can take such a dataset and run it through an algorithm that gives us, simply, a score. In this presentation, we will work with DNA genomic datasets and run them through an algorithm that creates synthetic genomic data. The score will focus specifically on the breadth and uniqueness of the types of genomic data in the original dataset and the new dataset.

SFTE 499, Senior Capstone

Ernest Bonat

Richardson 100

10 – 10:30 AM

David Schwartz

Apply Machine Learning Convolutional Neural Network for Classification Genomic Datasets

In my presentation, I will show how we can apply a convolutional neural network (CNN) to classify genomic data. I will discuss CNNs and how they work, and show their application to genomic datasets.

SFTE 445 – Introduction to Machine Learning and AI

Dr. Ernest Bonat

3:30pm – L204

Ian Woodcock

Using Machine Learning Autoencoders Neural Networks for Dimensional Reduction of Genomic Datasets

“Machine learning is a well-known field in today’s society when it comes to technology. The first thing that probably comes to your mind is AI, which is a fair assumption, and I do have to admit, yes, machine learning and AI are related. But this is not the robotics you see in movies, science fairs, or robotics classes. It is done in programming and is ever more present in the world of coding and software engineering. If you had a noisy image, a machine could make it less noisy! If you had a large file and wanted to compress it, a machine could do that! But what if you had dataset files so large that they filled up a small thumb drive, and it took forever for a machine to learn from them and output results? What if I told you that you could condense them down? Well, now you can! This is how we use a neural network called an autoencoder, together with dimensionality reduction, to condense our data in genomics!”

SFTE 445 – Introduction to Machine Learning and AI

Dr. Ernest Bonat

4pm – L204

Shijo John

Creating Synthetic DNA Sequences to improve Deep Learning Network’s accuracy of prediction

Advances in DNA sequencing technologies have led to the generation of vast amounts of genomic data that scientists could use to create specialized drugs and even predict disease with minimally invasive techniques. However, processing this data is still a challenging task due to its high dimensionality, complexity, and noise. To achieve high accuracy, deep learning models require well-preprocessed and normalized data. In many cases, there is not enough training and validation data, data cleaning and encoding are lacking, and the labeled data is imbalanced. These issues in particular make it difficult to apply ML to DNA sequence datasets.

These problems can be addressed by generating synthetic DNA sequence data. This presentation proposes an Extract-Transform-Load (ETL) data pipeline process to solve the above problems. It applies DNA sequence string cleaning and validation, label encoding, and the Synthetic Minority Over-sampling Technique (SMOTE) algorithm. Our results show that the proposed preprocessing method significantly improves the accuracy of the deep learning models. This study highlights the importance of preprocessing DNA sequences to achieve accurate predictions and provides a valuable resource for researchers working with genomic data and deep learning networks.
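
The three steps named in this abstract (sequence cleaning and validation, label encoding, and SMOTE) could be sketched roughly as follows. This is an illustrative sketch only, not the presenter's pipeline: the function names, the A/C/G/T integer mapping, and the neighbour count are all assumptions, and the oversampling step is a simplified standard-library version of the SMOTE idea that assumes equal-length sequences and at least two minority samples.

```python
import random

VALID_BASES = set("ACGT")  # assumption: only unambiguous bases are accepted


def clean_sequence(seq):
    """Trim and uppercase a DNA string; return None if it contains invalid characters."""
    seq = seq.strip().upper()
    return seq if seq and set(seq) <= VALID_BASES else None


def encode_sequence(seq):
    """Label-encode each base as an integer (hypothetical mapping A=0, C=1, G=2, T=3)."""
    codes = {"A": 0, "C": 1, "G": 2, "T": 3}
    return [codes[base] for base in seq]


def smote(minority, n_new, k=2, seed=0):
    """Simplified SMOTE: interpolate between a minority sample and one of its
    k nearest minority-class neighbours to create n_new synthetic samples."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority samples by squared Euclidean distance to base.
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(base, other)])
    return synthetic


# Usage: clean, drop invalid strings, encode, then oversample the minority class.
raw_minority = [" acgt ", "ACGN", "ttga", "ACGA"]
cleaned = [s for s in map(clean_sequence, raw_minority) if s is not None]
encoded = [encode_sequence(s) for s in cleaned]
new_samples = smote(encoded, n_new=4)
```

Note that interpolating label-encoded bases yields fractional values rather than valid bases, so the synthetic samples are feature vectors for a model, not readable DNA strings. A real pipeline would more likely rely on an established SMOTE implementation than on this hand-rolled version.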

SFTE 445 – Introduction to Machine Learning and AI

Dr. Ernest Bonat

3pm – L204