Joseph Gorak

Main Topics

My research focuses on machine learning and intelligent systems, ranging from proposing new methods for solving a problem to optimizing existing systems. My primary interests lie in improving renewable energy and food systems for a better future. I believe machine learning and data analysis can be used to their fullest potential to benefit not only people but the planet itself. We must respect our planet to ensure healthier crops and better weather overall.

Automated Detection of Harmful Substances in Crops

For a computing topics research team, I carried out group-based survey research on food safety across the food supply chain. Because little research examines the supply chain as a whole and how its stages relate, my team set out to fill in gaps, identify relationships, and propose solutions to improve safety. This matters because, as globalization increases and networks become more interconnected, keeping food safe and minimizing the spread of disease becomes more important. My main research task focused on the beginning of the supply chain: agriculture.

Agriculture can be divided into crops and livestock. For crops, the three main areas of concern are soil, water, and the crops themselves. I proposed a methodology that uses soil and water sensors alongside cameras to monitor farms directly for harmful substances such as foodborne pathogens, pesticides, and toxins. The concept is to train machine learning models on data from different regions and combine them through ensemble-based stacking to improve generalization and standardization. If these models are trained on different regional data and pick up on the nuances of different farms, they should help detect these harmful substances. At the same time, detection for soil, water, and crops is kept independent, allowing for a more flexible implementation.
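To make the stacking idea concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not the pipeline from the survey: the two soil-sensor features, the regional offsets, the nearest-centroid base learners, and the logistic meta-learner are all hypothetical choices. The synthetic data is deliberately well separated, so the sketch checks the plumbing (one base model per region, one meta-model trained on held-out base scores) rather than real detection performance.

```python
import math
import random

def make_region(rng, offset, n_per_class=40):
    """Synthetic rows ([soil_sensor_1, soil_sensor_2], label); label 1 = contaminated."""
    rows = []
    for label, low in ((0, 0.0), (1, 5.0)):
        for _ in range(n_per_class):
            features = [low + offset + rng.random(), low + offset + rng.random()]
            rows.append((features, label))
    return rows

def fit_centroids(rows):
    """Base learner: one centroid per class, fitted on a single region's rows."""
    centroids = {}
    for label in (0, 1):
        points = [x for x, y in rows if y == label]
        centroids[label] = [sum(col) / len(points) for col in zip(*points)]
    return centroids

def base_score(centroids, x):
    """Positive when x lies closer to the contaminated centroid."""
    return math.dist(x, centroids[0]) - math.dist(x, centroids[1])

def fit_meta(meta_X, meta_y, lr=0.1, epochs=50):
    """Meta-learner: logistic regression trained by SGD on base-model scores."""
    w, b = [0.0] * len(meta_X[0]), 0.0
    for _ in range(epochs):
        for x, y in zip(meta_X, meta_y):
            z = b + sum(wi * xi for wi, xi in zip(w, x))
            z = max(-30.0, min(30.0, z))  # keep exp() in a safe range
            grad = 1.0 / (1.0 + math.exp(-z)) - y
            w = [wi - lr * grad * xi for wi, xi in zip(w, x)]
            b -= lr * grad
    return w, b

def predict(base_models, meta, x):
    """Stacked prediction: base scores feed the meta-learner."""
    w, b = meta
    scores = [base_score(model, x) for model in base_models]
    return 1 if b + sum(wi * si for wi, si in zip(w, scores)) > 0 else 0

rng = random.Random(0)
offsets = [0.0, 0.1, 0.2]  # hypothetical regional sensor baselines

# One base model per region, each trained only on that region's data.
base_models = [fit_centroids(make_region(rng, off)) for off in offsets]

# The meta-learner is trained on a separate held-out set drawn from all regions.
holdout = [row for off in offsets for row in make_region(rng, off, 20)]
rng.shuffle(holdout)
meta_X = [[base_score(m, x) for m in base_models] for x, _ in holdout]
meta_y = [y for _, y in holdout]
meta = fit_meta(meta_X, meta_y)

test_rows = [row for off in offsets for row in make_region(rng, off, 20)]
predictions = [predict(base_models, meta, x) for x, _ in test_rows]
accuracy = sum(p == y for p, (_, y) in zip(predictions, test_rows)) / len(test_rows)
```

Because each substance type would get its own stack, the same structure could be repeated separately for soil, water, and crop imagery without the detectors depending on one another.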

Poster for presentation

Placing in the Top Two of a Kaggle Competition on WISDM Accelerometer Data

Diagram showing 1D CNN results in training.

In a machine learning course, I ranked within the top two positions in the final Kaggle assessment. The data consisted of both raw accelerometer readings and features extracted from those readings. The goal was to predict whether each sample represented walking, jogging, moving up and down stairs, standing, or sitting. Two submissions were allowed, and it was recommended that one use a traditional machine learning approach and the other a deep learning approach. I cleaned, imputed, and normalized both sets of data.

For the feature-based data, I ran several tests with several iterations each, using random search to find the best-performing hyperparameters. After evaluating a random forest classifier (RFC), a support vector machine (SVM), Gaussian naive Bayes (GNB), and XGBoost, I found that XGBoost performed best, with RFC following closely. Acting on this finding led to my best results, obtaining the top score.
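The random search step can be sketched as follows. This is a generic illustration in plain Python, not the code used in the competition: the search space, its parameter names, and `toy_validation_score` are hypothetical stand-ins. In practice the scoring function would train a model with the sampled settings and return its validation accuracy.

```python
import random

def random_search(score_fn, space, n_trials=25, seed=0):
    """Sample n_trials random parameter sets and keep the best-scoring one."""
    rng = random.Random(seed)
    best_params, best_score = None, float("-inf")
    for _ in range(n_trials):
        # Draw each hyperparameter independently from its list of options.
        params = {name: rng.choice(options) for name, options in space.items()}
        score = score_fn(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score

# Hypothetical search space; the names mimic common tree-ensemble settings.
SPACE = {
    "n_estimators": [50, 100, 200, 400],
    "max_depth": [3, 6, 9, 12],
    "learning_rate": [0.01, 0.05, 0.1, 0.3],
}

def toy_validation_score(params):
    """Stand-in for 'train the model and return its validation accuracy'."""
    penalty = (
        abs(params["n_estimators"] - 200) / 1000
        + abs(params["max_depth"] - 6) / 50
        + abs(params["learning_rate"] - 0.1)
    )
    return 1.0 - penalty

best_params, best_score = random_search(toy_validation_score, SPACE)
```

Random search evaluates a fixed budget of sampled combinations instead of every point on the grid, which keeps the cost predictable when several model families each need tuning.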

For the raw signal data, I trained a 1D CNN because of its proficiency with time-series data. I wrote a system that trains and evaluates the model on different shuffled splits of training and testing data and averages the results of each run, so that hyperparameter settings can be compared. On the provided dataset the model tended to score highly, but it reached only around 65% accuracy on the Kaggle dataset. I was unable to build a model that pushed past this point. However, after the deadline, once the final Kaggle data was integrated, the model still performed at around this accuracy, while the traditional models dropped across the board for all participants. This showed me that although the CNN did not achieve the highest score, it performed with the highest consistency.
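The repeated shuffled-split evaluation described above can be sketched like this. It is a simplified illustration, not the original system: a tiny nearest-mean classifier on signal spread stands in for the 1D CNN, and the synthetic two-activity windows are an assumption made so the example runs without external libraries.

```python
import math
import random
import statistics

def make_window(rng, activity, length=20):
    """Synthetic accelerometer window: jogging adds a unit-amplitude oscillation."""
    amplitude = 1.0 if activity == "jogging" else 0.0
    return [amplitude * math.sin(2 * math.pi * t / 10) + rng.uniform(-0.05, 0.05)
            for t in range(length)]

def fit(windows, labels):
    """Stand-in model (not a CNN): mean signal spread per activity."""
    spreads = {}
    for window, label in zip(windows, labels):
        spreads.setdefault(label, []).append(statistics.pstdev(window))
    return {label: statistics.mean(values) for label, values in spreads.items()}

def predict(model, window):
    """Pick the activity whose mean spread is closest to this window's spread."""
    spread = statistics.pstdev(window)
    return min(model, key=lambda label: abs(model[label] - spread))

def repeated_holdout(windows, labels, n_repeats=5, test_fraction=0.25, seed=0):
    """Train and test on several shuffled splits; return mean and per-run accuracy."""
    rng = random.Random(seed)
    indices = list(range(len(windows)))
    n_test = max(1, int(len(windows) * test_fraction))
    accuracies = []
    for _ in range(n_repeats):
        rng.shuffle(indices)
        test_idx, train_idx = indices[:n_test], indices[n_test:]
        model = fit([windows[i] for i in train_idx], [labels[i] for i in train_idx])
        hits = sum(predict(model, windows[i]) == labels[i] for i in test_idx)
        accuracies.append(hits / n_test)
    return statistics.mean(accuracies), accuracies

data_rng = random.Random(1)
labels = ["sitting"] * 20 + ["jogging"] * 20
windows = [make_window(data_rng, label) for label in labels]
mean_accuracy, per_run = repeated_holdout(windows, labels)
```

Averaging over several splits reduces the chance that one lucky or unlucky split decides which hyperparameter setting looks best, and the per-run list gives a rough view of how consistent the model is.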

Optimizing the Placement of Components in Hybrid Wind-Solar Farms

For my first research-based assignment, the topic was as broad as focusing on any optimization problem. As my interests lie in sustainability and the environment, I looked into different systems for improving renewable energy. This led me to hybrid wind-solar farms, which can have a variety of configurations. The main idea is to increase energy production while reducing costs, which involves deciding where to place wind and solar components based on the geography and weather of the region. Each component carries an installation fee as well as a maintenance fee. The methodologies I looked into were particle swarm optimization, genetic algorithms, and cuckoo search. I found that cuckoo search performed the best and was the most robust approach. Further work on the problem could see improvements from dynamic yet robust algorithms and from more region-specific data. It is a challenging problem, as each region receives different amounts of wind and sunlight.
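For readers unfamiliar with cuckoo search, here is a minimal sketch of the basic algorithm applied to a toy placement problem. Everything specific in it is an assumption for illustration: `net_cost` is a made-up objective for placing a single component on a 10 by 10 site, the fee values are invented, and the parameter defaults are arbitrary. It is not the model or the configuration evaluated in the assignment.

```python
import math
import random

def levy_step(rng, beta=1.5):
    """Draw one heavy-tailed step length using Mantegna's algorithm."""
    sigma = (
        math.gamma(1 + beta) * math.sin(math.pi * beta / 2)
        / (math.gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2))
    ) ** (1 / beta)
    u = rng.gauss(0, sigma)
    v = rng.gauss(0, 1) or 1e-12  # avoid dividing by an exact zero
    return u / abs(v) ** (1 / beta)

def net_cost(position):
    """Toy objective (hypothetical): fees minus energy value, lowest near (7, 3)."""
    x, y = position
    energy_value = 100.0 - ((x - 7.0) ** 2 + (y - 3.0) ** 2)
    installation_fee, maintenance_fee = 40.0, 10.0
    return installation_fee + maintenance_fee - energy_value

def cuckoo_search(objective, bounds, n_nests=15, n_iter=200, pa=0.25,
                  alpha=0.01, seed=0):
    """Minimize objective within bounds; returns (position, score, history)."""
    rng = random.Random(seed)

    def random_nest():
        return [rng.uniform(lo, hi) for lo, hi in bounds]

    nests = [random_nest() for _ in range(n_nests)]
    scores = [objective(nest) for nest in nests]
    history = [min(scores)]
    for _ in range(n_iter):
        for i in range(n_nests):
            # Levy flight from nest i, clipped to the site boundaries.
            candidate = [
                max(lo, min(hi, x + alpha * (hi - lo) * levy_step(rng)))
                for x, (lo, hi) in zip(nests[i], bounds)
            ]
            candidate_score = objective(candidate)
            j = rng.randrange(n_nests)  # compare against a randomly chosen nest
            if candidate_score < scores[j]:
                nests[j], scores[j] = candidate, candidate_score
        # Abandon the worst fraction pa of nests and rebuild them at random.
        ranked = sorted(range(n_nests), key=lambda k: scores[k])
        for k in ranked[n_nests - int(pa * n_nests):]:
            nests[k] = random_nest()
            scores[k] = objective(nests[k])
        history.append(min(scores))
    best = scores.index(min(scores))
    return nests[best], scores[best], history

position, score, history = cuckoo_search(net_cost, [(0.0, 10.0), (0.0, 10.0)])
```

The heavy-tailed Lévy steps mix many small local moves with occasional long jumps, and abandoning the worst nests keeps injecting fresh random placements. A realistic version would replace `net_cost` with a site model that accounts for wind, sunlight, and the fees of every component.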