Thousands of animals live in shelter homes, and many more are on the streets. To reduce cruelty toward and euthanization of these animals, we need to increase adoption rates. Animals with cute pictures are more likely to be adopted, so shelters need a way to estimate and improve the “cuteness” of photos of these animals to get them adopted faster. The goal of our project is to use machine learning to make accurate predictions of “cuteness” and thereby help increase adoption rates from shelters.
For CS7641, our project aims to estimate the cuteness/popularity of images of shelter animals. This is an open Kaggle challenge. The dataset contains raw images of shelter animals along with metadata, which consists of a set of binary features such as the presence of eyes, a visible face, etc.
In this project, we use both supervised and unsupervised learning to estimate the popularity/cuteness of images. In particular, we use representation learning to learn features from the raw images, along with PCA to select prominent features from the metadata. Finally, we plan to demonstrate the effectiveness of our solution by plotting training and validation loss and performing an ablation study.
For each image in the training set, we also have metadata available. The metadata contains the following binary features: Focus, Eyes, Face, Near, Action, Accessory, Group, Collage, Human, Occlusion, Info, Blur.
We have a training set of close to \(10,000\) RGB images.
Each image has a Pawpularity score, as shown in [Fig 1].
The metadata was split 80-20 for training and testing. All results reported below are for the test set.
Without PCA: We first ran linear regression on the metadata. However, the \(R^2\) score of the regressor turned out to be very poor, at only \(0.003\), meaning that the variation in the input features barely explained the variation in the target. The RMSE was \(20.4944\).
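To make this baseline concrete, the sketch below shows how such a linear regression on the metadata could be set up with scikit-learn. The file name, column layout, and random seed are illustrative assumptions based on the Kaggle metadata CSV, not our exact code.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Assumed layout: one row per image, binary metadata columns, and a Pawpularity target.
df = pd.read_csv("train.csv")
X = df.drop(columns=["Id", "Pawpularity"])  # the 12 binary metadata features
y = df["Pawpularity"]

# 80-20 split; all metrics are computed on the held-out 20%.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

reg = LinearRegression().fit(X_train, y_train)
pred = reg.predict(X_test)
print("R^2 :", r2_score(y_test, pred))
print("RMSE:", mean_squared_error(y_test, pred) ** 0.5)
```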
With PCA [Unsupervised Learning]: We next ran PCA on the metadata, retaining \(90\%\) of the variance in the data, and then regressed the transformed features against the target variable. This reduced the \(R^2\) of the model to an even lower value of \(0.0001\).
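The PCA variant can be sketched the same way. Passing `n_components=0.90` asks scikit-learn to keep the smallest number of components that explain 90% of the variance; the data loading and split mirror the assumptions in the previous snippet.

```python
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

df = pd.read_csv("train.csv")  # assumed file/column names as above
X, y = df.drop(columns=["Id", "Pawpularity"]), df["Pawpularity"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# PCA keeping 90% of the variance, followed by the same linear regressor.
pca_reg = make_pipeline(PCA(n_components=0.90), LinearRegression())
pca_reg.fit(X_train, y_train)
print("R^2 with PCA:", r2_score(y_test, pca_reg.predict(X_test)))
```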
While manually inspecting the data, we found that the dataset contains a lot of noise. Looking at individual photos, we noticed that the popularity score did not always match the cuteness/quality of the animal. In addition, we noticed several duplicate images with different popularity scores in the dataset.
Given this newfound knowledge, we wrote a small script to find duplicate images via the cosine similarity between pairs of images. We flatten each image into a vector and compute the similarity between two images as \(\frac{a \cdot b}{\lVert a \rVert\,\lVert b \rVert}\); a sketch of this script is shown below.
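A minimal sketch of such a duplicate-detection script follows. Resizing every image to a common shape before flattening, the folder name, and the 0.99 similarity threshold are illustrative assumptions, not the exact values we used.

```python
import numpy as np
from pathlib import Path
from PIL import Image

def flatten_image(path, size=(64, 64)):
    """Load an image, resize it to a common shape (assumption), and flatten to a 1-D vector."""
    img = Image.open(path).convert("RGB").resize(size)
    return np.asarray(img, dtype=np.float32).ravel()

def cosine_similarity(a, b):
    """cos(a, b) = a.b / (|a| |b|)"""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical folder layout; pairs above the threshold are flagged as likely duplicates.
paths = sorted(Path("train_images").glob("*.jpg"))
vectors = [flatten_image(p) for p in paths]

THRESHOLD = 0.99
for i in range(len(vectors)):
    for j in range(i + 1, len(vectors)):
        if cosine_similarity(vectors[i], vectors[j]) > THRESHOLD:
            print("possible duplicate:", paths[i].name, paths[j].name)
```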
The images below show a sample of the duplicate images with contradictory Pawpularity scores that we were able to find.
Based on this, we excluded these duplicate images from our training and test sets.
We used deep neural networks to perform regression on the image data. The final model structure is shown below:
[Fig 5] Training loss over epochs
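Since the exact layer configuration is given in the figure above, the following is only a minimal sketch of an image-regression setup of this kind, assuming an ImageNet-pretrained ResNet-18 backbone with a small fully connected regression head. The backbone choice, head sizes, and hyperparameters are our illustrative assumptions, not the reported architecture.

```python
import torch
import torch.nn as nn
from torchvision import models

class PawpularityRegressor(nn.Module):
    """Hypothetical CNN regressor: pretrained backbone + small regression head."""
    def __init__(self):
        super().__init__()
        self.backbone = models.resnet18(weights="IMAGENET1K_V1")
        self.backbone.fc = nn.Identity()          # drop the classification head
        self.head = nn.Sequential(
            nn.Linear(512, 128), nn.ReLU(),
            nn.Dropout(0.2),
            nn.Linear(128, 1),                    # single regression output
        )

    def forward(self, x):
        return self.head(self.backbone(x)).squeeze(1)

model = PawpularityRegressor()
criterion = nn.MSELoss()                          # RMSE is the square root of this at eval time
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

# One training step on a dummy batch of 224x224 RGB images.
images = torch.randn(4, 3, 224, 224)
scores = torch.rand(4) * 100                      # Pawpularity scores lie roughly on a 0-100 scale
loss = criterion(model(images), scores)
loss.backward()
optimizer.step()
```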