Introduction

Food is any substance that serves as a source of nutrition for a living organism. As omnivores and apex predators, humans eat foods of all shapes, sizes, textures, and flavors [1]. Individual preferences for which foods to eat also vary widely. Still, some foods are popular almost everywhere in the world: few people would pass up a sweet scoop of ice cream or a juicy, well-seasoned hot dog.

Humans are generally quick to identify and fixate on their favorite foods within their visual field. The Matthijs/snacks dataset is a collection of images of popular food items derived from the Google Open Images dataset, and it is used in textbooks such as Machine Learning by Tutorials [2, 4]. Given this dataset, a human can effortlessly tell which food is which. However, many visually impaired people may struggle with the crucial task of identifying what they are eating. A program that identifies everyday foods, such as apples or cookies, could prove useful for such individuals.

Problem Definition

Identifying food items purely from visual appearance is surprisingly non-trivial. Here are examples of “apples” in the snacks dataset:

Even within a single label, “apples” come in different forms, colors, and counts. Our objective is to build a machine learning model that accurately classifies the main food or snack shown in an input image while generalizing across the many appearances that a single item can take.

Methods

For the unsupervised component, we propose to use Principal Component Analysis (PCA) to reduce the dimensionality of the image data. PCA projects the data onto the directions of greatest variance, allowing downstream models to focus on the most informative features. We will then feed these reduced representations to a deep learning model. Because we are performing image classification, we propose a Convolutional Neural Network (CNN) as the classifier. CNNs excel on images because their convolutional layers share learned filters across spatial positions, enabling efficient feature extraction and pattern recognition. To mitigate the vanishing gradient problem in deep neural networks, we will use a Residual Network (ResNet) [3]. ResNets add skip connections that let gradients flow directly through identity paths, which makes much deeper models trainable.
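The two building blocks in this paragraph can be sketched in NumPy. This is an illustrative sketch, not our pipeline: it assumes images have already been resized to a common shape, the function names (`pca_fit`, `pca_transform`, `conv3x3`, `residual_block`) are hypothetical, and the data is random stand-in data. PCA is computed from the SVD of the mean-centered data matrix, and the residual block computes y = ReLU(F(x) + x), where F is two 3x3 convolutions, in the spirit of the basic block of [3] (batch normalization omitted for brevity).

```python
import numpy as np


def pca_fit(X, n_components):
    """Fit PCA to X (n_samples x n_features) using the SVD of the centered data."""
    mean = X.mean(axis=0)
    # Rows of Vt are the principal directions, ordered by decreasing singular value.
    _, S, Vt = np.linalg.svd(X - mean, full_matrices=False)
    explained_var = S**2 / (X.shape[0] - 1)
    ratio = explained_var / explained_var.sum()
    return mean, Vt[:n_components], ratio[:n_components]


def pca_transform(X, mean, components):
    """Project centered data onto the retained principal directions."""
    return (X - mean) @ components.T


def conv3x3(x, w):
    """Zero-padded ('same') 3x3 convolution, computed as cross-correlation like CNN libraries do.

    x has shape (C_in, H, W); w has shape (C_out, C_in, 3, 3); output is (C_out, H, W).
    """
    _, h, wd = x.shape
    xp = np.pad(x, ((0, 0), (1, 1), (1, 1)))
    out = np.zeros((w.shape[0], h, wd))
    for i in range(3):
        for j in range(3):
            # Apply kernel tap (i, j) to a shifted window, summing over input channels.
            out += np.einsum("oc,chw->ohw", w[:, :, i, j], xp[:, i:i + h, j:j + wd])
    return out


def relu(z):
    return np.maximum(z, 0.0)


def residual_block(x, w1, w2):
    """Basic residual block: y = ReLU(F(x) + x), with F = conv -> ReLU -> conv.

    The identity shortcut requires F(x) to have the same shape as x.
    """
    f = conv3x3(relu(conv3x3(x, w1)), w2)
    return relu(f + x)


rng = np.random.default_rng(0)

# Stand-in data: 100 random 32x32 RGB "images", flattened to 3072 features each.
X = rng.random((100, 32 * 32 * 3))
mean, comps, ratio = pca_fit(X, n_components=50)
Z = pca_transform(X, mean, comps)
print(Z.shape)  # (100, 50)

# One 3-channel feature map passed through a residual block.
x = rng.standard_normal((3, 32, 32))
w1 = 0.1 * rng.standard_normal((3, 3, 3, 3))
w2 = np.zeros((3, 3, 3, 3))
y = residual_block(x, w1, w2)
# With the last convolution zeroed, F(x) = 0 and the block reduces to ReLU(x).
print(np.allclose(y, relu(x)))  # True
```

The sketch keeps the two steps separate: PCA returns flat feature vectors, whereas the convolutional block operates on (channels, height, width) arrays. The zeroed-weight case illustrates why skip connections help: a block can fall back to (approximately) passing its input through unchanged.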

Potential Results & Conclusion

The expected outcome is an efficient and accurate machine learning model that predicts a variety of snack items. The model could then be applied in several ways, such as in an automatic food-tracking application. To evaluate our model, we will focus primarily on F1 score and accuracy. The F1 score, the harmonic mean of precision and recall, reflects how the model handles both false positives and false negatives, while accuracy is the fraction of input images that the model classifies correctly. We will also use cross-entropy loss to measure the performance of our model during training.
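The three metrics named above can be sketched in NumPy, assuming integer class labels and a matrix of predicted class probabilities (one row per image). Because the snacks dataset has many classes, F1 has to be averaged across classes; this sketch uses macro averaging (the unweighted mean of per-class F1), which is one common choice and an assumption here rather than a decision recorded in this proposal. The function names and the toy data are hypothetical.

```python
import numpy as np


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return float(np.mean(y_true == y_pred))


def macro_f1(y_true, y_pred, num_classes):
    """Unweighted mean over classes of F1 = 2PR / (P + R)."""
    scores = []
    for c in range(num_classes):
        tp = np.sum((y_pred == c) & (y_true == c))
        fp = np.sum((y_pred == c) & (y_true != c))
        fn = np.sum((y_pred != c) & (y_true == c))
        # Equivalent form 2TP / (2TP + FP + FN); F1 is taken as 0 when undefined.
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom > 0 else 0.0)
    return float(np.mean(scores))


def cross_entropy(probs, y_true, eps=1e-12):
    """Mean negative log-probability assigned to the true class."""
    p = probs[np.arange(len(y_true)), y_true]
    # Clip to avoid log(0) for confidently wrong predictions.
    return float(-np.mean(np.log(np.clip(p, eps, 1.0))))


# Toy example: 6 images, 3 classes.
y_true = np.array([0, 0, 1, 1, 2, 2])
probs = np.array([
    [0.8, 0.1, 0.1],
    [0.3, 0.6, 0.1],
    [0.2, 0.7, 0.1],
    [0.1, 0.8, 0.1],
    [0.1, 0.2, 0.7],
    [0.6, 0.2, 0.2],
])
y_pred = probs.argmax(axis=1)  # [0, 1, 1, 1, 2, 0]

print(round(accuracy(y_true, y_pred), 3))      # 0.667
print(round(macro_f1(y_true, y_pred, 3), 3))   # 0.656
print(round(cross_entropy(probs, y_true), 3))  # 0.662
```

For single-label multi-class problems, micro-averaged F1 equals accuracy, which is why a macro (or class-weighted) average is the more informative companion to accuracy. Macro averaging gives every snack class equal weight regardless of how many images it has.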

Checkpoints

  1. Midterm Report Checkpoint (November 10th): By this date, we will have determined whether our project is a viable ML project. If required, we will switch to a different dataset.

  2. Final Report Checkpoint (December 1st): By this date, the goal is to have all models finalized and working.

Timeline Chart


Contributions for Project Proposal

Member            Contributions
Alex Tian         Slides, Video Recording
Alwin Jin         Methods, Results & Discussion
Daniel You        Slides, Video Recording
Richard So        GitHub Page, Intro & Background, Problem Definition
Varshini Chinta   Project Timeline, Contribution Table, Checkpoint

References

[1] Roopnarine, P. D. (2014). Humans are apex predators. Proceedings of the National Academy of Sciences, 111(9), E796.

[2] Matthijs/snacks dataset. Hugging Face Datasets. https://huggingface.co/datasets/Matthijs/snacks

[3] He, K., Zhang, X., Ren, S., & Sun, J. (2016). Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 770-778).

[4] Kuznetsova, A., Rom, H., Alldrin, N., Uijlings, J., Krasin, I., Pont-Tuset, J., Kamali, S., Popov, S., Malloci, M., Kolesnikov, A., Duerig, T., & Ferrari, V. (2020). The Open Images Dataset V4: Unified image classification, object detection, and visual relationship detection at scale. International Journal of Computer Vision.