Training your own MnasNet (or not)

Motivation

We tried implementing the MnasNet architecture from this paper from scratch, and failed to achieve the claimed accuracy with the methods we used. Perhaps this is the reason why in Cadene's repository NASNet-Mobile is just ported from TF? Well, it is good to know that training networks on ImageNet is not as easy as it sounds.

Even though we failed to achieve the claimed 75–76% top-1 accuracy with MnasNet, we believe the issue most likely lies with the way it is supposed (?) to be trained. In our experience with various training regimes, the networks just seemed to plateau at around 35–40% top-1 accuracy. Maybe we should also have waited longer than 15–30 ImageNet epochs to confirm that training had stalled, but who knows.

The thing is, Google claims in its paper to have used:

  • RMSprop as the optimizer;

But I have never seen anyone use RMSprop with success. Also, talking to a friend who successfully trained MobileNet v1 and v2, what he mostly said was:

  • Adam and SGD both work with some form of learning-rate decay (see the sketch below);

See his training log for yourself:
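
For reference, this is roughly how both recipes look in PyTorch. This is a minimal sketch: the model is a stand-in, and every hyperparameter below is a placeholder assumption, not a value from the paper, from his logs, or from our train.py.

import torch
from torch import optim

model = torch.nn.Linear(10, 10)  # stand-in for the real MnasNet model

# What the paper claims to have used: RMSprop
# (lr / momentum / weight decay here are guesses)
rmsprop_opt = optim.RMSprop(model.parameters(), lr=1e-2,
                            momentum=0.9, weight_decay=1e-5)

# What worked for my friend: Adam (or SGD) plus some form of decay,
# e.g. multiplying the learning rate by 0.1 every 30 epochs
adam_opt = optim.Adam(model.parameters(), lr=1e-3)
scheduler = optim.lr_scheduler.StepLR(adam_opt, step_size=30, gamma=0.1)

for epoch in range(90):
    # ... run one training epoch with adam_opt here ...
    scheduler.step()  # apply the learning-rate decay schedule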

What we tried and learned

What we learned:

  • Training heavily optimized networks on ImageNet from scratch is trickier than it sounds;

Some of our best runs

Note that in this case 1 epoch = 0.2 epochs on full ImageNet.
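
If you want to replicate this (it corresponds to the --epoch_fraction 0.2 flag in the launch code below), here is a minimal sketch of one way to do it in PyTorch; the helper is my illustration, not necessarily how our train.py implements it.

import random
from torch.utils.data import DataLoader
from torch.utils.data.sampler import SubsetRandomSampler

def fractional_loader(dataset, fraction=0.2, batch_size=200, workers=6):
    # One "epoch" over this loader touches only `fraction` of the
    # dataset, drawn at random, i.e. 0.2 epochs of full ImageNet
    n = int(len(dataset) * fraction)
    indices = random.sample(range(len(dataset)), n)
    sampler = SubsetRandomSampler(indices)
    return DataLoader(dataset, batch_size=batch_size,
                      sampler=sampler, num_workers=workers)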

What to watch out for

The paper was a bit vague on:

  • Which activations to use;

Reuse our code!

This code may shorten your path if you would like to train this network from scratch. You may reuse two main things:

If you would like to use the dataset, you will need a pre-processed dataframe with the following columns:

  • class - ImageNet class label;

Clusters I used:

self.cluster_dict = {0: (384, 512), 1: (512, 512), 2: (512, 384)}

Clusters can be used to train with rectangular crops instead of square ones; a sketch of the idea follows below.
Also, obviously, you will need the ImageNet dataset itself.
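
To make the idea concrete, here is a minimal sketch of how such clusters can drive rectangular resizing. Assigning each image to the cluster with the closest aspect ratio is my reading of the intent, and treating the tuples as (width, height) is an assumption; this is not a copy of our dataset class.

from PIL import Image

# Target sizes per cluster, assumed to be (width, height)
cluster_dict = {0: (384, 512), 1: (512, 512), 2: (512, 384)}

def nearest_cluster(width, height):
    # Pick the cluster whose aspect ratio best matches the image
    ratio = width / height
    return min(cluster_dict,
               key=lambda c: abs(cluster_dict[c][0] / cluster_dict[c][1] - ratio))

def load_resized(path):
    img = Image.open(path).convert('RGB')
    cluster = nearest_cluster(*img.size)  # PIL's .size is (width, height)
    return img.resize(cluster_dict[cluster], Image.BILINEAR)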

If you need our imnet_cluster_df_short.feather, you can just use this file.
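
Reading it back is a one-liner with pandas (you will need pyarrow or feather-format installed); the class column is the one described above.

import pandas as pd

df = pd.read_feather('imnet_cluster_df_short.feather')
print(df['class'].nunique())  # should cover the 1000 ImageNet classes
print(df.head())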

Typical launch code

I will not go into detail about building your own environment (please reach out if you need the details or follow this link), but mostly we used:

  • PyTorch 0.4;

Typically I launch my networks with code like this:

CUDA_VISIBLE_DEVICES=0,2 python3 train.py \
  --epochs 1000 --epoch_fraction 0.2 \
  --batch-size 200 --workers 6 \
  --dataset imagenet \
  --size_ratio 0.5 --preprocessing_type 5 \
  --fold 0 --arch mnasnet --multi_class False \
  --num_classes 1000 --classifier_config 512 \
  --lr 1e-3 --optimizer adam --lr_regime auto_decay \
  --epochs_grow_size 0 \
  --lognumber mnasnet_standard_adam_1e3_512_cut_last_nosampler \
  --tensorboard True --tensorboard_images True --print-freq 5

Final remarks

Probably choosing a highly optimized network designed via some form of RL was not the best idea for a small side project, but it was valuable experience.

Nevertheless, if I were picking this repo as a starting point for MnasNet, I would:

  • Review the code first;

Originally published at spark-in.me on September 13, 2018.
