June Feather

Training with the Official Caffe Demo: CIFAR-10

This follows the simplest CIFAR-10 example in the Caffe repository.

Procedure

Enter the Caffe root directory, then download the dataset and create the databases:

cd $CAFFE_ROOT
./data/cifar10/get_cifar10.sh
./examples/cifar10/create_cifar10.sh

Training

cd $CAFFE_ROOT
./examples/cifar10/train_quick.sh

When training completes, the log ends like this:

I0802 03:32:39.226095 42789 solver.cpp:243] Iteration 4900, loss = 0.416568
I0802 03:32:39.226122 42789 solver.cpp:259]     Train net output #0: loss = 0.416568 (* 1 = 0.416568 loss)
I0802 03:32:39.226132 42789 sgd_solver.cpp:138] Iteration 4900, lr = 0.0001
I0802 03:32:41.953788 42789 solver.cpp:606] Snapshotting to HDF5 file examples/cifar10/cifar10_quick_iter_5000.caffemodel.h5
I0802 03:32:41.966697 42789 sgd_solver.cpp:317] Snapshotting solver state to HDF5 file examples/cifar10/cifar10_quick_iter_5000.solverstate.h5
I0802 03:32:41.977524 42789 solver.cpp:332] Iteration 5000, loss = 0.544424
I0802 03:32:41.977547 42789 solver.cpp:358] Iteration 5000, Testing net (#0)
I0802 03:32:42.957173 42789 solver.cpp:425]     Test net output #0: accuracy = 0.7536
I0802 03:32:42.957247 42789 solver.cpp:425]     Test net output #1: loss = 0.736249 (* 1 = 0.736249 loss)
I0802 03:32:42.957262 42789 solver.cpp:337] Optimization Done.
I0802 03:32:42.957274 42789 caffe.cpp:254] Optimization Done.

As the log shows, test-set accuracy is about 75% (0.7536).
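
To re-check that number after training, the test phase can be run on its own with the caffe binary's test action. A minimal sketch; the weights path matches the HDF5 snapshot in the log above:

./build/tools/caffe test \
  --model=examples/cifar10/cifar10_quick_train_test.prototxt \
  --weights=examples/cifar10/cifar10_quick_iter_5000.caffemodel.h5 \
  --iterations=100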

How It Works
Now that the demo has run, let's look at how Caffe actually trains and tests.

Building the dataset: ./examples/cifar10/create_cifar10.sh
Caffe training normally uses files in LMDB format.

LMDB is a key-value database; each record here maps a key to a serialized image (a read-back sketch follows the converter source below).
Let's look at the demo's conversion process:

#!/usr/bin/env sh
# This script converts the cifar data into lmdb format.
set -e

EXAMPLE=examples/cifar10
DATA=data/cifar10
DBTYPE=lmdb

echo "Creating $DBTYPE..."

rm -rf $EXAMPLE/cifar10_train_$DBTYPE $EXAMPLE/cifar10_test_$DBTYPE

./build/examples/cifar10/convert_cifar_data.bin $DATA $EXAMPLE $DBTYPE

echo "Computing image mean..."

./build/tools/compute_image_mean -backend=$DBTYPE \
  $EXAMPLE/cifar10_train_$DBTYPE $EXAMPLE/mean.binaryproto

echo "Done."

The conversion program

Analysis

The program performs the steps outlined below; its core is the convert_dataset function.
First, a few constants:

const int kCIFARSize = 32;           // images are 32x32 pixels
const int kCIFARImageNBytes = 3072;  // 32 * 32 * 3 channels
const int kCIFARBatchSize = 10000;   // images per binary batch file
const int kCIFARTrainBatches = 5;    // data_batch_1.bin .. data_batch_5.bin

Reading one record: each CIFAR-10 record is a single label byte followed by 3072 pixel bytes (1024 red values, then 1024 green, then 1024 blue):

void read_image(std::ifstream* file, int* label, char* buffer) {
  char label_char;
  file->read(&label_char, 1);             // first byte is the class label
  *label = label_char;
  file->read(buffer, kCIFARImageNBytes);  // remaining 3072 bytes are pixels
  return;
}


The core conversion function:

void convert_dataset(const string& input_folder, const string& output_folder,
    const string& db_type) {

Open a new LMDB database and start a transaction:

scoped_ptr<db::DB> train_db(db::GetDB(db_type));
  train_db->Open(output_folder + "/cifar10_train_" + db_type, db::NEW);
  scoped_ptr<db::Transaction> txn(train_db->NewTransaction());

Create a Datum to hold each image: height and width are both kCIFARSize, with 3 channels (RGB):

  // Data buffer
  int label;
  char str_buffer[kCIFARImageNBytes];
  Datum datum;
  datum.set_channels(3);
  datum.set_height(kCIFARSize);
  datum.set_width(kCIFARSize);

The code below does the following, in order:
load the raw training data, which is split into 5 batch files (kCIFARTrainBatches)

for each batch file, loop over its images:
read one image
fill in the Datum
Put the serialized record into the transaction (txn), keyed by the global image index zero-padded to 5 digits
commit the changes
close the database

LOG(INFO) << "Writing Training data"; for (int fileid = 0; fileid < kCIFARTrainBatches; ++fileid) { // Open files LOG(INFO) << "Training Batch " << fileid + 1; string batchFileName = input_folder + "/data_batch_" + caffe::format_int(fileid+1) + ".bin"; std::ifstream data_file(batchFileName.c_str(), std::ios::in | std::ios::binary); CHECK(data_file) << "Unable to open train file #" << fileid + 1; for (int itemid = 0; itemid < kCIFARBatchSize; ++itemid) { read_image(&data_file, &label, str_buffer); datum.set_label(label); datum.set_data(str_buffer, kCIFARImageNBytes); string out; CHECK(datum.SerializeToString(&out)); txn->Put(caffe::format_int(fileid * kCIFARBatchSize + itemid, 5), out);
}
}
txn->Commit();
train_db->Close();

The test set goes through the same flow:

  LOG(INFO) << "Writing Testing data";
  scoped_ptr<db::DB> test_db(db::GetDB(db_type));
  test_db->Open(output_folder + "/cifar10_test_" + db_type, db::NEW);
  txn.reset(test_db->NewTransaction());
  // Open files
  std::ifstream data_file((input_folder + "/test_batch.bin").c_str(),
      std::ios::in | std::ios::binary);
  CHECK(data_file) << "Unable to open test file.";
  for (int itemid = 0; itemid < kCIFARBatchSize; ++itemid) {
    read_image(&data_file, &label, str_buffer);
    datum.set_label(label);
    datum.set_data(str_buffer, kCIFARImageNBytes);
    string out;
    CHECK(datum.SerializeToString(&out));
    txn->Put(caffe::format_int(itemid, 5), out);
  }
  txn->Commit();
  test_db->Close();
}

int main(int argc, char** argv) {
  FLAGS_alsologtostderr = 1;

  if (argc != 4) {
    printf("This script converts the CIFAR dataset to the leveldb format used\n"
           "by caffe to perform classification.\n"
           "Usage:\n"
           "    convert_cifar_data input_folder output_folder db_type\n"
           "Where the input folder should contain the binary batch files.\n"
           "The CIFAR dataset could be downloaded at\n"
           "    http://www.cs.toronto.edu/~kriz/cifar.html\n"
           "You should gunzip them after downloading.\n");
  } else {
    google::InitGoogleLogging(argv[0]);
    convert_dataset(string(argv[1]), string(argv[2]), string(argv[3]));
  }
  return 0;
}
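
With the databases written, we can check the key-value layout by reading a few records back. Here is a minimal sketch (my addition, not part of the demo) using the same Caffe db wrapper and scoped_ptr API as above; the path assumes the demo's default output location:

// Iterate over the LMDB and print each record's key plus the
// deserialized Datum's label and data size.
#include <iostream>
#include <string>

#include "boost/scoped_ptr.hpp"
#include "caffe/proto/caffe.pb.h"
#include "caffe/util/db.hpp"

using boost::scoped_ptr;
using caffe::Datum;
namespace db = caffe::db;

int main() {
  scoped_ptr<db::DB> database(db::GetDB("lmdb"));
  database->Open("examples/cifar10/cifar10_train_lmdb", db::READ);
  scoped_ptr<db::Cursor> cursor(database->NewCursor());
  int shown = 0;
  // Keys were written as zero-padded indices, so iteration follows
  // insertion order: "00000", "00001", ...
  for (cursor->SeekToFirst(); cursor->valid() && shown < 5;
       cursor->Next(), ++shown) {
    Datum datum;
    datum.ParseFromString(cursor->value());  // value is a serialized Datum
    std::cout << "key=" << cursor->key()
              << " label=" << datum.label()
              << " pixel bytes=" << datum.data().size() << std::endl;
  }
  database->Close();
  return 0;
}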

Training process

The training script runs in two stages (sketched below):

1. call caffe train with cifar10_quick_solver.prototxt
2. call caffe train with cifar10_quick_solver_lr1.prototxt, resuming from the solverstate saved by the first stage
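
For reference, train_quick.sh boils down to the two invocations below (a sketch; exact paths and flags may vary slightly across Caffe versions):

#!/usr/bin/env sh
set -e

TOOLS=./build/tools

# stage 1: 4000 iterations at base_lr 0.001
$TOOLS/caffe train \
  --solver=examples/cifar10/cifar10_quick_solver.prototxt

# stage 2: resume from the stage-1 snapshot, continue at lr 0.0001
$TOOLS/caffe train \
  --solver=examples/cifar10/cifar10_quick_solver_lr1.prototxt \
  --snapshot=examples/cifar10/cifar10_quick_iter_4000.solverstate

The first solver file, cifar10_quick_solver.prototxt: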

# reduce the learning rate after 8 epochs (4000 iters) by a factor of 10

# The train/test net protocol buffer definition
net: "examples/cifar10/cifar10_quick_train_test.prototxt"
# test_iter specifies how many forward passes the test should carry out.
# In the case of MNIST, we have test batch size 100 and 100 test iterations,
# covering the full 10,000 testing images.
test_iter: 100
# Carry out testing every 500 training iterations.
test_interval: 500
# The base learning rate, momentum and the weight decay of the network.
base_lr: 0.001
momentum: 0.9
weight_decay: 0.004
# The learning rate policy
lr_policy: "fixed"
# Display every 100 iterations
display: 100
# The maximum number of iterations
max_iter: 4000
# snapshot intermediate results
snapshot: 4000
snapshot_prefix: "examples/cifar10/cifar10_quick"
# solver mode: CPU or GPU
solver_mode: GPU

The second solver file, cifar10_quick_solver_lr1.prototxt:

# reduce the learning rate after 8 epochs (4000 iters) by a factor of 10

# The train/test net protocol buffer definition
net: "examples/cifar10/cifar10_quick_train_test.prototxt"
# test_iter specifies how many forward passes the test should carry out.
# In the case of MNIST, we have test batch size 100 and 100 test iterations,
# covering the full 10,000 testing images.
test_iter: 100
# Carry out testing every 500 training iterations.
test_interval: 500
# The base learning rate, momentum and the weight decay of the network.
base_lr: 0.0001
momentum: 0.9
weight_decay: 0.004
# The learning rate policy
lr_policy: "fixed"
# Display every 100 iterations
display: 100
# The maximum number of iterations
max_iter: 5000
# snapshot intermediate results
snapshot: 5000
snapshot_format: HDF5
snapshot_prefix: "examples/cifar10/cifar10_quick"
# solver mode: CPU or GPU
solver_mode: GPU

The two files differ in four places:
base_lr: 0.001 -> 0.0001
max_iter: 4000 -> 5000
snapshot: 4000 -> 5000
snapshot_format: HDF5 is added; it makes the solver write snapshots as HDF5 files instead of the default binary protobuf, which is why the final snapshot in the log is cifar10_quick_iter_5000.caffemodel.h5.

So the quick training first runs 4000 iterations at a learning rate of 0.001, then continues to iteration 5000 (1000 more iterations) at 0.0001. With a training batch size of 100, the first 4000 iterations cover 400,000 images, i.e. 8 passes over the 50,000 training images (hence the "8 epochs" comment in the solver files).
That completes the training demo.

Appendix: the official documentation
Alex’s CIFAR-10 tutorial, Caffe style
Alex Krizhevsky’s cuda-convnet details the model definitions, parameters, and training procedure for good performance on CIFAR-10. This example reproduces his results in Caffe.

We will assume that you have Caffe successfully compiled. If not, please refer to the Installation page. In this tutorial, we will assume that your caffe installation is located at CAFFE_ROOT.

We thank @chyojn for the pull request that defined the model schemas and solver configurations.

This example is a work-in-progress. It would be nice to further explain details of the network and training choices and benchmark the full training.

Prepare the Dataset
You will first need to download and convert the data format from the CIFAR-10 website. To do this, simply run the following commands:

cd $CAFFE_ROOT
./data/cifar10/get_cifar10.sh
./examples/cifar10/create_cifar10.sh
If it complains that wget or gunzip are not installed, you need to install them respectively. After running the script there should be the dataset, ./cifar10-leveldb, and the data set image mean ./mean.binaryproto.

The Model
The CIFAR-10 model is a CNN that composes layers of convolution, pooling, rectified linear unit (ReLU) nonlinearities, and local contrast normalization with a linear classifier on top of it all. We have defined the model in the CAFFE_ROOT/examples/cifar10 directory’s cifar10_quick_train_test.prototxt.

Training and Testing the “Quick” Model
Training the model is simple after you have written the network definition protobuf and solver protobuf files (refer to MNIST Tutorial). Simply run train_quick.sh, or the following command directly:

cd $CAFFE_ROOT
./examples/cifar10/train_quick.sh
train_quick.sh is a simple script, so have a look inside. The main tool for training is caffe with the train action, and the solver protobuf text file as its argument.

When you run the code, you will see a lot of messages flying by like this:

I0317 21:52:48.945710 2008298256 net.cpp:74] Creating Layer conv1
I0317 21:52:48.945716 2008298256 net.cpp:84] conv1 <- data
I0317 21:52:48.945725 2008298256 net.cpp:110] conv1 -> conv1
I0317 21:52:49.298691 2008298256 net.cpp:125] Top shape: 100 32 32 32 (3276800)
I0317 21:52:49.298719 2008298256 net.cpp:151] conv1 needs backward computation.
These messages tell you the details about each layer, its connections and its output shape, which may be helpful in debugging. After the initialization, the training will start:

I0317 21:52:49.309370 2008298256 net.cpp:166] Network initialization done.
I0317 21:52:49.309376 2008298256 net.cpp:167] Memory required for Data 23790808
I0317 21:52:49.309422 2008298256 solver.cpp:36] Solver scaffolding done.
I0317 21:52:49.309447 2008298256 solver.cpp:47] Solving CIFAR10_quick_train
Based on the solver setting, we will print the training loss function every 100 iterations, and test the network every 500 iterations. You will see messages like this:

I0317 21:53:12.179772 2008298256 solver.cpp:208] Iteration 100, lr = 0.001
I0317 21:53:12.185698 2008298256 solver.cpp:65] Iteration 100, loss = 1.73643

I0317 21:54:41.150030 2008298256 solver.cpp:87] Iteration 500, Testing net
I0317 21:54:47.129461 2008298256 solver.cpp:114] Test score #0: 0.5504
I0317 21:54:47.129500 2008298256 solver.cpp:114] Test score #1: 1.27805
For each training iteration, lr is the learning rate of that iteration, and loss is the training function. For the output of the testing phase, score 0 is the accuracy, and score 1 is the testing loss function.

And after making yourself a cup of coffee, you are done!

I0317 22:12:19.666914 2008298256 solver.cpp:87] Iteration 5000, Testing net
I0317 22:12:25.580330 2008298256 solver.cpp:114] Test score #0: 0.7533
I0317 22:12:25.580379 2008298256 solver.cpp:114] Test score #1: 0.739837
I0317 22:12:25.587262 2008298256 solver.cpp:130] Snapshotting to cifar10_quick_iter_5000
I0317 22:12:25.590215 2008298256 solver.cpp:137] Snapshotting solver state to cifar10_quick_iter_5000.solverstate
I0317 22:12:25.592813 2008298256 solver.cpp:81] Optimization Done.
Our model achieved ~75% test accuracy. The model parameters are stored in binary protobuf format in

cifar10_quick_iter_5000
which is ready-to-deploy in CPU or GPU mode! Refer to the CAFFE_ROOT/examples/cifar10/cifar10_quick.prototxt for the deployment model definition that can be called on new data.

Why train on a GPU?
CIFAR-10, while still small, has enough data to make GPU training attractive.

To compare CPU vs. GPU training speed, simply change one line in all the cifar*solver.prototxt:

# solver mode: CPU or GPU
solver_mode: CPU
and you will be using CPU for training.
