78 Proceedings of the Princess Maha Chakri Sirindhorn Congress

borrowing biological concepts to model how humans perceive objects, and system architecture from engineering aspects. Given the strong performance of deep learning based methods and their promising potential in the fashion industry, the Rapid-Rich Object Search (ROSE) Lab has dedicated resources to fashion-related product search problems such as handbag recognition, clothing retrieval and shoe tagging.

II. Hand-Crafted Features vs Deep Learning Features

The traditional pipeline for fashion product search uses hand-crafted features such as HOG and SIFT to represent the characteristics of objects. Although these features have demonstrated strong representation capability on a variety of visual search tasks, they are still unable to capture high-level concepts of objects. To address this problem, researchers have in recent years proposed neural network based deep learning architectures known as Convolutional Neural Networks (CNNs). The classical architecture [10] contains eight layers: five convolutional and three fully-connected layers. The input image enters on the left and, after passing through all eight layers, is assigned by the network to one of 1000 categories, following the ILSVRC settings. We adapt the number of categories to the specific visual search task at hand, so that the architecture fits the different fashion product visual search tasks.

III. Deep Learning Resources

There are many existing tools that facilitate GPU-accelerated deep learning, including Caffe [6], Cuda-convnet [12], MatConvNet [7] and Nvidia Digits [8]. Caffe is a C++ deep learning framework that supports custom network definitions using Google Protocol Buffers. Cuda-convnet is a C++-based project that allows deep neural network architectures to be defined in configuration files.
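
Independently of the tool used to define it, the eight-layer architecture of Section II can be written down as a short list of layers. The following is a minimal sketch in plain Python, not the configuration used in our experiments. The text fixes only the layer counts and the 1000 ILSVRC categories; the kernel sizes, strides, channel counts and 227 x 227 input are AlexNet-style values assumed here for illustration, and the names `conv_out` and `build_network` as well as the 50-category example are hypothetical.

```python
# Sketch of an eight-layer CNN: five convolutional + three fully-connected layers.
# All hyperparameters below are assumed AlexNet-style values, not taken from the text.


def conv_out(size, kernel, stride=1, pad=0):
    """Spatial output size of a convolution or pooling window."""
    return (size + 2 * pad - kernel) // stride + 1


def build_network(num_classes=1000, input_size=227):
    """Return a list of (layer name, output shape) pairs for the eight layers."""
    # (name, output channels, kernel, stride, padding, followed by 3x3 stride-2 max pooling)
    conv_specs = [
        ("conv1", 96, 11, 4, 0, True),
        ("conv2", 256, 5, 1, 2, True),
        ("conv3", 384, 3, 1, 1, False),
        ("conv4", 384, 3, 1, 1, False),
        ("conv5", 256, 3, 1, 1, True),
    ]
    layers = []
    size = input_size
    for name, channels, kernel, stride, pad, pooled in conv_specs:
        size = conv_out(size, kernel, stride, pad)
        if pooled:
            size = conv_out(size, 3, 2)
        layers.append((name, (channels, size, size)))

    # Three fully-connected layers; only the last one depends on the task.
    layers.append(("fc6", (4096,)))
    layers.append(("fc7", (4096,)))
    layers.append(("fc8", (num_classes,)))
    return layers


if __name__ == "__main__":
    # Default ILSVRC setting versus a hypothetical 50-category fashion task.
    for name, shape in build_network(num_classes=50):
        print(name, shape)
```

Only the size of the last fully-connected layer changes when the network is adapted to a new task; the five convolutional layers and the first two fully-connected layers keep their shapes.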
MatConvNet is an implementation of Convolutional Neural Networks (CNNs) for MATLAB, designed with an emphasis on simplicity and extensibility. Nvidia Digits is software built on Caffe that provides an interface for data preparation, network configuration and visualization of the training process.

These tools have their own pros and cons. All of them support GPU-accelerated deep learning, which is much faster than using CPUs alone. Cuda-convnet and Caffe are faster than MatConvNet, while MatConvNet is the simplest tool for customized deep learning.

Nvidia’s next-generation GPUs have significant architectural improvements to facilitate deep learning. Nvidia’s next GPU architecture, Pascal, will support up to 32 GB of high-bandwidth memory. With larger memory, support for FP16 instructions and higher memory bandwidth, future Nvidia GPUs will deliver significant performance improvements for deep learning models. Performance will increase further by splitting work across multiple GPUs in dense systems where the GPUs are connected by NVLINK, Nvidia’s new point-to-point