perfectism's blog

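Accept what comes as it comes, do not go out to meet the future, keep the present undivided, and do not cling to the past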

Detection of Secondary Driving Tasks

MIT—AVT

Related work by Fei-Yue Wang (王飞跃) et al.

1. End-to-End Driving Activities and Secondary Tasks Recognition Using Deep Convolutional Neural Network and Transfer Learning

  • Seven common driving activities are identified: normal driving, right mirror checking, rear mirror checking, left mirror checking, using an in-vehicle video device, texting, and answering a mobile phone. The first four are regarded as normal driving tasks, while the remaining three form the distraction group.
  • A Gaussian mixture model (GMM) is used to extract the driver region from the background.
  • AlexNet: directly takes the processed RGB images as input and outputs the identified label.
  • To reduce the training cost, a transfer learning mechanism is applied.
  • An average detection accuracy of 79%.

2. Identification and Analysis of Driver Postures for In-Vehicle Driving Activities and Secondary Tasks Recognition

  • The importance of these features to behaviour recognition is evaluated using Random Forests (RF) and Maximal Information Coefficient (MIC) methods.
  • A Feedforward Neural Network (FFNN) is used to identify the seven tasks.

3.

4. Non-Driving Activities Recognition using two-stream convolutional neural network and FlowNet 2.0

datasets

AUC Distracted Driver Dataset

Original paper: "Real-time Distracted Driver Posture Classification"

State Farm Distracted Driver Detection

Drive&Act

MDAD: A Multimodal and Multiview in-Vehicle Driver Action Dataset

R-DA

Original paper: "Feature refinement for image-based driver action recognition via multi-scale attention convolutional neural network"

ideas

skeleton

  1. Skeleton-Based Action Recognition (based on skeleton keypoints)
  2. OpenPose to extract the skeleton for item 1

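Direct classification

  1. After training an image classifier, classify a video by averaging the predictions over a specified number of frames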
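
The frame-averaging idea above can be sketched in a few lines. This is only an illustration under stated assumptions: the per-frame class probabilities are taken as given (for example, softmax outputs of the image classifier), and the function name `classify_video` and its parameters are hypothetical, not taken from the project code.

```python
from typing import List, Sequence


def classify_video(frame_probs: Sequence[Sequence[float]], num_frames: int) -> int:
    """Average class probabilities over the first `num_frames` frames and
    return the index of the class with the highest mean probability."""
    if num_frames <= 0:
        raise ValueError("num_frames must be positive")
    selected = frame_probs[:num_frames]
    if not selected:
        raise ValueError("no frame predictions given")
    num_classes = len(selected[0])
    # Mean probability of each class across the selected frames.
    mean_probs: List[float] = [
        sum(frame[c] for frame in selected) / len(selected)
        for c in range(num_classes)
    ]
    return max(range(num_classes), key=mean_probs.__getitem__)
```

Averaging probabilities rather than hard labels lets a few confident frames outweigh several uncertain ones, which is one reason this simple scheme is a common baseline.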

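Combining traditional image processing with deep learning

Image filtering, splitting, merging, segmentation, and morphological processing

Traditional SVM and KNN still have strong application prospects

Semantic segmentation, then localization (separate the driver from the background before feeding the network)

Fully convolutional network, to handle the case where the driver performs several secondary tasks at once

Using the human skeleton for image preprocessing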

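During model training the images were not cropped. In fact many image elements are unnecessary: we only need to attend to features such as the driver's limbs and face, not to the car body and similar parts, so cropping can improve training to some extent.
Detect human skeleton keypoints and, according to how complete the detected keypoints are, identify the driver's limbs. Then apply convex-hull morphological processing to extract the body-related part of the image and crop it. Pad the crop with white into a square image. The image is then smaller and carries fewer features, and it can be used for training as usual.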
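
A minimal sketch of the crop-and-pad step, assuming the keypoints have already been produced by a pose estimator as `(x, y, confidence)` triples. It simplifies the convex-hull step described above to an axis-aligned bounding box, and it treats the image as a plain grayscale grid so that no image library is needed. The names `crop_to_person`, `min_conf` and `margin` are hypothetical.

```python
from typing import List, Sequence, Tuple

Keypoint = Tuple[float, float, float]  # (x, y, confidence) of one detected joint
Image = List[List[int]]                # grayscale image as rows of 0..255 values


def _clamp(value: int, low: int, high: int) -> int:
    return max(low, min(value, high))


def crop_to_person(image: Image, keypoints: Sequence[Keypoint],
                   min_conf: float = 0.1, margin: int = 0,
                   fill: int = 255) -> Image:
    """Crop to the box around confident keypoints, then pad to a white square."""
    height, width = len(image), len(image[0])
    pts = [(x, y) for x, y, conf in keypoints if conf >= min_conf]
    if not pts:
        return image  # no usable keypoints: leave the frame untouched

    # Bounding box of the confident keypoints (a simplification of the convex hull).
    left = _clamp(int(min(x for x, _ in pts)) - margin, 0, width - 1)
    right = _clamp(int(max(x for x, _ in pts)) + margin, 0, width - 1)
    top = _clamp(int(min(y for _, y in pts)) - margin, 0, height - 1)
    bottom = _clamp(int(max(y for _, y in pts)) + margin, 0, height - 1)
    crop = [row[left:right + 1] for row in image[top:bottom + 1]]

    # Centre the crop on a white square whose side is the longer crop edge.
    crop_h, crop_w = len(crop), len(crop[0])
    side = max(crop_h, crop_w)
    pad_top = (side - crop_h) // 2
    pad_left = (side - crop_w) // 2
    square = [[fill] * side for _ in range(side)]
    for r, row in enumerate(crop):
        square[pad_top + r][pad_left:pad_left + crop_w] = row
    return square
```

Padding to a square before resizing keeps the driver's aspect ratio intact, which a direct resize of a non-square crop would distort.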

Append two optical-flow frames to each RGB frame, then fuse the CNN results over several consecutive frames

Goal

Reduce the false-detection rate and the missed-detection rate

inception v3

xception

163/163 [==============================] - 4334s 27s/step - loss: 0.2972 - acc: 0.9080 - val_loss: 2.9806 - val_acc: 0.3750 Epoch 2/4

In every epoch the training accuracy is high, but the accuracy at test (validation) time is very low

How to resume training from a checkpoint: save the model after every epoch and load it next time?
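
One possible answer to the question above, sketched under assumptions: checkpoints are written once per epoch with a file name such as `model-03.h5` (a hypothetical naming scheme), and the helper below finds the newest one so training can continue from there. The Keras calls mentioned in the trailing comment (`ModelCheckpoint`, `load_model`, the `initial_epoch` argument of `fit`) exist in Keras, but how they fit this project's training script is an assumption.

```python
import re
from pathlib import Path
from typing import Optional, Tuple

# Hypothetical naming scheme: model-<epoch>.h5, one file per finished epoch.
CKPT_PATTERN = re.compile(r"model-(\d+)\.h5$")


def latest_checkpoint(ckpt_dir: str) -> Optional[Tuple[Path, int]]:
    """Return (path, epoch) of the checkpoint with the highest epoch, or None."""
    best: Optional[Tuple[Path, int]] = None
    for path in Path(ckpt_dir).glob("model-*.h5"):
        match = CKPT_PATTERN.search(path.name)
        if match:
            epoch = int(match.group(1))
            if best is None or epoch > best[1]:
                best = (path, epoch)
    return best


# Usage sketch with Keras (not executed here):
#   found = latest_checkpoint("checkpoints")
#   if found is not None:
#       model = keras.models.load_model(str(found[0]))
#       start = found[1]
#   model.fit(..., initial_epoch=start,
#             callbacks=[keras.callbacks.ModelCheckpoint(
#                 "checkpoints/model-{epoch:02d}.h5")])
```

Saving the full model (rather than weights only) also preserves the optimizer state, so the resumed run continues with the same learning-rate statistics.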

NameError: name ‘backend’ is not defined

http://studyai.com/article/d3616cea

https://github.com/keras-team/keras/issues/5088

https://blog.csdn.net/Mr_green_bean/article/details/94575883

Calling flow_from_directory() reports "Found 0 images belonging to 2 classes"

https://blog.csdn.net/space_dandy/article/details/88431421

Create a subfolder
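
`flow_from_directory()` looks for images inside the subdirectories of the directory it is given, so images lying directly in that directory are not found. A small sketch of the subfolder fix, assuming an unlabeled image folder that should be wrapped in a single subfolder; the function name and the default subfolder name `test` are hypothetical.

```python
import shutil
from pathlib import Path

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".bmp"}


def wrap_in_subfolder(image_dir: str, subfolder: str = "test") -> int:
    """Move images lying directly in `image_dir` into `image_dir/subfolder`.

    Returns the number of files moved.
    """
    root = Path(image_dir)
    target = root / subfolder
    target.mkdir(exist_ok=True)
    moved = 0
    # sorted() materialises the listing before any file is moved.
    for path in sorted(root.iterdir()):
        if path.is_file() and path.suffix.lower() in IMAGE_SUFFIXES:
            shutil.move(str(path), str(target / path.name))
            moved += 1
    return moved
```

After this, pointing `flow_from_directory()` at `image_dir` with `class_mode=None` yields the images as a single unlabeled group.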

StopIteration: cannot identify image file ‘./test/test/img_15308.jpg’

TODO

image classification

  • [x] Resnet 50 M1 NOT CAM
  • [x] Xception M11 CAM
  • [x] Xception M9 CAM, two predictions
  • [x] Xception M12 CAM, best results so far
  • [x] Xception M10 CAM, worst results so far
  • [x] InceptionResNetV2 M13, not CAM
  • [x] M9+M11+M1+M13, results to be determined
  • [x] save model
  • [x] cam visualization
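  • [x] Compute the per-class detection accuracy of each model; confusion matrix
  • [x] Skeleton preprocessing: how well does it actually work? (skeleton)
  • [ ] Use semantic segmentation to separate the driver from the background before feeding the network; extract head, hand, and face features
  • [x] Combine Xception, InceptionV3, and InceptionResNetV2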
  • [ ] NASNetLarge
  • [x] InceptionResNetV2
  • [ ] Faster R-CNN
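  • [x] Try lightweight models: MobileNet, NASNetMobile
  • [x] AutoML; Baidu's DL has already been used
  • [x] Improve the class activation maps
  • [ ] An infrared camera may be needed at night
  • [x] Model fusion methods: feature concatenation; direct averaging of per-frame results (dropping the maximum and minimum); SVM; genetic algorithm to find the weights of a weighted average. The key is deciding which parameters are learnable and which are not
  • [ ] K-nearest-neighbor voting (voting over several adjacent frames, i.e. a sliding window); effect of time on the final result
  • [ ] Learnable weights
  • [ ] Second-order gradient optimizers, SGD, Nadam
  • [ ] Use meta-learning: resample to build the support set and query set of each batch
  • [ ] Graph convolutional neural network
  • [ ] Use class-imbalance ideas to augment the data
  • [x] Add noise to the images to simulate lighting changes, hair occlusion, hats, etc.
  • [x] Dynamic video detection to show real-time performance; display the elapsed time and the detection result together
  • [x] Shuffle every epoch to improve training stability
  • [x] Add hidden layers so the number of fully connected units decreases more gradually; can be compared in the thesis
  • [x] Compare with the results of the original paper
  • [x] Find out why batch_size cannot be made larger (e.g. workers); use larger batch sizes such as 128 and 256
  • [ ] Fill in the weekly log
  • [x] Compare frozen layers against all layers unfrozen: V1 Xception, InceptionV3; V2 InceptionV3
  • [x] Retrain the AUC V2 model
  • [x] Continue training the AUC V1 model with a smaller batch_size
  • [x] Re-enable ResNet50 on AUC V2, mainly changing the image preprocessing: subtract from each channel the training-set mean of that channel
  • [x] Use sklearn to record the loss during model fusion; compute the accuracy at a low level
  • [ ] Fix the problems on the self-recorded videos: the V1 model has low accuracy on three classes (texting with the right hand, drinking, operating the radio); the V2 model has low accuracy on right-hand texting and confuses drinking, adjusting hair, and applying makeup: 1. label manually and add to the training set (self-deception) 2. record data with a camera angle similar to the dataset's 3. the V2 model generalizes better, so experiment with sliding windows on top of it
  • [x] V1 loss and time, V2 loss
  • [x] Organize the state-farm dataset results and use that model to predict on the collected videos
  • [x] Try sliding-window video prediction on the camera2 demo video to see whether it helps when classification accuracy is low
  • [x] Find the camera mounting positions for the three versions of the dataset
  • [ ] Improve real-time performance through model compression and quantization, frame dropping, etc.
  • [ ] Label the self-collected videos; strongly exposed and nighttime videos still need to be collected and predicted
  • [x] Try feeding skeletons obtained from OpenPose into a CNN, and summarize the classification results
  • [x] Start writing the thesis
  • [x] Organize the sliding-window prediction results for videos 1, 2, 3 from 4.7 and 4.9
  • [x] Midterm defense PPT
  • [x] Organize the midterm defense files
  • [x] Do one mock defense and check all files
  • [x] Understand cross-entropy
  • [x] The real-time comparison needs the hardware to be specified
  • [ ] Do not use too many decimal places
  • [ ] Expand the dataset
  • [x] Visualize the features (filters) extracted by each CNN layer
  • [ ] Understand recall
  • [ ] Predict on augmented images to test model robustness
  • [ ] Relationship between each model's weight share and fuzzy mathematics
  • [ ] Model fusion: stacking
  • [x] Feature stacking of InceptionV3 and Xception on V1
  • [x] Extract key frames; key-frame extraction currently takes too long
  • [ ] Image similarity comparison: hashing algorithms, SIFT, SURF features
  • [ ] Replace the final fully connected layer with a convolutional layer. Error: ValueError: Error when checking target: expected last_conv to have 4 dimensions, but got array with shape (128, 10)
  • [x] Check and revise the thesis
  • [x] Drawings
  • [ ] Novel contributions
  • [ ] Improve classification accuracy
  • [x] FCN
  • [x] CAM visualization: describe it more clearly
  • [x] Frame dropping, to improve real-time performance
  • [x] Chapters need transition sentences; use the future tense at the start of a chapter and the perfect tense at the end
  • [x] CAM visualization: color representation of the heat map
  • [x] CAM visualization video production
  • [x] Organize the videos from before frame dropping was applied
  • [x] Make the PPT
  • [ ] Questions raised during the PPT presentation:
  • [ ] The role of bias correction in the optimizer
  • [ ] The role of the fixed tiny constant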
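
For the last two questions, a scalar sketch of one Adam update shows where both pieces appear. This assumes the optimizer in question is from the Adam family (Nadam is mentioned in the list above); the function name and default hyperparameters are the commonly used ones, not values taken from these experiments.

```python
import math
from typing import Tuple


def adam_step(param: float, grad: float, m: float, v: float, t: int,
              lr: float = 0.001, beta1: float = 0.9, beta2: float = 0.999,
              eps: float = 1e-8) -> Tuple[float, float, float]:
    """One Adam update for a scalar parameter; `t` is the 1-based step count.

    Returns the updated (param, m, v).
    """
    m = beta1 * m + (1 - beta1) * grad         # running mean of gradients
    v = beta2 * v + (1 - beta2) * grad * grad  # running mean of squared gradients
    # Bias correction: m and v start at 0, so early estimates are too small.
    # Dividing by (1 - beta**t) rescales them; the factor tends to 1 as t grows.
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    # eps is the fixed tiny constant: it keeps the division finite when v_hat is 0.
    param = param - lr * m_hat / (math.sqrt(v_hat) + eps)
    return param, m, v
```

Without bias correction the first step would use `m = 0.1 * grad` and `v = 0.001 * grad**2`, giving a step of about `lr * 0.1 / sqrt(0.001)`, roughly 3.2 times `lr`, instead of `lr`.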

video classification

  • [x] video input and predict video output
  • [x] simple model to classify video, single frame in and out
  • [x] video dataset, Drive&Act
  • [ ] CNN-based spatio-temporal approach: append flow frames to RGB, input to Xception, then average the predictions over four frames from the video; Motion Fused Frames (MFFs); model is ready
  • [ ] Fuse features and feed them into softmax or SVM; video_model needs to be modified
  • [ ] two-stream
  • [ ] I3D,model is ready, flip aug
  • [ ] C3D
  • [ ] P3D Resnet
  • [ ] CNN+LSTM model to classify video
  • [ ] AUTO ML?
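
The sliding-window voting mentioned in these notes (voting over several adjacent frames) can be sketched as a majority vote over the most recent per-frame labels. This is an illustration only: the window length and the function name are hypothetical, and the per-frame labels are assumed to come from a single-frame classifier.

```python
from collections import Counter, deque
from typing import Deque, Iterable, List


def sliding_window_vote(frame_labels: Iterable[int], window: int = 5) -> List[int]:
    """Replace each frame's label with the most common label among the
    last `window` frames (including the current one)."""
    recent: Deque[int] = deque(maxlen=window)
    smoothed: List[int] = []
    for label in frame_labels:
        recent.append(label)  # the oldest label drops out automatically
        smoothed.append(Counter(recent).most_common(1)[0][0])
    return smoothed
```

A longer window suppresses more single-frame errors but delays the reaction to a real change of activity by about half the window length.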

model trained logs

M1 2 epochs

M11 6 epochs

M9 4 epochs

video_model 5 epochs

video_model_change_vd 8 epochs

video_model_change_vd_aug 3 epochs. The two frames are not from the same action! Network model: the first conv2d does not extract enough features

flow_xception 3 epochs

flow_xception_lock70 2 epochs

I3D: the data has problems

Try cross-validation

Tune hyperparameters such as ini_lr and batch_size

mffs_model_dropout_more_shift_modified 5 epochs

Websites

Dangers of distracted driving:

https://www.nhtsa.gov/risky-driving/distracted-driving/

Building and configuring models in Keras:

https://keras.io/zh/getting-started/sequential-model-guide/#lstm

Kaggle competition experience:

https://zhuanlan.zhihu.com/p/26309073

https://cloud.tencent.com/developer/article/1556210

https://towardsdatascience.com/distracted-driver-detection-using-deep-learning-e893715e02a4

https://mc.ai/distracted-driver-detection/

Driver behavior detection:

https://xugang.ink/wiki/2018-09-14-%E9%A9%BE%E9%A9%B6%E5%91%98%E8%A1%8C%E4%B8%BA%E6%A3%80%E6%B5%8B/

Human pose estimation:

http://www.ilovepose.cn/tags

Video classification:

https://www.pyimagesearch.com/2019/07/15/video-classification-with-keras-and-deep-learning/

CNN+LSTM: https://github.com/apachecn/ml-mastery-zh/blob/master/docs/lstm/cnn-long-short-term-memory-networks.md

Miscellaneous:

https://www.youtube.com/watch?time_continue=1988&v=dwukLOPpaHw&feature=emb_title

Live demo:

https://github.com/luisarojas/distracted-driver-detection

Downloading datasets remotely with wget:

https://blog.csdn.net/yanlisuo/article/details/81329920

Keras functional API:

https://keras.io/getting-started/functional-api-guide/

https://blog.csdn.net/ting0922/article/details/94437540

Ways to reduce overfitting:

http://iot-fans.xyz/2017/11/20/deeplearning/overfitting/

https://www.zhihu.com/question/59201590

https://kaiyuanzhang.com/2018/05/02/overfitting-and-underfitting/

The Kaggle dataset cannot be used in papers:

https://www.kaggle.com/c/state-farm-distracted-driver-detection/discussion/20043

AutoML platforms:

https://zhuanlan.zhihu.com/p/57896464

https://ai.baidu.com/easydl/app/1/models/50071/iterations?deployType=cloud

Using sklearn:

https://www.cnblogs.com/lianyingteng/p/7811126.html

Writing CSV files:

https://www.cnblogs.com/jasontang369/p/9241334.html

Is pre-training on ImageNet really useful:

https://www.52cv.net/?p=1741

Choosing the batch size:

https://www.zhihu.com/question/61607442

128 (0.005), 256 (0.01)
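
If the values in parentheses are read as learning rates paired with batch sizes 128 and 256 (an assumption, since the note does not say so), they match a linear scaling rule: doubling the batch size doubles the learning rate. A minimal sketch, with a hypothetical function name:

```python
def scaled_learning_rate(batch_size: int, base_batch_size: int = 128,
                         base_lr: float = 0.005) -> float:
    """Scale the learning rate in proportion to the batch size.

    The defaults (128, 0.005) are the first pair from the note above,
    interpreted as (batch size, learning rate).
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    return base_lr * batch_size / base_batch_size
```

With these defaults, `scaled_learning_rate(256)` gives 0.01, the second pair in the note.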

Cross-validation and overfitting:

https://www.cnblogs.com/solong1989/p/9415606.html

Summary of CNN tuning:

https://flashgene.com/archives/64534.html

100 interview questions:

https://github.com/changgyhub/notes/blob/master/machine-learning/deep-learning/100-questions-part-1.md

Model evaluation, overfitting, etc.:

https://flashgene.com/archives/27598.html

Data preprocessing:

https://zhuanlan.zhihu.com/p/29513760

Model fusion:

https://machinelearningmastery.com/ensemble-methods-for-deep-learning-neural-networks/
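
One of the fusion methods considered in these notes is averaging after dropping the maximum and minimum. A minimal sketch of that idea, assuming each input is a vector of class scores (one vector per model or per frame) and that at least three vectors are available; the function name is hypothetical.

```python
from typing import List, Sequence


def trimmed_mean_fusion(score_vectors: Sequence[Sequence[float]]) -> List[float]:
    """Per class, drop the highest and lowest score and average the rest.

    The fused scores are renormalised to sum to 1 when possible.
    """
    if len(score_vectors) < 3:
        raise ValueError("need at least three score vectors to drop max and min")
    num_classes = len(score_vectors[0])
    fused: List[float] = []
    for c in range(num_classes):
        scores = sorted(vec[c] for vec in score_vectors)
        kept = scores[1:-1]  # discard one minimum and one maximum
        fused.append(sum(kept) / len(kept))
    total = sum(fused)
    return [s / total for s in fused] if total > 0 else fused
```

Dropping the extremes makes the fused score less sensitive to a single model or frame that is confidently wrong, at the cost of ignoring some information when only a few inputs are available.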

Computing the loss:

https://scikit-learn.org/stable/modules/model_evaluation.html#log-loss
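
The linked page describes log loss (cross-entropy), which scikit-learn provides as `sklearn.metrics.log_loss`. To see what the number means, here is a hand-written sketch of the multi-class case, assuming integer class labels and one probability vector per sample; the function name and the clipping constant are illustrative choices.

```python
import math
from typing import Sequence


def multiclass_log_loss(y_true: Sequence[int],
                        y_prob: Sequence[Sequence[float]],
                        eps: float = 1e-15) -> float:
    """Mean negative log-probability assigned to the true class of each sample."""
    if len(y_true) == 0 or len(y_true) != len(y_prob):
        raise ValueError("y_true and y_prob must be non-empty and equally long")
    total = 0.0
    for label, probs in zip(y_true, y_prob):
        # Clip so that a probability of exactly 0 does not produce log(0).
        p = min(max(probs[label], eps), 1 - eps)
        total -= math.log(p)
    return total / len(y_true)
```

A confident wrong prediction is penalised heavily, which is why a fused model can have high accuracy and still a poor log loss.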

Useful links:

https://github.com/Apoorvajasti/Distracted-Driver-Detection includes video prediction and KNN (VGG convolutional-layer outputs as the KNN input)

https://github.com/nkjsy/Distracted-Driver-Detection 12 items, includes a report

https://github.com/godloveliang/MLND_distracted_driver_tetection poor, Chinese, includes a report

https://github.com/mwakaba2/Computer-Vision-Capstone-Project English, includes traditional features, includes a report

https://github.com/ksasi/Machine-Learning-Capstone-Project English

https://gitee.com/ltg00/distracted_driver_detection?_from=gitee_search brand-new CAM, includes a report

https://github.com/luisarojas/distracted-driver-detection network architecture comparison

https://github.com/nkcr7/Distracted-Driver-Detection poor, Chinese, includes a report

https://github.com/jartantupjar/7-Distracted-Driver-Detection poor, English, includes a report

https://github.com/LiuKaixinHappy/distracted_driver_detection Chinese, fairly good report

https://github.com/NaughtyFlame/distracted_driver_detection poor, Chinese

https://github.com/HarshineeSriram/Distracted-Driver-Detection includes optical flow, SVM, LSTM

https://github.com/Raj1036/ML_Distracted_Driver_Detection includes random forest, SVM, etc.; English

https://github.com/claudehotline/distracted_driver_detection Chinese

When changing the video resolution with Format Factory, both the screen size and the aspect ratio in the output settings must be changed; the first attempt used the mp4 format

https://zhuanlan.zhihu.com/p/53828405 survey of CNN model compression and acceleration algorithms

https://github.com/devyhia/action-annotation video annotation

https://github.com/microsoft/VoTT#download-and-install-a-release-package-for-your-platform-recommended Microsoft's video annotation tool

https://www.pyimagesearch.com/2018/11/26/instance-segmentation-with-opencv/ separating the human body from the background

Skin segmentation:

https://www.pyimagesearch.com/2014/08/18/skin-detection-step-step-example-using-python-opencv/

Installing OpenPose on Colab:

https://github.com/CMU-Perceptual-Computing-Lab/openpose/issues/949

Visualizing CNN filters:

https://keras-cn.readthedocs.io/en/latest/legacy/blog/cnn_see_world/

CNN visualization:

https://blog.csdn.net/xys430381_1/article/details/90413169

Extracting key frames with FFmpeg:

https://blog.csdn.net/u011394059/article/details/78728809

Difference between FLOPS and FLOPs:

https://blog.csdn.net/zt1091574181/article/details/97393278

Space and time complexity of CNNs:

https://zhuanlan.zhihu.com/p/31575074
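
As a worked example of these two quantities for a single standard convolutional layer: the time cost grows with output size × kernel area × input channels × output channels, and the space cost of the weights with kernel area × input channels × output channels. The sketch below counts multiply-accumulate operations and parameters; the function name is hypothetical and stride or padding are assumed to be already reflected in the output size.

```python
from typing import Tuple


def conv2d_cost(out_h: int, out_w: int, kernel: int, c_in: int, c_out: int,
                bias: bool = True) -> Tuple[int, int]:
    """Return (multiply_accumulates, parameters) of one standard 2-D conv layer.

    `out_h` x `out_w` is the output feature-map size, `kernel` the side of a
    square kernel.
    """
    # Each output value needs kernel*kernel*c_in multiply-accumulates.
    macs = out_h * out_w * kernel * kernel * c_in * c_out
    # One kernel*kernel*c_in filter per output channel, plus optional biases.
    params = kernel * kernel * c_in * c_out + (c_out if bias else 0)
    return macs, params
```

For instance, a 3×3 convolution from 3 to 64 channels with a 112×112 output needs about 21.7 million multiply-accumulates but only 1,792 parameters, which shows why early layers dominate compute while late fully connected layers dominate memory.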

Development of CNNs:

https://zhuanlan.zhihu.com/p/38681805

Image similarity matching:

https://blog.csdn.net/qq_16234613/article/details/83118222

Implementations of three hashing algorithms:

https://blog.csdn.net/qq_32799915/article/details/81000437

https://www.cnblogs.com/Kalafinaian/p/11260808.html

https://content-blockchain.org/research/testing-different-image-hash-functions/

http://www.hackerfactor.com/blog/?/archives/529-Kind-of-Like-That.html
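
As an illustration of image hashing for similarity comparison, here is a sketch of the average hash (aHash), one of the commonly described perceptual hashes. It assumes the image has already been converted to grayscale and downscaled to a small grid (8×8 is typical), so no image library is needed; the function names are hypothetical.

```python
from typing import Sequence


def average_hash(gray: Sequence[Sequence[float]]) -> int:
    """aHash of a small grayscale grid: one bit per pixel, set when the
    pixel is brighter than the mean of the grid."""
    pixels = [p for row in gray for p in row]
    mean = sum(pixels) / len(pixels)
    bits = 0
    for p in pixels:
        bits = (bits << 1) | (1 if p > mean else 0)
    return bits


def hamming_distance(a: int, b: int) -> int:
    """Number of differing bits; a small distance suggests similar images."""
    return bin(a ^ b).count("1")
```

Two frames whose hashes differ in only a few bits can be treated as near-duplicates, which is one cheap way to skip redundant frames before running a classifier.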