Detection of Secondary Driving Tasks
MIT—AVT
Related work by Fei-Yue Wang (王飞跃) et al.
1. End-to-End Driving Activities and Secondary Tasks Recognition Using Deep Convolutional Neural Network and Transfer Learning
- Seven common driving activities are identified: normal driving, right mirror checking, rear mirror checking, left mirror checking, using an in-vehicle video device, texting, and answering a mobile phone. The first four are regarded as normal driving tasks; the remaining three form the distraction group.
- A Gaussian mixture model (GMM) extracts the driver region from the background.
- AlexNet: takes the processed RGB images directly as input and outputs the identified label.
- To reduce the training cost, transfer learning is applied.
- An average detection accuracy of 79%
2. Identification and Analysis of Driver Postures for In-Vehicle Driving Activities and Secondary Tasks Recognition
- The importance of these features to behaviour recognition is evaluated using Random Forests (RF) and Maximal Information Coefficient (MIC) methods.
- Feedforward Neural Network (FFNN) is used to identify the seven tasks
3.
4. Non-Driving Activities Recognition using two-stream convolutional neural network and FlowNet 2.0
datasets
Original paper: "Real-time Distracted Driver Posture Classification"
State Farm Distracted Driver Detection
MDAD: A Multimodal and Multiview in-Vehicle Driver Action Dataset
R-DA
Original paper: "Feature refinement for image-based driver action recognition via multi-scale attention convolutional neural network"
ideas
skeleton
- Skeleton-Based Action Recognition (based on skeletal keypoints)
- openpose for skeleton for 1
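Direct classification
- After training an image classifier, classify a video by averaging the predictions over a specified number of frames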
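The frame-averaging idea can be sketched as follows. This is only a minimal illustration, assuming the image classifier already returns one probability vector per frame; `average_frame_predictions` and `window` are hypothetical names that do not come from these notes.

```python
from typing import List, Sequence


def average_frame_predictions(frame_probs: Sequence[Sequence[float]],
                              window: int) -> List[int]:
    """Return one class label per group of `window` consecutive frames.

    frame_probs: per-frame class probabilities from an image classifier
                 (assumed to be computed elsewhere).
    window:      number of frames to average over (hypothetical parameter).
    """
    if window <= 0:
        raise ValueError("window must be positive")
    labels = []
    for start in range(0, len(frame_probs), window):
        chunk = frame_probs[start:start + window]
        n_classes = len(chunk[0])
        # Average each class probability over the frames in this chunk.
        mean = [sum(p[c] for p in chunk) / len(chunk) for c in range(n_classes)]
        # The predicted label is the class with the highest mean probability.
        labels.append(max(range(n_classes), key=mean.__getitem__))
    return labels
```

A single misclassified frame inside a chunk is outvoted by its neighbours, at the cost of a label delay of up to `window` frames.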
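Combine traditional image processing with deep learning
Image filtering, splitting, merging, segmentation, and morphological processing
Classical SVM and KNN still have strong application prospects
Semantic segmentation, then localization (separate the driver from the background before feeding the network)
Fully convolutional network, to handle the case where the driver performs several secondary tasks at once
Use the human skeleton for image preprocessing
During model training the images were not cropped. In fact, many image elements are unnecessary: we only need to focus on features such as the driver's limbs and face and need not pay much attention to the car body, so cropping may improve training to some extent.
Using skeleton keypoint detection, identify the driver's limbs according to how complete the detected keypoints are, then apply convex-hull morphological processing to extract the body-related part of the image and complete the crop. Pad the result with white to make a square image. The image size and the number of features are now reduced, and the image can be used for training as usual.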
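The geometry of the crop-and-pad step might look like the sketch below. It is a simplification under stated assumptions: keypoints are taken to be `(x, y, confidence)` triples (the kind of output a pose estimator such as OpenPose produces, though the exact format is assumed here), and a plain bounding box stands in for the convex-hull step. All function and parameter names are hypothetical.

```python
from typing import List, Tuple

Keypoint = Tuple[float, float, float]  # (x, y, confidence); format is an assumption


def crop_box_from_keypoints(keypoints: List[Keypoint], img_w: int, img_h: int,
                            min_conf: float = 0.1, margin: int = 20
                            ) -> Tuple[int, int, int, int]:
    """Bounding box (left, top, right, bottom) around confident keypoints."""
    pts = [(x, y) for x, y, c in keypoints if c >= min_conf]
    if not pts:
        # No usable skeleton: fall back to the full image.
        return 0, 0, img_w, img_h
    xs = [x for x, _ in pts]
    ys = [y for _, y in pts]
    # Expand by a margin and clamp to the image borders.
    left = max(0, int(min(xs)) - margin)
    top = max(0, int(min(ys)) - margin)
    right = min(img_w, int(max(xs)) + margin)
    bottom = min(img_h, int(max(ys)) + margin)
    return left, top, right, bottom


def square_padding(box: Tuple[int, int, int, int]) -> Tuple[int, Tuple[int, int]]:
    """Side of the square canvas and the offset at which to paste the crop."""
    left, top, right, bottom = box
    w, h = right - left, bottom - top
    side = max(w, h)
    return side, ((side - w) // 2, (side - h) // 2)
```

With Pillow, the crop could then be pasted at that offset onto `Image.new("RGB", (side, side), "white")`.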
Add two optical-flow frames to each image frame, then fuse the CNN results over several consecutive frames
Goal
Reduce the false-detection rate and the missed-detection rate
inception v3
xception
163/163 [==============================] - 4334s 27s/step - loss: 0.2972 - acc: 0.9080 - val_loss: 2.9806 - val_acc: 0.3750 Epoch 2/4
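In every epoch, training accuracy is high but test-time accuracy is very low
How to resume training from a checkpoint: save the model after every epoch and load it the next time?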
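One possible answer, sketched under assumptions: save one file per epoch and, on restart, pick the file with the highest epoch number. The file naming scheme `model_epoch_NN.h5` and the helper `latest_checkpoint` are hypothetical; the Keras calls named in the trailing comment (`ModelCheckpoint`, `load_model`, and the `initial_epoch` argument of `fit`) exist but are not exercised here.

```python
import re
from pathlib import Path
from typing import Optional, Tuple

# Hypothetical naming scheme: one file per epoch, e.g. "model_epoch_03.h5".
CKPT_PATTERN = re.compile(r"model_epoch_(\d+)\.h5$")


def latest_checkpoint(ckpt_dir: str) -> Optional[Tuple[Path, int]]:
    """Return (path, epoch) of the newest checkpoint, or None if none exist."""
    best = None
    for path in Path(ckpt_dir).glob("model_epoch_*.h5"):
        match = CKPT_PATTERN.search(path.name)
        if match:
            epoch = int(match.group(1))
            if best is None or epoch > best[1]:
                best = (path, epoch)
    return best

# With Keras, the pieces would fit together roughly like this (not run here):
#   ModelCheckpoint("ckpt/model_epoch_{epoch:02d}.h5") saves after every epoch;
#   found = latest_checkpoint("ckpt")
#   model = load_model(str(found[0]))
#   model.fit(..., initial_epoch=found[1], callbacks=[checkpoint_callback])
```

Passing `initial_epoch` keeps the epoch counter, and therefore the checkpoint file names, continuous across restarts.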
NameError: name 'backend' is not defined
http://studyai.com/article/d3616cea
https://github.com/keras-team/keras/issues/5088
https://blog.csdn.net/Mr_green_bean/article/details/94575883
Calling flow_from_directory() reports "Found 0 images belonging to 2 classes"
https://blog.csdn.net/space_dandy/article/details/88431421
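Fix: create a subfolder, because flow_from_directory() only looks for images inside the subdirectories of the directory it is given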
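A minimal sketch of that fix, assuming a flat folder of unlabeled test images: move the loose images into a single subfolder so that flow_from_directory() can find them. The helper name, the default subfolder name `test`, and the list of image suffixes are assumptions for illustration.

```python
import shutil
from pathlib import Path

# Suffixes treated as images here; an assumption, adjust as needed.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".bmp"}


def wrap_in_subfolder(image_dir: str, subfolder: str = "test") -> int:
    """Move loose images in `image_dir` into `image_dir/subfolder`.

    Returns the number of files moved.
    """
    root = Path(image_dir)
    target = root / subfolder
    target.mkdir(exist_ok=True)
    moved = 0
    # sorted() materialises the listing before any file is moved.
    for path in sorted(root.iterdir()):
        if path.is_file() and path.suffix.lower() in IMAGE_SUFFIXES:
            shutil.move(str(path), str(target / path.name))
            moved += 1
    return moved
```

After `wrap_in_subfolder("./test")`, the images live under `./test/test/`, and `./test` is the directory to pass to flow_from_directory().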
StopIteration: cannot identify image file './test/test/img_15308.jpg'
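An error like this usually points at a file the image library cannot decode. One rough way to narrow it down, shown here as a heuristic sketch only, is to list `.jpg` files that are empty or do not begin with the JPEG signature. The helper name is hypothetical, and a file can pass this check and still be damaged further in; Pillow's `Image.open(path).verify()` is a more thorough test.

```python
from pathlib import Path
from typing import List

JPEG_MAGIC = b"\xff\xd8\xff"  # bytes every JPEG file starts with


def find_suspect_jpegs(image_dir: str) -> List[Path]:
    """List .jpg files that are empty or do not start with the JPEG signature."""
    suspects = []
    for path in sorted(Path(image_dir).rglob("*.jpg")):
        with open(path, "rb") as handle:
            header = handle.read(len(JPEG_MAGIC))
        # Empty or truncated files yield a short header and are flagged too.
        if header != JPEG_MAGIC:
            suspects.append(path)
    return suspects
```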
TODO
image classification
- [x] Resnet 50 M1 NOT CAM
- [x] Xception M11 CAM
- [x] Xception M9 CAM, two predictions
- [x] Xception M12 CAM, best results so far
- [x] Xception M10 CAM, worst results so far
- [x] InceptionResNetV2 M13 ,not CAM
- [x] M9+M11+M1+M13, results pending
- [x] save model
- [x] cam visualization
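- [x] Tabulate each model's detection accuracy per class; confusion matrix
- [x] skeleton preprocessing: how well does it actually work? (skeleton)
- [ ] Use semantic segmentation to separate the driver from the background before feeding the network; extract head, hand, and face features
- [x] Combine Xception, InceptionV3, and InceptionResNetV2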
- [ ] NASNetLarge
- [x] InceptionResNetV2
- [ ] Faster R-CNN
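- [x] Try lightweight models: MobileNet, NASNetMobile
- [x] AutoML; already used Baidu's DL platform
- [x] Improve the class activation maps
- [ ] An infrared camera may be needed at night
- [x] Model fusion methods: feature concatenation, directly averaging per-frame results (dropping the max and min), SVM, weighted average found by a genetic algorithm; the key is deciding which parameters are learnable and which are not
- [ ] K-nearest-neighbor voting (voting over several adjacent frames with a sliding window); effect of time on the final result
- [ ] Make the weights learnable
- [ ] Second-order gradient optimizers, SGD, Nadam
- [ ] Use meta-learning: resample to build each batch's support set and query set
- [ ] Graph convolutional neural network
- [ ] Augment the data using class-imbalance ideas
- [x] Add noise to the images to simulate lighting changes, hair occlusion, hats, etc.
- [x] Dynamic video detection to demonstrate real-time performance; show elapsed time and detection results together
- [x] Shuffle every epoch to improve training stability
- [x] Add hidden layers so the number of fully connected units decreases more gradually; can be compared in the thesis
- [x] Compare with the results of the original paper
- [x] Find out why batch_size cannot be made larger (e.g. workers); use larger batch sizes such as 128 or 256
- [ ] Fill in the weekly log
- [x] Compare results with layers frozen versus all layers unfrozen: V1 Xception, InceptionV3; V2 InceptionV3
- [x] Retrain the AUC V2 models
- [x] Continue training the AUC V1 model with a smaller batch_size
- [x] Re-enable ResNet50 on AUC V2; the main change is the image preprocessing: subtract from each channel that channel's mean over the training set
- [x] Use sklearn to record the loss during model fusion; compute accuracy at a low level
- [ ] Fix problems on the self-recorded videos: the V1 model has low accuracy on three classes (typing with the right hand, drinking, operating the radio); the V2 model has low accuracy on right-hand typing, and drinking, adjusting hair, and applying makeup are easily confused: 1. label manually and add to the training set (self-deceiving) 2. record data with a camera angle similar to the dataset's 3. the V2 model generalizes better, so experiment with the sliding window on top of it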
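- [x] Loss and time for V1; loss for V2
- [x] Organize the state-farm dataset results and use that model to predict the recorded videos
- [x] Try sliding-window video prediction on the camera2 demo video to see whether it helps when classification accuracy is low
- [x] Find the camera mounting positions for the three versions of the dataset
- [ ] Use model compression and quantization, frame dropping, etc. to improve real-time performance
- [ ] Label the self-recorded videos; also record videos under strong exposure and at night, and run predictions on them
- [x] Try feeding skeletons obtained from OpenPose into a CNN and summarize the classification results
- [x] Start writing the thesis
- [x] Organize the sliding-window prediction results for videos 1/2/3 from 4.7 and 4.9
- [x] PPT for the midterm defense
- [x] Organize the midterm defense documents
- [x] Do one mock defense and verify all documents
- [x] Understand cross-entropy
- [x] Real-time comparisons need the hardware to be specified
- [ ] Do not use too many decimal places
- [ ] Expand the dataset
- [x] Visualize the features (filters) extracted by each CNN layer
- [ ] Learn about recall
- [ ] Predict on augmented images to test model robustness
- [ ] Relationship between each model's weight share and fuzzy mathematics
- [ ] Model fusion: stacking
- [x] Stack InceptionV3 and Xception features on V1
- [x] Extract key frames; key-frame extraction currently takes too long
- [ ] Image similarity comparison: hashing algorithms, SIFT, SURF features
- [ ] Replace the final fully connected layer with a convolutional layer. Error: ValueError: Error when checking target: expected last_conv to have 4 dimensions, but got array with shape (128, 10)
- [x] Check and revise the thesis
- [x] Drawings
- [ ] Novel contributions
- [ ] Improve classification accuracy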
- [x] FCN
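- [x] CAM visualization: describe it more clearly
- [x] Frame dropping, to improve real-time performance
- [x] Add transition sentences between chapters; open each chapter in the future tense and close it in the perfect tense
- [x] CAM visualization: what the heat-map colors represent
- [x] Make the CAM visualization video
- [x] Organize the videos from before frame dropping
- [x] Make the PPT
- [ ] Questions for the PPT presentation:
- [ ] Role of bias correction in the optimizer
- [ ] Role of the fixed small constant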
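For the last two questions, a single Adam-style update makes both roles visible. This is a sketch that assumes the questions refer to Adam-family optimizers (Nadam appears earlier in the list); the default hyperparameters are the commonly used ones, and `adam_step` is a hypothetical helper for one scalar parameter.

```python
import math


def adam_step(param, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a single scalar parameter; t is the 1-based step."""
    m = beta1 * m + (1 - beta1) * grad          # first-moment estimate
    v = beta2 * v + (1 - beta2) * grad * grad   # second-moment estimate
    # Bias correction: m and v start at 0, so early estimates are biased
    # toward 0; dividing by (1 - beta**t) undoes that.
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    # eps keeps the division finite when v_hat is (close to) zero.
    param = param - lr * m_hat / (math.sqrt(v_hat) + eps)
    return param, m, v
```

At `t = 1` the raw `m` is only a tenth of the gradient, but `m_hat` equals the gradient itself, so the first step has magnitude close to `lr` rather than being artificially small.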
video classification
- [x] video input and predict video output
- [x] simple model to classify video, single frame in and out
- [x] video dataset, Drive&Act
- [ ] CNN-based spatio-temporal approach: append flow frames to the RGB frame, input to Xception, then average the predictions over four frames from the video; Motion Fused Frames (MFFs); model is ready
- [ ] Fuse the features and feed them to softmax or an SVM; video_model needs to be modified
- [ ] two-stream
- [ ] I3D, model is ready, flip aug
- [ ] C3D
- [ ] P3D Resnet
- [ ] CNN+LSTM model to classify video
- [ ] AUTO ML?
model trained logs
M1 2epochs
M11 6epochs
M9 4epochs
video_model 5epochs
video_model_change_vd 8epochs
video_model_change_vd_aug 3epochs: the two frames are not from the same action! Network model: the first conv2d does not extract enough features
flow_xception 3epochs
flow_xception_lock70 2epochs
I3D: the data has problems
Try cross-validation
Adjust hyperparameters such as ini_lr and batch_size
mffs_model_dropout_more_shift_modified 5epochs
Websites
Dangers of distracted driving:
https://www.nhtsa.gov/risky-driving/distracted-driving/
Building and configuring models in Keras:
https://keras.io/zh/getting-started/sequential-model-guide/#lstm
Kaggle competition experience:
https://zhuanlan.zhihu.com/p/26309073
https://cloud.tencent.com/developer/article/1556210
https://towardsdatascience.com/distracted-driver-detection-using-deep-learning-e893715e02a4
https://mc.ai/distracted-driver-detection/
Driver behavior detection:
https://xugang.ink/wiki/2018-09-14-%E9%A9%BE%E9%A9%B6%E5%91%98%E8%A1%8C%E4%B8%BA%E6%A3%80%E6%B5%8B/
Human pose estimation:
Video classification:
https://www.pyimagesearch.com/2019/07/15/video-classification-with-keras-and-deep-learning/
Miscellaneous:
https://www.youtube.com/watch?time_continue=1988&v=dwukLOPpaHw&feature=emb_title
Live demo:
https://github.com/luisarojas/distracted-driver-detection
Downloading datasets remotely with wget:
https://blog.csdn.net/yanlisuo/article/details/81329920
Keras functional API:
https://keras.io/getting-started/functional-api-guide/
https://blog.csdn.net/ting0922/article/details/94437540
Ways to reduce overfitting:
http://iot-fans.xyz/2017/11/20/deeplearning/overfitting/
https://www.zhihu.com/question/59201590
https://kaiyuanzhang.com/2018/05/02/overfitting-and-underfitting/
The Kaggle dataset cannot be used in papers:
https://www.kaggle.com/c/state-farm-distracted-driver-detection/discussion/20043
AutoML platforms:
https://zhuanlan.zhihu.com/p/57896464
https://ai.baidu.com/easydl/app/1/models/50071/iterations?deployType=cloud
Using sklearn:
https://www.cnblogs.com/lianyingteng/p/7811126.html
Writing CSV files:
https://www.cnblogs.com/jasontang369/p/9241334.html
Is pretraining on ImageNet really useful:
Choosing the batch size:
https://www.zhihu.com/question/61607442
128 (0.005), 256 (0.01)
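If the two pairs above are read as batch size followed by learning rate (an assumption; the note does not say so), they are consistent with the linear scaling heuristic, where the learning rate grows in proportion to the batch size. A tiny sketch, with the 128 / 0.005 pair taken as the hypothetical baseline:

```python
def scaled_learning_rate(batch_size: int, base_batch_size: int = 128,
                         base_lr: float = 0.005) -> float:
    """Scale the learning rate in proportion to the batch size."""
    return base_lr * batch_size / base_batch_size
```

The heuristic is only a starting point; very large batches often need a warm-up period or a smaller rate than the rule suggests.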
Cross-validation and overfitting:
https://www.cnblogs.com/solong1989/p/9415606.html
CNN tuning summary:
https://flashgene.com/archives/64534.html
100 interview questions:
Model evaluation, overfitting, etc.:
https://flashgene.com/archives/27598.html
Data preprocessing:
https://zhuanlan.zhihu.com/p/29513760
Model fusion:
https://machinelearningmastery.com/ensemble-methods-for-deep-learning-neural-networks/
Computing the loss:
https://scikit-learn.org/stable/modules/model_evaluation.html#log-loss
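The quantity behind that link is the multiclass log loss (cross-entropy). A plain-Python sketch of it, assuming integer class labels and probability rows that already sum to 1, is shown below; `multiclass_log_loss` is a hypothetical name, and `sklearn.metrics.log_loss` is the function to use in practice.

```python
import math
from typing import Sequence


def multiclass_log_loss(y_true: Sequence[int],
                        y_prob: Sequence[Sequence[float]],
                        eps: float = 1e-15) -> float:
    """Mean negative log-probability assigned to the true class."""
    total = 0.0
    for label, probs in zip(y_true, y_prob):
        # Clip so that a predicted probability of exactly 0 does not give log(0).
        p = min(max(probs[label], eps), 1 - eps)
        total += -math.log(p)
    return total / len(y_true)
```

For a fused model, `y_prob` would be the averaged (or otherwise combined) probabilities of the individual models.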
Useful links:
https://github.com/Apoorvajasti/Distracted-Driver-Detection includes video prediction and KNN (the output of the VGG convolutional layers is the KNN input)
https://github.com/nkjsy/Distracted-Driver-Detection 12 of them, has a report
https://github.com/godloveliang/MLND_distracted_driver_tetection poor, Chinese, has a report
https://github.com/mwakaba2/Computer-Vision-Capstone-Project English, includes traditional features, has a report
https://github.com/ksasi/Machine-Learning-Capstone-Project English
https://gitee.com/ltg00/distracted_driver_detection?_from=gitee_search brand-new CAM, has a report
https://github.com/luisarojas/distracted-driver-detection comparison of network architectures
https://github.com/nkcr7/Distracted-Driver-Detection poor, Chinese, has a report
https://github.com/jartantupjar/7-Distracted-Driver-Detection poor, English, has a report
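https://github.com/LiuKaixinHappy/distracted_driver_detection Chinese, fairly good report
https://github.com/NaughtyFlame/distracted_driver_detection poor, Chinese
https://github.com/HarshineeSriram/Distracted-Driver-Detection includes optical flow, SVM, LSTM
https://github.com/Raj1036/ML_Distracted_Driver_Detection includes random forest, SVM, etc.; English
https://github.com/claudehotline/distracted_driver_detection Chinese
When changing video resolution with Format Factory, both the screen size and the aspect ratio in the output settings must be changed; the first attempt used the mp4 format
https://zhuanlan.zhihu.com/p/53828405 survey of CNN model compression and acceleration algorithms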
https://github.com/devyhia/action-annotation video annotation
https://www.pyimagesearch.com/2018/11/26/instance-segmentation-with-opencv/ separating the human body from the background
Skin segmentation:
https://www.pyimagesearch.com/2014/08/18/skin-detection-step-step-example-using-python-opencv/
Installing OpenPose on Colab:
https://github.com/CMU-Perceptual-Computing-Lab/openpose/issues/949
Visualizing CNN filters:
https://keras-cn.readthedocs.io/en/latest/legacy/blog/cnn_see_world/
CNN visualization:
https://blog.csdn.net/xys430381_1/article/details/90413169
Extracting key frames with FFmpeg:
https://blog.csdn.net/u011394059/article/details/78728809
Difference between FLOPS and FLOPs:
https://blog.csdn.net/zt1091574181/article/details/97393278
Space and time complexity of CNNs:
https://zhuanlan.zhihu.com/p/31575074
Development of CNNs:
https://zhuanlan.zhihu.com/p/38681805
Image similarity matching:
https://blog.csdn.net/qq_16234613/article/details/83118222
Implementations of three hashing algorithms:
https://blog.csdn.net/qq_32799915/article/details/81000437
https://www.cnblogs.com/Kalafinaian/p/11260808.html
https://content-blockchain.org/research/testing-different-image-hash-functions/
http://www.hackerfactor.com/blog/?/archives/529-Kind-of-Like-That.html
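As a concrete reference point for the hashing links, here is a sketch of the average hash (aHash), presumably one of the algorithms they cover. It assumes the image has already been converted to grayscale and resized to a small grid (8x8 is typical) by some other tool; both function names are hypothetical.

```python
from typing import List, Sequence


def average_hash(gray: Sequence[Sequence[int]]) -> List[int]:
    """Average hash of a small grayscale grid (e.g. an image resized to 8x8)."""
    pixels = [value for row in gray for value in row]
    mean = sum(pixels) / len(pixels)
    # One bit per pixel: 1 if the pixel is brighter than the mean.
    return [1 if value > mean else 0 for value in pixels]


def hamming_distance(hash_a: Sequence[int], hash_b: Sequence[int]) -> int:
    """Number of differing bits; smaller means more similar images."""
    if len(hash_a) != len(hash_b):
        raise ValueError("hashes must have the same length")
    return sum(a != b for a, b in zip(hash_a, hash_b))
```

Comparing the hashes of consecutive video frames and keeping a frame only when the distance exceeds a threshold is one cheap way to approximate key-frame selection.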