Module 2: Planning and Decision Making

Reading tasks
A Survey on Motion Prediction and Risk Assessment for Intelligent Vehicles [ Link ]
Motion Planning for Autonomous Driving: The State of the Art and Future Perspectives [ Link ]
Deep Reinforcement Learning for Autonomous Driving: A Survey [ Link ]
CARLA: An Open Urban Driving Simulator [ Link ]
MetaDrive: Composing Diverse Driving Scenarios for Generalizable Reinforcement Learning [ Link ]
Robust Physical-World Attacks on Deep Learning Visual Classification [ Link ]
Runtime Stealthy Perception Attacks against DNN-based Adaptive Cruise Control Systems [ Link ]

Blog Post 12: CAP Attack
The paper "Runtime Stealthy Perception Attacks against DNN-based Adaptive Cruise Control Systems" investigates how adversarial attacks on deep neural networks can compromise the safety of Adaptive Cruise Control (ACC) in autonomous vehicles. The authors propose a dynamic, stealthy attack model that modifies live camera feeds to mislead perception modules without triggering safety interventions like emergency braking. Unlike offline attacks, this method adapts in real-time to changing driving conditions and optimizes the timing and magnitude of perturbations for maximum impact. Their CA-Opt attack outperforms baseline methods (CA-Random, CA-APGD) by increasing collision risks while remaining undetected. The study includes both simulation and real-world evaluations, confirming the attack’s transferability and ability to bypass defense mechanisms. Ultimately, the work highlights the importance of robust perception security, especially in high-risk environments like construction zones. [Read more ...]

Blog Post 11: RP2 Attack
This paper introduces RP2 (Robust Physical Perturbation), a novel algorithm that generates physical adversarial perturbations capable of deceiving deep neural network (DNN) classifiers under real-world conditions. The authors target road sign classification, demonstrating how small, graffiti-like stickers applied to stop signs can mislead DNNs into classifying them as different signs (e.g., Speed Limit 45). These attacks remain effective despite environmental changes in lighting, angle, and distance. The study proposes a two-stage evaluation—lab and drive-by tests—to simulate real-world scenarios. Results show up to 100% targeted misclassification in lab tests and 84.8% in field tests. The paper also extends the attack to general objects like microwaves, which are misclassified as phones. By highlighting vulnerabilities in vision-based AI systems, this work emphasizes the urgent need for robust defenses against physical-world adversarial examples in safety-critical applications like autonomous driving. [Read more ...]
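The core loop of such an attack can be sketched in miniature. The snippet below is a toy illustration of RP2's idea, not the paper's implementation: it uses a hypothetical linear "classifier" and a finite-difference gradient, ascending the target-class score of a perturbed input while (1) restricting changes to a sticker-shaped mask and (2) averaging the objective over simple "environmental" transformations so the perturbation stays effective under them. All values and functions here are illustrative.

```python
# Toy sketch of RP2-style masked, transformation-robust perturbation.
# Hypothetical linear model stands in for the DNN; not the paper's code.

def perturbed(x, mask, delta):
    # apply the perturbation only where the sticker mask is 1
    return [xi + mi * di for xi, mi, di in zip(x, mask, delta)]

def target_score(x, w):
    # stand-in for the classifier's logit for the attacker's target class
    return sum(xi * wi for xi, wi in zip(x, w))

def avg_objective(delta, x, w, mask, transforms):
    # average the target score over "environmental" transformations
    xs = [t(perturbed(x, mask, delta)) for t in transforms]
    return sum(target_score(xt, w) for xt in xs) / len(transforms)

def rp2_step(delta, x, w, mask, transforms, lr=0.5, eps=1e-4):
    """One finite-difference ascent step, touching only masked positions."""
    base = avg_objective(delta, x, w, mask, transforms)
    new = list(delta)
    for i in range(len(delta)):
        if mask[i]:
            probe = list(delta)
            probe[i] += eps
            g = (avg_objective(probe, x, w, mask, transforms) - base) / eps
            new[i] += lr * g
    return new

# five-"pixel" sign image; only positions 1 and 3 may carry the sticker
x = [0.2, 0.1, 0.5, 0.3, 0.4]
w = [0.0, 1.0, -0.5, 2.0, 0.1]                        # target-class weights
mask = [0, 1, 0, 1, 0]
transforms = [lambda v: v, lambda v: [0.9 * vi for vi in v]]  # e.g. dimming

delta = [0.0] * 5
for _ in range(10):
    delta = rp2_step(delta, x, w, mask, transforms)
```

After a few steps the perturbation grows only inside the mask and raises the target-class score under both transformations, which is the property RP2 optimizes for at much larger scale.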

Blog Post 10: Driving Simulator
This presentation compares two autonomous driving simulators: CARLA and MetaDrive. CARLA focuses on high realism using rich visuals, sensor data, and complex traffic environments to support perception-based and imitation learning approaches. In contrast, MetaDrive emphasizes fast, scalable, and modular simulation for training reinforcement learning agents in diverse and procedurally generated scenarios. Together, they highlight the trade-offs between visual realism and training efficiency in developing generalizable self-driving systems. [Read more ...]

Blog Post 9: Deep RL
Sujan Gyawali presents the paper Deep Reinforcement Learning for Autonomous Driving: A Survey. The paper summarizes deep reinforcement learning algorithms and provides a taxonomy of automated driving tasks where these methods have been employed. It also discusses the role of simulators in training agents, as well as methods to validate, test, and robustify existing RL solutions. [Read more ...]

Blog Post 8: Motion Planning
This paper reviews and compares current motion planning methods for autonomous driving. It discusses two main approaches: the pipeline planning method, which is modular and easy to interpret but depends on manual rules and can be resource-intensive, and the end-to-end planning method, which uses deep learning for a unified solution yet struggles with explainability. The paper covers key techniques such as global route planning, local behavior/trajectory planning, imitation learning, reinforcement learning, and parallel learning. It also highlights the advantages and challenges of each approach and offers insights into future research directions for building more robust, safe, and efficient autonomous driving systems. [Read more ...]

Blog Post 7: Motion Prediction
The paper presents a comprehensive survey of motion prediction and risk assessment techniques for intelligent vehicles, covering physics-based models, maneuver-based models, and interaction-aware models. It also discusses risk assessment methods, including binary collision prediction, probabilistic risk evaluation, and risk estimation in driving scenarios. [Read more ...]
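The simplest family in the survey's taxonomy, physics-based models, can be illustrated with a constant-velocity (CV) extrapolation paired with a naive binary collision check. This is a minimal sketch of that category, with illustrative positions, velocities, and thresholds rather than anything taken from the paper:

```python
# Minimal physics-based prediction: constant-velocity (CV) extrapolation of
# two vehicles, plus a naive binary collision check (distance threshold).

def predict_cv(pos, vel, horizon, dt=0.1):
    """Predict future (x, y) positions under a constant-velocity assumption."""
    steps = int(horizon / dt)
    x, y = pos
    vx, vy = vel
    return [(x + vx * dt * k, y + vy * dt * k) for k in range(1, steps + 1)]

def binary_collision(traj_a, traj_b, radius=2.0):
    """Flag a collision if the predicted paths come within `radius` meters."""
    return any(((ax - bx) ** 2 + (ay - by) ** 2) ** 0.5 < radius
               for (ax, ay), (bx, by) in zip(traj_a, traj_b))

# ego heading east at 10 m/s; another vehicle heading north across its path
ego = predict_cv((0.0, 0.0), (10.0, 0.0), horizon=3.0)
other = predict_cv((15.0, -15.0), (0.0, 10.0), horizon=3.0)
risk = binary_collision(ego, other)
```

Maneuver-based and interaction-aware models from the survey replace the CV assumption with intention recognition and inter-vehicle dependencies, and probabilistic risk evaluation replaces the binary threshold with a collision probability.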

Module 1: Perception

Reading tasks
Sensor and Sensor Fusion Technology in Autonomous Vehicles: A Review [ Link ]
Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks [ Link ]
You Only Look Once: Unified, Real-Time Object Detection [ Link ]
DeepDriving: Learning Affordance for Direct Perception in Autonomous Driving [ Link ]
Probabilistic 3D Multi-Modal, Multi-Object Tracking for Autonomous Driving [ Link ]
Simple online and realtime tracking with a deep association metric [ Link ]

Blog Post 6: BEVFormer
BEVFormer is a spatiotemporal transformer model designed to generate Bird’s-Eye-View (BEV) representations from multi-camera images for autonomous driving. It aims to overcome the limitations of LiDAR-based systems (high cost) and camera-only methods (poor depth estimation). BEVFormer uses spatial cross-attention to extract features from multi-camera views and temporal self-attention to align and integrate historical BEV frames. These components help track object motion and build a unified BEV feature map for 3D detection and map segmentation. The model uses grid-shaped BEV queries tied to real-world coordinates to guide feature extraction. Evaluated on datasets like nuScenes and Waymo, BEVFormer outperforms previous camera-based methods like DETR3D, achieving a 56.9% NDS score. It approaches the performance of some LiDAR-based models, showing its effectiveness in camera-only setups. The architecture enables better object tracking, motion estimation, and recall for low-visibility objects. However, challenges remain in accurately inferring 3D geometry from 2D images. Overall, BEVFormer pushes camera-based 3D perception closer to practical deployment in autonomous vehicles. [Read more ...]
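The "grid-shaped BEV queries tied to real-world coordinates" can be made concrete with a small sketch. The grid size and perception range below are illustrative, not BEVFormer's exact configuration: each of the H x W queries owns one BEV cell whose center is a ground-plane (x, y) location around the ego vehicle, which is what lets spatial cross-attention sample the right image features for that cell.

```python
# Sketch of mapping grid-shaped BEV queries to ground-plane coordinates
# (toy grid size and range; not BEVFormer's actual configuration).

def bev_query_centers(grid_h, grid_w, x_range, y_range):
    """Return a grid_h x grid_w grid of (x, y) cell centers in meters."""
    x_min, x_max = x_range
    y_min, y_max = y_range
    cell_x = (x_max - x_min) / grid_w
    cell_y = (y_max - y_min) / grid_h
    return [[(x_min + (j + 0.5) * cell_x, y_min + (i + 0.5) * cell_y)
             for j in range(grid_w)]
            for i in range(grid_h)]

# a 4 x 4 toy grid covering +/- 50 m around the ego vehicle
centers = bev_query_centers(4, 4, (-50.0, 50.0), (-50.0, 50.0))
```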

Blog Post 5: SORT
SORT (Simple Online and Realtime Tracking) introduces a fast, lightweight object tracking algorithm that works in real-time by using a tracking-by-detection paradigm. It combines object detection (e.g., using Faster R-CNN), motion prediction via the Kalman filter, and data association using the Hungarian algorithm. While SORT achieves impressive speed and accuracy, it struggles with identity switches, especially during occlusions or when objects are close together. Deep SORT extends SORT by incorporating appearance information through deep learning. It uses a convolutional neural network (CNN) trained on a re-identification dataset to extract appearance features, which are then combined with motion data for better data association. This significantly reduces identity switches and improves tracking through occlusions, although it introduces computational overhead and requires a modern GPU for real-time performance. [Read more ...]
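The association step described above can be sketched compactly. SORT scores Kalman-predicted track boxes against new detections by IoU and solves the assignment with the Hungarian algorithm; in this dependency-free sketch a greedy assignment stands in for the Hungarian step, and the Kalman prediction is omitted, so treat it as an illustration of the idea rather than the algorithm itself.

```python
# SORT-style data association sketch: IoU cost between predicted track boxes
# and detections, matched greedily (the real algorithm uses the Hungarian
# method and Kalman-filter predictions). Boxes are (x1, y1, x2, y2).

def iou(a, b):
    # intersection-over-union of two axis-aligned boxes
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter else 0.0

def associate(tracks, detections, iou_min=0.3):
    """Greedily match each track to its best unclaimed detection by IoU."""
    pairs = sorted(((iou(t, d), ti, di)
                    for ti, t in enumerate(tracks)
                    for di, d in enumerate(detections)), reverse=True)
    matches, used_t, used_d = [], set(), set()
    for score, ti, di in pairs:
        if score >= iou_min and ti not in used_t and di not in used_d:
            matches.append((ti, di))
            used_t.add(ti)
            used_d.add(di)
    return matches

tracks = [(0, 0, 10, 10), (20, 20, 30, 30)]       # predicted track boxes
detections = [(21, 21, 31, 31), (1, 1, 11, 11)]   # detector output
matches = associate(tracks, detections)
```

Deep SORT's contribution is to augment this IoU/motion cost with an appearance distance from a re-identification CNN, which is what suppresses identity switches through occlusions.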

Blog Post 4: DeepDriving
The paper "DeepDriving: Learning Affordance for Direct Perception in Autonomous Driving" introduces a novel approach to autonomous driving by predicting key driving affordances rather than processing the entire scene or mapping images directly to commands. Traditional Mediated Perception methods require complex scene reconstruction, while Behavior Reflex methods lack interpretability. The proposed Direct Perception model extracts 13 key affordances such as lane distances, heading angles, and vehicle distances using a Convolutional Neural Network (CNN). This study highlights how Direct Perception improves efficiency, interpretability, and generalization to real-world driving scenarios by balancing perception-based and reflex-based approaches. [Read more ...]

Blog Post 3: YOLO
This paper presents YOLO, a unified, real-time object detection method that frames detection as a single regression problem, predicting bounding boxes and class probabilities directly from full images in one network evaluation. YOLO is significantly faster than region-based detectors and more accurate than other real-time methods, although it makes more localization errors. The authors also show that combining YOLO with Fast R-CNN reduces background false positives, yielding higher accuracy than Fast R-CNN alone. [Read more ...]
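YOLO's unified formulation comes down to one output tensor: for PASCAL VOC the paper uses an S x S grid (S = 7) where each cell predicts B = 2 boxes and C = 20 class probabilities, giving an S x S x (B*5 + C) output. The decoding helper below is a simplified sketch of how one cell's slice splits into boxes and class scores:

```python
# YOLO output layout sketch: S x S grid, B boxes (x, y, w, h, confidence)
# and C class probabilities per cell (paper's VOC config: S=7, B=2, C=20).

S, B, C = 7, 2, 20

def output_size(s=S, b=B, c=C):
    # total number of predictions in the final tensor
    return s * s * (b * 5 + c)

def decode_cell(cell):
    """Split one cell's (B*5 + C)-vector into boxes and class scores."""
    boxes = [tuple(cell[k * 5:k * 5 + 5]) for k in range(B)]  # x,y,w,h,conf
    class_probs = cell[B * 5:]
    return boxes, class_probs

flat = output_size()              # 7 * 7 * 30 = 1470, matching the paper
cell = [0.1] * (B * 5 + C)        # dummy predictions for one grid cell
boxes, probs = decode_cell(cell)
```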

Blog Post 2: Sensor Fusion
The paper "Sensor and Sensor Fusion Technology in Autonomous Vehicles: A Review" provides a comprehensive overview of the role of sensors in autonomous vehicles (AVs), emphasizing their importance in perception, localization, and decision-making. It examines key sensor technologies such as cameras, LiDAR, and radar, discussing their strengths, limitations, and performance under various environmental conditions. The paper highlights the necessity of sensor calibration as a prerequisite for accurate data fusion and object detection, reviewing available open-source calibration tools. Additionally, it categorizes sensor fusion approaches into high-level, mid-level, and low-level fusion, evaluating state-of-the-art algorithms that enhance object detection and overall driving safety. The review concludes by addressing challenges in sensor fusion, such as data synchronization and environmental adaptability, while proposing future research directions for improving autonomous vehicle technology. [Read more ...]

Blog Post 1: Faster R-CNN
The paper introduces Faster R-CNN, a deep learning-based object detection framework that improves upon previous region-based detection models by integrating a Region Proposal Network (RPN). Unlike earlier methods that relied on computationally expensive region proposal algorithms, Faster R-CNN shares convolutional features between region proposal and object detection networks, making the process nearly cost-free. The RPN generates region proposals efficiently, which are then refined by the Fast R-CNN detector. The experimental results demonstrate that Faster R-CNN significantly improves detection accuracy while achieving real-time processing speeds, making it a powerful tool for object detection tasks. [Read more ...]
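The RPN's anchor mechanism can be sketched directly: at each feature-map location it places k reference boxes built from a set of scales and aspect ratios (the paper uses 3 scales x 3 ratios, k = 9), and the network regresses proposals relative to them. The snippet below is a simplified sketch of that anchor construction; the stride and cell indices are illustrative.

```python
# RPN anchor-generation sketch: k = scales x ratios anchors per location
# (paper's config: 3 scales x 3 ratios = 9). Boxes are (x1, y1, x2, y2).

def anchors_at(cx, cy, scales=(128, 256, 512), ratios=(0.5, 1.0, 2.0)):
    """Generate scale x ratio anchor boxes centered at (cx, cy)."""
    boxes = []
    for s in scales:
        for r in ratios:
            # keep the area near s*s while setting height/width to ratio r
            w = s / r ** 0.5
            h = s * r ** 0.5
            boxes.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return boxes

# anchors for feature-map cell (3, 2), mapped back through a stride-16 net
stride = 16
anchors = anchors_at(3 * stride + stride / 2, 2 * stride + stride / 2)
```

Because these anchors are generated on the shared convolutional feature map, proposal generation adds almost no cost on top of the detection network, which is the "nearly cost-free" property described above.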

Example: Machine Learning Applications

Reading tasks
Deep Residual Learning for Image Recognition [ Link ]
Attention Is All You Need [ Link ]

Blog Post Example: ResNet
As neural networks grow deeper, training problems such as vanishing and exploding gradients arise, and accuracy degrades even as capacity increases; this paper was written to address that degradation. It proposes deep residual networks (ResNets): by introducing "shortcut connections," the study eases the optimization of very deep networks and has had a major impact on deep learning. The method explicitly reformulates the network layers as learning residual functions relative to their inputs. Because residuals are easier to optimize, much deeper models can be trained effectively, which resolves the performance degradation that otherwise appears as layers are added. The experiments show significant improvements on large-scale visual recognition tasks such as ImageNet and CIFAR-10, and ResNets' results in major competitions such as ILSVRC and COCO 2015 further demonstrate their power and wide applicability. [Read more ...]
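The residual reformulation fits in a few lines. This is a minimal sketch with plain Python lists instead of tensors: a block computes F(x) and adds the identity shortcut, so the layers only need to learn the residual F(x) = H(x) - x. The elementwise "layer" here is a toy stand-in for the paper's convolutional stack.

```python
# Minimal residual-block sketch: y = ReLU(F(x) + x), identity shortcut.
# Toy elementwise "layer" stands in for the paper's conv layers.

def relu(v):
    return [max(0.0, x) for x in v]

def residual_block(x, layer):
    """Add the identity shortcut to the residual branch F(x)."""
    fx = layer(x)
    return relu([f + xi for f, xi in zip(fx, x)])

# if the desired mapping is (near) identity, the residual branch can simply
# learn to output zeros, which is easier than fitting identity directly
zero_layer = lambda x: [0.0] * len(x)
y = residual_block([1.0, -2.0, 3.0], zero_layer)
```

This is the intuition behind why degradation disappears: a deeper network can always fall back to identity behavior by driving its residual branches toward zero.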