CARLA: An Open Urban Driving Simulator

Authors: Alexey Dosovitskiy, German Ros, Felipe Codevilla, Antonio Lopez, Vladlen Koltun

MetaDrive: Composing Diverse Driving Scenarios for Generalizable Reinforcement Learning

Authors: Quanyi Li, Zhenghao Peng, Lan Feng, Qihang Zhang, Zhenghai Xue, Bolei Zhou

Presentation by: Ruslan Akbarzade

Time of Presentation: April 15, 2025

Blog post by: Sujan Gyawali

Link to CARLA: Read the Paper
Link to MetaDrive: Read the Paper

Summary of the Paper

This presentation compares two autonomous driving simulators: CARLA and MetaDrive. CARLA focuses on high realism using rich visuals, sensor data, and complex traffic environments to support perception-based and imitation learning approaches. In contrast, MetaDrive emphasizes fast, scalable, and modular simulation for training reinforcement learning agents in diverse and procedurally generated scenarios. Together, they highlight the trade-offs between visual realism and training efficiency in developing generalizable self-driving systems.

Presentation Breakdown

Introduction

Introduction

The slide introduces two simulators used in autonomous driving research: CARLA and MetaDrive. CARLA is a realistic urban driving simulator created in 2017 to help researchers safely train and test self-driving models. MetaDrive, on the other hand, is designed to build diverse driving situations for reinforcement learning agents. It focuses more on speed, efficiency, and generalization. The slide sets the stage for a comparative study between these two tools, showing how each serves different purposes in advancing autonomous vehicle technology.

Suggested Approach

Introduction & Motivation

The slide explains the motivation behind developing CARLA. Testing autonomous vehicles on real roads is risky, especially for rare but dangerous events like children running into the street. Real-world driving is unpredictable, expensive, and full of unique situations that are hard to control. Deep learning models require massive and varied training data, which is difficult to gather from physical testing alone. CARLA addresses these issues by offering a realistic, controllable, and open simulation platform. It allows researchers to safely simulate complex environments, making it a powerful tool for training self-driving systems.

Key Concepts & Terminology

Literature Background – Simulators Before CARLA

The slide reviews older driving simulators used before CARLA and explains why they were not suitable for realistic autonomous driving research. TORCS, released in 1997, was open-source and lightweight but only supported racetracks without pedestrians or traffic rules, making it unrealistic for urban driving studies. Commercial games like GTA V offered rich graphics and traffic environments but lacked access to internal control systems and did not support sensor modeling or custom training tasks. Academic simulators were often built for a single paper and were hard to reuse or generalize. These limitations highlighted the need for a more flexible and realistic simulator like CARLA that could support diverse, safe, and reproducible research.

What is CARLA?

What is CARLA?

This slide introduces CARLA (CAR Learning to Act), an open-source simulator built specifically for autonomous driving research. It was developed from scratch on Unreal Engine 4 to offer high-quality graphics and realistic physics. CARLA provides a variety of prebuilt urban environments with buildings, vehicles, pedestrians, and configurable weather conditions. It supports many types of sensors, such as RGB cameras, depth sensors, GPS, and semantic segmentation cameras. What makes CARLA unique is its focus on self-driving systems: it is not just a modified video game. It offers full control over scenes, supports different learning approaches like modular pipelines, imitation learning, and reinforcement learning, and allows precise testing through scripting and feedback on collisions or traffic rule violations.

Inside CARLA – Simulation Engine & Architecture

Inside CARLA – Simulation Engine & Architecture

This slide explains the technical setup of CARLA, which is built on Unreal Engine 4 for realistic visuals and physics. CARLA uses a client-server architecture where the server handles the simulation and the client (written in Python) sends commands, changes the environment, and receives sensor data. Users can control everything from vehicle behavior to weather and sensor types. The system is fully programmable, supporting RGB cameras, depth sensors, GPS, and more. The diagram on the right shows how CARLA’s server communicates with Python scripts and how sensor outputs are collected. It also illustrates how different levels—like perception, planning, and control—work together to run an autonomous vehicle in simulation.
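
To make the client-server loop concrete, here is a minimal sketch of a CARLA Python client, assuming a simulator already running on the default localhost:2000 endpoint; blueprint names, sensor attributes, and image fields vary slightly across CARLA versions. The client connects to the server, changes the weather, spawns a vehicle with an RGB camera, and sends a control command.

```python
# Minimal CARLA client sketch (assumes a CARLA server is already running).
import carla

# Connect the Python client to the CARLA server.
client = carla.Client("localhost", 2000)
client.set_timeout(10.0)
world = client.get_world()

# Change the environment from the client side, e.g. the weather.
world.set_weather(carla.WeatherParameters.WetCloudyNoon)

# Spawn a vehicle from the blueprint library at a predefined spawn point.
blueprint_library = world.get_blueprint_library()
vehicle_bp = blueprint_library.filter("vehicle.*")[0]
spawn_point = world.get_map().get_spawn_points()[0]
vehicle = world.spawn_actor(vehicle_bp, spawn_point)

# Attach an RGB camera and receive sensor data through a callback.
camera_bp = blueprint_library.find("sensor.camera.rgb")
camera = world.spawn_actor(
    camera_bp,
    carla.Transform(carla.Location(x=1.5, z=2.4)),
    attach_to=vehicle,
)
camera.listen(lambda image: image.save_to_disk("out/%06d.png" % image.frame))

# Send a driving command back to the server.
vehicle.apply_control(carla.VehicleControl(throttle=0.5, steer=0.0))
```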

CARLA in Autonomous Driving – Performances (Modular Pipeline)

CARLA in Autonomous Driving – Performances (Modular Pipeline)

This slide shows how a modular pipeline works in CARLA for self-driving tasks. The system is divided into three parts: perception, planning, and control. In the perception module, the system uses RefineNet to understand the scene by labeling roads, lanes, cars, and people. It also uses AlexNet to detect intersections. The planning part uses rules to decide what the car should do next—like turning or stopping—based on its current situation. The control module then uses a PID controller to move the car by adjusting the steering, throttle, and brakes. This classic method is easy to understand and is often used in early self-driving car models.
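
As an illustration of the control module, below is a minimal PID controller sketch; the gains, time step, and speed values are placeholders, not the settings used in the CARLA paper.

```python
# Illustrative PID controller for longitudinal (speed) control.
class PIDController:
    def __init__(self, kp, ki, kd, dt=0.05):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral = 0.0
        self.prev_error = 0.0

    def step(self, error):
        # Accumulate the integral term and estimate the derivative of the error.
        self.integral += error * self.dt
        derivative = (error - self.prev_error) / self.dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative

# Example: track a target speed by adjusting the throttle (values are illustrative).
target_speed, current_speed = 10.0, 7.5  # m/s
speed_pid = PIDController(kp=0.5, ki=0.1, kd=0.05)
throttle = max(0.0, min(1.0, speed_pid.step(target_speed - current_speed)))
```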

CARLA in Autonomous Driving – Imitation Learning

CARLA in Autonomous Driving – Imitation Learning

This slide explains how imitation learning works in CARLA. The system learns by watching expert drivers. It takes camera images, the current speed, and a high-level command as input, and trains a deep network to map them to driving actions. The network has separate modules for perceiving the image and the measurements, and the high-level command (follow the lane, turn, or go straight) selects which output branch produces the steering, throttle, and brake, as sketched below.
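
The following PyTorch sketch shows the general shape of such a command-conditioned (branched) policy; the layer sizes, input resolution, and number of command branches are illustrative and do not reproduce the exact architecture from the paper.

```python
# Illustrative command-conditioned imitation learning policy.
import torch
import torch.nn as nn

class BranchedPolicy(nn.Module):
    def __init__(self, num_commands=4):
        super().__init__()
        # Perception backbone: turns the camera image into a feature vector.
        self.image_encoder = nn.Sequential(
            nn.Conv2d(3, 32, 5, stride=2), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, 128), nn.ReLU(),
        )
        # Measurement encoder for scalar inputs such as speed.
        self.speed_encoder = nn.Sequential(nn.Linear(1, 32), nn.ReLU())
        # One output head per high-level command; each predicts
        # steering, throttle, and brake.
        self.branches = nn.ModuleList(
            [nn.Sequential(nn.Linear(160, 64), nn.ReLU(), nn.Linear(64, 3))
             for _ in range(num_commands)]
        )

    def forward(self, image, speed, command):
        features = torch.cat(
            [self.image_encoder(image), self.speed_encoder(speed)], dim=1)
        # The command (e.g. "turn left") selects which branch drives the car.
        outputs = torch.stack([b(features) for b in self.branches], dim=1)
        return outputs[torch.arange(image.shape[0]), command]

# Example forward pass: one image, one speed value, one command (all illustrative).
image = torch.zeros(1, 3, 88, 200)
speed = torch.tensor([[5.0]])
command = torch.tensor([1])
action = BranchedPolicy()(image, speed, command)  # -> steering, throttle, brake
```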

Experimental Setup Overview

Experimental Setup Overview

This slide shows how the experiments were done. CARLA has two towns: Town 1 for training and Town 2 for testing. Each town has roads, sidewalks, and traffic. Three methods were tested: modular pipeline, imitation learning, and reinforcement learning. Each method was tested in different weather and locations using four tasks—driving straight, turning once, navigating with no traffic, and navigating with full traffic. Each setup ran 25 times.

Results / Discussion

Results / Discussion

This slide shows how well each method performed. Imitation Learning worked best in most cases. Modular Pipeline did well in training areas but got worse in new towns. Reinforcement Learning performed the worst, especially when there were obstacles or changes. The hardest setup for all methods was a new town with new weather, which tested how well they could adapt to new situations.

What Did We Learn From CARLA?

What Did We Learn From CARLA?

This slide shows what we learned by using CARLA. It tested three driving methods: modular pipeline, imitation learning, and reinforcement learning. Imitation learning did the best overall, even in new towns and weather. Reinforcement learning had trouble and needed a lot of training. CARLA helped test all methods in a safe, repeatable way, but it requires strong computers and time to run well.

What Did We Learn From CARLA?

What Did We Learn From CARLA?

The video shows the different simulation environments and features available in CARLA.

MetaDrive – Introduction & Motivation

MetaDrive – Introduction & Motivation

This slide introduces MetaDrive, a simulator made in 2022 for training AI using reinforcement learning. It is small, fast, and easy to use. MetaDrive creates many driving scenarios automatically and can also use real-world maps. It helps test how well AI can learn safe driving, work with other cars, and handle new roads. It runs fast and works well with popular AI tools like OpenAI Gym and Stable Baselines.

MetaDrive – Motivation

MetaDrive – Motivation

This slide explains why MetaDrive was created. Older simulators like CARLA look realistic but are slow and hard to use for reinforcement learning. MetaDrive is faster and easier to scale. It creates many different road layouts and traffic setups using blocks. This helps train AI that can handle new situations. MetaDrive focuses on efficiency for research, not just visual quality.

MetaDrive – Features & System Design

MetaDrive – Features & System Design

This slide shows what makes MetaDrive powerful. It builds different types of roads from small blocks instead of hand-drawn maps, and it can also import real map data from sources like Waymo and Argoverse. Users can customize the observation and action spaces, the reward function, and the termination conditions. It runs fast, uses little memory, and supports many vehicles at once for testing. The top images show road layouts recreated from the Waymo and Argoverse datasets. The bottom image shows how MetaDrive builds a driving map by combining different road pieces like curves, ramps, and roundabouts.
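
As a rough illustration, the sketch below configures a MetaDrive environment with a few of these settings. The config keys follow MetaDrive's documented style but are assumptions that may differ by version (older releases used "environment_num" instead of "num_scenarios", for example), and the reset/step signatures depend on whether the classic Gym or the newer Gymnasium interface is installed.

```python
# Illustrative MetaDrive environment configuration (keys may vary by version).
from metadrive import MetaDriveEnv

env = MetaDriveEnv(config={
    "num_scenarios": 100,    # how many procedurally generated scenarios to sample from
    "map": 7,                # number of road blocks per generated map
    "traffic_density": 0.2,  # rough density of surrounding traffic vehicles
    "start_seed": 0,         # seed controlling which scenarios are generated
})

obs = env.reset()
obs, reward, done, info = env.step(env.action_space.sample())
env.close()
```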

MetaDrive – Capabilities

MetaDrive – Capabilities

This slide explains what kind of tasks MetaDrive can handle. It supports single-agent tasks like lane following, goal reaching, and intersection handling. It also allows multi-agent training, where cars share the road and learn to cooperate or compete. Users can adjust how hard the task is by changing traffic, road shapes, or the number of vehicles. The main goals are to test how stable, efficient, and generalizable the AI is. The images show different views and sensors used in MetaDrive, like RGB cameras, point clouds, and bird's-eye view. There is also a human-in-the-loop setup with a steering wheel for control.

MetaDrive – Does It Close the Gaps?

MetaDrive – Does It Close the Gaps?

This slide compares MetaDrive with other driving simulators. Many simulators are either realistic but slow, or fast but missing key features. MetaDrive offers a good balance: it supports unlimited scenarios, multiple agents, custom maps, and real-data import while staying lightweight and fast. The table compares popular simulators like CARLA, GTA V, TORCS, and SUMO. MetaDrive checks most boxes, showing it can close many gaps that other simulators leave open.

Training and Evaluation Setup

Training and Evaluation Setup

This slide explains how agents are trained and tested in MetaDrive. It uses common reinforcement learning methods like PPO, A3C, TD3, and SAC. The agents interact with the simulator using a standard OpenAI Gym format. MetaDrive can run many simulations at once to save time and uses rewards to guide the agents' behavior. The diagrams show how the environment and agent communicate using the PPO and A3C algorithms. The Gym logo shows compatibility with popular AI training tools.
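
A minimal sketch of this setup follows, assuming MetaDrive's Gym-style interface and the Stable-Baselines3 implementation of PPO; the config values and hyperparameters are placeholders, and a compatibility wrapper may be needed depending on the MetaDrive and Gym versions installed.

```python
# Illustrative PPO training loop on MetaDrive via Stable-Baselines3.
from metadrive import MetaDriveEnv
from stable_baselines3 import PPO

# MetaDrive exposes a standard Gym-style env, so RL libraries can use it directly.
env = MetaDriveEnv(config={"num_scenarios": 50, "start_seed": 0})

# PPO repeatedly calls reset()/step() on the env and optimizes the policy
# from the rewards MetaDrive returns.
model = PPO("MlpPolicy", env, verbose=1)
model.learn(total_timesteps=100_000)
model.save("metadrive_ppo")
```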

Modular Managers and Scenario Composition

Modular Managers and Scenario Composition

This slide explains how managers in MetaDrive control different parts of the simulation. These include the map, traffic, objects like cones or gates, and the ego (main) car. We can mix different managers to build rich and varied driving scenes. Everything can be controlled using simple Python commands. The top row shows how different managers (like multi-agent or object managers) change the layout of a roundabout. The bottom row shows similar setups using real-world Waymo maps.

MetaDrive - Creating Roads Using Blocks

MetaDrive - Creating Roads Using Blocks

This slide explains how MetaDrive creates road networks. Roads can be built in two ways: by using small pieces like curves and ramps (procedural generation), or by importing real maps. A smart method called BIG (Block Incremental Generation) adds one road piece at a time until a complete map is ready. The top image shows types of road blocks used—like curves, forks, and roundabouts. The bottom rows show examples of maps made with 5, 7, and 20 blocks. The right side shows how maps are described using simple code.
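
The sketch below conveys the idea of block-incremental generation in plain Python. It is a conceptual illustration, not MetaDrive's actual BIG implementation; the block names and the feasibility check are hypothetical.

```python
# Conceptual sketch of block-incremental map generation: road blocks are
# appended one at a time, and a candidate block is rejected and re-sampled
# if it cannot be attached to the existing layout.
import random

BLOCK_TYPES = ["straight", "curve", "ramp", "fork", "roundabout"]  # illustrative

def generate_map(num_blocks, can_attach, rng=random.Random(0)):
    """Grow a map by adding one block at a time until num_blocks are placed."""
    layout = ["spawn"]  # every map starts from an initial spawn block
    while len(layout) < num_blocks + 1:
        candidate = rng.choice(BLOCK_TYPES)
        if can_attach(layout, candidate):
            layout.append(candidate)  # accept the block and keep growing
        # otherwise: discard this candidate and sample another one
    return layout

# Example with a trivial feasibility test (a real check would verify that
# the new block's geometry does not overlap existing roads).
print(generate_map(5, can_attach=lambda layout, block: True))
```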

Ethical Statement/Limitations

Ethical Statement/Limitations

This slide presents the ethical concerns and limitations of MetaDrive. It warns about the risks of using AI trained in simulation directly in real life and notes that MetaDrive is only for research. The visual comparison again shows that MetaDrive has lower visual realism than real-world driving data.

Conclusion

Conclusion

This slide compares MetaDrive and CARLA. MetaDrive is fast, flexible, and good for training reinforcement learning (RL) agents in many types of driving tasks. It works well for safe and scalable AI training. CARLA, on the other hand, is very realistic and great for testing vision-based models and sensor setups. It’s useful when trying to transfer simulation to real-world driving (Sim2Real). The MetaDrive images show simple, fast-running driving environments. The CARLA images show rich and detailed scenes with lighting and pedestrians, useful for testing camera-based AI.

Conclusion – Beyond the Paper

Conclusion – Beyond the Paper

This slide looks at where the research is heading. One major goal is Sim2Real, moving trained AI from simulation to the real world. Another focus is corner-case simulation, testing rare and risky situations. MetaDrive is also being improved for multi-agent learning, such as toll gates or group driving. In the future, these simulators can help build safe self-driving cars, drones, and assistive robots. They reduce testing costs, improve safety, and allow AI to train in situations too dangerous to try in real life. The logos and diagrams show other tools like VISTA and Scenic, which also focus on training AI with safety in mind.

Conclusion - My Thoughts

Conclusion - My Thoughts

This slide shares the presenter’s view. MetaDrive is great for fast and flexible training, especially for testing new ideas quickly. CARLA, on the other hand, is better for learning based on visuals and realism. MetaDrive is efficient, while CARLA is rich in detail and closer to real-world driving. Both tools have strengths and are best used based on the goal of the project. The slide highlights how both simulators serve clear but different purposes. One is for speed and scale; the other for realism and accuracy.

Discussion Questions

Discussion Questions

This slide asks key questions about choosing the right simulator. Should you prioritize realism (like CARLA) or speed and flexibility (like MetaDrive)? It also raises concerns about missing pedestrians in training and asks whether simulators should focus more on real-world accuracy or efficient learning.

Discussion and Class Insights

Q1: You’re developing an AI system that autonomously drives ambulances through congested urban areas. The system must learn how to handle unpredictable traffic, make fast but safe decisions, and react to edge cases like roadblocks or aggressive drivers. Which simulator (CARLA or MetaDrive) would you choose to prototype and test this system, and why? What would be your testing priorities: realism, response time, or decision diversity?

Aleksandar: For the first case, I would choose CARLA. It lets me simulate different corner cases and hidden dangers using scripts. MetaDrive doesn’t support this kind of detailed scenario control, so it wouldn’t be as useful here.

George: I agree CARLA is a better choice for the first question because it has more visual detail and can simulate things like pedestrians more realistically.

Obiora: When I was listening to the presentation, I kept thinking about how this could be used in construction. I didn’t even know that CARLA has a construction environment; that was new to me. I’ve actually been searching for simulation tools that are already set up for construction use.

Q2: MetaDrive doesn't include pedestrians or cyclists, and CARLA has limited human-like behavior. If your agent is trained in a world where pedestrians don't exist, is it “fair” to expect it to handle them safely in the real world? Why? How can we include the effects of pedestrians in traffic?

Aleksandar: For the second question, I don’t think it’s fair to expect an agent trained without obstacles like pedestrians to perform safely in the real world. The real world is very different—if the agent has never seen those challenges during training, it won’t know how to react properly.

George: For the second question, if the training environment doesn’t include pedestrians, then the model won't know how to detect or react to them properly. That’s why it’s important to combine simulation data with real-world data, especially for rare or risky situations. This helps create a more balanced and complete training dataset. Ideally, the model should be trained on scenarios that include pedestrians so it can learn to handle them safely.

Audience Questions and Answers

No questions were asked during the presentation.