BDD100K: The Largest Open-Source Self-Driving Dataset from UC Berkeley!

Sept. 25, 2018, 2:23 a.m. By: Kirti Bakshi


Self-driving cars are on the verge of transforming the way we travel. Contributing to that shift, UC Berkeley's Artificial Intelligence Research Lab (BAIR) has open-sourced its newest driving dataset, BDD100K, which arrives with rich annotations.

What is BDD100K all about?

As the name suggests, BDD100K (from Berkeley Deep Drive, BDD) comprises 100,000 video sequences. Each sequence is about 40 seconds long and is recorded at a moderately high resolution: 720p at 30 frames per second.
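Taking the quoted figures at face value, a quick calculation puts the scale in perspective. This is only an estimate, since clip lengths are approximate; the constant and function names are our own illustration, not part of any BDD100K tooling.

```python
# Back-of-the-envelope size of BDD100K from the figures quoted above:
# 100,000 clips, about 40 s each, recorded at 30 frames per second.
NUM_CLIPS = 100_000
SECONDS_PER_CLIP = 40  # approximate; real clip lengths vary
FPS = 30


def total_hours(clips: int, seconds_per_clip: float) -> float:
    """Total footage in hours."""
    return clips * seconds_per_clip / 3600


def total_frames(clips: int, seconds_per_clip: float, fps: int) -> int:
    """Total number of video frames across all clips."""
    return round(clips * seconds_per_clip * fps)


if __name__ == "__main__":
    print(f"{total_hours(NUM_CLIPS, SECONDS_PER_CLIP):.0f} hours")
    print(f"{total_frames(NUM_CLIPS, SECONDS_PER_CLIP, FPS):,} frames")
```

That works out to roughly 1,100 hours of driving and about 120 million raw video frames.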

The videos were collected at various locations in the United States at different times of day. To show the rough driving trajectories, they come with GPS information recorded by mobile phones, along with IMU data and timestamps.
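The GPS track is what makes rough trajectory analysis possible. As a minimal sketch, assuming the fixes for one clip have already been extracted as (latitude, longitude) pairs in degrees (the dataset's actual file layout is not covered here, and the function names are hypothetical), the length of a clip's route can be estimated by summing great-circle (haversine) distances between consecutive fixes:

```python
import math
from typing import Iterable, Tuple

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres


def haversine_m(p: Tuple[float, float], q: Tuple[float, float]) -> float:
    """Great-circle distance in metres between two (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, p)
    lat2, lon2 = map(math.radians, q)
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    a = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def trajectory_length_m(fixes: Iterable[Tuple[float, float]]) -> float:
    """Approximate route length: sum of distances between consecutive GPS fixes."""
    pts = list(fixes)
    return sum(haversine_m(a, b) for a, b in zip(pts, pts[1:]))
```

Haversine treats the Earth as a sphere, which is more than precise enough for the rough trajectories that phone GPS provides.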

So, what makes this dataset unique and rich?

What makes the dataset so rich to work with is the range of weather conditions its videos cover, such as rain, overcast skies, sunshine and haze. It also strikes a good balance between daytime and nighttime scenes.
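One practical use of this diversity is checking how a training split is balanced across conditions. The sketch below assumes each frame's label record stores scene metadata under an "attributes" key with "weather" and "timeofday" fields; those field names and values, and the sample records, are assumptions for illustration and should be verified against the official BDD100K label documentation.

```python
from collections import Counter
from typing import Dict, Iterable, List

# Hypothetical label records. We assume scene metadata sits under an
# "attributes" key with "weather" and "timeofday" fields; verify the
# field names and values against the official BDD100K documentation.
SAMPLE_LABELS: List[Dict] = [
    {"name": "a.jpg", "attributes": {"weather": "rainy", "timeofday": "night"}},
    {"name": "b.jpg", "attributes": {"weather": "clear", "timeofday": "daytime"}},
    {"name": "c.jpg", "attributes": {"weather": "overcast", "timeofday": "daytime"}},
    {"name": "d.jpg", "attributes": {"weather": "rainy", "timeofday": "daytime"}},
]


def tally(labels: Iterable[Dict], field: str) -> Counter:
    """Count frames per value of one scene attribute ("undefined" if missing)."""
    return Counter(rec.get("attributes", {}).get(field, "undefined") for rec in labels)


if __name__ == "__main__":
    print(tally(SAMPLE_LABELS, "weather"))
    print(tally(SAMPLE_LABELS, "timeofday"))
```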

The annotations are also color-coded so they are easy to tell apart. Lane markings are split into two types: vertical lane markings, which run along the driving direction, are shown in red, and parallel lane markings are shown in blue. Drivable areas are likewise split into directly drivable paths, shown in red, and alternative drivable paths, shown in blue.
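The red and blue overlays are a visualization; in a training pipeline, drivable areas would typically be handled as a per-pixel class mask. As a minimal sketch, assuming a hypothetical encoding of 0 for directly drivable, 1 for alternative and 2 for background pixels (check the encoding used by the actual release), this measures how much of a frame each class covers:

```python
from typing import Dict, List

# Hypothetical class IDs for a per-pixel drivable-area mask; the real
# BDD100K encoding should be checked before relying on these values.
DIRECT, ALTERNATIVE, BACKGROUND = 0, 1, 2
NAMES = {DIRECT: "direct", ALTERNATIVE: "alternative", BACKGROUND: "background"}


def area_fractions(mask: List[List[int]]) -> Dict[str, float]:
    """Fraction of image pixels assigned to each drivable-area class."""
    counts = {name: 0 for name in NAMES.values()}
    total = 0
    for row in mask:
        for pixel in row:
            counts[NAMES[pixel]] += 1  # assumes every pixel is 0, 1 or 2
            total += 1
    if total == 0:
        return {name: 0.0 for name in counts}
    return {name: c / total for name, c in counts.items()}


# A toy 2x4 mask: top row background, bottom row mixes both path types.
toy_mask = [
    [2, 2, 2, 2],
    [1, 0, 0, 1],
]
```

For the toy mask, a quarter of the pixels are directly drivable, a quarter alternative and half background.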

The dataset's uses do not stop at building self-driving cars. It can also be used to train models that detect pedestrians on roads and pavements: it currently contains over 85,000 pedestrian instances, which makes it well suited to this task.
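A pedestrian-detection experiment usually starts by pulling the relevant boxes out of the labels. The sketch below assumes each object label is a dict with a "category" string and a "box2d" dict of pixel corners x1, y1, x2, y2, and that pedestrians use the category name "person"; all of these, including the helper name and the optional minimum-height filter, are assumptions for illustration rather than the dataset's confirmed schema.

```python
from typing import Dict, Iterable, List

# Hypothetical schema: each object has a "category" string and a "box2d"
# dict with pixel corners x1, y1, x2, y2. The pedestrian category name is
# an assumption; check the official label documentation.
PEDESTRIAN_CATEGORY = "person"


def pedestrian_boxes(objects: Iterable[Dict], min_height: float = 0.0) -> List[Dict]:
    """Return pedestrian boxes whose pixel height is at least min_height."""
    kept = []
    for obj in objects:
        box = obj.get("box2d")
        if obj.get("category") != PEDESTRIAN_CATEGORY or box is None:
            continue  # skip other classes and labels without a 2D box
        if box["y2"] - box["y1"] >= min_height:
            kept.append(box)
    return kept


# Toy frame: two pedestrians (60 px and 12 px tall), a car, and a box-less label.
frame = [
    {"category": "person", "box2d": {"x1": 10, "y1": 20, "x2": 30, "y2": 80}},
    {"category": "person", "box2d": {"x1": 50, "y1": 40, "x2": 55, "y2": 52}},
    {"category": "car", "box2d": {"x1": 100, "y1": 100, "x2": 200, "y2": 160}},
    {"category": "person"},
]
```

Filtering out very small boxes is a common choice when a detector is not expected to find distant pedestrians; the threshold is a tuning decision, not something the dataset prescribes.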

How is BDD100K better, and why is it the world's largest?

The dataset is backed by the Berkeley Deep Drive (BDD) Industry Consortium, which studies computer vision and machine learning applications for vehicles. The dataset currently contains almost one million cars, more than 300,000 street signs, 130,000 pedestrians and much more. BDD100K is considered especially suitable for training computer vision models to detect and avoid pedestrians on the street, because it contains more people than other datasets.
As the comparison image below shows, the claim that this is the largest and most modifiable self-driving dataset ever released is not exaggerated in the slightest.


Back in March, Baidu released ApolloScape, then considered the largest dataset in this domain. Berkeley's release is 800 times larger than Baidu's ApolloScape, 4,800 times larger than Mapillary's dataset and an incredible 8,000 times larger than KITTI.

Below is a summary of the dataset's key annotation types:

  • Road Object Detection

  • Instance Segmentation

  • Drivable Area

  • Lane Markings


Open-sourcing datasets like BDD100K will greatly help the autonomous driving field. Anyone can also take part in the three challenges Berkeley has set up around this data:

  • Road Object Detection

  • Drivable Area Segmentation

  • Domain Adaptation of Semantic Segmentation
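The article does not spell out how challenge entries are scored. Detection benchmarks commonly match predicted boxes to ground truth using intersection over union (IoU), and segmentation benchmarks apply the same ratio per class over pixels. As a sketch of that general idea, not the official BDD100K evaluation code, the box version looks like this (the `Box` tuple layout is our assumption):

```python
from typing import Tuple

# Assumed box layout: (x1, y1, x2, y2) in pixels, with x2 > x1 and y2 > y1.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes (0.0 if disjoint)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0
```

For example, two 2x2 boxes offset by one pixel in each direction overlap in a 1x1 square, giving an IoU of 1/7. A prediction is then typically counted as correct when its IoU with an unmatched ground-truth box exceeds a threshold such as 0.5, though the exact protocol is set by each challenge.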

Not only do you now have enough data to start building your very own autonomous vehicle, you can also compare your progress with the best data scientists in the field! So download the dataset now and get started!
