
Get a list of YouTube videos of a person

The YouTube API is the way to go to get a list of a person's YouTube videos, but it can cost you a lot.

7 September 2023 Updated 15 September 2023
https://pixabay.com/users/27707-27707

A few days ago I got the question: Can you download all the public YouTube videos of a person that were uploaded between 2020 and today? The total number of videos was about two hundred. And no, I could not get access to the person's YouTube account.

In this post, I use the YouTube API to download the required metadata of the videos, one item per video. I looked on PyPI but could not find a suitable package for this trivial problem, so I decided to write some code myself. You can find the code below.
All it does is retrieve data from the YouTube API and store it in an 'items file'. That's all. You can use this file to create, for example, a file where each line contains a yt-dlp command that downloads one video. But that's up to you.

As always I am doing this on Ubuntu 22.04.

yt-dlp and yt-dlp-gui

yt-dlp is a command-line program that can be used to download files from many sources, including YouTube. We can install this and then also install yt-dlp-gui, which gives us a GUI.

This is how I used to download a small number of files: go to YouTube, copy the links, and paste them into yt-dlp-gui. But we don't want to copy-paste 200 video URLs; that is the opposite of DRY (Don't Repeat Yourself)!

YouTube API

To automate the download, we need to retrieve the metadata of all the person's video files. We can retrieve these using the YouTube API. Looking at this API, the way to go appeared to be the "YouTube - Data API - Search" method, see links below.

Getting a YouTube API key

To use the YouTube API you need a YouTube API key. I am not going to bore you with this here; there are plenty of instructions on the internet on how to obtain one.

The person has no YouTube channel?

To use the YouTube API search method, we need the channelId: the ID of the person's YouTube channel. But what if the person never created a channel? There is still a channelId. One way to find the channelId for an account is to search the internet for:

youtube channel <name>

One of the results will be a link containing the channelId (channel URLs have the form youtube.com/channel/&lt;channelId&gt;).
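The channelId can also be looked up via the API itself: the search method accepts type=channel, and each result carries the channelId in its snippet. A minimal sketch that only builds the request URL (the API key and query are placeholders; the request itself is not executed here):

```python
import urllib.parse

def build_channel_search_url(name, api_key):
    # search.list with type=channel returns channels matching the query;
    # each result item carries the channelId in item['snippet']['channelId']
    params = {'part': 'snippet', 'type': 'channel', 'q': name, 'key': api_key}
    return 'https://youtube.googleapis.com/youtube/v3/search?' + urllib.parse.urlencode(params)

url = build_channel_search_url('some person', 'API_KEY')
print(url)
```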

The YouTube API search method

I must get the metadata for some two hundred videos over a period of three years. However, the number of items returned by the YouTube API search method is limited: at most 50 per page, and in practice a few hundred in total (the documentation is not very clear about this). Fortunately the search method allows us to restrict a search to a date range:

  • publishedAfter
  • publishedBefore

I decided to split the search into monthly searches and hope that the person did not upload more than some limit of videos per month.

The YouTube API does not return all items at once but uses paging. If there are more items, the response contains a 'nextPageToken' parameter. In that case, we add it to the next request, receive the response, and so on, until this parameter is no longer present.
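The paging logic boils down to a small loop. Here is a minimal sketch, where fetch_page is a stand-in for the actual API request:

```python
def collect_all_items(fetch_page):
    # fetch_page(page_token) returns one response dict; None requests the first page
    items = []
    page_token = None
    while True:
        data = fetch_page(page_token)
        items.extend(data.get('items', []))
        page_token = data.get('nextPageToken')
        if page_token is None:
            break
    return items
```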

Here is an example of a response:

{
    "kind": "youtube#searchListResponse",
    "etag": "Hlc-6V55ICoxEujG5nA274peA0o",
    "nextPageToken": "CAUQAA",
    "regionCode": "NL",
    "pageInfo": {
        "totalResults": 29,
        "resultsPerPage": 5
    },
    "items": [
        ...
    ]
}

And the items look like this:

[
    {
        'kind': 'youtube#searchResult', 
        'etag': <etag>, 
        'id': {
            'kind': 'youtube#video', 
            'videoId': <video id>
        }, 
        'snippet': {
            'publishedAt': '2023-07-12T01:55:21Z', 
            'channelId': <channel id>, 
            'title': <title>,
            'description': <description>, 
            'thumbnails': {
                ...
            },
            'channelTitle': <channel title>, 
            ...
        }
    },
    ...
]
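As mentioned above, the items can be turned into a file of yt-dlp commands. A minimal sketch, using the standard YouTube watch URL format:

```python
def items_to_commands(items):
    # one yt-dlp command per video item; skip items without a videoId
    commands = []
    for item in items:
        video_id = item.get('id', {}).get('videoId')
        if video_id is not None:
            commands.append(f'yt-dlp "https://www.youtube.com/watch?v={video_id}"')
    return commands
```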

Filename complications

This has nothing to do with the data retrieved from the YouTube API, but I ran into it when using this data to download video files with yt-dlp, and I want to share it with you.

A typical yt-dlp command to download a YouTube video into an mp4 is:

yt-dlp -f "bestvideo[ext=mp4]+bestaudio[ext=m4a]/best[ext=mp4]/best" https://www.youtube.com/watch?v=lRlbcxXnMpQ -o "%(title)s.%(ext)s"

Here we let yt-dlp create the filename based on the title of the video. But depending on the operating system you use, not all characters are allowed in a filename.

To make the filename look as much like the video title as possible, yt-dlp replaces forbidden characters with similar-looking Unicode characters. The result looks almost like the title of the video, but often it is NOT the same! This is a nice feature, but totally unusable if you want to:

  • compare filenames, or
  • transfer files between different systems

In the end, I chose to create the filenames myself by replacing unwanted characters with an underscore. In addition, I created a text file with lines containing:

<filename> <video title>

In this way, the video title can be reconstructed from a filename. Note that it is even better to include a unique value in the filename, such as the published date and time, to avoid name clashes.
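The replace-with-underscore approach, including the published timestamp, can be sketched as follows (the whitelist of allowed characters is my own choice):

```python
import re

def safe_filename(title, published_at, ext='mp4'):
    # replace everything outside a conservative whitelist with an underscore
    safe_title = re.sub(r'[^A-Za-z0-9._-]', '_', title)
    # prefix the published timestamp to avoid name clashes
    stamp = published_at.replace('-', '').replace(':', '')
    return f'{stamp}_{safe_title}.{ext}'
```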

YouTube API credits: finished ... :-(

When you start using these APIs, you get a free daily quota from Google so you can get started. Many people on the internet had already warned that the YouTube API search method consumes a lot of quota: a single search call costs 100 quota units, while the default free quota is 10,000 units per day.

I did some short experiments and three full runs. During the third full run, my quota ran out. That's fast! In other words: a lot of money for just a few simple runs.

Anyway, during the second run I had already collected all the data I wanted, so it was enough for this project. And don't panic: the free quota resets every day.
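The arithmetic is sobering. Assuming the documented cost of 100 quota units per search call and the default free quota of 10,000 units per day:

```python
DAILY_QUOTA_UNITS = 10_000   # default free daily quota
SEARCH_COST_UNITS = 100      # documented cost of one search.list call

searches_per_day = DAILY_QUOTA_UNITS // SEARCH_COST_UNITS
print(searches_per_day)  # → 100 search requests per day
```

With monthly searches over three years plus paging, a single full run already uses a good chunk of that budget.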

The code

If you want to try it yourself, here is the code; enter the person's channelId and your YouTube API key. We start at the most recent month, retrieve and save the items, and move to the previous month until we reach the last month. We specify start and last as tuples.
The 'items file' holds the JSON items retrieved from the YouTube API. Items are added only for new videoIds, so there is no need to delete this file between runs.

Install these first:

pip install python-dateutil
pip install requests

The code:

# get_video_list.py
import calendar
import datetime
import json
import logging
import os
import sys
import time
import urllib.parse

from dateutil import parser
from dateutil.relativedelta import relativedelta 
import requests

def get_logger(
    console_log_level=logging.DEBUG,
    file_log_level=logging.DEBUG,
    log_file=os.path.splitext(__file__)[0] + '.log',
):
    logger_format = '%(asctime)s %(levelname)s [%(filename)-30s%(funcName)30s():%(lineno)03s] %(message)s'
    logger = logging.getLogger(__name__)
    logger.setLevel(logging.DEBUG)
    if console_log_level:
        # console
        console_handler = logging.StreamHandler(sys.stdout)
        console_handler.setLevel(console_log_level)
        console_handler.setFormatter(logging.Formatter(logger_format))
        logger.addHandler(console_handler)
    if file_log_level:
        # file
        file_handler = logging.FileHandler(log_file)
        file_handler.setLevel(file_log_level)
        file_handler.setFormatter(logging.Formatter(logger_format))
        logger.addHandler(file_handler)
    return logger

logger = get_logger()


class YouTubeUtils:
    def __init__(
        self,
        logger=None,
        channel_id=None,
        api_key=None,
        yyyy_mm_start=None,
        yyyy_mm_last=None,
        items_file=None,
    ):
        self.logger = logger
        self.channel_id = channel_id
        self.api_key = api_key
        self.items_file = items_file

        # create empty items file if not exists
        if not os.path.exists(self.items_file):
            items = []
            json_data = json.dumps(items)
            with open(self.items_file, 'w') as fo:
                fo.write(json_data)

        self.year_month_dt_start = datetime.datetime(yyyy_mm_start[0], yyyy_mm_start[1], 1, 0, 0, 0)
        self.year_month_dt_current = self.year_month_dt_start
        self.year_month_dt_last = datetime.datetime(yyyy_mm_last[0], yyyy_mm_last[1], 1, 0, 0, 0)

        # api request
        self.request_delay = 3
        self.request_timeout = 6

    def get_previous_year_month(self):
        self.logger.debug(f'()')
        self.year_month_dt_current -= relativedelta(months=1)
        self.logger.debug(f'year_month_dt_current = {self.year_month_dt_current}')
        if self.year_month_dt_current < self.year_month_dt_last:
            return None, None
        yyyy = self.year_month_dt_current.year
        m = self.year_month_dt_current.month
        return yyyy, m

    def get_published_between(self, yyyy, m):
        last_day = calendar.monthrange(yyyy, m)[1]
        published_after = f'{yyyy}-{m:02}-01T00:00:00Z'
        published_before = f'{yyyy}-{m:02}-{last_day:02}T23:59:59Z'
        self.logger.debug(f'published_after = {published_after}, published_before = {published_before}')
        return published_after, published_before

    def get_data_from_youtube_api(self, url):
        self.logger.debug(f'(url = {url})')
        r = None
        try:
            r = requests.get(url, timeout=self.request_timeout)
        except Exception as e:
            self.logger.exception(f'url = {url}')
            raise
        self.logger.debug(f'status_code = {r.status_code}')
        if r.status_code != 200:
            raise Exception(f'url = {url}, status_code = {r.status_code}, r = {r.__dict__}')
        try:
            data = r.json()
            self.logger.debug(f'data = {data}')
        except Exception:
            raise Exception(f'url = {url}, converting json, status_code = {r.status_code}, r = {r.__dict__}')
        return data

    def add_items_to_items_file(self, items_to_add):
        self.logger.debug(f'(items_to_add = {items_to_add})')
        # read file + json to dict
        with open(self.items_file, 'r') as fo:
            json_data = fo.read()
        items = json.loads(json_data)
        self.logger.debug(f'items = {items}')
        # add only unique video_ids
        video_ids = []
        for item in items:
            id = item.get('id')
            if id is None:
                continue
            video_id = id.get('videoId')
            if video_id is None:
                continue
            video_ids.append(video_id)
        self.logger.debug(f'video_ids = {video_ids}')
        items_added_count = 0
        for item_to_add in items_to_add:
            self.logger.debug(f'item_to_add = {item_to_add}')
            kind_to_add = item_to_add['id']['kind']
            if kind_to_add != 'youtube#video':
                self.logger.debug(f'skipping kind_to_add = {kind_to_add}')
                continue
            video_id_to_add = item_to_add['id']['videoId']
            if video_id_to_add not in video_ids:
                self.logger.debug(f'adding video_id_to_add = {video_id_to_add}')
                items.append(item_to_add)
                items_added_count += 1
                video_ids.append(video_id_to_add)
        self.logger.debug(f'items_added_count = {items_added_count}')
        if items_added_count > 0:
            # dict to json + write file
            json_data = json.dumps(items)
            with open(self.items_file, 'w') as fo:
                fo.write(json_data)
        return items_added_count

    def fetch_year_month_videos(self, yyyy, m):
        self.logger.debug(f'(yyyy = {yyyy}, m = {m})')
        published_after, published_before = self.get_published_between(yyyy, m)
        url_base = 'https://youtube.googleapis.com/youtube/v3/search?'
        url_params = {
            'part': 'snippet,id',
            'channelId': self.channel_id,
            'publishedAfter': published_after,
            'publishedBefore': published_before,
            # 'order' is the search.list parameter name; 50 is the maximum page size
            'maxResults': 50,
            'order': 'date',
            'key': self.api_key,
        }
        url = url_base + urllib.parse.urlencode(url_params)

        total_items_added_count = 0
        while True:
            time.sleep(self.request_delay)
            data = self.get_data_from_youtube_api(url)
            page_info = data.get('pageInfo')
            self.logger.debug(f'page_info = {page_info}')

            items = data.get('items')
            if items is None:
                break
            if not isinstance(items, list) or len(items) == 0:
                break
            # add items
            total_items_added_count += self.add_items_to_items_file(items)

            next_page_token = data.get('nextPageToken')
            self.logger.debug(f'next_page_token = {next_page_token}')
            if next_page_token is None:
                break
            # add next page token
            url_params['pageToken'] = next_page_token
            url = url_base + urllib.parse.urlencode(url_params)

        self.logger.debug(f'total_items_added_count = {total_items_added_count}')
        return total_items_added_count


def main():
    # replace CHANNEL_ID and API_KEY with your values
    yt_utils = YouTubeUtils(
        logger=logger,
        channel_id='CHANNEL_ID',
        api_key='API_KEY',
        # start month: current month + 1 (the loop first steps back one month)
        yyyy_mm_start=(2023, 10),
        yyyy_mm_last=(2020, 1),
        items_file='./items.json',
    )

    while True:
        yyyy, m = yt_utils.get_previous_year_month()
        if yyyy is None or m is None:
            break
        logger.debug(f'fetching for {yyyy}-{m:02}')
        yt_utils.fetch_year_month_videos(yyyy, m)
        

if __name__ == '__main__':
    main() 

Summary

Retrieving data about the YouTube videos using the YouTube API is not very difficult. We created a separate 'items file' to use for further processing.

This was a fun project with a big surprise once I started downloading YouTube videos using the information in the 'items file'. yt-dlp can generate filenames that are very close to the video title by inserting Unicode look-alike characters. It looks great, but I found it very confusing.

Oh, and another surprise: using the YouTube API can be very expensive. It did not take much to use up the free daily quota.

Links / credits

YouTube - Data API - Search
https://developers.google.com/youtube/v3/docs/search

yt-dlp
https://github.com/yt-dlp/yt-dlp

yt-dlp-gui and others
https://www.reddit.com/r/youtubedl/wiki/info-guis
