In this blog post, I'll show you how to create a Python script that extracts files from zip archives to a specific path on your hard drive. Google Takeout will allow users to extract all of their Google Photos to any number of large zip files. After downloading, you are still required to extract to your own file system. This script can help that by iterating over each zip and perform the extraction.
- June 23, 2024
When managing large amounts of data, especially backups from services like Google Photos, you may find yourself dealing with numerous zip files. Manually extracting specific files from these archives can be tedious. Fortunately, Python offers a powerful way to automate this process.
In this blog post, I'll show you how to create a Python script that extracts files from zip archives to a specific path on your hard drive. You can specify a particular path within each zip file to extract files from, making it ideal for organizing photos or other data from Google backups. The script below will iterate over each zip file within the directory and perform the same extraction from within each zip to another target directory. In our particular case we had 17 - 10 gb zip files and we need to iterate over each to extract the photos.
Before we dive into the script, ensure you have Python installed on your machine. You can download it from python.org. You'll also need the zipfile
and os
modules, which are part of Python's standard library, so no additional installations are required.
Here's the Python script to extract files from zip archives:
import os
import zipfile
def extract_specific_path_from_zip(source_dir, dest_dir, specific_path):
# Ensure destination directory exists
if not os.path.exists(dest_dir):
os.makedirs(dest_dir)
# Loop through all files in the source directory
for item in os.listdir(source_dir):
# Construct full file path
file_path = os.path.join(source_dir, item)
# Check if the file is a ZIP file
if zipfile.is_zipfile(file_path):
# Open the ZIP file
with zipfile.ZipFile(file_path, 'r') as zip_ref:
# Get the list of all archived file names from the zip
zip_file_names = zip_ref.namelist()
# Filter the files that match the specific path
files_to_extract = [f for f in zip_file_names if f.startswith(specific_path)]
# Extract the filtered files preserving the directory structure below the specific path
for file in files_to_extract:
try:
# Determine the relative path within the specific path
relative_path = os.path.relpath(file, specific_path)
# Determine the output file path
output_file_path = os.path.join(dest_dir, relative_path)
# Normalize the output file path to handle any issues with invalid characters or spaces
output_file_path = os.path.normpath(output_file_path)
# Ensure the directory exists
os.makedirs(os.path.dirname(output_file_path), exist_ok=True)
# Extract the file data and write to the destination directory
with zip_ref.open(file) as source_file:
with open(output_file_path, 'wb') as output_file:
output_file.write(source_file.read())
print(f"Extracted: {file} from {file_path} to {output_file_path}")
except Exception as e:
print(f"Failed to extract {file} from {file_path}. Error: {e}")
if __name__ == "__main__":
source_directory = "C:/Temp/May23_BrittmarieGoogleBackup" # Replace with the path to the source directory
destination_directory = "G:/Home_PC/Pictures/2023-06-23" # Replace with the path to the destination directory
specific_path_in_zip = "Takeout/Google Photos" # Replace with the specific path to extract from each ZIP file
extract_specific_path_from_zip(source_directory, destination_directory, specific_path_in_zip)
zipfile
and os
modules.extract_specific_path_from_zip
function takes three parameters:
source_dir
: The directory containing the zip files you want to extract files from.dest_dir
: The directory where you want the extracted files to be saved.specific_path
: The specific path within each zip file from which you want to extract files.To use this script, replace the placeholder values for source_directory
, destination_directory
, and specific_path_in_zip
with your specific paths. For example, if you want to extract all photos from a Google Photos backup, you might have:
if __name__ == "__main__":
source_directory = "C:/Temp/May23_BrittmarieGoogleBackup" # Replace with the path to the source directory
destination_directory = "G:/Home_PC/Pictures/2023-06-23" # Replace with the path to the destination directory
specific_path_in_zip = "Takeout/Google Photos" # Replace with the specific path to extract from each ZIP file
extract_specific_path_from_zip(source_directory, destination_directory, specific_path_in_zip)
Save the script as extract_photos.py
and run it using:
python extract_photos.py
This script will extract all files from the specified path within each zip archive and save them to your target directory, maintaining the directory structure.
This updated Python script is a handy tool for automating the extraction of specific files from zip archives. Whether you're organizing photos from Google or managing backups, this script can save you time and effort. Happy coding!