Some quick considerations about using IMatch to Site to delete images from The Quantum Garden Website vault.

Need

Protect against deleting image files that are in use elsewhere in the vault. These are direct image links, not links to the photo page. Examples are found in 100 Hours learning Affinity Photo.

Option 1

Images tagged for deletion in IMatch are tagged with #status\deleted-image on the photo page. This allows easy identification for manual checking.

  • The script has to make sure that delete, update metadata and update image are mutually exclusive, and take no action if more than one applies.
  • Images that have the deleted-image tag but have not yet been processed in Obsidian will have that tag removed if they are subsequently updated on a second run.
  • Only the photo page is tagged. There is no tagging on the webp image files themselves, so each image file has to be checked in turn for backlinks. Generally only _c is linked.
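
One reading of the first bullet, as a minimal sketch: the script receives a set of requested actions for an image and acts only when exactly one is set. The action names and the flags dictionary are hypothetical, chosen for illustration, and are not taken from IMatch or the existing script.

```python
# Hypothetical action names; the real script may use different ones.
ACTIONS = ("delete", "update_metadata", "update_image")


def select_action(flags):
    """Return the single requested action, or None if zero or several are set."""
    requested = [name for name in ACTIONS if flags.get(name)]
    # Take no action unless exactly one action was requested.
    return requested[0] if len(requested) == 1 else None


if __name__ == "__main__":
    print(select_action({"delete": True}))
    print(select_action({"delete": True, "update_image": True}))
```

Returning None for the ambiguous case keeps the "take no action" rule in one place, so the caller only has to test for a result.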

Option 2 (preferred)

Delete the photo page and all webp image files.

  • Risk of broken images where a webp image file is removed from underneath a page that links to it directly.
  • Could do a brute force search for all .webp mentions of the image pattern in the vault. Slow, but protects against this problem. Anything not matched can be deleted.
    • A smarter approach: if there is anything to delete, parse all files for included photo numbers (ignoring the photo pages themselves), build a list and match the deletion candidates against it. If a candidate is not matched, it is OK to delete its files.
      • Keep the page links for each match, so that if an image to be deleted is found in use, the script can identify the pages it is used on. Once those pages have been manually checked or cleaned up, the files will be deleted on the next run.
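
The index-and-match idea above could be sketched roughly as follows. This is an illustration under stated assumptions, not the actual script: the file naming scheme ("<photo number>_<variant>.webp"), the restriction to .md notes, and the names build_usage_index, split_candidates and photo_page_dir are all hypothetical.

```python
import re
from collections import defaultdict
from pathlib import Path

# Assumed naming scheme: "<photo number>_<variant>.webp", e.g. "12345_c.webp".
IMAGE_REF = re.compile(r"(\d+)(?:_[a-z]+)?\.webp")


def build_usage_index(vault, photo_page_dir):
    """Map each referenced photo number to the set of notes that mention it."""
    vault = Path(vault)
    photo_page_dir = Path(photo_page_dir)
    index = defaultdict(set)
    for note in vault.rglob("*.md"):
        # Ignore the photo pages themselves.
        if photo_page_dir in note.parents:
            continue
        text = note.read_text(encoding="utf-8", errors="ignore")
        for match in IMAGE_REF.finditer(text):
            index[match.group(1)].add(note)
    return index


def split_candidates(candidates, index):
    """Split deletion candidates into (safe_to_delete, blocked).

    blocked maps each photo number still in use to the pages that link to it.
    """
    safe = [n for n in candidates if n not in index]
    blocked = {n: sorted(index[n]) for n in candidates if n in index}
    return safe, blocked
```

The vault is read once and every candidate is then a dictionary lookup, so the cost does not grow with the number of images to delete. The blocked mapping gives the list of pages to check manually before the next run.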

Basic code for Option 2

import os
import re
 
def scan_folder_with_subfolders(folder_path, pattern):
    """Return (file_path, line_number, line) for every line matching pattern."""
    regex = re.compile(pattern)
    matches = []

    # Walk the folder tree and test every line of every file.
    for root, _, files in os.walk(folder_path):
        for file_name in files:
            file_path = os.path.join(root, file_name)
            try:
                # errors='ignore' drops undecodable bytes, so binary files do not raise.
                with open(file_path, 'r', encoding='utf-8', errors='ignore') as f:
                    for line_num, line in enumerate(f, 1):
                        if regex.search(line):
                            matches.append((file_path, line_num, line.strip()))
            except OSError as e:
                print(f"Error reading {file_path}: {e}")

    return matches
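
The scan takes a regular expression, so the images to be deleted have to be turned into a pattern first. A minimal sketch of one way to do that, assuming the same hypothetical "<photo number>_<variant>.webp" naming scheme; build_image_pattern is an illustrative name, not part of the existing script.

```python
import re


def build_image_pattern(photo_numbers):
    """Build a regex matching .webp references to any of the given photo numbers."""
    numbers = sorted(str(n) for n in photo_numbers)
    if not numbers:
        raise ValueError("no photo numbers given")
    alternatives = "|".join(re.escape(n) for n in numbers)
    # (?<!\d) stops "100" from matching inside a longer number such as "2100".
    return rf"(?<!\d)(?:{alternatives})(?:_[a-z]+)?\.webp"


if __name__ == "__main__":
    print(build_image_pattern([100, 200]))
```

The returned string can be passed as the pattern argument of scan_folder_with_subfolders.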
 

Multiprocessing code for Option 2

import os
import re
import multiprocessing
 
def scan_file(file_path, pattern):
    regex = re.compile(pattern)
    matches = []
    try:
        with open(file_path, 'r', encoding='utf-8', errors='ignore') as f:
            for line_num, line in enumerate(f, 1):
                if regex.search(line):
                    matches.append((file_path, line_num, line.strip()))
    except Exception as e:
        print(f"Error reading {file_path}: {e}")
    return matches
 
def scan_folder_with_subfolders(folder_path, pattern):
    # Collect all file paths
    file_paths = []
    for root, _, files in os.walk(folder_path):
        for file_name in files:
            file_paths.append(os.path.join(root, file_name))
    
    # Use multiprocessing to scan each file; the context manager cleans up the pool
    with multiprocessing.Pool() as pool:
        results = pool.starmap(scan_file, [(file_path, pattern) for file_path in file_paths])
    
    # Flatten the list of results
    matches = [match for sublist in results for match in sublist]
    
    return matches
 
# Example usage
if __name__ == "__main__":
    folder_path = '/path/to/parent_folder'
    pattern = r'your_regex_pattern'
    results = scan_folder_with_subfolders(folder_path, pattern)
    for file_path, line_num, line in results:
        print(f"{file_path} (Line {line_num}): {line}")
 

Timing wrapper

 
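import time

# The guard is needed with multiprocessing: worker processes may re-import this module.
if __name__ == "__main__":
    folder_path = '/path/to/parent_folder'
    pattern = r'your_regex_pattern'

    start_time = time.time()             # wall-clock time, for display
    start_counter = time.perf_counter()  # monotonic clock, for the duration

    # Call your scanning function here
    results = scan_folder_with_subfolders(folder_path, pattern)

    duration = time.perf_counter() - start_counter
    end_time = time.time()

    print(f"Start Time: {time.ctime(start_time)}")
    print(f"End Time:   {time.ctime(end_time)}")
    print(f"Duration:   {duration:.2f} seconds")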