Some quick considerations about using IMatch to Site to delete images from The Quantum Garden Website vault.
Need
Protect against deleting image files that are in use elsewhere in the vault. These are direct image links, not links to the photo page. Examples are found in 100 Hours learning Affinity Photo.
Option 1
Images tagged for deletion in IMatch are tagged with #status\deleted-image. This allows easy identification for manual checking.
- The script has to make sure that delete, update metadata and update image are mutually exclusive, and take no action when they are not.
- Images that have the deleted-image tag but have not been processed in Obsidian will have that tag removed if they are subsequently updated on a second run.
- Only the photo page is tagged. There is no tagging on the webp image files themselves. Need to check each image file in turn for backlinks. Generally only _c is linked.
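One reading of the exclusivity requirement above can be sketched as follows. Only the #status\deleted-image tag comes from these notes; the other two tag names and the choose_action helper are hypothetical placeholders, so treat this as an illustration rather than the script's actual logic.

```python
from typing import Optional

# Only the deleted-image tag is taken from the notes.
# The other two tag names are hypothetical placeholders.
ACTION_TAGS = {
    "delete": r"#status\deleted-image",
    "update_metadata": r"#status\update-metadata",  # hypothetical
    "update_image": r"#status\update-image",        # hypothetical
}


def choose_action(tags: set[str]) -> Optional[str]:
    """Return the single requested action, or None if zero or several are requested."""
    requested = [action for action, tag in ACTION_TAGS.items() if tag in tags]
    if len(requested) == 1:
        return requested[0]
    # Conflicting (or no) action tags: take no action and leave for manual checking.
    return None
```

A photo carrying both the delete tag and an update tag would return None here, so it stays untouched until the conflict is resolved by hand.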
Option 2 (preferred)
Delete the photo page and all webp image files.
- Risk of broken images for webp image files that have been removed from underneath the page that links to them.
- Could do a brute force search for all .webp mentions of the image pattern in the vault. Slow, but protects against this problem. Anything not matched can be deleted.
- A smarter move, if there is anything to delete, is to parse all files for the photo numbers they include (ignoring the photo pages themselves), build a list and match against that. If a photo is not matched, it is OK to delete its files.
- Keep the page links for each match so that if an image to be deleted is found in use, the script can identify the pages it is used on. Once those pages have been manually checked or cleaned up, the files will be deleted on the next run.
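The index-and-match idea in the list above might look roughly like this. It is a minimal sketch under stated assumptions: the file naming "<photo number>_<suffix>.webp", the restriction to .md files, and the function names index_image_references and split_deletions are all assumptions for illustration, not details taken from the vault.

```python
import os
import re
from collections import defaultdict

# Assumed naming: "<photo number>_<suffix>.webp", for example "12345_c.webp".
IMAGE_REF = re.compile(r"(\d+)_[a-z]+\.webp")


def index_image_references(vault_path, photo_pages_dir):
    """Map each referenced photo number to the set of pages that mention it."""
    references = defaultdict(set)
    skip_dir = os.path.abspath(photo_pages_dir)
    for root, dirs, files in os.walk(vault_path):
        # Prune the photo pages folder so the pages themselves are ignored.
        dirs[:] = [d for d in dirs
                   if os.path.abspath(os.path.join(root, d)) != skip_dir]
        for name in files:
            if not name.endswith(".md"):
                continue
            path = os.path.join(root, name)
            with open(path, "r", encoding="utf-8", errors="ignore") as f:
                for number in IMAGE_REF.findall(f.read()):
                    references[number].add(path)
    return references


def split_deletions(candidates, references):
    """Split candidate photo numbers into safe-to-delete and still-in-use."""
    safe = [n for n in candidates if n not in references]
    blocked = {n: sorted(references[n]) for n in candidates if n in references}
    return safe, blocked
```

The blocked dictionary keeps the linking pages for each photo, so they can be reviewed by hand; once those links are cleaned up, the same photo would fall into the safe list on the next run.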
Basic code for Option 2
import os
import re


def scan_folder_with_subfolders(folder_path, pattern):
    """Return (file_path, line_num, line) for every line under folder_path that matches pattern."""
    regex = re.compile(pattern)
    matches = []
    # Walk the whole folder tree and test every line of every file
    for root, _, files in os.walk(folder_path):
        for file_name in files:
            file_path = os.path.join(root, file_name)
            try:
                # errors='ignore' lets non-text files be read without raising decode errors
                with open(file_path, 'r', encoding='utf-8', errors='ignore') as f:
                    for line_num, line in enumerate(f, 1):
                        if regex.search(line):
                            matches.append((file_path, line_num, line.strip()))
            except Exception as e:
                print(f"Error reading {file_path}: {e}")
    return matches
Multiprocessing code for option 2
import os
import re
import multiprocessing


def scan_file(file_path, pattern):
    """Return (file_path, line_num, line) for every line in one file that matches pattern."""
    # Compile inside the worker so only the pattern string is passed between processes
    regex = re.compile(pattern)
    matches = []
    try:
        with open(file_path, 'r', encoding='utf-8', errors='ignore') as f:
            for line_num, line in enumerate(f, 1):
                if regex.search(line):
                    matches.append((file_path, line_num, line.strip()))
    except Exception as e:
        print(f"Error reading {file_path}: {e}")
    return matches


def scan_folder_with_subfolders(folder_path, pattern):
    # Collect all file paths
    file_paths = []
    for root, _, files in os.walk(folder_path):
        for file_name in files:
            file_paths.append(os.path.join(root, file_name))

    # Use multiprocessing to scan each file (one worker per CPU core by default)
    with multiprocessing.Pool() as pool:
        results = pool.starmap(scan_file, [(file_path, pattern) for file_path in file_paths])

    # Flatten the list of results
    matches = [match for sublist in results for match in sublist]
    return matches


# Example usage
if __name__ == "__main__":
    folder_path = '/path/to/parent_folder'
    pattern = r'your_regex_pattern'
    results = scan_folder_with_subfolders(folder_path, pattern)
    for file_path, line_num, line in results:
        print(f"{file_path} (Line {line_num}): {line}")
Timing wrapper
import time
start_time = time.time()
# Call your scanning function here
results = scan_folder_with_subfolders(folder_path, pattern)
end_time = time.time()
duration = end_time - start_time
print(f"Start Time: {time.ctime(start_time)}")
print(f"End Time: {time.ctime(end_time)}")
print(f"Duration: {duration:.2f} seconds")
