Auto-saving Browser Documents to Augment Local AI Model Context

2023-09-21 by Liv

I’ve written a bit about my interest in using local artificial intelligence for memory recall, and this week I finally made some progress on a project to start turning some of my earlier thinking into an actual part of my workflow. Welcome to the world, Memory Cache!

Memory Cache is a project that allows you to save a webpage as a PDF document, which is then used to augment context for a local instance of privateGPT. I’ve been thinking a lot about how to incorporate the things that I’ve been reading online into my own local language model for future reference and recall, and to derive insights from them. Up until now, I’ve been manually saving content as source documents for privateGPT to ingest, but now I have a shortcut and I couldn’t be happier with this small step in the right direction.

Prep work:

Set up privateGPT
Create a symlink between a ‘MemoryCache’ subdirectory in my default Downloads folder and a ‘MemoryCache’ directory created at /PrivateGPT/source_documents/MemoryCache
Obtain a PDF Mage API key. Demo keys work for up to 100 requests, which is enough to trial the service; it converts an HTML page into a PDF and provides a download link
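
The symlink step can be done by hand, but here is a minimal Node.js sketch of it for illustration. The helper name `linkMemoryCache` and both directory arguments are hypothetical, and the link direction is an assumption: the real directory lives under privateGPT's source_documents and the Downloads entry is the link, so anything the browser saves there lands where privateGPT can ingest it.

```javascript
const fs = require("fs");
const path = require("path");

// Hypothetical helper (not part of the extension): makes <downloadsDir>/MemoryCache
// a symlink pointing at <sourceDocsDir>/MemoryCache. The link direction is an assumption.
function linkMemoryCache(downloadsDir, sourceDocsDir) {
    // Real directory that privateGPT reads from; resolved to an absolute path
    // so the link does not depend on the current working directory.
    const target = path.resolve(sourceDocsDir, "MemoryCache");
    // Link inside the Downloads folder that the browser writes into.
    const link = path.resolve(downloadsDir, "MemoryCache");

    fs.mkdirSync(target, { recursive: true });
    fs.mkdirSync(path.dirname(link), { recursive: true });

    // Only create the link if nothing exists at that path yet.
    if (!fs.existsSync(link)) {
        fs.symlinkSync(target, link, "dir");
    }
    return link;
}
```

Calling `linkMemoryCache` with your own Downloads path and your privateGPT source_documents path would set up the folder pair once; the extension itself never needs to know about the link.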

The basic extension behavior:
I wanted something that would let me very quickly “save” a page to my memory. The current extension works by clicking the icon and selecting the ‘Save’ button. This sends the current page’s URL to PDF Mage, which converts the page into a PDF and returns a download link. The extension then saves the file silently in the symlinked folder, where it can be ingested into privateGPT.

The core extension code:


const DOWNLOAD_SUBDIRECTORY = "MemoryCache";

/** Temporary trial of PDF Mage for converting HTML to PDF */
const PDFMAGE_API_KEY = "<Hidden>"; // key redacted; substitute your own PDF Mage key
const PDFMAGE_API_ENDPOINT = "https://pdfmage.org/pdf-api/v1/process";

/** Test download file */
const TEST_PDF_URL = "https://www.dropbox.com/scl/fi/oxir1zk7gnakhkh1htktf/Good.AI-PDF-Download.pdf?rlkey=lcg8pe9avwbcv01pg7szr5vg8&dl=1"

var activeTabTitle = "";
var activeTabURL = "";

/** Settings for the commented-out browser.tabs.saveAsPDF call in savePageAsPDF */
let downloadProperties = {
    toFileName: DOWNLOAD_SUBDIRECTORY + Math.random() + ".pdf"
};

function onError(error) {
    console.log(`Error: ${error}`);
}

/** Set up information for the current tab when the extension is opened */
function getCurrentPageDetails() {
    browser.tabs.query({currentWindow: true, active: true})
    .then((tabs) => {
        activeTabTitle = tabs[0].title;
        activeTabURL = tabs[0].url;
        // textContent avoids interpreting the page title or URL as HTML
        document.querySelector("#title").textContent = "Title: " + activeTabTitle;
        document.querySelector("#url").textContent = "URL: " + activeTabURL;
    })
    .catch(onError);
}

/* 
Generate a file name based on date and time
*/
function generateFileName() {
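    // slice(0, 19) keeps "YYYY-MM-DDTHH:MM:SS"; colons are replaced because they are not valid in file names on some systems
    return new Date().toISOString().slice(0, 19).replaceAll(":", ".") + ".pdf";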
}

/*
Save the active page as a PDF to the MemoryCache subdirectory
*/
function savePageAsPDF() {
    /*browser.tabs.saveAsPDF(downloadProperties).then((res) => {
      console.log(res);
    }); */
    fetch(PDFMAGE_API_ENDPOINT, {
        method : "POST", 
        body: JSON.stringify ({
            TargetUrl : activeTabURL
        }), 
        headers: {
            "Content-type" : "application/json; charset=UTF-8",
            "x-api-key" : PDFMAGE_API_KEY
        }
    })
    .then((response) => response.json())
    .then((json) => {
        console.log(json);
        // Guard against a response that has no Data object
        var downloadURL = json.Data && json.Data.DownloadUrl;
        if (downloadURL) {
            return browser.downloads.download({
                url: downloadURL,
                filename: DOWNLOAD_SUBDIRECTORY + "/" + generateFileName()
            });
        }
        else {
            console.log("Error retrieving data");
        }
    })
    .catch(onError);
}

function testDownload() {
    var fileName = DOWNLOAD_SUBDIRECTORY + "/" + generateFileName()
    browser.downloads.download({
        url: TEST_PDF_URL, 
        filename: fileName
    })
}

document.querySelector("#save").addEventListener("click", savePageAsPDF);
window.addEventListener("load", getCurrentPageDetails);
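
As a quick check on the file naming, here is a minimal sketch of the timestamp logic as a pure function. The name `fileNameFor` is hypothetical, and it assumes the intent is to trim the ISO string to whole seconds with `slice(0, 19)` and swap colons for periods. Note that `toISOString()` always reports UTC, so the names will not match local clock time.

```javascript
// Hypothetical pure variant of the file-name logic: the Date is passed in
// so the result can be checked against a known timestamp.
function fileNameFor(date) {
    return date
        .toISOString()          // e.g. "2023-09-21T12:34:56.789Z" (always UTC)
        .slice(0, 19)           // keep "2023-09-21T12:34:56"
        .replaceAll(":", ".")   // colons are not valid in file names on some systems
        + ".pdf";
}

console.log(fileNameFor(new Date("2023-09-21T12:34:56.789Z")));
// prints "2023-09-21T12.34.56.pdf"
```

Because the name only has one-second resolution, two saves within the same second would produce the same name; the browser's default conflict handling would then decide what happens to the second file.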

A note about the HTML->PDF Conversion Step:
The reason that I’m using PDF Mage for the time being is that the existing Firefox APIs allow me to either print a page to PDF (which opens the print dialogue) or to silently download a file. Unfortunately, there isn’t an existing way (to my knowledge) to quickly combine both of these into a single “downloadPageAsPDF” option, and I didn’t want to have to deal with the file system for this first pass. In the future, I would look at either running this as a service myself, or making a custom Firefox build that combines these two existing APIs so I can silently save pages as PDFs.
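
One way to keep that future swap cheap is to isolate the conversion step behind a single function. The sketch below is an illustration only: `requestPdfLink` and its `endpoint`, `apiKey`, and `fetchImpl` parameters are hypothetical names, and the request and response shapes (`TargetUrl`, `x-api-key`, `Data.DownloadUrl`) are simply copied from the extension code above rather than from any published API reference.

```javascript
// Hypothetical wrapper around the HTML->PDF step. The endpoint and key are
// parameters so a self-hosted converter could replace PDF Mage later, and
// fetchImpl can be substituted with a stub when testing.
async function requestPdfLink(pageUrl, { endpoint, apiKey, fetchImpl = fetch }) {
    const response = await fetchImpl(endpoint, {
        method: "POST",
        body: JSON.stringify({ TargetUrl: pageUrl }),
        headers: {
            "Content-type": "application/json; charset=UTF-8",
            "x-api-key": apiKey
        }
    });
    if (!response.ok) {
        throw new Error(`Conversion request failed with status ${response.status}`);
    }
    const json = await response.json();
    // Response shape assumed from the extension code: { Data: { DownloadUrl } }
    const downloadUrl = json && json.Data && json.Data.DownloadUrl;
    if (!downloadUrl) {
        throw new Error("No download URL in conversion response");
    }
    return downloadUrl;
}
```

In the extension, the save handler would then call `requestPdfLink(activeTabURL, { endpoint: PDFMAGE_API_ENDPOINT, apiKey: PDFMAGE_API_KEY })` and pass the resulting link to `browser.downloads.download`, so only the endpoint and key change if the converter does.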

Looking ahead
Getting to this stage is very exciting for me. It’s been a long time since I wrote code that actually did something novel for myself, and each step uncovers a little bit more about what needs to be done to realize my idea of converting my desktop computer into a truly personal AI agent that works for me and learns the way my brain does.