{"id":330,"date":"2023-09-21T17:40:06","date_gmt":"2023-09-21T17:40:06","guid":{"rendered":"https:\/\/liverickson.com\/blog\/?p=330"},"modified":"2023-09-21T17:44:04","modified_gmt":"2023-09-21T17:44:04","slug":"auto-saving-browser-documents-to-augment-local-ai-model-context","status":"publish","type":"post","link":"https:\/\/liverickson.com\/blog\/?p=330","title":{"rendered":"Auto-saving Browser Documents to Augment Local AI Model Context"},"content":{"rendered":"\n<p>I&#8217;ve written a bit about my interest in using <a href=\"https:\/\/liverickson.com\/blog\/?p=218\" data-type=\"post\" data-id=\"218\">local artificial intelligence<\/a> for <a href=\"https:\/\/liverickson.com\/blog\/?p=282\" data-type=\"post\" data-id=\"282\">memory recall<\/a>, and this week I finally made some progress on a project to start turning some of my <a href=\"https:\/\/liverickson.com\/blog\/?p=222\" data-type=\"post\" data-id=\"222\">earlier thinking<\/a> into an actual part of my workflow. Welcome to the world, Memory Cache!<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"453\" src=\"https:\/\/liverickson.com\/blog\/wp-content\/uploads\/2023\/09\/image-1024x453.png\" alt=\"\" class=\"wp-image-331\" srcset=\"https:\/\/liverickson.com\/blog\/wp-content\/uploads\/2023\/09\/image-1024x453.png 1024w, https:\/\/liverickson.com\/blog\/wp-content\/uploads\/2023\/09\/image-300x133.png 300w, https:\/\/liverickson.com\/blog\/wp-content\/uploads\/2023\/09\/image-768x340.png 768w, https:\/\/liverickson.com\/blog\/wp-content\/uploads\/2023\/09\/image.png 1497w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n\n<p>Memory Cache is a project that allows you to save a webpage as a PDF document, which is then used to augment context for a local instance of <a href=\"https:\/\/github.com\/imartinez\/privateGPT\">privateGPT<\/a>. 
I&#8217;ve been thinking a lot about how to incorporate the things that I&#8217;ve been reading online into my own local language model so that I can reference them, recall them, and derive insights from them later. Up until now, I&#8217;ve been manually saving content as source documents for privateGPT to ingest, but now I have a shortcut, and I couldn&#8217;t be happier with this small step in the right direction.<\/p>\n\n\n\n<p><strong>Prep work:<\/strong><\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li>Set up <a href=\"https:\/\/github.com\/imartinez\/privateGPT\">privateGPT<\/a><\/li>\n\n\n\n<li>Create a symlink between a &#8216;MemoryCache&#8217; subdirectory in my default Downloads folder and a &#8216;MemoryCache&#8217; directory at \/PrivateGPT\/source_documents\/MemoryCache<\/li>\n\n\n\n<li>Obtain a <a href=\"https:\/\/pdfmage.org\">PDF Mage API key<\/a> &#8211; demo keys will work for up to 100 requests to trial the service, which converts an HTML page into a PDF and provides a download link<\/li>\n<\/ol>\n\n\n\n<p><strong>The basic extension behavior:<\/strong><br>I wanted something that would let me very quickly &#8220;save&#8221; a page to my memory. The current extension works by clicking its icon and selecting the &#8216;Save&#8217; button. 
This sends the current page&#8217;s URL to PDF Mage, which converts the page into a PDF and returns a download link. The extension then silently saves the file to the symlinked folder, where privateGPT can ingest it.<\/p>\n\n\n\n<p><strong>The core extension code:<\/strong><\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>\nconst DOWNLOAD_SUBDIRECTORY = \"MemoryCache\";\n\n\/** Temporary trial of PDF Mage for converting HTML to PDF *\/\nconst PDFMAGE_API_KEY = &lt;Hidden&gt;;\nconst PDFMAGE_API_ENDPOINT = \"https:\/\/pdfmage.org\/pdf-api\/v1\/process\";\n\n\/** Test download file *\/\nconst TEST_PDF_URL = \"https:\/\/www.dropbox.com\/scl\/fi\/oxir1zk7gnakhkh1htktf\/Good.AI-PDF-Download.pdf?rlkey=lcg8pe9avwbcv01pg7szr5vg8&amp;dl=1\";\n\nvar activeTabTitle = \"\";\nvar activeTabURL = \"\";\n\nlet downloadProperties = {\n    toFileName: DOWNLOAD_SUBDIRECTORY + Math.random() + \".pdf\"\n}\n\nfunction onError(error) {\n    console.log(`Error: ${error}`);\n}\n\n\/** Set up information for the current tab when the extension is opened *\/\nfunction getCurrentPageDetails() {\n    browser.tabs.query({currentWindow: true, active: true})\n    .then((tabs) =&gt; {\n        activeTabTitle = tabs&#91;0].title;\n        activeTabURL = tabs&#91;0].url;\n        \/\/ textContent avoids interpreting the page title or URL as markup\n        document.querySelector(\"#title\").textContent = \"Title: \" + activeTabTitle;\n        document.querySelector(\"#url\").textContent = \"URL: \" + activeTabURL;\n    })\n    .catch(onError);\n}\n\n\/* \nGenerate a file name based on date and time: the first 19 characters\nof the ISO timestamp (YYYY-MM-DDTHH:MM:SS), with \":\" replaced by \".\"\n*\/\nfunction generateFileName() {\n    return new Date().toISOString().slice(0, 19).replaceAll(\":\", \".\") + \".pdf\";\n}\n\n\/*\nSave the active page as a PDF to the MemoryCache subdirectory\n*\/\nfunction savePageAsPDF() {\n    \/*browser.tabs.saveAsPDF(downloadProperties).then((res) =&gt; {\n      console.log(res);\n    }); *\/\n    fetch(PDFMAGE_API_ENDPOINT, {\n        method : \"POST\", \n        body: JSON.stringify ({\n            TargetUrl : activeTabURL\n        }), \n        headers: {\n            \"Content-type\" : 
\"application\/json; charset=UTF-8\",\n            \"x-api-key\" : PDFMAGE_API_KEY\n        }\n    })\n    .then((response) =&gt; response.json())\n    .then((json) =&gt; {\n        console.log(json);\n        var downloadURL = json.Data.DownloadUrl;\n        if(downloadURL) {\n            browser.downloads.download({\n                url: downloadURL,\n                filename: DOWNLOAD_SUBDIRECTORY + \"\/\" + generateFileName()\n             });\n        }\n        else {\n            console.log(\"Error retrieving data\");\n        }\n    })\n    .catch(onError);\n}\n\nfunction testDownload() {\n    var fileName = DOWNLOAD_SUBDIRECTORY + \"\/\" + generateFileName();\n    browser.downloads.download({\n        url: TEST_PDF_URL, \n        filename: fileName\n    });\n}\n\ndocument.querySelector(\"#save\").addEventListener(\"click\", savePageAsPDF);\nwindow.addEventListener(\"load\", getCurrentPageDetails);\n<\/code><\/pre>\n\n\n\n<p><strong>A note about the HTML-&gt;PDF Conversion Step:<\/strong><br>The reason that I&#8217;m using PDF Mage for the time being is that the existing Firefox APIs allow me either to print a page to PDF (which opens the print dialogue) or to silently download a file. Unfortunately, there isn&#8217;t an existing way (to my knowledge) to quickly combine both of these into a single &#8220;downloadPageAsPDF&#8221; option, and I didn&#8217;t want to have to deal with the file system for this first pass. In the future, I would look at either running the conversion as a service myself or making a custom Firefox build that combines these two existing APIs so I can silently save pages as PDFs.<\/p>\n\n\n\n<p><strong>Looking ahead<\/strong><br>Getting to this stage is very exciting for me. It&#8217;s been a long time since I wrote code that actually did something novel for myself, and each step uncovers a little bit more about what needs to be done to realize my idea of converting my desktop computer into a truly personal AI agent that works for me and learns the way my brain does. 
<br><\/p>\n","protected":false},"excerpt":{"rendered":"<p>I&#8217;ve written a bit about my interest in using local artificial intelligence for memory recall, and this week I finally made some progress on a project to start turning some of my earlier thinking into an actual part of my workflow. Memory Cache is a project that allows you to save a webpage as a PDF document, which is then used to augment context for a local instance of privateGPT.<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"","sticky":false,"template":"","format":"standard","meta":{"activitypub_content_warning":"","activitypub_content_visibility":"","activitypub_max_image_attachments":0,"activitypub_interaction_policy_quote":"","activitypub_status":"","footnotes":""},"categories":[1],"tags":[],"class_list":["post-330","post","type-post","status-publish","format-standard","hentry","category-uncategorized"],"_links":{"self":[{"href":"https:\/\/liverickson.com\/blog\/index.php?rest_route=\/wp\/v2\/posts\/330","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/liverickson.com\/blog\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/liverickson.com\/blog\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/liverickson.com\/blog\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/liverickson.com\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=330"}],"version-history":[{"count":3,"href":"https:\/\/liverickson.com\/blog\/index.php?rest_route=\/wp\/v2\/posts\/330\/revisions"}],"predecessor-version":[{"id":334,"href":"https:\/\/liverickson.com\/blog\/index.php?rest_route=\/wp\/v2\/posts\/330\/revisions\/334"}],"wp:attachment":[{"href":"https:\/\/liverickson.com\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=330"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/liverickson.com\/blog\/index.php?rest_route=%2Fwp%
2Fv2%2Fcategories&post=330"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/liverickson.com\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=330"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}