Development, Machine Learning, Philosophy

Data Introspection & Reclaiming my Digital Self

For as long as I can remember, I have always felt as though I was seeing a slice of reality that no one else could – as though I was looking through a lens that sharpened everything, heightening my perception. No one understood the things that I saw, until I began expressing myself through what I could create digitally.

Through my work in tech, particularly on social VR, I began to discover the algorithms that define our world. In games and immersive technology, I found myself able to explore and bring to life the worlds I lived in inside my mind. As I further mastered technology, I became more capable of understanding my physical reality and creating pathways for showing my authentic self.

Each and every one of us has our own reality. The future of computing means nothing if we cannot use it to offer pathways for self-discovery and action to everyone.

If you’ve been reading for a while, you already know that I place a lot of value on the ‘self’ that I express through technology. The tools that we build make a profound impact on our culture, our society, and us within those systems, creating a flywheel of behavioral changes that push investment and innovation in a particular direction. I would be lying if I said that I wasn’t concerned about the direction big tech is heading these days, but I also can’t pull myself away from the belief that things can be different. Better.

A colleague once referred to generative AI advancements as ‘pushing leverage to the edge’ – specifically with regard to local, on-device artificial intelligence. That phrase stuck with me, and I’ve been slowly building a home AI computing system to reduce my dependency on cloud AI providers like ChatGPT.

Policy vs. Property

Digital information is unique in that, from a purely technological standpoint, there is no limit on how many times it can be accessed. Once data is created, it can exist in infinite forms and in countless locations, held by anyone with the right to access it and the resources to store it. Of course, legal protections for data rights holders aren’t new. Copyright as a field emerged to protect intellectual property from being taken and used by anyone, but the policies surrounding digital content rights and ownership are increasingly murky with generative AI.

For most online platforms, using a given site or application means granting the platform an irrevocable right to use the data you give it for its own purposes. While lawyers argue that this is necessary to provide services, these clauses have also been used to create a $550B+ digital advertising market. I checked the market cap of all of the public companies that have advertised to me based on Facebook’s ad targeting alone, and it’s a combined eleven trillion dollars. All that information about the minutes I spend flipping through reels is translating to ungodly amounts of revenue for companies – and Meta really profits from it.

Consent

I joined Facebook when I was 16 years old. I’m pretty sure legally I wasn’t allowed to consent to their terms of use at that point, but I’ve subsequently published an additional eighteen years of statuses, messages, and photos – at least some of which were after I knew about their egregious privacy practices – so I’m really the only one to blame.

It’s hard to opt out, though. My family talks via WhatsApp. I have friends who only use Facebook Messenger to stay in touch. We just moved to a new state, and the local communities rely heavily on Facebook to organize. Opting out of Meta’s ecosystem entirely means shutting off avenues of connection that people have come to view as the default. In this regard, we can consent only so far as that consent comes with a side of social coercion, which I would argue isn’t really consent at all.

And the thing about that ‘irrevocable use’ clause that we grant when we consent to use Facebook? They’re allowed to use all of our data to train their AI models because that’s legally covered, vaguely, as ‘providing their services’ to you. If you were to consider the terms of service through an actual consent-driven framework, this is laughably inappropriate – it’s like consenting to get your hair cut and the hairdresser then drives to your house and shaves your head in the middle of the night, because they need to get better at shaving and you gave them irrevocable access to the hair on your head.

Reclaiming Personal Data: A Quest

Despite the terms of service that govern Facebook and grant them that shiny, irrevocable license to use your information to grow their market cap from their $104B IPO to the $1.5T behemoth they are today, you technically still own your data on Facebook. So, I took it upon myself to start using that information for my own benefit.

(Here’s how to get your own archive, if you want to do this too.)

The first step to reclaiming your personal data is to request a complete download of your Facebook archive. After waiting a week or so, this will get you a compressed (“.zip”) folder that is full of other folders. In these folders are a bunch of HTML web page files that show you things like:

  • Every message you have ever sent or received
  • Your wall (RIP) and timeline posts
  • All of your photo albums and pictures that you’ve shared
  • Your connections
  • Ads that you’ve interacted with
  • Advertisers who have paid Meta to target you
  • Anyone you unfriended
  • The number of minutes you spent watching reels in the previous two weeks
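
If you want a quick overview of what your archive contains before you start clicking through it, a few lines of Python can count the HTML pages under each top-level folder. This is only a rough sketch using the standard library: the function name `count_html_by_folder` is my own invention for illustration, and it makes no assumptions about what Facebook names the folders inside the download.

```python
import zipfile
from collections import Counter
from pathlib import PurePosixPath


def count_html_by_folder(archive_path):
    """Count the .html files under each top-level folder of a zip archive.

    Hypothetical helper: it does not rely on any particular folder names,
    it only reports whatever the archive happens to contain.
    """
    counts = Counter()
    with zipfile.ZipFile(archive_path) as archive:
        for name in archive.namelist():
            path = PurePosixPath(name)
            if path.suffix.lower() != ".html":
                continue  # skip directories, images, and other files
            # Files at the root of the archive have no parent folder.
            top = path.parts[0] if len(path.parts) > 1 else "(root)"
            counts[top] += 1
    return counts
```

Calling `count_html_by_folder("my-archive.zip")` (with the path to your own download) returns a `Counter`, so `.most_common()` will show which folders hold the most pages.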

Once I had that archive, I started to play. I started with ChatGPT to create a personalized memorial site of my first 17 years on Facebook, and was enamored with the possibility of using my past digital footprint as a tool for self-reflection and personal growth. This led me to start working on Archivist, a tool for using local AI to query my past messages, and to build a browser extension that captures the most recent two weeks of my history into a text file that I can use for other AI analysis.
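
To give a sense of what the very first step of this kind of tooling can look like, here is a minimal sketch that pulls the visible text out of one archived HTML page and searches it for a keyword. To be clear, this is not Archivist's code and there is no AI involved: the names `TextExtractor`, `html_to_text_chunks`, and `search_chunks` are hypothetical, and I am only assuming that the pages are ordinary HTML that Python's built-in parser can read.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text, skipping <script> and <style> contents."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and not self._skip_depth:
            self.chunks.append(text)


def html_to_text_chunks(html):
    """Return the non-empty text fragments found in an HTML string."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()  # flush any text still buffered by the parser
    return parser.chunks


def search_chunks(chunks, keyword):
    """Case-insensitive substring search over extracted text fragments."""
    needle = keyword.casefold()
    return [chunk for chunk in chunks if needle in chunk.casefold()]
```

Everything here runs on your own machine with the standard library, so nothing leaves your computer. The extracted fragments could just as easily be written to a text file and handed to a local model afterwards.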

Data Introspection

When I started to talk about what I was doing, the term ‘data introspection’ emerged. ‘Data’ as a term is a highly quantified one, while ‘introspection’ evokes therapy, coaching, or meditation – practices that are decidedly not quantified – but I relish the bridging of our technological architectures with the fluidity that exists within our minds.

As a use case, it’s hard to articulate exactly why tools like Archivist will become so vital in our data-heavy ecosystem. With generative AI, it’s never been easier to create digital information in different forms, so why bother taking the time to pull back what is legally ours from the platforms that we’ve given that information to?

In a world where platforms optimize for attention and extraction, our personal data becomes a commodity rather than a site of reflection. How can we unlearn the default surveillance paradigms of big tech and reclaim our data through the development of non-extractive tools that nurture agency, memory, and meaning?

Mainstream platforms often present identity as a fixed profile and flatten the emotional experience of revisiting artifacts from our past, but we can unlearn surveillance culture by recognizing the act of remembering as an emotional one. With this lens, we can shift the design paradigms from extractive applications to tools that support multiplicity, neurodivergence, and introspection.

An excerpt from a MozFest 2025 proposal that I’ve submitted – keep your fingers crossed for me that it’s accepted!

Data introspection is a vital part of the quantified self, but beyond the basics of personal informatics, it can become a form of self-expression and personal advocacy. Analyzing my past messages has helped me reconnect with people, and to get back in touch with things that my past self loved. I get to observe the awkwardness of my teenage self, but also the joy and enthusiasm that she had for the world. Once I pull in my Twitter and LinkedIn archives, I’ll have a trove of information that shaped me throughout my life, to poke and prod and turn into art.

It is too easy to forget that AI exists in service of humanity. Reclaiming our personal data allows us to take our digital footprints beyond advertising and influence, and return them to a place of self-reflection and connection.