Developers building apps on ATproto, the technology behind social media apps like Bluesky and Leaflet, are promising us a future where we own our content and aren’t at the whims of a tiny handful of companies. Software engineer Zeu Capua talks to host Jason Lengstorf about the promise of social media where we can actually own our own data.
Captions provided by White Coat Captioning (https://whitecoatcaptioning.com/). Communication Access Realtime Translation (CART) is provided in order to facilitate communication accessibility and may not be a totally verbatim record of the proceedings.
JASON: Hello, friends. Welcome to another episode of the WebDev podcast. Today on the show, we're going to dig into one of the pockets of building for the web that I think has more excitement than I've seen in quite a while, which is this idea of building with AT Proto. This is what's behind Bluesky. It's an interesting time to be on the web. We're seeing this push toward let AI write the code. We don't need websites. I'm seeing that as almost like a drain on morale. I'm not seeing people excited to build. Then there's this pocket. There's this magical pocket happening where people are building on AT Proto, and I'm seeing energy again. I'm so excited to talk about it. Right at the heart of this is this really active community, and one of the most active community members out there is Zeu Capua, who I've brought to the show today. How are you?
ZEU: I'm doing pretty good. How about you?
JASON: I'm doing great. I'm so excited to have you on the show. I've been kind of following your work as you're building more and more stuff with AT Proto. You're one of my kind of main sources of information on what's going on in the community. So I'm thrilled that you took some time to join us today. So for folks who aren't following along, can you maybe start out by giving us a bit of background on who you are, what you do?
ZEU: Yeah, I'm Zeu. You can find me online at zeu.dev. That's my website as well as my handle in the atmosphere, I guess is what we're calling it. That's the network around the AT Protocol. I'm a software engineer, mainly talking about web, websites, and all that. So, I'm very excited.
JASON: I am very excited about this. I've talked about it a couple times, but I feel like throughout my career ‐‐ I've been doing this for a long time, over 20 years into the web ‐‐ I have seen these moments where it feels like everybody is ‐‐ like, there's just energy. There's something happening. There's high potential. People are like moving, right. The last time that I remember this really happening was in the beginning of the framework wars, like around React and Angular and all this stuff when it was just sort of exploding. Everybody was building a React framework or some new JavaScript meta framework. There was so much energy around that, right. We kind of saw the same thing with, like, the Gatsby era and Next and Svelte. But then it sort of tapered off, where a lot of the excitement doesn't seem to be about building. It's more like AI stuff doesn't feel like it's about building. It sort of feels like something different.
But AT Proto feels pretty familiar in that sense of there's art here. There's excitement. There's a lot of passionate community members out there really building cool stuff. So can you talk to me a little bit about what drew you to it? Why do you feel like this is the right place for you to be putting your energy right now?
ZEU: Yeah, so as you said, like right now in terms of tech in general, there's a lot of general cynicism around big tech, this big AI push, and in particular for more non‐technical dev people, the most visible thing that really affects people's lives is walled gardens. So that's your Facebooks, your Instagrams, your YouTubes, Twitter now, right. People have different ideas in terms of what moderation looks like, what policy looks like, and owning your data. So for example, I saw a while back that somebody on GitHub got accidentally banned just by accident. That meant that their entire code base, like all their repos spanning a decade, just gone. Now, granted, they had enough of an audience, enough people made backlash on Twitter that the support people were like, hey, that was a mistake. We're going to reinstate you. But it was kind of a wake‐up call to a lot of developers specifically. That's like owning, you know, like hosting your data in these platforms, it's not safe, especially when a lot of our lives, especially as web developers, a lot of our lives are online. So AT Protocol, it's kind of a shift in that dynamic.
JASON: Yeah.
ZEU: So AT Protocol, it's a protocol built on top of web standards like HTTP and all that. It basically allows people to own their data while still being connected to others. That's like the really special thing. We've had self‐hosted stuff before, right. Like for example, just going with the GitHub example, something like GitLab. Somebody can self‐host GitLab ‐‐ and he's gone ‐‐
JASON: I'm still here.
ZEU: Yeah, you're still there.
JASON: I don't know what's going on with my camera. It's tweaking out. We're going to just go MacBook Air camera today. We're just going to live with that.
ZEU: Yeah, so going with the GitHub example, maybe they don't want to have Microsoft host their code anymore. Now we're going to go and self‐host GitLab. Now that repo or the users on that GitLab, they're not connected to other GitLabs. You know, now everything is kind of messed up, right. You don't have that social aspect of GitHub anymore. Something like AT Protocol kind of bridges that gap, where you can still self‐host and own your data and your own servers, but the network allows you to connect with other people. So you can still have your same social graphs, no matter where your data is stored.
JASON: I think you just kind of touched on the part that, like, it was hard for me to initially get my head around, so I want to spend a minute talking about this. It was the spark that, like ‐‐ I was at GitHub Universe this year. I ended up cornering Paul in a bar and just like firehosing ideas at him. I was like, oh, my god, then you could do this, then you could do this. I think he was like yes, I know. But I was having my realization. But the thing I find really exciting about this is we've talked about decentralization forever. You know, there's this idea of own your own website. So you build your own website, and then it's just kind of there. We've got a lot of like partial solutions, but they never felt complete. There is the ‐‐ I can't remember what it's actually called. Indie Pub or something like that, where you could sort of use this relay to get comments from one place to another place so you could sort of have websites be connected, but it relied on a lot of services that were kind of odd to implement. It was hard to verify. There were things like ActivityPub, which is like ‐‐ the Mastodon stuff is so close, but it feels like you're still operating in these silos where they're connected until the silo owner decides to not connect them. Then everything is broken again, and you can't migrate between them. It's very strange. There was stuff in crypto that was almost there. There was interesting stuff with the federated identity, but that put you in this weird spot because not everything is on crypto.
The thing that felt magic to me about AT Proto, I have a website, and no matter where you build, the AT Proto stuff isn't what you build on. It's ‐‐ I mean, actually, let's maybe clarify this. How is it different? Why am I excited about this when things like ActivityPub and other solutions didn't capture my imagination this way? Do you have any kind of differentiation?
ZEU: Yeah, so they do go on this in the ‐‐ atproto.com, the doc site, they have a page called the AT Proto ethos, which is a written version of a talk given by one of the engineers. Basically, the idea is that the differentiator with AT Proto is it's just built on web standards, so there's not a lot of crazy things happening. You just have servers that manage to do things. You have web sockets to query the entire network. You have things called app views, which are just APIs. So you have like a firehose that's just getting every event. You have something called a relay on top of that, which is another Jetstream ‐‐ like another stream you can connect to. From there, you could collect it into an API called an app view, where you can filter through stuff. Maybe I only want Tangled. That's the Git social media website. Maybe I only want to ‐‐ I only care about Tangled repos. I only care about the dev side of the network. I'm only going to query that. I don't care about the rest of them. Then from there, you could just use that as a general query, like for your app. That, or you could connect directly to a user's PDS, that's a personal data server. That's where they actually host data. So if you don't want to go through all that stuff, if I know for a fact that Jason's post is on a specific server, I can just go and type it like a URL. Then it'll give it to me. And it's that open. It's that free. There's no gatekeeping really. Not at the moment, anyway. Permissioned data is coming soon, but that's a whole other debacle.
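To make the "type it like a URL" point concrete, here is a minimal Python sketch that builds the address of one public record on a PDS. It assumes the `com.atproto.repo.getRecord` XRPC endpoint; the host `pds.example.com`, the DID, and the record key are made-up placeholders, and `record_url` is a hypothetical helper, not part of any SDK.

```python
from urllib.parse import urlencode


def record_url(pds_host: str, repo: str, collection: str, rkey: str) -> str:
    """Build the URL that asks a PDS for one public record.

    repo is the account's DID or handle, collection is the lexicon
    name (NSID), and rkey is the record key inside that collection.
    """
    query = urlencode({"repo": repo, "collection": collection, "rkey": rkey})
    return f"https://{pds_host}/xrpc/com.atproto.repo.getRecord?{query}"


# Placeholder values for illustration only; nothing is fetched here.
url = record_url("pds.example.com", "did:plc:abc123", "app.bsky.feed.post", "3kabc")
print(url)
```

Pasting a URL of this shape into a browser or an HTTP client is all it should take to read a public record, which is the lack of gatekeeping Zeu is describing.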
JASON: Right, right.
ZEU: Yeah, so it's just ‐‐ what do you call it? ActivityPub, the way I understand it, it's kind of like that GitLab thing, as you said, where it's like a server owner, not own, but hosts users. Then they have to personally go ahead and connect to other servers, right. The analogy I was given was they are emailing each other, and that requires whole server bits, right. So 10,000 servers connecting to like a one‐user server, when you as a person can just go in, kind of like a website. Just type it in. It's like, wow, it served it to me on its own.
JASON: And the thing that I think is interesting about this is that it's very emergent behavior. It's not one centralized group saying this is the protocol and this is how it works. It's sort of like here's the ‐‐ like, I know Bluesky was maybe the ‐‐ definitely the biggest implementation, if not the first. The structure of it was like, okay, so there's a Bluesky lexicon and a post looks like this. It's got these fields on it and all that kind of stuff. But there was never a rule that said you couldn't do other lexicons. The first ones I remember seeing were Tangled and Leaflet, where they just sort of invented their own lexicons. You namespace it and say, okay, this is what Leaflet uses to publish a blog post. Or this is what Tangled uses. You can sort of put whatever you want in there. As the consumer of the firehose, which is a generic data stream, like you said, you get to choose what you want. But you can choose anything. On my website, if I'm listening to the firehose, I can pull in what's going on, on Bluesky, or I could filter to just when people talk about my website or filter to Tangled stuff that's about a topic I'm interested in. Like if I'm writing about JavaScript, I could filter the Tangled feed for just JavaScript. Suddenly, it's ‐‐ and if I want to do something, I can publish my own lexicon namespaced to my own website and just put it out there in the world and say hey, this is what I got. If anybody wants to use it, they can. I find that fascinating because I'm not restricted to, like, what can I do with RSS as it's very strictly defined. I'm not restricted to what can I do with the Twitter API or what can I do ‐‐ you know what I mean? It seems like all these other approaches have sort of said, well, you can use this data, but you can't modify it. You can't add extra stuff. It just feels like the AT Proto ethos is do whatever you want as long as it follows the lexicon structure. We're not going to ‐‐ like, you're not going to be able to publish to Bluesky with my custom lexicon, but I can make it, and somebody else can subscribe to it if they want to.
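The "choose what you want from the firehose" idea can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a real client: the events are hand-written dictionaries shaped loosely like Jetstream JSON, `sh.tangled.repo` is assumed here as an example collection name, and `in_namespace` and `filter_events` are hypothetical helpers.

```python
from typing import Iterable, Iterator


def in_namespace(collection: str, namespace: str) -> bool:
    """True if the collection is the namespace itself or sits under it."""
    return collection == namespace or collection.startswith(namespace + ".")


def filter_events(events: Iterable[dict], namespaces: list[str]) -> Iterator[dict]:
    """Yield only commit events whose collection falls in a wanted namespace."""
    for event in events:
        if event.get("kind") != "commit":
            continue  # skip events that do not carry a record change
        collection = event["commit"]["collection"]
        if any(in_namespace(collection, ns) for ns in namespaces):
            yield event


# Hand-written sample events (assumed shape, not captured from the network).
sample = [
    {"did": "did:plc:alice", "kind": "commit",
     "commit": {"operation": "create", "collection": "app.bsky.feed.post", "rkey": "1"}},
    {"did": "did:plc:bob", "kind": "commit",
     "commit": {"operation": "create", "collection": "sh.tangled.repo", "rkey": "2"}},
    {"did": "did:plc:carol", "kind": "identity"},
]

tangled_only = list(filter_events(sample, ["sh.tangled"]))
print([event["did"] for event in tangled_only])
```

Matching on the namespace plus a trailing dot keeps a look-alike such as `sh.tangledx` from slipping through a plain prefix check.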
ZEU: Yeah, exactly. I mean, the lexicon thing is an amazing piece of the infrastructure because as a community, everyone gets to decide, okay, this is the most useful lexicon for this data type, right. Mutual dame made a flushes.app, which is an app where you make status updates while you're in the toilet. So it's like, I'm pooping right now. Had a solid taco earlier. It was pretty great. I'm reading right now. It's pretty good. Right? (Laughter) You could have that. Leaflet, you're talking about blogs. Not only did Leaflet do that, but Leaflet actually teamed up with two other platforms on the AT Protocol, and they all came together like, hey, we're all making blogs. We're kind of like targeting different users, but we're all making blogs. So what if we make a standardized lexicon where all three of us platforms use it, and that's standard.site. So now, no matter what platform you use, whether it's Leaflet, PCKT, or Offprint ‐‐ in the future, I believe they're still on waitlist ‐‐ or even if you have an Astro site, there's currently a plugin that automatically turns your blogs into standard.site lexicons or records. There's also a CLI called Sequoia that does the same thing. So you don't need a platform if you already have an Astro blog. You already have all that markdown. You can just immediately transfer it. Now it's in a broader network. Now everyone in Leaflet can find you without even being on Leaflet.
JASON: And that, I think, is the coolest thing. I was thinking about use cases for what I want to do with this tech, right. One of the things I was thinking about is we have CodeTV. On CodeTV, we have user profiles. You log in right now with OAuth or use email and password if you want. If we add an AT Proto login, we could start automatically pulling in people's publications. So if your blog is on Leaflet or PCKT or you're self‐publishing through standard.site, or whatever it is, we can pull in details about what you're writing right there on the CodeTV site. We could pull in some of your Bluesky posts. We could pull in Tangled commits, if you're using Tangled. That's a thing we can do now. The CodeTV website has this ability to suddenly become almost like a social landing page for other people that's curated around the types of lexicons that are relevant to my best guess at what the CodeTV audience is. I don't have to build that integration. I just have to listen to the lexicons. To me, that's a really fascinating thing. I'm no longer scraping a bunch of APIs. I'm just saying turn on the firehose, filter for the CodeTV users, for these lexicons, and show that on the appropriate page. Holy shit, that's cool. Just unbelievably cool that's even an option. Imagine trying to do that with any other platform. It's a very custom integration as opposed to saying well, now that I can consume AT Proto, any lexicon is just a matter of being able to display the data, not necessarily building an entirely new API thing or getting new developer keys or whatever it is. It's one thing. Actually, that brings up a good question. How do API keys work on AT Proto?
ZEU: So right now, all the data is public. So as long as you know how to query the data, you can get any public data available.
JASON: So you can just straight up tap into the feed, and you're not letting, like, some centralized force decide whether or not you can access it. You literally have access to I think it's a web socket feed, where you're just getting realtime events and they dump through your server as fast as you can query them.
ZEU: Yeah, if you want to make your own app view, which is the fancy word for API, you can literally just have your own firehose, your own Jetstream, filter through whatever you want, throw it in your database, and then query it like with GraphQL or something, and you're good to go. You don't need a third‐party app view. Now, sometimes you want a third‐party app view. For instance, if you're making a third‐party Bluesky client, you don't want to do moderation for yourself. Like if you're one solo dev. So maybe you use the Bluesky app view to use their moderation. Blacksky is a full alternative to Bluesky. They have their own moderation as well. So if you have any qualms with Bluesky, use Blacksky.
JASON: There was some controversy around that because there were people that got banned on Bluesky, but they're not banned on Blacksky, right?
ZEU: Exactly, yeah. So ‐‐
JASON: Can we talk about how freaking cool that is for a second? You don't ‐‐ you're not at the whims of, like, well, I've lost my account forever because some mod doesn't like you. You're straight up ‐‐ like, you can be on a bunch of sites at once. Obviously, if you're violating the terms of service of a site, you should get banned. But maybe you're not violating the terms on this other site. So you don't lose your history. I think that's really, really cool.
ZEU: Yeah, everyone just got the one identity. So if one platform says, hey, we don't like you, they'll throw a moderation label on you that says banned or suspended. But if another app view has their own moderation labels and it doesn't say that, you show up completely fine. Again, if you don't want to go through app views at all and know that person exactly and where their server is located, you can query straight to them, like a normal server and get the data directly from them. You don't need a moderator in the middle at all.
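As a rough illustration of why the same account can look different from one app view to the next, the Python sketch below applies two separate label sets to the same posts. Everything here is made up for the example: the label value `suspended` simply echoes Zeu's wording, and real labeling services define their own values and rules.

```python
def visible_posts(posts: list, labels: dict, hidden_values: set) -> list:
    """Return the posts whose author carries none of the hidden labels.

    labels maps an author's DID to the set of labels one particular
    app view has attached to that account.
    """
    result = []
    for post in posts:
        author_labels = labels.get(post["author"], set())
        if author_labels.isdisjoint(hidden_values):
            result.append(post)
    return result


# The same public posts, as stored on each author's own server.
posts = [
    {"author": "did:plc:alice", "text": "hi"},
    {"author": "did:plc:bob", "text": "hello"},
]

# Two hypothetical app views with independent moderation decisions.
labels_view_a = {"did:plc:bob": {"suspended"}}
labels_view_b = {}

view_a = visible_posts(posts, labels_view_a, {"suspended"})
view_b = visible_posts(posts, labels_view_b, {"suspended"})
print(len(view_a), len(view_b))
```

Neither view touches the underlying records; each one only decides what it is willing to show.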
JASON: Yeah, so there's a couple questions in the chat I think are good to talk about. The first one is watching the entire network all the time seems a little wild, and doesn't that put you at high risk of spam? How do you manage ‐‐ like, it does seem a little wild to have the firehose turned on at all times. So what is the approach right now for dealing with that?
ZEU: Yeah, so there are way smarter people about distribution and stuff, but my current idea, or my conception of it, is that you can turn off the firehose and just do backfills. So let's say, you know, I have the firehose on, maybe you have a spam detector of some sort. You could close it. Maybe after an indeterminate amount of time, after all that was taken out, or maybe you have this person is a spammer, all their records, let's ignore that.
JASON: And that's moderation. That's you deciding to moderate your own view.
ZEU: Exactly. And from there, you could just backfill. So whatever you missed while your firehose was closed, it'll just comb through all that.
JASON: And a backfill, to restate that and make sure I understand it, is you're not watching the firehose, but you say my last ‐‐ like the last time I looked at the firehose was this time stamp or this unique ID. Give me everything that matches my filters between that timestamp and now, and it allows you to populate your database at whatever interval makes sense. So that could be ‐‐ I assume you could run it continuously, or you could run it once a day or once a month or whatever was important based on how much activity you're expecting on your given lexicons.
ZEU: Yeah, exactly. Like right now, I'm running a self‐hosted service called quick slice, which is like a self‐hosted app view thing. It has a Jetstream, a firehose on there ‐‐ what do you call it? At some point in time, it broke. Obviously, I missed like two days' worth of publications and blogs. I'm like, oh, I obviously need those. People are obviously writing. So I just pressed the get backfill button, and it combed through, did some validation as it was combing through the data, and then after a while, like, it was back up, ready to go.
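The backfill idea described here, "give me everything that matches my filters between that timestamp and now," can be sketched in Python as follows. This is a simplified model, not how any particular service implements it: the event log is an in-memory list, `time_us` is assumed to be a microsecond timestamp used as the cursor, and the collection names are made up.

```python
def backfill(log: list, cursor: int, wanted: set) -> tuple:
    """Return (missed_events, new_cursor) for everything after the cursor.

    log is a list of events ordered by time_us, cursor is the time_us of
    the last event already processed, and wanted is the set of
    collections this app cares about.
    """
    missed = [
        event for event in log
        if event["time_us"] > cursor and event["commit"]["collection"] in wanted
    ]
    # Advance past everything that was scanned, matching or not, so the
    # next backfill does not re-read the same stretch of the log.
    new_cursor = max([cursor] + [event["time_us"] for event in log])
    return missed, new_cursor


def sample_event(time_us: int, collection: str) -> dict:
    """Build a hand-written event (assumed shape) for the example."""
    return {"time_us": time_us, "commit": {"collection": collection}}


log = [
    sample_event(100, "app.bsky.feed.post"),
    sample_event(200, "com.example.blog.post"),
    sample_event(300, "app.bsky.feed.post"),
    sample_event(400, "com.example.blog.post"),
]

# The listener stopped after time 150 and only cares about blog posts.
missed, cursor = backfill(log, 150, {"com.example.blog.post"})
print([event["time_us"] for event in missed], cursor)
```

Storing the returned cursor after each run is what lets the backfill happen continuously, daily, or monthly, as Jason suggests.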
JASON: That's extremely cool. There's a lot of discussion I've been following about how to manage if you are doing your own app view or trying to display data. It's not necessarily like, hey, pull up the firehose and run a backfill to the beginning of time every time you want to display data. You're kind of expected to store the things you need, like kind of build your own database so you can query quickly and index things and make it useful for your application.
ZEU: Yes, yes. That is the idea. If you are running your own app view, you are expected to pretty much just focus on your slice of the network. That way, you know, you're not just getting everyone talking every time.
JASON: Yeah. And then there was one more question that I thought was a good one. It got answered in the questions ‐‐ or in the comment, but I want to make sure we address it out loud. So for a blog with comments, if you're using AT Proto, everyone owns their own comment, right? This isn't populating a database on my website. This is ‐‐ how does it work?
ZEU: Yeah, so, you would do that with references, like strong references. So kind of like how on a web page, you have links. Same thing. You have back links. For example, I'm just going to use Bluesky because it has replies already. The Bluesky post, I made that. You come in, and you want to reply to my post. That reply post is on your server. It's just referencing my parent post. So you own the comment. That comment there is yours forever. I cannot touch that comment at all. But if an app were to query my post like, hey, I want to look at every reply referencing this post, replying to this specific post, it'll go through the network and find yours. It's like, oh, my god, it's talking about this post by Zeu, and it's going to show up.
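Here is a small Python sketch of the reply lookup described above. It assumes the shape Bluesky reply records use, where a `reply` field holds `root` and `parent` strong references made of a `uri` and a `cid`; the DIDs, record keys, and CID strings are placeholders, and `replies_to` is a hypothetical helper standing in for what an app view's index would do.

```python
def strong_ref(uri: str, cid: str) -> dict:
    """A strong reference: the record's address plus a hash of its content."""
    return {"uri": uri, "cid": cid}


def replies_to(records: list, parent_uri: str) -> list:
    """Find every indexed record whose reply.parent points at parent_uri."""
    return [
        record for record in records
        if record["value"].get("reply", {}).get("parent", {}).get("uri") == parent_uri
    ]


# Zeu's post lives in Zeu's repository (placeholder identifiers throughout).
zeu_post_uri = "at://did:plc:zeu/app.bsky.feed.post/1"
zeu_ref = strong_ref(zeu_post_uri, "cid-placeholder-1")

# Records an app view has collected from different people's servers.
indexed = [
    {"uri": zeu_post_uri, "value": {"text": "AT Proto is fun"}},
    {
        # The reply is stored in Jason's repository, not Zeu's.
        "uri": "at://did:plc:jason/app.bsky.feed.post/7",
        "value": {"text": "Agreed!", "reply": {"root": zeu_ref, "parent": zeu_ref}},
    },
    {"uri": "at://did:plc:sam/app.bsky.feed.post/3", "value": {"text": "Unrelated"}},
]

found = replies_to(indexed, zeu_post_uri)
print([record["uri"] for record in found])
```

The comment never moves into the post author's storage; the thread is assembled at query time by following the references.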
JASON: And I guess to repeat this back and just make sure I'm understanding it, what we're saying is each of us has a PDS, or personal data server. By default, I think most of us are using Bluesky service for this because that's what we signed up with. We've got the handles. Most of us aren't setting up our own PDS but we could. I've seen alternative PDS services pop up. But the general idea is ‐‐ so, I have created my account. I get my ID, a DID. My DID represents my collection of stuff. Inside of my PDS, referenced by my DID, anything that I do is like a lexicon entry. So it would be like, okay, app.bsky.post or whatever it is. It's in reply to this unique ID that references a post on your PDS. But then it has all the lexicon data to display a Bluesky post. I'm basically creating a database of my actions and any of the references I make might be to other people's PDSs, but the firehose knows anyone who's using AT Proto, it's consuming all of those. So that's maybe the one thing that's the hardest for me to get my head around. That's kind of the centralized point right now. To get into the firehose, you have to make sure the PDS you're using is recognized by the firehose as a valid PDS. So there's kind of like an initial getting accepted into the protocol, right?
ZEU: There's no accept, per se.
JASON: But I couldn't go stand up a localhost and say, okay, now I'm a PDS, and I'm just going to put whatever I want into the Jetstream and it would show up. It has to know to look at your PDS. Right?
ZEU: I believe ‐‐ so, okay. Throwing stuff into the dark here. I believe the way Jetstreams and stuff find identities or people to look at is via the PLC, which is a public ledger of credentials. I believe that's the acronym. You can basically think of it as a blockchain that has all of the identities available and lists every change made. So if you were to go to the PLC right now and look for my account, you could see that it started off at zeu.bsky.social. It was hosted on Bluesky. Eventually, I moved to another server. If other AT Proto people are in chat, please give more context.
JASON: There's a lot of comments happening here, but yeah.
ZEU: If you were to just rent a $5 VPS online or a Raspberry Pi and run the PDS admin CLI that stands up the PDS itself, once you make an account there, it's automatically into the network. You could just automatically make stuff. There's no central command that's like this PDS is a no.
JASON: Got it. But the app view can decide this PDS is a no, but the Jetstream isn't worried about that. It's just saying this is a valid PDS so I'm going to consume it.
ZEU: Yes, yes.
JASON: Okay. Yeah, okay. Somebody is talking about a PDS on a Raspberry Pi. Someone is talking about Germ DM. Is there an answer to that? Is there private data yet? It seems like there couldn't be because of the way the protocol works.
ZEU: Yeah, so right now, no, there's no private data on the protocol. There are musings. People have been talking about private data for the past year or two. A Bluesky engineer who works on the protocol just released a blog post talking about at least their reasons ‐‐ or not reasons but their thinking as to what they're thinking about as they look through permissioned data. That's what we're talking about here.
There are some musings about having a PDS and then having a secondary PDS and then having the app view scoped so that only authenticated app views can get the data. Right now it's all up in the air. Nothing is concrete, but there's a lot of talk about it. It is definitely one of the main goals for the protocol. As soon as that goes live, it's going to go haywire.
JASON: Got it. And a comment in the chat is saying there isn't "a" Jetstream, it's a service one can run. Let me issue a correction there. I've been talking about it like it's a magical centralized thing. It's just a part of the stack, and you have to host one, which is also cool because that does mean, like you've been saying, there's not one person who can arbitrarily say I'm done with all this. It's over. The tech ‐‐ like, I guess you could cause a massive disruption if everybody was relying on one particular Jetstream service and they shut it down. But a temporary disruption because you would be able to switch over to a new Jetstream and everything would resume, which is pretty freaking cool. Okay. So I could literally talk about this all day, but I don't want to keep you too long. I also want to get into the companion episode, which we're going to do a Learn With Jason and actually build something. So let's maybe like dig into this a little bit and make it happen. I want to first, before I completely wrap us here, let's talk about where should people go if they want to learn more about this? What are the right resources if somebody wants to just get their head around this to begin with?
ZEU: Yeah, so the docs are a great place to start. Atproto.com is the official protocol website. It has a lot of musings, tutorials, guides, and specs as to the infrastructure. If you want something more detailed but still an introduction, I would recommend looking at a bunch of blogs by Dan Abramov, the React guy. His blog, overreacted.io, has a few posts that are really great introductions into the protocol. That would be "Where It's at://," "A Social Filesystem," and "Open Social." Those three blogs are really great introductions for developers to understand.
JASON: I just linked ‐‐ we'll put a link to the blog in the description. If you're on the CodeTV Discord, you'll see it in the show links. Zeu, this has been amazing. Where should people go if they want to learn more about you?
ZEU: Yeah, if you want to follow me, you can find my website at zeu.dev. I also stream live on Twitch, twitch.tv/zeu_dev. Also at stream.place/zeu.dev. Of course, I talk about everything 24/7 on Bluesky. That'll be bsky.app/profile/zeu.dev.
JASON: Amazing. Zeu, thank you so much. Don't forget, we're doing a companion episode to this podcast that's going to be on Learn With Jason, where we're going to actually build something with AT Proto. So go find that on codetv.dev. You can find the WebDev podcast on your favorite podcasting platform or head to codetv.dev for the links or watch the replay. Zeu, thank you so much. We will catch you all next time.