Transcript: ‘Do 60-minute Coding Tasks in 60 Seconds—With AI’
‘AI & I’ with Val Town’s Steve Krouse
December 4, 2024
The transcript of AI & I with Steve Krouse is below. Watch on X or YouTube, or listen on Spotify or Apple Podcasts.
Timestamps
- Introduction: 00:00:55
- How programming changes the way you think: 00:03:24
- Building an app in less than 60 seconds: 00:11:22
- How Val Town’s AI assistant works: 00:17:19
- Steve’s contrarian take on the non-technical AI programmer: 00:23:05
- The nuances of building software that isn’t deterministic: 00:33:38
- How to design systems that can capitalize on the next leap in AI: 00:39:05
- What gives Val Town a competitive edge in a crowded market: 00:40:47
- The power of small, dense engineering teams: 00:47:34
- How Steve is positioning Val Town in a strategic niche: 00:52:26
Transcript
Dan Shipper (00:00:55)
Steve, welcome to the show.
Steve Krouse (00:00:56)
Thanks so much for having me, Dan. This is a dream to be on your podcast.
Dan Shipper (00:01:03)
So, for people who don't know, Steve and I have been friends for many years at this point. We met in college. Steve is the cofounder and CEO of Val Town and Steve will be able to describe Val Town better than I can, but it's sort of a social coding site. Steve, how do you describe Val Town?
Steve Krouse (00:01:23)
Yeah, it’s I think harder than most companies because it can do so many different things. The poem we have on our website, I think, holds up remarkably well, if you're a programmer. It’s “If GitHub Gists could run and AWS Lambda were fun.”
Dan Shipper (00:01:41)
Exactly. So, I mean, basically, it lets you code in your browser and share your code snippets and build and compose code snippets together. And you recently released Townie, which is a very cool AI coding agent, and also full disclosure, I'm a small investor in Val Town. I think it's awesome. You’re doing really, really amazing stuff and I will also say you are actually one of the biggest reasons I started this podcast. Because you were Dan, you write all the time, but I don't have time for that. I want to just listen to what you have to say. And you had sort of a vision for what this could be. And so I really appreciate you pushing me to do it because it's been super fun. And also just an honor to have you on the show.
Steve Krouse (00:02:31)
Thank you. Thanks for doing what I want. I think it's important for your audience to know that I started out as an audience member of yours in the first place—that's how we met. I stumbled upon Dan's blog when I was a freshman in college and he was a junior at Penn and I reached out and— I should have found that email because we probably still have it, the very cute like I'm a little freshman, please take pity on me and mentor me. I want to be like you one day. So, it's wonderful that I've found myself in the content that I was an audience member of.
Dan Shipper (00:03:06)
It is wonderful to have you here. I think one thing that people should know about you is Val Town is sort of the latest iteration of something that you've been obsessed with for a really long time, which is basically programming and programming languages and how programming languages change how you think and all that kind of stuff. Do you want to give people just a little bit of an overview of that part of your brain and where Val Town came from and why it's important to you.
Steve Krouse (00:03:39)
That'd be great. Yeah, it is a longer history, I think, than most startups. And there are a lot of startups, I think, that look like Val Town today, but hopefully we have some deeper historical perspective because of my background. So I think where I'll start the story is: I went to this kind of life-changing afterschool program that taught me how to code when I was in middle school and through that program, I fell in love with mathematics and fell in love with programming and overall, just felt like a smarter human. And I knew it had something to do with this coding program. And in college, I went to a hackathon, which was another whole eye-opening kind of experience. But one of the things that happened at that hackathon was that I was pointed at the work of Bret Victor and Seymour Papert and I read Seymour Papert and it turns out that it was all on purpose.
Dan Shipper (00:04:31)
Who is that? I know Bret Victor, but I don't know Seymour Papert.
Steve Krouse (00:04:36)
Yeah. So Seymour Papert was a mathematician and an educational theorist. He was a mathematician who then decided to study with Jean Piaget, like one of the fathers of the stages of developmental educational theories. And Seymour Papert had his question, which is like, why are some kids bad at math, but no kids are bad at French? If you're bad at French, it's because you didn't grow up in France, but we know that if you grew up in France, you'd speak French just fine. Some kids are just genetically bad at math. That didn't make sense to him. His conviction was that those kids, kids who seem like they're bad at math genetically, just didn't grow up in a math land. And kids who are good at math had some sort of math land experience growing up. And his experience was that. He got obsessed with gears growing up and other mathematical objects that were just in his house.
Dan Shipper (00:05:31)
Got it. Okay, cool. So that makes a lot of sense. So continue with the story.
Steve Krouse (00:05:39)
Yeah. So he wanted to see if we could make kids good at math? Can we make a virtual math land on this new computer thing to make kids good at math? And what he came up with was actually a programming language, but it was never to teach kids to code. There were no jobs for coding back in the sixties and seventies. It was, can we trick kids into being good at math through programming? And it totally worked for me. And so ever since I lived it and then read about how it was all on purpose, it totally blew my mind. And I became obsessed with the power of programming languages, programming environments, developer tools for not only the power that they give human beings to create stuff, but the way that they change the way you think and actually make you a smarter human in other aspects of your life. This has been the thread. So that moment when I was 18 years old has followed me for the last 10 years.
Dan Shipper (00:06:25)
How do they change how you think?
Steve Krouse (00:06:27)
Great question. The book Mindstorms kind of goes through this. One of my favorite examples is in Logo geometry. The programming language Seymour Papert made is called Logo. It’s the predecessor to Scratch. So if you've ever used Scratch, it was made by his student and there are a lot of similarities. And in both programming languages, the model is that you have a character on the screen, and you can give instructions to move forward or turn and move and turn and move. And then, in particular, with the Logo turtle, you would have a pen on your feet. So you could say, put the pen down, and then give instructions on how to move, and pick the pen up, and then there'd be a shape on the screen.
So an example of how I learned mathematical concepts— I grew up in math land through this. How do you draw a square? How do you draw a hexagon? How do you draw a circle? And you build up, as you kind of intuit how to do these things, you build up an intuition for what later in life you'll learn as a derivative in mathematics. So, I don't know, five years after I was drawing circles with my feet mentally you learn to anthropomorphize the problem. It's a technique you can use in a lot of problems, putting yourself into the code, imagine yourself. Programmers do that all the time. So you anthropomorphize the code. In this specific example, it's very geometric. So then a couple of years later, when I was learning derivatives, I was, oh, the derivative is just the way you're pointing. If you like walking along the curve, that's very easy. I've walked on curves for a long time. And I remember in the class, I was tutoring kids after that class and everyone was like, oh my god, Steve's a genius. He got it so fast. I was helping the teacher on the board, but I knew I wasn't born this way. I just had these very powerful experiences that nobody else in the class had.
Dan Shipper (00:08:12)
That's really interesting. So let me make sure I understand the example you just gave. So, basically, if you're programming in this environment, and you're trying to make a circle, the way that you end up having to do that is by breaking the circle into small little segments that are the angle of the segment changes, just a little bit each time. So you end up making a circle because circles aren't actually round; they're made of infinitely small segments and that's what then becomes a derivative. And so the concept of derivative was intuitive to you because of that.
Steve Krouse (00:08:49)
Yes, yes. It was so easy for me to put myself on the paper—like, my feet are on the paper and I'm walking along the curves.
Dan Shipper (00:08:55)
That makes sense. I mean, I think it's so interesting. This is a whole rabbit hole we can go down. But one of the really interesting things about the history of math is, in the 19th century, we started to get formalism in math where you're using all the equations and stuff that get people intimidated. If you read Newton or Galileo, they're interspersing geometry with actual written text. It's where the sort of renaissance man ideas come from. There’s a lot more of an integration between all the different senses and ways of thinking to help you understand math. And it was only later on that we sort of formalized it, which is helpful because formalism is really powerful, but you end up losing a lot of the intuition unless you're a super math genius or you're lucky enough to use a programming environment like this. And I think that's what makes it much harder and less appealing for people, which is interesting. I will also say, as someone who taught myself how to code, I'm still terrible at math. I think there's something that's possible I haven't learned. This shows up in other areas of my life too. There's something for me about just sequential processing. Processing sequences where each step of the sequence depends on the previous step. I'm very likely to a.) it costs me a lot to do that and b.) I'm very likely to skip a step or get them in the wrong order or do all that kind of stuff. So any math that I can do, that's more intuitive where I'm almost sizing things in my head as opposed to following a sequence of steps that I can do, but the sequence thing really messes me up. And that's why I kind of like the AI stuff because I kind of get things intuitively, so I kind of know what I want to do. And the AI stuff just figures out how to write the code, or do things step-by-step so I don't have to. It fills in that part of my brain.
Steve Krouse (00:11:10)
Yeah. Totally. It makes a lot of sense.
Dan Shipper (00:11:11)
So I know you’re thinking a lot about the future programming—that's kind of what you're building at Val Town. And I think a really interesting intro to this episode is, about a year ago when this podcast was first starting, I had this guy, Geoffrey Litt on, who I know is a friend of yours and also someone that you really look up to and respect. And the cool thing about that episode with him is we live-coded an app together and using ChatGPT and Replit. And, at the time that was pretty new and pretty cool and it was, to me, wild that you could make an app on a podcast just while talking, that was previously totally impossible. But it took us about an hour to actually do it. And there were false starts and whatever. So we got it done, but it was not super easy. And I think it would be fun to see how far we've progressed. That was your prompt for me in this episode, to see the difference between, last year with Geoffrey Litt about a year ago and where we are now is a benchmark of AI progress. And I think you're going to use Townie, which is Val Town's chatbot product, to do this. Do you want to introduce it to people?
Steve Krouse (00:12:36)
Yeah, sounds great. So yeah, Val Town is a website to write code and share code and deploy code, very importantly. And one way to think about Townie is like, it's taking what Dan and Geoffrey did a year ago and just automating the steps. We were seeing people doing this with Val Town for the last year. They would go to an LLM and ask for some code and paste it and then get some error and paste it back. And it was a crazy thing. And it seems like over the last year, we've all agreed that that's crazy and that they should be deeply integrated into a single product. And that's what Townie is. We've worked through a number of different iterations of how Townie works. The current iteration I think is most similar to Anthropic Artifacts if your audience is familiar with that. I think maybe let's start with just the Val Town homepage, so people can get a sense for what the product is.
I read everyone the poem, but I think I could start with— The description I give people when I meet them on the street, and they say what do you do? I say that I make a website for programmers to make websites. I think that kind of anchors it really well, that it's a tool really designed for programmers. And it's designed for you to make web things, particularly serve-side, where you can do front-end and full-stack stuff. And I think one of the most important distinctions is that it's in the browser, you write the code, and then it also scales with you to deploy and scale. So you don't have to then redeploy it somewhere else. So it's all integrated and drastically simplifies the programming experience.
So this is a Townie, where we can describe in English what we want to build. I'm going to paste in and start running the app that you and Geoffrey made and, as it's coding, I'll describe what's going on. So, we're making an app for this very podcast we're on. The idea from your episode with Geoffrey was that it's kind of throwaway software, disposable software. It's just so cheap to make software that we can make an app for this podcast and then never use it again. So there are three parts. One's a timer. Another one is for notes. And then the last one is like a ChatGPT integration, where it'll read the notes and enjoy more questions.
Dan Shipper (00:14:48)
And I'm laughing because it's. It took you until you finished the app before you were done describing what the app was to me. That's actually super cool. I love this. Man, I haven't had one of those real AI wow moments in a while. I feel like I'm getting a little jaded, but that was definitely a wow. Because we made this app with Geoffrey a year ago, it took us about an hour. and this and there was a lot of copy-pasting back and forth or whatever. And you just made it by just pasting in a description of what we did in the episode. And it's just here and it works. Wait, can you make sure all the stuff works? Does the generate questions button work?
Steve Krouse (00:15:31)
Yeah. So I'm Dan Shipper talking to Steve Krouse, founder of Val Town fast platform for whatever— Generate questions.
Dan Shipper (00:15:48)
So we have this whole podcast set up. You typed into the interview notes, I'm Dan Shipper talking to Steve Krouse, founder of Val Town. And then you press generate question and then it generates— It used, GPT-4, I assume, some AI model to generate questions. So it says like here are three questions to ask as a founder of Val Town, what was the original vision? So they’re actually reasonable questions. It's not totally off. And what's really interesting about this is, so obviously we just saw the whole thing, just made with a single prompt. But what's interesting is this requires a client and a server to be working together. Because in order to have the generate questions button work, it's sending it from the client to the server, the server sending it to OpenAI and then OpenAI sending it back. And that's hard. There's some complexity here. So I don't know. It’s cool.
Steve Krouse (00:16:52)
Thank you. Thank you. Yeah, I can kind of explain any part of it. Maybe you direct me what you're curious about, or I could just go piece by piece.
Dan Shipper (00:17:03)
Well, I guess tell me at a high level how this works and how it is able to fill in all the gaps there? So there's no copy and pasting, am I right? Does it have a client and a server? How does it all work?
Steve Krouse (00:17:22)
Okay, great. So from a high level, the way it works is, we take your prompt and then a big-ass system prompt that explains how Val Town works and we send that off to chat to Claude 3.5 Sonnet, and I think it's very important to underscore how much of an enabler Claude 3.5 Sonnet is. When you and Geoffrey were doing this with, I don't know, ChatGPT 3.5 or 4 or even the current ChatGPT models, we wouldn't get something this good probably, or it would take a lot longer. Claude 3.5 Sonnet is what's changed since when you and Geoffrey did it. A big part of it is it's all integrated in one tool. We spent a lot of time getting a good system prompt and then the most important thing that happened was Claude 3.5 Sonnet.
Dan Shipper (00:18:07)
Right. Right. That makes sense. Oh, let's look at the system prompt. So what are the core components of the system prompt that made it start to work?
Steve Krouse (00:18:13)
Yeah, there were a lot of iterations and if you want if anyone wants a really deep dive, here's how we built the original prototype and it's changed a lot since then. And then we just published a new article about how we're running it in production. So we have a lot of notes for people who want to build their own. But at a high level, how this system prompt works is it explains where the code that we're asking to write is going to live. We tell it that it's running in Deno in JavaScript, not client side JavaScript. We tell it that it doesn't have to worry about starting the server, deploying the server. We give it all the formatting. We have random bugs in our platform, we tell it about the bugs, about things that shouldn't be done because they're bugs in our platform, and then we give it all goodies, all the included platform features. So, for example, how to do OpenAI, which we just saw, we explained that it's available to it and here's how to use it.
Dan Shipper (00:19:17)
That's really interesting. And it's nice to have that stuff because that's what I would normally have to paste into ChatGPT or Claude or whatever. So it has the most updated up-to-date docs. I guess how the usage has been or, what have people been doing? The underlying thing that gets you excited has been how programming changes the way that people think and what, what they can make, what has been the effect of Townie on that for you?
Steve Krouse (00:19:48)
Yeah, it's been fantastic and also a little bit confusing. I didn't expect to have non-programmers be able to use Val Town so soon. That's been a long-term ambition, but there are people who really don't know the first thing about coding and come on to Val Town and make stuff like this in a minute or two, and their mind is blown and they're so excited and that's amazing. And I love that. Recently, now that bitcoin is doing so well, people are making bitcoin price trackers, wallets, things like that. All sorts of analyzers for crypto, and Val Town’s really good at that. It could just plug and play with the APIs. And on the one hand, I'm really excited about it. On the other hand, I'm getting questions that are so bad because they're not programmers—it’s a tool for programmers. I got last night was this, let's just say we remove this and say, save it—
Dan Shipper (00:20:54)
And by this, basically what you did is you went into the code and you removed a semicolon from the JSON object, or I guess it's just CSS.
Steve Krouse (00:21:07)
I removed a quote from the CSS. Yeah, I purposely put a syntax error in the code. A very simple syntax error that like a programmer would have no problem solving. But if you're not a programmer, you're like, this is the worst experience I've ever had in any software ever.
Dan Shipper (00:21:21)
Yeah. I mean what are you going to do with that? It looks so intimidating, like type error, and if you're a programmer, you're like, okay, I can fix that or whatever. I think that makes a lot of sense. That's really interesting. So, what has that been like for you? Because I've watched you over the last 10 years or so be thinking a lot about the ways to create a programming language that would be powerful but also simple enough that people who are non-technical can easily sort of onboard and learn to use it. And it seems like a lot of those language design initiatives are just completely changed by the fact that you can just code with English now. What has that been like for you?
Steve Krouse (00:22:16)
Yeah, it's been very surprising. I didn't expect this would be how things are. On the one hand, you kind of rethink things from scratch. For example, this type error is totally nonsensical to you and me. But, look at this little button here, ask Townie to fix it. And then it just sends it to Townie and Townie immediately figures out the issue. And then the only crappy part about this experience is that it's going to regenerate all the code from scratch. But besides that, the example I gave isn't necessarily a condemning thing. The solution to these problems is often just throwing more LLMs at it.
Dan Shipper (00:23:06)
So I guess, given that you’re making this for programmers and now it seems like you have like a whole wave of like non-programmers starting to use it, that seems like a big question for the business of who do you want to serve? What are you thinking?
Steve Krouse (00:23:22)
I'm really tempted to go after people who are non-programmers because it seems like there's just incredible pent up demand and there's just so many of them and they're not served by any other products. And at the same time, I really like serving the smartest customers and sophisticated customers. And building a pro tool, you could think of the difference between Figma and Canva. Are you going for a tool for everyone? Or are you building a tool for professionals? I think that's kind of where I'm drawing the line. And the advice I gave to the guy who emailed me about this thing right here, the first day he asked the question, I liked to solve the problem for him. And then the second day he asked literally the same exact question. He was like, what's happening? Why do I have this error again? And so my response the second time was, here's how I would solve it. I guess where I'm going is for people, I think there's a whole class of people who are allergic to code. They don't even want to look at it. They don't even want to know it exists—totally abstract away from me. And Val Town is not the product for those people. The product for those people is Claude Artifacts, I think, or GitHub Spark. I think there are products for people who don't want to look at the code, but Val Town is not that product.
Dan Shipper (00:24:40)
Yeah, I think first of all, I think it's really great that you have a sense for the kind of customer that you like to serve. I think that's actually quite rare and even if you have that sense it is often quite hard to allow yourself to be guided by it. But overall, I think you will make a better product and a better company by following that because it sucks to not like your customers or— It's much better to really love your customers and want to interact with them and all that kind of stuff. And so I think that makes sense. It's funny because, while you're answering, my response to this is, and I know very little about your business, so you take this with a grain of salt. But my feeling is, I think that there's this new type of programmer emerging, which I think a lot of people who are already really great programmers, some of them are using AI, but a lot of them aren't, or they're just, I can go faster just doing it myself. And there's this new version of programmers who are really AI-native and programming with AI is almost like a different skill set than programming without it. And right now those people look very unsophisticated, but they will be very, very sophisticated in 10 years. And so, the example I would draw is if you were talking to MrBeast 15 years ago when he first started making YouTube videos, you'd be like, this guy's dumb, he's 12 or whatever. But now he's one of the most sophisticated filmmakers in the world with some of the biggest budgets. And so my feeling about some of the stuff is, if you can reach those people now and grow with them and help kind of instill in them some of the taste that you've built up and experience you built up over many years, I think that there's just a chance to make it a platform that's sort of like YouTube, but it's for programming. It's for building stuff. And those people are not going to be sophisticated programmers. They're just going to be like kids with dreams.
Steve Krouse (00:27:08)
I would like it to be true that I can get the future MrBeast, in your analogy, who wants to put in the work for me, that's the distinction. I think there are people who want to turn off their brain and just use an LLM. Do you know what I'm talking about? I don't want to disparage because I really do believe with you that there's a new kind of AI engineer who's unbelievably powerful. And I love watching those people. And then there's another kind of LLM engineer who would like to ask the LLM to do something and then leave their computer, go do something else and then come back and see an error and just hit the fix it button and then go and leave and do something else and come back and hit the fix it button and they're allergic to the code. And I'm skeptical that those people are going to be giving themselves the right feedback loops to become the next MrBeast.
Dan Shipper (00:27:55)
Maybe. I mean, I'll just speak for myself—sometimes I do that. Because it's one of the things that I think is interesting about these tools. And I've talked about this a couple of times before on the show—pre-now programming requires a lot of focus and attention. And it still does. But what's really interesting about these tools is you actually can make progress even when you have fractured attention. So an example might be, I have kids and I get home from work and I want to work on my side project. But I need to kind of run back and forth between my kids and the thing I'm working on. Or an example is I'm a bit busy. I have a lot of work going on today, but I have this idea I want to make and, it's kind of fun to jump in spend five minutes, fixing a thing and then let it go off and do its thing and come back and that's not to say you should also get rid of focused work. It's just that programming becomes possible in a fractured attention state, which was always possible if you had enough money to hire people. but now everyone can do it, which I think is kind of cool. And I also think, I don't know if we extend the YouTube analogy, to get one MrBeast, you had to have 100 million people uploading just random stuff that they put no thought into. So I think it's kind of interesting. I totally see what you mean. You want to build a platform for people who care. And it also seems like we're at such an early stage of this that in order to get those people, you have to make it so accessible that tons and tons of people can try it, but maybe, again, you've thought about this way more than I have.
Steve Krouse (00:29:57)
Yeah. I think a big variable that we're like, not that we haven't yet talked about is, what's happening to the underlying models on what time scale? So Sonnet 3,5 has made all of this whole conversation possible. And if we don't have another step change from that in the next year or two, then I think maybe what I'm talking about is, people are going to be more limited by human focus and care, but if we have another— If Sonnet 4 is as big as Sonnet 3.5 was, relative, then, what you're talking about makes a lot more sense. Beginners are, or could get, even more leverage. So the rate is going to— But eventually I think we'll get to a point where what you're talking about makes a lot more sense. You’ll just have agents running off in the background. there'll be coming up with ideas. They'll be talking to your customers for you, coming up with ideas to make things better. Who knows?
Dan Shipper (00:30:51)
Well, I guess, what world are you planning for? Are you planning for a world where progress is starting to asymptote a bit, or are you planning for a world where we're going to continue to see the same kind of jumps in progress that we've been seeing?
Steve Krouse (00:31:02)
I would like to be neutral to underlying LLM changes and be prepared to if the LLM is the same, then our tool continues to work. And if the LLM gets better, in theory, we just change the model name and we get the benefits like everybody else did. And I think we saw this happen with a Cursor and websim, when Sonnet 3.5 came out, they were ready and they jumped. And so I think now Val Town is ready for the next model jump, if it happens.
Dan Shipper (00:31:34)
That's really interesting. What have you learned about software development and running software teams in the AI age? Because it's very different from programming before. And there's different sets of ideas and methodologies and it's all different to have squishy software. I was talking to Simon Last about this and I'm curious about your perspective on it. What has changed for you?
Steve Krouse (00:32:00)
For the Val Town team, we’re working with a serious engineering team. And it's just serious software vs. software you make in Val Town. That's like a lot squishier. Right now there's such a huge gulf. When you write code in Val Town, you have to get so much software configured on your computer. And every week or two, you have to make a tweak to that. And so like I am the one person on the team who doesn't have that set up on my— I can't make that change to our production app because it's just too much to keep that local software up to date with the team. So you make your change locally, you get all your databases, everything running locally, you have to do automated testing. There's just so much paperwork. It's really important to keep our service running smoothly. You submit a pull request, you get a review on the pull request. You do feedback, you deploy it. Deploying takes 10 minutes. It's much slower. It's a totally different field than Val Town where every save is a deploy. It happened in 50 milliseconds and you're live. You're off to the races.
Dan Shipper (00:33:09)
I guess what I'm saying is like doing the programming and ensuring a quality product and running tests and all that kind of stuff is significantly different when the code you're running is deterministic vs. stochastic. And I'm curious how that has unfolded for you or what you're learning about, like keeping an LLM app running well in production.
Steve Krouse (00:33:14)
Got it. The word there is evals. That’s how we as an industry have gotten some amount of predictability or understandability about how LLMs are performing. We didn't have real evals for the first 3–4 months of Townie and we would just YOLO changes and, we would make a change, we’d test out ourselves a bunch, we’d deploy it to production and then like it ended up, it totally seemed better. And then, we’d get anecdotal reports that it's worse and we wouldn't really know. And then we would YOLO another change. And it was, these things I feel like it really brings you face to face in contact with what a truly stochastic thing is like, because they're given multiple times when I go to Twitter and say our users are all reporting X and, is that, does anyone else see X? Has the model gotten stupider? Has it all gotten smarter? Is the model doing this for other people? Or is it just Val Town? I don't know if you saw this on Twitter, a couple of months ago, there was this weird panic where everyone was like, Claude 3.5 Sonnet got dumber this weekend. And everyone was trying to figure it out. And I think at the end of it all, it was just a panic that people like me helped contribute to. We were, we all just try to, we all just kind of convinced each other was super smart and then had a couple of bad interactions with it. And then there was a panic that it was dumb. It's really hard to reason about these things. And that's why I feel like evals are important for your mental health.
Dan Shipper (00:35:19)
Yeah, that's really interesting. I hadn't even thought about the social contagion aspect of the perception of quality. That's wild. I guess you can draw analogies to the stock market. But your product quality being like the stock market is a very new thing.
Steve Krouse (00:35:40)
Very new. And yeah, but evals let you point to something and just feel good about yourself because customers come to me all the time and they complain about a specific thing. And then I can look at the evals and see if I could find what they're talking about. And if I can't, I can say, I'm sorry. Try again one more time. It'll probably work—with confidence.
Dan Shipper (00:36:10)
Yeah. I guess then that sort of runs into the problem of, then you can only, you only take seriously things that you can measure or changes that you can measure. What do you think about that? You're starting to only see things where there is an eval for it.
Steve Krouse (00:36:27)
I think there's a lot we miss. It's a very squishy product and I think customers or users are constantly having less-than-optimal experiences and hopefully the tool allows for error correction. All the time, I'll talk to Jackson on our team who does the eval stuff, and he's like, yeah, we don't even have an eval for that. When Claude 3.5 Sonnet showed up, we were so excited to run evals on it. And the evals weren't any different because we didn't have any evals that were hard enough—it would show up that it got smarter.
Dan Shipper (00:37:00)
Yeah. I definitely vibe with that. I think like, For any product that we've made that I've been the one to start, there are no evals and we just sort of YOLOed it for a while.
Steve Krouse (00:37:18)
The official term is YOLOing it.
Dan Shipper (00:37:20)
I think that's great. I want to make YOLOing happen. And I think we've just now started to add evals to Spiral and we have a couple of internal incubations that are not released yet that definitely have evals. And one thing that's been interesting for us is with Spiral, for example, when you like to change the model, we also use the new model to change the default prompt that we use to generate new things. But then we didn't want to go and update the prompt for everyone. So we just have a thing that's like, do you want to upgrade the Spiral to the new version of the prompt and people, for us, upgrade, which has been an interesting thing to try.
Steve Krouse (00:38:00)
That's really cool. We've struggled with that internally, where we let you edit the system prompt, but then when we push an update to the system prompt, you don't get it.
Dan Shipper (00:38:14)
Yeah. It's complicated, and, if they've changed it, yeah, what do you do? Do you diff it? But I think at least letting people opt in—letting people know, hey, there's a new prompt, and letting people opt in seems to work so far. And then you basically make a copy of the Spiral, so you don't lose anything. And then maybe you can go back and forth if you want to.
Steve Krouse (00:38:42)
That's cool. So you put the system prompt as a property of a Spiral? Because right now you're the system prompt invalid time to the property of your account or of your settings.
Dan Shipper (00:38:49)
Yes. Each Spiral has its own prompt basically because Spiral is effectively a prompt builder, a fancy, fancy prompt builder. So yeah, each one has its own prompt.
One question I have for you is that you said previously that what you want to be doing is building an application where you're neutral to the rate of progress, and if the progress is to the upside, then it's a really nice surprise. How do you think about doing that? How do you think about architecting your product and your systems for that?
Steve Krouse (00:39:21)
Yeah, it's a good question. And it's interesting because it just happened. We built a system for Sonnet 3.5 and then Sonnet 3.5 New came out and Haiku came out. And in some sense, we're very well positioned for it. In beta flags for us internally, we've been playing around with the different models and they just work, but in practice, we haven't actually deployed them to customers yet. We're working on it now. So I think a model switcher is some pretty good infrastructure. Another thing we're building. We recently got a bunch of new Townie users which has made it extremely important to get proper usage-based pricing for Townie, because these things are really expensive and we just haven't taken the time because it's hard to build a whole new pricing model in place. So those sorts of info pieces, being able to switch models, being able to handle multiple models at a time, add new models in kind of more quickly, like you're doing when you add a new model. Do you have the infrastructure to tweak the prompt for that model? In practice, that hasn't been great. Claude 3.5 Sonnet New did a lot of bad stuff. And we had to tweak the prompt a lot for it. And ultimately we just rolled back to Claude 3.5 Old because we couldn't get the prompting right for the newer model.
Dan Shipper (00:40:45)
Interesting. What about strategically? Because there's thistension here where you're relying on Claude 3.5 Sonnet, but also Claude has Artifacts and there’s some overlap with Townie and the same is true for all these other apps or whatever. And so I think one of the games of being a startup is that benefiting from AI is sort of strategically thinking about how to benefit from the gains of these companies without also being eaten by them because they all have consumer or B2B offerings that are not just API offerings. So how do you think about that?
Steve Krouse (00:41:23)
Yeah, our differentiator has always been that we run back-end compute. We're a back-end-as-a-service provider. And I think these LLM companies don't want to run their own functions as a service infrastructure internally. They're going to want to partner with someone and that's the outsize dream that they partner with us to help them run back-end compute for their customers somehow. Even now, with the front-end apps that they make, it's pretty crazy that Anthropic lets you share the actual front-end app, like you can kind of deploy from within Anthropic. But yeah. Where are these companies going? It's unclear, but I think we're as safe as one could be in the back-end function of service infrastructure.
Dan Shipper (00:42:20)
I think that's pretty unique. You think that they're not going to build that, they're going to partner. And yeah, that's interesting. How are you feeling about things? You've been in this company, I guess it's been a year and a half-ish. How long have you been running this?
Steve Krouse (00:42:32)
Two years.
Dan Shipper (00:42:34)
Two years. You raised a round in February–March. How are things feeling? What has gone how you expected and what is not gone as you expected? And where are you right now?
Steve Krouse (00:42:47)
Yeah, I think things have been great. I have done better than expected in some departments and worse in others. We just are bringing on three— So we've been a team of four for the whole year and we just hired three more people who are starting in January. I'm kind of sitting on the edge of my seat, waiting to see how things like the team dynamic shifts, because that's almost doubling in terms of, things are growing well, but not as fast as I want. And it's interesting in this space to see there are a lot of competitors and nobody knows about or talks about, and we're excited to be doing better than them. And then there are crazy success stories that are like taking off to the moon overnight. And it's hard to not be one of those where we're in the middle, we're doing well, we're having steady progress, things aren't going as well as they could.
Dan Shipper (00:43:45)
What's an example? Replit or something?
Steve Krouse (00:43:48)
Replit isn't what I had on my mind. Bolt.new just announced that they did $4 million in revenue in their first four months of existence. So that's like, wow, amazing. So cool. And then Cursor, I think, is another kind of runaway success. Those are the two that I'd be jealous of. Replit agents, I think, got a lot of hype, but I don't really know people who use it a lot. Did you use it?
Dan Shipper (00:44:15)
Oh, I haven't not really used it. Honestly, I used Replit a lot when I was doing my course. I had this How to Build an AI Chatbot course. And it was amazing for that because being able to set up a project and have all the code in the project and students could fork it and then press run and just work was awesome. It was amazing. But for my day-to-day programming stuff, I just use Cursor right now. I found that like there was at least—Back when I was using it, this was before Replit agent, there were just some frustrating things about it and the AI was not as good. But I don't have an updated opinion on it.
Steve Krouse (00:44:56)
Yeah. Replit, I think, originally and still has product-market fit mostly in education. That's where it shines. So it makes sense that it worked for you in a course setting.
Dan Shipper (00:45:10)
Yeah. I mean, I know that feeling. It's going well, but it's not blowing up, but it's also not dead and sort of being in the middle can be— I think everyone spends a while there. It's called like the trough of sorrow or whatever for a reason. I don't know if you would call it the trough of sorrow because things are going well, but it's sometimes things going well, but not incredible is harder than things are not going well at all. Because you can at least just be like, well—
Steve Krouse (00:45:41)
You can be straight sad instead of mostly happy.
Dan Shipper (00:45:49)
Yeah, I definitely feel like we're in this sort of uptick right now.
Steve Krouse (00:45:59)
Every? Or one of your apps?
Dan Shipper (00:46:00)
Every overall. And I would say growth the last couple of months has been really good. This month is sort of leveling off a little bit. But I think we'll have a couple more launches in the next couple weeks that will sort of change that—we're not really doing any paid marketing or whatever. But it just feels like things are happening, which is really, really fun. But there were years where it was not like that. So I feel you. And I totally recognize the sort of team dynamic thing. We've grown the team a lot. And—
Steve Krouse (00:46:42)
Is this your office?
Dan Shipper (00:46:43)
This is not. I mean, this is the office I work at, but it's not mine. You know Jesse Beyroutey, right?
Steve Krouse (00:46:05)
No.
Dan Shipper (00:46:53)
He’s a really good friend of mine. Also went to Penn. He's a partner at IA Ventures and this is their office. And they graciously let us work out of here, crash here, until hopefully one day we'll get an office of our own.
Steve Krouse (00:47:07)
Nice. Have you been to our office in downtown Brooklyn? Oh yes. You walked up once.
Dan Shipper (00:47:10)
I walked up. We actually launched Sparkle there, if you remember. A very auspicious day but we were not able to spend that much time together because I was running around like a chicken with my head cut off. But yeah, I mean, things always break when you add new people. And it's always fun. What do you think about team sizing in this kind of AI age? How many people you need to hire and who you need to hire and what that says about your runway and your budget and all that kind of stuff?
Steve Krouse (00:47:52)
Haven't really changed my thinking about it. Because like my role models have always been like Instagram and Notion. A team of, on the order of 10, it's anywhere from 5–15 even Cursor is, I think a 15–20-person team, or it was when it exploded. I think this small, dense, but mighty engineering team has always been attractive to me and. I think AI just makes that more tractable for us mere mortals to be able to do that. But I don't think it means that I can hire four instead of seven. I think I still need the seven, but hopefully the seven of us will have an easier go of it then without AI.
Dan Shipper (00:48:40)
That makes sense. I agree. I feel like the last 10 years are the examples of the Instagrams and— Notion is a huge team now, but like they were pretty small for a while.
Steve Krouse (00:48:50)
Notion was like six people when they did databases. In 2019 when I was using Notion, when we were all using Notion, it was a real company, then it was six people or something crazy.
Dan Shipper (00:48:59)
Yeah. But I do think there was this whole movement for big teams and whatever. And I really love having a small team. I think it's super fun. And everyone knows each other and everyone's friends and it changes things when it's, 40 or 50 or a 100 people in a way that's less personal, but obviously you can get less done, but now depending on what you're doing, like you can actually get really far with smaller, smaller team, which sounds just more fun to me.
Steve Krouse (00:49:36)
Yeah. I don't think a small team— Software is one of those things, more people don't make it happen faster, the mythical man-month and everything, like having a really tight context, like everyone on the team knows what's going on, iterates really fast. Because that's what we're doing right now, for example, is. adding or creating like a virtual file systems kind of set up. So right now, Val Town is so good at writing single-file apps, but then you want to break it up into a couple of different files and have the AI just edit one of them for you while the other ones stay put, but Val Town can't handle that. It's just, we haven't built that infra, but now we're building it and there's no parallelizability of it. They’re just core infra decisions the whole team has to kind of be on board with, and we can't add more people to make it go faster. More people would honestly make it go slower.
Dan Shipper (00:50:36)
Right. Now you're getting into multi-file edits, how do you think about what the appropriate unit of work is for the AI to do before it comes back and is like, I did this. And what does that imply for how much in the loop should you be vs. not?
Steve Krouse (00:50:50)
Yeah I think for better or for worse, we've set ourselves up to be spectators in this race and we watch Cursor and Anthropic and Bolt and whoever comes up with these new UI interaction patterns and AI interaction patterns, and then we take the best ones and we like to adopt them. I think that's what's worked best for us to be fast copycatters because our core business and what we want to be best at and innovate on is a function-as-a-service platform that's the simplest and fastest, most pleasurable to use. And then we layer on these AI features that other people will innovate for us. It's weird to be like that, because I feel like my instinct is like, no, no, no, we have to be the best. We're going to come up with this stuff. But the industry is just moving so fast. If we just wait a month or two, we can see what other people do and copy it. That's what we did with Townie. Before Anthropic Artifacts, Townie was a tool use thing that looked just like ChatGPT at the time. We were just kind of copying ChatGPT and it was pretty bad. And then Anthropic Artifacts launched and we're like, ah, there we go. That's the correct UI pattern. And now we're on that. But now we're watching Cursor invent all these new UI patterns. We're like, ah, yes, those are the ones we want now.
Dan Shipper (00:52:18)
That's interesting. How does it sit with you? It seems like you have a very crisp idea of your strategy of what you want to be best at, which is backend-as-a-service functions. And that strategy was formulated even before all the stuff came out or was really popular. And it seems like you're getting this pull with Townie from people who maybe are not even programmers and maybe don't really understand what backend-as-a-service even is. And so how are you dealing with the excitement of that? But also you're able to say like, that's cool,that's important. But we're going to be sort of fast followers because the core strategy is this other thing. What do you think about that?
Steve Krouse (00:53:13)
You really hit the nail on the head. In the last week or two, I've had a lot of discussions with our main investor, Dan Levine from Accel. He's like a spiritual cofounder of this company. And a lot of the core strategy pieces about what we are and what we aren't come from him. And we had a meeting last week where I was kind of showing him this alternative vision, like we could pivot from the backend of the strategy to a full-stack app development tool. Maybe you're like YouTube for programmers' vision or Shopify, kind of like a full-stack app platform. Shopify for SaaS apps, is one idea. We could rebrand. We could call ourselves SaaSify or something and go after Bolt and try and get hundreds of thousands of non-programmers to just be buying credits and credits and credits making apps. I think that's a real pivot we could plausibly do. And we can have the database included and all sorts of other services included, and then we become probably a tool more like Canva, where we're for everyone and unsophisticated folks and they're making apps so kind of small, medium-sized apps, no serious engineer would use. And again, I had a long chat about it and the various trade-offs. We had a long chat and I wrote a long strategy document about what our strategy is, what and very importantly, what it isn't and why we're giving it up and how we get pigeonholed if we do the other strategy. One of the big downsides of being a full-stack app is that you have to do the whole stack. And I think it seems simple, and it's always seemed simple, and it's just so much more complicated than everybody thinks. In order to do front-end and back-end and monitoring and errors and logging and databases and database migrations and database backups. Back to your question from earlier, a mature engineering organization has a dozen or two dozen tools. In order to include a local text editor and local version control, get out so many tools. And so I want to replace that whole stack of two dozen tools. It just necessarily has to be a toy version of all of them. And a kind of bad version of all of them. And then that customers necessarily become unsophisticated. It leads to a weird local maxima that we're scared of. And then the alternative is that we do fewer things, but do them really sharply and better and integrate with all those other things. We could become just the functions of a service piece for a lot of people, and then once that's kind of nailed, we can expand more confidently while remaining quite good at each thing we do instead of mediocre at all of them.
Dan Shipper (00:56:17)
That makes a ton of sense. What makes you think that that's the part of the stack that you want to sit in? Why choose that one?
Steve Krouse (00:56:2o)
Why backend-as-a-service or why functions-as-a-service? It's a really, really good question. In some sense, the most honest answer is that it's what Dan Levine thought there was an opportunity for. It's what our investors had as an opportunity for what was left. Dan's also an investor in Vercel and they're doing front-end and they're doing such a good job at front-end. and then there's a back-end shaped opportunity. That's one way to look at it. Another way to look at it is as long as I've been a programmer, I've really wanted something like this to exist. I've always hated how complicated backend stuff is. And there are like three dozen tools that I've used over the years trying to make my life easier on the backend and never found one that was good enough. And so wanting it to exist— I guess like we could talk about who else is in this market. There's AWS Lambda, which nobody would ever use directly. You have to use it intermediated through all sorts of stuff. It's a terrible, terrible user experience compared to normal programming. And then there's Cloudflare Workers, which is actually a wonderful product. And I've learned a lot from the Cloudflare team, but still it's not social, it's not GitHub. It's not like collaborative, open-source, productive coding, like GitHub is. When you make a Cloudflare Worker, it's kind of invisible code that nobody else can then leverage and use like, and fork it. It's not an Artifact and it's not a browser-based thing. It's a thing you develop like a normal engineering artifact.
Dan Shipper (00:58:14)
Yeah. It's sort of a last generation type thing for people working in more enterprising environments where there's less flexibility. I think that that makes sense. I just got to say I think it's awesome that you're thinking through these things so crisply and you have a really clean perspective on it which I think it's awesome, 1.) because that's just rare, and 2.) you're giving yourself the ability to be wrong, which also gives you the ability to be right. Whereas I think there's a tendency to try to do everything and be everything—says the man who has the studio where you have five different products or whatever.
Steve Krouse (00:59:08)
It’s literally called Every. You’re trying to do everything.
Dan Shipper (00:59:13)
So, I think I admire that because I know internally how hard that is for me. I just really appreciate getting to hear it and that you're willing to share it. I think a lot of people are not willing to share it. I think you're just being incredibly honest and I think you have a very clear vision for what you're doing. And I'm just psyched to see what you do with it. What happens in the next year or two.
Steve Krouse (00:59:39)
Thank you. Yeah, me too. I think the vision's been the same since day one, but it's remarkable how many twists and turns there are, even staying true to an original vision.
Dan Shipper (00:59:52)
Totally. Well, this is awesome. I had a great time. I'm really glad you came on the show. We got to do it more often. If people want to find you and find Val Town, where can they find you on the internet?
Steve Krouse (01:00:04)
Man, I normally say Twitter, but I'm trying to move off of it these days vaguely. I'm Steve Krouse, kind of everywhere on the internet, and val.town is where you can find Val Town.
Dan Shipper (01:00:19)
Awesome. Thanks, Steve.
Steve Krouse (01:00:20)
Thanks, Dan.
Thanks to Scott Nover for editorial support.
Dan Shipper is the cofounder and CEO of Every, where he writes the Chain of Thought column and hosts the podcast AI & I. You can follow him on X at @danshipper and on LinkedIn, and Every on X at @every and on LinkedIn.
We also build AI tools for readers like you. Automate repeat writing with Spiral. Organize files automatically with Sparkle. Write something great with Lex.
Find Out What
Comes Next in Tech.
Start your free trial.
New ideas to help you build the future—in your inbox, every day. Trusted by over 75,000 readers.
SubscribeAlready have an account? Sign in
What's included?
- Unlimited access to our daily essays by Dan Shipper, Evan Armstrong, and a roster of the best tech writers on the internet
- Full access to an archive of hundreds of in-depth articles
- Priority access and subscriber-only discounts to courses, events, and more
- Ad-free experience
- Access to our Discord community
Comments
Don't have an account? Sign up!