# Data Analysis: Claude Code for Economists with Paul Goldsmith-Pinkham | Markus Academy | Ep. 162-2

https://www.youtube.com/watch?v=Rp17XUPxa4I

[00:09] Welcome back everybody for another session of the mini video series with Paul Goldsmith-Pinkham on Claude Code for applied economists.
[00:17] Today we go to the second video.
[00:20] And um
[00:22] Last time we talked about how to get started with Claude Code, the difference between Claude Code and Claude Cowork, and how to install it and set up the terminal.
[00:35] Today we do part two which is about data analysis workflow and some simple examples before we then move on to data collections, web scraping and many many other tasks coming down the road for different videos.
[00:46] Paul
[00:48] great to have you here with us again and uh let's start with part two.
[00:52] Great, thanks so much.
[00:54] I'm really looking forward to this one.
[00:55] So this was fun.
[00:55] So, like you said, Markus, you know, last time we really started with the idea of a, um, basically a
[01:06] Let me open this up here.
[01:08] We really were kind of focused on the idea of just installation and we didn't really kind
[01:10] of see what could actually be done.
[01:14] And so what I want to show you now is I want to show you just actually kind of a simple very simple task.
[01:20] So in some ways this is not going to be um this is not going to be a kind of complicated one.
[01:25] This is I think a pretty trivial example.
[01:27] But then we'll kind of get into more complicated ones.
[01:29] So if you're not familiar with things, this might be a good entry point.
[01:32] This is you know often when we do analyses you kind of assume you have the data.
[01:38] Well and so what I'm going to do is I just want to kind of come in.
[01:41] Often I have these very fun data tasks where I kind of want to just you have a research idea.
[01:46] So, I think, Markus, you would be the same on this: you're somebody who has an empirical question.
[01:53] It's a non-trivial question.
[01:55] It's maybe not totally set up in a data set.
[01:57] Like it's not a perfect graph.
[01:58] You maybe Google it.
[02:00] It's you know, they haven't made it yet.
[02:01] You say, "Okay, well, how do I I want to just see that?"
[02:03] You might ask an RA if you're lucky enough to have an RA or you might spend an hour doing it yourself.
[02:06] I want to kind of just show you that this is a very easy way to do this.
[02:10] So, the question that I kind of want to pose
[02:12] is, you know, let's make a graph, and I've actually posted this recently on social media to show this: how has the age distribution of homeowners in the United States changed over the last 50 years?
[02:22] This is something I've worked on with Kelly Shue and other projects.
[02:26] You know, and we all know kind of inherently there's been this discussion about there's a shift.
[02:29] Younger folks are not owning homes as much as older folks.
[02:34] So, what I'm going to start with is I'm going to have Claude get the data.
[02:38] And I'm not actually when I started this, I've done it now once as a test project.
[02:41] I don't really remember exactly how to get it and I don't really remember.
[02:45] It's kind of just like you'd ask an RA to do something.
[02:49] So, I'm going to say, "Look, I'm starting a project from scratch.
[02:50] I want to analyze how the age distribution of homeowners in the United States has changed over the last 50 years.
[02:56] I think the Census Bureau publishes homeownership rates by age of homeowners.
[03:01] Potentially the data could be on the Census website or it could be on FRED.
[03:05] So, what I want you to do is download the homeownership rates by age going back as far as possible.
[03:09] Write this as a script and then save the
[03:13] raw data as a CSV.
[03:13] Okay?
[03:15] So, it's just a data collection task.
[03:18] So, let's actually just Yeah.
[03:19] Does this work in Cowork the same way?
[03:21] You could do this in either of these.
[03:22] Yeah, or you could honestly even do this on the website.
[03:24] So, this is the benefit here is I'm going to kind of just do this locally on a
[03:27] I will have this on my laptop after it's done and I kind of don't have to worry about downloading and moving it around.
[03:33] It's just something that will run locally.
[03:35] In Cowork, it would be the same exercise.
[03:37] Um
[03:38] When we do it here, I'm going to write this example here.
[03:42] So, what I'll do now is I'm going to share my screen with the terminal just so that you can have an example of what that looks like.
[03:48] We'll open this up.
[03:49] We have this from last time, so remember, what I'm going to be doing is running Claude Code, so I'm going to run this here.
[03:57] Um
[04:01] and so, um, we're in this folder here.
[04:03] This is a So, this is a folder I'm already in here.
[04:05] So, I made a test
[04:09] I've written this once before, but now we're going to make a new, uh, folder in here.
[04:10] We're going to call
[04:13] it test two, which is the same idea.
[04:17] And this is a folder inside this um
[04:20] Basically, I have a set of folders about
[04:23] um these video series that I made a test
[04:25] two. There's nothing in it.
[04:27] And so, I'm just going to launch Claude
[04:28] inside of it.
[04:30] And it says, "Okay."
[04:32] I actually have to launch Claude again.
[04:34] Yeah, so I've you know, so I just run
[04:36] the program and it launches it it
[04:38] launches it up here, you know, when I
[04:40] when I quit. So, I can quit by hitting
[04:42] control C control C.
[04:44] Okay.
[04:45] And what happens?
[04:45] So, a great example of this would be I'd say So, I I started
[04:49] every time I want to be in the right
[04:50] folder just so that I'm doing this. I
[04:51] just don't want it to kind of look at
[04:53] the other stuff necessarily. I'm going
[04:55] to start from here. So, I'm going to say
[04:57] um
[04:58] So, I can copy and paste or I can just
[05:00] write it myself, but the way that I'm
[05:01] going to do is I'm going to copy and
[05:02] paste. So, it's not actually going to
[05:04] show it. I'll say
[05:07] It will show it once I once I do this
[05:08] here and I'll say, "I'm starting a new
[05:10] project from scratch, blah blah blah
[05:11] blah blah."
[05:12] So, now it's saying So, ignore this
[05:14] warning thing.
[05:16] That's just a skill that I have installed and it says, "Let me research what data is available and then write the download script."
[05:20] So, now it's doing a bunch of calls that we talked about.
[05:24] Remember how we talked about that Claude can use tools.
[05:26] So, it is now using an agent to run tools, which are these web search things.
[05:31] It's searching for these things.
[05:33] And it's asking me if it's allowed to do this.
[05:37] Remember we talked about its ability to do things.
[05:39] It wants to go search the web.
[05:41] It's not allowed to until I give it permissions.
[05:43] So, it's going to do a lot of searching.
[05:45] So, rather than say yes a bunch of times, I'm just going to say yes, you can always use web searches.
[05:51] So, I'm giving it permission to search for anything now,
[05:54] um, in this particular folder.
[05:58] And so, it's now searching for all these things.
[05:59] And it will show what it's thinking.
[06:01] So, this prestidigitating, this is just nonsense that it puts when it's working.
[06:05] If you want to see what it's doing, you can hit control O.
[06:12] It's saying, "Okay, I ran an agent."
[06:14] So, an agent um we haven't talked about this
[06:16] yet, Markus, but, um, when you asked about Cursor, for example, what an LLM will often do is spawn an agent, a subagent, which has its own context window.
[06:31] And so, what that agent is doing is it's running on its own.
[06:32] It's also a Claude LLM.
[06:34] But, our main agent has said to it, "Hey, you need to go look for census home ownership data."
[06:37] And so, the prompt it gives it is, "Research how to download this by age."
[06:42] This is the kind of thing I should have said originally.
[06:46] It says, "I need to find this.
[06:47] What is available?" and so forth.
[06:49] You should search for all these things, and here's how you're going to do it.
[06:53] And so now, it wants to um look at this content.
[06:55] It's going to fetch data from this.
[06:58] So, it's going to actually try and download stuff from here.
[07:01] And I'm going to say, "Yes, you can download things."
[07:03] So, it's thinking.
[07:05] So, you think about the task that we just gave it, right?
[07:06] The first thing we said is, "Go find this data."
[07:07] So, that was a task that I let it do.
[07:10] I'm letting it do all these things.
[07:12] I'm going to let it do it so that I don't have to keep hitting enter.
[07:14] I'm going to say, "Yes,
[07:17] you can look at stuff on FRED."
[07:19] So, it's doing lots of tool use.
[07:20] So, you'll see it's continuing.
[07:22] It's searching.
[07:24] It's looking for this, home ownership rate under 35 by home ownership tables, and so forth.
[07:29] There's a lot of things you can see that it's doing this.
[07:31] And remember we talked about what an LLM is doing: these tokens are basically its way of showing how much thinking is going on, up and down.
[07:40] And so, yes, I'm Now it's you notice it switched gears.
[07:42] It said, "Well..." It looks like it switched gears from FRED.
[07:49] So, it says, "Okay, let's look at Let's look at the census."
[07:53] So, we're now going to look at the census.
[07:54] Yeah.
[07:55] Can't you have one agent doing FRED and the other one doing
[07:57] Yes, so it's not very efficient.
[07:57] So, we could have if we were really in a hurry,
[08:01] we could have said
[08:03] "Spawn one agent that searches in one and spawn one agent that searches"
[08:06] That would have been faster.
[08:06] Instead, it's doing it in sequence.
[08:08] That's exactly right.
[08:10] So, often, if you want it done in parallel, you have to tell it to do it in parallel.
[08:14] You don't always.
[08:14] Sometimes it will know to do that, but here I was so vague, I
[08:18] sort of said, "Well, look through FRED, look through this."
[08:22] Often it can be better to do this.
[08:23] So, like the more specific you are in your um query, often the better it will do on these types of tasks.
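The sequential-versus-parallel point can be illustrated with a small Python sketch. This is only an analogy for how two independent lookups could run side by side; `search_fred` and `search_census` are hypothetical stand-ins, not anything Claude Code exposes, and a real subagent would search the web rather than return a canned string.

```python
from concurrent.futures import ThreadPoolExecutor


# Hypothetical stand-ins for two independent lookups.
def search_fred(topic: str) -> str:
    return f"FRED results for {topic}"


def search_census(topic: str) -> str:
    return f"Census results for {topic}"


def search_in_sequence(topic: str) -> list[str]:
    # One source after the other, as in the session above.
    return [search_fred(topic), search_census(topic)]


def search_in_parallel(topic: str) -> list[str]:
    # Both sources at once; results are collected in submission order.
    with ThreadPoolExecutor(max_workers=2) as pool:
        futures = [
            pool.submit(search_fred, topic),
            pool.submit(search_census, topic),
        ]
        return [future.result() for future in futures]


if __name__ == "__main__":
    print(search_in_parallel("homeownership rate by age"))
```

Both functions return the same results; the parallel version only changes how long you wait when each lookup is slow.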
[08:29] Um So, you can see here it's continuing to search.
[08:32] It's trying a bunch of different things.
[08:33] Um but it really isn't sure kind of what's going on.
[08:38] So, the fact that we were very vague has made this a little slower.
[08:39] Of course, if I knew where the data was and I knew what data I wanted, even if it wasn't clean, it wouldn't take this long.
[08:46] I just want to kind of give you an example of Sometimes you have an idea of what you want to do.
[08:50] So, now it's found the right data set.
[08:52] So, this is the Housing Vacancy Survey.
[08:55] Um, and it's pulling it in.
[08:58] It says, "Oh, I finally..." You know, it's finally found some examples of what it's looking for.
[09:01] But this is the kind of thing where, if you just let it go, it's the same as when you use a research function or something else on Cowork; there's no reason you couldn't have, um, done this, you know, on the web, too, right?
[09:15] These are just This is just the same idea of what's going on.
[09:17] So, we're
[09:19] We're letting it search.
[09:23] It takes a little while.
[09:25] Um and then we say, "Yes, don't ask again."
[09:27] It's basically almost found it all.
[09:31] So, what's beneficial here, right, is that if you do this on the web and you gave it kind of permissions, it would just search and then come back to you and you wouldn't have to keep kind of mon-
[09:38] There's no reason we have to monitor this beyond telling it it's okay to search for stuff.
[09:43] So, here there's no reasoning going on in a sense.
[09:45] Because it's not like there's reasoning going on in the sense that what it does is once it fetches stuff, it's it's basically trying to find something that satisfies the prompt.
[09:54] Okay.
[09:54] And then it's going to eventually get to a point where it's going to say, "Okay, I found all the data that's relevant.
[09:58] Now we're going to download it and write a research script."
[10:01] So, the agent is really doing a lot of tool searching to kind of search the web.
[10:05] And this could be faster if you were using other skills.
[10:07] We're really living in kind of the bare-bones version of this.
[10:10] Um
[10:13] If you do the same thing repeatedly, will it come up with the same data every time, or does it come up with different data sources?
[10:20] It Well, we'll see cuz this is going to be the second time.
[10:24] My guess is there is really one very good answer for this.
[10:28] It's There's one answer.
[10:30] So, really it should be using the Housing Vacancy Survey, which is in the It's basically the Census has data broken down at the national level by Um in fact, I can show you the picture of what this should look like.
[10:41] There's one video like there's one data set that kind of gets you the gist of what you want it to look like.
[10:45] Um it will look like this.
[10:49] Um, so, while we're waiting in the background, I can show you what this looks like.
[10:59] So, this is the final product.
[11:01] This is what I you know, I used a I did this on my own to make sure that it would work and this is the kind of thing that you get.
[11:08] This comes from the Housing Vacancy Survey.
[11:10] This is kind of the right answer of what it looks like.
[11:12] Okay.
[11:12] Um So, this is the end result.
[11:16] And you know, what we'll get if we just let it let it cruise for a little bit.
[11:21] Well, then the figure was also done with Claude?
[11:25] Yeah, I run all of it.
[11:25] It was done with Claude.
[11:27] So, I mean it uses I used R in the background cuz I'm I tend to use R, but it wrote the R code for everything to do this.
[11:34] I basically just told it what I wanted it to look like, roughly speaking, and it did everything else.
[11:41] What are what are the gray things doing?
[11:44] What are the gray things?
[11:44] You mean the lines?
[11:47] So in this data there is annual data every year of what things look like.
[11:52] But of course that's sort of there's different ways to show what's going on.
[11:55] So what it did was it just showed all the years in gray, but then it highlighted, basically, every decade, so you can see what the relative change was and that it's kind of shifting discontinuously, or not continuously, rather.
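The "every year in the background, decades highlighted" idea can be sketched in a few lines of Python. The figure in the video was made with R and ggplot2; this sketch only shows the selection logic, and the ten-year step, colors, and line widths are assumptions for illustration.

```python
def split_years(years, step=10):
    """Split years into highlighted ones (every `step` years) and background ones."""
    highlighted = [y for y in years if y % step == 0]
    background = [y for y in years if y % step != 0]
    return highlighted, background


def line_style(year, step=10):
    # Hypothetical styling: thin gray lines for ordinary years,
    # thicker colored and labeled lines for decade years.
    if year % step == 0:
        return {"color": "tab:blue", "linewidth": 2.0, "label": str(year)}
    return {"color": "lightgray", "linewidth": 0.8, "label": None}


if __name__ == "__main__":
    highlighted, background = split_years(range(1994, 2006))
    print(highlighted, len(background))
```

A plotting loop would then draw one line per year using `line_style(year)`, so the decade lines stand out against the gray ones.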
[12:06] And you told it to do it this way.
[12:08] Yeah, you know, I kind of I kind of hinted at it.
[12:10] It's kind of a natural way to make that graph, but it would be a stretch to say that I told it to make it look exactly like that.
[12:14] That would be too much.
[12:16] So you see so now it's done.
[12:18] So what it says is it found two good sources.
[12:22] Mhm.
[12:23] Um it says.
[12:24] So it had a lot of thinking.
[12:26] It says two good sources.
[12:27] One is the Census Bureau table 19 quarterly home ownership rates by age from 1994 to the present.
[12:33] I'll use the Census Bureau data since it's the standard home ownership rate measure.
[12:36] Let me first check the environment and check the file um Excel file structure.
[12:40] So now it's doing stuff.
[12:42] So now it basically wants to write, um, this Python command that's checking a bunch of things.
[12:50] It's basically checking versions to kind of understand what's going on.
[12:51] I'm going to say yes you can do that.
[12:54] So that took it about three or four minutes.
[12:56] It found the data.
[12:57] It found some examples.
[12:59] If I wanted, I could tell it to use FRED, um, but it would use a totally different survey.
[13:02] So now it's just going to start inspecting the data.
[13:05] It's running code to kind of look at one of the the files.
[13:09] This is this file here which is an Excel file.
[13:10] It's reading it in Python.
[13:13] And I'm going to say yes you can do that and now it's going to um.
[13:17] It basically uh ran into an error.
[13:21] So likely what you need to do.
[13:24] So the Census site doesn't like what it's doing, cuz I'm just querying it and trying to pull it, and so it throws something called a 403 error.
[13:32] A 403 error means the server is refusing the request; basically, Census wants you to look like a person.
[13:37] Like, it doesn't want just bots that are doing it.
[13:39] So I'm saying, "Hey, I'm a user."
[13:41] It puts a little user agent header on the request, and now it should, um, be able to do this using the requests package.
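A minimal sketch of the header fix described here, under some assumptions: the script in the video used the `requests` package, while this version swaps in the standard library's `urllib.request` so it runs without extra installs. The URL and the user agent string are hypothetical placeholders, not the ones from the session.

```python
import urllib.request

# Hypothetical values for illustration; substitute the real file URL and an
# honest description of who is making the request.
EXAMPLE_URL = "https://example.com/data/table.xlsx"
USER_AGENT = "Mozilla/5.0 (research script; contact: you@example.edu)"


def build_request(url: str, user_agent: str = USER_AGENT) -> urllib.request.Request:
    # Attach a User-Agent header so the server sees an identified client
    # instead of the library's default agent string.
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def download(url: str, dest: str) -> None:
    # Performs the actual network call; it can still raise
    # urllib.error.HTTPError (for example a 403) if the server says no.
    with urllib.request.urlopen(build_request(url), timeout=30) as resp:
        data = resp.read()
    with open(dest, "wb") as fh:
        fh.write(data)
```

With `requests`, the analogous step is passing a `headers={"User-Agent": ...}` dictionary to `requests.get`. Either way, setting a header does not guarantee access, and a site's terms of use still apply.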
[13:48] Um this becomes an issue as you start to scrape stuff.
[13:50] If you're a person who's interested in scraping, you have to be very um careful and diligent about the way that you interact with websites because um a lot of websites set stuff up so that they aren't just basically bombarded by requests from uh LLMs or just from web scraping programs.
[14:10] That's what we do in the next video, no?
[14:12] Yes, exactly.
[14:13] And so we'll talk about that there. Here you can see it's printing out stuff from the data.
[14:17] It's saying, "Here is what..." You know, it wrote this code.
[14:19] It pulled the data.
[14:21] It basically pulled this data from these various places.
[14:23] Here's 1994 the first
[14:26] has quarterly data on home ownership rates at the national level and so now it's going um to
[14:37] It's also going to check FRED now.
[14:40] So it's checking FRED as well.
[14:42] FRED only has, I think, a little bit of data.
[14:47] It's not great.
[14:48] So it's basically trying out all the different It's exactly what you'd want an RA to do.
[14:52] And now it's going to say, "Oh, let's see if there's older Census data."
[14:55] Cuz we only got back to 1994 and actually we'd like to have more data.
[14:59] So what it's going to do is say, "Table 7 might have annual data by age going back further."
[15:04] So this is exactly the kind of thing you'd want your RA to do. As you can see, I didn't tell it to do any of this.
[15:08] No, I just said go back as far as you can.
[15:13] Cowork.
[15:13] It will put the same.
[15:15] It does not ask you to prompt.
[15:17] It's the same as
[15:17] So what we can do, why don't we do that while Let me just uh quickly copy that over so that we can do it in parallel.
[15:23] Um very quickly, let me just move this over while we're here.
[15:25] 1 second.
[15:28] Um I'm just going to copy this again.
[15:30] We'll go to Cowork.
[15:32] Um, just so, for you people who weren't with us on the last video, let me just show you on Cowork what this would look like, what Markus is asking.
[15:41] So it would be here.
[15:43] So here's Cowork, which is an application that's kind of doing a similar thing.
[15:47] I'm going to make a new folder for this to work in just very quickly so that we have a space for it.
[15:51] I'm going to call that, um, we'll call it temp three in there.
[16:00] And if we go here, we'll tell it to work in a folder.
[16:03] We'll say work in the Markus folder for video two in temp three.
[16:10] Um, and we'll say yes, you can change things in here.
[16:14] And then we'll say let's go.
[16:14] So we'll just let it run and we'll see what it does.
[16:20] Um, and Cowork takes the
[16:21] same amount of time?
[16:21] It's not faster or slower?
[16:23] No, it shouldn't be.
[16:23] I mean, the only way that it would be any faster or slower.
[16:26] So it's you know, we'll just let it go here and we'll see.
[16:28] It may
[16:29] actually be faster in some ways because
[16:31] you know, there's kind of
uh, with the terminal,
there may have to be
more things that are approved. But, um, so
[16:42] here we're going to just say yes, you can do this.
[16:44] I could have made this much faster by the way.
[16:45] If I had just said, "Yes, you can run Python things."
[16:48] and it would do this faster.
[16:50] So you can see it's basically reading through the code.
[16:51] It's trying to see what's going on here.
[16:52] Um this is just a bunch of data.
[16:55] It basically is parsing all the data that's in this structure.
[16:59] Mhm.
[16:59] Table 12 has household counts by age so I can compute rates from that.
[17:02] Um so it's really gotten a lot of
[17:07] it understands the structure of the data really, really well.
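The step of computing rates from household counts is simple arithmetic, sketched below in Python. It assumes the table provides both owner-occupied and total household counts for each age group, which the transcript does not spell out; the counts are made-up numbers used only to show the calculation.

```python
def homeownership_rate(owner_households: float, total_households: float) -> float:
    """Owner-occupied households as a percentage of all occupied households."""
    if total_households <= 0:
        raise ValueError("total_households must be positive")
    return 100.0 * owner_households / total_households


# Made-up counts (in thousands), purely to show the arithmetic.
example_counts = {
    "Under 35": {"owners": 40, "total": 100},
    "35 to 44": {"owners": 60, "total": 100},
}

rates = {
    age: homeownership_rate(c["owners"], c["total"])
    for age, c in example_counts.items()
}

if __name__ == "__main__":
    for age, rate in rates.items():
        print(f"{age}: {rate:.1f}%")
```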
[17:08] Um
[17:11] so I'll keep you apprised of what's going on with Cowork at the same time
[17:13] this is going on.
[17:15] Cowork is searching the web just like the other one.
[17:16] So, it should be very similar.
[17:18] There's no reason that it shouldn't be
[17:19] um
[17:22] So, I'm just going to say yes, you can run all the Python scripts that you desire.
[17:24] That way it will I don't have to keep saying yes.
[17:26] So, now it's just trying to
[17:30] basically see the structure of the data.
[17:32] and parse it. Now, it's the same way that, if you were working with this, Markus, you would kind of look at each file and try to understand what the structure is, and then you'd write a little script that kind of parses everything carefully, right?
[17:46] But it takes 6 minutes, 21 seconds to start.
[17:49] Yeah, it's I mean it's So, now it says, "Okay, I have a clear picture of both data sources. Let me write the script."
[17:54] So, this is, you know, faster than I would have done it, for sure.
[17:57] Yes.
[17:59] Um, so, interestingly, Claude Cowork really struggled because it, uh, ran into that same user agent problem.
[18:13] So, it struggled with this idea that, you know, it couldn't log in.
[18:19] Do you want to go back to Cowork, or... Yeah, let me just quickly show you what it's doing here.
[18:22] So, it wrote the script.
[18:23] So, now it understands the data.
[18:26] So, remember, Markus, the way we talked about the context windows: it has all that information. It's kind of like
[18:31] it can keep all of that in its memory.
[18:33] and now it's like, "All right, let me write a script that parses stuff that looks like this."
[18:37] And so, it says, "Okay."
[18:37] And so, it's going to write this file in here.
[18:41] Um, it's going to write a file that sets a user agent so the request looks like it comes from a person.
[18:43] It's going to pull from these age groups.
[18:46] It's going to have these age patterns.
[18:47] It's going to download files.
[18:48] It's going to parse them.
[18:50] And it's even going to parse two different tables.
[18:53] And then it's going to download the data, and then it's going to construct these files, and then it's done, and you'll say, "Yes."
[18:59] And so, then um Oops, sorry.
[19:01] Then, I'll just go into uh
[19:06] Just to make sure that I understand.
[19:08] Yeah.
[19:08] So, you downloaded it already, but then does it write a script and download it again?
[19:13] No, cuz it didn't really download all of it.
[19:15] I mean, it did a little bit, but really what it did was it downloaded and just read a little bit of it cuz it doesn't want to download and look at everything because that will blow out its context window.
[19:24] So, often these things are... It's the same way that, you know, imagine you had a perfect memory, Markus.
[19:27] You wouldn't, like, take a thing and then look at every row and be like, "Okay, I'm now going to look at every row."
[19:32] You're going to say, "Let me just understand."
[19:34] It kind of knows it.
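The idea of reading only a little of a file to learn its structure can be sketched like this in Python. The column names and values in `sample` are hypothetical placeholders, not the contents of the real Census file.

```python
import csv
import io
from itertools import islice


def peek(lines, n=5):
    """Return the header and the first n data rows without reading the rest."""
    reader = csv.reader(lines)
    header = next(reader)
    rows = list(islice(reader, n))
    return header, rows


# Hypothetical columns and placeholder values, only to make the sketch runnable.
sample = io.StringIO(
    "year,age_group,rate\n"
    "2000,group_a,1.0\n"
    "2000,group_b,2.0\n"
    "2001,group_a,3.0\n"
    "2001,group_b,4.0\n"
)

header, rows = peek(sample, n=2)

if __name__ == "__main__":
    print(header)
    print(rows)
```

Because `peek` stops after `n` rows, the rest of a large file never has to be loaded, which is the same economy the agent is making with its context window.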
[19:36] And so, what it did.
[19:36] So, you notice here is.
[19:39] on the right you can see what the file is.
[19:40] It wrote the code in my thing.
[19:42] So, I could just rerun that code any time I want, and it made a file,
[19:46] which is called home ownership age uh by age annual.
[19:50] And it made a CSV file that has everything here.
[19:52] Every year has what the home ownership rate is.
[19:55] So, they both They all come from this and they do this.
[19:57] So, now that's great.
[19:59] So, you were asking So, that just downloaded the data.
[20:01] The thing that I would typically do here is.
[20:07] So, just to give you a sense of what I would do next is I would typically um.
[20:13] you know, take this.
[20:15] And this is just a discussion here.
[20:15] So, it does this, clean data, blah blah blah.
[20:20] So, now um.
[20:22] what I'm going to do is I want to make this graph.
[20:23] So, I'm going to I want to sort of show you how you would make this graph.
[20:27] So, I'm going to do it in the following way.
[20:30] And so, by the way, like actually let's quickly check in on um.
[20:33] So, what's kind of interesting here is
[20:35] it will kind of tell you the different
[20:36] setups, what the different data Wait,
[20:38] for example, given that there's
[20:39] quarterly data, you could do a quarterly
[20:42] version of the graph that we're going to
[20:43] talk about if you wanted to study that.
[20:45] I'm going to focus on the annual one
[20:46] just for the purposes there.
[20:49] Let's check in very quickly on um
[20:53] what
uh, Cowork is doing just since you
asked about that.
Um, Cowork is here.
[21:01] And it's still struggling because mainly
[21:04] because
[21:06] it's still doing the search process.
[21:07] Now, struggling is the wrong word, but
[21:10] >> What kind of data do
[21:10] >> It's it's working on the data question.
[21:12] So, it says, "Okay, here
[21:15] um
[21:16] let's do it." So, it's doing all the
[21:17] exercise, same thing as before. It's
[21:19] really It can't get to the census is one
[21:21] of the big problems here.
[21:23] So, now it's going to say, "Let me try
[21:25] and find this."
[21:27] Um
[21:28] So, it's still doing basically the same
[21:30] exercise. I'm trying to find this. It
[21:32] has historical data. And so, what it's
[21:34] doing is it's going to try and download
[21:36] directly.
[21:38] And so, then it It would The problem
[21:40] with it is is that
[21:42] it's running into the same issue that
[21:44] happened in the other one. So, it's the
[21:45] same type of issues, which is that the
[21:46] census doesn't let you go there unless
[21:48] you write a script where you have a user
agent. It doesn't seem like Cowork
wants to do that. And so, it seems like
they're kind of taking slightly
[21:55] different approaches. We'll see where
[21:56] this one gets to. Um
[21:59] Yeah, you can see
[22:00] >> doing never get into the census data.
[22:02] >> Well, we'll see we'll see what happens.
I don't usually use Cowork for this
reason. So, it looks like it might.
[22:11] see if it gets there. Let's give it the
[22:13] benefit of the doubt. While I While we
[22:14] make this figure, we'll see if it can
[22:15] get to the same place.
[22:16] >> Okay.
[22:17] >> Um
[22:19] So, what we're going to do now is This
[22:21] is going to be very fast by comparison.
[22:23] So, we're going to say, "All right,
[22:25] um we have these two data sets. We're
[22:26] going to say
[22:28] um please make a graph that plots the
[22:33] um
[22:34] distribution of home ownership
[22:36] of home ownership
[22:40] um by age um
[22:43] for each year in the So, I'm going to
[22:48] tell it what file I want It It doesn't
[22:49] really need this, but I want to say that
[22:51] I want to use the annual. So, here you
[22:52] can say at home ownership by age
[22:55] annual.csv just to tell it I mean this
[22:57] file.
[22:59] Um
[23:00] Um
[23:01] >> tell it that you want to use all?
[23:03] >> I'm going to in just a second. Yes. So,
[23:05] please
[23:06] make a script to generate a figure in R
[23:11] with this data.
[23:13] Um
[23:14] And then so this is my cheat. So,
[23:16] Marcus, you're now getting the number
[23:18] one cheat that I have found that has
[23:19] been very helpful. So, there is a a
[23:22] professor of sociology, his name is
[23:24] Kieran Healy. He has a really wonderful
[23:26] book that I will um
[23:28] pitch here that's called data
[23:29] visualization in R.
[23:32] It makes beautiful graphs. And so, what
[23:35] you can do, you can say, "Please follow
[23:38] the best practices
[23:41] of making figures from
[23:44] Kieran Healy."
[23:46] >> Okay.
[23:46] >> And so now, we can tell it to do that.
[23:49] Um
[23:50] >> The fact that you use "please" all the
[23:52] time, does it make a difference?
[23:53] >> I don't know. Maybe I'm just very
[23:54] polite. I So, you know, I It's hard for
[23:58] me not to say please. I know that there
[24:00] are people who
[24:01] There is literature that talks about
[24:03] ways in terms of um So, here now, by the
[24:05] way, it's asking like what R scripts it
[24:07] is and what's available and so on. And
[24:09] it says, "Good, everything's available.
[24:10] Let's write this following Healy's
[24:12] principles: clean ggplot2, direct
[24:14] labeling, minimal chart junk, thoughtful
[24:16] color, and clear typography."
[24:18] Um
[24:19] So, I say "please" a lot, mainly cuz I
[24:22] just say I'm overly polite probably
[24:24] anyway. And then um
[24:27] >> I'm sorry.
[24:27] >> There is literature about the extent to
[24:29] which being polite or rude or
[24:31] threatening these LLMs can potentially
[24:34] change behavior. I guess I It's very
[24:37] hard for me to not just be polite. So,
[24:40] >> Okay.
[24:40] >> I I don't know how much it will affect
[24:42] your results. So, um So, now I wrote
[24:45] this, you know, 77 lines of this. And
[24:47] then I'm telling it to run it. It will
[24:49] also run it for me. So, I said, "Yes,
[24:50] you can run it." And
[24:53] >> Wait, it was 77 lines? Wow, okay.
[24:56] >> Um Well, cuz there's a lot of preambles.
[24:59] So here you can see what it looks like.
[25:00] This is it on the right. So here you
[25:03] have it just makes this nice It's this
[25:05] long plot.
[25:06] Um
[25:08] Cairo is an Oh, yes. So
[25:11] So it's running into this problem of it
[25:13] So this is very funny in that it's
[25:15] really struggling with um
[25:17] fonts.
[25:18] Because because
[25:21] um
it So Kieran Healy has very specific preferences
[25:25] for fonts.
[25:27] >> Mhm.
[25:27] >> And so it's trying to do his fonts. And
[25:29] so now it's debugging this issue that
[25:31] comes up for the way that it's tried to
[25:32] structure it. And it looks like it
[25:33] should work. Which is kind of amazing.
[25:36] This [snorts] is always a huge issue.
[25:37] There's a whole blog post from
[25:39] Kieran Healy about this. So now it looks
[25:42] like
[25:43] it should be done. So what we can do it
looks good. I see there's one Unicode character that
isn't supported well. So we'll fix that
[25:50] very quickly. Um
[25:53] and then we'll run it again.
[25:55] And once it's done we can see
[25:58] how happy it is or if it's happy.
[26:02] >> So all this doing get defying and all
[26:04] this things
[26:06] these are all well known terms or they
[26:08] just made it up?
[26:09] >> Which which terms? The
[26:11] >> The cogitating
[26:13] or
[26:13] >> Oh, cogitating. Yes, these are just like
[26:15] it's
[26:16] I think there's a You can actually
[26:17] change that. So you can
Cogitated just means to have thought,
[26:21] but it uses all these ridiculous terms
[26:22] and you can change those. They're kind
[26:24] of meaningless. They're not No, they
[26:25] they're meaningless. They're kind of
[26:26] nonsense words. So now we can open this.
[26:29] So um let me open this file. So I'll say
[26:32] open home ownership by age
[26:34] um dot PDF. And let's see if it did
[26:37] exactly the same thing. It should be
[26:38] similar. So I'll open it. Oh, no, it
[26:41] didn't do that. Okay, very interesting.
[26:43] Um
[26:45] So and you know the reason for that it's
[26:47] actually kind of interesting. So it did
[26:48] a different graph. So here I'll show
[26:50] what it looks like. This is what it did.
[26:52] Um
[26:55] it did this one.
[26:57] So
[26:58] Oh.
[26:59] >> That's a different one. That's very
[27:00] different.
[27:00] >> Yeah, sorry. Let me just It opened the
[27:02] wrong one for me. One sec. It's It looks
[27:04] very different. And part of the reason
[27:06] for that is it
[27:09] it only has these bucketed ones. So it's
[27:12] it clearly doesn't have as many um
[27:13] subcategories in terms of age. Well, now
[27:15] this is still very helpful. You can kind
[27:17] of see the relative home ownership rate,
[27:19] but it's not the same kind of graph. So
[27:20] now we could ask the question of um
[27:25] how can we get something that looks
[27:27] closer to these other these other ones?
[27:29] Um
[27:30] It's interesting in that this is sort of
[27:32] more aggregated.
[27:34] >> x axis. You want the timeline on the x
[27:35] axis.
[27:35] >> Yeah, exactly. So let's let's let's kind
[27:38] of change that and see if we can get
[27:39] there. And it what it might say is um
[27:43] you know, it might say, "Hey,
[27:45] uh
[27:47] you know, the data doesn't work for
[27:48] this." And then so then we'd have to be
[27:49] like, "Okay, well, you need to go back
[27:50] and figure it out." Um I
[27:53] So I have a tendency of being polite
[27:55] here. I say, "This is great, but I want
[27:58] a graph where age
[28:02] is on the x axis and the death
[28:06] distribution
[28:08] on the y axis
[28:10] and there are different
[28:14] lines for each year.
[28:17] Do you have sufficiently fine-grained
[28:20] age data for this?
[28:22] Can you go back and find something
[28:26] with age buckets?
[28:28] If not
[28:33] So we'll see if this kind of
[28:36] Of course, it is kind of cool to look at
[28:37] that graph in the sense that it kind of
[28:39] exactly explains
[28:41] what has happened during this period,
[28:43] right? So if we while we wait for this
[28:44] to kind of run, it's thinking. So it
[28:47] says, "Oh, yes, good news. It actually
[28:49] all There are finer age buckets. We only
[28:50] got the five big ones."
[28:52] So, this time it didn't do the big ones.
[28:54] I wanted more. And so, now it's going to
[28:56] re-download the data.
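A rough Python sketch of the reshaping this request implies: rows of (year, age bucket, rate) are grouped so that each year becomes one line, with age running along the x axis. The name `lines_by_year`, the bucket labels, and the numbers are placeholders invented for illustration; they are not the data or the code from the video.

```python
from collections import defaultdict

# Hypothetical age buckets, in the order they should appear on the x axis.
AGE_ORDER = ["Under 35", "35-44", "65+"]

# Placeholder rows: (year, age_bucket, rate). The values are made up.
records = [
    (2024, "65+", 40.0),
    (2004, "35-44", 20.0),
    (2004, "Under 35", 10.0),
    (2024, "Under 35", 5.0),
    (2004, "65+", 30.0),
    (2024, "35-44", 15.0),
]


def lines_by_year(rows, age_order):
    """Group rows into one series per year, each sorted by age bucket."""
    position = {age: i for i, age in enumerate(age_order)}
    series = defaultdict(list)
    for year, age, rate in rows:
        series[year].append((age, rate))
    return {
        year: sorted(points, key=lambda point: position[point[0]])
        for year, points in sorted(series.items())
    }


series = lines_by_year(records, AGE_ORDER)
for year, points in series.items():
    print(year, points)
```

Each entry of `series` can then be handed to a plotting library as one line. With only a few coarse buckets per year the lines carry little shape, which is why finer age buckets matter for this kind of graph.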
[28:59] Now, refactor this
[29:00] >> it will re-download it? The CSV file
[29:03] probably has it already or not?
[29:05] >> No, cuz I think when it parsed There's a
[29:06] much larger Excel file that it
[29:10] kind of read.
[29:11] Now, I think it just downloads and then
[29:13] kind of gets it it pulls directly from
[29:15] that. So, what it's doing is it's
[29:16] downloading, reading into memory, and
[29:18] then parsing it. Think if this was a
[29:19] bigger file. These are pretty small
[29:21] files. If they were bigger files, we'd
[29:23] have a raw folder where we download all
[29:25] this stuff, and then we wouldn't have to
[29:26] re-download. These are tiny by, you
[29:28] know,
[29:29] Um so, now it's going to rerun it.
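The raw-folder idea mentioned here could look roughly like this in Python: download a file once into a local folder and reuse that copy on later runs. The function name `fetch_cached`, the folder name `raw`, and the example URL are assumptions for illustration, not what the script in the video does.

```python
from pathlib import Path
from urllib.request import urlretrieve


def fetch_cached(url, raw_dir="raw", downloader=urlretrieve):
    """Return a local copy of url, downloading it only if it is not cached yet.

    downloader is any callable taking (url, destination_path); it defaults to
    urllib's urlretrieve and can be swapped out for testing.
    """
    folder = Path(raw_dir)
    folder.mkdir(parents=True, exist_ok=True)
    # Use the last path component of the URL as the local file name.
    target = folder / url.rsplit("/", 1)[-1]
    if not target.exists():
        downloader(url, str(target))
    return target


# Example call (hypothetical URL, not run here):
# path = fetch_cached("https://example.com/data/table.xlsx")
```

For small files like the ones in this session, re-downloading on every run costs little; the cache mostly pays off when the source files are large or the server is slow.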
[29:32] And
[29:35] um
[29:36] And you notice, by the way, Markus, this
[29:38] is like
[29:39] This obviously it's a little strange to
[29:40] be working in the command line and
[29:42] everything else. For the most part, this
[29:44] is all
[29:45] um pretty straightforward English. Yeah,
[29:48] and you can just It's the same thing
[29:49] you'd say for talking to someone like
[29:51] you and I both here. Let me quickly I'll
[29:53] show you what Cowork is doing. Yeah,
[29:56] Cowork is doing something very
[29:57] similar.
[29:58] It looks like it has managed to find a
[30:00] data set. So, it says,
[30:03] "It looks like" So, it's writing the
[30:06] download.api download.data.py.
[30:09] I'm curious which data set it's going to
[30:10] use.
[30:11] Um
[30:13] It's spent a bunch of time thinking about
[30:15] this, and now it is finally writing the script.
[30:18] So, we'll see what it's doing. So, it
[30:20] looks like it's writing it here. So,
[30:22] what
[30:23] uh it's an interesting question of what
[30:25] file. So, download fr- Oh, so it found
[30:26] the same one.
[30:28] So, it's doing the same one here,
[30:30] but it's only doing it from table 19.
[30:34] Um
[30:35] Oh, no, and here's 17 as well. So, it
[30:37] found exactly the same one. So, you
[30:39] know, co-
[30:41] um
[30:43] So, oh, this is interesting. So, the
[30:44] challenge here, and so I didn't quite
[30:46] realize this about Cowork. So, if you
[30:48] look at this here,
[30:49] it's the same thing that would happen on
[30:50] the website often. So, it wrote a it
[30:52] wrote a script,
[30:54] but the script,
[30:56] um
[30:57] basically, it's not able to write it and
[30:59] then query something because it's in
[31:01] something called a sandbox. So, it's
[31:03] really working just within this one
[31:05] sandbox, which is something we'll talk
[31:06] about for Claude Code going forward.
[31:08] We let Claude go nuts, right? We said,
[31:11] "Hey, yeah, Claude, go access the web,
[31:13] do whatever you want."
[31:15] >> Yeah.
[31:15] >> Um
[31:17] And so, it's struggling inside Claude
[31:19] Cowork to kind of work in the way that
[31:21] you would if you ran your own script. It
[31:23] can't really act like you, per se, if
[31:25] that makes sense. So, there's it's
[31:27] getting
[31:28] >> in the sandbox.
[31:29] >> It's really much more sandboxed, which of
[31:31] course then removes a lot of the privacy
[31:33] and other concerns that you might have,
[31:35] right? So, there's there's kind of, um a
[31:38] lot of
[31:39] um pluses and minuses. Um So, let me
[31:43] kind of, to finish this up and then
[31:46] we can wrap up, is so here now we fix
[31:48] this, we've told it all the things, it's
[31:49] kind of working.
[31:52] Um let's run this again.
[31:55] And we'll see if it's kind of
[31:58] fixed.
[32:02] This looks like it's it's reading this,
[32:03] da da da.
[32:09] So,
[32:11] it's still working. So, gesticulating
[32:13] just means it's working. It doesn't mean
[32:14] anything else. And so, now it's going to
[32:16] do a little bit more. The year it says
[32:19] the shape is right, but the year labels
[32:20] are overlapping. Let me use this thing
[32:22] to fine-tune the spacing. So, it really
[32:24] is like a good RA in the sense that
[32:26] it looks to make nice graphs for you.
[32:28] Um
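To make the overlapping-labels fix concrete, here is one simple way such spacing could be adjusted, sketched in Python. This is a swapped-in greedy approach chosen for illustration; the transcript does not say which method or package was actually used. The name `spread_labels` and the example numbers are hypothetical.

```python
def spread_labels(positions, min_gap):
    """Nudge label positions upward so neighbours are at least min_gap apart.

    positions: vertical positions of the labels, in any order.
    Returns the adjusted positions in the same order as the input.
    """
    # Visit labels from lowest to highest, remembering the original indices.
    order = sorted(range(len(positions)), key=lambda i: positions[i])
    adjusted = list(positions)
    previous = None
    for i in order:
        if previous is not None and adjusted[i] - previous < min_gap:
            # Too close to the label below it: push it up just enough.
            adjusted[i] = previous + min_gap
        previous = adjusted[i]
    return adjusted


# Two labels half a unit apart get separated; the third is left alone.
print(spread_labels([10.0, 10.5, 14.0], min_gap=1.0))
```

A greedy pass like this only ever moves labels upward, so a crowded plot can drift off the top of the axis; dedicated label-repelling tools handle that more gracefully.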
[32:30] >> So, it even discovers problems.
[32:32] >> Yeah. I mean, it tries. It doesn't
[32:34] always. Sometimes you have to tell it to
[32:36] go look. This is a pretty
[32:37] straightforward problem that I've asked
[32:39] it to do. On more complicated things,
[32:41] like if you were writing a macro model
[32:42] and you wanted to code it up, you might
[32:44] say, "Okay, now make sure you go review
[32:46] that code and make sure like, you know,
[32:48] everything makes sense. Are the
[32:49] intertemporal constraints being
[32:51] satisfied, etc., etc."
[32:54] >> So, what's what's done on your computer
[32:56] and what's done in the cloud or
[32:59] >> So, in the cloud is anything that has to
[33:01] do with text in terms of what it's
[33:02] writing. Like it's writing stuff like
[33:04] writing a like telling you what script
[33:06] to do. The program being run, the
[33:09] commands are coming from the cloud, but
[33:11] the the actual execution of the R script
[33:14] is being done on my computer. So,
[33:16] >> So, you have R on your computer. If you
[33:18] don't have R on your computer, it
[33:19] wouldn't work.
[33:20] >> No, and then you could ask it to install
[33:22] it if you wanted or you use Python or
[33:24] something else. That's right. But that's
[33:25] true. Um yes, that's right. That's like
[33:28] basically you need to have the same um
[33:29] thing here. So, let's look at what it
[33:31] does. So, just to give you a sense,
[33:32] you'll notice um
[33:36] This is what it's gotten to so far,
[33:37] although it you notice it keeps thinking
[33:39] and it keeps wanting to do stuff. I just
[33:40] want to show you what this picture looks
[33:42] like.
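Since the scripts run on the local machine, one quick way to see which interpreters are available is to look them up on the PATH. This is a minimal Python sketch; the function name `find_interpreters` and the default list of command names are assumptions for illustration.

```python
import shutil


def find_interpreters(candidates=("Rscript", "python3", "python")):
    """Map each command name to its full path, or None if it is not on the PATH."""
    return {name: shutil.which(name) for name in candidates}


for name, path in find_interpreters().items():
    status = path if path else "not installed (or not on PATH)"
    print(f"{name}: {status}")
```

If `Rscript` comes back as `None`, an R script generated in the session has nothing to run it locally until R is installed or a different language is used.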
[33:42] >> So, all this software you have to have
[33:44] on your local computer essentially.
[33:45] >> Yeah, but that's true. I think of yes,
[33:47] but that's right. So, here's what it's
[33:48] made now, but it doesn't love what it's
[33:50] done yet. So, very similar, but still
[33:53] not identical. So, this is what it's
[33:54] made.
[33:56] Um and now it will let it we'll see what
[33:59] it
[34:00] um
[34:01] We'll just kind of last thing here.
[34:04] We'll just let it do one more piece,
[34:07] which is to say, "Okay, you want to do a
[34:08] little bit more." It's going to change
[34:10] the colors. It has lots of strong
[34:11] feelings right now about what things
[34:12] need to look like.
[34:14] And
[34:16] hopefully
[34:17] This is because I I really gave it a
[34:19] very dogmatic viewpoint on it. If I had
[34:21] said, "Just make a graph," it would have
[34:23] said, "Great. Here's a graph." But
[34:25] instead I said, "You need to make, you
[34:26] know, follow these very particular
[34:28] protocols of how to do things." It's
[34:30] it's iterating to kind of make sure it
[34:32] looks kind of nice. So, now it's done.
[34:34] Says, "Great. It looks to this.
[34:37] It's using this data. Here's some
[34:39] features of it."
[34:41] Um of what's going on. And now let's
[34:43] just look at that graph and we'll be
[34:45] done. So, it didn't do the gray lines
[34:46] like in the one that I showed you.
[34:49] Um and I'll but I'll show you this and
[34:51] then we'll be done. So, here it is.
[34:55] So, this is the version of it. It didn't
[34:56] do the gray lines, but you can kind of
[34:58] see it goes I see. See the different
[35:01] versions. I kind of prefer the other one
[35:02] that was made, but this is kind of
[35:04] getting you a sense of what it looks
[35:05] like. Um you have kind of this is the
[35:08] time series of what it looks like. You
[35:10] have the kind of the peak. I think
[35:12] there's different ways you could make
[35:12] this, but this is kind of a This is a
[35:14] pretty good start. You can kind of
[35:15] clearly see like this is what it looked
[35:17] like at certain points. I would say the
[35:19] crisis looks very similar to 2024 and 20
[35:22] you know, 2004 is when obviously
[35:24] homeownership was the highest across the
[35:25] board. There are better ways to do this,
[35:27] but you know, we could iterate here
[35:28] forever kind of of what it would look
[35:30] like.
[35:31] Um so, that's kind of Yeah. So, that's
[35:33] kind of just to kind of sum up on this
[35:35] cuz I don't I think like you could do a
[35:36] lot of ideas here if you wanted. Is
[35:41] you know, the idea here is that when you
[35:43] work um sorry, not this one. We'll do if
[35:47] we think about
[35:49] um
[35:52] you know, if we could make this graph,
[35:53] you get a lot of really nice things
[35:55] here. Kind of what's nice before is that
[35:56] you had to kind of run a lot of things
[35:58] and go back and forth. Instead, you can
[35:59] kind of just tell it what you want the
[36:01] graph to look like and it will do it
[36:02] itself. That's really kind of lovely
[36:04] part of it.
[36:05] And then you can just iterate
[36:06] accordingly. You can add different
[36:08] things. You can add stylings if you want
[36:10] it to be satisfying certain
[36:12] requirements. And in the end, you know,
[36:14] now we have all this stuff in one folder
[36:16] where we can kind of it creates a kind
[36:18] of repository that looks like a project,
[36:20] right? You started to actually make
[36:22] stuff that looks like code for a
[36:23] project. And that's kind of the key
[36:25] benefit is like you have Claude do these
[36:28] things. It works with you to do the data
[36:30] acquisition. You can talk about what you
[36:31] want to look like. You always are
[36:33] generating scripts so that you can
[36:35] actually do it yourself and you can look
[36:36] at the code later.
[36:38] And then you can kind of do judgment on
[36:40] what you need things to look like. So
[36:42] that's kind of the key thing and I think
[36:46] next time what we'll do is we'll try to
[36:47] do something a lot more complicated and
[36:48] it'll kind of be more clear of why you
[36:50] need to be what kind of what what really
[36:52] unlocks.
[36:55] >> Fantastic. Okay.
[36:58] So that concludes this video but uh
[37:01] don't hold off to go to the next one.
[37:03] It's coming. We're ready to full force
[37:06] data scraping.
[37:07] >> Perfect. All right. Thanks so
[37:10] much.
