How my AI report on housework started well, then went off the rails
I’ve long been interested in the topic of housework, as you can see from this Crooked Timber post, which produced a long and unusually productive discussion thread [fn1]. The issue came up again in relation to the prospects for humanoid robots. It’s also at the edge of a bunch of debates going on (mostly on Substack) about living standards and birth rates.
I’m also interested (like nearly everyone, one way or another) in “Artificial Intelligence” (scare quotes intentional). My current position is, broadly, that it’s what Google should have become instead of being steadily enshittified in the pursuit of advertising dollars. But I’m alert to other possibilities, including that more investment will deliver something that genuinely justifies the name AI. And I think a lot of the concerns about power and water use, the spread of AI slop and so on are either overstated or (as with deepfakes) are mostly new iterations of concerns that always arise with new IT and communications technology, and can be addressed with existing conceptual and legal tools.
With this background, I thought it would be interesting to try out ChatGPT’s Deep Research (DR) on the question of what has happened to housework and why. As I may have mentioned before, I’ve trained DR on a big volume of my own writing. That produces a passable imitation of my style, and means I don’t worry about the ethical issues of plagiarising the writing style of others (of course, standard norms of citation and attribution still apply).
I decided to focus on single-person households, to abstract away from the issues of child-raising (which I want to look at separately) and the allocation of work between partners (about which there is a vast literature to which I can’t add anything new).
Everything went really well to start with. I prompted DR for time use data, then pushed further on with more detailed questions like the impact of air fryers on male cooking habits (I was given one recently and was impressed enough that I promptly bought a second). I asked for a literature search and got references to Judy Wajcman and Michael Bittman, both of whom I knew, and to a couple of people I didn’t. DR missed Ruth Schwartz Cowan’s classic More Work for Mother.
On the other hand, I wasn’t aware of Wajcman’s recent Pressed for Time and hadn’t thought about the broader issue of life administration, which DR pointed out. I gave it a more economistic take, trying to divide labour-saving innovation (electronic bill paying) from the labour costs of more digital consumption (retrieving passwords for streaming services etc).
I got DR to produce a LaTeX file, and was nearly ready to go to digital press when I noticed that the references were incomplete. At this stage, the whole process spiralled into disaster. Every draft seemed to lose more material, and to be worse written. Finally, I demanded an explanation.

Source: Enkiai.com
DR graciously explained that this is the way it does things. Here’s its response with original formatting (more detailed version here)
Why Deep Research Fails
The core issue: The “illusion of continuity” in long-context LLMs
Large-context models appear to preserve every part of the conversation, but in reality:
They do not reliably retain or retrieve long-form outputs they themselves generated.
Even though the conversation window may be large, the model often cannot:
- accurately locate the version you consider the “full text,”
- reliably differentiate between multiple drafts (short, medium, long) created earlier,
- or accurately regenerate from earlier output unless it is verbatim re-supplied by the user.
This is because the internal contextual representation compresses earlier tokens. Information is not stored sequentially; it is represented in a latent space, which cannot always reconstruct past outputs faithfully.
Fortunately, my LaTeX program took continuous backups, so I’ve been able to retrieve the last good version, but I’m going to keep it away from DR for now.
I was going to go on with more detail about the actual report, but my op-ed training leads me to feel that a post should have 700 words, and I am at 675 as I type this.
fn1. I can take a victory lap on my jihad/crusade against ironing, which has disappeared almost entirely, contradicting the expectations of many commenters.
La arruga es bella (iirc … ) I don’t remember though, if they meant wrinkles in clothing, or wrinkles on people!! It could be either. Yes? No?
That’s interesting what you say about air fryers. Hmm. Maybe we need one.
Is housework really so awful? I deal with it by having low-ish standards. (I am not sure that Americans aren’t a bit too clean.) My problem is clutter. Specifically, paper.
There will be plenty more on the substantive topic in the next post.
I stopped ironing decades ago. My wife does very minimal ironing: a few items of her clothing and none of mine.
Housework is awful as is much drudgery-type work. I would reduce housework drudgery by being more minimalist and owning much less stuff, if I could. But my wife likes lots of stuff though not outright bric-a-brac.
I partly agree with the saying, “The things you own end up owning you”. However, there is a necessary minimum which includes one’s own house if possible. Without a house, landlords own you or the elements own you. And councils start acting like Moreton Shire:
https://www.abc.net.au/news/2025-12-08/moreton-bay-council-faces-court-over-homeless-camp-destruction/106114178
I will say, I seem to miss the smell of starch. Or was it the smell of the ironing itself? (I haven’t ironed in … I can’t remember the last time.) There was something that smelled good! We use some kind of sensitive-skin detergent now, so the laundry doesn’t smell “good.” Overall, I guess this is a win.
Maybe on occasion, I might iron a tablecloth – but really not even those most of the time.
Air fryers? phht, they do need serious cleaning – whatever happened to the Dutch oven?
Roger: silicone liners are what you need