How might we employ animistic design and LLMs to make a more helpful smart home?
Large language models (LLMs) are now able to simulate human writing and conversation. They can generate text, images, and many other modalities with increasing capability.
Could this technology be paired with the smart home to finally build context-aware communal computing devices? We could potentially handle the many special cases that crop up in our lives by simply communicating our intent, rather than messing with a million apps.
Josh.ai is looking to integrate such context-aware conversational agents (specifically ChatGPT) with the smart home as a single entity, operated by a centralized service:
But we need to go further.
These are still single points of control and mostly focused on quick answers. I'd like to push us to think about how LLMs' capabilities could be used to translate from any context to any context, including ecosystems of devices.
At its simplest, animism is the idea that there are spirits inside everything: rocks, animals, and smart home devices. These spirits act the way you'd expect them to. This gives us humans an easier way to relate to things, because we are constantly looking for signs of other humans (which is why we see faces everywhere).
If you want a fuller understanding of animism, how it could be applied to technology, and the workshop I ran at SXSW to test this out, see my previous article on this topic:
(This part-workshop, part-performance-art piece was also shortlisted for the Interaction 2023 awards!)
In many idealized smart home visions, everything could talk to you. That sounds horrible.
Imagine you lived in a real-world home modeled after Beauty and the Beast. Your door welcomes you home, then your home assistant does, your laundry tells you it is almost done, your microwave announces it is 30 seconds away, your trash can needs a new credit card to keep operating, your BBQ says it needs an update, your fridge complains it has been opened too many times, and so on. All in different voices and accents, and some even sing to complete the “immersion” with the appliances.
All of these devices are made by different companies, each vying for your attention. Product and marketing teams treat notifications as the lifeblood of device attention: more notifications, they believe, should lead to more engagement and eventually enough top-of-mind awareness that you buy another device from the same brand. If we can “make the number go up” we are successful as a business, no matter whether that number makes the whole house less livable.
With animistic design, it may not make sense for every device to talk. A lamp should probably be able to receive instructions from you (or a light switch), but it wouldn’t necessarily be appropriate for a lamp to talk back since it doesn’t have a mouth. It would be more appropriate for it to communicate by turning itself on or off, adjusting its brightness, flickering, or changing colors.
Multiply this by every device (or every possible device-to-device communication), and things get complicated fast.
What is most interesting to me is the mechanism by which all of these devices would communicate. This could be a shared channel for all of them to talk, tag each other, share information, and generally create better rules to service the humans in the house. A recent paper, Socratic Models: Composing Zero-Shot Multimodal Reasoning with Language, considers how different types of models could transfer information via regular human text. Text becomes the protocol between models (and with humans).
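To make the idea of text as the protocol concrete, here is a minimal sketch of a shared household channel. Everything here (the HouseholdChannel class, the device names, the trigger phrases) is hypothetical rather than part of any real platform; the point is simply that devices post and react to plain-language messages that a human could also read.

```python
# Hypothetical sketch: a shared text channel where devices post and read
# plain-language messages, so text is the protocol between devices and humans.
from dataclasses import dataclass, field
from typing import Callable, List

@dataclass
class HouseholdChannel:
    log: List[str] = field(default_factory=list)
    listeners: List[Callable[[str, str], None]] = field(default_factory=list)

    def post(self, sender: str, message: str) -> None:
        """Append a message to the shared log and notify every listener."""
        self.log.append(f"{sender}: {message}")
        for listener in self.listeners:
            listener(sender, message)

channel = HouseholdChannel()

# The hallway lamp never "talks"; it reacts to what it reads on the channel
# by changing its own state and noting that change for the household log.
def hallway_lamp(sender: str, message: str) -> None:
    if sender == "video doorbell" and "familiar face" in message:
        channel.post("hallway lamp", "turning on at 40% brightness")

channel.listeners.append(hallway_lamp)
channel.post("video doorbell", "familiar face at the front door")
print("\n".join(channel.log))
```

Because the log is just text, the same channel could later be summarized or queried by an LLM, or read directly by the people in the house.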
If I said that whenever a member of the household walks into a room the lights should turn on, that would make a lot of sense. But what if it is really late at night? Or what if there is a baby napping in that room? These are special cases that won’t fit nicely within most apps’ routines today.
If you wanted to understand what rules and exceptions were currently at play, you would have to parse an interface that is focused on regular rules rather than everything that could possibly happen in the home. Consider:
Not to pick on VeSync in particular (I have a lot of their devices!), but you wouldn’t be able to get these exceptions handled in this type of interface. Even if you could start to add them, the list would quickly become unwieldy to manage.
For this interface, how might we add exceptions for “when I’m home,” “when it is already bright out,” or “when someone is already sleeping in the room”?
Building communal computing devices requires you to understand more context than is possible to collect in a rigid interface.
Giving text (or verbal) commands and asking for explanations makes much more sense, especially for harder-to-define concepts like “when someone is already sleeping.” LLMs are starting to be able to pull in that kind of context because it is linked to a symbolic understanding of what it means for someone to be sleeping in a room.
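As a sketch of how those fuzzy exceptions could be handled, the rule, its exceptions, and the current household state can all be described in plain text and handed to a model to decide. The llm_decide function below is a stand-in (stubbed so the snippet runs); in practice it would call whatever LLM you have wired in.

```python
# Hypothetical sketch: describe the rule, its exceptions, and the current
# household context in plain language and let an LLM make the call.
def llm_decide(prompt: str) -> str:
    # Stub so the example runs; swap in a real model call (e.g., via LangChain).
    return "No. Someone is already sleeping in that room, so keep the lights off."

RULE = "Turn on the lights when a member of the household walks into a room."
EXCEPTIONS = [
    "not when it is really late at night",
    "not when someone is already sleeping in that room",
    "not when it is already bright out",
]
CONTEXT = (
    "It is 11:45pm. The bedroom motion sensor just fired. "
    "The baby monitor reports the baby is asleep in the bedroom."
)

prompt = (
    f"Rule: {RULE}\n"
    f"Exceptions: {'; '.join(EXCEPTIONS)}\n"
    f"Current context: {CONTEXT}\n"
    "Should the lights turn on? Answer yes or no, then explain briefly."
)
print(llm_decide(prompt))
```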
When something goes wrong you shouldn’t have to launch a specialty app to figure out what is going on. You should instead be able to jump into the household chat channel and ask what happened. If the wrong context was considered, you give feedback so that the entire ecosystem of devices can adjust its behavior. The devices can also coordinate with each other, but you shouldn’t have to listen to all of the chatter going on between them all the time.
Animistic design allows for devices to act the way we would expect them to (e.g., a video doorbell is welcoming to the people that live there) and even disagree about what is best in certain circumstances (e.g., letting in a familiar face during a dinner party automatically).
Where could this go wrong? In many, many ways. Could a brand take an adversarial approach to make other brands’ devices look bad in the eyes of the household? Could edge services conspire against a household that isn’t doing what is beneficial to a particular brand? Or what about runaway effects when multiple agents start some type of out-of-control cycle akin to an ant mill or flash crash?
This may point to a need for households to have better control of the devices themselves. There will probably need to be limits on device autonomy, akin to the tools social media platforms use to fight disinformation.
I’m excited by these new modalities for devices that can help figure things out agent-to-agent. Longer term we will consider how animistic design may help with these very specific and contextual needs.
What is still left to explore is the “identity” of those inside the home (who is recognized), how that works with federated learning (private and local data), and what the “identity” of each device is.
During a recent Replit x LangChain x Hugging Face AI Hack Night, I pulled together a conceptual project that uses LLMs to give more intuitive and helpful personalities to smart home devices. If you aren’t familiar with LangChain, it is a really compelling framework for building LLM agents and letting them interact with each other and with other tools.
For the hackathon I started with a single LangChain agent template that could be asked questions or given commands. Here is a snippet of the prompt used to get the system started:
Pretend you are a smart home that contains many smart devices. Your goal is to provide a comfortable, safe, and efficient place for a family to live.
There are multiple rooms in the home that have many smart home devices. They each have a name.
You will talk on behalf of the different smart devices in the home. If possible you will use the device names.
When responding to a smart device please include the emotions that they are feeling while they perform different actions and respond to different actions by the humans in the house.
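For reference, here is roughly what it looks like to wire a prompt like this into a single LangChain chat model. This is a hedged sketch rather than the exact hackathon code: import paths vary by LangChain version, the prompt text is abbreviated, and it assumes an OpenAI API key is available in the environment.

```python
# Rough sketch (not the exact Replit code): seed a LangChain chat model with the
# smart home system prompt and ask it questions on behalf of the whole house.
from langchain.chat_models import ChatOpenAI          # import paths vary by version
from langchain.schema import HumanMessage, SystemMessage

SMART_HOME_PROMPT = (
    "Pretend you are a smart home that contains many smart devices. "
    "Your goal is to provide a comfortable, safe, and efficient place for a family to live. "
    "You will talk on behalf of the different smart devices in the home, use their names, "
    "and include the emotions they feel as they perform and respond to actions."
)

llm = ChatOpenAI(temperature=0.7)  # assumes OPENAI_API_KEY is set

def ask_home(request: str) -> str:
    """Send one question or command to the household and return its reply."""
    messages = [SystemMessage(content=SMART_HOME_PROMPT), HumanMessage(content=request)]
    return llm(messages).content

print(ask_home("I'm home! What should the hallway lights do?"))
```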
However, we would need to create a separate LangChain agent (with its own prompt) for each device, and then decide how to route messages between these agents. Right now only linear, one-way chains are possible, which we might want to construct starting from the device receiving the event and then ranked by proximity (e.g., the video doorbell would send the message to the door lock and then to the hallway).
In the future, there would be a need for a graph of devices that can talk to each other, whether that means everyone (hub and spoke) or only nearby devices (those that can “shout” to each other).
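Here is one way that proximity-based routing could be sketched. The PROXIMITY graph, the device names, and the device_agent stand-in are all assumptions for illustration; in a real version each device_agent call would be a per-device LangChain agent with its own prompt.

```python
# Hypothetical sketch: route an event through a proximity graph of devices,
# passing plain text from one device agent to the next.
PROXIMITY = {
    "video doorbell": ["door lock", "hallway lamp"],
    "door lock": ["hallway lamp"],
    "hallway lamp": [],
}

def device_agent(name: str, message: str) -> str:
    # Stand-in for a per-device LangChain agent with its own prompt/personality.
    return f"{name} heard: {message}"

def route_event(source: str, message: str, seen=None) -> list:
    """Walk the proximity graph so each device hears the event exactly once."""
    seen = seen if seen is not None else {source}
    responses = []
    for neighbor in PROXIMITY.get(source, []):
        if neighbor in seen:
            continue
        seen.add(neighbor)
        responses.append(device_agent(neighbor, message))
        responses.extend(route_event(neighbor, message, seen))
    return responses

for line in route_event("video doorbell", "familiar face at the front door"):
    print(line)
```

Swapping the proximity graph for one where a central hub connects to every device gives the hub-and-spoke version instead.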
In addition, we might want to consider whether a loop like AutoGPT (or the even more recent babyagi) would be the right way to create a “clock” for interactions between devices going forward.
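As a thought experiment, such a “clock” could be as simple as a loop where, on each tick, every device agent reads the recent channel messages and may post a reply. The run_clock helper and the lambda agent below are illustrative assumptions, not part of AutoGPT, babyagi, or the hackathon project.

```python
# Speculative sketch of a "clock" for device-to-device interaction: each tick,
# every agent sees the same snapshot of the channel and may post a reply.
import time

def run_clock(agents, channel, ticks=3, delay=0.0):
    for _ in range(ticks):
        snapshot = list(channel)  # freeze this tick's view of the conversation
        for name, respond in agents.items():
            reply = respond(snapshot)
            if reply:
                channel.append(f"{name}: {reply}")
        time.sleep(delay)

channel = ["video doorbell: familiar face at the front door"]
agents = {
    # A lamp that reacts only if the doorbell spoke and it hasn't responded yet.
    "hallway lamp": lambda log: (
        "turning on"
        if any("doorbell" in m for m in log)
        and not any(m.startswith("hallway lamp") for m in log)
        else None
    ),
}
run_clock(agents, channel, ticks=2)
print("\n".join(channel))
```

A real version would need the kinds of guardrails mentioned above so that ticks don’t spiral into ant-mill-style feedback loops.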
Either way there is still a lot of work to do.
Check out the Replit and feel free to fork it if you want to try it out yourself.
Happy smart home hacking.