I love Christmas. Anyone else love Christmas? Okay, one person. Come on, there’s more than that. And I know that presents are not the point of Christmas, but presents are one of the reasons that I love Christmas. And in 2016, one of my gifts was the very first surveillance device that I at least knowingly put in my home, an Amazon Echo dot. And I want you to picture the scene, okay? It’s after lunch, the whole family’s crammed into the living room and I’ve got the best seat in the house right by the fire. And I’m unboxing it and I’m setting it up, and I’ve got to that key moment where it first comes online. That blue ring lights up, Alexa says, “Hello.” And just before I can respond, from across the room, my little brother shouts, “Alexa, order me an Xbox One.” Chaos. “Alexa no, Alexa cancel, Alexa stop, Alexa do not listen to him.”
Now, I’m a software engineer. I have shouted at many computers over the years, but this was the first one that had listened to me. And the thought of that captured my imagination so much that I spent the rest of that vacation looking at the technology, trying out the Alexa Skills Kit. And when I went back to work, a brand new cognitive technologies team was being set up. And the first thing they were looking at was chatbots, and I knew that was the team that I wanted to be on.
So that’s what I do. I’m Gillian Armstrong. I’m a technologist, I work for Liberty IT, and I solve problems using conversational technologies for Liberty Mutual, our parent company. Now it turns out I was not the only person who got a smart speaker for Christmas that year, or the year after. In fact, Gartner’s predicting that by 2020, 75% of all U.S. households will have a smart speaker. The world is changing fast. Conversational interfaces are just becoming a normal part of life. So we need to be able to learn fast. And we learn fast so we can change fast, and we change fast so we can keep on learning fast. And the joy of the two technologies that I’m going to talk to you about today, chatbots and serverless, is that they let us both learn fast and change fast.
But before we jump into them, let’s go ahead and deal with the elephant in the room. So what is a chatbot? What do you mean by serverless? Do you know that there are still servers? Questions as debated as how big is a microservice? Questions best debated over a cup of coffee, or over a lot of cups of coffee. And we can do that later, but for today I’m going to be opinionated about it. These are the definitions we’re going to work under.
A nice one from Paul Johnston, on serverless. “A serverless solution is one that costs you nothing to run if no one is using it (excluding your data storage costs).” And a chatbot is something that you can interact with conversationally using natural language. Now, some of you are going, “Hold on a minute, Gillian. I know you can build a chatbot that’s just got buttons. It’s got no intelligence.” I’m going to go ahead and assert those are not chatbots. Those are fancy command lines. Those are pretty webforms. It’s natural language understanding that is changing this interface. It’s what makes it significant, and so that’s what we’re going to talk about today.
And here’s what we’ve built. This is the first thing that we worked on in that cognitive technologies team. It’s an internal employee digital assistant. It sits on our intranet as part of a larger productivity suite. You’ll see that toolbar. And it just lets employees ask it questions in whatever way they want. So you can do things like get an answer on HR policy or unlock your work phone if you’ve locked yourself out. Which apparently a lot of people in our company do quite frequently. And it’s built using Amazon’s conversational interface service Lex and also AWS technologies. So when I talk about specific technologies, that’s what I’ll be referring to. However, most of our learnings are applicable regardless of the technology that you use.
A Quick Overview of Chatbot Using Conversational AI
So how many people here have built a chatbot using natural language understanding? Okay, we’ve got a few. That’s good. So for those of you who haven’t, let’s just run through the flow and deal with some of the terminology up front. So right at the top here we have a Conversational User Interface. This could be anything from that little pop up on your website, to something like Siri or Alexa. And of course the first thing that happens is that the user says something. We call that the utterance. If it’s a speech system, we have automatic speech recognition. We take the speech and we turn it into text. And then we run it through natural language processing and understanding. We can extract things like entities, times, places, dates, and we map what the user said to an intent, what the user meant. And then we map that intent to fulfillment, what we actually need to do. And that’s where we’re going to put our cool serverless architecture. And then of course we need to respond back to the user. There’s still user interface design. This is where it goes. These are the words that you use, the voice that you use, how you sound. And of course, then the final piece, text to speech.
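As a minimal sketch of that flow for a text channel (no speech steps), the code below maps an utterance to an intent and the intent to a fulfillment. The keyword matcher is only a stand-in for a real natural language understanding service, and the intent names and replies are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional


@dataclass
class Understanding:
    """What NLU extracts from one utterance: the intent plus any entities."""
    intent: Optional[str]
    entities: Dict[str, str] = field(default_factory=dict)


def toy_nlu(utterance: str) -> Understanding:
    # Stand-in for a real NLU service: plain keyword matching, for illustration only.
    text = utterance.lower()
    if "unlock" in text and "phone" in text:
        return Understanding(intent="UnlockPhone")
    if "holiday" in text or "vacation" in text:
        return Understanding(intent="HolidayPolicy")
    return Understanding(intent=None)


# Each intent maps to exactly one fulfillment function (hypothetical names).
FULFILLMENTS: Dict[str, Callable[[Understanding], str]] = {
    "UnlockPhone": lambda u: "Okay, I've sent an unlock request for your work phone.",
    "HolidayPolicy": lambda u: "Here's the holiday policy for your office.",
}


def handle_turn(utterance: str) -> str:
    """Utterance -> intent -> fulfillment -> response text."""
    understanding = toy_nlu(utterance)
    fulfil = FULFILLMENTS.get(understanding.intent)
    if fulfil is None:
        return "Sorry, I didn't catch that. You can ask about HR policies or unlocking your phone."
    return fulfil(understanding)
```

Calling `handle_turn("Can you unlock my phone?")` goes through the same three stages a hosted service would: understand, map, respond.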
Now that seems like a lot of stuff, right? But when we break it down, we see that there are three things that we need to do to build a chatbot. Number one, we need to design it so a human can understand it. Number two, we need to train our bot to understand human language. And number three, we need to write some code to actually make it do something.
Now most developers jump straight into number two, myself included, and we’ll look at why that happens. And then you hard code something in for number three, so you can get that cool demo. And then you panic when the business love it and they say, “Can we have it for the next sprint?” And in all of that panic, you completely forget number one. But everyone in this room always thinks about the user first. Before they write a single line of code, they always consider the user. Yes? So that’s where we’re going to start.
In a Chatbot, Conversation is the Interface
So in a chatbot conversation is your interface. Now, this is a big departure from what’s come before. Do we have any web designers or people who do web design in the audience? Okay, we have quite a few. Okay, so do me a favor, take a deep breath before we go to this next slide, okay? Even if you’re not a web designer you know that design matters, you know it makes a difference to your users. And I do not know what happened here, but it’s not a technology issue. This is a design issue. And I see a lot of things that say, “Chatbots are terrible. The technology is not good enough. They’re just going to fail.” And you know what? The technology is not perfect. It has a lot of problems, but design is causing a lot of the failure.
And I get it. It’s hard, because up to now, we’ve just kept designing based on what came before. We’ve carried the concept of menus and commands into our graphical user interfaces. We base our webforms on the paper forms that came before. But it’s time to throw all of that out. It’s time to completely rethink human-computer interaction. Because humans are the original conversational interface. That is what we design on. Not your website.
And when you start thinking from that point, when you start scripting out your conversations, when you test them out with real humans, maybe you’re going to find out that it’s a terrible use case for a chatbot. And that’s a great place to find this out, because then you can go and build something else. But we’re going to assume that your use case is awesome. It’s perfect for a chatbot. And conversational design, it’s a deep topic. There’s a lot in there. We don’t have time to go into the depth of it today. But I’m going to give you a one minute overview based on the things that we learned building our chatbot.
Conversational Design Overview
The first one is be clear. Pick a personality and stick to it. Don’t be really hilarious one minute and then really formal the next. It’s really disconcerting for your users. And be concise. Do not dump your website straight into a chatbot. I guarantee you it will be awkward and it will be verbose. If you’re testing out those scripts with humans you’re going to see that really quickly. And make sure people know what your chatbot can do and what it’s going to do. So one of my colleagues, while playing about with a competitor’s chatbot, decided to go through their cancel credit card flow, and had his credit card cancelled, because it didn’t confirm with him before it did that. So check with the user. Make sure you’re checking with them.
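The confirmation step can be sketched in a few lines. This is an illustration rather than any particular service's confirmation feature; the accepted replies and the prompt wording are assumptions.

```python
# Replies treated as an explicit yes (an assumed, deliberately small list).
AFFIRMATIVE = {"yes", "y", "yeah", "yep", "confirm"}


def confirmation_prompt(action: str) -> str:
    """Say back what is about to happen before doing it."""
    return f"Just to check: you'd like me to {action}. Is that right?"


def should_proceed(user_reply: str) -> bool:
    """Only an explicit yes proceeds; anything else leaves things untouched."""
    return user_reply.strip().lower().rstrip("!.") in AFFIRMATIVE
```

Treating everything except a clear yes as a no is the safer default for something like cancelling a card.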
Be helpful. Okay, build in help. Sometimes people forget to build in help, and they forget to build in strategies to get people back on track if they get confused. And vary up your conversational script. Nobody wants to see the same, “Sorry I didn’t understand,” over and over and over again.
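One way to vary the script and get people back on track is sketched below. The messages, the class name and the two-miss threshold are all hypothetical choices.

```python
import itertools

REPROMPTS = [
    "Sorry, I didn't quite get that. Could you say it another way?",
    "I'm not sure I followed. What would you like to do?",
    "My mistake, I still didn't understand that.",
]
HELP = "Here are some things you can ask me: HR policy questions, or unlocking your work phone."


class Fallback:
    """Rotates reprompts and offers help after repeated misses."""

    def __init__(self, max_misses_before_help=2):
        self._misses = 0
        self._max = max_misses_before_help
        self._prompts = itertools.cycle(REPROMPTS)

    def on_miss(self):
        # Called whenever the bot could not match the utterance to an intent.
        self._misses += 1
        if self._misses > self._max:
            self._misses = 0
            return HELP
        return next(self._prompts)

    def on_match(self):
        # A successful turn resets the miss counter.
        self._misses = 0
```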
And on the topic of error handling, be nice. Websites are mean. They are really mean when you get it wrong. Big red error text. “Uh-uh. Did it wrong.” In a conversation, the user is never wrong. They didn’t say the wrong thing. You just didn’t understand. Be nice. I don’t care how smart you are, I don’t care how great your chatbot is. It is not smarter than a human. And even if your chatbot is the most amazing thing in the world, there will just be people who don’t want to talk to it. Be nice about that too, and build in alternatives for people to use.
The web is all about telling you what to do. “Fill in this field. Push this button.” A conversational UI is about listening to what you want to do. And at every decision point, if you ask yourself, “Am I telling or am I listening?” it will absolutely keep you on the right track.
Train Your Bot to Understand Humans
But a key part of listening is understanding. And so we need our chatbot to understand the human. And as I said, this is quite often where people start. And the reason for that is, if you go to Google and you type in “how to build a chatbot?” you’re probably going to end up somewhere like this, on a nice console where you can build a chatbot. This is the Amazon Lex Console, but there’s a lot of different ones there, and they’re all quite similar. And when you go to this console and you follow that little tutorial, what it will tell you is that getting your bot to understand your human is as simple as this. You define a set of intents. These are the things that the bot will understand. So essentially the functionality it’s going to have. And for each of those things, you simply give it a set of sample utterances, examples of what the user would say. It’s not a regex. These are training data. And your natural language understanding algorithm’s going to take those and it’s going to generalize on those and it’s going to do some really smart stuff. And then when your user says something, it’s going to match the right intent. Or it’s going to say, “I don’t know what any of those are.”
So let’s take a look at what that would look like in the console. So you’ll see at the left-hand side that we have our intents. You’ll see all I do is type in that list of sample utterances. Down below, I have those entities, Lex calls them slots, that we mentioned before. So I’ve associated these ones with a particular intent. I’ve said, “This is the data I want to collect when someone wants to book a meeting.” And the chatbot then knows actually to collect it if the user hasn’t said it.
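Written down as data, an interaction model like that could look something like the sketch below. The field names are simplified for illustration and are not the exact Lex schema; the intent, utterances and prompts are made up.

```python
# A simplified, hypothetical intent definition: sample utterances are training
# examples (not patterns), and slots are the data to collect for this intent.
BOOK_MEETING = {
    "name": "BookMeeting",
    "sample_utterances": [
        "book a meeting",
        "I need a meeting room on {Date}",
        "set up a meeting at {Time}",
    ],
    "slots": [
        {"name": "Date", "prompt": "What day would you like the meeting?"},
        {"name": "Time", "prompt": "What time should it start?"},
    ],
}


def next_prompt(intent, filled):
    """Return the prompt for the first slot the user has not supplied yet."""
    for slot in intent["slots"]:
        if not filled.get(slot["name"]):
            return slot["prompt"]
    return None  # everything collected, ready for fulfillment
```

`next_prompt` mimics what the service does for you: if the user only said the date, it asks for the time.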
Down below we’ve got fulfillment. You’ll notice we can just use it as an API, or there’s a nice little dropdown. I just choose a lambda. We’ve got a little button here that says, “Build.” When you click that, it’s going to take that interaction model, all those intents, those sample utterances, those entities, and it’s going to train your bot. And that blue button, “Publish,” it’s going to put it out live to your users. There’s a built-in test console. There’s a tab that lets you hook it up to popular channels. So fill in all those fields and you are done. Chatbot done.
But is it really? I test out a lot of other people’s chatbots, play about with them and see how our competitors are doing. And quite often, when the bot doesn’t understand, it says something like this. I see this message a lot, “Sorry, I’m just a baby bot. I’m still learning.” And sometimes if I’ve gone back a few times and I’m still seeing the same behavior, I think, “Are you? Are you really still learning?” Because while I said it was as simple as providing a set of sample utterances, it may actually be as complex as providing a set of sample utterances. Because getting the right sample utterances can be really complicated. You can’t just make them up yourself. It shouldn’t just be based on what you would say. It’s got to be based on what your users would say. So for some of our telephony bots, we’re able to take the call center logs. We’re able to get real data about what people say. But for an employee digital assistant, we didn’t have any examples of what people were going to say. We had to learn as we went.
Now, not all services provide conversational transcripts, but if they do, they are absolutely invaluable. Because this is how we learned. So we went through and we looked at those transcripts and we looked at the vocabulary people were using. We are a global company. So while our U.S. colleagues say “vacation,” back home we say “holiday.” While HR may have given us that requirement for an answer for the 82973 form, it’s likely employees have a slightly more informal, and according to HR, incorrect name for that form.
Look for those utterance structures. Are people just phrasing things differently than you thought? People will say unexpected things, but they’ll also say expected things in unexpected ways. We need to take that data, we need to put it back into our interaction model, and we need to keep retraining that bot. We need to keep getting smarter. And look at your interaction patterns. Remember we said we were going to build in help? We were going to test out those scripts. This is going to tell you if your scripts are working. Have you got the right help there? Are your conversations good enough?
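A first pass over transcripts can be as simple as counting what went unmatched. The sketch below assumes a hypothetical transcript format: a list of turns, each with an `utterance` and an `intent` that is `None` when nothing matched.

```python
from collections import Counter


def top_missed_utterances(transcript, limit=10):
    """Count unmatched utterances, ignoring case and surrounding whitespace."""
    missed = Counter(
        turn["utterance"].strip().lower()
        for turn in transcript
        if turn.get("intent") is None
    )
    return missed.most_common(limit)
```

The most frequent entries are candidates for new sample utterances, or for new intents altogether; a human still has to decide which.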
And some of you are thinking, “Well hold on a minute, I was definitely promised that chatbots learn by themselves. They get smarter when people talk to them.” I wish that was true. While the algorithms out there are getting smarter, the service that you’re using probably is improving, if you want your chatbot to get smarter quicker, and if you want to service the exact needs of your users, you’re going to have to put in some manual work. You’re going to have to go in and check your conversational transcripts. You’re going to have to check what it’s recognizing, what it’s not recognizing. And it’s also good to look at what is working. Because that can be a really positive thing, can really show you where you’ve done a good job.
Write Code to Make it Do Something!
But there’s no point in us having a design the users understand, having a chatbot that understands the users, if it doesn’t actually do anything. But that’s not a problem, because we just looked at that console a minute ago right? It’s a little drop down and we choose a lambda, and we’re done. Except maybe it’s a little bit more complicated, maybe we need to call an API, maybe you want to store something. And then all of a sudden, your chatbot architecture looks a little bit more like this. So how does that happen?
Well, this is our chatbot architecture. Now it is part of a larger platform, but this is the chatbot piece. So let me walk you through the pieces here and talk about how we ended up with this. So the first thing you’ll see is that we don’t go directly to Lex. We actually come in through API gateway. And that’s because we want to handle all our own authentication, our user identity, and we want to inject our own session and context before we go to Lex. Now that’s sessions, that’s short-term memory. So just what’s going on right now in that conversation? We keep it with a time to live, we don’t keep it for very long. But that context is long-term memory, and it’s the key to the chatbot really being able to take really personalized contextualized fulfillment for you.
So it’s an employee bot. We pull in information about the office that you’re in. Who is your manager? What’s your title? What previous conversations have you had? And that means that when we go to Lex, we know everything about you. And so when Lex maps through our fulfillments, we can take that personalized action for you. Now, we have a whole pile of different types of fulfillments. We have ones that go to external services. We have ones that just do simple lookups. We have ones that hook into our internal systems, which is why you’ll see the VPC there.
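The split between short-term session and long-term context described above can be sketched like this. It is an in-memory illustration only: the class, the attribute names and the request shape are assumptions, not the actual implementation.

```python
import time


class SessionStore:
    """Short-term memory: entries expire after a time to live."""

    def __init__(self, ttl_seconds, clock=time.time):
        self._ttl = ttl_seconds
        self._clock = clock
        self._items = {}

    def put(self, user_id, session):
        self._items[user_id] = (self._clock() + self._ttl, session)

    def get(self, user_id):
        entry = self._items.get(user_id)
        if entry is None:
            return {}
        expires_at, session = entry
        if self._clock() >= expires_at:
            del self._items[user_id]  # expired: forget the conversation state
            return {}
        return session


def build_request(user_id, utterance, sessions, profiles):
    """Merge long-term context (profile) with the live session before calling the bot."""
    return {
        "userId": user_id,
        "inputText": utterance,
        "sessionAttributes": {**profiles.get(user_id, {}), **sessions.get(user_id)},
    }
```

The fulfillment code then reads things like the user's office from the attributes rather than asking for them.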
And then, of course, one of the most important pieces is our monitoring, analytics, and reporting. I mentioned those conversational transcripts; we save all of those off in S3 and we do that so we can run reports on it. So we can go back and look at how well we’re performing. How much are we matching? Are people saying things we didn’t expect? But we also stream it to Elasticsearch. And the reason we do that is so we can do live analytics. So we can see at any time how many people are talking to our chatbot. What are they asking? How many conversations are going on?
So here are some of the things that we report on. We need to know how many users we have. How many are coming back? How long is the conversation? What functionality is being used, and what functionality is never being used? And we review those conversations. So where are we getting it wrong? Not only what did we not match, but what did we mismatch, which is your worst-case scenario. Users are very unhappy if you give them the wrong answer. And where are we getting it right? What are people really enjoying? What functionality is really positive? And what do people want us to add? What are they asking for?
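Several of those numbers fall straight out of the saved transcripts. The sketch below assumes a hypothetical record per turn with `user_id`, `conversation_id` and `intent` (`None` when unmatched).

```python
from collections import Counter, defaultdict


def usage_report(turns, known_intents):
    """Summarize users, returning users, match rate and never-used intents."""
    conversations_by_user = defaultdict(set)
    intent_counts = Counter()
    unmatched = 0
    for turn in turns:
        conversations_by_user[turn["user_id"]].add(turn["conversation_id"])
        if turn["intent"] is None:
            unmatched += 1
        else:
            intent_counts[turn["intent"]] += 1
    return {
        "users": len(conversations_by_user),
        "returning_users": sum(1 for c in conversations_by_user.values() if len(c) > 1),
        "match_rate": (len(turns) - unmatched) / len(turns) if turns else 0.0,
        "unused_intents": sorted(set(known_intents) - set(intent_counts)),
    }
```

Mismatches do not show up in a report like this, since the bot believes it matched; finding them still means reading conversations.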
And we also do monitoring. Because you’ve maybe noticed that this is a distributed system. It is serverless, so that’s what you’re going to end up with. And so it can be very hard to keep track of what’s going on if you don’t have monitoring, reporting, tracing in place. So you’ll see X-Ray up there. That’s how we do our tracing. You’ll see we use CloudWatch, and we use CloudWatch alarms and dashboards. But we also stream all our logs to Elasticsearch as well, so we have centralized logging. So we’ve got one place to go and look if something goes wrong.
If you were in Zack’s [Butcher] talk a moment ago, you know how important observability is. If you have a serverless architecture, it’s distributed. It’s lots of little micro pieces. Keeping track of your architecture is very complicated, but it’s the only way that you can know if what you’ve built is working. Have you built a good architecture? Have you built a bad architecture? Is it up? Is it down? It’s a really, really important piece of it.
So now you might be wondering, “Well, hold on, why did you use serverless? Did you just complicate things? You created this whole distributed architecture because you just wanted to use really cool buzzword technology?” Well, here are the two main reasons. The first one is that lower cost lets us learn fast. Being able to do something really quickly, being able to experiment, and not charging very much for it, because it’s pay per use, is really valuable. Chatbots are still an experiment for a lot of companies. They don’t know if they’re going to work. If you can go and say, “I’m going to do a PoC, and you know what? If it doesn’t work right, if nobody uses it, it won’t cost us anything.” That’s really great. And offloading that maintenance lets us change fast. So if we can just focus on our functionality, just on the actual application code of what we actually want to build, and we don’t have to think about all this other stuff, it means we can keep changing really quickly as we learn.
So if I wanted to buy a wardrobe, I could just go out, buy a pre-made one from a shop, and I hope that it would fit my needs. Or I could go and build a custom wardrobe. Or maybe I couldn’t, but maybe you could. And you could pick the perfect type of wood and you could get specifications exactly right and it would perfectly meet your needs. But it would also be a lot of work. So instead, I could go to IKEA, and I could choose one of their modular systems, and I could put together a lot of different pieces to get a custom wardrobe. And it wouldn’t be exactly as perfect as if I had hand-crafted it, but it would be pretty close. And if I change my mind, I can just move those pieces about. I can take pieces out. I could put new pieces in.
And serverless is like that. You assemble a set of services that someone else owns that are in the cloud, and critically, someone else manages, and create the architecture that you want, create the architecture that’s going to fit you. And if it doesn’t work, you can change it easily. Now, some of you are going, “Well, hold on, Gillian, you don’t just pay when you use your IKEA wardrobe”. Of course not. That’s storage. And we know that we pay for storage.
Serverless is all about the services. It’s not just about lambdas. It’s not just about having no servers. It’s all about being able to put together those services so you can focus on your functionality before you need to worry about other things. But having said that, lambdas are a really powerful part of it, and something that we have used really extensively. And they’re really great for conversational UIs, because, of course, the conversation is your user interface. When we make a change, we’re not changing our front end; we’re changing the code in the back. So for every intent we have, we also have a lambda with the fulfillment for that. And that means that we isolate each individual conversation. It’s really quick to develop. You can try things out. If nobody ever talks to it and never has that conversation, it doesn’t cost you anything else.
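A per-intent fulfillment lambda can be very small. The sketch below assumes the event and response shapes of the original Lex (V1) Lambda integration; the intent, the `office` session attribute and `lookup_menu` are hypothetical.

```python
def lookup_menu(office):
    # Hypothetical lookup; a real one might call an internal service.
    return "soup and sandwiches"


def lambda_handler(event, context):
    """Fulfillment for a single intent, e.g. a 'what's for lunch' question."""
    session = event.get("sessionAttributes") or {}
    office = session.get("office", "your office")
    menu = lookup_menu(office)
    return {
        "sessionAttributes": session,
        "dialogAction": {
            "type": "Close",  # the conversation for this intent is finished
            "fulfillmentState": "Fulfilled",
            "message": {
                "contentType": "PlainText",
                "content": f"Today in {office}: {menu}.",
            },
        },
    }
```

Because the handler only knows about one conversation, changing or removing it does not touch any other intent.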
There are things that are asked every single day. “What’s for lunch?” There are things that are asked maybe just once a year. “What’s my bonus?” But when they’re asked, everybody is asking them. So the fact that they can scale independently is really powerful.
But don’t forget that things like performance tests are still needed. Don’t assume that because you’re in the cloud and because you’re serverless, everything is just going to scale magically. We know that when we first did our performance test, we immediately caught the fact we hadn’t configured our DynamoDB quite correctly. There’s an auto scale on it, but it gradually ramps up. It doesn’t do spiky load. So you need to make sure your reads and writes are configured correctly. And those external APIs you’re calling: do they have limits? How fast are they? You need to know that everything’s going to scale. So performance testing, game days, they’re really still very important.
And being able to try things out, being able to experiment and not paying until loads of people use it is great. And whenever it’s really, really popular, whenever you’ve got it right, it’s going to scale. But don’t forget that low usage, low cost means high usage, high cost. So you may not be able to predict your cost profile in the same way that you could before. Now, hopefully, it’s still going to be cheaper. But you need to also consider that that cost may not be linear. Different parts of your infrastructure may have different cost profiles. And some of them have a free tier, so that may go one day from being really, really cheap to suddenly jumping up. Don’t get caught out.
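The free-tier jump is easy to see with some arithmetic. Every number below is made up for illustration; real prices and free allowances differ per service and change over time.

```python
def billable_cost(units, free_units, price_per_unit):
    """Only usage above the free allowance is charged."""
    return max(0, units - free_units) * price_per_unit


def monthly_bill(usage, pricing):
    """Sum the cost of each component, each with its own free tier and price."""
    return sum(
        billable_cost(usage[name], free_units, price)
        for name, (free_units, price) in pricing.items()
    )


# Hypothetical pricing: (free units per month, price per unit above that).
PRICING = {
    "requests": (1_000_000, 0.0000002),
    "storage_gb": (5, 0.10),
}
```

With these invented numbers, 500,000 requests and 2 GB cost nothing, while 6,000,000 requests and 25 GB cost about 3.00: the bill is zero for a while and then starts climbing, and each component climbs at its own rate.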
Some Notes for Those Existing Systems
And we were really privileged because we were building a brand new system. But there are lots of other teams in my company building chatbots that don’t have a brand new system. They have a big legacy system. And we all work together and we do a lot of sharing of learnings. So here’s some of the things that I’ve learned about building a conversational interface on top of a legacy system.
So the likelihood is that if you have an existing system that either you are using microservices or you’re thinking about moving to microservices, so you’re kind of getting there. And you’re moving into this asynchronous event-driven world and that’s normally where people are at when they’re going, “I’m going to go to serverless.” Because that’s perfect for this asynchronous event-driven world. It’s wonderful. But I want you to remember that REWRITE is not a four-letter word. That’s not just because it’s got seven letters. Sometimes when we refactor, we bring old thinking into new technologies. Sometimes that’s not the best way. Sometimes you need to take a step back and you need to rethink how things should work. Because let’s not forget that we’re not building a website. We’re modeling this on a human.
And a good conversation is synchronous. You wait for a response. Note that I said “good conversation”. We’re all working in IT, we all sit in hours and hours of meetings every week that do not model good conversations. Okay, a good conversation, like the ones you’re having over coffee this week; not your architecture meeting. And not only do we wait for a response, we expect a response. If you say hello to someone and they don’t respond, you may never speak to them again. You’ll be quite offended. And you expect a response immediately, not 10 minutes later. Even 10 seconds is excruciating in a conversation.
So when we think about a conversational architecture, we need it to be fast. Not stock market fast. But it can’t be inconsistent. You need to be able to guarantee that it’s going to come back with a response in a short period of time. And it needs to appear synchronous. Now, under the covers, it can be whatever it wants, but to the user, especially in a voice system, it must appear synchronous.
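One way to make an asynchronous backend appear synchronous is to put a deadline on every turn and answer with something if the work is not finished. This is a sketch of that idea, with a hypothetical function name and a deadline you would pick yourself.

```python
import asyncio


async def respond_within(fulfil, deadline_seconds, holding_message):
    """Return the fulfillment result, or a holding reply if it takes too long.

    `fulfil` is a coroutine function that does the real work.
    """
    try:
        return await asyncio.wait_for(fulfil(), timeout=deadline_seconds)
    except asyncio.TimeoutError:
        # The slow work is cancelled here; a real system might instead let it
        # finish in the background and follow up with the user afterwards.
        return holding_message
```

The user always hears back within the deadline, which matters most in a voice system where silence reads as failure.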
So let’s go back to that journey. So the first thing is, of course, do not just take your microservice and lift and shift it into a lambda. So just go “woop.” I can’t imagine anything worse than a Spring Boot Service in a lambda. Never do that, okay. Please don’t do that. But equally don’t go, “Oh, this is function as a service, and let’s see my microservice, I’ve got a whole pile of functions in there so I’m going to lift out each function and I’m going to put it in a lambda.” Because then all of a sudden this happens. If you’ve got a good microservice they probably were together for a reason. And if you’ve already decomposed a monolith, you’ve maybe run into the same problem where you’ve ended up coupling microservices, where when you call one, it always calls another one. But those microservices were always on. These lambdas are not. So now we’ve got this problem where we’ve got latency and we’ve got spin up time. And don’t think that you can just pop a queue in between each of these and I’m not going to notice that they’re still called one after another every time, okay?
So what do we do? Well, some of you are thinking, “Well, I’m just going to keep it warm. I’ve heard that’s a good thing to do”. And there are good reasons for keeping things warm. We keep our VPC lambdas warm because they have to bring up their network interfaces, and it’s really slow. But for most other cases, keeping it warm isn’t necessarily the right option. So the first thing you need to check is, is it genuinely the spin up time that’s slow? Or is it your application code? I’ve definitely seen examples of that. And you can increase the memory and make it quicker. And you could change the language you’re using. Something like Node or Python is going to be better than something like Java.
But of course, they were probably together in that microservice for a reason. So you might just want to put them back together. And if some of that’s genuinely reusable, you may just want to put it in a library instead of having it in a separate lambda that has to be called. I’m going to let you in on a secret: it’s okay to have more than one function in your function as a service, but do not just lift and shift your microservices.
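As an illustration of keeping things together, the sketch below runs two steps that always happen one after the other inside a single handler, as plain function calls rather than chained lambdas. The device data, function names and event shape are hypothetical.

```python
# Hypothetical data; in practice this would come from an internal system.
DEVICES = {"e123": "phone-42"}


def find_device(employee_id):
    """Reusable step: could live in a shared library instead of its own lambda."""
    return DEVICES.get(employee_id)


def request_unlock(device_id):
    return f"Unlock requested for {device_id}."


def lambda_handler(event, context):
    # Both steps run in-process: one cold start, no extra network hops.
    device_id = find_device(event["employeeId"])
    if device_id is None:
        return {"ok": False, "message": "I couldn't find a work phone registered to you."}
    return {"ok": True, "message": request_unlock(device_id)}
```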
And here’s another problem I’m running into in our company, something that we’re actively working together to try and think through and find new patterns for. If you have an existing system, existing APIs tend to be set up to collect sets of data in webforms, right? What happens if you do not send the exact set of data that the API wants? It’s going to reject it. But a conversation doesn’t work like that. We have all these little micro interactions instead. You can get partial data, you can get data that you’re going to expect later on; it’s completely non-linear. Traditional microservices are generally not set up for cognitive flows. We need to rethink how they work.
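One pattern for bridging the two worlds is to collect partial data across turns and only call the form-shaped API once everything it needs is present. This is a sketch of that idea with hypothetical field names, not a pattern the source prescribes.

```python
REQUIRED = ("date", "time", "room")  # what the existing API insists on


class BookingDraft:
    """Accumulates whatever the user mentions, in whatever order."""

    def __init__(self):
        self.fields = {}

    def update(self, **partial):
        self.fields.update({k: v for k, v in partial.items() if v is not None})

    def missing(self):
        return [name for name in REQUIRED if name not in self.fields]


def submit_when_ready(draft, api_call):
    """Call the API only with a complete payload; otherwise keep talking."""
    if draft.missing():
        return None
    return api_call(dict(draft.fields))
```

The conversation stays non-linear while the API still receives the exact set of data it expects.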
REWRITE is not a four-letter word. Both serverless and chatbots require new ways of thinking about your architecture. And we get there iteratively by learning fast and changing fast. We certainly didn’t get our architecture right the first time. We certainly still don’t have it right. We change all the time. But we need something else. Something you already know. We need good supporting engineering practices. Just because this is new technology doesn’t let you off the hook.
Anyone here work for a large enterprise? I do. Okay, a few of you, right? Are you going to be okay if someone arrives in and says, “Okay, I’ve got this enterprise application and we’re going to roll it out to 200,000 users. And I have just created it in the console, but don’t worry. If we need to change it, we’ll just go in and make some changes and click the publish button and we’ll be great.”? Of course not. You’re not going to be fine with that. Because while it’s obvious that your application code is in code, what may not be immediately obvious is that your entire chatbot should be in code. Your conversation should be in code. And this is the cloud. Your infrastructure should be in code. Everything should be in code. And that’s going to slow you down a little bit. But it’s going to make everything testable, traceable, repeatable and observable, and it’s going to make you go faster afterwards. So no console.
Conversation as Code – Amazon Lex
This is our architecture. What do you think here is created through code? Everything, yes. So I’m going to walk you through our pipeline. So we use the same pipeline that the rest of the people at our company do. But we bring this new technology into it. So we create everything through code. And I said that it’s going to slow you down a little bit at the start. It took us two weeks to get a PoC out through the console. And then it took us a further two months to get all of this set up. But we create everything through CloudFormation and we build the bot through the SDKs and APIs available. Regardless of the bot service you use, make sure it’s got SDKs and APIs available.
Because it’s all in code, we can do all those things you already know. We can unit test all that code. We can do static analysis. And we do static analysis on the JSON for our API calls. Not to check it’s valid JSON, but to check it’s meeting the contracts for those calls later on. So developers were making simple mistakes, like using disallowed characters or putting in a duplicate utterance. We can catch that as they type it.
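A check like that can be sketched as a small lint over the intent definitions. The JSON shape and the allowed-character rule below are assumptions for illustration; the real rules come from the bot service's own contract.

```python
import re

# Assumed rule: letters, digits, spaces, a little punctuation and {Slot} markers.
ALLOWED = re.compile(r"^[A-Za-z0-9 .'\-{}_]+$")


def lint_intents(intents):
    """Return a list of problems: disallowed characters and duplicate utterances."""
    problems = []
    seen = {}  # normalized utterance -> intent that first used it
    for intent in intents:
        for utterance in intent["sampleUtterances"]:
            if not ALLOWED.match(utterance):
                problems.append(f"{intent['name']}: disallowed characters in {utterance!r}")
            key = utterance.strip().lower()
            if key in seen:
                problems.append(
                    f"{intent['name']}: {utterance!r} duplicates an utterance in {seen[key]}"
                )
            else:
                seen[key] = intent["name"]
    return problems
```

Run in an editor hook or as a unit test, this catches the mistake before the bot build ever starts.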
And when you push it, we have a full CI/CD Pipeline. We have it in Bamboo. So those tests are run again. Automatic builds and deploys. And everything is sent to Slack, so we know exactly what’s going on. And obviously those builds and deploys don’t happen if the tests don’t pass, but everyone on my team is amazing, so obviously, they always do. And everything gets deployed out to the cloud. And I said that we built that bot using APIs. So as part of our CloudFormation, we use Lambda-backed custom resources. So that lets us as part of the deploy, pause, call out to a lambda, do some work, and then return. And one of those lambdas is our bot builder lambda. And what it does is it takes that JSON that we’ve written, that we’ve now uploaded to S3, it pulls it down and it uses it to call the Lex model building service. If anything goes wrong, if there’s some error, if the bot doesn’t build, we roll the whole thing back.
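The custom resource part can be sketched as below. The request and response fields follow the CloudFormation custom resource protocol; `build_bot` is a hypothetical placeholder for the step that reads the JSON and calls the bot service's model-building API, which is not shown here.

```python
import json
import urllib.request


def build_response(event, status, reason=""):
    """Body CloudFormation expects back from a custom resource."""
    return {
        "Status": status,  # "SUCCESS" or "FAILED"
        "Reason": reason,
        "PhysicalResourceId": event.get("PhysicalResourceId", "bot-builder"),
        "StackId": event["StackId"],
        "RequestId": event["RequestId"],
        "LogicalResourceId": event["LogicalResourceId"],
    }


def send_response(event, body):
    # CloudFormation waits for an HTTP PUT to the pre-signed ResponseURL.
    request = urllib.request.Request(
        event["ResponseURL"],
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": ""},
        method="PUT",
    )
    urllib.request.urlopen(request)


def build_bot(properties):
    # Hypothetical placeholder: fetch the definition and call the model-building API.
    raise NotImplementedError("wire up the bot service's model-building API here")


def handler(event, context, build_bot=build_bot, send=send_response):
    try:
        if event["RequestType"] in ("Create", "Update"):
            build_bot(event["ResourceProperties"])
        body = build_response(event, "SUCCESS")
    except Exception as error:
        # Reporting FAILED is what makes CloudFormation roll the deploy back.
        body = build_response(event, "FAILED", reason=str(error))
    send(event, body)
    return body
```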
But if it does, we do a simple smoke test. Just make sure everything’s wired up. And then we do a component test, now only in non-prod. This is a really critical part of our system. So remember we said it’s really important to know how well your bot is recognizing what the users say. So one of the things we found early on, was that when developers were adding the intents, when they were making changes, it was very easy to change the recognition of other intents by accident. And we have dozens and dozens of intents. We don’t want to have to regression test manually every one. But we want to know really quickly that our intent recognition is still working really well. So we run these tests directly against the Lex runtime APIs.
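An intent-recognition regression test of this kind can be sketched in a few lines of Python. This is an illustration under assumptions, not the team's test suite: `run_intent_regression` and the intent names used with it are hypothetical, and `recognize` stands in for a thin wrapper around the bot's runtime API (for Lex, a call that sends text and returns the matched intent name).

```python
def run_intent_regression(cases, recognize):
    """Check each utterance against the intent it is expected to match.

    cases:     list of (utterance, expected_intent) pairs; expected_intent is
               None for utterances that should not match any intent.
    recognize: callable taking an utterance and returning the recognised
               intent name, or None when nothing matched.
    Returns a list of (utterance, expected, actual) tuples for every mismatch.
    """
    failures = []
    for utterance, expected in cases:
        actual = recognize(utterance)
        if actual != expected:
            failures.append((utterance, expected, actual))
    return failures
```

In a pipeline, a non-empty result would fail the deploy. Keeping the cases as plain data makes it easy to add a new pair each time a real user utterance turns out to match the wrong intent.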
And we constantly update these tests, not just when we add new functionality, but based on what we’re learning about how users are really interacting with our bot. And that means that we can have brand new functionality out to our users in minutes. And it’s completely repeatable. We can do the same thing to any AWS account. And in fact, we have different accounts for non-prod and prod, and we can be certain that they are exactly the same.
But once it’s out there, we don’t just hope for the best. We already talked about it. We have those analytics, we have those alarms, we have that centralized logging. If there’s a problem, we want to know fast so we can fix fast. Monitoring is so important. We’ve already said this. If your chatbot’s down, we don’t want our users to tell us it’s down, we want to know it’s down. We want to know that our system is working, and if it’s not working we want to be able to fix it and we want to keep adding more of what is working in our system.
And those analytics are vital. If our bot isn’t recognizing what our user says we want to know about that as well. We want to know it isn’t working and we want to fix it, and we want to keep adding more of what is. And getting feedback from your system isn’t enough. You need to get feedback from your humans as well. So our bot lets people directly add in feedback at any time. But we also consider anything that we didn’t recognize to be a form of feedback. It lets us know what the user wanted to do and what they expected it to do. We need to know what isn’t working and fix it. We need to keep adding more of what is.
You see, being able to change fast lets us learn fast. And the secret to learning, the secret to any great software is listening. And the joy of conversational UIs is that they are all about listening. So we need to learn to listen, and we need to listen to learn. And if you can do that, I guarantee you, you will have a great chatbot. And in the spirit of listening, I think we have a few minutes, and I’m going to listen to you and take some questions if there are any.
Questions and Answers
Man 1: I don’t know a lot about this ecosystem, so this may be naive, but you mentioned earlier when load increases cost increases; so have you guys come up with a strategy for basically managing the cost as these get really heavy utilization or become very resource expensive?
Armstrong: Yes, so you can limit how far it will scale. We haven’t because we’re happy. So the internal bot, we’ve got 50,000 users. We’re not getting more usage than we’re happy to pay for. And in our telephony bots, obviously every call that is offloaded from an agent to a bot saves a lot of money. So we’re happy to keep scaling that as much as possible, so at the moment we haven’t limited anything.
Man 2: One question here. I work in the financial industry, and there is a cyber security concern in that you’ll be shipping PII data to Lex and Polly. How did you solve that problem, if you did?
Armstrong: Sure, I work for an insurance company and we’ve a really big legal team. And obviously, we have contracts in place with all of our vendors and we make sure that we are happy with how they’re handling our data. One of the things that Lex has actually put in place, I think because of a lot of feedback from financial companies, is that you can check a box and they will not save any of your data. So nothing will be saved. So you can know they’re not holding onto it. Now in terms of passing it through, that’s a matter of working with your privacy and security teams and making sure that the contracts that you have in place with Amazon, you’re happy with.
Man 3: This is really exciting for me. This is definitely something I’m interested in pursuing, and we’re in the financial business too. It’s going to be interesting how that works out, but I have a very specific question for you. When things go wrong, like your fulfillment side can’t reach a service, for example, how do you explain that to your human?
Armstrong: Sure. I blog a lot about design because, as I said, it’s the bit that gets forgotten. And so it took me a while to learn it. So I decided I would go out to Medium and I would blog a lot about it. I’m @virtualgill on Medium as well.
What you need to do is you need your error handling strategies to give different error messages back to the user depending on what’s gone wrong. If you can’t get to a service, there’s no point saying to the user, “Please, repeat what you said,” because it’s not going to be able to get the answer the second time. What you’re wanting to say is, “I’m sorry this functionality isn’t available right now. I can’t complete your request. Here’s an email address you can email instead or a number that you can call.”
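The idea of choosing the message by failure type can be sketched like this in Python. It is only an illustration: the `Failure` categories, the `error_message` helper, and the contact details are hypothetical, and a real bot would likely distinguish more cases.

```python
from enum import Enum


class Failure(Enum):
    NOT_UNDERSTOOD = "not_understood"            # the bot could not match the input
    SERVICE_UNAVAILABLE = "service_unavailable"  # a fulfillment dependency is down


def error_message(failure, email="help@example.com", phone="555-0100"):
    """Pick a reply that matches what actually went wrong."""
    if failure is Failure.NOT_UNDERSTOOD:
        # Asking again can help here: different wording may be recognised.
        return "Sorry, I didn't catch that. Please repeat what you said."
    if failure is Failure.SERVICE_UNAVAILABLE:
        # Asking again cannot help here, so offer another channel instead.
        return (
            "I'm sorry, this functionality isn't available right now. "
            "I can't complete your request. "
            f"You can email {email} or call {phone} instead."
        )
    raise ValueError(f"unknown failure type: {failure!r}")
```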
I do have a set of three articles, if you want to come and talk to me afterwards, that go through different strategies for handling different types of errors in your system, and how you inform the user and handle that situation.
Man 4: I was wondering if you could talk a little more about the integration tests that you do towards the end of the deploy process. Are those sample phrases that are not included in the initial learning model, but they’re used more like a test model or something like that?
Armstrong: Yes, so we do pure intent recognition. I know sometimes people do the actual responses. So we do a selection of things. So we take some of the sample utterances that we use, so some of them directly. We do ones that are similar, and then we do ones that shouldn’t match anything. So for each of those we have three things, and sometimes we choose things that we might think might be confusing for the bot. And so we find that that’s pretty useful, and we do update them as we find a user saying something that has mismatched with an intent that we didn’t expect. So some of the same utterances, a sample of ones that are not in our training data, and then some utterances we don’t want to match anything.
Man 5: I really like how you balance both, that there are some things that are complex and some things that seem simple but they are also complex at the same time. Which is why I use actually a chatbot system as one of my interview questions for software design and systems design. But one of the things that I don’t think you spoke to, do you guys use more synchronous and RPC-based interactions between your different layers? Or do you use asynchronous, or is it like a hybrid and where does that play out and what are the tradeoffs?
Armstrong: Yes. It’s a hybrid. So Lex itself is synchronous, so it’s an HTTP call. Obviously, for that part of it, the Lambda that calls Lex holds and waits for us to go to fulfillment. But we have some parts of our system that go onto queues and things like that. So it is a hybrid and it’s based on the different fulfillments and how fast some of them are. Some of them, we might say, “Okay, we’ve kicked that off for you and we’ll get back to you when it’s done.”
So it’s a combination. And we’ve experimented with putting in synchronous, and then trying the same things in asynchronous. And we find that there’s not a lot of difference between them. It’s maybe just that, if it’s all asynchronous, traceability, and being sure how quickly something is going to get picked off the queue, gets a bit concerning at scale.
Man 6: Thanks so much, it’s a really good presentation. Some of these conversations can get into a loop, the intents, you know. I’ve also seen some tools where you can define the flows. So have you used any kind of flows or how do we design our flows or do we design them as we implement these?
Armstrong: Yes. Designing conversation flows is pretty complex and there’s a lot of different tools out there. And I don’t think anyone’s quite got it right yet. So we would design our individual pieces of our conversation and then overall flow. So yes, you don’t want to ever have anyone in a loop. And you always want to let people escape from that loop. So we would be very careful to put in the error handling. So if we’re seeing that you’ve hit “I don’t understand,” three times is the maximum we ever let you hit that before we just bail you out. But even three times is a lot for people.
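A cap on consecutive "I don't understand" replies can be sketched as a small piece of per-conversation state. This Python example is an illustration only: the `handle_turn` helper and the reply texts are hypothetical, and it assumes the bail-out happens on the third consecutive miss, which is one reading of the limit described here.

```python
MAX_MISSES = 3  # consecutive misses allowed before handing the user off

REPROMPT = "Sorry, I don't understand. Could you put that another way?"
BAIL_OUT = "I'm having trouble helping with this. Let me point you to a person."


def handle_turn(misses, understood):
    """Return (reply, updated_miss_count) for one conversational turn.

    misses:     consecutive misses so far in this conversation
    understood: whether the bot recognised the user's latest input
    """
    if understood:
        return "OK", 0  # any success resets the counter
    misses += 1
    if misses >= MAX_MISSES:
        return BAIL_OUT, 0  # escape the loop instead of reprompting again
    return REPROMPT, misses
```

Resetting the counter on success means the cap only applies to a run of failures, so a user who recovers after one miss is not penalised later in the conversation.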
At the start we were just using things like Lucidchart, because that’s what we had in-house. There are tools out there like Botsociety and different things like that that let you map out your conversation flows in them. But conversation is non-linear. It’s not an IVR system. And so trying to put it together and showing all the different ways that it can go and different directions is very complicated. And I haven’t seen any really great tools that really capture that.
Man 7: Do you use any version control system for the training data? Do you version your training data, and if you go to staging or something like that, how do you manage the delta and how do you push it to production?
Armstrong: Yes. Everything is in code. So we check it all into our source control. And then when we build the bot, it’s Lex, so we send that data to Lex and it builds it. So we take all that data we’ve got in source control and we send it to the bot. Now things like Lex, they do version for you, but we don’t want to rely on those versions. We want to just recreate the bot from scratch if we need to, so we’ll just go back to an older version and just recreate the whole bot again.