In Proceedings of the 3rd Conference on Conversational User Interfaces (CUI '21)
Abstract. Recent research suggests that deliberately manipulating a chatbot’s personality and matching it to the user’s personality can positively impact the user experience. Yet, little is known about whether this similarity attraction effect also applies to the personality dimension agreeableness. In a lab experiment, 30 participants interacted with three versions of a chatbot (agreeable, neutral, and disagreeable). Whilst our results corroborate a similarity attraction effect between user agreeableness and preference for the agreeable chatbot, we did not find the reversed relationship for the disagreeable chatbot. Our findings point to a need for moderate instead of extreme chatbot personalities.
Chatbots are considered social actors, with users unconsciously assigning them personalities [Nass et al. 1994]. Similar to human-human interaction, users prefer chatbots with personalities similar to their own, coined the similarity attraction effect [Nass and Lee 2001]. For example, in prior work, matching user and chatbot personality had a positive impact on user engagement, users’ self-disclosure, and their willingness to accept the chatbot’s advice [Shumanov et al. 2021, Gnewuch et al. 2020]. In this work, we examine the similarity attraction effect for the personality trait agreeableness, which seems particularly interesting for chatbot assistants. Agreeableness is a Big Five personality dimension and describes a tendency to be trustful, genuine, modest, obliging, helpful, and cooperative [McCrae and Costa 2008]. However, it is questionable whether the preference for agreeable chatbots also follows a similarity attraction effect: Whilst agreeable users are likely to favour an agreeable chatbot, disagreeable users might not appreciate an uncooperative, unhelpful chatbot, given that these characteristics are usually not associated with assistants.
RQ1: Can we synthesise different levels of agreeableness in a chatbot by systematically varying its language style?
RQ2: Is there a relationship between user agreeableness and their preference for agreeableness in a chatbot?
To investigate our research questions, we conducted a within-subjects lab experiment with 30 participants. Participants interacted with three different versions of a chatbot situated in a film recommender application: an agreeable chatbot, a neutral chatbot, and a disagreeable chatbot. After each interaction, we first asked participants to rate their perception of the chatbot’s agreeableness by filling out a standard personality questionnaire [Danner et al. 2016]. Second, participants indicated how much they would like to interact with this chatbot again. At the end of the study, we collected participants’ self-reported level of agreeableness via the same personality questionnaire [Danner et al. 2016]. The survey, in its original German, may be found in this PDF. If you require a translation, please contact the first author.
To imbue the chatbot with personality, we drew upon the extensive body of work in psychology and linguistics that has examined how personality manifests in human language. That is, we leveraged verbal cues associated with human agreeableness to manipulate the chatbots’ language styles.
The conversation between the user and each of the three chatbots comprises four main parts, as illustrated in the Figure above. First, the chatbot welcomes the user and asks for their name. After the introduction, the chatbot prompts the user with a number of questions to find out more about their preferences. These questions were informed by an informal pilot study in which we asked five streaming service users what questions they would expect from a film recommender chatbot. Four aspects emerged from the interviews: (1) the user's preferred genre, (2) available time, (3) mood, and (4) company. After the chatbot has collected this information, it gives a film recommendation based on the user's preferences. The user may either accept the recommendation or ask for another one. The conversation concludes when either the user accepts the chatbot's film recommendation or the chatbot has no more recommendations matching the user's preferences. Finally, the chatbot says goodbye.
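For illustration, this four-part structure can be summarised as an ordered list of flow stages. The sketch below is purely illustrative; the node names are hypothetical stand-ins, as the actual flow was built graphically in the Botpress Visual Flow Editor (described below).

```javascript
// Illustrative outline of the four conversation parts.
// Node names are hypothetical; the actual flow was built in the
// Botpress Visual Flow Editor.
const conversationParts = [
  { node: "welcome",        purpose: "greet the user and ask for their name" },
  { node: "elicitation",    purpose: "ask about genre, available time, mood, and company" },
  { node: "recommendation", purpose: "suggest films until one is accepted or none remain" },
  { node: "goodbye",        purpose: "close the conversation" },
];
```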
The agreeable chatbot agrees with the user's opinions and expresses interpersonal concern. Furthermore, it employs positive emotion words such as “nice” or “like”, family-related words such as “together” or “family”, words indicating certainty such as “I’m sure”, as well as blushing and kiss emojis, as informed by previous research.
The neutral chatbot uses neutral, polite language, expressing neither positive nor negative emotions, in contrast to the other two versions. Moreover, it does not react to the user’s choices, yet communicates in a respectful and professional way.
The disagreeable chatbot is pugnacious, critical, uncooperative, and does not show any interest in the user. In addition, it employs negative emotion words such as “bad”, swear words such as “crap”, mannerisms such as “so” and “okay”, along with expressions of anger (e.g., “I’m getting angry.”).
All text modules, as expressed for each of the three personalities, may be found in this PDF. The texts are given in German, the language in which the study was conducted. If you require a translation, please contact the first author.
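To illustrate how such text modules can be organised, the following is a minimal sketch with one message variant per personality. The texts are invented English stand-ins, not translations of the original German modules:

```javascript
// Hypothetical text module for the genre question, with one variant
// per chatbot personality. These texts are invented English stand-ins;
// the original German texts are provided in the PDF referenced above.
const askGenre = {
  agreeable:    "I'm sure we'll find something nice together! 😊 Which genre do you like?",
  neutral:      "Which genre would you like to watch?",
  disagreeable: "Okay, so... which genre? Don't take forever.",
};
```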
The three chatbots were implemented with Botpress version 10.47.0 in 2018/19. Botpress is an open-source development platform for chatbots and is written in JavaScript. To ensure predictable behaviour and a consistent expression of the predefined personalities, we developed rule-based chatbots.
The source code is published in this repository on GitHub. Please note that the current version of the chatbot implementation was intended for internal use only. We publish our source code to make the research accessible and transparent in the spirit of Open Science; however, the source code is not documented in a way that allows easy reuse. Please also note that the Botpress architecture has changed since our implementation. In another research project, we are currently implementing a new version of our chatbots using the current Botpress architecture and will publish that source code after completing the project.
Botpress is a modular development platform, providing developers with a variety of modules for different features. Hence, each chatbot developed with Botpress has a modular software architecture. The figure illustrates how the modules work together during the conversation between a chatbot and a user. The user sends a message via a channel; Botpress chatbots can be placed on different channels, such as Slack or Facebook Messenger, or be embedded in a website, as in this research project. After receiving the user's message, the Natural Language Understanding (NLU) module processes it to extract information from the user's input. This structured data is then forwarded to the Dialogue Manager, which decides what the chatbot will do next. Based on this decision, the chatbot selects the appropriate response message from a database and renders it for the specific communication channel. This flow is repeated until the end of the conversation. Each of these three components is briefly described below.
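The interplay of the three components can be summarised in a minimal sketch of one conversational turn. All names below are illustrative stubs, not the actual Botpress API:

```javascript
// Hypothetical, simplified sketch of one turn through the pipeline
// described above (illustrative stubs, not the actual Botpress API).
const nlu = { process: async (msg) => ({ intent: "give-name", entities: { name: msg } }) };
const dialogueManager = { decide: (intent) => (intent === "give-name" ? "ask-genre" : "fallback") };
const contentStore = { get: (node) => `Text module for node "${node}"` };
const renderer = { render: (text, channel) => ({ channel, text }) };

async function handleTurn(userMessage, channel) {
  const { intent, entities } = await nlu.process(userMessage); // NLU: extract structured data
  const nextNode = dialogueManager.decide(intent, entities);   // Dialogue Manager: pick next node
  const text = contentStore.get(nextNode);                     // fetch response text from database
  return renderer.render(text, channel);                       // Content Renderer: channel output
}
```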
Our three chatbots serve as a film recommender integrated into a website. To help users familiarise themselves with this use case, the chatbots are displayed on a website modelled after the popular streaming service Netflix.
The chatbots use both open and closed questions to converse with the user. For open questions, e.g. asking for the user's name, the user answers via free text. Apart from open questions, the chatbots present the user with closed single-choice questions to display and limit the input options. For example, when asking for the user's preferred genre, the chatbots suggest several genres, implemented as buttons, from which the user chooses one by clicking it, as sketched below.
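A closed single-choice question might be represented as follows. This is a sketch of the underlying idea; the concrete content schema of Botpress version 10 may differ:

```javascript
// Hypothetical representation of a closed single-choice question with
// button choices; the concrete Botpress 10 content schema may differ.
const genreQuestion = {
  type: "single-choice",
  text: "Which genre would you like to watch?",
  choices: [
    { title: "Action", value: "action" },
    { title: "Comedy", value: "comedy" },
    { title: "Drama",  value: "drama" },
    { title: "Horror", value: "horror" },
    // ... six further genres
  ],
};
```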
To interpret the user's responses to open questions, the chatbots use the NLU module provided by Botpress. To this end, we defined several user intents and provided multiple sample utterances for each intent. The set of sample utterances was compiled from several pilot runs in which different users were asked how they would phrase their answers to the chatbots' questions. If it detects one of the sample utterances or a similar text, the NLU module maps the user's input to the corresponding intent and extracts meaningful entities. For example, the chatbots prompt the users to specify their company for watching the film. If the user answers something such as "with my family," "with my dad," or "I'll watch with my sister," the user's response is mapped to the watching-with-family intent. From the user's answer, the NLU module also extracts the entity of the specific company, e.g. "family," "dad," or "sister." This information is then forwarded to the Dialogue Manager.
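An intent definition along these lines might look as follows; the structure is a hypothetical sketch of the sample-utterance approach, not the exact Botpress NLU file format:

```javascript
// Hypothetical sketch of the watching-with-family intent with sample
// utterances and an extracted entity (not the exact Botpress NLU format).
const watchingWithFamilyIntent = {
  name: "watching-with-family",
  utterances: [
    "with my family",
    "with my dad",
    "I'll watch with my sister",
  ],
  entities: ["company"], // extracted values, e.g. "family", "dad", "sister"
};
```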
Based on the preprocessed user input, the Dialogue Manager decides how the chatbots respond. To this end, we specified the chatbots' rule-based behaviour in a conversation flow using the Visual Flow Editor that Botpress provides. The chatbots go through the predefined conversation flow and execute the next node; that is, the Dialogue Manager decides which message to send next based on the conversation flow as well as the interpreted intent. This response message is then retrieved from a JSON file, which stores all texts, as defined above. The chatbots may either send multiple messages at once or wait for the user's input before progressing. Finally, the Content Renderer processes the message to adequately display it in the web chat.
For example, at the beginning of the conversation, the Dialogue Manager selects the start-text to send to the user, which comprises an introductory message and a request for the user's name. The message is rendered by the Content Renderer to display it correctly in the web chat. When the user answers with their name, this input is interpreted by the NLU module. Based on the conversation flow, the Dialogue Manager then chooses the next text block, which is sent to the user.
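The rule-based next-node decision can be sketched as a simple transition table. Node and intent names are hypothetical; in the actual implementation, these transitions were defined graphically in the Visual Flow Editor:

```javascript
// Illustrative rule-based transition logic (hypothetical node and
// intent names); the actual flow was defined in the Visual Flow Editor.
const transitions = {
  "start-text": { "give-name": "ask-genre" },
  "ask-genre":  { "choose-genre": "ask-time" },
  "ask-time":   { "give-time": "ask-mood" },
  // further transitions follow the predefined conversation flow
};

function decideNextNode(currentNode, intent) {
  return transitions[currentNode]?.[intent] ?? "fallback";
}
```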
The chatbots' goal is to recommend a film to the user. This film is selected from a small database that we compiled from the German film recommendation website Moviepilot. More specifically, we added the three best-rated films for each genre to our database, as listed by Moviepilot. We included ten popular genres, such as action, comedy, drama, and horror, from which the user can choose. Hence, the database comprises thirty films in total. The database is realised as a simple JSON file which stores each film as an object with several key-value properties, namely (1) title, (2) plot summary, (3) suitability for watching with family, (4) length, and (5) a list of genres. As some films are assigned to multiple genres, more than three films may be recommended for a given genre.
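A database entry with these five properties might look as follows; title, plot summary, and property names are invented stand-ins, and the unit of the length property is an assumption:

```javascript
// Hypothetical film database entry with the five key-value properties
// described above (title and plot summary are invented stand-ins).
const filmDatabase = [
  {
    title: "Example Film",
    plot: "A short plot summary shown alongside the recommendation.",
    family: true,                 // suited for watching with family
    length: 112,                  // length in minutes (assumed unit)
    genres: ["action", "comedy"], // a film may belong to multiple genres
  },
  // ... 29 further films: the three best-rated per genre on Moviepilot
];
```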
We implemented a simple recommender function that suggests a film from the database based on the user's input. To this end, the chatbots ask the user several questions about their preferred genre, current mood, and company for watching the film, as described above in the conversation flow. The user's answers to these questions are stored in corresponding variables. After a chatbot has gathered all of the user's information, the program iterates through the JSON film database and selects the first film that matches all of the user's criteria. The chatbot then recommends this film to the user by retrieving its title and plot summary from the database. The user can accept or reject the recommendation. If the user does not like the recommendation, the chatbot executes the recommender function again, this time selecting the second film that matches the user's criteria. If no further film matching the user's criteria can be found, the chatbot informs the user and ends the conversation.
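A minimal sketch of this recommender logic, assuming the hypothetical database layout above, is given below. The skip parameter realises the second-match behaviour after a rejection; the mood criterion is omitted, as its mapping to film properties is not detailed here:

```javascript
// Minimal sketch of the recommender function: return the n-th film
// that matches all of the user's criteria (skip = 0 on the first call,
// 1 after a rejection, and so on). Property names follow the
// hypothetical database sketch above.
function recommendFilm(films, prefs, skip = 0) {
  let matches = 0;
  for (const film of films) {
    const fits =
      film.genres.includes(prefs.genre) &&     // preferred genre
      film.length <= prefs.availableMinutes && // fits the available time
      (!prefs.withFamily || film.family);      // family-suitable if needed
    if (fits && matches++ === skip) return film;
  }
  return null; // no further match: inform the user and end the conversation
}
```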
Overall, the manipulation was successful, with the agreeable chatbot being perceived as more agreeable than both the neutral and the disagreeable chatbot. However, participants also perceived the neutral chatbot as rather agreeable. A Greenhouse-Geisser corrected repeated-measures ANOVA corroborates these results, pointing to significant differences between the three versions (F(1.67, 48.33) = 381.12, p < .001, η² = 0.93). Pairwise post-hoc tests yielded significant differences between all three pairs (p < .001).
Participants preferred interacting again with the agreeable and neutral chatbots, whilst the desire to chat with the disagreeable version was rather low on average. A Friedman test determined a significant effect of the chatbot version on participants’ desire to interact with the chatbot again (χ²(2) = 36.94, p < .001). Pairwise Nemenyi post-hoc tests yielded significant differences between the agreeable and disagreeable chatbots (p = .001) as well as between the neutral and disagreeable chatbots (p = .001). There was no significant difference between participants’ desire to interact with the agreeable and neutral chatbots. A Spearman's rank correlation revealed a significant, moderate positive relationship only between participants’ agreeableness and their preference for the agreeable chatbot (ρ = 0.47, p = .008).