Examining User Preference for Agreeableness in Chatbots

Sarah Theres Völkel and Lale Kaya

In Proceedings of the 3rd Conference on Conversational User Interfaces (CUI '21)

Abstract. Recent research suggests that deliberately manipulating a chatbot’s personality and matching it to the user’s personality can positively impact the user experience. Yet, little is known about whether this similarity attraction effect also applies to the personality dimension agreeableness. In a lab experiment, 30 participants interacted with three versions of a chatbot (agreeable, neutral, and disagreeable). Whilst our results corroborate a similarity attraction effect between user agreeableness and preference for the agreeable chatbot, we did not find a reversed relationship for the disagreeable chatbot. Our findings point to a need for moderate instead of extreme chatbot personalities.

Motivation

Do agreeable users like a chatbot that is trustful, genuine, modest, obliging, helpful, and cooperative, whilst disagreeable users prefer a chatbot that is selfish, manipulative, competitive, conceited, antagonistic, and cynical?

Chatbots are considered social actors, with users unconsciously assigning them personalities [Nass et al. 1994]. Similar to human-human interaction, users prefer chatbots with personalities similar to their own, a phenomenon coined the similarity attraction effect [Nass and Lee 2001]. For example, in prior work, matching user and chatbot personality had a positive impact on user engagement, users’ self-disclosure, and their willingness to accept the chatbot’s advice [Shumanov and Johnson 2021, Gnewuch et al. 2020].

In this work, we examine the similarity attraction effect for the personality trait agreeableness, which seems particularly interesting for chatbot assistants. Agreeableness is a Big Five personality dimension and describes a tendency to be trustful, genuine, modest, obliging, helpful, and cooperative [McCrae and Costa 2008]. However, it is questionable whether the preference for agreeable chatbots also follows a similarity attraction effect: Whilst agreeable users are likely to favour an agreeable chatbot, disagreeable users might not expect an uncooperative, unhelpful chatbot, given that these characteristics are usually not associated with assistants.

RQ1 Can we synthesise different levels of agreeableness in a chatbot by systematically varying its language style?
RQ2 Is there a relationship between user agreeableness and their preference for agreeableness in a chatbot?

Research Design

Figure: Overview of the research design.

To investigate our research questions, we conducted a within-groups lab experiment with 30 participants. Participants interacted with three different versions of a chatbot situated in a film recommender application: an agreeable chatbot, a neutral chatbot, and a disagreeable chatbot. After each interaction, we first asked participants to rate their perception of the chatbot’s agreeableness by filling out a standard personality questionnaire [Danner et al. 2016]. Second, participants indicated how much they would like to interact with this chatbot again. At the end of the study, we collected participants’ self-reported level of agreeableness via the same personality questionnaire [Danner et al. 2016]. The survey, in the original German, may be found in this PDF. If you require a translation, please contact the first author.

Chatbot Dialogues

To imbue the chatbot with personality, we drew upon a large body of work in psychology and linguistics that has examined how personality manifests in human language. That is, we leveraged verbal cues associated with human agreeableness to manipulate the chatbots’ language styles.

Conversation Flow

The conversation between the user and each of the three chatbots comprises four main parts, as illustrated in the figure above. First, the chatbot welcomes the user and asks for their name. After the introduction, the chatbot prompts the user with a number of questions to find out more about their preferences. These questions were informed by an informal pilot study, during which we asked five streaming service users which questions they would expect from a film recommender chatbot. Four aspects emerged from the interviews: (1) the user's preferred genre, (2) available time, (3) mood, and (4) company. After the chatbot has collected this information, it gives a film recommendation based on the user's preferences. The user may either accept the recommendation or ask for another one. The conversation concludes when either the user accepts a recommendation or the chatbot has no more recommendations that match the user's preferences. Finally, the chatbot says goodbye.
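To make this flow concrete, the following sketch reproduces the four parts in plain JavaScript (the language Botpress is written in). It is an illustrative reconstruction, not the study's actual flow definition; the helper names and the mock user are hypothetical.

```javascript
// Illustrative sketch of the four-part conversation flow: (1) welcome,
// (2) preference questions, (3) recommendation loop, (4) goodbye.
const say = (text) => console.log(`Chatbot: ${text}`);

function runConversation(user) {
  say("Welcome! What's your name?");                       // 1. introduction
  for (const topic of ["genre", "time", "mood", "company"]) {
    say(`Please tell me about your preferred ${topic}.`);  // 2. questions
  }
  let film = user.nextMatchingFilm();                      // 3. recommend until
  while (film && !user.accepts(film)) {                    //    accepted or no
    film = user.nextMatchingFilm();                        //    matches remain
  }
  say(film ? `Enjoy "${film}"!` : "I have no more matching films. Sorry!");
  say("Goodbye!");                                         // 4. farewell
}

// Example run with a mock user who accepts the second suggestion:
const remaining = ["Film A", "Film B"];
runConversation({
  nextMatchingFilm: () => remaining.shift(),
  accepts: (film) => film === "Film B",
});
```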

Personality Manipulation

Excerpt from the chat with the agreeable chatbot: Chatbot: Excellent! I am sure that together we will find something suitable for you. To get to know your film taste better, you can tell me now which genre you like most or have last seen. -- User: I like comedies. -- Chatbot: Very cool! I also really like this genre! Do you fancy this genre today?
Excerpt from the chat with the neutral chatbot: Chatbot: Which of the genres do you like or did you last see? -- User: I like comedies. -- Chatbot: Do you fancy this genre today?
Excerpt from the chat with the disagreeable chatbot: Chatbot: Okay, just tell me which genre you like. -- User: I like comedies. -- Chatbot: But you have bad taste. Do you fancy this genre today?

The agreeable chatbot agrees with the user on opinions and expresses interpersonal concern. Furthermore, it employs positive emotion words such as “nice” or “like”, family-related words such as “together” or “family”, words indicating certainty such as “I’m sure”, as well as blushing and kiss emojis, as informed by previous research.

The neutral chatbot uses neutral, polite language, expressing neither positive nor negative emotions in contrast to the other two versions. Moreover, it does not react to the user’s choices, yet communicates in a respectful and professional way.

The disagreeable chatbot is pugnacious, critical, uncooperative, and shows no interest in the user. On top of that, it uses negative emotion words such as “bad”, swear words such as “crap”, mannerisms such as “so” and “okay”, along with expressions of anger (e.g., “I’m getting angry.”).

Dialogue Text Modules

All text modules, as expressed for each of the three personalities, may be found in this PDF. The texts are given in German, the language in which the study was conducted. If you require a translation, please contact the first author.

Chatbot Implementation

The three chatbots were implemented with Botpress (version 10.47.0) in 2018/19. Botpress is an open-source development platform for chatbots written in JavaScript. To ensure predictable behaviour and a consistent expression of the predefined personalities, we developed rule-based chatbots.

Source Code

The source code is published in this repository on GitHub. Please note that the current version of the chatbot implementation was intended for internal use only. We publish our source code to make the research accessible and transparent in the spirit of Open Science; that is, the source code is not documented in a way that allows easy reuse. Please also note that the Botpress architecture has changed since our implementation. In another research project, we are currently implementing a new version of our chatbots using the current Botpress architecture and will publish the source code after completing that project.

Botpress Architecture

Botpress is a modular development platform, providing developers with a variety of modules for different features. Hence, each chatbot developed with Botpress has a modular software architecture. The figure illustrates how the modules work together during a conversation between a chatbot and a user. The user sends a message via a channel; Botpress chatbots can be placed on different channels, such as Slack or Facebook Messenger, or be embedded in a website, as in this research project. After receiving the user's message, the Natural Language Understanding (NLU) module processes it to extract information from the user's input. This structured data is then forwarded to the Dialogue Manager, which decides what the chatbot will do next. Based on this decision, the chatbot selects the appropriate response message from a database and renders it for the specific communication channel. This flow repeats until the end of the conversation. Each of these three components is briefly described below.
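As a rough sketch, one turn through this pipeline can be pictured as the following JavaScript function. The four stubs stand in for the modules described above; they are illustrative placeholders, not Botpress APIs.

```javascript
// Schematic sketch of one turn through the pipeline described above.
// All four functions are illustrative stubs, not Botpress APIs.
const understand = (text) => ({ intent: "greeting", entities: {} });    // NLU
const decideNextAction = (intent, entities) => `reply-to-${intent}`;    // Dialogue Manager
const lookupText = (action) => `Text module for "${action}"`;           // response database
const renderForChannel = (message, channel) => ({ channel, message });  // Content Renderer

function handleIncomingMessage(rawText) {
  const { intent, entities } = understand(rawText);
  const action = decideNextAction(intent, entities);
  const message = lookupText(action);
  return renderForChannel(message, "web");
}

console.log(handleIncomingMessage("Hello!"));
// -> { channel: 'web', message: 'Text module for "reply-to-greeting"' }
```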

Channel

Our three chatbots serve as a film recommender integrated into a website. To help users familiarise themselves with this use case, the chatbots are displayed on a website modelled after the popular streaming service Netflix.

Natural Language Understanding

The chatbots use both open and closed questions to converse with the user. For open questions, e.g. asking for the user's name, the user answers via free text. Apart from open questions, the chatbots present the user with closed single-choice questions to display and limit the input options. For example, when asking for the user's preferred genre, the chatbots suggest several genres implemented as buttons, from which the user chooses one by clicking it.
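The following object sketches what such a closed single-choice question might look like as structured data. The shape is hypothetical, not Botpress's actual content type format; it only illustrates how buttons constrain the input options.

```javascript
// Hypothetical payload for a closed single-choice question with genre buttons.
const genreQuestion = {
  type: "single-choice",
  text: "Which genre do you like most?",
  choices: ["Action", "Comedy", "Drama", "Horror"].map((genre) => ({
    title: genre,                             // button label shown to the user
    payload: `GENRE_${genre.toUpperCase()}`,  // value sent back when clicked
  })),
};
```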

To interpret the user's responses to open questions, the chatbots use the NLU module provided by Botpress. To this end, we defined several user intents and provided multiple sample utterances for each intent. The set of sample utterances was compiled from several pilot runs in which different users were asked how they would phrase their answers to the chatbots' questions. If it detects one of the sample utterances or a similar text, the NLU module maps the user's input to the corresponding intent and extracts meaningful entities. For example, the chatbots prompt the users to specify their company for watching the film. If the user answers something such as "with my family," "with my dad," or "I'll watch with my sister," the response is mapped to the watching-with-family intent. From the user's answer, the NLU module also extracts the entity of the specific company, e.g. "family," "dad," or "sister." This information is then forwarded to the Dialogue Manager.
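A minimal sketch of this idea, using the watching-with-family example from above: the keyword matcher below is a naive stand-in for Botpress's statistical NLU, while the utterances and company words are those quoted in the text.

```javascript
// Naive sketch of intent matching and entity extraction for the
// watching-with-family example. Botpress's NLU is more sophisticated;
// this keyword matcher only illustrates the mapping.
const watchingWithFamily = {
  intent: "watching-with-family",
  utterances: ["with my family", "with my dad", "I'll watch with my sister"],
  companyWords: ["family", "dad", "sister"],
  match(input) {
    const company = this.companyWords.find((w) => input.includes(w));
    return company ? { intent: this.intent, entities: { company } } : null;
  },
};

console.log(watchingWithFamily.match("I'll watch with my sister"));
// -> { intent: 'watching-with-family', entities: { company: 'sister' } }
```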

Dialogue Manager

Based on the preprocessed user input, the Dialogue Manager decides how the chatbots respond. To this end, we specified the chatbots' rule-based behaviour in a conversation flow using the Visual Flow Editor that Botpress provides. The chatbots step through this predefined conversation flow and execute the next node: the Dialogue Manager decides which message to send next based on the conversation flow as well as the interpreted intent. This response message is then retrieved from a JSON file, which stores all texts as defined above. The chatbots may either send multiple messages at once or wait for the user's input before progressing. Finally, the Content Renderer processes the message to adequately display it in the web chat.
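To illustrate, the response texts could be organised as in the following sketch, keyed by text module and personality. The keys are hypothetical, and the texts are English paraphrases of the German modules quoted in the excerpts above.

```javascript
// Sketch of the JSON text storage: one entry per text module, with one
// variant per chatbot personality. Keys are hypothetical; texts are
// English paraphrases of the excerpts shown above.
const textModules = {
  "genre-question": {
    agreeable: "To get to know your film taste better, which genre do you like most?",
    neutral: "Which of the genres do you like or did you last see?",
    disagreeable: "Okay, just tell me which genre you like.",
  },
};

const getText = (moduleId, personality) => textModules[moduleId][personality];
console.log(getText("genre-question", "disagreeable"));
```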

For example, at the beginning of the conversation, the Dialogue Manager selects the start text to send to the user, which comprises an introductory message and a request for the user's name. The message is rendered by the Content Renderer to display it correctly in the web chat. After the user answers with their name, this input is interpreted by the NLU module. Based on the conversation flow, the Dialogue Manager then chooses the next text block, which is sent to the user.

Film Recommender Function

The chatbots' goal is to recommend a film to the user. This film is selected from a small database that we compiled from the German film recommendation website Moviepilot. More specifically, we added the three best-rated films for each genre to our database, as listed by Moviepilot. We included ten popular genres, such as action, comedy, drama, and horror, from which the user can choose. Hence, the database comprises thirty films in total. The database is realised as a simple JSON file which stores each film as an object with several key-value properties, namely (1) title, (2) plot summary, (3) suitability for watching with family, (4) length, and (5) list of genres. As some films are assigned to multiple genres, more than three films may be recommended per genre.
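A single database entry could look like the following sketch. The property names and values are hypothetical illustrations, but they cover the five properties listed above.

```javascript
// Sketch of one film object in the JSON database, covering the five
// listed properties. Property names and values are illustrative.
const exampleFilm = {
  title: "Example Film",                // (1) title
  plot: "A short plot summary shown alongside the recommendation.", // (2)
  familyFriendly: true,                 // (3) suited for watching with family
  lengthMinutes: 112,                   // (4) length
  genres: ["comedy", "drama"],          // (5) a film may belong to multiple genres
};
```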

We implemented a simple recommender function that suggests a film from the database based on the user's input. To this end, the chatbots ask the user several questions about their preferred genre, current mood, and company for watching the film, as described in the conversation flow above. The user's answers to these questions are stored in corresponding variables. Once a chatbot has gathered all of the user's information, the program iterates through the JSON film database and selects the first film that matches all of the user's criteria. The chatbot then recommends this film to the user by retrieving its title and plot summary from the database. The user can accept the film recommendation or reject it. If the user does not like the recommendation, the chatbot executes the recommender function again, but this time selects the second film that matches the user's criteria. If no further film matches the user's criteria, the chatbot informs the user and ends the conversation.
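A minimal sketch of such a function is shown below, assuming the film object shape sketched above. How the user's mood factors into the matching is not detailed in the text, so this sketch filters only on genre, length, and company.

```javascript
// Sketch of the recommender: return the n-th film that matches all of the
// user's criteria, where n grows with each rejected recommendation.
// Mood handling is omitted; property names follow the sketch above.
const database = [
  { title: "Film A", plot: "...", familyFriendly: true,  lengthMinutes: 95,  genres: ["comedy"] },
  { title: "Film B", plot: "...", familyFriendly: false, lengthMinutes: 130, genres: ["comedy", "drama"] },
];

function recommend(db, prefs, rejected = 0) {
  const matches = db.filter(
    (film) =>
      film.genres.includes(prefs.genre) &&
      film.lengthMinutes <= prefs.availableMinutes &&
      (!prefs.withFamily || film.familyFriendly)
  );
  return matches[rejected] ?? null; // null -> no further match, end conversation
}

const prefs = { genre: "comedy", availableMinutes: 120, withFamily: true };
console.log(recommend(database, prefs)?.title); // -> "Film A"
console.log(recommend(database, prefs, 1));     // -> null (Film B too long, not family-friendly)
```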

Results

Figure: Boxplots of how participants evaluated the three chatbots regarding their perceived level of agreeableness. On a scale from one (disagreeable) to five (agreeable), the agreeable chatbot was rated as highly agreeable (Median = 4.67), the neutral chatbot as rather agreeable (Median = 4.13), and the disagreeable chatbot as disagreeable (Median = 1.42).

Agreeableness Manipulation Check

Overall, the manipulation was successful: the agreeable chatbot was perceived as more agreeable than both the neutral and the disagreeable chatbot. However, participants also found the neutral chatbot rather agreeable. A Greenhouse-Geisser corrected repeated-measures ANOVA underpins these results, pointing to significant differences between the three versions (F(1.67, 48.33) = 381.12, p < .001, η² = 0.93). Pairwise post-hoc tests yielded significant differences between all three pairs (p < .001).

Figure: Boxplots of how much participants would like to interact with each chatbot again. On a scale from one (small desire) to five (great desire), the agreeable and neutral chatbots received high ratings (Median = 4.0 for both), whereas participants had only a small desire to interact again with the disagreeable chatbot (Median = 1.0).

Desire to Interact with Chatbot

Participants preferred interacting again with the agreeable and neutral chatbots, whilst the desire to chat with the disagreeable version was rather low on average. A Friedman test determined a significant effect of the chatbot version on participants’ desire to interact with the chatbot again (χ²(2) = 36.94, p < .001). Pairwise Nemenyi post-hoc tests yielded significant differences between the agreeable and disagreeable chatbots (p = .001) as well as between the neutral and disagreeable chatbots (p = .001). There was no significant difference between participants’ desire to interact with the agreeable and the neutral chatbot.

A Spearman's rank correlation demonstrated a significant, moderate positive relationship only between participants’ agreeableness and their preference for the agreeable chatbot (ρ = 0.47, p = .008).

Takeaways

1. Agreeableness can be deliberately manipulated by varying a chatbot's language.
2. We found a similarity attraction effect between user agreeableness and preference for the agreeable chatbot.
3. Not all participants favoured the agreeable chatbot.
4. There is a need to create and evaluate moderate personalities in conversational agents.

References

  • Daniel Danner, Beatrice Rammstedt, Matthias Bluemke, Lisa Treiber, Sabrina Berres, Christopher Soto, and Oliver John. 2016. Die deutsche Version des Big Five Inventory 2 (BFI-2). GESIS, Mannheim, Germany. https://doi.org/10.6102/zis247
  • Ulrich Gnewuch, Meng Yu, and Alexander Maedche. 2020. The Effect of Perceived Similarity in Dominance on Customer Self-Disclosure to Chatbots in Conversational Commerce. In Proceedings of the 28th European Conference on Information Systems (ECIS 2020). AIS, eLibrary (AISeL).
  • Jacob B. Hirsh, Colin G. DeYoung, and Jordan B. Peterson. 2009. Metatraits of the Big Five Differentially Predict Engagement and Restraint of Behavior. Journal of Personality 77, 4 (2009), 1085–1102. https://doi.org/10.1111/j.1467-6494.2009.00575.x
  • Robert R. McCrae and Paul T. Costa. 2008. A five-factor theory of personality. In Handbook of Personality: Theory and Research, O.P. John, R.W. Robins, and L.A. Pervin (Eds.). Vol. 3. The Guilford Press, New York, NY, USA, 159–181.
  • Clifford Nass and Kwan Min Lee. 2001. Does computer-synthesized speech manifest personality? Experimental tests of recognition, similarity-attraction, and consistency-attraction. Journal of Experimental Psychology: Applied 7, 3 (2001), 171. https://doi.org/10.1037/1076-898X.7.3.171
  • Clifford Nass, Jonathan Steuer, and Ellen R. Tauber. 1994. Computers Are Social Actors. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (Boston, Massachusetts, USA) (CHI ’94). Association for Computing Machinery, New York, NY, USA, 72–78. https://doi.org/10.1145/191666.191703
  • Michael Shumanov and Lester Johnson. 2021. Making conversations with chatbots more personalized. Computers in Human Behavior 117 (2021), 106627. https://doi.org/10.1016/j.chb.2020.106627