CareBot

June 2, 2021 8 minute read

This is my team’s final project for COSC 89.11 Cognitive Computing with Watson with Professor Charles Palmer. Our team consisted of Abhimanyu Kapur, Yakoob Khan, Ugur Yavuz, and myself. Any sections below labeled ✅ refer to my contributions. Enjoy!

Video (7 minute watch)

You can find a brief demo of CareBot right here.

Motivation

Mental health and wellness problems are incredibly important, affect large parts of the population, and are often inadequately addressed, if at all. In the United States, it is estimated that 40 million adults are affected by anxiety disorders every year and 16 million adults have at least one depressive episode in a given year.¹ The prevalence of mental health issues is only increasing, and we do not currently have the capacity to handle it.

Our entire team has a background in mental health and wellness. Our research showed that current digital mental wellness tools require a user to be proactive to benefit from them. Examples of such tools include daily mood journals, guided meditations, podcasts, and anxiety-reducing exercises. However, such tools require users to recognize that they are in a negative emotional or mental state and then take action. Recognizing a negative state can be hard, and it requires a high degree of self-awareness and emotional sensitivity. Even our friends might not realize when we’re in a bad place. Therefore, we wanted to build a mental wellness app that could intervene and encourage reflection without requiring user interaction.

Business Model

We see a large market for our product, based on the success of existing mental wellness apps such as Headspace. According to market reports, consumers will spend over $1.6 billion on mental health and wellness apps in 2021.² We present the key points of our business model below:

Subscription Model
- This provides recurring revenue and eliminates the need for us to sell user data for profit. We believe a tiered pricing system would be best, with a starting price of $0.99/month to bring in users. Additional tiers would include more features such as increased social media integration, better insights and visualizations into a user’s mental wellness, custom wellness resources, etc.
3-Month Free Trial
- With a service like ours, we need a longer trial period to demonstrate value to the user and create moments of reflection and delight.
Privacy Focus
- User privacy and safety are very important to us. As a result, we plan never to sell or reveal any user data to a third party, nor will we use it for any purpose other than improving a user’s mental health.

Knowledge Sources ✅

Watson Natural Language Understanding (NLU) is a cloud-native product that uses deep learning to extract metadata from a given text. Such metadata may include entities, categories, emotion, keywords, relations, sentiment, and syntax. For the purposes of CareBot, we consider the emotion and sentiment metadata that Watson assigns to a given tweet. We harness Watson NLU’s predictive capabilities to analyze a user’s tweets, which are largely unstructured data; analyzing unstructured data would arguably be more challenging without powerful, pre-trained, deep-learning-based techniques.

Watson NLU recommends that text passed to it for analysis be at least 100 characters long, which is generally the case for the tweets we intend to analyze; additionally, all tweets are assumed to be in English. The out-of-the-box machine learning models that Watson NLU uses for sentiment and emotion analysis are said to offer a high degree of accuracy across a variety of content, which we hope to assess via evaluations later in this report. With regard to data collection, note that because we are non-premium users of the Watson API, IBM collects our data to improve its services for premium users.
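As a rough illustration of the two pieces of metadata CareBot relies on, the sketch below pulls the document-level sentiment and emotion scores out of an NLU-style JSON response. The sample payload is invented, the helper name `extract_scores` is hypothetical, and the nesting reflects our understanding of the response shape rather than a specification, so treat all of it as an assumption.

```python
from typing import Dict, Tuple

# Invented sample payload (assumption): mimics the nesting of document-level
# sentiment and emotion results. The numbers are made up for illustration.
SAMPLE_RESPONSE = {
    "sentiment": {"document": {"score": -0.72, "label": "negative"}},
    "emotion": {
        "document": {
            "emotion": {
                "anger": 0.61,
                "disgust": 0.12,
                "fear": 0.08,
                "joy": 0.03,
                "sadness": 0.44,
            }
        }
    },
}


def extract_scores(response: dict) -> Tuple[float, str, Dict[str, float]]:
    """Return (sentiment score, sentiment label, emotion scores) for one text."""
    doc_sentiment = response["sentiment"]["document"]
    doc_emotions = response["emotion"]["document"]["emotion"]
    return doc_sentiment["score"], doc_sentiment["label"], dict(doc_emotions)


score, label, emotions = extract_scores(SAMPLE_RESPONSE)
# The strongest emotion is simply the key with the largest score.
top_emotion = max(emotions, key=emotions.get)
print(label, score, top_emotion)
```

Keeping the parsing in one small function means the rest of the workflow only ever sees a sentiment score, a label, and a dictionary of emotion scores.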

To demonstrate Watson NLU in action, we use live tweets posted by Elon Musk over the past month. In order to evaluate the performance of Watson NLU on tweets, we also require a dataset of tweets labeled with (1) ground truth sentiments (e.g. negative, neutral, positive) and (2) ground truth emotions (e.g. anger, disgust, fear, joy, sadness). We use the following open-source datasets for evaluation purposes:

Sentiment Datasets

The following datasets are considered, which contain tweets with labeled sentiments. Each of their samples is labeled with “negative,” “neutral,” or “positive” sentiment:

Apple Twitter Sentiment³
Twitter US Airline Sentiment⁴
Sentiment 140 (test set only, since the train set is too big!)⁵
Pre-processed Twitter tweets⁶

Emotion Datasets

The following dataset is considered, which contains tweets with labeled emotions. Each of its samples is labeled with either “anger,” “disgust,” “fear,” “joy,” or “sadness” emotion. Note that it may make sense for a real world sample to contain joy, or some subset of {anger, disgust, fear, sadness}, or perhaps no emotion at all! Yet, the dataset below assigns a single emotion to each sample for simplification purposes.

Twitter Reviews for Emotion Analysis⁷

Miscellaneous

For the front end of CareBot, we use a publicly available ReactJS-based landing page⁸ as a template. The template itself is inspired by another from Free-CSS.⁹

Finally, to construct the messages we send to users, we used a dataset of 101 famous motivational quotes from GitHub¹⁰ and scraped Spotify links from a playlist called Peaceful Meditation¹¹ using a tool adapted from GitHub called Spotify-to-Json.¹²
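One plausible way to combine these two sources into a message is sketched below. The quotes, links, message wording, and the `build_message` helper are all placeholders made up for illustration; they are not taken from the datasets or from the deployed app.

```python
import random
from typing import Optional, Sequence

# Placeholder data (assumption): stand-ins for the quote list and playlist links.
QUOTES = ["Placeholder quote one.", "Placeholder quote two."]
LINKS = ["https://example.com/track-1", "https://example.com/track-2"]


def build_message(
    emotion: str,
    quotes: Sequence[str],
    links: Sequence[str],
    rng: Optional[random.Random] = None,
) -> str:
    """Pick one quote and one link at random and wrap them in a short note."""
    rng = rng or random.Random()
    quote = rng.choice(quotes)
    link = rng.choice(links)
    return (
        f"Your recent tweets seemed to show some {emotion}. "
        f"{quote} Here is something calming to listen to: {link}"
    )


# Passing a seeded generator makes the choice repeatable, which helps in tests.
message = build_message("sadness", QUOTES, LINKS, random.Random(0))
print(message)
```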

Tools and Techniques ✅

Workflow

while True {
    every 1 hour { 
        fetch tweets
        if no new tweets
            continue
        NLU {
            get sentiment and emotions
            if any(emotion > e_threshold for emotion in emotions)
                    or (sentiment < s_threshold)
                send message(sentiment, emotions)
        }
    }
    every week {
        compute average scores for new tweets
        add average weekly scores to DB
        create weekly report
    }
}
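The hourly trigger condition in the workflow above can be sketched in Python as follows. The threshold values and the function name `should_send_message` are assumptions chosen for illustration; the write-up does not state the values actually used.

```python
from typing import Dict

E_THRESHOLD = 0.5   # assumed emotion threshold, not the project's real value
S_THRESHOLD = -0.5  # assumed sentiment threshold, not the project's real value


def should_send_message(
    sentiment: float,
    emotions: Dict[str, float],
    e_threshold: float = E_THRESHOLD,
    s_threshold: float = S_THRESHOLD,
) -> bool:
    """Mirror the workflow: trigger if any emotion score is above its
    threshold, or if the overall sentiment score is below its threshold."""
    strong_emotion = any(score > e_threshold for score in emotions.values())
    return strong_emotion or sentiment < s_threshold


print(should_send_message(0.2, {"anger": 0.7, "joy": 0.1}))      # strong emotion
print(should_send_message(-0.8, {"anger": 0.1}))                 # low sentiment
print(should_send_message(0.1, {"anger": 0.1, "sadness": 0.2}))  # neither
```

Note that, like the pseudocode, this checks every emotion score, including joy; restricting the check to negative emotions would be a small variation.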

Organization

To ensure that the project could be completed in a remote work setting, our group divided the work into modular pieces: the front-end web application, a Flask backend server, and an interface to communicate with the IBM Watson NLU service. Yakoob primarily built the front-end web application that describes the product. Abhi created the Flask server, which contains the API that connects to the Watson NLU service and sends Twilio direct messages to users. Aadil and Ugur worked on pulling users’ tweets hourly and interfacing with the NLU service to generate sentiment analysis on the data. After all the components were built, the team met to integrate everything. Overall, we divided the work equally, and everyone helped with everything at the end.

Challenges

There were a number of challenges in this project. First, there were challenges in the technical stack, as we had never used Flask before. We chose to run the server locally during development and could use Heroku for deployment. Moreover, we had to learn how to use the React framework to create a professional-looking landing page, and how to use the Twilio API to send direct messages. We also initially planned on integrating with Headspace and other wellness services, but this proved to be much harder than we thought and we had to give up on this feature. Finally, finding suitable labeled datasets for evaluating Watson NLU was challenging, as was ensuring that we stayed within the quota limits of Watson’s free service.

Results/Fair Assessment of the Work ✅

In order to study how well Watson NLU identifies sentiment and emotion across real-world tweets, we pass samples from the aforementioned labeled datasets to the CareBot workflow for analysis. Since some of these datasets are enormous, and Watson NLU’s free tier limits the number of items we can process, we randomly sample 5% of our combined sentiment datasets stratified across labels (i.e. ~300 negative, ~300 neutral, ~300 positive samples) and 10% of our emotion dataset (i.e. ~200 anger, ~200 disgust, ~200 fear, ~200 joy, ~200 sadness samples).
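A minimal sketch of label-stratified sampling, using only the standard library, is shown below. The toy data, the `stratified_sample` helper, and the fixed per-label count are assumptions for illustration; the write-up does not say how the sampling was actually implemented.

```python
import random
from collections import Counter, defaultdict
from typing import List, Tuple

Sample = Tuple[str, str]  # (tweet text, label)


def stratified_sample(samples: List[Sample], per_label: int, seed: int = 0) -> List[Sample]:
    """Draw up to `per_label` random samples from each label group."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for text, label in samples:
        by_label[label].append((text, label))

    picked: List[Sample] = []
    for label in sorted(by_label):  # sorted so the result is reproducible
        group = by_label[label]
        picked.extend(rng.sample(group, min(per_label, len(group))))
    return picked


# Toy data (assumption): 10 fake tweets per sentiment label.
data = [
    (f"tweet {i}", label)
    for label in ("negative", "neutral", "positive")
    for i in range(10)
]
subset = stratified_sample(data, per_label=3)
print(Counter(label for _, label in subset))
```

Sampling the same number of items per label keeps the evaluation set balanced, so no single class dominates the reported metrics.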

For sentiment, we obtain the following classification report. We notice that recall for samples with negative sentiment is exceptionally high, at the expense of precision. We believe this is ideal, as we’d like to ensure that Watson NLU misses few to no negative sentiments, even if this means accidentally labeling some non-negative samples as negative. In contrast, positive sentiment has higher precision but lower recall than negative sentiment:
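For readers less familiar with these metrics, the sketch below computes per-label precision and recall from scratch. The labels and predictions are toy values invented to reproduce the pattern described above (high recall but lower precision for negative, the reverse for positive); they are not our actual results.

```python
from typing import List, Tuple


def precision_recall(y_true: List[str], y_pred: List[str], label: str) -> Tuple[float, float]:
    """Precision = TP / (TP + FP); recall = TP / (TP + FN), for one label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Toy data (assumption): two non-negative tweets are wrongly called negative.
y_true = ["negative", "negative", "neutral", "positive", "positive", "neutral"]
y_pred = ["negative", "negative", "negative", "positive", "negative", "neutral"]

print(precision_recall(y_true, y_pred, "negative"))  # (0.5, 1.0)
print(precision_recall(y_true, y_pred, "positive"))  # (1.0, 0.5)
```

In the toy data every truly negative tweet is caught (recall 1.0), but only half of the tweets flagged as negative really are negative (precision 0.5).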

For emotion, we obtain the following receiver operating characteristic (ROC) curves for each emotion label. We see that anger and fear are predicted best by Watson NLU, at least overall:
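A one-vs-rest ROC curve for a single emotion can be computed as in the sketch below: each tweet is marked as having that emotion or not, and the emotion's score is thresholded at every distinct value. The data and the helper names `roc_points` and `auc` are made up for illustration and are not our evaluation code.

```python
from typing import List, Tuple


def roc_points(y_true: List[int], scores: List[float]) -> List[Tuple[float, float]]:
    """Return (false positive rate, true positive rate) points, starting at (0, 0).

    Assumes y_true contains at least one 1 and at least one 0.
    """
    pos = sum(y_true)
    neg = len(y_true) - pos
    points = [(0.0, 0.0)]
    # Lower the threshold step by step; everything scoring >= thr is predicted positive.
    for thr in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(y_true, scores) if s >= thr and y == 1)
        fp = sum(1 for y, s in zip(y_true, scores) if s >= thr and y == 0)
        points.append((fp / neg, tp / pos))
    return points


def auc(points: List[Tuple[float, float]]) -> float:
    """Area under the curve via the trapezoidal rule."""
    return sum((x2 - x1) * (y1 + y2) / 2 for (x1, y1), (x2, y2) in zip(points, points[1:]))


# Toy data (assumption): 1 means the tweet's true label is "anger".
is_anger = [1, 1, 0, 1, 0, 0]
anger_scores = [0.9, 0.8, 0.7, 0.6, 0.3, 0.1]

points = roc_points(is_anger, anger_scores)
print(points)
print(round(auc(points), 4))
```

An area close to 1 means the emotion score ranks true examples above the rest almost all of the time, while 0.5 corresponds to random guessing.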

Finally, after extracting tweets made by Elon Musk from the past month, we show the kind of results that CareBot may report to him. We will send him pie charts weekly that summarize his overall sentiment over the past week. We may also send him a ranking of the top emotions he appeared to show in his past week’s tweets.
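The numbers behind such a weekly report could be assembled roughly as follows: the share of each sentiment label feeds the pie chart, and the average score per emotion gives the ranking. The input format, the sample values, and the `weekly_summary` helper are assumptions for illustration.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple


def weekly_summary(tweets: List[dict]) -> Tuple[Dict[str, float], List[str]]:
    """Return (share of each sentiment label, emotions ranked by average score)."""
    if not tweets:
        return {}, []
    total = len(tweets)

    # Pie chart input: fraction of the week's tweets per sentiment label.
    label_counts = Counter(t["label"] for t in tweets)
    shares = {label: count / total for label, count in label_counts.items()}

    # Ranking input: average score of each emotion across the week's tweets.
    emotion_sums: Dict[str, float] = defaultdict(float)
    for t in tweets:
        for emotion, score in t["emotions"].items():
            emotion_sums[emotion] += score
    averages = {emotion: s / total for emotion, s in emotion_sums.items()}
    ranking = sorted(averages, key=averages.get, reverse=True)
    return shares, ranking


# Invented week of analyzed tweets (assumption).
week = [
    {"label": "positive", "emotions": {"joy": 0.8, "anger": 0.1}},
    {"label": "negative", "emotions": {"joy": 0.1, "anger": 0.7}},
    {"label": "positive", "emotions": {"joy": 0.6, "anger": 0.2}},
    {"label": "neutral", "emotions": {"joy": 0.3, "anger": 0.1}},
]
shares, ranking = weekly_summary(week)
print(shares, ranking)
```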

Future Work / Improvements

Our team made amazing progress in the short time span given for the project, and we plan to continue working on it with the following goals.

Move to mobile app
- We are currently using a web app, but to reach the largest possible user base, and to expand functionality to read phone messages and learn from location data, we want to transition to a mobile platform.
Increase social media platforms used
- To reach more users and get a more holistic understanding of their emotional state, we want to expand our product to more social media platforms. Specifically, we want to start by adding iMessage, WhatsApp, and Facebook Messenger, as these are among the most used communication platforms in the world.
Integrate with Third-Party Services
- We currently have assembled a list of Spotify meditation and relaxation podcasts which we send to users; however, we would love to integrate and partner with third-party applications such as Headspace, Calm, Breathe, etc. to provide specific services for different situations and emotions to the user.
Entity-Level Problem Recognition
- We currently identify whether sentiment is positive, negative, or neutral and, if negative, what type of emotion is present in the text sample. Given Watson NLU’s capacity for entity and relation recognition, our eventual goal is to provide specific feedback and interventions that address a user’s specific problems. For example, if a user sends an angry message to their roommate over non-payment of rent, we want to be able to recognize their relationship and why the user is angry, and send a focused message.
Custom Models for each User
- Similar to Google Assistant or Siri, we want to build custom-trained models for each user that track the user’s behaviour over a long period of time and adapt responses accordingly, for a higher degree of personalization and understanding.

Aadil Islam