For this assignment, we will be working on understanding the behaviors and characteristics of people who use a digital application. The product offers recommendations for nearby attractions, restaurants, and businesses based on the user’s location. It includes a free version available to any user and a subscription model that provides more customized recommendations for users who pay for the service.
With free installation on a mobile device, digital applications have a low barrier to entry. They also experience high rates of attrition, as users may not continue to log in. With this in mind, the company is interested in better understanding the early experience of users with the application. A time point of 30 days was selected as an important milestone. Which factors might impact whether new users remain active beyond 30 days? Who is likely to subscribe within 30 days?
The company would benefit from analyzing the available data to understand the current trends.
Data
To begin investigating these questions, the company has gathered some simple information about new users of the application. A simple random sample of users was drawn from the company’s database. The sample was limited to users who first installed the application in the last 6 months, when a new version of the application was released. The sample was further limited to users who signed up and had enough time for the company to measure its key milestones. To ensure reasonable comparisons, the data were limited to users in Australia, Canada, the United Kingdom, and the United States, which were deemed appropriately similar in terms of their linguistic and economic characteristics.
For each user, basic information (age group, gender, and country) was collected from the user’s profile. Then the following characteristics were measured:
- daily_sessions: This is the average number of sessions per day in the first 30 days for each user. One session consists of a period of active use as measured by the company’s database. A user’s daily sessions value is the total number of sessions in the period divided by 30.
- subscribed_30: This measure (TRUE/FALSE) indicates whether the user paid for any subscription service within 30 days.
- active_30: This measure (TRUE/FALSE) indicates whether the user remained active at 30 days. The company decided to measure this by identifying whether the user had at least one active session in the 7-day period after the first 30 days.
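As a rough illustration of how the two session-based measures follow from their definitions, here is a minimal Python sketch. It is not the company’s actual procedure. The input format (one whole-day offset since installation per session), the function name `summarize_user`, and the exact window boundaries (days 0 to 29 for the first 30 days, days 30 to 36 for the follow-up week) are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import List

WINDOW_DAYS = 30     # measurement period from the definition of daily_sessions
FOLLOW_UP_DAYS = 7   # follow-up period from the definition of active_30


@dataclass
class UserMeasures:
    daily_sessions: float
    active_30: bool


def summarize_user(session_days: List[int]) -> UserMeasures:
    """Derive daily_sessions and active_30 for one user.

    `session_days` is a hypothetical input: one entry per session, giving the
    number of whole days since installation (0 = installation day).
    """
    # Sessions in the first 30 days (assumed to be days 0 through 29).
    in_window = [d for d in session_days if 0 <= d < WINDOW_DAYS]
    daily_sessions = len(in_window) / WINDOW_DAYS

    # Active at 30 days: at least one session in the following 7 days
    # (assumed to be days 30 through 36).
    active_30 = any(
        WINDOW_DAYS <= d < WINDOW_DAYS + FOLLOW_UP_DAYS for d in session_days
    )
    return UserMeasures(daily_sessions=daily_sessions, active_30=active_30)


# Made-up user: eight sessions in the first 30 days and one on day 33.
example = summarize_user([0, 0, 1, 3, 3, 3, 10, 29, 33])
print(round(example.daily_sessions, 3), example.active_30)
```

Note that a user who returns only after the follow-up week (for example, on day 40) would be recorded as not active at 30 days under this reading of the definition.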
Instructions
Based upon the information above and the data provided, please answer the following questions. When numeric answers are requested, a few sentences of explanation should also be provided. Please show your code for any calculations performed.
Materials and Data
Templates
Data
Assessment
You will be assessed on the accuracy and thoughtfulness of your responses.
- Questions 1-20: 5 points apiece
Questions
- We are interested in the question of whether female users have higher rates of daily sessions than other users do. What kind of parameter should we select as our metric for each group?
- Use the data to estimate the values of your selected parameter for female users and for other users.
- Does there appear to be an observed difference between the groups? Without performing statistical tests, would you consider this difference to be meaningful for the business? Explain your answer.
- Which statistical test would be appropriate for testing the two groups for differences in their daily sessions according to your selected metric?
- How many samples are included in your selected statistical test?
- How many tails are considered in your selected statistical test?
- Perform your selected statistical test. Report a p-value for the results.
- How would you interpret this finding for the managers of the digital application? Make sure to frame the result in terms that will be meaningful for their work.
- The product’s managers are also interested in the age groups that tend to use the product and how they vary by country. Create a table with the following characteristics:
  - Each row represents an age group.
  - Each column represents a country.
  - Each listed value shows the number of users of that age group within that country.
- Now convert the previous table of counts by age group and country into percentages. However, we want the percentages to be calculated separately within each country. Show the resulting table as percentages (ranging from 0 to 100) rounded to 1 decimal place.
- Without performing any statistical tests, do you think that each country has a similar distribution of users across the age groups? Explain why or why not.
- Which statistical test would help you determine if there are age-based differences across these countries? Explain why you selected this test.
- What is the value of the test statistic for your selected test? Calculate this answer independently without using an existing testing function. (You may use such a function to check your answer.) Show your code along with the result.
- What is the p-value for this test? Calculate this answer independently without using an existing testing function. (You may use such a function to check your answer.) Show your code along with the result.
- How would you interpret this finding for the managers of the digital application? Make sure to frame the result in terms that will be meaningful for their work.
- Canada and the United States are geographically connected and often have overlapping media markets. We can place them in one group and compare them to a second group consisting of Australia and the United Kingdom. Do these two groups have similar rates of users who remain active at 30 days? Perform a statistical test, explain why you selected it, and interpret the results.
- The application’s managers would like to study the relationship between daily sessions and subscriptions. Anecdotally, they think that having at least 1 session per day could be a meaningful indicator. Using the outcome of subscriptions at 30 days, compare the rates of subscriptions for users with at least 1 daily session to those with fewer. Perform a statistical test, explain the reasons for your selection, and interpret the results.
- What type of study was conducted? Are there any concerns about the analyses based upon the method of experimentation?
- How actionable are the findings of this analysis? Do the independent variables help us to make choices about how to improve the outcomes of activity and subscription at 30 days?
- What else could you recommend to the managers of the product for improving their preferred outcomes of activity and subscriptions at 30 days? Provide a number of strategic recommendations that are actionable, measurable, and amenable to experimentation.
Submission
Please turn in the following files:
- Output file showing your code, answers, and explanations (.html file).
- Source code (.Rmd file).
Please do not compress or zip these files.