1. Vignette ProjectG5

Before anything else, install the package by running:

devtools::install_github("ptds2021/project--G5")

library(ProjectG5)

Welcome to our project about anime!

The aim of the project was to create a package that lets someone use a Shiny application to find an anime to watch. We started by letting the user apply some simple filters based on their age, genres of interest and free time available. The application then returns a table of all matching anime, ordered by Popularity, which takes into account user scores, how recently the anime was watched, etc.

We then extended the application with two other recommendation methods, taken from Ander Fernández Jauregui’s tutorial and adapted to our data:

An Item Based Recommendation. This is done by creating a user item matrix and comparing anime on this matrix; see the function explanations below for details.
A User Based Recommendation, which uses the same user item matrix, but this time we add rows based on the user’s choices and scores and look for what similar users liked.

Data from the package

The package ships with some data. You can see how it was created in the Data wrangling article. There are two tables:

Anime

A data set of all the anime we kept, with many variables; see the reference section for the full description. To access it, use:

anime <- ProjectG5::anime

Anime with ratings

A data set of all the anime we kept, together with their ratings from all the users we kept; see the reference section for the full description. To access it, use:

anime_with_ratings <- ProjectG5::anime_with_ratings

Launching the application through R

To launch the application when you are in R, simply run:

ProjectG5::anime_finder()

Function walkthrough

We will go through the functions tab by tab, to make it easier to understand what is happening.

Basic tab (I am new)

Let’s first look at the function used in our first, simple recommendation tab: newcommer_recom(). This function uses the user’s selections in the application to filter the table and return all anime that match them. Note that an anime is selected when at least one of the selected genres (the Genders column) appears in it. Here we take the example of a 15-year-old who likes sport and has 30 minutes available to watch something. The function will:

Use the age to keep only the appropriate age Rating classes
Among those, select all that list Sport as a genre
Finally, keep only the ones that are 30 minutes or shorter per episode.

ProjectG5::newcommer_recom(anime, 15, "Sport", 30)
#> # A tibble: 315 x 7
#>    Name                      Genders   Type  Episodes Duration Rating Popularity
#>    <chr>                     <chr>     <chr> <chr>       <int> <chr>       <dbl>
#>  1 Haikyu!!                  Comedy, ~ TV    25             24 PG-13~         50
#>  2 Kuroko's Basketball       Comedy, ~ TV    25             24 PG-13~        104
#>  3 Kuroko's Basketball 2     Comedy, ~ TV    25             24 PG-13~        185
#>  4 Free! - Iwatobi Swim Club Slice of~ TV    12             24 PG-13~        189
#>  5 Yuri!!! On ICE            Comedy, ~ TV    12             23 PG-13~        191
#>  6 Kuroko's Basketball 3     Comedy, ~ TV    25             24 PG-13~        232
#>  7 Haikyuu!!: To the Top     Comedy, ~ TV    13             24 PG-13~        276
#>  8 Fighting Spirit           Comedy, ~ TV    75             23 PG-13~        345
#>  9 Chihayafuru               Drama, G~ TV    25             22 PG-13~        389
#> 10 Free! - Eternal Summer    Slice of~ TV    13             24 PG-13~        450
#> # ... with 305 more rows
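The three filtering steps described above can be sketched in base R as follows. This is only an illustration under stated assumptions, not the package’s implementation: the toy data, the allowed_ratings() helper and its age thresholds, and the name newcomer_sketch() are all hypothetical.

```r
# Toy data: column names mirror the output above, values are made up.
toy_anime <- data.frame(
  Name       = c("A", "B", "C", "D", "E"),
  Genders    = c("Comedy, Sport", "Drama, Sport", "Sport, School",
                 "Action", "Comedy, Sport"),
  Duration   = c(24L, 45L, 23L, 24L, 23L),
  Rating     = c("PG-13", "PG-13", "R-17+", "G", "PG"),
  Popularity = c(104, 50, 10, 5, 50),
  stringsAsFactors = FALSE
)

# Hypothetical age-to-rating mapping (an assumption, not the package's rule).
allowed_ratings <- function(age) {
  if (age >= 17) {
    c("G", "PG", "PG-13", "R-17+")
  } else if (age >= 13) {
    c("G", "PG", "PG-13")
  } else {
    c("G", "PG")
  }
}

newcomer_sketch <- function(data, age, genres, freetime) {
  # 1. Keep only age-appropriate rating classes.
  keep_rating <- data$Rating %in% allowed_ratings(age)
  # 2. Keep anime where at least one selected genre appears.
  keep_genre <- vapply(
    strsplit(data$Genders, ",\\s*"),
    function(g) any(g %in% genres),
    logical(1)
  )
  # 3. Keep anime whose episodes fit in the available time.
  keep_time <- data$Duration <= freetime
  out <- data[keep_rating & keep_genre & keep_time, , drop = FALSE]
  # Smaller Popularity values come first, as in the output above.
  out[order(out$Popularity), , drop = FALSE]
}

result <- newcomer_sketch(toy_anime, 15, "Sport", 30)
```

In this toy example, B is dropped for its duration, C for its rating class and D for its genre, so only E and A remain.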

Item based recommendation tab (I am an expert)

For this recommendation, we first need to compute the user item matrix from the anime_with_ratings data. In this case there is no need to add rows:

item_matrix_1 <- ProjectG5::user_item_matrix(anime_with_ratings)
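To give an idea of what a user item matrix looks like, here is a minimal base R sketch that pivots long-format ratings into a users-by-items matrix. It assumes the ratings have columns user_id, item_id and rating; the helper to_user_item() and the toy values are hypothetical, and user_item_matrix() may well be implemented differently.

```r
# Toy long-format ratings (made-up values).
toy_ratings <- data.frame(
  user_id = c(1, 1, 2, 2, 3),
  item_id = c("ID1", "ID6", "ID1", "ID47", "ID6"),
  rating  = c(9, 8, 7, 10, 6),
  stringsAsFactors = FALSE
)

# Pivot long ratings into a users-by-items matrix; unrated pairs stay NA.
to_user_item <- function(ratings) {
  users <- sort(unique(ratings$user_id))
  items <- sort(unique(ratings$item_id))
  m <- matrix(
    NA_real_,
    nrow = length(users), ncol = length(items),
    dimnames = list(as.character(users), items)
  )
  # A two-column character matrix indexes cells by (row name, column name).
  m[cbind(as.character(ratings$user_id), ratings$item_id)] <- ratings$rating
  m
}

m <- to_user_item(toy_ratings)

# Adding a row for a new user amounts to appending their ratings first.
new_user <- data.frame(user_id = 999999999, item_id = "ID1", rating = 10)
m2 <- to_user_item(rbind(toy_ratings, new_user))
```

Each row is a user and each column an anime, so comparing columns compares anime and comparing rows compares users.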

Let’s now pretend that the user selected the anime Naruto:

selected_anime <- "Naruto"

Now that we have everything we need, let’s run the function, which will:

Find the anime in the matrix
Keep only the users who scored it
Use the cos_similarity() function to compute the similarity between the selected anime and all the other ones
Output only the 5 most similar anime

ProjectG5::item_recommendation(selected_item_name = selected_anime,
                               user_item_matrix = item_matrix_1,
                               n_recommendation = 5,
                               data = anime)
#> # A tibble: 5 x 5
#>   item_id Name             Episodes Duration similarity
#>   <chr>   <chr>            <chr>       <int>      <dbl>
#> 1 ID1735  Naruto:Shippuden 500            23      0.911
#> 2 ID269   Bleach           366            24      0.876
#> 3 ID1535  Death Note       37             23      0.862
#> 4 ID16498 Attack on Titan  25             24      0.848
#> 5 ID11757 Sword Art Online 25             23      0.843
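The steps listed above can be illustrated with a small base R sketch. The toy matrix and the helpers cos_sim() and item_sketch() are hypothetical, and computing the similarity only over users who rated both anime is an assumption; the package’s cos_similarity() may handle missing ratings differently.

```r
# Cosine similarity of two rating vectors, using only the positions rated in
# both (assumption: the package may treat missing values differently).
cos_sim <- function(a, b) {
  ok <- !is.na(a) & !is.na(b)
  if (!any(ok)) return(NA_real_)
  sum(a[ok] * b[ok]) / (sqrt(sum(a[ok]^2)) * sqrt(sum(b[ok]^2)))
}

# Toy user item matrix: rows are users, columns are anime (made-up values).
toy_matrix <- matrix(
  c(10,  9, NA,  2,
     8,  8,  3, NA,
    NA,  5,  9,  9,
     9, 10,  1,  1),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("u", 1:4), c("A", "B", "C", "D"))
)

item_sketch <- function(selected, m, n) {
  # Keep only the users who scored the selected anime.
  sub <- m[!is.na(m[, selected]), , drop = FALSE]
  others <- setdiff(colnames(sub), selected)
  # Similarity between the selected anime and every other one.
  sims <- vapply(
    others,
    function(j) cos_sim(sub[, selected], sub[, j]),
    numeric(1)
  )
  # Keep the n most similar anime (sort() drops NA similarities).
  head(sort(sims, decreasing = TRUE), n)
}

top <- item_sketch("A", toy_matrix, n = 2)
```

Here B and D come out as the two anime most similar to A, because the users who rated A gave them proportionally similar scores.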

User based recommendation tab (I am a judging expert)

For this recommendation, we again need to compute the user item matrix from the anime_with_ratings data, but this time we need to add rows. To do so, let’s look at both the selectize_count() and the selectize_names() functions, pretending that the user selected the anime Naruto and Death Note:

selectize <- c("Naruto", "Death Note")

count <- ProjectG5::selectize_count(selectize)

names_select <- ProjectG5::selectize_names(selectize)

count
#> [1] 2

names_select
#> [1] "Naruto"     "Death Note"

As you can see, these functions are quite simple, but since we use them more than once in the Shiny application it was useful to have them as separate functions. Using those results, we create numeric input boxes so that the user can enter their own scores in the app. This is done with the function create_numeric_input():

boxes <- ProjectG5::create_numeric_input(selectednames = names_select,
                              selectedcounts = count,
                              min_user = 1,
                              max_user = 10,
                              placeholder = 5,
                              wanted_step = 0.5,
                              id = "weights")
  
boxes

Naruto Score

Death Note Score

As you can see, this creates two boxes, since two anime were selected. To recover the scores we then use the score_recovery() function as follows. Note that since we are not in the application, the input parameter does not exist and the call cannot recover the weights, so we have to choose them arbitrarily:

# Only works inside the Shiny application, where `input` exists
weights <- ProjectG5::score_recovery(selectedcounts = count, input = input, id = "weights")

weight_to_use <- c(10,5)

Then, with all this, we create a table to which we add a user ID that is deliberately very high, to avoid conflicts with other IDs:

temp_tibble <- tibble::tibble(Name = names_select, rating = weight_to_use)

anime_selected <- dplyr::left_join(anime, temp_tibble, by = c("Name" = "Name"))
anime_selected <- dplyr::filter(anime_selected, Name %in% names_select)
anime_selected <- dplyr::mutate(anime_selected, user_id = 999999999)

We can now compute the user item matrix using user_item_matrix() again, but this time we add rows from anime_selected:

user_item_2 <- ProjectG5::user_item_matrix(anime_with_ratings,
                                           adding_row = TRUE,
                                           row_data = anime_selected)

Finally, we run the user_based_recom() function, which will:

Select the newly added row with the user’s scores
Compute its similarity to other users with the cos_similarity() function
Select the most similar users, depending on the number of nearest neighbors wanted
Keep only the anime with at least one score from a similar user
Select the 5 anime that were rated highest by similar users

ProjectG5::user_based_recom(userid = 999999999,
                            user_item_matrix = user_item_2,
                            ratings_data = anime_with_ratings,
                            n_recommendation = 5,
                            threshold = 1,
                            nearest_neighbors = 10)
#> # A tibble: 5 x 6
#> # Groups:   item_id, Name, Episodes [5]
#>   item_id Name           Episodes Duration count rating
#>   <chr>   <chr>          <chr>       <int> <int>  <dbl>
#> 1 ID6     Trigun         26             24     3   10  
#> 2 ID2001  Gurren Lagann  27             24     2   10  
#> 3 ID47    AKIRA          1             124     4    9.5
#> 4 ID1     Cowboy Bebop   26             24     2    9.5
#> 5 ID10083 Shiki Specials 2              23     2    9.5
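As an illustration of the user based steps listed above, here is a minimal base R sketch on a toy matrix. The helpers cos_sim() and user_sketch() are hypothetical. Three choices are assumptions rather than facts about user_based_recom(): the similarity uses only anime rated by both users, anime already rated by the target user are excluded, and the final score is the mean rating among the nearest neighbors.

```r
# Cosine similarity over positions rated in both vectors (an assumption).
cos_sim <- function(a, b) {
  ok <- !is.na(a) & !is.na(b)
  if (!any(ok)) return(NA_real_)
  sum(a[ok] * b[ok]) / (sqrt(sum(a[ok]^2)) * sqrt(sum(b[ok]^2)))
}

# Toy user item matrix: "new" plays the role of the added user row.
toy_matrix <- matrix(
  c(10,  5, NA, NA,
     9,  5,  8, 10,
    10,  4, NA,  9,
     2, 10,  3, NA),
  nrow = 4, byrow = TRUE,
  dimnames = list(c("new", "u1", "u2", "u3"), c("A", "B", "C", "D"))
)

user_sketch <- function(m, user, k, threshold, n) {
  # Similarity between the target user and every other user.
  others <- setdiff(rownames(m), user)
  sims <- vapply(others, function(u) cos_sim(m[user, ], m[u, ]), numeric(1))
  # Keep the k most similar users.
  neighbours <- names(head(sort(sims, decreasing = TRUE), k))
  # Only consider anime the target user has not rated (assumption).
  nb <- m[neighbours, is.na(m[user, ]), drop = FALSE]
  out <- data.frame(
    item_id = colnames(nb),
    count   = colSums(!is.na(nb)),         # number of neighbours who rated it
    rating  = colMeans(nb, na.rm = TRUE),  # their mean rating
    row.names = NULL,
    stringsAsFactors = FALSE
  )
  # Require at least `threshold` neighbour ratings, then rank by mean rating.
  out <- out[out$count >= threshold, , drop = FALSE]
  head(out[order(-out$rating), , drop = FALSE], n)
}

rec <- user_sketch(toy_matrix, "new", k = 2, threshold = 1, n = 2)
```

In this toy example u1 and u2 are the two nearest neighbors of the new user, and D ranks above C because its mean rating among them is higher.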