vignette-G5.Rmd
Before anything else, install the package by running:
devtools::install_github("ptds2021/project--G5")
library(ProjectG5)
The aim of the project was to create a package that lets someone use a Shiny application to find an anime to watch. We started by letting the user apply a few simple filters (age, genres of interest, and free time available); the application then returns a table of all matching anime, ordered by Popularity, which takes into account user scores, how recently the anime was watched, etc.
We then extended the application with two other recommendation methods, taken from Ander Fernández Jauregui’s tutorial and adapted to our data:
An item-based recommendation. This builds a user item matrix and compares anime with one another on that matrix; see the function explanations below for details.
A user-based recommendation. This uses the same user item matrix, but we add rows for the current user’s selections and scores, then look for what similar users liked.
The package ships with some data. You can see how it was created in the Data wrangling article. There are two tables.
The first, anime, contains all the anime we kept, along with many variables; see the reference section for the full list. To access it, use:
anime <- ProjectG5::anime
The second, anime_with_ratings, contains all the anime we kept together with their ratings from all the users we kept; see the reference section for the full list. To access it, use:
anime_with_ratings <- ProjectG5::anime_with_ratings
To launch the application from R, simply run:
ProjectG5::anime_finder()
We will go through the functions one tab at a time, to make it easier to understand what is happening.
Let’s first look at the function used in our first, simple recommendation tab: newcommer_recom(). This function uses the user’s selections from the application to filter the table and returns all anime that match. Note that an anime is kept when at least one of the selected genres appears in its Genders column. Here we take the example of a 15-year-old who likes sport and has 30 minutes available to watch something:
ProjectG5::newcommer_recom(anime, 15, "Sport", 30)
#> # A tibble: 315 x 7
#> Name Genders Type Episodes Duration Rating Popularity
#> <chr> <chr> <chr> <chr> <int> <chr> <dbl>
#> 1 Haikyu!! Comedy, ~ TV 25 24 PG-13~ 50
#> 2 Kuroko's Basketball Comedy, ~ TV 25 24 PG-13~ 104
#> 3 Kuroko's Basketball 2 Comedy, ~ TV 25 24 PG-13~ 185
#> 4 Free! - Iwatobi Swim Club Slice of~ TV 12 24 PG-13~ 189
#> 5 Yuri!!! On ICE Comedy, ~ TV 12 23 PG-13~ 191
#> 6 Kuroko's Basketball 3 Comedy, ~ TV 25 24 PG-13~ 232
#> 7 Haikyuu!!: To the Top Comedy, ~ TV 13 24 PG-13~ 276
#> 8 Fighting Spirit Comedy, ~ TV 75 23 PG-13~ 345
#> 9 Chihayafuru Drama, G~ TV 25 22 PG-13~ 389
#> 10 Free! - Eternal Summer Slice of~ TV 13 24 PG-13~ 450
#> # ... with 305 more rows
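To make the filtering logic more concrete, here is a minimal sketch in base R. It is not the code of newcommer_recom(): the toy data, the filter_sketch() helper, the MinAge column, and the rule that an episode's duration must fit in the free time are all assumptions made for illustration.

```r
# Toy data. Column names mirror the output above, except MinAge,
# which is a hypothetical helper column standing in for the age rating.
toy_anime <- data.frame(
  Name       = c("Anime A", "Anime B", "Anime C", "Anime D", "Anime E"),
  Genders    = c("Comedy, Sports", "Drama, Romance", "Sports, School",
                 "Action, Sports", "Sports, Comedy"),
  Duration   = c(24, 23, 45, 24, 12),
  MinAge     = c(13, 13, 13, 17, 7),
  Popularity = c(104, 20, 10, 50, 30),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep rows matching the user's age, genres and free time.
filter_sketch <- function(data, age, genres, freetime) {
  # An anime is kept when at least one selected genre appears in Genders.
  pattern <- paste(genres, collapse = "|")
  keep <- grepl(pattern, data$Genders, ignore.case = TRUE) &
    data$Duration <= freetime &
    data$MinAge <= age
  out <- data[keep, , drop = FALSE]
  # Smaller Popularity values come first, as in the output above.
  out[order(out$Popularity), , drop = FALSE]
}

result <- filter_sketch(toy_anime, age = 15, genres = "Sport", freetime = 30)
result$Name
```

Because the genres are pasted into a regular expression, this sketch would need escaping for genre names that contain special characters.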
For the item-based recommendation, we first need to compute the user item matrix from the anime_with_ratings data. Here there is no need to add rows:
item_matrix_1 <- ProjectG5::user_item_matrix(anime_with_ratings)
Let’s now pretend that the user selected the anime Naruto:
selected_anime <- "Naruto"
Now that we have everything we need, let’s run the function, which uses the cos_similarity() function to compute the similarity between the selected anime and all the other ones:
ProjectG5::item_recommendation(selected_item_name = selected_anime,
user_item_matrix = item_matrix_1,
n_recommendation = 5,
data = anime)
#> # A tibble: 5 x 5
#> item_id Name Episodes Duration similarity
#> <chr> <chr> <chr> <int> <dbl>
#> 1 ID1735 Naruto:Shippuden 500 23 0.911
#> 2 ID269 Bleach 366 24 0.876
#> 3 ID1535 Death Note 37 23 0.862
#> 4 ID16498 Attack on Titan 25 24 0.848
#> 5 ID11757 Sword Art Online 25 23 0.843
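The idea behind this step can be sketched in a few lines of base R. This is an illustration under stated assumptions, not the package's cos_similarity() or item_recommendation(): the ratings are invented, cosine_sketch() and item_sketch() are hypothetical helpers, and restricting the computation to users who rated both anime is a choice made for this sketch.

```r
# Toy user item matrix: rows are users, columns are anime, NA means "not rated".
# All ratings are invented for illustration.
ratings <- matrix(
  c(10,  9, NA,
     8,  8,  2,
    NA,  7,  3,
     9, NA,  1),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("user", 1:4), c("Naruto", "Bleach", "Other"))
)

# Cosine similarity between two rating vectors, using only the
# positions where both are present (an assumption of this sketch).
cosine_sketch <- function(a, b) {
  both <- !is.na(a) & !is.na(b)
  if (!any(both)) return(NA_real_)
  sum(a[both] * b[both]) / (sqrt(sum(a[both]^2)) * sqrt(sum(b[both]^2)))
}

# Compare the selected anime's column with every other column
# and keep the n most similar ones.
item_sketch <- function(selected, m, n = 2) {
  others <- setdiff(colnames(m), selected)
  sims <- vapply(others,
                 function(item) cosine_sketch(m[, selected], m[, item]),
                 numeric(1))
  head(sort(sims, decreasing = TRUE), n)
}

top_items <- item_sketch("Naruto", ratings, n = 2)
top_items
```

With raw positive ratings, cosine similarities tend to be close to 1, so the ranking matters more than the absolute values.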
For the user-based recommendation, we also need to compute the user item matrix from the anime_with_ratings data, but this time we need to add rows.
To do so, let’s look at the selectize_count() and selectize_names() functions. For this, let’s pretend the user selected the anime Naruto and Death Note.
selectize <- c("Naruto", "Death Note")
count <- ProjectG5::selectize_count(selectize)
names_select <- ProjectG5::selectize_names(selectize)
count
#> [1] 2
names_select
#> [1] "Naruto" "Death Note"
As you can see, these functions are very simple, but since we use them more than once in the Shiny application it was useful to have them as separate functions. Using their results, we create numeric input boxes so that the user can enter their own scores in the app. This is done with the create_numeric_input() function:
boxes <- ProjectG5::create_numeric_input(selectednames = names_select,
selectedcounts = count,
min_user = 1,
max_user = 10,
placeholder = 5,
wanted_step = 0.5,
id = "weights")
boxes
As you can see, this creates two boxes, since two anime were selected. To recover the scores, we then use the score_recovery() function as follows. Note that since we are not in the application, the input parameter does not exist, so this will not recover the weights and we have to choose them arbitrarily:
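# Inside the Shiny application, where `input` exists:
weights <- ProjectG5::score_recovery(selectedcounts = count, input = input, id = "weights")
# Outside the application, we set the scores by hand:
weight_to_use <- c(10, 5)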
With all of this, we create a table and add a user id that is deliberately very high, to avoid conflicts with existing ids:
temp_tibble <- tibble::tibble(Name = names_select, rating = weight_to_use)
anime_selected <- dplyr::left_join(anime, temp_tibble, by = c("Name" = "Name"))
anime_selected <- dplyr::filter(anime_selected, Name %in% names_select)
anime_selected <- dplyr::mutate(anime_selected, user_id = 999999999)
We can now compute the user item matrix using user_item_matrix() again, but this time we add rows from anime_selected:
user_item_2 <- ProjectG5::user_item_matrix(anime_with_ratings,
adding_row = TRUE,
row_data = anime_selected)
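As a rough picture of what "adding rows" means here, the sketch below pivots a long ratings table into a user item matrix in base R, after appending the new user's ratings. It is an assumption-laden illustration rather than the code of user_item_matrix(): the column names, the invented ratings, and the matrix_sketch() helper are all hypothetical.

```r
# Long-format ratings, one row per (user, anime) pair. Values are invented.
long <- data.frame(
  user_id = c(1, 1, 2, 2, 3),
  Name    = c("Naruto", "Death Note", "Naruto", "Bleach", "Death Note"),
  rating  = c(9, 8, 7, 6, 10),
  stringsAsFactors = FALSE
)

# Ratings entered by the new user, tagged with a deliberately large id.
new_rows <- data.frame(
  user_id = 999999999,
  Name    = c("Naruto", "Death Note"),
  rating  = c(10, 5),
  stringsAsFactors = FALSE
)

# Hypothetical helper: optionally append extra rows, then pivot to users x anime.
matrix_sketch <- function(ratings, extra = NULL) {
  if (!is.null(extra)) ratings <- rbind(ratings, extra)
  # tapply() builds the users x anime table; unrated pairs become NA.
  tapply(ratings$rating, list(ratings$user_id, ratings$Name), mean)
}

m <- matrix_sketch(long, new_rows)
dim(m)
```

The new user ends up as one extra row of the matrix, which is what allows them to be compared with existing users afterwards.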
Finally, we run the user_based_recom() function, which uses the cos_similarity() function to find similar users:
ProjectG5::user_based_recom(userid = 999999999,
user_item_matrix = user_item_2,
ratings_data = anime_with_ratings,
n_recommendation = 5,
threshold = 1,
nearest_neighbors = 10)
#> # A tibble: 5 x 6
#> # Groups: item_id, Name, Episodes [5]
#> item_id Name Episodes Duration count rating
#> <chr> <chr> <chr> <int> <int> <dbl>
#> 1 ID6 Trigun 26 24 3 10
#> 2 ID2001 Gurren Lagann 27 24 2 10
#> 3 ID47 AKIRA 1 124 4 9.5
#> 4 ID1 Cowboy Bebop 26 24 2 9.5
#> 5 ID10083 Shiki Specials 2 23 2 9.5
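The general shape of a user-based recommendation can be sketched as follows. This is a simplified illustration, not the code of user_based_recom(): the ratings are invented, cosine_sketch() and user_sketch() are hypothetical helpers, and the sketch leaves out the threshold and count logic visible in the call and output above.

```r
# Toy user item matrix; the last row plays the role of the newly added user.
# All ratings are invented for illustration.
m <- matrix(
  c(10,  5,  9, NA,
     9,  6, NA,  8,
     2, 10,  3,  1,
    10,  5, NA, NA),
  nrow = 4, byrow = TRUE,
  dimnames = list(c("u1", "u2", "u3", "new"),
                  c("Naruto", "Death Note", "Trigun", "AKIRA"))
)

# Cosine similarity over the positions rated by both users
# (an assumption of this sketch).
cosine_sketch <- function(a, b) {
  both <- !is.na(a) & !is.na(b)
  if (!any(both)) return(NA_real_)
  sum(a[both] * b[both]) / (sqrt(sum(a[both]^2)) * sqrt(sum(b[both]^2)))
}

user_sketch <- function(user, m, k = 2, n = 2) {
  # 1. Similarity between the target user and every other user.
  others <- setdiff(rownames(m), user)
  sims <- vapply(others,
                 function(u) cosine_sketch(m[user, ], m[u, ]),
                 numeric(1))
  # 2. Keep the k nearest neighbours.
  neighbours <- names(head(sort(sims, decreasing = TRUE), k))
  # 3. Candidates are the anime the target user has not rated.
  unseen <- colnames(m)[is.na(m[user, ])]
  # 4. Score each candidate by the neighbours' mean rating.
  scores <- colMeans(m[neighbours, unseen, drop = FALSE], na.rm = TRUE)
  head(sort(scores, decreasing = TRUE), n)
}

recs <- user_sketch("new", m, k = 2, n = 2)
recs
```

Here u1 and u2 rate Naruto and Death Note much like the new user does, so their ratings for the unseen anime drive the recommendation, while u3's opposite tastes are ignored.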