presentation

| | |

Home Page

If you are a FIFA 19 player as me, you’ll like this article. Here we are going to use the kaggle dataset FIFA 19 complete player dataset to find free cost best players for your career mode. To do this, we are going to find the mainly attributes for each player position and then compare the best players with the players with no club to find the bests.


Let’s begin importing the packages that we are going to use: datatable and ggplot2. Then let’s read the data using the fread function provided from the datatable package. Since the file is already in a regular csv format, it’s not necessary to use any other argument.

%% importing packages
   import(data.table)
   import(ggplot2)
%% readind data
   dt <- fread(input = "data.csv")

Let’s start doing some data cleaning.First, let’s remove some columns that are not important for our analysis.

%% - removing some columns
  dt <- dt[, !c("V1", "Photo", "Flag", "Club Logo", "Special", "Body Type", "Real Face", "Jersey Number", "Joined")]

Then, let’s deal with the money columns: Value and Wage. Using the head function, we can see there are some symbols on these columns. There’s the current symbol '€' and the letters 'M' and 'K' indicating the magnitude of the number.

   head(dt[, c("Value", "Wage")])
     Value  Wage
1: €110.5M €565K
2:    €77M €405K
3: €118.5M €290K
4:    €72M €260K
5:   €102M €355K
6:    €93M €340K

Let’s correct these columns to create numeric columns. We are going to use the gsub function to replace the '€' symbol for nothing in both columns.

%% - adjusting money columns
  # - replacing current symbol
  dt[, Value := gsub("€", "", Value)]
  dt[, Wage  := gsub("€", "", Wage )]

To do the magnitude transformation we are going to use de mapply function. The mapply is a function that receive another function, pass one or more vectors as a parameter to that function and return another list with results. First, let’s define our function.

%% - function to transform money columns
  x <- function(Value){
    if (grepl("M", Value)){
      real_value <- as.numeric(gsub("M", "", Value))*1000000
    } else if (grepl("K", Value)){
      real_value <- as.numeric(gsub("K", "", Value))*1000
    } else {
      real_value <- as.numeric(Value)
    }
    return(real_value)
  }

Now, let’s use mapply passing first the Value column and after the Wage column as parameters. Then, we use the result of each mapply to create new columns in the dataset.

%% - using mapply
  real_values <- mapply(x, dt[, Value])
  real_wage   <- mapply(x, dt[, Wage ])
%% - creating new columns
  dt <- cbind(dt, real_values)
  dt <- cbind(dt, real_wage)

Ok, now it’s time to start the analysis. In FIFA, we have a lot of player positions. Using the unique function we can see all of them.

%% -
  unique(dt[, Position])
 [1] "RF"  "ST"  "LW"  "GK"  "RCM" "LF"  "RS"  "RCB" "LCM" "CB"  "LDM" "CAM" "CDM" "LS"  "LCB" "RM"  "LAM" "LM"  "LB"  "RDM" "RW"
[22] "CM"  "RB"  "RAM" "CF"  "RWB" "LWB" ""

These positions are very specific. To create a better analysis, let’s aggregate these functions in more general functions: - GK: For goalkeepers - DEF: To defenders - MID-DEF: To defenders midfielders - MID-ATA: To attackers midfielders - ATA-LAT: To lateral attackers - ATA: To central attackers

To do that, we are going to create a new datatable containing the "transformation". Then, we are going to use the merge function to join the two datatables and give a general position to each player.

%% - creating new datatable
  dt.pos <- data.table("Position" = unique(dt[, Position]))
  dt.pos[, Gen_Position := ""]
dt.pos[Position %in% c("GK"), Gen_Position := "GK"]
dt.pos[Position %in% c("RB", "CB", "LB", "SW", "RWB", "LWB", "RCB", "LCB"), Gen_Position := "DEF"]
dt.pos[Position %in% c("CDM", "CM", "RCM", "LCM", "LDM", "RDM"), Gen_Position := "MID-DEF"]
dt.pos[Position %in% c("LOM", "ROM", "LM", "RM", "LWM", "RWM", "RAM", "LAM", "CAM", "OM"), Gen_Position := "MID-ATA"]
dt.pos[Position %in% c("RW", "LW", "LF", "RF"), Gen_Position := "ATA-LAT"]
dt.pos[Position %in% c("ST", "CF", "RS", "LS"), Gen_Position := "ATA"]
%% - joining the two datatables
  dt <- merge(dt, dt.pos, "Position")

Now we have a dataset containing all the players, values and wages and a general position. Let’s go the attributes. In FIFA we have 34 attributes that defines how good is each player. The players also have an overall that indicates in one number the player level. Let’s create a vector containing all attributes.

%% - attributes
  att <- c("Crossing", "Finishing", "HeadingAccuracy", "ShortPassing", "Volleys", "Dribbling", "Curve", "FKAccuracy", "LongPassing",         "BallControl", "Acceleration", "SprintSpeed", "Agility", "Reactions", "Balance", "ShotPower", "Jumping", "Stamina", "Strength", "LongShots", "Aggression", "Interceptions", "Positioning","Vision","Penalties","Composure", "Marking", "StandingTackle", "SlidingTackle", "GKDiving", "GKHandling", "GKKicking", "GKPositioning", "GKReflexes")

Of course there are attributes that are very important for some position and less important for others. What we are going to do is find, for each general position, which attributes are more important. To do that, we are going to use the top-10 players of each position to calculate the average of each attribute and find the top five.

Let’s use the setorder function to order the datable based on the Gen_Position and Overall columns. Then, we are going to use the .SD trick of the datatable package to retrieve the top ten by position. The .SD is a abbreviation of SubDatatable, and it represent all the subset of datatable defined using the command by = Gen_Position.

  %% - topten of each new position
  setorder(dt, Gen_Position, -Overall)
  dt.topten <- dt[, .SD[c(1:10), c("Name", ..att)], by = Gen_Position]

Now, we can calculate the average of all the attributes for each Gen_Position.

  %% - mean of the attributes of each position
  dt.att <- dt.topten[, cbind("Attribute_Name"  = names(.SD[, c(3:35)]),
                              data.table("Attribute_Value" = colMeans(.SD[, 3:35]))), by = Gen_Position]

Finding the top five of attributes of each position.

  %%
  setorder(dt.att, Gen_Position,-Attribute_Value)
  dt.att_best <- dt.att[, .SD[c(1:5)], Gen_Position]

Let’s take a look on the results.

Gen_Position Attribute_Name Attribute_Value
 1:          ATA    Positioning            91.4
 2:          ATA      Finishing            91.0
 3:          ATA      Reactions            89.5
 4:          ATA      ShotPower            88.2
 5:          ATA      Composure            86.7
 6:      ATA-LAT      Dribbling            92.4
 7:      ATA-LAT    BallControl            92.3
 8:      ATA-LAT        Agility            90.7
 9:      ATA-LAT        Balance            89.1
10:      ATA-LAT         Vision            88.1
11:          DEF StandingTackle            88.6
12:          DEF  Interceptions            88.0
13:          DEF  SlidingTackle            87.4
14:          DEF        Marking            86.0
15:          DEF        Jumping            84.3
16:           GK     GKReflexes            88.9
17:           GK       GKDiving            87.6
18:           GK  GKPositioning            86.0
19:           GK     GKHandling            85.7
20:           GK      Reactions            84.6
21:      MID-ATA    BallControl            87.8
22:      MID-ATA      Reactions            87.5
23:      MID-ATA    Positioning            87.1
24:      MID-ATA      Composure            85.7
25:      MID-ATA      Finishing            85.0
26:      MID-DEF   ShortPassing            89.1
27:      MID-DEF    BallControl            87.8
28:      MID-DEF      Reactions            87.0
29:      MID-DEF    LongPassing            86.7
30:      MID-DEF         Vision            86.6

Using this results we can say, for example, that for a lateral attacker is more important the dribble while for a pure attacker positioning and finishing are the most importants.

Ok, now lets find some players for your FIFA career. Thinking first about money, let’s look to the player who doesn’t have a club and you can sign for free.

  %% - Players without team (Free)
  dt.noteam <- dt[Club == "", ]
  dt.noteam <- dt.noteam[Age <= 25, ]

We have now all the player under-25 with no club in FIFA 19. Let’s find the bests for each position based on the attributes we find. We have to create datatables for each position. Then, we get only the five main attributes for each one and finally we calculate the average of these attributes. Finally we pick the top five based on this average.

  %%GK
  a <- dt.att_best[Gen_Position == "GK", Attribute_Name]
  dt.gk <- dt.noteam[Gen_Position == "GK", c("Name", ..a)]
  dt.gk[, Rate := rowMeans(dt.gk[, c(2:5)])]
  setorder(dt.gk, -Rate)
  dt.gk <- dt.gk[c(1:5)]
%% DEF
a <- dt.att_best[Gen_Position == "DEF", Attribute_Name]
dt.def <- dt.noteam[Gen_Position == "DEF", c("Name", ..a)]
dt.def[, Rate := rowMeans(dt.def[, c(2:5)])]
setorder(dt.def, -Rate)
dt.def <- dt.def[c(1:5)]
%% MID-DEF
a <- dt.att_best[Gen_Position == "MID-DEF", Attribute_Name]
dt.mid_def <- dt.noteam[Gen_Position == "MID-DEF", c("Name", ..a)]
dt.mid_def[, Rate := rowMeans(dt.mid_def[, c(2:5)])]
setorder(dt.mid_def, -Rate)
dt.mid_def <- dt.mid_def[c(1:5)]
%% MID-ATA
a <- dt.att_best[Gen_Position == "MID-ATA", Attribute_Name]
dt.mid_ata <- dt.noteam[Gen_Position == "MID-ATA", c("Name", ..a)]
dt.mid_ata[, Rate := rowMeans(dt.mid_ata[, c(2:5)])]
setorder(dt.mid_ata, -Rate)
dt.mid_ata <- dt.mid_ata[c(1:5)]
%% ATA-LAT
a <- dt.att_best[Gen_Position == "ATA-LAT", Attribute_Name]
dt.ata_lat <- dt.noteam[Gen_Position == "ATA-LAT", c("Name", ..a)]
dt.ata_lat[, Rate := rowMeans(dt.ata_lat[, c(2:5)])]
setorder(dt.ata_lat, -Rate)
dt.ata_lat <- dt.ata_lat[c(1:5)]
%% ATA
a <- dt.att_best[Gen_Position == "ATA", Attribute_Name]
dt.ata <- dt.noteam[Gen_Position == "ATA", c("Name", ..a)]
dt.ata[, Rate := rowMeans(dt.ata[, c(2:5)])]
setorder(dt.ata, -Rate)
dt.ata <- dt.ata[c(1:5)]

We have now the top-five Under-25 players with no club in FIFA 19. Let’s compare all these player with some of the best Under-25 players in the world and see how good are these players. we are going to use the same code, changing the datatable dt.noteam to the datatable dt[Age ⇐ 25, ] to include all U-25 players. Then, we are going to get the top 20 and compute the average of the same attributes and include the results in the datatable of each position as "U25 Average Player".

%% --
  dt.young <- dt[Age <= 25, ]
%% GK
a <- dt.att_best[Gen_Position == "GK", Attribute_Name]
dt.gk_all <- dt.young[Gen_Position == "GK", c("Name", ..a)]
dt.gk_all[, Rate := rowMeans(dt.gk_all[, c(2:5)])]
setorder(dt.gk_all, -Rate)
dt.gk_all <- dt.gk_all[c(1:20)]
dt.gk_plot <- rbind(dt.gk, t(c(Name = "Best U25 Average", colMeans(dt.gk_all[, c(2:7)]))))
%% DEF
a <- dt.att_best[Gen_Position == "DEF", Attribute_Name]
dt.def_all <- dt.young[Gen_Position == "DEF", c("Name", ..a)]
dt.def_all[, Rate := rowMeans(dt.def_all[, c(2:5)])]
setorder(dt.def_all, -Rate)
dt.def_all <- dt.def_all[c(1:20)]
dt.def_plot <- rbind(dt.def, t(c(Name = "Best U25 Average", colMeans(dt.def_all[, c(2:7)]))))
%% MID-DEF
a <- dt.att_best[Gen_Position == "MID-DEF", Attribute_Name]
dt.mid_def_all <- dt.young[Gen_Position == "MID-DEF", c("Name", ..a)]
dt.mid_def_all[, Rate := rowMeans(dt.mid_def_all[, c(2:5)])]
setorder(dt.mid_def_all, -Rate)
dt.mid_def_all <- dt.mid_def_all[c(1:20)]
dt.mid_def_plot <- rbind(dt.mid_def, t(c(Name = "Best U25 Average", colMeans(dt.mid_def_all[, c(2:7)]))))
%% MID-ATA
a <- dt.att_best[Gen_Position == "MID-ATA", Attribute_Name]
dt.mid_ata_all <- dt.young[Gen_Position == "MID-ATA", c("Name", ..a)]
dt.mid_ata_all[, Rate := rowMeans(dt.mid_ata_all[, c(2:5)])]
setorder(dt.mid_ata_all, -Rate)
dt.mid_ata_all <- dt.mid_ata_all[c(1:20)]
dt.mid_ata_plot <- rbind(dt.mid_ata, t(c(Name = "Best U25 Average", colMeans(dt.mid_ata_all[, c(2:7)]))))
%% ATA-LAT
a <- dt.att_best[Gen_Position == "ATA-LAT", Attribute_Name]
dt.ata_lat_all <- dt.young[Gen_Position == "ATA-LAT", c("Name", ..a)]
dt.ata_lat_all[, Rate := rowMeans(dt.ata_lat_all[, c(2:5)])]
setorder(dt.ata_lat_all, -Rate)
dt.ata_lat_all <- dt.ata_lat_all[c(1:20)]
dt.ata_lat_plot <- rbind(dt.ata_lat, t(c(Name = "Best U25 Average", colMeans(dt.ata_lat_all[, c(2:7)]))))
%% ATA
a <- dt.att_best[Gen_Position == "ATA", Attribute_Name]
dt.ata_all <- dt.young[Gen_Position == "ATA", c("Name", ..a)]
dt.ata_all[, Rate := rowMeans(dt.ata_all[, c(2:5)])]
setorder(dt.ata_all, -Rate)
dt.ata_all <- dt.ata_all[c(1:20)]
dt.ata_plot <- rbind(dt.ata, t(c(Name = "Best U25 Average", colMeans(dt.ata_all[, c(2:7)]))))

Now we have the top five players with no clubs compared with the best ones. Let’s plot the results and see what we have.

  %% GK
  p_gk <- ggplot(data = dt.gk_plot, aes(x = V2, y = as.numeric(V1), fill = factor(Name, levels = rev(unique(dt.gk_plot[, Name]))))) +
          geom_bar(stat = "identity", position = "dodge") +
          ggtitle("Goalkeeper Attributes Comparison") +
          xlab("Attribute") +
          ylab("Attribute Value") + scale_y_continuous(breaks = seq(0, 100, 10)) +
          scale_fill_discrete("Player",breaks = rev(unique(dt.gk_plot[, Name]))) +
          theme(axis.text.x = element_text(angle = 45, hjust = 1))
%% DEF
p_def <- ggplot(data = dt.def_plot, aes(x = V2, y = as.numeric(V1), fill = factor(Name, levels = rev(unique(dt.def_plot[, Name]))))) +
         geom_bar(stat = "identity", position = "dodge") +
         ggtitle("Defensor Attributes Comparison") +
         xlab("Attribute") +
         ylab("Attribute Value")  + scale_y_continuous(breaks = seq(0, 100, 10)) +
         scale_fill_discrete("Player",breaks = rev(unique(dt.def_plot[, Name]))) +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
# MID DEF
p_mid_def <- ggplot(data = dt.mid_def_plot, aes(x = V2, y = as.numeric(V1), fill = factor(Name, levels = rev(unique(dt.mid_def_plot[, Name]))))) +
             geom_bar(stat = "identity", position = "dodge") +
             ggtitle("Defensive Midfield Attributes Comparison") +
             xlab("Attribute") +
             ylab("Attribute Value")+ scale_y_continuous(breaks = seq(0, 100, 10)) +
             scale_fill_discrete("Player",breaks = rev(unique(dt.mid_def_plot[, Name]))) +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
%% MID ATA
p_mid_ata <- ggplot(data = dt.mid_ata_plot, aes(x = V2, y = as.numeric(V1), fill = factor(Name, levels = rev(unique(dt.mid_ata_plot[, Name]))))) +
             geom_bar(stat = "identity", position = "dodge") +
             ggtitle("Attacker Midfield Attributes Comparison") +
             xlab("Attribute") +
             ylab("Attribute Value")+ scale_y_continuous(breaks = seq(0, 100, 10)) +
             scale_fill_discrete("Player",breaks = rev(unique(dt.mid_ata_plot[, Name]))) +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
%% ATA LAT
p_ata_lat <- ggplot(data = dt.ata_lat_plot, aes(x = V2, y = as.numeric(V1), fill = factor(Name, levels = rev(unique(dt.ata_lat_plot[, Name])))))+
             geom_bar(stat = "identity", position = "dodge") +
             ggtitle("Lateral Attacker Attributes Comparison") +
             xlab("Attribute") +
             ylab("Attribute Value")+ scale_y_continuous(breaks = seq(0, 100, 10)) +
             scale_fill_discrete("Player",breaks = rev(unique(dt.ata_lat_plot[, Name]))) +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
%% ATA
p_ata <- ggplot(data = dt.ata_plot, aes(x = V2, y = as.numeric(V1), fill = factor(Name, levels = rev(unique(dt.ata_plot[, Name]))))) +
         geom_bar(stat = "identity", position = "dodge") +
         ggtitle("Attacker Attributes Comparison") +
         xlab("Attribute") +
         ylab("Attribute Value")+ scale_y_continuous(breaks = seq(0, 100, 10)) +
         scale_fill_discrete("Player",breaks = rev(unique(dt.ata_plot[, Name]))) +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
gk
def
mid_def
mid_ata
ata_lat
ata

Looking to the graphs, we can see all the goalkeepers and defensors with no team are far from the best ones. But in the defensive midfielders the player L. Paredes has similar attributes to the U-25 Average, becoming a great player to contract. Looking to the lateral attackers, the players A. Mitritã and P. Tau looks like good options. Finally, B. Angulo has good attributes comparing with the best U-25 average.