A Statistical Analysis of Racism Across 10,790 Replays
  1. ISO #1

    The Dataset
    The dataset consists of 10,790 replays spanning a period of over 8 years, with the majority of the replays occurring between April 2017 and June 2020.



    Among these 10,790 games, there were 15,769 unique players. Below is the distribution of players based on the number of games they have played within the dataset. This distinction is important: a player could have played 1000 games of mafia, but if they appear in only a single replay within the dataset, they are counted in the "1 game played within dataset" category.



    Slur Usage by Player - Number of Slurs
    In this section, the number of messages containing a slur is totaled for each player across all games in which they appear in the dataset.

    For example, a player with the following chat logs would have a score of 3. Even though one of their messages contains two slurs, this is not double-counted to avoid skewing the dataset.
    Code:
    Game 1:
    Player: {slur} {slur}
    Player: hello world
    Player: {slur}
    Code:
    Game 2:
    Player: {slur}
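The counting rule above can be sketched in a few lines of Python. This is only an illustration, not the script used for the analysis: the `{slur}` placeholder token, the `(player, message)` log format, and the function names are all assumptions made for the example.

```python
from collections import Counter

# Hypothetical placeholder; the actual word list is not given in the post.
SLUR_TOKENS = {"{slur}"}

def contains_slur(message):
    """True if the message contains at least one flagged token."""
    return any(token in message for token in SLUR_TOKENS)

def slur_messages_per_player(games):
    """games: iterable of chat logs, each a list of (player, message) pairs."""
    counts = Counter()
    for chat_log in games:
        for player, message in chat_log:
            if contains_slur(message):
                # One point per message, however many slurs it contains.
                counts[player] += 1
    return counts

games = [
    [("Player", "{slur} {slur}"), ("Player", "hello world"), ("Player", "{slur}")],
    [("Player", "{slur}")],
]
print(slur_messages_per_player(games)["Player"])  # 3
```

Counting messages rather than individual words is what keeps the first message from being double-counted.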


    Zooming in on just the top 5% of the previous graph:



    Slur Usage by Player - Number of Games
    An alternative way to analyze the data is to count the number of games in which a given player used a slur.

    For example, consider a player with 1000 games in the dataset who used 0 slurs in 999 of those games but used a slur 200 times in the remaining game. This player would appear in the top category in the previous section, but only in the "1 game in which player used a slur" category in this section.
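As a rough sketch of this per-game count (again assuming a hypothetical `{slur}` placeholder and a `(player, message)` log format that the post does not specify), each player is counted at most once per game:

```python
from collections import Counter

# Hypothetical placeholder; the actual word list is not given in the post.
SLUR_TOKENS = {"{slur}"}

def slur_games_per_player(games):
    """Count, for each player, the number of games in which they sent a slur."""
    counts = Counter()
    for chat_log in games:
        # A set, so a player is counted once per game regardless of volume.
        offenders = {
            player
            for player, message in chat_log
            if any(token in message for token in SLUR_TOKENS)
        }
        counts.update(offenders)
    return counts

# The example from the text: 999 clean games and one game with 200 slur messages.
clean_game = [("Player", "hello world")]
bad_game = [("Player", "{slur}")] * 200
games = [clean_game] * 999 + [bad_game]
print(slur_games_per_player(games)["Player"])  # 1
```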



    Again, zooming in on the top 5%:



    Slur Usage by Game
    This section maps the distribution of games based on the number of messages containing slurs sent by all players within the game.

    The following game would have a score of 4 as there were 4 messages sent that contain slurs.
    Code:
    Player 1: Hello World
    Player 2: {slur} {slur}
    Player 2: {slur} {slur} {slur}
    Player 3: {slur}
    Player 4: Hello World
    Player 2: {slur}
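The same rule applied per game might look like the sketch below. The `{slur}` placeholder and the log format are assumptions for illustration only.

```python
# Hypothetical placeholder; the actual word list is not given in the post.
SLUR_TOKENS = {"{slur}"}

def slur_messages_in_game(chat_log):
    """Number of messages in one game that contain at least one slur."""
    return sum(
        1
        for _player, message in chat_log
        if any(token in message for token in SLUR_TOKENS)
    )

game = [
    ("Player 1", "Hello World"),
    ("Player 2", "{slur} {slur}"),
    ("Player 2", "{slur} {slur} {slur}"),
    ("Player 3", "{slur}"),
    ("Player 4", "Hello World"),
    ("Player 2", "{slur}"),
]
print(slur_messages_in_game(game))  # 4
```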


    Preliminary Observations
    • 90.95% of players never used a slur across all of their games
    • 96.11% of players never used a slur in more than one game
    • The top 1.89% of players based on number of slurs used contribute 89.20% of all slurs used
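A concentration figure like the last one can be computed from the per-player counts roughly as follows. This is a sketch with made-up toy numbers, not the real data, and `share_from_top` is a hypothetical helper name.

```python
def share_from_top(counts, top_fraction):
    """Fraction of all slur messages sent by the top `top_fraction` of players.

    counts: one slur-message count per player, including players with zero.
    """
    ordered = sorted(counts, reverse=True)
    n_top = max(1, round(len(ordered) * top_fraction))
    total = sum(ordered)
    return sum(ordered[:n_top]) / total if total else 0.0

# Toy data for 10 players (invented for illustration only).
toy_counts = [40, 5, 3, 1, 1, 0, 0, 0, 0, 0]
print(share_from_top(toy_counts, 0.10))  # 0.8
```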



    This raises the question: what would the statistics look like if these repeat offenders were filtered from the dataset?

    Slur Usage by Game - Filtering Chronic Offenders
    Players that have used a slur in more than 3 games are flagged as "chronic offenders" for the purposes of the filter.

    These chronic offenders make up the top 1.89% (denoted as Top 2% in graphics for the sake of brevity) of players in the dataset by number of games in which they used a slur.

    Two different filtering methods are used.

    Method 1: Ignore all messages sent by players in the top 1.89%
    In this method all messages sent by the filtered players are simply ignored.

    Method 2: Remove games from the dataset in which a filtered player used a slur
    The hypothesis behind this method is that there are knock-on effects from the usage of slurs by these chronic offenders - a chronic offender using slurs in game is likely to increase the probability that a non-chronic offender uses a slur in that game as well.

    Take the following chat log:
    Code:
    Player 1: {slur} {slur} {slur}
    Player 2: Player 1, please don't say "{slur}"
    In Method 1, Player 2 would still be flagged for using a slur. Method 2 seeks to reduce this effect by omitting entirely any game in which a filtered player used a slur.

    This method filtered out 30.19% of the games from the dataset (3,257 out of 10,790).
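The difference between the two methods can be sketched as follows. The function names, the `{slur}` placeholder, and the log format are assumptions for illustration; the set of filtered players is taken as given.

```python
# Hypothetical placeholder; the actual word list is not given in the post.
SLUR_TOKENS = {"{slur}"}

def has_slur(message):
    return any(token in message for token in SLUR_TOKENS)

def method_1(games, filtered_players):
    """Keep every game, but drop all messages sent by filtered players."""
    return [
        [(p, m) for p, m in chat_log if p not in filtered_players]
        for chat_log in games
    ]

def method_2(games, filtered_players):
    """Drop every game in which a filtered player sent a slur message."""
    return [
        chat_log
        for chat_log in games
        if not any(p in filtered_players and has_slur(m) for p, m in chat_log)
    ]

def games_with_slurs(games):
    return sum(1 for chat_log in games if any(has_slur(m) for _p, m in chat_log))

games = [
    [("Player 1", "{slur} {slur} {slur}"),
     ("Player 2", "Player 1, please don't say \"{slur}\"")],
    [("Player 2", "hello world")],
]
filtered = {"Player 1"}

print(games_with_slurs(method_1(games, filtered)))  # 1 (Player 2's quote is still flagged)
print(games_with_slurs(method_2(games, filtered)))  # 0 (the whole game is removed)
```

Method 1 keeps the game count unchanged, while Method 2 shrinks the dataset, which is why the share of games removed is reported separately.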



    Zooming in on the top 10%:



    Unrelated Offenses of Chronic Offenders
    This section is motivated by the hypothesis that chronic offenders with respect to slur usage also tend to be problematic players in other aspects of the game.

    The metric used to test this hypothesis measures, for each player, how many more games they used a slur in than the average rate would predict for their total number of games.

    This metric accomplishes two things:
    1. It ignores players that have a high Slur Games / Total Games ratio simply because they have played a low number of games. For example, a player with 3 games in the dataset who used a slur in all 3 would have a 100% Slur Games / Total Games ratio, but the actual impact of their slur usage on the community is minimal as they only appeared in 3 games.
    2. It ignores players that have a high number of slur games only by virtue of having a high number of games. For example, a player with 18 slur games would rank in the top 0.5% of players based on number of games in which they used a slur, but if this player has 1000 games within the dataset, they are still well below the average rate of slur usage.
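One plausible reading of this metric is "slur games minus the number of slur games expected at the average rate". The sketch below assumes that reading; the formula, the 5% average rate, and player C are all invented for illustration and are not taken from the analysis.

```python
def excess_slur_games(slur_games, total_games, average_rate):
    """Slur games beyond what the average per-game rate would predict.

    Assumed formula: slur_games - average_rate * total_games.
    """
    return slur_games - average_rate * total_games

# Hypothetical average rate: a slur game in 5% of player-games (invented value).
AVERAGE_RATE = 0.05

players = {
    "A": (3, 3),      # 100% ratio, but only 3 games
    "B": (18, 1000),  # many slur games, but far more games overall
    "C": (40, 200),   # many slur games and a high ratio
}

scores = {
    name: excess_slur_games(slur, total, AVERAGE_RATE)
    for name, (slur, total) in players.items()
}
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)  # ['C', 'A', 'B']
```

Under this reading, player A scores low because of the small game count and player B scores negative because of the large one, which matches the two properties listed above.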


    After ordering the players using this metric of number of slur games above the average rate:
    • Every player in the top 20 has had a report filed against them
    • 18 of the top 20 players have had a report approved against them for unrelated offenses
    • 16 of the top 20 players have been watchlisted for unrelated offenses
    • 7 of the top 12 players have been banlisted for unrelated offenses
    • 4 of the top 12 players have been permabanned for unrelated offenses


    Unfortunately, these statistics need to be checked manually, and therefore a more comprehensive analysis is not viable at this time. However, these preliminary results are strong evidence in favor of the hypothesis that chronic offenders when it comes to slur usage are also chronic offenders when it comes to other offenses.

    (Thank you to the staff for helping with these statistics!)

    Key Observations
    • The overwhelming majority of players do not use slurs at all
      • 90.95% of players used zero slurs across all of their games
      • Only 3.18% of players used more than 3 slurs across all of their games
      • Only 1.89% of players used a slur in more than 3 of their games
    • The racism problem in the game is caused almost entirely by a tiny minority of players
      • The top 1.89% of players based on number of slurs used contribute 89.20% of all slurs used
      • When you ignore messages sent by these players, the share of games with zero slurs increases from 63.59% to 87.99% and the share of games with more than 1 slur drops to only 4.68%
      • When you filter out games with slur usage incited by the chronic offenders, the share of games with zero slurs increases further from 87.99% to 91.8% and the share of games with more than 1 slur drops further to only 3.27%
    • The chronic offenders when it comes to slur usage also tend to be chronic offenders when it comes to other rule violations
      • Every player in the top 20 has had a report filed against them
      • 18 of the top 20 players have had a report approved against them for unrelated offenses
      • 16 of the top 20 players have been watchlisted for unrelated offenses
      • 7 of the top 12 players have been banlisted for unrelated offenses
      • 4 of the top 12 players have been permabanned for unrelated offenses
    Last edited by Lumi; June 15th, 2021 at 09:13 AM.

 

 
