Available Models

class batcore.baselines.ACRec(gamma=60, lambd=0.5, no_owner=True, no_inactive=True, inactive_time=60)

ACRec recommends reviewers based on how much they commented on recent pull requests. To do this, ACRec looks at previous reviews within a given time frame and, for each comment, assigns its commenter a score that depends on how much time has passed. Candidates with the highest accumulated scores are suggested as reviewers.

Paper: Who Should Comment on This Pull Request? Analyzing Attributes for More Accurate Commenter Recommendation in Pull-Based Development

Parameters:
  • gamma – number of days after which a pull request is ignored during predictions

  • lambd – time-decaying parameter

  • no_owner – if True, owners of the pull request are removed from the recommendations

  • no_inactive – if True, inactive reviewers are removed from the recommendations

  • inactive_time – number of consecutive days without any action after which a reviewer is considered inactive

fit(data)

remembers each pull request and builds the relation between pull requests and their comments

Parameters:

data – a batch of pull requests and comments

predict(pull, n=10)

goes through recent pull requests and accumulates a score for each commenter based on the recency of their comments

Parameters:
  • pull – pull request for which reviewers are required

  • n – number of reviewers to recommend

Returns:

at most n reviewers for the pull request
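
The scoring idea behind ACRec can be illustrated with a small standalone sketch. It does not use batcore: the function name recency_scores, the (commenter, timestamp) input format, and the power-law decay are assumptions made for illustration, and the decay used by the paper and by batcore may differ.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def recency_scores(comments, now, gamma=60, lambd=0.5, n=10):
    """Rank commenters by time-decayed comment counts.

    comments: iterable of (commenter, timestamp) pairs (hypothetical format).
    """
    scores = defaultdict(float)
    for commenter, timestamp in comments:
        age_days = max((now - timestamp).days, 0)
        if age_days > gamma:
            continue  # comments older than gamma days are ignored
        # Assumed decay: newer comments weigh more; lambd sets how fast.
        scores[commenter] += (age_days + 1) ** -lambd
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:n]


now = datetime(2023, 1, 1)
comments = [
    ("alice", now - timedelta(days=1)),
    ("bob", now - timedelta(days=30)),
    ("carol", now - timedelta(days=90)),  # outside the gamma window
]
print(recency_scores(comments, now))
```

Here carol is dropped because her only comment is older than gamma days, and alice ranks above bob because her comment is more recent.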

class batcore.baselines.cHRev(no_owner=True, no_inactive=True, inactive_time=60)

cHRev recommends candidates based on their commenting history. To do this, it computes an xFactor, which measures the relative share and recency of the comments a candidate made on the files.

Paper: Automatically Recommending Peer Reviewers in Modern Code Review

Parameters:
  • no_owner – if True, owners of the pull request are removed from the recommendations

  • no_inactive – if True, inactive reviewers are removed from the recommendations

  • inactive_time – number of consecutive days without any action after which a reviewer is considered inactive

fit(data)

performs necessary updates

predict(pull, n=10)

candidates are scored by xFactor (equations 1-2 from the paper)

Parameters:
  • pull – pull request for which reviewers are required

  • n – number of reviewers to recommend

Returns:

at most n reviewers for the pull request
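
As a rough illustration of what an xFactor-style score looks like, the sketch below sums a candidate's share of the comments on a file, their share of the days on which the file received comments, and a recency term. The function name and input format are hypothetical, and the exact terms in the paper's equations and in batcore may differ.

```python
from datetime import date


def x_factor(candidate_dates, all_dates):
    """Score one candidate for one file.

    candidate_dates: dates of the candidate's comments on the file.
    all_dates: dates of all comments on the file (includes the candidate's).
    """
    if not candidate_dates or not all_dates:
        return 0.0
    comment_share = len(candidate_dates) / len(all_dates)
    workday_share = len(set(candidate_dates)) / len(set(all_dates))
    # Days between the file's latest comment and the candidate's latest one.
    gap_days = abs((max(all_dates) - max(candidate_dates)).days)
    recency = 1.0 / (gap_days + 1)
    return comment_share + workday_share + recency


all_dates = [date(2023, 1, 1), date(2023, 1, 5), date(2023, 1, 11), date(2023, 1, 11)]
recent_reviewer = [date(2023, 1, 11), date(2023, 1, 11)]
old_reviewer = [date(2023, 1, 1)]
print(x_factor(recent_reviewer, all_dates) > x_factor(old_reviewer, all_dates))
```

A per-pull-request score would then be obtained by summing such values over the files changed in the pull request.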

class batcore.baselines.RevFinder(items2ids, max_date=100, no_owner=True, no_inactive=True, inactive_time=60)

RevFinder suggests possible reviewers based on their previous reviews of similar files. RevFinder uses 4 different file similarity metrics. A list of suggestions is calculated for each metric, and the lists are then combined into one.

Paper: Who Should Review My Code? A File Location-Based Code-Reviewer Recommendation Approach for Modern Code Review

Parameters:
  • items2ids – dict with all possible reviewers

  • max_date – time in days after which old reviews stop influencing predictions

  • no_owner – if True, owners of the pull request are removed from the recommendations

  • no_inactive – if True, inactive reviewers are removed from the recommendations

  • inactive_time – number of consecutive days without any action after which a reviewer is considered inactive

fit(data)

adds reviews into a history buffer

predict(pull, n=10)
Parameters:

n – number of reviewers to recommend
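
The two steps described for RevFinder (a per-metric ranking, then a merge) can be sketched in isolation. The example shows only one path metric, a longest-common-prefix similarity over path components, and a simple Borda-count merge. Both function names are hypothetical, and batcore's four metrics and its combination rule may be implemented differently.

```python
def lcp_similarity(path_a, path_b):
    """Share of leading path components that two file paths have in common."""
    parts_a, parts_b = path_a.split("/"), path_b.split("/")
    common = 0
    for x, y in zip(parts_a, parts_b):
        if x != y:
            break
        common += 1
    return common / max(len(parts_a), len(parts_b))


def borda_combine(rankings):
    """Merge several ranked candidate lists into one with a Borda count."""
    points = {}
    for ranking in rankings:
        for position, candidate in enumerate(ranking):
            # The top of a list of length k earns k points, the last earns 1.
            points[candidate] = points.get(candidate, 0) + len(ranking) - position
    # Ties are broken alphabetically to keep the result deterministic.
    return sorted(points, key=lambda c: (-points[c], c))


print(lcp_similarity("src/core/a.py", "src/core/b.py"))
print(borda_combine([["ann", "bob", "cy"], ["bob", "ann", "cy"], ["ann", "cy", "bob"]]))
```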

class batcore.baselines.RevRec(items2ids, k=0.5, ga_params=None, no_owner=True, no_inactive=True, inactive_time=60)

RevRec finds the best set of reviewers based on two metrics: the group's expertise on the modified files and the amount of collaboration with the pull request submitter. The search for the best set is performed with a genetic algorithm.

Paper: Search-Based Peer Reviewers Recommendation in Modern Code Review

Parameters:
  • items2ids – dict with users2ids

  • k – threshold for files similarity

  • ga_params – dict of hyperparameters for the genetic algorithm:

    • max_rev – maximum number of reviewers to recommend. default=10

    • min_rev – minimum number of reviewers to recommend. default=1

    • size – population size. default=20

    • prob – mutation probability. default=0.1

    • max_eval – number of genetic algorithm iterations. default=100

    • n – number of best solutions that contribute to the sorting of the best reviewers. default=10

    • alpha – weight of the reviewer expertise score. default=0.5

    • beta – weight of the reviewer collaboration score. default=0.5

  • no_owner – if True, owners of the pull request are removed from the recommendations

  • no_inactive – if True, inactive reviewers are removed from the recommendations

  • inactive_time – number of consecutive days without any action after which a reviewer is considered inactive
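
A ga_params dict carrying the defaults listed above might look like the following sketch. The key names and default values are taken from the parameter list; the assumption that batcore accepts exactly these keys, and the with_defaults helper itself, are illustrative and should be checked against the library source.

```python
# Defaults as documented in the parameter list above.
DEFAULT_GA_PARAMS = {
    "max_rev": 10,    # maximum number of reviewers to recommend
    "min_rev": 1,     # minimum number of reviewers to recommend
    "size": 20,       # population size
    "prob": 0.1,      # mutation probability
    "max_eval": 100,  # number of genetic algorithm iterations
    "n": 10,          # best solutions used to sort the reviewers
    "alpha": 0.5,     # weight of the expertise score
    "beta": 0.5,      # weight of the collaboration score
}


def with_defaults(overrides=None):
    """Return the documented defaults updated with user overrides (hypothetical helper)."""
    params = dict(DEFAULT_GA_PARAMS)
    params.update(overrides or {})
    unknown = set(params) - set(DEFAULT_GA_PARAMS)
    if unknown:
        raise KeyError(f"unknown ga_params keys: {sorted(unknown)}")
    return params


ga_params = with_defaults({"size": 50, "max_eval": 200})
print(ga_params["size"], ga_params["prob"])
```

Rejecting unknown keys is a design choice of the sketch: it catches misspelled hyperparameter names early instead of silently ignoring them.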

fit(data)

builds a collaboration graph, and remembers comment interactions between developers and files

get_rc_score(candidate, owners)

computes the collaboration score

get_re_score(candidate, expertise)

computes the expertise score

get_scores(population, expertise, owners)
Returns:

expertise and collaboration scores, combined with weights, for each candidate in population

new_gen(population, parents_ids)
Parameters:
  • population – current population

  • parents_ids – ids of the current candidates that will produce the next generation

Returns:

new generation

predict(pull, n=10)
Parameters:
  • pull – pull request for which reviewers are required

  • n – number of reviewers to recommend

Returns:

at most n reviewers for the pull request

run_ga(owners, expertise)

runs a genetic algorithm to get reviewer recommendations

Parameters:
  • owners – owners of the pull request for which recommendations are made

  • expertise – expertise of the potential candidates

Returns:

best set of reviewers and list of occurrences of each reviewer in the last population
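
To make the search loop concrete, here is a minimal standalone genetic algorithm over binary reviewer masks. It is a sketch under stated assumptions rather than batcore's run_ga: the fitness (a weighted mean of per-candidate expertise and collaboration values), the elitist selection, the single-point crossover, and all names are illustrative, and the min_rev and max_rev limits are not enforced.

```python
import random


def fitness(mask, expertise, collaboration, alpha=0.5, beta=0.5):
    """Weighted mean expertise and collaboration of the selected candidates."""
    chosen = [i for i, bit in enumerate(mask) if bit]
    if not chosen:
        return 0.0
    re_score = sum(expertise[i] for i in chosen) / len(chosen)
    rc_score = sum(collaboration[i] for i in chosen) / len(chosen)
    return alpha * re_score + beta * rc_score


def run_ga_sketch(expertise, collaboration, size=20, prob=0.1, max_eval=100, seed=0):
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    n = len(expertise)

    def score(mask):
        return fitness(mask, expertise, collaboration)

    population = [[rng.randint(0, 1) for _ in range(n)] for _ in range(size)]
    for _ in range(max_eval):
        population.sort(key=score, reverse=True)
        parents = population[: size // 2]  # the better half survives unchanged
        children = []
        while len(parents) + len(children) < size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n)  # single-point crossover
            child = a[:cut] + b[cut:]
            # Flip each bit with probability prob (mutation).
            child = [1 - bit if rng.random() < prob else bit for bit in child]
            children.append(child)
        population = parents + children
    best = max(population, key=score)
    best_set = [i for i, bit in enumerate(best) if bit]
    occurrences = [sum(mask[i] for mask in population) for i in range(n)]
    return best_set, occurrences


best_set, occurrences = run_ga_sketch([0.9, 0.1, 0.8, 0.2], [0.8, 0.2, 0.1, 0.1])
print(best_set, occurrences)
```

The second return value counts how often each candidate appears in the final population, which mirrors the occurrence list described in the Returns section above.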

set_banned(pull)

sets self.banned to a binary mask of candidates that won’t be recommended

update_time(events)

updates the time of the most recent action for all participants in each event

Parameters:

events – batch of events
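
The interplay of no_owner, no_inactive, and inactive_time with the last-action times can be sketched as a mask computation. The function banned_mask and its inputs are hypothetical, and treating candidates with no recorded action as inactive is an assumption of this sketch, not documented behaviour.

```python
from datetime import datetime, timedelta


def banned_mask(candidates, owners, last_action, now,
                no_owner=True, no_inactive=True, inactive_time=60):
    """Return one boolean per candidate; True means 'do not recommend'."""
    limit = timedelta(days=inactive_time)
    mask = []
    for candidate in candidates:
        is_owner = no_owner and candidate in owners
        last = last_action.get(candidate)
        # Assumption: a candidate with no recorded action counts as inactive.
        is_inactive = no_inactive and (last is None or now - last > limit)
        mask.append(is_owner or is_inactive)
    return mask


now = datetime(2023, 1, 1)
last_action = {
    "ann": now,
    "bob": now - timedelta(days=5),
    "cy": now - timedelta(days=90),
}
print(banned_mask(["ann", "bob", "cy"], {"ann"}, last_action, now))
```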

class batcore.baselines.Tie(item_list, text_splitter=<function Tie.<lambda>>, alpha=0.7, max_date=100, no_owner=True, no_inactive=True, inactive_time=60)

Tie recommends reviewers based on file paths and the pull request title. Each candidate is assigned two scores. The first is based on the path distance between files in the current pull request and previously reviewed files. The second is a score from a naive Bayes classifier trained on pull request titles.

Paper: Who Should Review This Change? Putting Text and File Location Analyses Together for More Accurate Recommendations

Parameters:
  • item_list – dict with word_list and reviewer_list

  • text_splitter – a function to parse pull comments

  • alpha – weight between path-based and text-based recommenders

  • max_date – time in days after which reviews are not considered

  • no_owner – if True, owners of the pull request are removed from the recommendations

  • no_inactive – if True, inactive reviewers are removed from the recommendations

  • inactive_time – number of consecutive days without any action after which a reviewer is considered inactive

bayes_score(pull, reviewer_index)

Assigns a score to each candidate based on a naive Bayes classifier trained on pull request titles
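
For readers unfamiliar with the technique, the following standalone sketch scores reviewers with a multinomial naive Bayes model over title words, using add-one smoothing. It is not batcore's bayes_score: the function names, the whitespace tokenizer, and the (title, reviewer) training format are assumptions for illustration.

```python
import math
from collections import Counter, defaultdict


def train(history):
    """history: iterable of (title, reviewer) pairs (hypothetical format)."""
    word_counts = defaultdict(Counter)  # reviewer -> word frequencies
    review_counts = Counter()           # reviewer -> number of reviews
    vocabulary = set()
    for title, reviewer in history:
        words = title.lower().split()
        word_counts[reviewer].update(words)
        review_counts[reviewer] += 1
        vocabulary.update(words)
    return word_counts, review_counts, vocabulary


def log_scores(title, word_counts, review_counts, vocabulary):
    """Log of prior times smoothed word likelihoods, per reviewer."""
    total_reviews = sum(review_counts.values())
    scores = {}
    for reviewer, n_reviews in review_counts.items():
        counts = word_counts[reviewer]
        denominator = sum(counts.values()) + len(vocabulary)
        score = math.log(n_reviews / total_reviews)
        for word in title.lower().split():
            # Add-one smoothing keeps unseen words from zeroing the score.
            score += math.log((counts[word] + 1) / denominator)
        scores[reviewer] = score
    return scores


history = [
    ("fix parser crash", "ann"),
    ("parser refactor", "ann"),
    ("update docs", "bob"),
]
model = train(history)
scores = log_scores("parser bug", *model)
print(max(scores, key=scores.get))
```

Working in log space avoids numeric underflow when titles contain many words.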

fit(data)

Updates the state of the model with an input review.

predict(pull, n=10)

Recommends appropriate reviewers for the given review. This method returns at most n reviewers.

Parameters:

n – number of candidates to return
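
How alpha might blend the two Tie scores can be shown with a short sketch. Whether alpha multiplies the path-based or the text-based score in batcore is not stated here, so the orientation below (alpha on the path score) is an assumption, as are the function name and the expectation that both score sets are already on a comparable scale.

```python
def combine_scores(path_scores, text_scores, alpha=0.7, n=10):
    """Blend two per-candidate score dicts and return the top n candidates."""
    candidates = set(path_scores) | set(text_scores)
    combined = {
        c: alpha * path_scores.get(c, 0.0) + (1 - alpha) * text_scores.get(c, 0.0)
        for c in candidates
    }
    # Sort by score, highest first; ties are broken alphabetically.
    return sorted(combined, key=lambda c: (-combined[c], c))[:n]


path_scores = {"ann": 0.9, "bob": 0.2}
text_scores = {"bob": 0.8, "cy": 0.5}
print(combine_scores(path_scores, text_scores))
```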

update_pull(pull)

turns the title into a bag-of-words vector and replaces reviewers with their ids

class batcore.baselines.WRC(items2ids, delta=1, no_owner=True, no_inactive=True, inactive_time=60)

WRC recommends reviewers based on file path similarity measures. Files contribute to the final score based not only on their similarity but also on their recency and the size of their pull request.

dataset - StandardDataset(data, user_items=True, file_items=True)

Paper: Automatically Recommending Code Reviewers Based on Their Expertise: An Empirical Comparison

Parameters:
  • items2ids – dict with user2id and file2id

  • delta – time decay factor for weight of the previous reviews

  • no_owner – if True, owners of the pull request are removed from the recommendations

  • no_inactive – if True, inactive reviewers are removed from the recommendations

  • inactive_time – number of consecutive days without any action after which a reviewer is considered inactive

fit(data)

updates wrc matrix

predict(pull, n=10)

sums the WRC scores over the files in the pull request for each possible reviewer

Parameters:

n – number of reviewers to recommend

class batcore.baselines.xFinder(no_owner=True, no_inactive=True, inactive_time=60)

xFinder recommends candidates based on their commit history. To do this, it computes an xFactor, which measures the relative share and recency of the commits a developer made to the files.

dataset - StandardDataset(data, commits=True)

Paper: Assigning change requests to software developers

Parameters:
  • no_owner – if True, owners of the pull request are removed from the recommendations

  • no_inactive – if True, inactive reviewers are removed from the recommendations

  • inactive_time – number of consecutive days without any action after which a reviewer is considered inactive

fit(data)

performs necessary updates

predict(pull, n=10)

scoring of candidates is performed by xFactor (equations from page 8)

Parameters:
  • pull – pull request for which reviewers are required

  • n – number of reviewers to recommend

Returns:

at most n reviewers for the pull request

class batcore.baselines.CN(items2ids, lambd=0.5, no_owner=True, no_inactive=True, inactive_time=60)

CN recommends reviewers based on their comments on previous reviews. To do this, a Comment Network (a weighted directed graph) is constructed. Vertices are developers in the project, and edges represent the weighted number of reviewing interactions. Each candidate is assigned a score based on distance in the graph.

Paper: Reviewer Recommendation for Pull-Requests in GitHub: What Can We Learn from Code Review and Bug Assignment?

Parameters:
  • items2ids – dict with user2id

  • lambd – weight decay coefficient for comments within a single review beyond the first one

  • no_owner – if True, owners of the pull request are removed from the recommendations

  • no_inactive – if True, inactive reviewers are removed from the recommendations

  • inactive_time – number of consecutive days without any action after which a reviewer is considered inactive

fit(data)

updates CN graph and supporting characteristics
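
A weighted comment network of the kind described for CN can be built with nested dictionaries. In the sketch below, a commenter's first comment on a pull request adds 1 to the edge weight and each further comment on the same pull request adds lambd times the previous contribution. The edge direction (commenter to owner), the input format, and the function name are assumptions, and the paper's full weighting may include further terms.

```python
from collections import defaultdict


def build_comment_network(pulls, lambd=0.5):
    """pulls: iterable of (owner, [commenter, ...]) pairs in comment order."""
    graph = defaultdict(lambda: defaultdict(float))
    for owner, commenters in pulls:
        seen = defaultdict(int)  # comments so far per commenter in this pull
        for commenter in commenters:
            if commenter == owner:
                continue  # self-comments create no edge
            # k-th comment within one pull contributes lambd ** (k - 1).
            graph[commenter][owner] += lambd ** seen[commenter]
            seen[commenter] += 1
    return graph


pulls = [
    ("ann", ["bob", "bob", "cy"]),
    ("ann", ["bob", "ann"]),
]
graph = build_comment_network(pulls)
print(graph["bob"]["ann"], graph["cy"]["ann"])
```

Damping repeated comments within one pull request keeps a single long discussion from outweighing interactions spread over many pull requests.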

predict(pull, n=10)

recommends reviewers based on owner of the pull request

predict_apriori(i, k=10)

Apriori prediction suggests reviewers for pull requests whose owners have previously commented on other pull requests

predict_community(i, k=10)

Suggests reviewers for pull requests of newcomers who have had no interactions with others

predict_pac(i, k=10)

PAC prediction makes recommendations for owners whose previous pull requests have been reviewed, i.e. their in-degree > 0