Available Models
- class batcore.baselines.ACRec(gamma=60, lambd=0.5, no_owner=True, no_inactive=True, inactive_time=60)
ACRec recommends reviewers based on how much they commented on recent pull requests. To do this, ACRec looks at previous reviews within a given timeframe and, for each comment, assigns its commenter a score based on the time passed. Candidates with the best accumulated scores are suggested as reviewers.
- Parameters:
gamma – number of days after which a pull request is ignored during predictions
lambd – time-decaying parameter
no_owner – flag to add or remove owners of the pull request from the recommendations
no_inactive – flag to add or remove inactive reviewers from recommendations
inactive_time – number of consecutive days without any actions after which a reviewer is considered inactive
- fit(data)
remembers each pull request and builds the relations between pull requests and comments
- Parameters:
data – a batch of pull requests and comments
- predict(pull, n=10)
goes through recent pull requests and accumulates a score for each commenter based on the recency of their comments
- Parameters:
pull – pull request for which reviewers are required
n – number of reviewers to recommend
- Returns:
at most n reviewers for the pull request
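The scoring idea described for ACRec can be sketched as follows. This is a toy illustration and not batcore's implementation: the class name ToyACRec, the tuple and dict input formats, the exponential decay formula, and the reading of no_owner=True as "exclude the owner" are all assumptions made for the example.

```python
import math
from collections import defaultdict


class ToyACRec:
    """Toy recency-weighted comment scorer (hypothetical, not batcore's ACRec)."""

    def __init__(self, gamma=60, lambd=0.5, no_owner=True):
        self.gamma = gamma        # pull requests older than this many days are ignored
        self.lambd = lambd        # assumed exponential decay rate per day
        self.no_owner = no_owner  # assumed: True removes the owner from the result
        self.comments = []        # stored (pull_day, commenter, comment_day) tuples

    def fit(self, data):
        # data: iterable of (pull_day, commenter, comment_day), a hypothetical format
        self.comments.extend(data)

    def predict(self, pull, n=10):
        # pull: dict with "day" and "owner" keys, a hypothetical format
        scores = defaultdict(float)
        for pull_day, commenter, comment_day in self.comments:
            if pull["day"] - pull_day > self.gamma:
                continue  # the commented pull request is outside the window
            if self.no_owner and commenter == pull["owner"]:
                continue
            age = pull["day"] - comment_day
            scores[commenter] += math.exp(-self.lambd * age)
        ranked = sorted(scores, key=scores.get, reverse=True)
        return ranked[:n]


model = ToyACRec(gamma=60, lambd=0.5)
model.fit([(90, "alice", 91), (95, "bob", 99), (10, "carol", 12), (95, "dave", 96)])
recommended = model.predict({"day": 100, "owner": "dave"}, n=2)
print(recommended)
```

In this run carol is dropped because her comment belongs to a pull request outside the 60-day window, and dave is dropped as the owner, so bob (the most recent commenter) ranks ahead of alice.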
- class batcore.baselines.cHRev(no_owner=True, no_inactive=True, inactive_time=60)
cHRev recommends candidates based on their commenting history. To do this, it calculates an xFactor, which measures the relative share and recency of the comments a candidate made on the files.
Paper: Automatically Recommending Peer Reviewers in Modern Code Review
- Parameters:
no_owner – flag to add or remove owners of the pull request from the recommendations
no_inactive – flag to add or remove inactive reviewers from recommendations
inactive_time – number of consecutive days without any actions after which a reviewer is considered inactive
- fit(data)
performs necessary updates
- predict(pull, n=10)
candidates are scored by xFactor (equations 1-2 from the paper)
- Parameters:
pull – pull requests for which reviewers are required
n – number of reviewers to recommend
- Returns:
at most n reviewers for the pull request
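A score that combines relative share and recency, as described for xFactor, might look like the sketch below. The three-term form (share of comments, share of active days, recency of the last comment) is an assumption about the general shape of such a score; the function name, the input format, and the exact terms are hypothetical and should be checked against the paper before relying on them.

```python
from datetime import date


def x_factor(candidate_comments, all_comments):
    """Toy xFactor for one file.

    Both arguments are lists of dates on which comments were made
    (hypothetical input format): one for the candidate, one for everyone.
    """
    if not candidate_comments or not all_comments:
        return 0.0
    # Share of the file's comments written by the candidate.
    comment_share = len(candidate_comments) / len(all_comments)
    # Share of the distinct commenting days on which the candidate was active.
    day_share = len(set(candidate_comments)) / len(set(all_comments))
    # Recency: 1 when the candidate made the most recent comment, less otherwise.
    gap_days = abs((max(all_comments) - max(candidate_comments)).days)
    recency = 1 / (gap_days + 1)
    return comment_share + day_share + recency


everyone = [date(2023, 1, 1), date(2023, 1, 1), date(2023, 1, 5), date(2023, 1, 9)]
alice = [date(2023, 1, 1), date(2023, 1, 9)]
bob = [date(2023, 1, 1), date(2023, 1, 5)]
alice_score = x_factor(alice, everyone)
bob_score = x_factor(bob, everyone)
print(alice_score, bob_score)
```

Both candidates wrote the same number of comments, so the difference comes only from the recency term: alice made the latest comment, bob's last comment is four days older.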
- class batcore.baselines.RevFinder(items2ids, max_date=100, no_owner=True, no_inactive=True, inactive_time=60)
RevFinder suggests possible reviewers based on their previous reviews of similar files. RevFinder uses 4 different file similarity metrics. A list of suggestions is calculated for each metric, and the lists are then combined into one.
- Parameters:
items2ids – dict with all possible reviewers
max_date – time in days after which old reviews stop influencing predictions
no_owner – flag to add or remove owners of the pull request from the recommendations
no_inactive – flag to add or remove inactive reviewers from recommendations
inactive_time – number of consecutive days without any actions after which a reviewer is considered inactive
- fit(data)
adds reviews into a history buffer
- predict(pull, n=10)
- Parameters:
n – number of reviewers to recommend
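The "one ranking per metric, then combine" idea can be sketched as below. The choice of metrics (longest common prefix, suffix, substring, and subsequence over path components), the normalisation by the longer path, and the Borda-style combination are assumptions for illustration; all function names and the history format are hypothetical and not taken from batcore.

```python
def lcp(a, b):
    # Longest common prefix, counted in path components.
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def lcs_suffix(a, b):
    # Longest common suffix: the prefix of the reversed paths.
    return lcp(a[::-1], b[::-1])


def lc_substring(a, b):
    # Longest run of consecutive components shared by both paths.
    best, prev = 0, [0] * (len(b) + 1)
    for x in a:
        cur = [0] * (len(b) + 1)
        for j, y in enumerate(b, start=1):
            if x == y:
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best


def lc_subsequence(a, b):
    # Longest common subsequence (components in order, gaps allowed).
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, start=1):
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


METRICS = (lcp, lcs_suffix, lc_substring, lc_subsequence)


def similarity(metric, path_a, path_b):
    a, b = path_a.split("/"), path_b.split("/")
    return metric(a, b) / max(len(a), len(b))


def combine(rankings):
    # Borda-style: position i in a list of length m earns m - i points.
    points = {}
    for ranking in rankings:
        for i, reviewer in enumerate(ranking):
            points[reviewer] = points.get(reviewer, 0) + (len(ranking) - i)
    return sorted(points, key=lambda r: (-points[r], r))


def rank_reviewers(new_file, history):
    """history maps reviewer -> a previously reviewed path (hypothetical format)."""
    rankings = []
    for metric in METRICS:
        scores = {r: similarity(metric, new_file, old) for r, old in history.items()}
        rankings.append(sorted(scores, key=lambda r: (-scores[r], r)))
    return combine(rankings)


history = {
    "alice": "src/core/io/reader.py",
    "bob": "docs/io/reader.py",
    "carol": "tests/test_misc.py",
}
ranking = rank_reviewers("src/core/io/writer.py", history)
print(ranking)
```

Combining ranks rather than raw scores keeps one metric with a wider score range from dominating the others.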
- class batcore.baselines.RevRec(items2ids, k=0.5, ga_params=None, no_owner=True, no_inactive=True, inactive_time=60)
RevRec finds the best set of reviewers based on two metrics: the group's expertise on the modified files and the number of collaborations with the pull request submitter. The search for the best set is performed by a genetic algorithm.
Paper: Search-Based Peer Reviewers Recommendation in Modern Code Review
- Parameters:
items2ids – dict with users2ids
k – threshold for files similarity
ga_params – dict of hyperparameters for the genetic algorithm:
- ga_parameter max_rev:
maximum number of reviewers to recommend. default=10
- ga_parameter min_rev:
minimum number of reviewers to recommend. default=1
- ga_parameter size:
population size. default=20
- ga_parameter prob:
mutation probability. default=0.1
- ga_parameter max_eval:
number of genetic algorithm iterations. default=100
- ga_parameter n:
number of best solutions that contribute to the sorting of the best reviewers. default=10
- ga_parameter alpha:
weight of the reviewer expertise score. default=0.5
- ga_parameter beta:
weight of the reviewer collaboration score. default=0.5
no_owner – flag to add or remove owners of the pull request from the recommendations
no_inactive – flag to add or remove inactive reviewers from recommendations
inactive_time – number of consecutive days without any actions after which a reviewer is considered inactive
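For reference, the documented ga_params keys and defaults can be written out as a plain dict. The helper resolve_ga_params is hypothetical: whether batcore itself fills in missing keys from a partial dict is not stated here, so the merge below only illustrates one way to build a complete dict before passing it on.

```python
# Defaults as listed in the ga_params documentation above.
GA_DEFAULTS = {
    "max_rev": 10,    # maximum number of reviewers to recommend
    "min_rev": 1,     # minimum number of reviewers to recommend
    "size": 20,       # population size
    "prob": 0.1,      # mutation probability
    "max_eval": 100,  # number of genetic algorithm iterations
    "n": 10,          # best solutions that contribute to the final sorting
    "alpha": 0.5,     # weight of the reviewer expertise score
    "beta": 0.5,      # weight of the reviewer collaboration score
}


def resolve_ga_params(ga_params=None):
    # Hypothetical helper: unspecified keys fall back to the documented defaults.
    overrides = dict(ga_params or {})
    unknown = set(overrides) - set(GA_DEFAULTS)
    if unknown:
        raise KeyError(f"unknown ga_params keys: {sorted(unknown)}")
    return {**GA_DEFAULTS, **overrides}


params = resolve_ga_params({"size": 50, "prob": 0.2})
print(params)
```

Rejecting unknown keys catches typos such as "max_evals" early instead of silently running with the default.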
- fit(data)
builds a collaboration graph, and remembers comment interactions between developers and files
- get_rc_score(candidate, owners)
counts collaboration score
- get_re_score(candidate, expertise)
counts expertise score
- get_scores(population, expertise, owners)
- Returns:
the weighted expertise and collaboration scores for each candidate in population
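One plausible reading of "with weights" is a linear combination using the alpha and beta hyperparameters. The sketch below assumes exactly that; the function name and the list-based inputs are hypothetical, and the actual get_scores may combine the two scores differently.

```python
def weighted_scores(expertise_scores, collaboration_scores, alpha=0.5, beta=0.5):
    # Assumed form: alpha * expertise + beta * collaboration, per candidate set.
    return [
        alpha * re_score + beta * rc_score
        for re_score, rc_score in zip(expertise_scores, collaboration_scores)
    ]


# Two candidate reviewer sets: the first is strong on expertise,
# the second is strong on collaboration with the submitter.
scores = weighted_scores([0.8, 0.2], [0.1, 0.9], alpha=0.5, beta=0.5)
best = max(range(len(scores)), key=scores.__getitem__)
print(scores, best)
```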
- new_gen(population, parents_ids)
- Parameters:
population – current population
parents_ids – ids of the current candidates that will produce the next generation
- Returns:
new generation
- predict(pull, n=10)
- Parameters:
pull – pull request for which reviewers are required
n – number of reviewers to recommend
- Returns:
at most n reviewers for the pull request
- run_ga(owners, expertise)
runs a genetic algorithm to get reviewers recommendations
- Parameters:
owners – owners of the pull requests for which recommendations are made
expertise – expertise of the potential candidates
- Returns:
best set of reviewers and list of occurrences of each reviewer in the last population
- set_banned(pull)
sets self.banned to a binary mask of candidates that won’t be recommended
- update_time(events)
updates the time of the most recent action for all participants in each event
- Parameters:
events – batch of events
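The interaction between the last-action times and the binary mask of banned candidates can be sketched as follows. This is an assumption-laden illustration: the function name, the integer day timestamps, and the choice to treat never-seen candidates as inactive are hypothetical, and no_owner=True / no_inactive=True are read here as "exclude".

```python
def banned_mask(candidates, owners, last_active, now,
                no_owner=True, no_inactive=True, inactive_time=60):
    """Return one boolean per candidate; True means "do not recommend"."""
    mask = []
    for candidate in candidates:
        is_owner = no_owner and candidate in owners
        # Candidates with no recorded action count as idle forever (assumption).
        idle_days = now - last_active.get(candidate, float("-inf"))
        is_inactive = no_inactive and idle_days > inactive_time
        mask.append(is_owner or is_inactive)
    return mask


candidates = ["alice", "bob", "carol"]
last_active = {"alice": 99, "bob": 95, "carol": 20}  # day of most recent action
mask = banned_mask(candidates, owners={"alice"}, last_active=last_active, now=100)
print(mask)
```

Here alice is banned as the owner and carol because she has been idle for 80 days, which exceeds the 60-day limit.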
- class batcore.baselines.Tie(item_list, text_splitter=<function Tie.<lambda>>, alpha=0.7, max_date=100, no_owner=True, no_inactive=True, inactive_time=60)
Tie recommends reviewers based on file paths and the pull request title. Each candidate is assigned two scores. The first is based on the path distance between files in the current pull request and previously reviewed files. The second is the score from a naive Bayes classifier trained on pull request titles.
- Parameters:
item_list – dict with word_list and reviewer_list
text_splitter – a function to parse pull comments
alpha – weight between path-based and text-based recommenders
max_date – time in days after which reviews are not considered
no_owner – flag to add or remove owners of the pull request from the recommendations
no_inactive – flag to add or remove inactive reviewers from recommendations
inactive_time – number of consecutive days without any actions after which a reviewer is considered inactive
- bayes_score(pull, reviewer_index)
Assigns a score to each candidate based on a naive Bayes classifier trained on pull request titles
- fit(data)
Updates the state of the model with an input review.
- predict(pull, n=10)
Recommends appropriate reviewers for the given review. This method returns at most n reviewers.
- Parameters:
n – number of candidates to return
- update_pull(pull)
turns title into bag of words vector and replaces reviewers with their ids
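The role of alpha as a weight between the two recommenders can be sketched as a convex combination. Which score receives alpha and which receives 1 - alpha is an assumption here (text gets alpha in this sketch), and the function names and dict inputs are hypothetical.

```python
def combine_scores(text_scores, path_scores, alpha=0.7):
    # Assumed form: alpha * text-based score + (1 - alpha) * path-based score.
    candidates = set(text_scores) | set(path_scores)
    return {
        c: alpha * text_scores.get(c, 0.0) + (1 - alpha) * path_scores.get(c, 0.0)
        for c in candidates
    }


def top_n(scores, n=10):
    # Highest score first; ties are broken alphabetically for determinism.
    return sorted(scores, key=lambda c: (-scores[c], c))[:n]


text_scores = {"alice": 0.9, "bob": 0.2}   # e.g. naive Bayes scores from titles
path_scores = {"bob": 0.8, "carol": 0.5}   # e.g. file path based scores
combined = combine_scores(text_scores, path_scores, alpha=0.7)
ranked = top_n(combined, n=2)
print(ranked)
```

A candidate missing from one recommender simply contributes 0 for that part, so candidates known to only one of the two models can still be ranked.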
- class batcore.baselines.WRC(items2ids, delta=1, no_owner=True, no_inactive=True, inactive_time=60)
WRC recommends reviewers based on file path similarity measures. Files contribute to the final score based not only on their similarity but also on their recency and the size of their pull request.
dataset - StandardDataset(data, user_items=True, file_items=True)
Paper: Automatically Recommending Code Reviewers Based on Their Expertise: An Empirical Comparison
- Parameters:
items2ids – dict with user2id and file2id
delta – time decay factor for weight of the previous reviews
no_owner – flag to add or remove owners of the pull request from the recommendations
no_inactive – flag to add or remove inactive reviewers from recommendations
inactive_time – number of consecutive days without any actions after which a reviewer is considered inactive
- fit(data)
updates wrc matrix
- predict(pull, n=10)
for each possible reviewer, sums the WRC scores over the files in the pull request
- Parameters:
n – number of reviewers to recommend
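How similarity, recency, and pull request size might enter one score is sketched below. Everything about the formula is an assumption made for illustration: the prefix-based similarity, the decay of delta raised to the review's age rank (so delta=1 means no decay), and the division by the number of files in the past pull request. The names and the history format are hypothetical.

```python
from collections import defaultdict


def path_similarity(a, b):
    # Shared leading path components, normalised by the longer path.
    pa, pb = a.split("/"), b.split("/")
    common = 0
    for x, y in zip(pa, pb):
        if x != y:
            break
        common += 1
    return common / max(len(pa), len(pb))


def wrc_scores(new_files, history, delta=1.0):
    """history: past reviews, most recent first; each is (reviewers, files)."""
    scores = defaultdict(float)
    for age, (reviewers, files) in enumerate(history):
        if not files:
            continue
        recency = delta ** age        # assumed decay; delta=1 disables it
        size_weight = 1 / len(files)  # assumed: files in large pulls count less
        for old in files:
            for new in new_files:
                contribution = path_similarity(new, old) * recency * size_weight
                for reviewer in reviewers:
                    scores[reviewer] += contribution
    return dict(scores)


history = [
    (["alice"], ["src/a/x.py"]),
    (["bob"], ["src/a/y.py", "docs/readme.md"]),
]
result = wrc_scores(["src/a/z.py"], history, delta=0.5)
print(result)
```

bob reviewed an equally similar file, but his review is older and was part of a larger pull request, so his score ends up at a quarter of alice's.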
- class batcore.baselines.xFinder(no_owner=True, no_inactive=True, inactive_time=60)
xFinder recommends candidates based on their commit history. To do this, it calculates an xFactor, which measures the relative share and recency of the commits a developer made to the files.
dataset - StandardDataset(data, commits=True)
Paper: Assigning change requests to software developers
- Parameters:
no_owner – flag to add or remove owners of the pull request from the recommendations
no_inactive – flag to add or remove inactive reviewers from recommendations
inactive_time – number of consecutive days without any actions after which a reviewer is considered inactive
- fit(data)
performs necessary updates
- predict(pull, n=10)
candidates are scored by xFactor (equations from page 8 of the paper)
- Parameters:
pull – pull request for which reviewers are required
n – number of reviewers to recommend
- Returns:
at most n reviewers for the pull request
- class batcore.baselines.CN(items2ids, lambd=0.5, no_owner=True, no_inactive=True, inactive_time=60)
CN recommends reviewers based on their comments on previous reviews. To do this, a Comment Network (a weighted directed graph) is constructed. Vertices are developers in the project, and edges represent the weighted number of reviewing interactions. Each candidate is scored based on distance in the graph.
- Parameters:
items2ids – dict with user2id
lambd – weight decay coefficient for comments within a single review beyond the first one
no_owner – flag to add or remove owners of the pull request from the recommendations
no_inactive – flag to add or remove inactive reviewers from recommendations
inactive_time – number of consecutive days without any actions after which a reviewer is considered inactive
- fit(data)
updates CN graph and supporting characteristics
- predict(pull, n=10)
recommends reviewers based on the owner of the pull request
- predict_apriori(i, k=10)
Apriori prediction suggests reviewers for pull requests whose owners have previously commented on other pull requests
- predict_community(i, k=10)
Suggests reviewers for pull requests of newcomers who have had no interactions with others
- predict_pac(i, k=10)
PAC prediction makes recommendations for owners whose previous pull requests have been reviewed, i.e. whose in-degree in the graph is greater than 0
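The graph construction and the choice between the three prediction modes can be sketched as below. This is not batcore's code: the edge direction (commenter to owner, so that a reviewed owner has in-degree > 0), the weight of lambd ** (k - 1) for the k-th comment by the same person in one review, the order in which the modes are tried, and all names and input formats are assumptions for illustration.

```python
from collections import defaultdict


def build_comment_network(reviews, lambd=0.5):
    """reviews: list of (owner, [commenters in comment order]) (hypothetical format).

    Returns weights[commenter][owner]. Repeated comments by the same person
    within one review count less: the k-th one adds lambd ** (k - 1).
    """
    weights = defaultdict(lambda: defaultdict(float))
    for owner, commenters in reviews:
        seen = defaultdict(int)
        for commenter in commenters:
            if commenter == owner:
                continue  # owners replying on their own pull request add no edge
            weights[commenter][owner] += lambd ** seen[commenter]
            seen[commenter] += 1
    return weights


def choose_strategy(owner, weights):
    # PAC: someone has already reviewed this owner's pull requests.
    in_degree = sum(1 for targets in weights.values() if owner in targets)
    if in_degree > 0:
        return "pac"
    # Apriori: the owner has commented on other people's pull requests.
    if weights.get(owner):
        return "apriori"
    # Community: a newcomer with no interactions at all.
    return "community"


reviews = [
    ("alice", ["bob", "bob", "carol"]),
    ("dave", ["alice"]),
]
network = build_comment_network(reviews, lambd=0.5)
strategies = {name: choose_strategy(name, network) for name in ["alice", "bob", "erin"]}
print(strategies)
```

bob's second comment on alice's pull request adds only 0.5, so the bob to alice edge has weight 1.5 while the carol to alice edge has weight 1.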