Examples

Running implemented models

from batcore.data import PullLoader
from batcore.tester import RecTester
from batcore.data import MRLoaderData
from batcore.baselines import CN
from batcore.data import get_gerrit_dataset

# reloads saved data from the checkpoint
data = MRLoaderData().from_checkpoint('projects/openstack')

# gets dataset for the CN model. Pull request with more than 56 files are removed
dataset = get_gerrit_dataset(data, max_file=56, model_cls=CN)

# creates an iterator over dataset that iterates over pull request one-by-one
data_iterator = PullLoader(dataset, 10)

# creates a CN model. dataset.get_items2ids() provides model with necessary encodings
# (eg. users2id, files2id) for optimization of evaluation
model = CN(dataset.get_items2ids())

# create a tester object
tester = RecTester()

# run the tester and receive dict with all the metrics
res = tester.test_recommender(model, data_iterator)

Loading dataset from MRLoader output

from batcore.data import MRLoaderData

data = MRLoaderData('path', # path to the directory containing output of MRLoader
                     bots='', # path to file with bots or 'auto'
                     project_name='', # name of the project for in case of auto bot detection
                     from_checkpoint=False, # when true reloads saves data
                     from_date=datetime(), # all events before are removed
                     to_date=datetime(), # all events after are removed
                     factorize_users=True, # when true users are replaced withs numerical ids
                     alias=True, # when true users with close names/emails/logins are treated as one
                     remove_bots=True # when true bots are removed from the data
                    )

Creating Dataset from GerritLoader

from batcore.data import StandardDataset

dataset = StandardDataset(data, # instance of MRLoaderData
                          max_file=100, # number of maximum files in a pull request
                          commits=False, # if true commits are included in the dataset
                          comments=False, # if true comments are included in the dataset
                          user_items=False, # if true makes a user2id map
                          file_items=False, # if true makes a file2id map
                          pull_items=False, # if true makes a pull2id map
                          remove_empty=False, # if true pull requests w/out reviewers are removed
                          owner_policy='', # strategy for identification of the author of the pull request
                          remove=[] # list of columns/features that will be removed
                         )

Creating new model

To create new mode one can simply implement abstract class RecommenderBase with fit and predict methods.

fit is a methods that trains the model on the given data
predict is methods that returns list of candidates to review give pull request pull

For any model implementing those two methods testing can be done the same way as with implemented baselines. The only exception is changing dataset initialization from get_gerrit_dataset to manual initialization. A simple example of the recommender implementation can be found below:

from batcore.modelbase import RecommenderBase
import numpy as np

class SimpleRecommender(RecommenderBase):
    def __init__(self):
        super().__init__()
        self.reviewers = []

    def predict(self, pull, n=10):
        return [np.random.choice(self.reviewers)]

    def fit(self, data):
        self.reviewers.extend(event['reviewer'])