experimentum.Experiments package

Submodules

experimentum.Experiments.App module

Main Service Container and Provider.

Sets up the framework, runs the experiments, and lets you customize or extend the framework's behavior.

Binding

We can register/bind a new alias either by extending the register_aliases() method or by adding the alias directly to the aliases dictionary. The key is the alias name you want to register, and the value is a function that returns an instance of the class:

def register_aliases(self):
    super(MyAppClass, self).register_aliases()

    self.aliases['my_custom_api'] = lambda: API(self.store)

Additional arguments for creating a class instance may be passed when resolving. Your factory function only has to declare them as parameters in order to use them:

self.aliases['my_custom_api'] = lambda name, user_id=None: API(self.store, name, user_id)

Resolving

You may use the make() method to resolve a class instance out of the container. The make() method accepts the alias of the class you want to resolve:

api = self.app.make('my_custom_api')

If some of your class’ dependencies are not resolvable via the app container, you may pass them as additional args and keyword args:

api = self.app.make('my_custom_api', foo, user_id=42)
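
To illustrate how binding and resolving fit together, here is a minimal, self-contained sketch of an alias container. It is not the framework's implementation: SketchContainer and API are hypothetical stand-ins, and only the aliases dictionary and the make(alias, *args, **kwargs) signature are taken from the text above.

```python
class API:
    """Hypothetical stand-in for a class you want to resolve."""

    def __init__(self, store, name=None, user_id=None):
        self.store = store
        self.name = name
        self.user_id = user_id


class SketchContainer:
    """Hypothetical container: maps alias names to factory functions."""

    def __init__(self, store):
        self.store = store
        self.aliases = {}
        self.register_aliases()

    def register_aliases(self):
        # The factory receives whatever extra arguments are passed to make().
        self.aliases['my_custom_api'] = (
            lambda name=None, user_id=None: API(self.store, name, user_id)
        )

    def make(self, alias, *args, **kwargs):
        # Look up the factory and call it with the extra arguments.
        if alias not in self.aliases:
            raise Exception('Alias "{}" does not exist.'.format(alias))
        return self.aliases[alias](*args, **kwargs)


container = SketchContainer(store={'driver': 'memory'})
api = container.make('my_custom_api', 'foo', user_id=42)
print(api.name, api.user_id)
```

Because the container stores factories rather than instances, each call to make() in this sketch builds a fresh object, and dependencies such as the store are captured once by the lambda.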

Customizing

Commands

Register new commands with the App.register_commands() method. It should return a dictionary where the keys are the names of the commands and the values are the command handlers. Each command handler must be either a class derived from AbstractCommand or a function decorated with AbstractCommand.command(). Example return:

{
    'foo': FooCommand,
    'bar': BarCommand
}
class experimentum.Experiments.App.App(name, root)

Bases: object

Main entry point of the framework.

config_path

Defaults to '.'. Path to config files.

Type:str
base_repository

Defaults to Repository. Repository Base Class

Type:AbstractRepository
name

Name of the app.

Type:str
root

Root Path of the app.

Type:str
config

Config Manager.

Type:Config
log

Logger.

Type:logging.Logger
store

Data Store.

Type:AbstractStore
aliases

Dictionary of aliases and factory functions.

Type:dict
base_repository

alias of experimentum.Storage.SQLAlchemy.Repository.Repository

bootstrap()

Bootstrap the app, i.e. setup config and logger.

config_path = '.'
make(alias, *args, **kwargs)

Create an instance of an aliased class.

Parameters:alias (str) – Name of class alias.
Raises:Exception – if the alias does not exist.
Returns:Instance of the aliased class
Return type:object
register_aliases()

Register aliases for classes.

register_commands()

Register custom cli commands.

Returns:commands to register
Return type:dict
run()

Run the app.

setup_datastore(datastore)

Set up the data store.

Parameters:datastore (dict) – Datastore config.

experimentum.Experiments.DataBag module

In-Memory Container to easily store experiment results on the fly.

Adding Entries

Add the entry {'foo': {'bar': {'baz': 42}}} to the DataBag via a dot-notation supported key:

DataBag.add('foo.bar.baz', 42)

Merging Entries

Merging dictionary and list entries:

DataBag.add('foo.bar', [1, 2, 3])
DataBag.add('foo.baz', {'a': 1, 'b': 2})
DataBag.merge('foo.bar', [2, 4, 8])
DataBag.merge('foo.baz', {'a': 42, 'c': 3})

# Results in
# {
#   'foo': {
#       'bar': [1, 2, 3, 2, 4, 8],
#       'baz': {'a': 42, 'b': 2, 'c': 3}
#   }
# }
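
The merge rule shown above (dictionaries are updated, lists are extended) can be sketched as a small standalone function. merge_values is a hypothetical helper, not part of the DataBag API, and raising TypeError for mismatched types is an assumption of this sketch.

```python
def merge_values(current, data):
    """Merge data into current: dicts are updated, lists are extended."""
    if isinstance(current, dict) and isinstance(data, dict):
        merged = dict(current)
        merged.update(data)    # keys in data overwrite keys in current
        return merged
    if isinstance(current, list) and isinstance(data, list):
        return current + data  # duplicates are kept
    # Assumption of this sketch: mixed types are rejected.
    raise TypeError('Can only merge dict with dict or list with list.')


print(merge_values([1, 2, 3], [2, 4, 8]))
print(merge_values({'a': 1, 'b': 2}, {'a': 42, 'c': 3}))
```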

Getting Entries

Get the entry baz of the dictionary {'foo': {'bar': {'baz': 42}}} from the DataBag via a dot-notation supported key:

data = DataBag.get('foo.bar.baz')  # 42

The DataBag also returns a default value if the key does not exist:

data = DataBag.get('a.b.c', 'default value')  # 'default value'

Deleting Entries

Deleting entries works just like adding/getting entries:

DataBag.delete('foo.bar.baz')  # {'foo': {'bar': {}}}

Flushing Entries

Flushing the DataBag means that you get all the entries (or just a specific key) and then clear the content:

DataBag.add('foo.bar.baz', 42)
print(DataBag.flush('foo.bar.baz'))  # prints: 42, contains: {'foo': {'bar': {}}}
print(DataBag.flush())               # prints: {'foo': {'bar': {}}}, contains: {}
class experimentum.Experiments.DataBag.DataBag

Bases: object

In-Memory Container to easily store experiment results on the fly.

classmethod add(key, value)

Add an item to the DataBag via a dot-notation supported key.

Parameters:
  • key (str) – Key to set (i.e. ‘foo’ or ‘foo.bar.baz’ for a deep key)
  • value (object) – Value to set for key.
classmethod all()

Return content of the DataBag.

Returns:DataBag content.
Return type:dict
classmethod clear()

Clear DataBag content.

classmethod delete(key)

Delete an item from the DataBag via a dot-notation supported key.

Parameters:key (str) – Key to delete (i.e. ‘foo’ or ‘foo.bar.baz’ for a deep key)
Returns:Status code: 1 if okay, -1 if the key was not found
Return type:int
classmethod flush(key=None, default=None)

Flush a specific key or all of the databag (i.e. return and delete).

Parameters:
  • key (str, optional) – Defaults to None. Key to flush (i.e. ‘foo’ or ‘foo.bar.baz’ for a deep key); if omitted, the whole DataBag is flushed
  • default (object, optional) – Defaults to None. Default if key does not exist
Returns:

Value of the key

Return type:

object

classmethod get(key, default=None)

Get an item from the DataBag via a dot-notation supported key.

Parameters:
  • key (str) – Key to get (i.e. ‘foo’ or ‘foo.bar.baz’ for a deep key)
  • default (object, optional) – Defaults to None. Default if key does not exist
Returns:

Value of the key

Return type:

object

classmethod merge(key, data)

Merge two dicts or lists.

For example:
  • {‘a’: 1, ‘b’: 2} + {‘a’: 42, ‘c’: 3} => {‘a’: 42, ‘b’: 2, ‘c’: 3}
  • [‘a’, ‘b’, ‘c’] + [‘a’, ‘d’, ‘e’] => [‘a’, ‘b’, ‘c’, ‘a’, ‘d’, ‘e’]
Parameters:
  • key (str) – Key to merge into (i.e. ‘foo’ or ‘foo.bar.baz’ for a deep key)
  • data (dict|list) – Value to update/extend for key.
experimentum.Experiments.DataBag.del_key(items, data)

Recursively remove deep key from dict.

Parameters:
  • items (list) – List of dictionary keys
  • data (dict) – Portion of dictionary to operate on
Raises:

KeyError – If key does not exist

experimentum.Experiments.DataBag.get_from(items, data)

Recursively get value from dictionary deep key.

Parameters:
  • items (list) – List of dictionary keys
  • data (dict) – Portion of dictionary to operate on
Returns:Value from dictionary
Return type:object
Raises:KeyError – If key does not exist
experimentum.Experiments.DataBag.set_to(items, data, value)

Recursively set value to dictionary deep key.

Parameters:
  • items (list) – List of dictionary keys
  • data (dict) – Portion of dictionary to operate on
  • value (object) – Value to set for key.
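
As a rough illustration of how the three recursive helpers could work on a list of keys, here is a sketch with hypothetical re-implementations (the sketch_ prefix marks them as stand-ins, not the library's code). Creating missing intermediate dictionaries in sketch_set_to is an assumption.

```python
def sketch_set_to(items, data, value):
    """Recursively walk (or create) nested dicts and set the last key."""
    key = items[0]
    if len(items) == 1:
        data[key] = value
        return
    # Assumption: create the intermediate dictionary if it is missing.
    if not isinstance(data.get(key), dict):
        data[key] = {}
    sketch_set_to(items[1:], data[key], value)


def sketch_get_from(items, data):
    """Recursively follow the keys; raises KeyError if a key is missing."""
    value = data[items[0]]
    if len(items) == 1:
        return value
    return sketch_get_from(items[1:], value)


def sketch_del_key(items, data):
    """Recursively follow the keys and delete the last one."""
    if len(items) == 1:
        del data[items[0]]  # raises KeyError if the key is missing
        return
    sketch_del_key(items[1:], data[items[0]])


bag = {}
# A dot-notation key is split into the list of keys the helpers expect.
sketch_set_to('foo.bar.baz'.split('.'), bag, 42)
print(bag)
print(sketch_get_from(['foo', 'bar', 'baz'], bag))
sketch_del_key(['foo', 'bar', 'baz'], bag)
print(bag)
```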

experimentum.Experiments.Experiment module

Run experiments and save the results for future analysis.

Writing Experiments

Experiments are created in the experiments directory and they must adhere to the following naming convention: {NAME}Experiment.py.

All experiments extend the Experiment class. Experiments contain a reset() and a run() method. Within the reset() method, you should reset/initialize the data structures and values you want to use in each test run.

The run() method should contain the code you want to test. It should return a dictionary with the values you want to save.
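
The interplay of reset(), run(), save() and start() can be pictured with the following sketch. SketchExperiment is a hypothetical stand-in for the real base class. Calling reset() before every run(), numbering iterations from 1, and saving into a plain list are all assumptions made for illustration.

```python
import random


class SketchExperiment:
    """Hypothetical stand-in for the Experiment base class."""

    def __init__(self):
        self.saved = []

    def reset(self):
        raise NotImplementedError

    def run(self):
        raise NotImplementedError

    def save(self, result, iteration):
        # The real class writes to the data store; this sketch keeps a list.
        self.saved.append((iteration, result))

    def start(self, steps=10):
        for iteration in range(1, steps + 1):
            self.reset()         # fresh state for every test run
            result = self.run()  # dictionary with the values to save
            self.save(result, iteration)


class FooExperiment(SketchExperiment):
    def reset(self):
        self.rand = random.randint(0, 10)

    def run(self):
        return {'rand': self.rand}


experiment = FooExperiment()
experiment.start(steps=3)
print(experiment.saved)
```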

The Reset Method

As mentioned before, the reset() method should reset/initialize the data structures and values you want to use in each test run.

Let’s take a look at a basic experiment. Within any of your experiment methods, you always have access to the app attribute which provides access to the main app class and to the config which contains the content of the config_file:

from experimentum.Experiments import Experiment
import random


class FooExperiment(Experiment):

    config_file = 'foo.json'

    def reset(self):
        # use app to create an instance of a custom aliased class
        self.user = self.config.get('user')
        self.some_class = self.app.make('some_class', self.user)
        self.rand = random.randint(0, 10)

The Run Method

As mentioned before, the run() method should contain the code you want to test and return a dictionary with the values you want to save.

Let’s take a look at a basic experiment, assuming that you added a rand attribute to your TestCaseRepository with a migration:

from experimentum.Experiments import Experiment
import random


class FooExperiment(Experiment):

    config_file = 'foo.json'

    def run(self):
        with self.performance.point('Task to measure some Rscript algo') as point:
            script = self.call('some_script.r')  # prints json to stdout as return value.
            algo_result = script.get_json()
            script.process.wait()  # Wait for child process to terminate.

        # Add a custom performance entry
        return {
            'rand': self.rand,
            'performances': [{
                'label': 'Custom Rscript measuring',
                'time': algo_result.get('time'),
                'memory': 0,
                'peak_memory': 0,
                'level': 0,
                'type': 'custom'
            }]
        }
class experimentum.Experiments.Experiment.Experiment(app, path)

Bases: object

Run experiments and save the results in the data store.

app

Main Application Class.

Type:App
performance

Performance Profiler.

Type:Performance
config

Hold the experiment configuration.

Type:Config
show_progress

Flag to show/hide the progress bar.

Type:bool
hide_performance

Flag to show/hide the performance table.

Type:bool
config_file

Config file to load.

Type:str
repos

Experiment and TestCase repositories used to save results.

Type:dict
boot()

Boot up the experiment, e.g. load config etc.

static call(cmd, verbose=False, shell=False)

Call another script to run algorithms for your experiment.

Warning

Passing shell=True can be a security hazard if combined with untrusted input. See the warning under Frequently Used Arguments in the Python subprocess documentation for details.

Parameters:
  • cmd (str, list) – Command which you want to call.
  • verbose (bool, optional) – Defaults to False. Print the cmd output or not.
  • shell (bool, optional) – Defaults to False. Specifies whether to use the shell as the program to execute.
Returns:

Executed script to get output from

Return type:

Script
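
The shell warning above can be made concrete with the standard subprocess module, used here directly rather than through call(). The sketch shows that a command given as a list with shell=False passes each element verbatim as one argument, so shell metacharacters in untrusted input are not interpreted. The untrusted value is a made-up example.

```python
import subprocess
import sys

# Hypothetical untrusted value, e.g. taken from a config file or user input.
untrusted = 'results; echo injected'

# With a list and shell=False (the default), each element is passed to the
# program as a single argument, so no shell ever interprets the ';'.
completed = subprocess.run(
    [sys.executable, '-c', 'import sys; print(sys.argv[1])', untrusted],
    stdout=subprocess.PIPE,
    universal_newlines=True,
)
print(completed.stdout.strip())
```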

config_file = None
static get_experiments(path)

[DEPRECATED] Get experiment names from exp files/classes.

Parameters:path (str) – Path to experiments folder.
Returns:Names of experiments
Return type:list
static get_status(app)

Get status information about experiments.

Parameters:app (App) – Main Service Provider/Container.
Returns:Dictionary with experiment status
Return type:dict
static load(app, path, name)

Load and initialize an experiment class.

Parameters:
  • app (App) – Main app class
  • path (str) – Path to experiments folder.
  • name (str) – Name of experiment.
Returns:

Loaded experiment.

Return type:

Experiment

reset()

Reset data structures and values used in the run method.

run()

Run a test of the experiment.

save(result, iteration)

Save the test results in the data store.

Parameters:
  • result (dict) – Result of experiment test run.
  • iteration (int) – Number of test run iteration.
start(steps=10)

Start the test runs of the experiment.

Parameters:steps (int, optional) – Defaults to 10. How many test runs should be executed.
class experimentum.Experiments.Experiment.Script(cmd, verbose=False, shell=False, stdout=-1)

Bases: object

Call another script to run algorithms for your experiment.

Example:

script = Script(['Rscript', 'myalgo.r', 'arg1', 'arg2'], verbose=True, shell=True)
algo_result = script.get_json()
script.process.wait()  # Wait for child process to terminate.
process

Called script.

Type:subprocess.Popen
output

Output of called script.

Type:str
get_json()

Decode JSON of process output.

Returns:Process output
Return type:object
get_text()

Get the text of the process output.

Returns:Process output
Return type:str
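
The following sketch shows the general pattern behind calling a child script and decoding its JSON output, using only the standard library. It does not use the Script class itself. The child process here is a Python one-liner standing in for an R script, which is an assumption made so the example runs anywhere.

```python
import json
import subprocess
import sys

# The child stands in for an R script that prints JSON to stdout.
child_code = 'import json; print(json.dumps({"time": 1.5}))'

process = subprocess.Popen(
    [sys.executable, '-c', child_code],
    stdout=subprocess.PIPE,  # subprocess.PIPE has the value -1
)
raw_output, _ = process.communicate()  # read stdout and wait for exit

text = raw_output.decode('utf-8')  # plain text of the output
algo_result = json.loads(text)     # decoded JSON of the output
print(algo_result['time'])
```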

experimentum.Experiments.Performance module

The Performance module lets you easily measure the performance of your code.

It measures the execution time and memory consumption of your Python script and lets you add messages to each measuring point for a more detailed overview.

Example:

from experimentum.Experiments import Performance

performance = Performance()
with performance.point('Task A') as point:
    # Do some heavy work
    point.message('Database query insert xy')

    with performance.point('Subtask A1') as subpoint:
        pass  # Do some heavy work

    with performance.point('Subtask A2') as subpoint:
        subpoint.message('Database query insert xy')

performance.results()  # print results table
class experimentum.Experiments.Performance.Formatter

Bases: object

Format performance values to a human-readable format.

static format_number(value, decimals, unit)

Round a number and add a unit.

Parameters:
  • value (float) – Number to round.
  • decimals (int) – Number of decimal places.
  • unit (str) – Unit to append
Returns:

string

get_table(points, tablefmt='psql')

Build the performance table in human-readable format.

Parameters:
  • points (list) – Measuring Points.
  • tablefmt (str, optional) – Defaults to ‘psql’. Table format for tabulate
Returns:

Performance table

Return type:

str

memory_to_human(_bytes, unit='auto', decimals=2)

Transform used memory into a better suited unit.

Parameters:
  • _bytes (float) – Used Bytes.
  • unit (str, optional) – Defaults to ‘auto’. Unit to transform bytes into.
  • decimals (int, optional) – Defaults to 2. Number of decimal places to round to.
Returns:

Used memory suffixed with unit identifier

Return type:

string
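
A byte count could be turned into a readable string roughly as follows. sketch_memory_to_human is a hypothetical stand-in. The unit names, the 1024-based steps, and the TypeError for unknown units are assumptions of this sketch, not documented behavior.

```python
def sketch_memory_to_human(_bytes, unit='auto', decimals=2):
    """Format a byte count with a unit suffix (1024-based steps assumed)."""
    units = ['B', 'KB', 'MB', 'GB']
    if unit == 'auto':
        value, index = float(_bytes), 0
        # Move to the next larger unit while the value is still too large.
        while value >= 1024 and index < len(units) - 1:
            value /= 1024
            index += 1
        unit = units[index]
    else:
        if unit not in units:
            raise TypeError('Unknown unit: {}'.format(unit))
        value = float(_bytes) / (1024 ** units.index(unit))
    return '{:.{}f} {}'.format(value, decimals, unit)


print(sketch_memory_to_human(1536))
print(sketch_memory_to_human(1048576, unit='KB'))
```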

print_table(points, tablefmt='psql')

Print performance table in human-readable format.

Parameters:
  • points (list) – Measuring Points.
  • tablefmt (str, optional) – Defaults to ‘psql’. Table format for tabulate
time_to_human(seconds, unit='auto', decimals=2)

Transform time into a better suited unit.

Parameters:
  • seconds (float) – Time in seconds.
  • unit (str, optional) – Defaults to ‘auto’. Unit to transform seconds into.
  • decimals (int, optional) – Defaults to 2. Number of decimal places to round to.
Raises:

TypeError – if an unknown unit is passed

Returns:

Time suffixed with unit identifier

Return type:

string
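
A possible shape for the time conversion, including the documented TypeError for unknown units, is sketched below. sketch_time_to_human is hypothetical, and the set of units and the thresholds used for 'auto' are assumptions.

```python
def sketch_time_to_human(seconds, unit='auto', decimals=2):
    """Format a duration in seconds with a unit suffix."""
    # Seconds per unit; the set of units is an assumption of this sketch.
    factors = {'ms': 0.001, 's': 1.0, 'm': 60.0, 'h': 3600.0}
    if unit == 'auto':
        # Pick the largest unit that keeps the value at 1 or above.
        if seconds < 1:
            unit = 'ms'
        elif seconds < 60:
            unit = 's'
        elif seconds < 3600:
            unit = 'm'
        else:
            unit = 'h'
    if unit not in factors:
        raise TypeError('Unknown unit: {}'.format(unit))
    return '{:.{}f} {}'.format(seconds / factors[unit], decimals, unit)


print(sketch_time_to_human(0.25))
print(sketch_time_to_human(90))
```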

class experimentum.Experiments.Performance.Performance

Bases: object

Easily measure the performance of your Python scripts.

points

List of measuring points

Type:list
iteration

Number of current iteration

Type:int
formatter

Formatter to output human readable results

Type:Formatter
export(metrics=False)

Export the measuring points as a dictionary.

Parameters:metrics (bool, optional) –
Returns:Measuring points
Return type:dict
iterate(start, stop)

Iterate over multiple performance points to later calculate the average and standard deviation.

Parameters:
  • start (int) – Start at
  • stop (int) – Stop at
Yields:

int – current iteration

static mean(values)

Calculate Mean of values.

Parameters:values (list) – List of values
Returns:Mean value
Return type:float
point(**kwds)

Set measuring point with or without a label.

Keyword Arguments:
 label (str, optional) – Defaults to ‘Point’. Enter point label

Example:

with performance.point('Task A') as point:
    # do something
    point.message('Some Message')
Yields:Point – new measuring point
results()

Print the performance results in a human-readable format.

set_formatter(formatter)

Set a formatter for human readable output.

Parameters:formatter (Formatter) – Human-readable output formatter
static standard_deviation(numbers)

Calculate the standard deviation.

Parameters:numbers (list) – List of numbers
Returns:Standard Deviation
Return type:float
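
For reference, the mean and standard deviation of a list can be computed with Python's standard statistics module. The reference does not say whether standard_deviation() uses the population or the sample formula, so this sketch shows both and makes no claim about which one the class uses.

```python
import statistics

values = [2, 4, 4, 4, 5, 5, 7, 9]

mean = statistics.mean(values)
population_sd = statistics.pstdev(values)  # divides by n
sample_sd = statistics.stdev(values)       # divides by n - 1

print(mean, population_sd, round(sample_sd, 3))
```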
class experimentum.Experiments.Performance.Point(label, iter_id=None)

Bases: object

Measuring Point Data Structure.

Keeps track of execution time and memory consumption. You can also append messages to a point to keep track of different events. Each message contains the time since the last message was logged and the message content itself.
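
A minimal sketch of such a measuring point, limited to timing and messages, could look like this. SketchPoint and the module-level point() function are hypothetical. Memory tracking and subpoints are left out, and time.perf_counter() is an assumed choice of clock.

```python
import time
from contextlib import contextmanager


class SketchPoint:
    """Hypothetical measuring point that tracks time and messages."""

    def __init__(self, label):
        self.label = label
        self.start_time = time.perf_counter()
        self.stop_time = None
        self.messages = []
        self._last_message_time = self.start_time

    def message(self, msg):
        now = time.perf_counter()
        # Store the time elapsed since the previous message (or the start).
        self.messages.append((now - self._last_message_time, msg))
        self._last_message_time = now


@contextmanager
def point(label='Point'):
    current = SketchPoint(label)
    try:
        yield current
    finally:
        # Record the end time even if the measured code raises.
        current.stop_time = time.perf_counter()


with point('Task A') as task:
    time.sleep(0.01)
    task.message('first step done')
    task.message('second step done')

print(task.label, task.stop_time - task.start_time, task.messages)
```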

label

Label of the point.

Type:str
id

Id to keep track of same points when iterating.

Type:int
start_time

Startpoint Timestamp.

Type:float
stop_time

Endpoint Timestamp.

Type:float
start_memory

Memory consumption on start.

Type:int
stop_memory

Memory consumption on end.

Type:int
messages

List of optional messages.

Type:list
subpoints

List of optional subpoints.

Type:list
message(msg)

Set a message associated with the point.

Parameters:msg (str) – Enter message
to_df()

Transform point to dataframe.

Returns:Dataframe
Return type:dict
to_dict()

Return all measured attributes and messages as a dictionary.

Returns:measured attributes and messages.
Return type:dict
experimentum.Experiments.Performance.memory_usage()

Return the memory usage of the current process.

Note

Thanks to Fabian Pedregosa for his overview of different ways to determine the memory usage in python. http://fa.bianp.net/blog/2013/different-ways-to-get-memory-consumption-or-lessons-learned-from-memory_profiler

Returns:used bytes.
Return type:float
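
One of the approaches covered in the linked overview is the standard resource module. The sketch below uses it to read the peak resident set size of the current process. It is Unix-only, it reports the peak rather than the current usage, and it is not necessarily what memory_usage() does internally. sketch_memory_usage is a hypothetical name.

```python
import resource
import sys


def sketch_memory_usage():
    """Return the peak resident set size of this process in bytes."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in bytes on macOS and in kilobytes on Linux.
    if sys.platform == 'darwin':
        return float(peak)
    return float(peak) * 1024


print(sketch_memory_usage())
```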

Module contents

Provides the main entry point and some utilities for running experiments.

Import App, Experiment and Performance so that they can be easily imported by other packages/modules.