experimentum.Experiments package¶

Submodules¶

experimentum.Experiments.App module¶

Main Service Container and Provider.

Sets up the framework, runs the experiments, and lets you customize or extend the framework's behavior.

Binding¶

We can register/bind a new alias either by extending the register_aliases() method or by adding the alias directly to the aliases dictionary. The key is the alias name you want to register and the value is a function that returns an instance of the class:

def register_aliases(self):
    super(MyAppClass, self).register_aliases()

    self.aliases['my_custom_api'] = lambda: API(self.store)

Additional arguments for creating a class instance may be passed when resolving. Your factory function just has to accept them as parameters in order to use them:

self.aliases['my_custom_api'] = lambda name, user_id=None: API(self.store, name, user_id)

Resolving¶

You may use the make() method to resolve a class instance out of the container. The make() method accepts the alias of the class you want to resolve:

api = self.app.make('my_custom_api')

If some of your class’ dependencies are not resolvable via the app container, you may pass them as additional args and keyword args:

api = self.app.make('my_custom_api', foo, user_id=42)
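The binding and resolving steps above can be sketched without the framework. This is a minimal illustration, not the library's implementation: MiniContainer and this API class are hypothetical stand-ins, and only the aliases dictionary, the make() call shape, and the exception on an unknown alias are taken from the text.

```python
class MiniContainer(object):
    """Hypothetical stand-in for the App container (not the library's code)."""

    def __init__(self):
        # alias name -> factory function returning a new instance
        self.aliases = {}

    def make(self, alias, *args, **kwargs):
        # Look up the factory and forward any extra arguments to it.
        if alias not in self.aliases:
            raise Exception('Alias "{}" does not exist.'.format(alias))
        return self.aliases[alias](*args, **kwargs)


class API(object):
    """Hypothetical class to resolve out of the container."""

    def __init__(self, store, name=None, user_id=None):
        self.store, self.name, self.user_id = store, name, user_id


container = MiniContainer()
store = {'dsn': 'sqlite://'}  # placeholder for a real data store
container.aliases['my_custom_api'] = (
    lambda name, user_id=None: API(store, name, user_id)
)

api = container.make('my_custom_api', 'foo', user_id=42)
print(api.name, api.user_id)  # foo 42
```

Because the factory is only called inside make(), each resolve produces a fresh instance, and container-provided dependencies (here store) are mixed with caller-provided ones (name, user_id).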

Customizing¶

Commands¶

Register new commands with the App.register_commands() method. It should return a dictionary where the keys are names of the commands and the values are the command handlers. The command handlers must either be derived from AbstractCommand or a function with the decorator AbstractCommand.command(). Example return:

{
    'foo': FooCommand,
    'bar': BarCommand
}

class experimentum.Experiments.App.App(name, root)¶

Bases: object

Main entry point of the framework.

config_path¶

Defaults to '.'. Path to config files.

Type:	str

base_repository¶

Defaults to Repository. Repository Base Class

Type:	AbstractRepository

name¶

Name of the app.

Type:	str

root¶

Root Path of the app.

Type:	str

config¶

Config Manager.

Type:	Config

log¶

Logger.

Type:	logging.Logger

store¶

Data Store.

Type:	AbstractStore

aliases¶

Dictionary of aliases and factory functions.

Type:	dict

base_repository: alias of experimentum.Storage.SQLAlchemy.Repository.Repository

bootstrap()¶: Bootstrap the app, i.e. setup config and logger.

config_path = '.'

make(alias, *args, **kwargs)¶

Create an instance of an aliased class.

Parameters:	alias (str) – Name of class alias.
Raises:	Exception – if the alias does not exist.
Returns:	Instance of the aliased class
Return type:	object

register_aliases()¶: Register aliases for classes.

register_commands()¶

Register custom cli commands.

Returns:	commands to register
Return type:	dict

run()¶: Run the app.

setup_datastore(datastore)¶

Set up the data store.

Parameters:	datastore (dict) – Datastore config.

experimentum.Experiments.DataBag module¶

In-Memory Container to easily store experiment results on the fly.

Adding Entries¶

Add the entry {'foo': {'bar': {'baz': 42}}} to the DataBag via a dot-notation supported key:

DataBag.add('foo.bar.baz', 42)

Merging Entries¶

Merging dictionary and list entries:

DataBag.add('foo.bar', [1, 2, 3])
DataBag.add('foo.baz', {'a': 1, 'b': 2})
DataBag.merge('foo.bar', [2, 4, 8])
DataBag.merge('foo.baz', {'a': 42, 'c': 3})

# Results in
# {
#   'foo': {
#       'bar': [1, 2, 3, 2, 4, 8],
#       'baz': {'a': 42, 'b': 2, 'c': 3}
#   }
# }

Getting Entries¶

Get the entry baz of the dictionary {'foo': {'bar': {'baz': 42}}} from the DataBag via a dot-notation supported key:

data = DataBag.get('foo.bar.baz')  # 42

The DataBag also provides a default value if the key does not exist:

data = DataBag.get('a.b.c', 'default value')  # 'default value'

Deleting Entries¶

Deleting entries works just like adding/getting entries:

DataBag.delete('foo.bar.baz')  # {'foo': {'bar': {}}}

Flushing Entries¶

Flushing the DataBag means that you get all the entries (or just a specific key) and then clear the content:

DataBag.add('foo.bar.baz', 42)
print(DataBag.flush('foo.bar.baz'))  # prints: 42, contains: {'foo': {'bar': {}}}
print(DataBag.flush())               # prints: {'foo': {'bar': {}}}, contains: {}

class experimentum.Experiments.DataBag.DataBag¶

Bases: object

In-Memory Container to easily store experiment results on the fly.

classmethod add(key, value)¶

Add an item to the DataBag via a dot-notation supported key.

Parameters:	key (str) – Key to set (e.g. ‘foo’ or ‘foo.bar.baz’ for a deep key).
	value (object) – Value to set for key.

classmethod all()¶

Return content of the DataBag.

Returns:	DataBag content.
Return type:	dict

classmethod clear()¶: Clear DataBag content.

classmethod delete(key)¶

Delete an item from the DataBag via a dot-notation supported key.

Parameters:	key (str) – Key to delete (e.g. ‘foo’ or ‘foo.bar.baz’ for a deep key).
Returns:	Status code (1: okay, -1: key not found)
Return type:	int

classmethod flush(key=None, default=None)¶

Flush a specific key or all of the databag (i.e. return and delete).

Parameters:	key (str, optional) – Defaults to None. Key to flush (e.g. ‘foo’ or ‘foo.bar.baz’ for a deep key). If omitted, the whole DataBag is flushed.
	default (object, optional) – Defaults to None. Default if key does not exist.
Returns:	Value of the key
Return type:	object
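The "return and delete" behavior of flush() can be sketched on a plain dictionary. This is an illustrative re-implementation under stated assumptions, not the library's source: the module-level flush function below is hypothetical and operates on a dictionary passed in explicitly rather than on the DataBag class.

```python
def flush(store, key=None, default=None):
    """Return a value (or everything) from `store` and remove it."""
    if key is None:
        content = dict(store)   # shallow copy of all entries
        store.clear()
        return content

    # Walk down to the dictionary that holds the last key segment.
    *parents, last = key.split('.')
    node = store
    for part in parents:
        node = node.get(part)
        if not isinstance(node, dict):
            return default      # path does not exist
    return node.pop(last, default)


bag = {'foo': {'bar': {'baz': 42}}}
print(flush(bag, 'foo.bar.baz'))   # 42
print(bag)                         # {'foo': {'bar': {}}}
print(flush(bag, 'a.b.c', 'n/a'))  # n/a
print(flush(bag))                  # {'foo': {'bar': {}}}
print(bag)                         # {}
```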

classmethod get(key, default=None)¶

Get an item from the DataBag via a dot-notation supported key.

Parameters:	key (str) – Key to get (e.g. ‘foo’ or ‘foo.bar.baz’ for a deep key).
	default (object, optional) – Defaults to None. Default if key does not exist.
Returns:	Value of the key
Return type:	object

classmethod merge(key, data)¶

Merge two dicts or lists.

For example:

{'a': 1, 'b': 2} + {'a': 42, 'c': 3} => {'a': 42, 'b': 2, 'c': 3}
['a', 'b', 'c'] + ['a', 'd', 'e'] => ['a', 'b', 'c', 'a', 'd', 'e']

Parameters:	key (str) – Key to set (e.g. ‘foo’ or ‘foo.bar.baz’ for a deep key).
	data (dict | list) – Value to update/extend for key.
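The merge rule in the examples above (dictionaries are updated, lists are extended) can be sketched as a small helper. The name merge_values is hypothetical, and raising TypeError for mismatched types is an assumption of this sketch, not documented behavior.

```python
def merge_values(current, data):
    """Merge `data` into `current`: update dicts, extend lists."""
    if isinstance(current, dict) and isinstance(data, dict):
        merged = dict(current)
        merged.update(data)      # keys in `data` win on conflict
        return merged
    if isinstance(current, list) and isinstance(data, list):
        return current + data    # duplicates are kept
    # Assumption of this sketch: mixed types are rejected.
    raise TypeError('Can only merge dict with dict or list with list.')


print(merge_values({'a': 1, 'b': 2}, {'a': 42, 'c': 3}))
# {'a': 42, 'b': 2, 'c': 3}
print(merge_values(['a', 'b', 'c'], ['a', 'd', 'e']))
# ['a', 'b', 'c', 'a', 'd', 'e']
```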

experimentum.Experiments.DataBag.del_key(items, data)¶

Recursively remove deep key from dict.

Parameters:	items (list) – List of dictionary keys.
	data (dict) – Portion of dictionary to operate on.
Raises:	KeyError – If key does not exist

experimentum.Experiments.DataBag.get_from(items, data)¶

Recursively get value from dictionary deep key.

Parameters:	items (list) – List of dictionary keys.
	data (dict) – Portion of dictionary to operate on.
Returns:	Value from dictionary
Return type:	object
Raises:	KeyError – If key does not exist

experimentum.Experiments.DataBag.set_to(items, data, value)¶

Recursively set value to dictionary deep key.

Parameters:	items (list) – List of dictionary keys.
	data (dict) – Portion of dictionary to operate on.
	value (object) – Value to set for key.
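The three helpers share one idea: a dot-notation key such as 'foo.bar.baz' is split into a list of keys, and the function recurses one level per key. The following is a minimal sketch with the same names and signatures as documented above; the bodies are illustrative and assumed, not copied from the library.

```python
def set_to(items, data, value):
    """Recursively set `value` at the nested key path `items`."""
    head, rest = items[0], items[1:]
    if not rest:
        data[head] = value
        return
    # Create the intermediate dictionary if it is missing.
    set_to(rest, data.setdefault(head, {}), value)


def get_from(items, data):
    """Recursively read the value at the key path; KeyError if absent."""
    head, rest = items[0], items[1:]
    return get_from(rest, data[head]) if rest else data[head]


def del_key(items, data):
    """Recursively delete the key path; KeyError if absent."""
    head, rest = items[0], items[1:]
    if rest:
        del_key(rest, data[head])
    else:
        del data[head]


bag = {}
set_to('foo.bar.baz'.split('.'), bag, 42)
print(bag)                                   # {'foo': {'bar': {'baz': 42}}}
print(get_from(['foo', 'bar', 'baz'], bag))  # 42
del_key(['foo', 'bar', 'baz'], bag)
print(bag)                                   # {'foo': {'bar': {}}}
```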

experimentum.Experiments.Experiment module¶

Run experiments and save the results for future analysis.

Writing Experiments¶

Experiments are created in the experiments directory and they must adhere to the following naming convention: {NAME}Experiment.py.

All experiments extend the Experiment class and contain a reset() and a run() method. Within the reset() method, you should reset/initialize the data structures and values you want to use in each test run.

The run() method should contain the code you want to test. It should return a dictionary with the values you want to save.

The Reset Method¶

As mentioned before, the reset() method should reset/initialize the data structures and values you want to use in each test run.

Let’s take a look at a basic experiment. Within any of your experiment methods, you always have access to the app attribute, which provides access to the main app class, and to the config attribute, which contains the content of the config_file:

from experimentum.Experiments import Experiment
import random


class FooExperiment(Experiment):

    config_file = 'foo.json'

    def reset(self):
        # read a value from the experiment config (foo.json)
        self.user = self.config.get('user')
        # use app to create an instance of a custom aliased class
        self.some_class = self.app.make('some_class', self.user)
        self.rand = random.randint(0, 10)

The Run Method¶

As mentioned before, the run() method should contain the code you want to test and return a dictionary with the values you want to save.

Let’s take a look at a basic experiment, assuming that you added a rand attribute to your TestCaseRepository with a migration:

from experimentum.Experiments import Experiment
import random


class FooExperiment(Experiment):

    config_file = 'foo.json'

    def run(self):
        with self.performance.point('Task to measure some Rscript algo') as point:
            script = self.call('some_script.r')  # prints json to stdout as return value.
            algo_result = script.get_json()
            script.process.wait()  # Wait for child process to terminate.

        # Add a custom performance entry
        return {
            'rand': self.rand,
            'performances': [{
                'label': 'Custom Rscript measuring',
                'time': algo_result.get('time'),
                'memory': 0,
                'peak_memory': 0,
                'level': 0,
                'type': 'custom'
            }]
        }

class experimentum.Experiments.Experiment.Experiment(app, path)¶

Bases: object

Run experiments and save the results in the data store.

app¶

Main Application Class.

Type:	App

performance¶

Performance Profiler.

Type:	Performance

config¶

Hold the experiment configuration.

Type:	Config

show_progress¶

Flag to show/hide the progress bar.

Type:	bool

hide_performance¶

Flag to show/hide the performance table.

Type:	bool

config_file¶

Config file to load.

Type:	str

repos¶

Experiment and TestCase repositories used to save results.

Type:	dict

boot()¶: Boot up the experiment, e.g. load config etc.

static call(cmd, verbose=False, shell=False)¶

Call another script to run algorithms for your experiment.

Warning

Passing shell=True can be a security hazard if combined with untrusted input. See the warning under Frequently Used Arguments in the Python subprocess documentation for details.

Parameters:	cmd (str, list) – Command which you want to call.
	verbose (bool, optional) – Defaults to False. Print the cmd output or not.
	shell (bool, optional) – Defaults to False. Specifies whether to use the shell as the program to execute.
Returns:	Executed script to get output from
Return type:	Script

config_file = None

static get_experiments(path)¶

[DEPRECATED] Get experiment names from exp files/classes.

Parameters:	path (str) – Path to experiments folder.
Returns:	Names of experiments
Return type:	list

static get_status(app)¶

Get status information about experiments.

Parameters:	app (App) – Main Service Provider/Container.
Returns:	Dictionary with experiment status
Return type:	dict

static load(app, path, name)¶

Load and initialize an experiment class.

Parameters:	app (App) – Main app class.
	path (str) – Path to experiments folder.
	name (str) – Name of experiment.
Returns:	Loaded experiment.
Return type:	Experiment

reset()¶: Reset data structures and values used in the run method.

run()¶: Run a test of the experiment.

save(result, iteration)¶

Save the test results in the data store.

Parameters:	result (dict) – Result of experiment test run.
	iteration (int) – Number of test run iteration.

start(steps=10)¶

Start the test runs of the experiment.

Parameters:	steps (int, optional) – Defaults to 10. How many test runs should be executed.
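How reset(), run(), save() and start() fit together can be sketched with a stand-in class. MiniExperiment is hypothetical, and the call order (reset, then run, then save, once per iteration, counting from 1) is an assumption based on the method descriptions above, not a statement about the framework's internals.

```python
class MiniExperiment(object):
    """Hypothetical stand-in for an Experiment subclass."""

    def __init__(self):
        self.saved = []  # replaces the real data store in this sketch

    def reset(self):
        # Fresh state for every test run.
        self.values = [3, 1, 2]

    def run(self):
        # Code under test; returns the values to save.
        self.values.sort()
        return {'smallest': self.values[0]}

    def save(self, result, iteration):
        self.saved.append((iteration, result))

    def start(self, steps=10):
        # Assumed order: reset, run, then save, once per iteration.
        for iteration in range(1, steps + 1):
            self.reset()
            self.save(self.run(), iteration)


exp = MiniExperiment()
exp.start(steps=3)
print(len(exp.saved))  # 3
print(exp.saved[0])    # (1, {'smallest': 1})
```

Resetting before every run keeps the iterations independent: run() sorts the list in place, but the next iteration starts again from the unsorted state.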

class experimentum.Experiments.Experiment.Script(cmd, verbose=False, shell=False, stdout=-1)¶

Bases: object

Call another script to run algorithms for your experiment.

Example:

script = Script(['Rscript', 'myalgo.r', 'arg1', 'arg2'], verbose=True, shell=True)
algo_result = script.get_json()
script.process.wait()  # Wait for child process to terminate.

process¶

Called script.

Type:	subprocess.Popen

output¶

Output of called script.

Type:	str

get_json()¶

Decode JSON of process output.

Returns:	Process output
Return type:	object

get_text()¶

Get the text of the process output.

Returns:	Process output
Return type:	str
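What Script wraps can be sketched directly with the standard library: start a child process, capture its stdout, and decode the output as JSON. This sketch does not use the Script class; the child command is a Python one-liner standing in for an external algorithm such as an Rscript call, and the JSON payload is made up for illustration. The default stdout=-1 in the signature above has the same value as subprocess.PIPE.

```python
import json
import subprocess
import sys

# Child command: a Python one-liner standing in for an external script.
cmd = [sys.executable, '-c', 'import json; print(json.dumps({"time": 1.5}))']

# shell=False with a list of arguments avoids shell injection issues.
process = subprocess.Popen(cmd, stdout=subprocess.PIPE, shell=False)
stdout, _ = process.communicate()  # read all output, then wait for exit
result = json.loads(stdout.decode('utf-8'))

print(result['time'])      # 1.5
print(process.returncode)  # 0
```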

experimentum.Experiments.Performance module¶

The Performance module lets you easily measure the performance.

It measures execution time and memory consumption of your python script and lets you add messages to each measuring point for a more detailed overview.

Example:

from experimentum.Experiments import Performance

performance = Performance()
with performance.point('Task A') as point:
    # Do some heavy work
    point.message('Database query insert xy')

    with performance.point('Subtask A1') as subpoint:
        pass  # Do some heavy work

    with performance.point('Subtask A2') as subpoint:
        subpoint.message('Database query insert xy')

performance.results()  # print results table
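The measuring-point pattern can be sketched with a context manager from the standard library. MiniPoint and the module-level point function are hypothetical simplifications: they track only wall-clock time and plain messages, whereas the real Point also records memory consumption, subpoints, and the time between messages.

```python
import time
from contextlib import contextmanager


class MiniPoint(object):
    """Hypothetical, reduced version of a measuring point."""

    def __init__(self, label):
        self.label = label
        self.start_time = None
        self.stop_time = None
        self.messages = []

    def message(self, msg):
        self.messages.append(msg)


@contextmanager
def point(label='Point'):
    p = MiniPoint(label)
    p.start_time = time.perf_counter()
    try:
        yield p
    finally:
        # Record the stop time even if the measured block raises.
        p.stop_time = time.perf_counter()


with point('Task A') as task:
    task.message('Database query insert xy')
    sum(range(100000))  # some work to measure

elapsed = task.stop_time - task.start_time
print('{}: {:.6f}s, {} message(s)'.format(task.label, elapsed, len(task.messages)))
```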

class experimentum.Experiments.Performance.Formatter¶

Bases: object

Format performance values to a human-readable format.

static format_number(value, decimals, unit)¶

Round a number and add a unit.

Parameters:	value (float) – Number to round.
	decimals (int) – Number of decimal places.
	unit (str) – Unit to append.
Returns:	string

get_table(points, tablefmt='psql')¶

Get the performance table in human-readable format.

Parameters:	points (list) – Measuring Points.
	tablefmt (str, optional) – Defaults to ‘psql’. Table format for `tabulate`.
Returns:	Performance table
Return type:	str

memory_to_human(_bytes, unit='auto', decimals=2)¶

Transform used memory into a better suited unit.

Parameters:	_bytes (float) – Used Bytes.
	unit (str, optional) – Defaults to ‘auto’. Unit to transform bytes into.
	decimals (int, optional) – Defaults to 2. Number of decimal places to round to.
Returns:	used memory suffixed with unit identifier
Return type:	string

print_table(points, tablefmt='psql')¶

Print performance table in human-readable format.

Parameters:	points (list) – Measuring Points.
	tablefmt (str, optional) – Defaults to ‘psql’. Table format for `tabulate`.

time_to_human(seconds, unit='auto', decimals=2)¶

Transform time into a better suited unit.

Parameters:	seconds (float) – Time in seconds.
	unit (str, optional) – Defaults to ‘auto’. Unit to transform seconds into.
	decimals (int, optional) – Defaults to 2. Number of decimal places to round to.
Raises:	TypeError – if an unknown unit is passed
Returns:	time suffixed with unit identifier
Return type:	string
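A unit conversion with an 'auto' mode can be sketched as follows. Only the signature and the TypeError on an unknown unit come from the description above; the unit names ('ms', 's', 'min', 'h'), the thresholds for automatic selection, and the output format are assumptions of this sketch and may differ from what the Formatter actually produces.

```python
def time_to_human(seconds, unit='auto', decimals=2):
    """Convert seconds into a better suited unit (illustrative sketch)."""
    # Assumed unit table: how many seconds one unit contains.
    divisors = {'ms': 0.001, 's': 1.0, 'min': 60.0, 'h': 3600.0}

    if unit == 'auto':
        # Pick the largest unit that keeps the value at or above 1.
        if seconds < 1:
            unit = 'ms'
        elif seconds < 60:
            unit = 's'
        elif seconds < 3600:
            unit = 'min'
        else:
            unit = 'h'

    if unit not in divisors:
        raise TypeError('Unknown unit: {}'.format(unit))

    return '{} {}'.format(round(seconds / divisors[unit], decimals), unit)


print(time_to_human(0.25))            # 250.0 ms
print(time_to_human(90))              # 1.5 min
print(time_to_human(7200, unit='h'))  # 2.0 h
```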

class experimentum.Experiments.Performance.Performance¶

Bases: object

Easily measure the performance of your python scripts.

points¶

List of measuring points

Type:	list

iteration¶

Number of current iteration

Type:	int

formatter¶

Formatter to output human readable results

Type:	Formatter

export(metrics=False)¶

Export the measuring points as a dictionary.

Parameters:	metrics (bool, optional) –
Returns:	Measuring points
Return type:	dict

iterate(start, stop)¶

Iterate over multiple performance points to later calculate avg and standard deviation.

Parameters:	start (int) – Start at.
	stop (int) – Stop at.
Yields:	int – current iteration

static mean(values)¶

Calculate Mean of values.

Parameters:	values (list) – List of values
Returns:	Mean value
Return type:	float

point(**kwds)¶

Set measuring point with or without a label.

Keyword Arguments:
	label (str, optional) – Defaults to ‘Point’. Enter point label

Example:

with performance.point('Task A') as point:
    # do something
    point.message('Some Message')

Yields:	Point – new measuring point

results()¶: Print the performance results in a human-readable format.

set_formatter(formatter)¶

Set a formatter for human readable output.

Parameters:	formatter (Formatter) – Human-readable output formatter

static standard_deviation(numbers)¶

Calculate the standard deviation.

Parameters:	numbers (list) – List of numbers
Returns:	Standard Deviation
Return type:	float
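Both statistics can be sketched in a few lines. The function names match the static methods above, but the bodies are illustrative. In particular, this sketch computes the population standard deviation (dividing by n); whether the library divides by n or by n - 1 is not stated in this reference.

```python
import math


def mean(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / float(len(values))


def standard_deviation(numbers):
    """Population standard deviation (assumption: divide by n)."""
    avg = mean(numbers)
    variance = sum((x - avg) ** 2 for x in numbers) / float(len(numbers))
    return math.sqrt(variance)


timings = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
print(mean(timings))                # 5.0
print(standard_deviation(timings))  # 2.0
```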

class experimentum.Experiments.Performance.Point(label, iter_id=None)¶

Bases: object

Measuring Point Data Structure.

Keeps track of execution time and memory consumption. You can also append messages to a point to keep track of different events. Each message contains the time since the last message was logged and the message content itself.

label¶

Label of the point.

Type:	str

id¶

Id to keep track of same points when iterating.

Type:	int

start_time¶

Startpoint Timestamp.

Type:	float

stop_time¶

Endpoint Timestamp.

Type:	float

start_memory¶

Memory consumption on start.

Type:	int

stop_memory¶

Memory consumption on end.

Type:	int

messages¶

List of optional messages.

Type:	list

subpoints¶

List of optional subpoints.

Type:	list

message(msg)¶

Set a message associated with the point.

Parameters:	msg (str) – Enter message

to_df()¶

Transform point to dataframe.

Returns:	Dataframe
Return type:	dict

to_dict()¶

Return all measured attributes and messages as a dictionary.

Returns:	measured attributes and messages.
Return type:	dict

experimentum.Experiments.Performance.memory_usage()¶

Return the memory usage of the current process.

Note

Thanks to Fabian Pedregosa for his overview of different ways to determine the memory usage in python. http://fa.bianp.net/blog/2013/different-ways-to-get-memory-consumption-or-lessons-learned-from-memory_profiler

Returns:	used bytes.
Return type:	float
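One of the approaches covered in the linked overview uses the standard library's resource module. The sketch below is an assumption about how such a measurement could look, not the function's actual body. It is Unix-only, and it reports the peak resident set size rather than the current usage; the function name peak_memory_bytes is hypothetical.

```python
import resource
import sys


def peak_memory_bytes():
    """Return the peak resident set size of this process in bytes."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in bytes on macOS and in kilobytes on Linux.
    return float(peak if sys.platform == 'darwin' else peak * 1024)


print(peak_memory_bytes() > 0)  # True
```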

Module contents¶

Provides the main entry point and some utilities for running experiments.

Import App, Experiment and Performance so that they can be easily imported by other packages/modules.