experimentum.Experiments package¶
Submodules¶
experimentum.Experiments.App module¶
Main Service Container and Provider.
Sets up the framework and runs the experiments and lets you customize/extend the behavior of the framework.
Binding¶
We can register/bind a new alias either by overriding the register_aliases() method
or by adding the alias directly to the aliases dictionary. The key is the alias
name you want to register and the value is a function that returns an instance of the class:
def register_aliases(self):
    super(MyAppClass, self).register_aliases()
    self.aliases['my_custom_api'] = lambda: API(self.store)
Additional arguments for creating a class instance may be passed when resolving. Your factory function just has to declare them as parameters in order to use them:
self.aliases['my_custom_api'] = lambda name, user_id=None: API(self.store, name, user_id)
Resolving¶
You may use the make()
method to resolve a class instance out of the container.
The make()
method accepts the alias of the class you want to resolve:
api = self.app.make('my_custom_api')
If some of your class’ dependencies are not resolvable via the app container, you may pass them as additional args and keyword args:
api = self.app.make('my_custom_api', foo, user_id=42)
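The binding and resolving flow described above can be sketched without the framework. The following is a minimal stand-in, not experimentum's actual implementation: MiniContainer, the API class and the store value are hypothetical names. Only the aliases dictionary, the factory functions and the make(alias, *args, **kwargs) signature are taken from this page.

```python
class API(object):
    """Hypothetical class that should be resolved through the container."""

    def __init__(self, store, name=None, user_id=None):
        self.store = store
        self.name = name
        self.user_id = user_id


class MiniContainer(object):
    """Hypothetical stand-in for App: maps alias names to factory functions."""

    def __init__(self, store):
        self.store = store
        self.aliases = {}
        self.register_aliases()

    def register_aliases(self):
        # The factory receives whatever extra arguments are passed to make().
        self.aliases['my_custom_api'] = (
            lambda name=None, user_id=None: API(self.store, name, user_id)
        )

    def make(self, alias, *args, **kwargs):
        # Look up the factory and forward the additional arguments to it.
        if alias not in self.aliases:
            raise Exception('Alias "{}" is not registered.'.format(alias))
        return self.aliases[alias](*args, **kwargs)


app = MiniContainer(store='in-memory-store')
api = app.make('my_custom_api', 'foo', user_id=42)
print(api.name, api.user_id)
```

Storing factories instead of instances means that every call to make() builds a fresh object, and that dependencies the container cannot provide can still be supplied by the caller.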
Customizing¶
Commands¶
Register new commands with the App.register_commands() method.
It should return a dictionary where the keys are the names of the commands and
the values are the command handlers. The command handlers must either be
derived from AbstractCommand or be functions decorated with
AbstractCommand.command(). Example return:
{
    'foo': FooCommand,
    'bar': BarCommand
}
-
class
experimentum.Experiments.App.
App
(name, root)¶ Bases:
object
Main entry point of the framework.
-
config_path
¶ Defaults to '.'. Path to config files.
Type: str
-
base_repository
¶ Defaults to Repository. Repository base class.
Type: AbstractRepository
-
name
¶ Name of the app.
Type: str
-
root
¶ Root Path of the app.
Type: str
-
log
¶ Logger.
Type: logging.Logger
-
store
¶ Data Store.
Type: AbstractStore
-
aliases
¶ Dictionary of aliases and factory functions.
Type: dict
-
base_repository
alias of
experimentum.Storage.SQLAlchemy.Repository.Repository
-
bootstrap
()¶ Bootstrap the app, i.e. setup config and logger.
-
config_path
= '.'
-
make
(alias, *args, **kwargs)¶ Create an instance of an aliased class.
Parameters: alias (str) – Name of class alias. Raises: Exception – if the alias does not exist. Returns: Instance of the aliased class Return type: object
-
register_aliases
()¶ Register aliases for classes.
-
register_commands
()¶ Register custom cli commands.
Returns: commands to register Return type: dict
-
run
()¶ Run the app.
-
setup_datastore
(datastore)¶ Set up the data store.
Parameters: datastore (dict) – Datastore config.
-
experimentum.Experiments.DataBag module¶
In-Memory Container to easily store experiment results on the fly.
Adding Entries¶
Add the entry {'foo': {'bar': {'baz': 42}}} to the DataBag via a dot-notation supported key:
DataBag.add('foo.bar.baz', 42)
Merging Entries¶
Merging dictionary and list entries:
DataBag.add('foo.bar', [1, 2, 3])
DataBag.add('foo.baz', {'a': 1, 'b': 2})
DataBag.merge('foo.bar', [2, 4, 8])
DataBag.merge('foo.baz', {'a': 42, 'c': 3})
# Results in
# {
#     'foo': {
#         'bar': [1, 2, 3, 2, 4, 8],
#         'baz': {'a': 42, 'b': 2, 'c': 3}
#     }
# }
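The merge rule shown in the example (dictionaries are updated, lists are extended) can be illustrated with a small stand-alone function. This is only a sketch of the documented behavior: merge_values is a hypothetical helper, and raising TypeError for mismatched types is an assumption, not something this page specifies.

```python
def merge_values(current, data):
    """Return current merged with data: dicts are updated, lists are extended."""
    if isinstance(current, dict) and isinstance(data, dict):
        merged = dict(current)
        merged.update(data)  # keys in data win over keys in current
        return merged
    if isinstance(current, list) and isinstance(data, list):
        return current + data  # plain concatenation, duplicates are kept
    # Assumption: mixing a dict with a list is treated as an error here.
    raise TypeError('Can only merge two dicts or two lists.')


print(merge_values([1, 2, 3], [2, 4, 8]))
print(merge_values({'a': 1, 'b': 2}, {'a': 42, 'c': 3}))
```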
Getting Entries¶
Get the entry baz of the dictionary {'foo': {'bar': {'baz': 42}}} from the DataBag via a dot-notation supported key:
data = DataBag.get('foo.bar.baz') # 42
The DataBag also provides a default value if the key does not exist:
data = DataBag.get('a.b.c', 'default value') # 'default value'
Deleting Entries¶
Deleting entries works just like adding/getting entries:
DataBag.delete('foo.bar.baz') # {'foo': {'bar': {}}}
Flushing Entries¶
Flushing the DataBag means that you get all the entries (or just a specific key) and then clear the content:
DataBag.add('foo.bar.baz', 42)
print(DataBag.flush('foo.bar.baz')) # prints: 42, contains: {'foo': {'bar': {}}}
print(DataBag.flush()) # prints: {'foo': {'bar': {}}}, contains: {}
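The "return and delete" semantics of flushing can be sketched with a tiny class that keeps its state on the class itself, which is what allows calls such as DataBag.add(...) without creating an instance. MiniBag and its _data attribute are hypothetical; this is not the DataBag source, only an illustration of the behavior shown in the example.

```python
class MiniBag(object):
    """Hypothetical stand-in for DataBag; state lives on the class itself."""

    _data = {}

    @classmethod
    def add(cls, key, value):
        node = cls._data
        parts = key.split('.')
        for part in parts[:-1]:
            node = node.setdefault(part, {})  # create missing levels
        node[parts[-1]] = value

    @classmethod
    def flush(cls, key=None, default=None):
        if key is None:
            # Return everything and start over with an empty container.
            content, cls._data = cls._data, {}
            return content
        node = cls._data
        parts = key.split('.')
        for part in parts[:-1]:
            node = node.get(part)
            if not isinstance(node, dict):
                return default  # the path does not exist
        # pop() returns the value and removes the key in one step.
        return node.pop(parts[-1], default)


MiniBag.add('foo.bar.baz', 42)
first = MiniBag.flush('foo.bar.baz')
second = MiniBag.flush()
print(first, second, MiniBag._data)
```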
-
class
experimentum.Experiments.DataBag.
DataBag
¶ Bases:
object
In-Memory Container to easily store experiment results on the fly.
-
classmethod
add
(key, value)¶ Add an item to the DataBag via a dot-notation supported key.
Parameters: - key (str) – Key to set (i.e. ‘foo’ or ‘foo.bar.baz’ for a deep key)
- value (object) – Value to set for key.
-
classmethod
all
()¶ Return content of the DataBag.
Returns: DataBag content. Return type: dict
-
classmethod
clear
()¶ Clear DataBag content.
-
classmethod
delete
(key)¶ Delete an item from the DataBag via a dot-notation supported key.
Parameters: key (str) – Key to delete (i.e. ‘foo’ or ‘foo.bar.baz’ for a deep key) Returns: Status code, 1: okay, -1: key not found Return type: int
-
classmethod
flush
(key=None, default=None)¶ Flush a specific key or all of the databag (i.e. return and delete).
Parameters: - key (str, optional) – Defaults to None. Key to flush (i.e. ‘foo’ or ‘foo.bar.baz’ for a deep key). If omitted, the whole DataBag is flushed.
- default (object, optional) – Defaults to None. Default if key does not exist
Returns: Value of the key
Return type: object
-
classmethod
get
(key, default=None)¶ Get an item from the DataBag via a dot-notation supported key.
Parameters: - key (str) – Key to get (i.e. ‘foo’ or ‘foo.bar.baz’ for a deep key)
- default (object, optional) – Defaults to None. Default if key does not exist
Returns: Value of the key
Return type: object
-
classmethod
merge
(key, data)¶ Merge two dicts or lists.
- For example:
- {‘a’: 1, ‘b’: 2} + {‘a’: 42, ‘c’: 3} => {‘a’: 42, ‘b’: 2, ‘c’: 3}
- [‘a’, ‘b’, ‘c’] + [‘a’, ‘d’, ‘e’] => [‘a’, ‘b’, ‘c’, ‘a’, ‘d’, ‘e’]
Parameters: - key (str) – Key to set (i.e. ‘foo’ or ‘foo.bar.baz’ for a deep key)
- data (dict|list) – Value to update/extend for key.
-
experimentum.Experiments.DataBag.
del_key
(items, data)¶ Recursively remove deep key from dict.
Parameters: - items (list) – List of dictionary keys
- data (dict) – Portion of dictionary to operate on
Raises: KeyError – If key does not exist
-
experimentum.Experiments.DataBag.
get_from
(items, data)¶ Recursively get value from dictionary deep key.
Parameters: - items (list) – List of dictionary keys
- data (dict) – Portion of dictionary to operate on
Returns: Value from dictionary
Return type: object
Raises: KeyError – If key does not exist
-
experimentum.Experiments.DataBag.
set_to
(items, data, value)¶ Recursively set value to dictionary deep key.
Parameters: - items (list) – List of dictionary keys
- data (dict) – Portion of dictionary to operate on
- value (object) – Value to set for key.
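The three recursive helpers documented above all follow the same pattern: take the first key from the list, descend one level, and recurse with the rest. The sketch below mirrors their signatures under hypothetical names (set_deep, get_deep, del_deep). It is an illustration of the recursion, not the module's source, and creating missing intermediate dictionaries in set_deep is an assumption.

```python
def set_deep(items, data, value):
    """Recursively set value at the nested key described by items."""
    head, rest = items[0], items[1:]
    if not rest:
        data[head] = value
        return
    # Assumption: missing intermediate dictionaries are created on the way.
    set_deep(rest, data.setdefault(head, {}), value)


def get_deep(items, data):
    """Recursively read the nested key; raises KeyError if it is missing."""
    head, rest = items[0], items[1:]
    if not rest:
        return data[head]
    return get_deep(rest, data[head])


def del_deep(items, data):
    """Recursively delete the nested key; raises KeyError if it is missing."""
    head, rest = items[0], items[1:]
    if not rest:
        del data[head]
        return
    del_deep(rest, data[head])


bag = {}
set_deep('foo.bar.baz'.split('.'), bag, 42)  # dot-notation key split into a list
value = get_deep(['foo', 'bar', 'baz'], bag)
del_deep(['foo', 'bar', 'baz'], bag)
print(value, bag)
```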
experimentum.Experiments.Experiment module¶
Run experiments and save the results for future analysis.
Writing Experiments¶
Experiments are created in the experiments directory and they must adhere to the following naming convention: {NAME}Experiment.py.
All experiments extend the Experiment class. Experiments contain a reset() and a run() method.
Within the reset() method, you should reset/initialize the data structures and values you want to use in each test run.
The run() method should contain the code you want to test.
It should return a dictionary with the values you want to save.
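The interplay of the two methods can be sketched as a simple loop. The sketch assumes that reset() is invoked before every call to run(), which is how the paragraph above describes their roles. MiniExperiment and RandomExperiment are hypothetical classes, and the loop does not reproduce what Experiment.start() actually does (for example saving results or measuring performance).

```python
import random


class MiniExperiment(object):
    """Hypothetical base class mirroring the reset()/run() contract."""

    def reset(self):
        pass

    def run(self):
        return {}

    def start(self, steps=10):
        results = []
        for _ in range(steps):
            self.reset()                # fresh state for every test run
            results.append(self.run())  # dictionary of values to save
        return results


class RandomExperiment(MiniExperiment):
    def reset(self):
        self.rand = random.randint(0, 10)

    def run(self):
        return {'rand': self.rand}


results = RandomExperiment().start(steps=3)
print(results)
```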
The Reset Method¶
As mentioned before, the reset() method should reset/initialize the
data structures and values you want to use in each test run.
Let’s take a look at a basic experiment. Within any of your experiment methods,
you always have access to the app attribute, which provides access to the main app class,
and to the config attribute, which contains the content of the config_file:
from experimentum.Experiments import Experiment
import random

class FooExperiment(Experiment):
    config_file = 'foo.json'

    def reset(self):
        # read a value from the loaded config file
        self.user = self.config.get('user')
        # use app to create an instance of a custom aliased class
        self.some_class = self.app.make('some_class', self.user)
        self.rand = random.randint(0, 10)
The Run Method¶
As mentioned before, the run() method should contain the code
you want to test and return a dictionary with the values you want to save.
Let’s take a look at a basic experiment, assuming that you added a rand attribute to
your TestCaseRepository with a migration:
from experimentum.Experiments import Experiment
import random

class FooExperiment(Experiment):
    config_file = 'foo.json'

    def run(self):
        with self.performance.point('Task to measure some Rscript algo') as point:
            script = self.call('some_script.r')  # prints json to stdout as return value.
            algo_result = script.get_json()
            script.process.wait()  # Wait for child process to terminate.

        # Add a custom performance entry
        return {
            'rand': self.rand,
            'performances': [{
                'label': 'Custom Rscript measuring',
                'time': algo_result.get('time'),
                'memory': 0,
                'peak_memory': 0,
                'level': 0,
                'type': 'custom'
            }]
        }
-
class
experimentum.Experiments.Experiment.
Experiment
(app, path)¶ Bases:
object
Run experiments and save the results in the data store.
-
performance
¶ Performance Profiler.
Type: Performance
-
show_progress
¶ Flag to show/hide the progress bar.
Type: bool
-
hide_performance
¶ Flag to show/hide the performance table.
Type: bool
-
config_file
¶ Config file to load.
Type: str
-
repos
¶ Experiment and Testcase repositories used to save results.
Type: dict
-
boot
()¶ Boot up the experiment, e.g. load config etc.
-
static
call
(cmd, verbose=False, shell=False)¶ Call another script to run algorithms for your experiment.
Warning
Passing shell=True can be a security hazard if combined with untrusted input. See the warning under Frequently Used Arguments for details.
Parameters: - cmd (str, list) – Command which you want to call.
- verbose (bool, optional) – Defaults to False. Print the cmd output or not.
- shell (bool, optional) – Defaults to False. Specifies whether to use the shell as the program to execute.
Returns: Executed script to get output from
Return type:
-
config_file
= None
-
static
get_experiments
(path)¶ [DEPRECATED] Get experiment names from exp files/classes.
Parameters: path (str) – Path to experiments folder. Returns: Names of experiments Return type: list
-
static
get_status
(app)¶ Get status information about experiments.
Parameters: app (App) – Main Service Provider/Container. Returns: Dictionary with experiment status Return type: dict
-
static
load
(app, path, name)¶ Load and initialize an experiment class.
Parameters: - app (App) – Main app class
- path (str) – Path to experiments folder.
- name (str) – Name of experiment.
Returns: Loaded experiment.
Return type:
-
reset
()¶ Reset data structures and values used in the run method.
-
run
()¶ Run a test of the experiment.
-
save
(result, iteration)¶ Save the test results in the data store.
Parameters: - result (dict) – Result of experiment test run.
- iteration (int) – Number of test run iteration.
-
start
(steps=10)¶ Start the test runs of the experiment.
Parameters: steps (int, optional) – Defaults to 10. How many test runs should be executed.
-
-
class
experimentum.Experiments.Experiment.
Script
(cmd, verbose=False, shell=False, stdout=-1)¶ Bases:
object
Call another script to run algorithms for your experiment.
Example:
script = Script(['Rscript', 'myalgo.r', 'arg1', 'arg2'], verbose=True, shell=True)
algo_result = script.get_json()
script.process.wait()  # Wait for child process to terminate.
-
process
¶ Called script.
Type: subprocess.Popen
-
output
¶ Output of called script.
Type: str
-
get_json
()¶ Decode JSON of process output.
Returns: Process output Return type: object
-
get_text
()¶ Get the text of the process output.
Returns: Process output Return type: str
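What a call to an external script and the decoding of its JSON output involve can be sketched with the standard library alone. The example below is not the Script class: it uses subprocess.Popen directly (the documented process attribute is of that type), and a child Python process stands in for an Rscript call so that the snippet runs anywhere. The stdout=-1 default in the Script signature is the numeric value of subprocess.PIPE.

```python
import json
import subprocess
import sys

# Stand-in for an external algorithm script: a child Python process that
# prints a JSON document to stdout.
cmd = [sys.executable, '-c', 'import json; print(json.dumps({"time": 1.5}))']

# Passing the command as a list with shell=False avoids shell injection.
process = subprocess.Popen(cmd, stdout=subprocess.PIPE, shell=False)
raw_output, _ = process.communicate()  # read stdout and wait for termination

algo_result = json.loads(raw_output.decode('utf-8'))
print(algo_result.get('time'))
```

Because communicate() waits for the child process, no separate wait() call is needed in this sketch.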
-
experimentum.Experiments.Performance module¶
The Performance module lets you easily measure the performance.
It measures execution time and memory consumption of your python script and lets you add messages to each measuring point for a more detailed overview.
Example:
from experimentum.Experiments import Performance

performance = Performance()
with performance.point('Task A') as point:
    # Do some heavy work
    point.message('Database query insert xy')
    with performance.point('Subtask A1') as subpoint:
        pass  # Do some heavy work
    with performance.point('Subtask A2') as subpoint:
        subpoint.message('Database query insert xy')
performance.results()  # print results table
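The with-statement usage shown above relies on a context manager that records a start value on entry and a stop value on exit. The following sketch shows that mechanism with contextlib.contextmanager. MiniPerformance and MiniPoint are hypothetical stand-ins. They track time only (no memory, no nested subpoints) and use time.perf_counter() as an assumption about a suitable clock.

```python
import time
from contextlib import contextmanager


class MiniPoint(object):
    """Hypothetical measuring point: label, start/stop values and messages."""

    def __init__(self, label):
        self.label = label
        self.start_time = None
        self.stop_time = None
        self.messages = []

    def message(self, msg):
        self.messages.append(msg)


class MiniPerformance(object):
    def __init__(self):
        self.points = []

    @contextmanager
    def point(self, label='Point'):
        point = MiniPoint(label)
        self.points.append(point)
        point.start_time = time.perf_counter()
        try:
            yield point  # the body of the with-block runs here
        finally:
            # Recorded even if the body raises an exception.
            point.stop_time = time.perf_counter()


performance = MiniPerformance()
with performance.point('Task A') as point:
    point.message('Database query insert xy')
    sum(range(100000))  # placeholder for heavy work

measured = performance.points[0]
print(measured.label, measured.stop_time - measured.start_time)
```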
-
class
experimentum.Experiments.Performance.
Formatter
¶ Bases:
object
Format performance values to a human-readable format.
-
static
format_number
(value, decimals, unit)¶ Round a number and add a unit.
Parameters: - value (float) – Number to round.
- decimals (int) – Number of decimal places.
- unit (str) – Unit to append
Returns: string
-
get_table
(points, tablefmt='psql')¶ Get the performance table in human-readable format.
Parameters: - points (list) – Measuring Points.
- tablefmt (str, optional) – Defaults to ‘psql’. Table format for
tabulate
Returns: Performance table
Return type: str
-
memory_to_human
(_bytes, unit='auto', decimals=2)¶ Transform used memory into a better suited unit.
Parameters: - _bytes (float) – Used Bytes.
- unit (str, optional) – Defaults to ‘auto’. Unit to transform the bytes into.
- decimals (int, optional) – Defaults to 2. Number of decimal places to round to.
Returns: used memory suffixed with unit identifier
Return type: string
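How such a byte conversion with an automatic unit choice could work is sketched below. The function name bytes_to_human, the unit names, the factor of 1024, the output format and the TypeError for unknown units are all assumptions made for illustration. The page does not specify how memory_to_human picks or labels its units.

```python
def bytes_to_human(_bytes, unit='auto', decimals=2):
    """Convert a byte count to a string such as '1.5 KB' (illustrative only)."""
    units = ['B', 'KB', 'MB', 'GB', 'TB']
    if unit == 'auto':
        # Pick the largest unit that keeps the value at 1 or above.
        index = 0
        value = float(_bytes)
        while value >= 1024 and index < len(units) - 1:
            value /= 1024
            index += 1
    else:
        if unit not in units:
            raise TypeError('Unknown unit: {}'.format(unit))
        index = units.index(unit)
        value = float(_bytes) / (1024 ** index)
    return '{} {}'.format(round(value, decimals), units[index])


print(bytes_to_human(1536))
print(bytes_to_human(1048576, unit='KB'))
```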
-
print_table
(points, tablefmt='psql')¶ Print performance table in human-readable format.
Parameters: - points (list) – Measuring Points.
- tablefmt (str, optional) – Defaults to ‘psql’. Table format for
tabulate
-
time_to_human
(seconds, unit='auto', decimals=2)¶ Transform time into a better suited unit.
Parameters: - seconds (float) – Time in seconds.
- unit (str, optional) – Defaults to ‘auto’. Unit to transform seconds into.
- decimals (int, optional) – Defaults to 2. Number of decimal places to round to.
Raises: TypeError – if an unknown unit is passed
Returns: time suffixed with unit identifier
Return type: string
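A time conversion with the documented parameters and the documented TypeError for unknown units might look like the sketch below. The function name seconds_to_human, the unit identifiers ('ms', 's', 'm', 'h') and the thresholds used for 'auto' are assumptions. Only the signature and the TypeError come from this page.

```python
def seconds_to_human(seconds, unit='auto', decimals=2):
    """Convert seconds to a string such as '1.5m' (illustrative only)."""
    # Number of seconds contained in one unit.
    divisors = {'ms': 0.001, 's': 1.0, 'm': 60.0, 'h': 3600.0}
    if unit == 'auto':
        # Assumed thresholds: below 1 s use ms, below 1 min use s, and so on.
        if seconds < 1:
            unit = 'ms'
        elif seconds < 60:
            unit = 's'
        elif seconds < 3600:
            unit = 'm'
        else:
            unit = 'h'
    if unit not in divisors:
        raise TypeError('Unknown unit: {}'.format(unit))
    return '{}{}'.format(round(seconds / divisors[unit], decimals), unit)


print(seconds_to_human(0.25))
print(seconds_to_human(90))
```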
-
class
experimentum.Experiments.Performance.
Performance
¶ Bases:
object
Easily measure the performance of your python scripts.
-
points
¶ List of measuring points
Type: list
-
iteration
¶ Number of current iteration
Type: int
-
export
(metrics=False)¶ Export the measuring points as a dictionary.
Parameters: metrics (bool, optional) – Returns: Measuring points Return type: dict
-
iterate
(start, stop)¶ Iterate over multiple performance points to later calculate avg and standard deviation.
Parameters: - start (int) – Start at
- stop (int) – Stop at
Yields: int – current iteration
-
static
mean
(values)¶ Calculate Mean of values.
Parameters: values (list) – List of values Returns: Mean value Return type: float
-
point
(**kwds)¶ Set measuring point with or without a label.
Keyword Arguments: label (str, optional) – Defaults to ‘Point’. Enter point label
Example:
with performance.point('Task A') as point:
    # do something
    point.message('Some Message')
Yields: Point – new measuring point
-
results
()¶ Print the performance results in a human-readable format.
-
set_formatter
(formatter)¶ Set a formatter for human readable output.
Parameters: formatter (Formatter) – Human-readable output formatter
-
static
standard_deviation
(numbers)¶ Calculate the standard deviation.
Parameters: numbers (list) – List of numbers Returns: Standard Deviation Return type: float
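For reference, the two statistics can be computed as in the sketch below. It uses the population form of the standard deviation (dividing by n). Whether the library uses the population or the sample form (dividing by n - 1) is not stated on this page, so treat that choice, and the function names, as assumptions.

```python
import math


def mean(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / float(len(values))


def population_std(numbers):
    """Square root of the mean squared deviation from the mean."""
    avg = mean(numbers)
    variance = sum((x - avg) ** 2 for x in numbers) / float(len(numbers))
    return math.sqrt(variance)


timings = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
print(mean(timings), population_std(timings))
```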
-
-
class
experimentum.Experiments.Performance.
Point
(label, iter_id=None)¶ Bases:
object
Measuring Point Data Structure.
Keeps track of execution time and memory consumption. You can also append messages to a point to keep track of different events. Each message contains the time since the last message was logged and the message content itself.
-
label
¶ Label of the point.
Type: str
-
id
¶ Id to keep track of same points when iterating.
Type: int
-
start_time
¶ Startpoint Timestamp.
Type: float
-
stop_time
¶ Endpoint Timestamp.
Type: float
-
start_memory
¶ Memory consumption on start.
Type: int
-
stop_memory
¶ Memory consumption on end.
Type: int
-
messages
¶ List of optional messages.
Type: list
-
subpoints
¶ List of optional subpoints.
Type: list
-
message
(msg)¶ Set a message associated with the point.
Parameters: msg (str) – Enter message
-
to_df
()¶ Transform point to dataframe.
Returns: Dataframe Return type: dict
-
to_dict
()¶ Return all measured attributes and messages as a dictionary.
Returns: measured attributes and messages. Return type: dict
-
-
experimentum.Experiments.Performance.
memory_usage
()¶ Return the memory usage of the current process.
Note
Thanks to Fabian Pedregosa for his overview of different ways to determine the memory usage in python. http://fa.bianp.net/blog/2013/different-ways-to-get-memory-consumption-or-lessons-learned-from-memory_profiler
Returns: used bytes. Return type: float
Module contents¶
Provides the main entry point and some utilities for running experiments.
Imports App, Experiment and Performance so that
they can be easily imported by other packages/modules.