.. _task_management:

===============
Task Management
===============

.. currentmodule:: qlib


Introduction
============

The `Workflow <../component/introduction.html>`_ part introduces how to run a research workflow in a loosely-coupled way. However, it can only execute one ``task`` when you use ``qrun``.
To automatically generate and execute different tasks, ``Task Management`` provides a whole process including `Task Generating`_, `Task Storing`_, `Task Training`_ and `Task Collecting`_.
With this module, users can run their ``task`` automatically at different periods, in different losses, or even by different models. The processes of task generation, model training, and combining and collecting data are shown in the following figure.

.. image:: ../_static/img/Task-Gen-Recorder-Collector.svg
    :align: center

This whole process can be used in `Online Serving <../component/online.html>`_.

An example of the entire process is shown `here `__.

Task Generating
===============

A ``task`` consists of `Model`, `Dataset`, `Record`, or anything added by users.
The specific task template can be viewed in `Task Section <../component/workflow.html#task-section>`_.
Even though the task template is fixed, users can customize their ``TaskGen`` to generate different ``task`` instances from the task template.

Here is the base class of ``TaskGen``:

.. autoclass:: qlib.workflow.task.gen.TaskGen
    :members:
    :noindex:

``Qlib`` provides a class `RollingGen `_ to generate a list of ``task`` instances whose datasets cover different date segments.
This class allows users to verify, in one experiment, how data from different periods affects the model. More information is `here <../reference/api.html#TaskGen>`__.

Task Storing
============

To achieve higher efficiency and make cluster operation possible, ``TaskManager`` stores all tasks in `MongoDB `_.
``TaskManager`` can fetch undone tasks automatically and manage the lifecycle of a set of tasks with error handling.
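The generate-then-store flow can be illustrated without MongoDB. The following is a minimal sketch under stated assumptions, not Qlib's implementation: ``rolling_tasks`` is a hypothetical, simplified stand-in for ``RollingGen`` that shifts year-based segments instead of real date segments, and ``InMemoryTaskPool`` is a hypothetical in-memory substitute for the MongoDB collection behind ``TaskManager``. The template keys, the field names and the status values are chosen for illustration only.

.. code-block:: python

    import copy

    # Hypothetical task template (illustrative keys only); segments are (start_year, end_year).
    TEMPLATE = {
        "model": {"class": "SomeModel"},
        "dataset": {"segments": {"train": (2010, 2013), "test": (2014, 2014)}},
        "record": [],
    }

    WAITING, RUNNING, DONE = "waiting", "running", "done"


    def rolling_tasks(template, step, n):
        """Return n deep copies of template, the i-th with every segment shifted by i * step years."""
        tasks = []
        for i in range(n):
            task = copy.deepcopy(template)
            segments = task["dataset"]["segments"]
            for name, (start, end) in segments.items():
                segments[name] = (start + i * step, end + i * step)
            tasks.append(task)
        return tasks


    class InMemoryTaskPool:
        """Hypothetical stand-in for a MongoDB task collection with a simple status lifecycle."""

        def __init__(self, tasks):
            self.entries = [{"def": t, "status": WAITING, "res": None} for t in tasks]

        def fetch_waiting(self):
            """Claim the first WAITING task by marking it RUNNING; return None if none are left."""
            for entry in self.entries:
                if entry["status"] == WAITING:
                    entry["status"] = RUNNING
                    return entry
            return None

        def run(self, task_func):
            """Run task_func on every waiting task; a failing task is put back to WAITING."""
            while (entry := self.fetch_waiting()) is not None:
                try:
                    entry["res"] = task_func(entry["def"])
                    entry["status"] = DONE
                except Exception:
                    entry["status"] = WAITING  # keep it in the pool so it can be retried later
                    raise


    tasks = rolling_tasks(TEMPLATE, step=1, n=3)
    pool = InMemoryTaskPool(tasks)
    pool.run(lambda task: task["dataset"]["segments"]["test"])
    print([entry["res"] for entry in pool.entries])  # [(2014, 2014), (2015, 2015), (2016, 2016)]

Claiming a task by switching its status to *RUNNING* before executing it is what lets several workers share one pool without running the same task twice; with a real database shared by several processes, that claim has to be a single atomic update rather than the plain loop used in this sketch.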
Users **MUST** finish the configuration of `MongoDB `_ when using this module.

Users need to provide the MongoDB URL and database name for ``TaskManager``, either in `initialization <../start/initialization.html#Parameters>`_ or with a statement like this:

.. code-block:: python

    from qlib.config import C

    C["mongo"] = {
        "task_url": "mongodb://localhost:27017/",  # your MongoDB url
        "task_db_name": "rolling_db",  # database name
    }

.. autoclass:: qlib.workflow.task.manage.TaskManager
    :members:
    :noindex:

More information about ``TaskManager`` can be found `here <../reference/api.html#TaskManager>`__.

Task Training
=============

After generating and storing those ``task`` instances, it's time to run the ones in the *WAITING* status.
``Qlib`` provides a method called ``run_task`` to run the ``task`` instances in the task pool; however, users can also customize how tasks are executed.
An easy way to get the ``task_func`` is to use ``qlib.model.trainer.task_train`` directly.
It runs the whole workflow defined by the ``task``, which includes *Model*, *Dataset* and *Record*.

.. autofunction:: qlib.workflow.task.manage.run_task
    :noindex:

Meanwhile, ``Qlib`` provides a module called ``Trainer``.

.. autoclass:: qlib.model.trainer.Trainer
    :members:
    :noindex:

``Trainer`` trains a list of tasks and returns a list of model recorders.
``Qlib`` offers two kinds of ``Trainer``: ``TrainerR`` is the simplest one, and ``TrainerRM`` is based on ``TaskManager`` to help manage the task lifecycle automatically.
If you do not want to use ``TaskManager`` to manage tasks, using ``TrainerR`` to train a list of tasks generated by ``TaskGen`` is enough.
`Here <../reference/api.html#Trainer>`_ are the details about the different ``Trainer`` classes.

Task Collecting
===============

Before collecting model training results, you need to call ``qlib.init`` to specify the path of mlruns.
To collect the results of ``task`` after training, ``Qlib`` provides `Collector <../reference/api.html#Collector>`_, `Group <../reference/api.html#Group>`_ and `Ensemble <../reference/api.html#Ensemble>`_ to collect the results in a readable, expandable and loosely-coupled way.

`Collector <../reference/api.html#Collector>`_ can collect objects from anywhere and process them by merging, grouping, averaging and so on. It works in two steps: ``collect`` (collect anything into a dict) and ``process_collect`` (process the collected dict).

`Group <../reference/api.html#Group>`_ also works in two steps: ``group`` (group a set of objects based on `group_func` and turn them into a dict) and ``reduce`` (turn each grouped dict into an ensemble based on some rule).
For example::

    {(A,B,C1): object, (A,B,C2): object} ---group---> {(A,B): {C1: object, C2: object}} ---reduce---> {(A,B): object}

`Ensemble <../reference/api.html#Ensemble>`_ can merge the objects in an ensemble.
For example::

    {C1: object, C2: object} ---Ensemble---> object

You can set the ensembles you want in the ``Collector``'s ``process_list``.
Common ensembles include ``AverageEnsemble`` and ``RollingEnsemble``. ``AverageEnsemble`` is used to ensemble the results of different models in the same time period. ``RollingEnsemble`` is used to concatenate the results of models trained on different (rolling) time periods into one continuous result.

So the hierarchy is: the ``Collector``'s second step corresponds to ``Group``, and the ``Group``'s second step corresponds to ``Ensemble``.

For more information, please see `Collector <../reference/api.html#Collector>`_, `Group <../reference/api.html#Group>`_ and `Ensemble <../reference/api.html#Ensemble>`_, or the `example `_.
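The collect, group, reduce and ensemble chain described above can be illustrated with plain dictionaries. This is a minimal sketch under stated assumptions, not Qlib's ``Collector``, ``Group`` or ``Ensemble`` API: ``group``, ``reduce_groups``, ``rolling_ensemble`` and ``average_ensemble`` are hypothetical helpers, each result is a dict mapping a date label to a score rather than a pandas DataFrame, and the model names and numbers are toy values.

.. code-block:: python

    from collections import defaultdict


    def group(collected):
        """Group {(A, B, C): obj} into {(A, B): {C: obj}} by splitting off the last key element."""
        grouped = defaultdict(dict)
        for key, obj in collected.items():
            grouped[key[:-1]][key[-1]] = obj
        return dict(grouped)


    def reduce_groups(grouped, ensemble):
        """Turn each inner dict into a single object with the given ensemble function."""
        return {key: ensemble(inner) for key, inner in grouped.items()}


    def rolling_ensemble(results_by_period):
        """Concatenate date->score dicts in period order; a later period wins on an overlapping date."""
        merged = {}
        for period in sorted(results_by_period):
            merged.update(results_by_period[period])
        return dict(sorted(merged.items()))


    def average_ensemble(results_by_model):
        """Average the date->score dicts of models that cover the same dates."""
        results = list(results_by_model.values())
        return {date: sum(r[date] for r in results) / len(results) for date in results[0]}


    # "collect" step: results keyed by (model, rolling period); each result maps date -> score.
    collected = {
        ("lgb", "2014"): {"d1": 0.25, "d2": 0.5},
        ("lgb", "2015"): {"d3": 1.0},
        ("mlp", "2014"): {"d1": 0.75, "d2": 0.5},
        ("mlp", "2015"): {"d3": 0.5},
    }

    # Group by model, then reduce each model's periods into one continuous result.
    per_model = reduce_groups(group(collected), rolling_ensemble)
    # Average the models' continuous results into the final result.
    final = average_ensemble({key[0]: result for key, result in per_model.items()})
    print(final)  # {'d1': 0.5, 'd2': 0.5, 'd3': 0.75}

Concatenating periods first and averaging models second is only one possible order; the same helpers could also average the models within each period and then concatenate the periods.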