Forecast Model: Model Training & Prediction¶
Introduction¶
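Forecast Model is designed to generate the `prediction score` for stocks. Users can use Forecast Model in an automated workflow via qrun; see Workflow: Workflow Management for details.
Because the components in Qlib are designed in a loosely coupled way, Forecast Model can also be used as an independent module.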
Base Class & Interface¶
Qlib provides a base class qlib.model.base.Model from which all models should inherit.
The base class provides the following interfaces:
- class qlib.model.base.Model
Learnable Models
- fit(dataset: Dataset, reweighter: Reweighter)
Learn the model from the given dataset.
Note
The attribute names of a learned model should not start with '_', so that the model can be dumped to disk.
The following code example shows how to retrieve x_train, y_train and w_train from the dataset:
    import numpy as np
    import pandas as pd
    from qlib.data.dataset.handler import DataHandlerLP

    # `dataset` is assumed to be an already constructed Dataset (e.g. DatasetH)
    # get features and labels
    df_train, df_valid = dataset.prepare(
        ["train", "valid"], col_set=["feature", "label"], data_key=DataHandlerLP.DK_L
    )
    x_train, y_train = df_train["feature"], df_train["label"]
    x_valid, y_valid = df_valid["feature"], df_valid["label"]

    # get weights; fall back to uniform weights when the dataset has no "weight" column
    try:
        wdf_train, wdf_valid = dataset.prepare(
            ["train", "valid"], col_set=["weight"], data_key=DataHandlerLP.DK_L
        )
        w_train, w_valid = wdf_train["weight"], wdf_valid["weight"]
    except KeyError:
        w_train = pd.DataFrame(np.ones_like(y_train.values), index=y_train.index)
        w_valid = pd.DataFrame(np.ones_like(y_valid.values), index=y_valid.index)
- Parameters:
dataset (Dataset) -- the dataset that generates the processed data for model training.
- abstractmethod predict(dataset: Dataset, segment: str | slice = 'test') → object
Make predictions for the given Dataset.
- Parameters:
dataset (Dataset) -- the dataset that generates the processed data for prediction.
segment (Text or slice) -- the segment the dataset uses to prepare data (default='test').
- Return type:
Prediction results of a certain type, such as pandas.Series.
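The fit/predict contract described above can be illustrated without Qlib installed. The following is a minimal sketch under stated assumptions: `SketchModel`, `MeanLabelModel`, and the dict-based `ToyDataset` are hypothetical stand-ins written for this illustration, not Qlib classes, and a real model would inherit from qlib.model.base.Model and receive a Qlib Dataset instead.

```python
from abc import ABC, abstractmethod
from typing import Dict, List, Optional, Tuple

# Hypothetical stand-in for a dataset: segment name -> (features, labels).
ToyDataset = Dict[str, Tuple[List[List[float]], List[float]]]


class SketchModel(ABC):
    """Mirrors the fit/predict shape described above; not Qlib's base class."""

    @abstractmethod
    def fit(self, dataset: ToyDataset) -> None:
        ...

    @abstractmethod
    def predict(self, dataset: ToyDataset, segment: str = "test") -> List[float]:
        ...


class MeanLabelModel(SketchModel):
    """Predicts the mean training label for every row of a segment."""

    def __init__(self) -> None:
        # Learned state uses a public name (no leading underscore).
        self.label_mean: Optional[float] = None

    def fit(self, dataset: ToyDataset) -> None:
        _, labels = dataset["train"]
        self.label_mean = sum(labels) / len(labels)

    def predict(self, dataset: ToyDataset, segment: str = "test") -> List[float]:
        if self.label_mean is None:
            raise ValueError("call fit() before predict()")
        features, _ = dataset[segment]
        return [self.label_mean for _ in features]


# Toy data invented for the illustration.
toy = {
    "train": ([[0.1], [0.2], [0.3]], [1.0, 2.0, 3.0]),
    "test": ([[0.4], [0.5]], [0.0, 0.0]),
}
model = MeanLabelModel()
model.fit(toy)
scores = model.predict(toy, segment="test")
```

The `segment` argument plays the same role as in the interface above: it selects which slice of the dataset the prediction is made for.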
Qlib also provides a base class qlib.model.base.ModelFT, which includes methods for finetuning a model.
For other interfaces such as finetune, please refer to Model API.
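Example¶
Qlib's Model Zoo includes models such as LightGBM, MLP, and LSTM. These models are treated as the baselines of Forecast Model. The following steps show how to run LightGBM as an independent module.
- Initialize Qlib with qlib.init first; see Initialization.
- Run the following code to get the prediction score pred_score.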
    from qlib.contrib.model.gbdt import LGBModel
    from qlib.contrib.data.handler import Alpha158
    from qlib.utils import init_instance_by_config, flatten_dict
    from qlib.workflow import R
    from qlib.workflow.record_temp import SignalRecord, PortAnaRecord

    market = "csi300"
    benchmark = "SH000300"

    data_handler_config = {
        "start_time": "2008-01-01",
        "end_time": "2020-08-01",
        "fit_start_time": "2008-01-01",
        "fit_end_time": "2014-12-31",
        "instruments": market,
    }

    task = {
        "model": {
            "class": "LGBModel",
            "module_path": "qlib.contrib.model.gbdt",
            "kwargs": {
                "loss": "mse",
                "colsample_bytree": 0.8879,
                "learning_rate": 0.0421,
                "subsample": 0.8789,
                "lambda_l1": 205.6999,
                "lambda_l2": 580.9768,
                "max_depth": 8,
                "num_leaves": 210,
                "num_threads": 20,
            },
        },
        "dataset": {
            "class": "DatasetH",
            "module_path": "qlib.data.dataset",
            "kwargs": {
                "handler": {
                    "class": "Alpha158",
                    "module_path": "qlib.contrib.data.handler",
                    "kwargs": data_handler_config,
                },
                "segments": {
                    "train": ("2008-01-01", "2014-12-31"),
                    "valid": ("2015-01-01", "2016-12-31"),
                    "test": ("2017-01-01", "2020-08-01"),
                },
            },
        },
    }

    # model initialization
    model = init_instance_by_config(task["model"])
    dataset = init_instance_by_config(task["dataset"])

    # start exp
    with R.start(experiment_name="workflow"):
        # train
        R.log_params(**flatten_dict(task))
        model.fit(dataset)

        # prediction
        recorder = R.get_recorder()
        sr = SignalRecord(model, dataset, recorder)
        sr.generate()
Note
Alpha158 is the data handler provided by Qlib; see Data Handler. SignalRecord is the Record Template in Qlib; see Workflow.
In addition, the example above is provided in examples/train_backtest_analyze.ipynb.
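The example passes the nested task dictionary through flatten_dict before logging it as experiment parameters. The sketch below is a hypothetical stand-in that shows the general idea of collapsing a nested configuration into a single level of keys. `flatten_nested` is written for this illustration and is not Qlib's function, and the dotted key format is an assumption made here.

```python
from typing import Any, Dict


def flatten_nested(d: Dict[str, Any], parent: str = "", sep: str = ".") -> Dict[str, Any]:
    """Hypothetical stand-in: collapse nested dicts into one level of dotted keys."""
    flat: Dict[str, Any] = {}
    for key, value in d.items():
        full_key = f"{parent}{sep}{key}" if parent else key
        if isinstance(value, dict):
            # Recurse into nested dicts, carrying the key prefix along.
            flat.update(flatten_nested(value, full_key, sep))
        else:
            flat[full_key] = value
    return flat


# A small fragment of the task configuration from the example.
task_fragment = {
    "model": {
        "class": "LGBModel",
        "kwargs": {"loss": "mse", "max_depth": 8},
    },
}
flat_params = flatten_nested(task_fragment)
```

A flat mapping of this kind is convenient for parameter logging, because each hyperparameter gets one unambiguous name.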
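Technically, the meaning of the model prediction depends on the label setting designed by the user.
By default, the score usually represents the forecast model's rating of an instrument: the higher the score, the more profitable the instrument is expected to be.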
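Since a higher score means a more favourable rating, one common way to consume the scores is to rank instruments and keep the top few. The following is a minimal sketch under stated assumptions: `top_k_instruments` is a hypothetical helper, and the instrument codes and score values are made up for illustration rather than produced by a model.

```python
from typing import Dict, List


def top_k_instruments(scores: Dict[str, float], k: int) -> List[str]:
    """Return the k instrument codes with the highest scores, best first."""
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [code for code, _ in ranked[:k]]


# Made-up scores for a single trading day; the codes are placeholders.
day_scores = {
    "SH600000": 0.012,
    "SH600519": 0.034,
    "SZ000001": -0.008,
    "SZ000002": 0.021,
}
picked = top_k_instruments(day_scores, k=2)
```

Only the relative order of the scores matters here; whether the absolute value is interpretable depends on the label definition.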
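Custom Model¶
Qlib supports custom models. Users who are interested in customizing their own models and integrating them into Qlib should refer to Custom Model Integration.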
API¶
Please refer to Model API.