dataclr.models
To maximize the performance of dataclr models during feature selection:
- Single-threaded Execution: Ensure that models are configured to use a single thread (e.g., n_jobs=1) if they support parallel execution. This avoids contention between the parallelized feature selection process and the model’s internal parallelism.
- Non-parallelized Solvers: For models like LogisticRegression in scikit-learn, use solvers that are not parallelized, such as solver='liblinear'.
These adjustments ensure the distributed feature selection algorithms in dataclr operate efficiently without interference.
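As a minimal sketch of the two recommendations above, the snippet below configures two scikit-learn estimators for single-threaded use. The choice of estimators and the variable names are illustrative assumptions; how these estimators are then wrapped for dataclr is not shown here.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression

# Tree ensemble: pin the estimator to one worker so it does not compete
# with the parallelized feature selection process.
forest = RandomForestClassifier(n_estimators=50, n_jobs=1, random_state=0)

# Linear model: pick a solver that is not parallelized.
logreg = LogisticRegression(solver="liblinear")

print(forest.get_params()["n_jobs"])
print(logreg.get_params()["solver"])
```

Setting these options explicitly, rather than relying on library defaults, keeps the configuration visible wherever the model is constructed.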
- class dataclr.models.BaseModel
- Abstract base class for machine learning models.
- This class defines the interface that models must adhere to for compatibility with feature selection methods. Subclasses must implement the fit and predict methods.
- Attributes for Wrapper Method Compatibility:
- feature_importances_: Attribute for feature importance scores (e.g., tree-based models).
- coef_: Attribute for feature coefficients (e.g., linear models).
- Subclasses must ensure that at least one of these attributes is implemented to support wrapper-based feature selection methods.
- abstractmethod fit(X_train: DataFrame, y_train: Series) → None
- Abstract method to train the model.
- Parameters:
- X_train (pd.DataFrame) – Feature matrix for training data. 
- y_train (pd.Series) – Target variable for training data. 
 
- Raises:
- NotImplementedError – This method must be implemented in a subclass. 
 
- abstractmethod predict(X_test: DataFrame) → ndarray
- Abstract method to generate predictions.
- Parameters:
- X_test (pd.DataFrame) – Feature matrix for testing data. 
- Returns:
- Array of predictions. 
- Return type:
- np.ndarray 
- Raises:
- NotImplementedError – This method must be implemented in a subclass.
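The interface described above can be illustrated with a small subclass. This is a sketch under stated assumptions: the `BaseModel` defined here is a stand-in that mirrors the documented signatures so the example runs without dataclr installed (in practice it would be imported from `dataclr.models`), and `LeastSquaresModel` is a hypothetical name, not part of the library. The model exposes `coef_`, one of the two attributes required for wrapper-based methods.

```python
from abc import ABC, abstractmethod

import numpy as np
import pandas as pd


class BaseModel(ABC):
    """Stand-in mirroring the documented interface (assumption, not dataclr's code)."""

    @abstractmethod
    def fit(self, X_train: pd.DataFrame, y_train: pd.Series) -> None:
        raise NotImplementedError

    @abstractmethod
    def predict(self, X_test: pd.DataFrame) -> np.ndarray:
        raise NotImplementedError


class LeastSquaresModel(BaseModel):
    """Hypothetical linear model fitted by ordinary least squares."""

    def fit(self, X_train: pd.DataFrame, y_train: pd.Series) -> None:
        features = X_train.to_numpy(dtype=float)
        # Prepend a column of ones so the first coefficient is the intercept.
        design = np.column_stack([np.ones(len(features)), features])
        beta, *_ = np.linalg.lstsq(design, y_train.to_numpy(dtype=float), rcond=None)
        self.intercept_ = beta[0]
        self.coef_ = beta[1:]  # one coefficient per input feature

    def predict(self, X_test: pd.DataFrame) -> np.ndarray:
        return X_test.to_numpy(dtype=float) @ self.coef_ + self.intercept_


# Toy data: the target depends only on column "a" (y = 2*a + 1).
X = pd.DataFrame({"a": [0.0, 1.0, 2.0, 3.0], "b": [1.0, 0.0, 1.0, 0.0]})
y = pd.Series([1.0, 3.0, 5.0, 7.0])

model = LeastSquaresModel()
model.fit(X, y)
preds = model.predict(X)
```

A tree-based model would follow the same pattern but populate `feature_importances_` instead of `coef_`.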