Base Tasks

Base tasks form the core of the system’s operations, providing a foundational framework for data extraction, processing, and storage. They can run in single-threaded, multi-threaded, or multiprocessing environments. They integrate with the database through Object-Relational Mapping (ORM), which handles session management and helps keep data consistent.

Purpose

The BaseTaskInitializer class is an abstract base class that defines the common structure and behavior of all base tasks in the system. It provides methods for initializing database sessions and loading configuration constants, and it declares the abstract methods that subclasses must implement to process specific bioinformatics data.

Customization

To create a custom task, subclass BaseTaskInitializer and implement the start, process, and store_entry methods. These methods define the logic for processing specific data sources and storing the processed data in the database.

Key Features

  • Session Management: Opens and manages database sessions through the ORM to maintain data consistency.

  • Configuration Handling: Loads and processes configuration constants from YAML files to ensure that all tasks are initialized with the correct settings.

  • Extensibility: Abstract methods are provided to allow developers to define specific task logic for their bioinformatics workflows.

Example Usage

Here is an example of how to subclass BaseTaskInitializer:

from protein_information_system.tasks.base import BaseTaskInitializer

class MyCustomTask(BaseTaskInitializer):
    def start(self):
        # Implementation of the start method
        pass

    def process(self, target):
        # Processing logic for the target data
        pass

    def store_entry(self, record):
        # Logic to store the processed record in the database
        pass
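
The skeleton above leaves every method empty. As a rough illustration of how the three methods can fit together, the sketch below fills them in for a toy task. To stay runnable without the package or a database, it uses a hypothetical stand-in base class (`_StandInTask`) instead of the real BaseTaskInitializer. The `conf["sequences"]` key, the in-memory `stored` list, and the choice to drive `process` and `store_entry` from `start` are all assumptions made for this example, not documented behavior of the library.

```python
from abc import ABC, abstractmethod


class _StandInTask(ABC):
    """Hypothetical stand-in for BaseTaskInitializer (no database, no logger)."""

    def __init__(self, conf):
        self.conf = conf

    @abstractmethod
    def start(self): ...

    @abstractmethod
    def process(self, target): ...

    @abstractmethod
    def store_entry(self, record): ...


class SequenceLengthTask(_StandInTask):
    """Toy task: records the length of each sequence listed in conf."""

    def __init__(self, conf):
        super().__init__(conf)
        self.stored = []  # stands in for rows written through the ORM session

    def start(self):
        # Drive the pipeline: process each target, then store the result.
        for target in self.conf["sequences"]:
            self.store_entry(self.process(target))

    def process(self, target):
        name, sequence = target
        return {"name": name, "length": len(sequence)}

    def store_entry(self, record):
        self.stored.append(record)


task = SequenceLengthTask({"sequences": [("P1", "MKTAYIAK"), ("P2", "MSL")]})
task.start()
print(task.stored)
```

In a real subclass, `store_entry` would presumably write through `self.session` rather than append to a list.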

class protein_information_system.tasks.base.BaseTaskInitializer(conf, session_required=True)

Bases: ABC

The BaseTaskInitializer class provides a foundation for creating bioinformatics tasks that interact with the database via ORM.

This class is abstract and should be subclassed to define specific task processing logic. It handles session management, loading configuration constants, and provides an interface for starting, processing, and storing task data.

Additionally, it initializes a task-specific logger.
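
Because the class derives from ABC, a subclass that omits any of the abstract methods cannot be instantiated. The sketch below shows this with a hypothetical stand-in class (`StandInBase`), assuming the real methods are declared with `abc.abstractmethod` as the `abstract` prefix in this reference suggests. Under that assumption Python raises TypeError at instantiation time. A NotImplementedError would only surface if the base implementation were called explicitly, for example through `super()`.

```python
from abc import ABC, abstractmethod


class StandInBase(ABC):
    """Hypothetical stand-in mirroring the three abstract methods."""

    @abstractmethod
    def start(self):
        raise NotImplementedError

    @abstractmethod
    def process(self, target):
        raise NotImplementedError

    @abstractmethod
    def store_entry(self, record):
        raise NotImplementedError


class IncompleteTask(StandInBase):
    """Implements start() only; process() and store_entry() are missing."""

    def start(self):
        return "started"


error = None
try:
    IncompleteTask()
except TypeError as exc:
    # The exact wording differs between Python versions, but the message
    # names the abstract methods that are still unimplemented.
    error = str(exc)
    print(error)
```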

conf

Configuration dictionary loaded from YAML or other sources.

Type:

dict

logger

Logger instance for logging task-specific information.

Type:

Logger

session

Database session used for ORM operations.

Type:

Session

load_constants(constants_path)

Load and handle predefined constants.

This method processes the constants YAML file and ensures that all required structural and prediction constants are up to date in the database.

Logs the process of loading constants.

Parameters:

constants_path (str) – Path to the YAML file containing configuration constants.

Raises:
  • FileNotFoundError – If the constants file cannot be found.

  • yaml.YAMLError – If there is an error parsing the YAML file.

abstract process(target)

Process the given target.

This method should be implemented by all subclasses to define the specific processing logic for each bioinformatics data source.

Parameters:

target – The target data to be processed.

Raises:

NotImplementedError – If the subclass does not implement this method.

session_init()

Initialize the database session using DatabaseManager.

Sets up the database connection and session using the DatabaseManager class. Logs the initialization process.

Raises:

Exception – If the session initialization fails.
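
The log-then-raise behavior described above can be sketched as follows. This is a minimal illustration, not the library's implementation: `init_session`, the `open_session` callable, and the logger name are hypothetical, and the real method obtains its session from DatabaseManager, whose interface is not shown in this reference.

```python
import logging

logger = logging.getLogger("example_task")


def init_session(open_session):
    """Open a session via the supplied callable, logging each step."""
    logger.info("Initializing database session")
    try:
        session = open_session()
    except Exception:
        # Record the traceback, then let the caller decide how to recover.
        logger.exception("Database session initialization failed")
        raise
    logger.info("Database session ready")
    return session


def failing_factory():
    """Simulates a database that cannot be reached."""
    raise ConnectionError("database unreachable")


# Successful initialization returns whatever the factory produced.
session = init_session(lambda: "fake-session")

# A failing factory is logged and the original exception propagates.
failure = None
try:
    init_session(failing_factory)
except ConnectionError as exc:
    failure = str(exc)
```

Re-raising the original exception keeps the failure visible to the caller, so a task does not continue with an unusable session.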

abstract start()

Start the task's data operation.

This method should be implemented by all subclasses to define the specific data operation logic for each bioinformatics data source.

Raises:

NotImplementedError – If the subclass does not implement this method.

abstract store_entry(record)

Store the processed entry.

This method should be implemented by all subclasses to define the specific storage logic for processed data.

Parameters:

record – The processed data record to be stored.

Raises:

NotImplementedError – If the subclass does not implement this method.