Accessions Module

class protein_information_system.operation.extraction.accessions.AccessionManager(conf, session_required=True)

Bases: BaseTaskInitializer

The AccessionManager class manages accession data within the system. It extends BaseTaskInitializer to load accession codes from CSV files and to fetch them from the UniProt API.

Purpose

The AccessionManager class provides functionality to manage and process accession codes for biological data, making sure each code is stored in the database and kept there.

Key Features

  • Load from CSV: Load accession data from a specified CSV file and process it.

  • Fetch from API: Fetch accession data from the UniProt API based on specified search criteria.

  • Database Integration: Stores new accession codes in the database and manages those already stored.

  • Logging: Inherits logging capabilities from BaseTaskInitializer for tracking operations and errors.
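To illustrate the database side, here is a minimal sketch of storing only the accession codes that are not already present. It uses an in-memory SQLite database as a stand-in for the system's actual database. The accession table, its columns, and the function store_new_accessions are hypothetical and do not reflect the package's real schema or models.

```python
import sqlite3
from typing import Iterable


def store_new_accessions(
    conn: sqlite3.Connection, accessions: Iterable[str], tag: str
) -> list[str]:
    """Insert accession codes not yet in the table and return the ones inserted.

    Hypothetical schema: one `accession` table keyed by `accession_code`.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS accession ("
        " accession_code TEXT PRIMARY KEY,"
        " tag TEXT)"
    )
    # Load the codes already stored so each membership check is a set lookup.
    existing = {row[0] for row in conn.execute("SELECT accession_code FROM accession")}
    new_codes: list[str] = []
    for code in accessions:
        code = code.strip()
        # Skip blanks, codes already in the table, and repeats within this batch.
        if code and code not in existing:
            new_codes.append(code)
            existing.add(code)
    conn.executemany(
        "INSERT INTO accession (accession_code, tag) VALUES (?, ?)",
        [(code, tag) for code in new_codes],
    )
    conn.commit()
    return new_codes
```

Calling it twice on the same connection with overlapping inputs inserts each code only once. For very large tables, a unique constraint combined with SQLite's INSERT OR IGNORE would avoid loading every stored code into memory.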

Configuration

Parameters:
  • conf (dict) – Configuration dictionary loaded from YAML or other sources.

  • session_required (bool) – Whether a database session is required. Default is True.

Example Usage:

Below are examples showing how to use the key functionalities of the AccessionManager:

from protein_information_system.operation.extraction.accessions import AccessionManager

# Initialize the AccessionManager with configuration
config = {
    'load_accesion_csv': 'path_to_csv_file.csv',
    'load_accesion_column': 'accession_column_name',
    'tag': 'example_tag',
    'search_criteria': 'example_criteria',
    'limit': 200,
    'debug': True
}
accession_manager = AccessionManager(config)

# Load accessions from CSV
accession_manager.load_accessions_from_csv()

# Fetch accessions from API
accession_manager.fetch_accessions_from_api()

fetch_accessions_from_api()

Fetches accession codes from the UniProt API based on the specified search criteria.

This method queries the UniProt API with compression and pagination enabled. It processes the results and stores new accession codes in the database.

Raises:
  • requests.RequestException – If there is an error in the API request.

  • gzip.BadGzipFile – If the response is not a valid gzip file.

  • Exception – For any other errors during processing.
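The query-with-compression-and-pagination pattern described above can be sketched as follows. This is a minimal illustration that assumes the public UniProt REST search endpoint, its query, format=list, compressed=true and size parameters, and its Link header with rel="next" for pagination. It uses the standard library's urllib instead of requests so it has no third-party dependencies. The function names are hypothetical, and the code is not the class's actual implementation.

```python
import gzip
import re
import urllib.parse
import urllib.request
from typing import Iterator, Optional

# Assumed public UniProt REST endpoint (not taken from this package).
UNIPROT_SEARCH_URL = "https://rest.uniprot.org/uniprotkb/search"

# Matches the URL inside a `Link: <...>; rel="next"` pagination header.
_NEXT_LINK_RE = re.compile(r'<([^>]+)>;\s*rel="next"')


def next_page_url(link_header: Optional[str]) -> Optional[str]:
    """Return the next-page URL from a Link header, or None on the last page."""
    if not link_header:
        return None
    match = _NEXT_LINK_RE.search(link_header)
    return match.group(1) if match else None


def parse_accession_page(body: bytes) -> list[str]:
    """Decompress one gzip page in format=list (one accession per line)."""
    # gzip.decompress raises gzip.BadGzipFile if the body is not valid gzip.
    text = gzip.decompress(body).decode("utf-8")
    return [line.strip() for line in text.splitlines() if line.strip()]


def fetch_accessions(query: str, page_size: int = 500) -> Iterator[str]:
    """Yield accession codes matching `query`, following pagination to the end."""
    params = urllib.parse.urlencode(
        {"query": query, "format": "list", "compressed": "true", "size": page_size}
    )
    url: Optional[str] = f"{UNIPROT_SEARCH_URL}?{params}"
    while url:
        # urllib raises urllib.error.URLError (or HTTPError) on request failures.
        with urllib.request.urlopen(url, timeout=60) as response:
            body = response.read()
            link = response.headers.get("Link")
        yield from parse_accession_page(body)
        url = next_page_url(link)
```

For example, iterating over fetch_accessions("organism_id:9606 AND reviewed:true") would stream reviewed human accessions page by page, which keeps memory use bounded regardless of the result size.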

load_accessions_from_csv()

Loads accession codes from the CSV file specified in the configuration and processes them for data fetching.

This method reads accession codes from a CSV file, ensures they are unique, and processes them by invoking the _process_new_accessions method.

Raises:
  • Exception – If there is an issue loading or processing the CSV file.
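The read-and-deduplicate step can be sketched with Python's standard csv module. This is a minimal illustration, not the package's own reader: the function read_unique_accessions is hypothetical, and the later database step (_process_new_accessions) is not shown.

```python
import csv
from typing import Iterable


def read_unique_accessions(lines: Iterable[str], column: str) -> list[str]:
    """Return non-empty values of `column`, de-duplicated in first-seen order.

    `lines` may be an open text file or any iterable of CSV lines.
    """
    reader = csv.DictReader(lines)
    # fieldnames is None for an empty input; otherwise it holds the header row.
    if reader.fieldnames is None or column not in reader.fieldnames:
        raise ValueError(f"column {column!r} not found in CSV header")
    # A dict keeps insertion order, so it doubles as an ordered set.
    seen: dict[str, None] = {}
    for row in reader:
        # Short rows yield None for missing fields; treat them as empty.
        value = (row[column] or "").strip()
        if value:
            seen.setdefault(value, None)
    return list(seen)
```

With the configuration from the usage example, the call would look like read_unique_accessions(open(config['load_accesion_csv'], newline=''), config['load_accesion_column']). Keeping first-seen order makes repeated runs over the same file produce the same sequence of codes.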

process(_)

(Not used)

start()

(Not used)

store_entry(record)

(Not used)