Structure Embeddings Module
- class protein_information_system.operation.embedding.structure_3di.Structure3DiManager(conf)
Bases: QueueTaskInitializer
Manages the structural embedding process by encoding 3D atomic models into discrete 3Di representations.
This class loads and preprocesses 3D structures from mmCIF files, applies the mini3di encoder to generate embeddings, and stores the results in the database. It is designed to run asynchronously within a distributed task queue system.
- encoder
Encoder object used to compute 3Di states from atomic structures.
- Type:
mini3di.Encoder
- parser
Parser for reading and interpreting mmCIF files.
- Type:
Bio.PDB.MMCIFParser
- reference_attribute
Name of the reference attribute used in task publication (default: ‘model’).
- Type:
str
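The lifecycle described above (publish tasks, process each one, persist the result) can be pictured with a minimal single-process sketch. `ToyStructureManager` is hypothetical: it uses the standard-library `queue.Queue` and one worker thread in place of the real distributed queue and the `QueueTaskInitializer` base class, and its placeholder `process` does not parse mmCIF or call mini3di.

```python
import queue
import threading
from typing import Any, Optional


class ToyStructureManager:
    """Hypothetical stand-in mirroring enqueue -> process -> store_entry."""

    def __init__(self, models: list[dict[str, Any]]) -> None:
        self.models = models              # dicts shaped like State rows (assumed)
        self.tasks: queue.Queue = queue.Queue()
        self.stored: list[dict[str, Any]] = []

    def enqueue(self) -> None:
        # One task per model, as in Structure3DiManager.enqueue().
        for model in self.models:
            self.tasks.put(model)

    def process(self, batch: dict[str, Any]) -> Optional[dict[str, Any]]:
        # Placeholder result; the real method parses mmCIF and runs mini3di.
        return {"model_id": batch["id"], "embedding": "placeholder"}

    def store_entry(self, records: dict[str, Any]) -> None:
        self.stored.append(records)

    def worker(self) -> None:
        # Consume tasks until the None sentinel arrives.
        while True:
            batch = self.tasks.get()
            try:
                if batch is None:
                    return
                result = self.process(batch)
                if result is not None:    # process() returns None on failure
                    self.store_entry(result)
            finally:
                self.tasks.task_done()


manager = ToyStructureManager([{"id": 1}, {"id": 2}])
thread = threading.Thread(target=manager.worker)
thread.start()
manager.enqueue()
manager.tasks.put(None)  # tell the worker to stop after the queued tasks
thread.join()
```

Skipping `store_entry` when `process` returns None mirrors the documented behavior that models failing to parse or encode are logged and dropped rather than stored.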
- enqueue()
Publishes structural embedding tasks for all available models (State entries).
This method queries the database for all structural models stored in the State table. It then publishes a separate task for each model to be processed asynchronously.
The number of published tasks can optionally be capped through the ‘limit_execution’ configuration parameter.
Notes
Each published task contains the dictionary representation of a State object, which includes model metadata such as file paths and identifiers.
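A minimal sketch of the publishing loop, assuming that `limit_execution` caps the number of published tasks and that each task payload is the dictionary form of a State row. The `publish` callable and the `conf` dictionary layout are hypothetical; the real method may instead apply the limit in the database query.

```python
from typing import Any, Callable, Iterable, Optional


def enqueue(
    states: Iterable[dict[str, Any]],
    publish: Callable[[dict[str, Any]], None],
    conf: dict[str, Any],
) -> int:
    """Publish one task per State dict and return how many were published."""
    limit: Optional[int] = conf.get("limit_execution")  # None means "no limit" (assumed)
    published = 0
    for state in states:
        if limit is not None and published >= limit:
            break
        publish(state)  # each task carries the State's dict representation
        published += 1
    return published


states = [
    {"id": 1, "file_path": "models/a.cif"},
    {"id": 2, "file_path": "models/b.cif"},
    {"id": 3, "file_path": "models/c.cif"},
]
sent: list[dict[str, Any]] = []
count = enqueue(states, sent.append, {"limit_execution": 2})
```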
- prepare_new_chain(bio_model)
Normalizes and rebuilds all chains in a structural model into a single renumbered chain.
This method traverses all chains in the given model, copies and renumbers their residues sequentially, and merges them into a synthetic chain labeled ‘A’. This ensures compatibility with the encoder, which expects a single continuous chain.
- Parameters:
bio_model (Bio.PDB.Model.Model) – The structural model extracted from a parsed mmCIF file.
- Returns:
A new chain object containing all residues from the original model, renumbered and unified.
- Return type:
Bio.PDB.Chain.Chain
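The renumbering step can be modeled without Biopython. In this sketch, `Residue` and `Chain` are hypothetical dataclasses that only mimic Biopython's residue id triple (hetero flag, sequence number, insertion code), and a model is treated as an iterable of chains, as Biopython models are. Keeping the hetero flag and clearing the insertion code are assumptions, not confirmed behavior of the real method. Renumbering matters because residue numbers routinely repeat across chains (two chains can both start at residue 1), so merging them without new numbers would produce colliding ids.

```python
from dataclasses import dataclass, field, replace
from typing import Iterable


@dataclass(frozen=True)
class Residue:
    # Same shape as a Biopython residue id: (hetero flag, sequence number, insertion code).
    id: tuple[str, int, str]
    resname: str


@dataclass
class Chain:
    id: str
    residues: list[Residue] = field(default_factory=list)


def prepare_new_chain(chains: Iterable[Chain]) -> Chain:
    """Merge all chains into one chain 'A', renumbering residues 1..N in traversal order."""
    merged = Chain(id="A")
    for chain in chains:
        for residue in chain.residues:
            hetflag = residue.id[0]
            # Sequential number; insertion code cleared so ids stay unique.
            new_id = (hetflag, len(merged.residues) + 1, " ")
            # replace() builds a copy, so the original residue is left untouched.
            merged.residues.append(replace(residue, id=new_id))
    return merged


heavy = Chain("H", [Residue((" ", 10, " "), "GLY"), Residue((" ", 10, "A"), "ALA")])
light = Chain("L", [Residue((" ", 1, " "), "SER")])
merged = prepare_new_chain([heavy, light])
```

With real Biopython objects, the same loop would likely copy each residue with `residue.copy()`, assign the new `id`, and `add()` it to a fresh `Bio.PDB.Chain.Chain("A")`.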
- process(batch)
Loads and parses a 3D structural model, prepares it for encoding, and returns the embedding result.
Given the metadata of a structural model (including its file path), this method loads the corresponding mmCIF file, extracts the first model it contains, and merges its chains into a single renumbered chain. It then encodes that chain with the 3Di encoder.
- Parameters:
batch (dict) – Dictionary containing information about the structural model, including:
  - ‘file_path’: Relative path to the mmCIF file.
  - ‘id’: Model identifier.
- Returns:
A dictionary with the model ID and its corresponding embedding, or None if an error occurs during parsing or encoding.
- Return type:
dict or None
Notes
Only the first model in the mmCIF file is processed.
If the structure file is empty or malformed, the method logs a warning and returns None.
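The control flow of `process` can be sketched with its collaborators passed in as callables, so it runs without Biopython or mini3di. The `base_dir` parameter is hypothetical (the source only says `file_path` is relative, not what it is relative to). With Biopython, `parse` could be `MMCIFParser(QUIET=True).get_structure`, whose `(structure_id, filename)` arguments match, and iterating the returned structure yields its models.

```python
import logging
from pathlib import Path
from typing import Any, Callable, Iterable, Optional

logger = logging.getLogger(__name__)


def process(
    batch: dict[str, Any],
    base_dir: str,
    parse: Callable[[str, str], Iterable[Any]],
    prepare_new_chain: Callable[[Any], Any],
    process_chain: Callable[[Any, dict[str, Any]], Optional[dict[str, Any]]],
) -> Optional[dict[str, Any]]:
    """Parse one mmCIF model, merge its chains, and encode it; None on failure."""
    path = Path(base_dir) / batch["file_path"]  # base_dir is an assumption
    try:
        models = list(parse(str(batch["id"]), str(path)))  # a structure iterates over its models
        if not models:
            logger.warning("Empty structure file: %s", path)
            return None
        chain = prepare_new_chain(models[0])  # only the first model is used
        return process_chain(chain, batch)
    except Exception:
        logger.warning("Could not parse or encode %s", path, exc_info=True)
        return None


ok = process(
    {"id": 7, "file_path": "a.cif"},
    "/data",
    parse=lambda sid, fn: [["model-0"], ["model-1"]],
    prepare_new_chain=lambda model: model,
    process_chain=lambda chain, info: {"model_id": info["id"], "embedding": chain},
)
```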
- process_chain(chain, model_info)
Encodes a protein chain into a sequence of 3Di states and prepares the embedding result.
This method applies the mini3di.Encoder to convert the given protein chain into a symbolic representation of its local structural patterns. If encoding fails, it logs detailed information about the chain to facilitate debugging.
- Parameters:
chain (Bio.PDB.Chain.Chain) – Protein chain object prepared by prepare_new_chain.
model_info (dict) – Dictionary with metadata about the original structure, including its ‘id’.
- Returns:
Dictionary with keys:
  - ‘model_id’: identifier of the original structure.
  - ‘embedding’: encoded 3Di sequence.
Returns None if encoding fails.
- Return type:
dict or None
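mini3di documents `Encoder.encode_chain(chain)`, which returns per-residue 3Di states, and `Encoder.build_sequence(states)`, which turns those states into a 3Di string. Whether this class stores the string or the raw states is not stated, so converting to a string here is an assumption. The encoder is passed in as a parameter, typed by a hypothetical `ChainEncoder` protocol, so the sketch runs with a fake object.

```python
import logging
from typing import Any, Optional, Protocol

logger = logging.getLogger(__name__)


class ChainEncoder(Protocol):
    """The two mini3di.Encoder methods this sketch relies on."""

    def encode_chain(self, chain: Any) -> Any: ...
    def build_sequence(self, states: Any) -> str: ...


def process_chain(
    encoder: ChainEncoder, chain: Any, model_info: dict[str, Any]
) -> Optional[dict[str, Any]]:
    """Encode a merged chain into a 3Di string; log and return None on failure."""
    try:
        states = encoder.encode_chain(chain)       # per-residue 3Di states
        sequence = encoder.build_sequence(states)  # states -> 3Di letters
    except Exception:
        residue_count = sum(1 for _ in chain)
        logger.error(
            "3Di encoding failed for model %s (chain %s, %d residues)",
            model_info.get("id"), getattr(chain, "id", "?"), residue_count,
            exc_info=True,
        )
        return None
    return {"model_id": model_info["id"], "embedding": sequence}


class FakeEncoder:
    # Test double only: does not compute real 3Di states.
    def encode_chain(self, chain: Any) -> list[str]:
        return [residue.lower() for residue in chain]

    def build_sequence(self, states: list[str]) -> str:
        return "".join(states)


result = process_chain(FakeEncoder(), ["D", "V", "L"], {"id": 42})
```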
- store_entry(records)
Persists the 3Di embedding of a structural model into the database.
This method creates a new Structure3Di object using the model ID and the encoded embedding sequence. It attempts to store the entry in the database and handles any errors by rolling back the transaction and logging the exception.
- Parameters:
records (dict) – Dictionary with the following keys:
  - ‘model_id’: Identifier of the structural model (State.id).
  - ‘embedding’: List or string representing the 3Di sequence.
Notes
Commits the transaction only if insertion is successful.
Logs both successful and failed insertions.
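A sketch of the commit-or-rollback pattern described above, using the standard-library `sqlite3` module as a stand-in for the project's database layer, which the source does not show. The table name `structure_3di`, its columns, and the boolean return value are hypothetical; the documented method does not state a return value.

```python
import logging
import sqlite3
from typing import Any

logger = logging.getLogger(__name__)


def store_entry(conn: sqlite3.Connection, records: dict[str, Any]) -> bool:
    """Insert one 3Di embedding; commit on success, roll back and log on failure."""
    embedding = records["embedding"]
    if not isinstance(embedding, str):
        embedding = "".join(embedding)  # assumes a list of 3Di letters
    try:
        conn.execute(
            "INSERT INTO structure_3di (model_id, embedding) VALUES (?, ?)",
            (records["model_id"], embedding),
        )
        conn.commit()
    except sqlite3.Error:
        conn.rollback()
        logger.exception("Failed to store 3Di embedding for model %s", records["model_id"])
        return False
    logger.info("Stored 3Di embedding for model %s", records["model_id"])
    return True


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE structure_3di (model_id INTEGER PRIMARY KEY, embedding TEXT NOT NULL)")
first = store_entry(conn, {"model_id": 1, "embedding": ["D", "V", "L"]})
duplicate = store_entry(conn, {"model_id": 1, "embedding": "DVL"})  # violates PRIMARY KEY
```

Rolling back on failure leaves the connection in a clean state for the next task, which matters when one worker stores many models in sequence.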