Overview
QTFPred (Quantum-based Transcription Factor Predictor) is a quantum-classical hybrid deep learning framework for predicting transcription factor (TF) binding signals at base-pair resolution. By integrating quantum convolutional layers with fully convolutional neural networks (FCNs), QTFPred achieves state-of-the-art performance, particularly in data-sparse scenarios where conventional methods struggle.
Download
The QTFPred package can be downloaded here (QTFPred_signal.v6.tar.gz).
Execute the Extraction Command
Use the tar -zxvf command to decompress and extract the archive. This will create the top-level repository folder.
tar -zxvf QTFPred_signal.v6.tar.gz
Implementation Reference
QTFPred builds upon the implementation approach of FCNsignal:
- FCNsignal Paper: Base-resolution prediction of transcription factor binding signals by a deep learning framework | PLOS Computational Biology, 2022
- FCNsignal GitHub: https://github.com/turningpoint1988/FCNsignal
Key Features
- Quantum Convolutional Layer (QConv): 4-qubit parameterized quantum circuit for enhanced feature extraction (see the sketch after this list)
- Hybrid Architecture: Seamless integration of quantum and classical neural network layers
- Comprehensive Evaluation: Evaluation metrics include RMSE, Pearson correlation (PR), AUROC, and AUPRC
- Pre-optimized Hyperparameters: Hyperparameters tuned via Optuna for optimal performance
- Interpretable Motifs: Extract position frequency matrices (PFMs) from learned quantum filters
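For orientation, the sketch below shows what a 4-qubit parameterized quantum circuit looks like in PennyLane: angle encoding of four input features followed by one variational layer. This is a minimal illustration only, not QTFPred's exact QConv circuit (which uses 36 parameters and data re-uploading; see the tutorial notebooks):

import pennylane as qml
import torch

dev = qml.device("default.qubit", wires=4)

@qml.qnode(dev, interface="torch")
def qconv_patch(inputs, weights):
    # Angle-encode 4 input features, one per qubit
    for w in range(4):
        qml.RY(inputs[w], wires=w)
    # One variational layer: parameterized rotations + ring entanglement
    for w in range(4):
        qml.Rot(weights[w, 0], weights[w, 1], weights[w, 2], wires=w)
    for w in range(4):
        qml.CNOT(wires=[w, (w + 1) % 4])
    # One expectation value per qubit -> a 4-dimensional feature vector
    return [qml.expval(qml.PauliZ(w)) for w in range(4)]

x = torch.rand(4)                             # one 4-base window of encoded input
theta = torch.rand(4, 3, requires_grad=True)  # trainable circuit parameters
print(qconv_patch(x, theta))                  # four expectation values in [-1, 1]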
Competing Methods
QTFPred can be compared with the following state-of-the-art methods for TF binding prediction:
- BPNet – Base-pair resolution neural network for TF binding and chromatin accessibility prediction
- FCNsignal – Fully convolutional network for TF binding signal prediction at base resolution
System Requirements
Hardware Requirements
- GPU: NVIDIA GPU with CUDA support (H100 or A100 recommended)
- GPU Memory: 5GB or more recommended
Software Requirements
- OS: Linux (Ubuntu 20.04 or later recommended)
- CUDA: 12.0.1
- cuDNN: 8.x
- Singularity: 3.8 or higher
Installation
Prerequisites
Ensure your working directory is set to the QTFPred_signal repository root:
cd /path/to/QTFPred_signal
export QTFPRED_ROOT=$(pwd)
Quick Setup (Recommended)
The repository includes a pre-built Singularity container (singularity/test.v2.sif, 11GB) with all dependencies installed. This is the recommended approach since the MEME Suite website is currently down.
# Verify the pre-built container exists
ls -lh singularity/test.v2.sif
# Expected: ~11GB file
# Test the container
singularity exec --nv singularity/test.v2.sif python3.11 -c "
import torch
import pennylane as qml
print(f'PyTorch: {torch.__version__}')
print(f'PennyLane: {qml.__version__}')
print(f'CUDA available: {torch.cuda.is_available()}')
"
Alternative: Build Container from Definition (Optional)
If you need to rebuild the container:
# Requires sudo privileges
sudo singularity build singularity/test.v2.sif singularity/project_FCNsignal.def
Note: Building takes approximately 15-20 minutes.
Before You Start: Configuring Paths
Understanding Singularity Bind Mounts
QTFPred uses Singularity containers to ensure reproducible execution environments. To use the scripts, you need to configure bind mounts that connect your host filesystem to the container’s internal filesystem.
What is a Bind Mount?
A bind mount allows the Singularity container to access files and directories on your host system. Without proper bind configuration, the container cannot read your data or save results.
Bind Mount Syntax
BIND_PATH="HOST_PATH:CONTAINER_PATH"
- HOST_PATH (left side): Your actual filesystem path where QTFPred_signal is located
- CONTAINER_PATH (right side): Fixed internal path inside the container (/mnt/QTFPred_signal)
- The colon (:) separates the two paths
Why This Matters
When you run scripts, they execute inside the Singularity container. The container has its own isolated filesystem. By setting BIND_PATH, you tell Singularity to “mount” your host directory at a specific location inside the container, making your files accessible.
How to Configure Scripts
Each execution script contains a “User Configuration” section at the top that you must update before running:
Step 1: Locate Your QTFPred_signal Path
First, determine the absolute path to your QTFPred_signal repository:
cd QTFPred_signal
pwd
# Output example: /home/username/QTFPred_signal
Step 2: Edit Each Script
Open the script you want to run and find the “User Configuration” section (first 20-30 lines):
Example from execute_train_QTFPred_signal.sh:
# ============================================================================
# User Configuration (REQUIRED: Update these paths for your environment)
# ============================================================================
#
# Set the absolute path to your QTFPred_signal repository root directory.
# This path should point to where you cloned/extracted QTFPred_signal.
#
# Example configurations:
# PROJECT_ROOT="/home/username/QTFPred_signal"
# PROJECT_ROOT="/data/projects/QTFPred_signal"
# PROJECT_ROOT="/mnt/storage/research/QTFPred_signal"
#
PROJECT_ROOT="/path/to/QTFPred_signal" # <- CHANGE THIS LINE
# Singularity container path (relative to PROJECT_ROOT)
SINGULARITY_CONTAINER_PATH="${PROJECT_ROOT}/singularity/test.v2.sif"
# Bind path configuration (HOST:CONTAINER format)
BIND_PATH="${PROJECT_ROOT}:/mnt/QTFPred_signal" # <- This updates automatically
# Python path for container environment
SINGULARITYENV_PYTHONPATH="/mnt/QTFPred_signal/scripts:${PYTHONPATH}"
Step 3: Update PROJECT_ROOT
Replace /path/to/QTFPred_signal with your actual path:
# Before (default)
PROJECT_ROOT="/path/to/QTFPred_signal"
# After (your environment - example)
PROJECT_ROOT="/home/username/QTFPred_signal"
Step 4: Save and Verify
After editing, verify your configuration:
# Check that the container exists at the specified path
ls ${PROJECT_ROOT}/singularity/test.v2.sif
# Output: Should show the 11GB container file
The following scripts need path configuration before use:
| Script | Purpose | Configuration Required |
|---|---|---|
| execute_train_QTFPred_signal.sh | Train quantum model | ✓ |
| execute_train_FCNsignal_signal.sh | Train FCNsignal | ✓ |
| execute_train_BPNet_signal.sh | Train BPNet | ✓ |
| execute_bed2signal.sh | Data preprocessing | ✓ |
| execute_download.sh | Download ChIP-seq data | ✓ |
| extract_motif_from_QTFPred.sh | Extract motifs | ✓ |
| run_tomtom_against_JASPAR.sh | TomTom analysis | ✓ |
Directory Structure
QTFPred_signal/
├── README.md
├── data/ # Data directory
│ ├── HeLa-S3/ # Cell line directory
│ │ ├── datalist.txt # List of TFs to download
│ │ └── ELK1/ # TF directory (example data included)
│ │ ├── thresholded.bed # IDR thresholded peaks
│ │ ├── p-value.bigWig # ChIP-seq signal
│ │ └── data/ # Preprocessed data
│ │ ├── ELK1_train.npz # Training data (~180MB)
│ │ ├── ELK1_test.npz # Test data (~23MB)
│ │ └── ELK1_neg.npz # Negative data (~22MB)
│ ├── K562/ # Other cell lines
│ │ └── datalist.txt
│ ├── GM12878/
│ │ └── datalist.txt
│ ├── Genome/ # Reference genome (user downloads)
│ │ ├── hg38.fa # obtained via download_genome.sh
│ │ └── chromsize # obtained via download_genome.sh
│ └── JASPAR/ # Motif database
│ └── JASPAR2024_CORE_vertebrates_non-redundant_pfms_meme.txt
├── scripts/ # Execution scripts
│ ├── models/ # Model definitions
│ │ ├── QTFPred_signal.py # Quantum model
│ │ ├── FCNmotif.py # Classical model components
│ │ └── quantum_convolutional_layer.py # Quantum convolutional layer
│ ├── data_processing/ # Data processing
│ │ ├── download_genome.sh # Download genome reference
│ │ ├── download_encode_data.py # Download ChIP-seq data (Python)
│ │ ├── execute_download.sh # Download ChIP-seq data (Shell)
│ │ ├── bed2signal.py # Preprocessing (Python)
│ │ ├── execute_bed2signal.sh # Preprocessing (Shell)
│ │ └── datasets.py # Dataset class
│ ├── training_execution_sh/ # Training execution scripts
│ │ ├── execute_train_QTFPred_signal.sh # Train quantum model
│ │ ├── execute_train_FCNsignal_signal.sh # Train FCNsignal
│ │ └── execute_train_BPNet_signal.sh # Train BPNet
│ ├── run_model/ # Model execution
│ │ ├── run_QTFPred_signal.py # Run quantum model
│ │ ├── run_classical_signal.py # Run classical models
│ │ └── Trainer_signal.py # Trainer class
│ ├── motif/ # Motif analysis
│ │ ├── extract_motif_from_QTFPred.sh # Extract motifs (Shell)
│ │ ├── extract_motif_from_QTFPred.py # Extract motifs (Python)
│ │ └── run_tomtom_against_JASPAR.sh # TomTom analysis
│ └── utils/ # Utilities
│ ├── loss.py # Loss functions
│ └── check_npz_shapes.py # Data validation
├── singularity/ # Singularity container
│ ├── test.v2.sif # Pre-built image (11GB, recommended)
│ ├── project_FCNsignal.def # Container definition file
│ └── requirements.txt # Python dependencies
├── experiments/ # Experiment results (created after execution)
│ └── {model}_{cell}_{TF}_{date}/ # Experiment directory
│ ├── training/ # Training results
│ │ ├── model_best.pth # Best model weights
│ │ ├── record.txt # Evaluation metrics
│ │ ├── info.log # Execution log
│ │ ├── debug.log # Debug log
│ │ └── losscurve/ # Loss curves
│ │ └── LossCurve.png
│ └── motif/ # Motif analysis results
│ ├── motif.meme # Extracted motifs (MEME format)
│ ├── info.log
│ ├── debug.log
│ └── tomtom/ # TomTom analysis results
│ ├── tomtom.tsv # Matching results
│ └── tomtom.xml
├── notebooks/ # Tutorial Jupyter notebooks
│ ├── 01_quantum_computing_introduction.ipynb # Quantum computing basics
│ ├── 02_quantum_convolutional_layer_tutorial.ipynb # Quantum convolutional layer
│ ├── requirements.txt # Python dependencies for notebooks
│ └── dev/ # Development versions (archived)
├── docs/ # Documentation (empty)
└── logs/ # Logs (empty)
Notes:
- Files under data/Genome/ (hg38.fa, chromsize) must be downloaded by users via download_genome.sh
- docs/ and logs/ directories are initially empty
- The experiments/ directory is created automatically during training execution
Quick Start
This quick start demonstrates training QTFPred using the pre-included HeLa-S3/ELK1 dataset, allowing you to immediately evaluate the model without downloading additional data.
# Set working directory
cd /path/to/QTFPred_signal
# Step 1: Verify pre-existing data
ls data/HeLa-S3/ELK1/data/
# Expected output:
# ELK1_train.npz (~180MB)
# ELK1_test.npz (~23MB)
# ELK1_neg.npz (~22MB)
# Step 2: Configure script paths (REQUIRED - First time only)
# Before running training, configure PROJECT_ROOT in the script
# See "Before You Start: Configuring Paths" section above for details
nano scripts/training_execution_sh/execute_train_QTFPred_signal.sh
# Change: PROJECT_ROOT="/path/to/QTFPred_signal"
# To: PROJECT_ROOT="/your/actual/path/to/QTFPred_signal"
# Step 3: Train QTFPred (quantum model)
# Note: This step requires GPU and takes approximately 30-60 minutes
bash scripts/training_execution_sh/execute_train_QTFPred_signal.sh HeLa-S3 ELK1
# Step 4: Check training results
# Results are saved in experiments/QTFPred_signal_HeLa-S3_ELK1_{date}/training/
cat experiments/QTFPred_signal_HeLa-S3_ELK1_*/training/record.txt
Expected Output in record.txt:
Test Results:
Regression - RMSE: 0.XXX, PR: 0.7X-0.8X
Classification - AUROC: 0.8X-0.9X, AUPRC: 0.7X-0.8X
Sample Size:
Train: ~8000, Test: ~1000, Negative: ~1000
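To sanity-check the preprocessed inputs before training (the repository ships scripts/utils/check_npz_shapes.py for this purpose), a minimal equivalent, assuming only that the files are standard NumPy archives:

import numpy as np

archive = np.load("data/HeLa-S3/ELK1/data/ELK1_train.npz")
print(archive.files)  # array names are dataset-specific; inspect before assuming
for name in archive.files:
    print(name, archive[name].shape, archive[name].dtype)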
Complete Workflow
This section describes the complete workflow from raw data download to motif analysis. If you want to process your own ChIP-seq data, follow these steps sequentially.
Important: Before executing any scripts in this workflow, you must configure the PROJECT_ROOT path in each script. See the “Before You Start: Configuring Paths” section for detailed instructions. The scripts that require configuration are listed in the configuration section.
Step 1: Singularity Container Setup
Recommended: Use the pre-built container included in the repository.
Note: The pre-built container is recommended because the MEME Suite website is currently experiencing downtime, which may cause build failures.
cd /path/to/QTFPred_signal
# Verify container exists
ls -lh singularity/test.v2.sif
# Expected: 11409920000 bytes (~11GB)
# Test container functionality
singularity exec singularity/test.v2.sif python3.11 --version
singularity exec singularity/test.v2.sif meme -version
Alternative: Build the container yourself (requires sudo privileges and ~20 minutes).
sudo singularity build singularity/test.v2.sif singularity/project_FCNsignal.def
Step 2: Data Download
2a. Download Genome Reference
Download the hg38 human genome reference and chromosome size information:
cd /path/to/QTFPred_signal
# Download hg38.fa (~938MB) and chromsize
bash scripts/data_processing/download_genome.sh
# Verify downloaded files
ls -lh data/Genome/
# Expected output:
# hg38.fa (~3GB uncompressed)
# chromsize (~3KB)
2b. Download ChIP-seq Data
Download ChIP-seq datasets (peak files and signal tracks) for specific cell lines. The repository includes datalist.txt files for three cell lines:
- HeLa-S3: 12 TFs (CTCF, E2F1, E2F6, ELK1, ELK4, JUND, MAFF, MAX, MAZ, REST, RFX5, TBP)
- K562: Multiple TFs (see data/K562/datalist.txt)
- GM12878: Multiple TFs (see data/GM12878/datalist.txt)
# Example: Download all ChIP-seq data for HeLa-S3
bash scripts/data_processing/execute_download.sh HeLa-S3
# Output structure:
# data/HeLa-S3/{TF}/
# ├── thresholded.bed # IDR thresholded peaks
# └── p-value.bigWig # ChIP-seq signal track
Arguments:
- <cell_line> – Cell line name (HeLa-S3, K562, or GM12878)
- [--force] – Force re-download of existing files (optional)
- [--verbose] – Enable verbose logging (optional)
- [--dry_run] – Test URLs without downloading (optional)
Note: You can download specific cell lines only. For example, to start with HeLa-S3:
bash scripts/data_processing/execute_download.sh HeLa-S3
Step 3: Data Preprocessing (bed2signal)
Convert BED peak files and BigWig signal files into NPZ format suitable for model training.
cd /path/to/QTFPred_signal
# Example: Preprocess E2F6 data for HeLa-S3
bash scripts/data_processing/execute_bed2signal.sh HeLa-S3 E2F6
# Output: data/HeLa-S3/E2F6/data/
# ├── E2F6_train.npz
# ├── E2F6_test.npz
# └── E2F6_neg.npz
Arguments:
- <cell_line> – Cell line name (e.g., HeLa-S3, K562, GM12878)
- <TF_name> – Transcription factor name (e.g., E2F6, ELK1, CTCF)
Processing steps:
- Extract 1,000 bp sequences centered on each peak position
- Apply random position shifts (-100 to 100 bp) for augmentation
- Filter samples with signal values in bottom 5%
- Generate negative samples from 3,000 bp upstream regions
- Normalize signal values: log10(1 + signal) (see the sketch after this list)
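The two numeric steps above, shift augmentation and log-scaling, reduce to a few lines; a minimal sketch with assumed shapes, not the repository's bed2signal.py code:

import numpy as np

rng = np.random.default_rng(0)

peak_center = 50_000
shift = rng.integers(-100, 101)      # random shift in [-100, 100] bp
start = peak_center + shift - 500    # 1,000 bp window around the shifted center
end = start + 1_000

signal = rng.random(1_000) * 50      # stand-in for BigWig signal values
normalized = np.log10(1.0 + signal)  # log10(1 + signal) normalization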
Processing time: 5-15 minutes per TF depending on peak count.
Note: For quick testing with pre-processed data, see the Quick Start section which uses HeLa-S3/ELK1 with included NPZ files.
Step 4: Model Training
Tip for Large-Scale Experiments: For training across multiple TFs and cell lines, we recommend using a job management system such as SLURM. The provided scripts are compatible with SLURM array jobs for efficient parallel processing.
Train models to predict TF binding signals from DNA sequences. QTFPred supports both quantum and classical models.
4a. Quantum Model (QTFPred)
Train the quantum-enhanced model with 4-qubit quantum convolutional layers:
cd /path/to/QTFPred_signal
# Train QTFPred for HeLa-S3/E2F6
bash scripts/training_execution_sh/execute_train_QTFPred_signal.sh HeLa-S3 E2F6
# Output directory structure:
# experiments/QTFPred_signal_HeLa-S3_E2F6_{date}/training/
# ├── model_best.pth # Best model weights (saved at lowest validation loss)
# ├── record.txt # Evaluation metrics (RMSE, PR, AUROC, AUPRC)
# ├── info.log # Training progress log
# ├── debug.log # Detailed debug information
# └── losscurve/
# └── LossCurve.png # Training/validation loss curves
Arguments:
- <cell_line> – Cell line name (e.g., HeLa-S3, K562, GM12878)
- <TF_name> – Transcription factor name (e.g., E2F6, ELK1, CTCF)
Evaluation metrics in record.txt (see the sketch after this list):
- RMSE: Root Mean Square Error (regression task)
- PR: Pearson Correlation (regression task)
- AUROC: Area Under ROC Curve (classification task)
- AUPRC: Area Under Precision-Recall Curve (classification task)
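For reference, all four metrics map onto standard SciPy/scikit-learn calls. A hedged sketch with toy values (the repository's evaluation code may differ in detail):

import numpy as np
from scipy.stats import pearsonr
from sklearn.metrics import roc_auc_score, average_precision_score

y_true = np.array([0.2, 1.3, 0.8, 2.1])          # observed signal values
y_pred = np.array([0.3, 1.1, 0.9, 1.8])          # predicted signal values
rmse = np.sqrt(np.mean((y_true - y_pred) ** 2))  # regression: RMSE
pr, _ = pearsonr(y_true, y_pred)                 # regression: Pearson correlation

labels = np.array([0, 1, 1, 1])                  # binding (1) vs. negative (0)
auroc = roc_auc_score(labels, y_pred)            # classification: AUROC
auprc = average_precision_score(labels, y_pred)  # classification: AUPRC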
4b. Classical Models (for comparison)
Train baseline classical models for performance comparison:
FCNsignal:
bash scripts/training_execution_sh/execute_train_FCNsignal_signal.sh HeLa-S3 E2F6
# Output: experiments/FCNsignal_HeLa-S3_E2F6_{date}/training/
Arguments:
- <cell_line> – Cell line name (e.g., HeLa-S3, K562, GM12878)
- <TF_name> – Transcription factor name (e.g., E2F6, ELK1, CTCF)
BPNet:
bash scripts/training_execution_sh/execute_train_BPNet_signal.sh HeLa-S3 E2F6
# Output: experiments/BPNet_HeLa-S3_E2F6_{date}/training/
Arguments:
- <cell_line> – Cell line name (e.g., HeLa-S3, K562, GM12878)
- <TF_name> – Transcription factor name (e.g., E2F6, ELK1, CTCF)
Step 5: Motif Extraction (Quantum Model Only)
Extract learned TF binding motifs from the quantum convolutional filters. This step applies only to QTFPred, as quantum filters learn interpretable sequence patterns.
cd /path/to/QTFPred_signal
# Extract motifs from trained QTFPred model
# Replace {experiment_name} with your actual experiment directory name
# Example: QTFPred_signal_HeLa-S3_E2F6_1027
bash scripts/motif/extract_motif_from_QTFPred.sh HeLa-S3 E2F6 QTFPred_signal_HeLa-S3_E2F6_1027
# Output: experiments/QTFPred_signal_HeLa-S3_E2F6_1027/motif/
# ├── motif.meme # 64 PFMs in MEME format (16 bp each)
# ├── info.log # Execution log
# └── debug.log # Debug information
Arguments:
- <cell_line> – Cell line name (e.g., HeLa-S3, K562, GM12878)
- <TF_name> – Transcription factor name (e.g., E2F6, ELK1, CTCF)
- <experiment_name> – Experiment directory name from the training step (e.g., QTFPred_signal_HeLa-S3_E2F6_1027)
What this step does (see the sketch after this list):
- Processes test dataset through trained QTFPred model
- Identifies 100 bp sub-regions with highest predicted binding signals
- Calculates activation scores from 64 quantum convolutional filters
- Extracts 16 bp sub-sequences with highest activation for each filter
- Constructs Position Frequency Matrices (PFMs) from high-scoring sequences
- Outputs 64 PFMs representing learned motif patterns
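The last two steps, turning each filter's highest-activation subsequences into a PFM, amount to per-position base counting. A minimal sketch (build_pfm is a hypothetical helper, not the repository code):

import numpy as np

def build_pfm(hits, alphabet="ACGT", width=16):
    # Count base occurrences per position across the high-activation hits
    pfm = np.zeros((len(alphabet), width), dtype=int)
    idx = {base: i for i, base in enumerate(alphabet)}
    for seq in hits:
        for pos, base in enumerate(seq[:width]):
            pfm[idx[base], pos] += 1
    return pfm

pfm = build_pfm(["ACGTACGTACGTACGT", "ACGAACGTACGTACGT"])
print(pfm.shape)  # (4, 16): one count matrix per filter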
Step 6: TomTom Analysis
Compare the extracted motifs against the JASPAR 2024 vertebrate database to identify known TF binding motifs and discover cooperative binding patterns.
cd /path/to/QTFPred_signal
# Run TomTom analysis against JASPAR database
bash scripts/motif/run_tomtom_against_JASPAR.sh HeLa-S3 E2F6 QTFPred_signal_HeLa-S3_E2F6_1027
# Output: experiments/QTFPred_signal_HeLa-S3_E2F6_1027/motif/tomtom/
# ├── tomtom.tsv # Motif matching results (q-value < 0.1)
# └── tomtom.xml # Detailed XML output
Arguments:
- <cell_line> – Cell line name (e.g., HeLa-S3, K562, GM12878)
- <TF_name> – Transcription factor name (e.g., E2F6, ELK1, CTCF)
- <experiment_name> – Experiment directory name from the training step (e.g., QTFPred_signal_HeLa-S3_E2F6_1027)
Interpreting results (tomtom.tsv; see the sketch after this list):
- Query_ID: Filter number (0-63)
- Target_ID: Matched JASPAR motif ID
- p-value: Statistical significance
- q-value: Multiple testing corrected p-value (threshold: < 0.1)
- Overlap: Number of overlapping positions
- Offset: Alignment offset
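To filter the matches programmatically, a small pandas sketch (TomTom's TSV ends with comment lines starting with '#', hence comment='#'; column names follow the fields listed above):

import pandas as pd

tsv = "experiments/QTFPred_signal_HeLa-S3_E2F6_1027/motif/tomtom/tomtom.tsv"
tt = pd.read_csv(tsv, sep="\t", comment="#")
significant = tt[tt["q-value"] < 0.1]  # same threshold as the script
print(significant[["Query_ID", "Target_ID", "q-value"]].sort_values("q-value").head())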
Tutorial Notebooks (Optional)
For users who want to:
- Understand quantum computing fundamentals and QTFPred implementation
- Apply quantum convolutional layers to custom use cases
- Interactively learn quantum circuit learning principles
We provide interactive Jupyter notebooks in the notebooks/ directory.
Running Notebooks in VS Code (Recommended)
Quick Start – 5 Steps:
1. Open VS Code and open the QTFPred_signal folder
2. Install Extensions: Python + Jupyter (by Microsoft)
3. Open Notebook: notebooks/01_quantum_computing_introduction.ipynb
4. Select Kernel: Click top-right → Choose .venv: Python 3.11.x
5. Run Cells: Press Shift + Enter to execute sequentially
The repository includes a pre-configured virtual environment (.venv/) with Python 3.11 and all required dependencies (PennyLane, PyTorch, NumPy, Matplotlib, Jupyter).
Alternative: Jupyter Lab (Command Line)
# Activate the pre-configured environment
source .venv/bin/activate
# Launch Jupyter Lab
jupyter lab
# Opens browser at http://localhost:8888
Building Your Own Virtual Environment (Advanced)
If you prefer to create your own virtual environment instead of using the pre-configured .venv:
# Create virtual environment with Python 3.11
python3.11 -m venv my_qtfpred_env
# Activate environment
source my_qtfpred_env/bin/activate # Linux/macOS
# OR
my_qtfpred_env\Scripts\activate # Windows
# Install dependencies from notebooks/requirements.txt
pip install -r notebooks/requirements.txt
# Register kernel for Jupyter
python -m ipykernel install --user --name=my_qtfpred_env
# Launch Jupyter Lab or VS Code with this environment
jupyter lab
The notebooks/requirements.txt file contains all necessary dependencies including PennyLane, PyTorch, and visualization libraries.
Tutorial Contents
Notebook 01: Quantum Computing Introduction (01_quantum_computing_introduction.ipynb)
- Bra-ket notation and quantum state vectors
- Quantum gates (Hadamard, Pauli, CNOT, rotation gates)
- Multi-qubit systems and entanglement
- Measurement and expectation values
- 4-qubit circuits (QTFPred architecture foundation)
- Parametric quantum circuits for machine learning
- PennyLane and PyTorch integration basics
Notebook 02: Quantum Convolutional Layer Tutorial (02_quantum_convolutional_layer_tutorial.ipynb)
- Part 1: Quantum circuit fundamentals with 4-qubit examples
- Part 2: QTFPred’s quantum circuit architecture (36 parameters, data re-uploading)
- Part 3-4: Single and multi-channel quantum convolution operations
- Part 5: Kernel Division Strategy for receptive field extension (16 bp)
- Part 6: Production QConv1d class usage with realistic examples (L=1001)
- Part 7: PennyLane broadcasting for efficient batch processing (100-1000× speedup; see the sketch after this list)
- Part 8: Complete QTFPred model forward pass with base-resolution output
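As a taste of Part 7, PennyLane parameter broadcasting evaluates a whole batch through one circuit call instead of a Python loop, which is where the quoted speedup comes from. A minimal sketch (illustrative, not the notebook's exact code):

import numpy as np
import pennylane as qml

dev = qml.device("default.qubit", wires=4)

@qml.qnode(dev)
def encode(x):
    # x has shape (batch, 4); each column broadcasts across the batch
    for w in range(4):
        qml.RY(x[:, w], wires=w)
    return [qml.expval(qml.PauliZ(w)) for w in range(4)]

batch = np.random.rand(128, 4)
out = encode(batch)  # one broadcast call, not 128 sequential evaluations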
Prerequisites:
- Notebook 02 assumes completion of Notebook 01
- Basic understanding of machine learning and Python
- Familiarity with PyTorch (optional but helpful)
Total Tutorial Time: ~3-4 hours for complete walkthrough
Hyperparameter Optimization (Optional)
For users who need to optimize hyperparameters for custom datasets, we provide an Optuna-based tuning workflow that automates the search for well-performing model hyperparameters.
When to Use Hyperparameter Optimization
Consider using Optuna tuning when:
- Novel TF Targets: Working with TF targets not covered in the paper’s pre-optimized configurations
- Custom Model Architectures: Developing quantum convolutional layer-based custom models
The hyperparameters included in the paper were optimized using this Optuna implementation.
What Optuna Optimizes
The optimization process searches for the best combination of:
| Hyperparameter | Type | Search Range | Description |
|---|---|---|---|
| Learning rate | Log-scale | 1e-5 to 1e-1 | AdamW optimizer learning rate |
| Weight decay | Log-scale | 1e-5 to 1e-1 | AdamW optimizer weight decay |
| Batch size | Integer | 20 to 120 | Training batch size |
| Dropout | Float | 0.1 to 0.8 | Dropout rate for regularization |
| Init method | Categorical | Xavier, Default | Weight initialization method |
| Pooling type | Categorical | max, avg | Pooling layer type |
| Decoder kernel | Categorical | 3, 5, 7 | Decoder kernel size (odd only) |
| Activation | Categorical | elu, silu, gelu | Activation function |
| Bottleneck size | Integer | 1 to 50 | Bottleneck layer output size |
| GRU dropout | Float | 0.1 to 0.8 | GRU layer dropout rate |
| Quantum kernel | Categorical | 3, 5, 7 | Quantum kernel size (n_qubits) |
Optimization Objective: Maximize Pearson correlation on test set
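In Optuna terms, the table rows map onto trial.suggest_* calls inside an objective that returns the test-set Pearson correlation. A hedged sketch; train_and_eval is a hypothetical stand-in for the repository's training loop, and only a subset of the search space is shown:

import random
import optuna

def train_and_eval(**params):
    # Hypothetical stand-in: train QTFPred with `params` and return the
    # test-set Pearson correlation. Replaced here by a random value.
    return random.random()

def objective(trial):
    params = {
        "lr": trial.suggest_float("lr", 1e-5, 1e-1, log=True),
        "weight_decay": trial.suggest_float("weight_decay", 1e-5, 1e-1, log=True),
        "batch_size": trial.suggest_int("batch_size", 20, 120),
        "dropout": trial.suggest_float("dropout", 0.1, 0.8),
        "activation": trial.suggest_categorical("activation", ["elu", "silu", "gelu"]),
        "decoder_kernel": trial.suggest_categorical("decoder_kernel", [3, 5, 7]),
    }
    return train_and_eval(**params)

study = optuna.create_study(direction="maximize")  # maximize Pearson correlation
study.optimize(objective, n_trials=10)
print(study.best_params)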
Basic Usage
cd /path/to/QTFPred_signal
# Example: Optimize hyperparameters for HeLa-S3/ELK1
bash scripts/optuna/run_optuna_QTFPred_signal.sh HeLa-S3 ELK1
# With custom settings
bash scripts/optuna/run_optuna_QTFPred_signal.sh HeLa-S3 ELK1 \
--n_trials 50 \
--max_epoch 20 \
--study_name custom_study_name
Arguments:
- <cell_line> – Cell line name (e.g., HeLa-S3, K562, GM12878)
- <TF_name> – Transcription factor name (e.g., ELK1, CTCF, E2F6)
- --n_trials – Number of optimization trials (default: 100)
- --max_epoch – Training epochs per trial (default: 30)
- --study_name – Optuna study name (default: QTFPred_{cell}_{TF})
Prerequisites:
- Training and test data must be preprocessed (Step 3: bed2signal)
- GPU recommended for reasonable optimization time
Understanding Output
After optimization completes, results are saved to:
experiments/optuna_QTFPred_signal_{cell}_{TF}_{date}/
├── optuna.log # Optuna framework logs
├── debug.log # Detailed execution logs
└── {study_name}.json # Best hyperparameters (JSON format)
Example contents of {study_name}.json:
{
"batch_size": 64,
"lr": 0.0001234,
"weight_decay": 0.00567,
"dropout": 0.35,
"init_method_name": "Xavier",
"pooling_type": "max",
"decoder_kernel": 5,
"activation": "gelu",
"bottleneck_size": 25,
"gru_dropout": 0.42,
"kernel_size": 5
}
Advanced: Parallel Optimization
One of Optuna’s powerful features is parallel optimization. Multiple processes can contribute to the same optimization study simultaneously, dramatically accelerating the search process.
How it works (see the sketch after this list):
- Multiple processes share the same SQLite database and study name
- Each process runs trials independently
- Results are synchronized through the shared database
- No manual coordination required
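Under the hood this is Optuna's standard RDB storage; conceptually, each process runs something like the following sketch (paths and names mirror the script defaults described below, not its exact code):

import optuna

# Every process that points at the same storage and study_name
# contributes trials to one shared study.
study = optuna.create_study(
    study_name="shared_study",
    storage="sqlite:///experiments/optuna_db/optuna_results.db",
    direction="maximize",
    load_if_exists=True,  # attach to the study if it already exists
)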
Example – Running 2 parallel optimization processes:
# Terminal 1: Start first optimization process
bash scripts/optuna/run_optuna_QTFPred_signal.sh HeLa-S3 ELK1 \
--study_name shared_study \
--n_trials 50
# Terminal 2: Start second process (simultaneously)
bash scripts/optuna/run_optuna_QTFPred_signal.sh HeLa-S3 ELK1 \
--study_name shared_study \
--n_trials 50
# Both processes contribute to the same study
# Total: 100 trials completed faster through parallel execution
Optuna Database Location:
- Shared database: experiments/optuna_db/optuna_results.db
- Studies persist across runs
- Resume interrupted optimizations by using the same study name
Acknowledgment
This hyperparameter optimization functionality is powered by Optuna, an open-source hyperparameter optimization framework designed for machine learning. We gratefully acknowledge the Optuna development team for providing this powerful and user-friendly optimization library.
Citation
If you use QTFPred in your research, please cite:
@article{matsubara2025qtfpred,
title={QTFPred: robust high-performance quantum machine learning modeling that predicts main and cooperative transcription factor bindings with base resolution},
author={Matsubara, Taichi and Machida, Shuto and Owusu, Samuel Papa Kwesi and Asakura, Akihiro and Hashimoto, Hiroki and Matsuoka, Masanori and Nagasaki, Masao},
journal={Briefings in Bioinformatics},
volume={26},
number={6},
pages={bbaf604},
year={2025},
publisher={Oxford University Press}
}
Contact
For questions, issues, or feedback:
First Author: Taichi Matsubara
– Division of Biomedical Information Analysis
– Medical Research Center for High Depth Omics
– Medical Institute of Bioregulation, Kyushu University
Corresponding Author: Masao Nagasaki, Ph.D.
– Division of Biomedical Information Analysis
– Medical Research Center for High Depth Omics
– Medical Institute of Bioregulation, Kyushu University
Acknowledgments
This work was supported by:
- ENCODE Project – ChIP-seq datasets for TF binding analysis
- JASPAR – TF binding motif database (JASPAR 2024)
- PennyLane – Quantum machine learning framework
- PyTorch – Deep learning infrastructure
- Optuna – Hyperparameter optimization framework
- Singularity – Container platform for reproducible environments
- MEME Suite – Motif analysis tools (TomTom, FIMO)
Last Updated: 2025-10-28
Version: 1.0.0
Repository: QTFPred_signal