Overview
QTFPred (Quantum-based Transcription Factor Predictor) is a quantum-classical hybrid deep learning framework for predicting transcription factor (TF) binding signals at base-pair resolution. By integrating quantum convolutional layers with fully convolutional neural networks (FCNs), QTFPred achieves state-of-the-art performance, particularly in data-sparse scenarios where conventional methods struggle.
Download
The QTFPred package can be downloaded here (QTFPred_signal.v6.tar.gz).
Execute the Extraction Command
Use the tar -zxvf command to decompress and extract the archive. This will create the top-level repository folder.
tar -zxvf QTFPred_signal.v6.tar.gz
Implementation Reference
QTFPred builds upon the implementation approach of FCNsignal:
- FCNsignal Paper: Base-resolution prediction of transcription factor binding signals by a deep learning framework | PLOS Computational Biology, 2022
- FCNsignal GitHub: https://github.com/turningpoint1988/FCNsignal
Key Features
- Quantum Convolutional Layer (QConv): 4-qubit parameterized quantum circuit for enhanced feature extraction (see the sketch after this list)
- Hybrid Architecture: Seamless integration of quantum and classical neural network layers
- Comprehensive Evaluation: Evaluation metrics include RMSE, Pearson correlation (PR), AUROC, and AUPRC
- Pre-optimized Hyperparameters: Hyperparameters tuned via Optuna for optimal performance
- Interpretable Motifs: Extract position frequency matrices (PFMs) from learned quantum filters
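For orientation, the sketch below shows what a 4-qubit parameterized quantum circuit looks like in PennyLane: angle encoding of four input features followed by one variational layer. This is a minimal illustration only, not QTFPred's exact QConv circuit (which uses 36 parameters and data re-uploading; see the tutorial notebooks):

import pennylane as qml
import torch

dev = qml.device("default.qubit", wires=4)

@qml.qnode(dev, interface="torch")
def qconv_patch(inputs, weights):
    # Angle-encode 4 input features, one per qubit
    for w in range(4):
        qml.RY(inputs[w], wires=w)
    # One variational layer: parameterized rotations + ring entanglement
    for w in range(4):
        qml.Rot(weights[w, 0], weights[w, 1], weights[w, 2], wires=w)
    for w in range(4):
        qml.CNOT(wires=[w, (w + 1) % 4])
    # One expectation value per qubit -> a 4-dimensional feature vector
    return [qml.expval(qml.PauliZ(w)) for w in range(4)]

x = torch.rand(4)                             # one 4-base window of encoded input
theta = torch.rand(4, 3, requires_grad=True)  # trainable circuit parameters
print(qconv_patch(x, theta))                  # four expectation values in [-1, 1]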
Competing Methods
QTFPred can be compared with the following state-of-the-art methods for TF binding prediction:
- BPNet – Base-pair resolution neural network for TF binding and chromatin accessibility prediction
- FCNsignal – Fully convolutional network for TF binding signal prediction at base resolution
System Requirements
Hardware Requirements
- GPU: NVIDIA GPU with CUDA support (H100 or A100 recommended)
- GPU Memory: 5GB or more recommended
Software Requirements
- OS: Linux (Ubuntu 20.04 or later recommended)
- CUDA: 12.0.1
- cuDNN: 8.x
- Singularity: 3.8 or higher
Installation
Prerequisites
Ensure your working directory is set to the QTFPred_signal repository root:
cd /path/to/QTFPred_signal
export QTFPRED_ROOT=$(pwd)
Quick Setup (Recommended)
The repository includes a pre-built Singularity container (singularity/test.v2.sif, 11GB) with all dependencies installed. This is the recommended approach since the MEME Suite website is currently down.
# Verify the pre-built container exists
ls -lh singularity/test.v2.sif
# Expected: ~11GB file
# Test the container
singularity exec --nv singularity/test.v2.sif python3.11 -c "
import torch
import pennylane as qml
print(f'PyTorch: {torch.__version__}')
print(f'PennyLane: {qml.__version__}')
print(f'CUDA available: {torch.cuda.is_available()}')
"
Alternative: Build Container from Definition (Optional)
If you need to rebuild the container:
# Requires sudo privileges
sudo singularity build singularity/test.v2.sif singularity/project_FCNsignal.def
Note: Building takes approximately 15-20 minutes.
Before You Start: Configuring Paths
Understanding Singularity Bind Mounts
QTFPred uses Singularity containers to ensure reproducible execution environments. To use the scripts, you need to configure bind mounts that connect your host filesystem to the container’s internal filesystem.
What is a Bind Mount?
A bind mount allows the Singularity container to access files and directories on your host system. Without proper bind configuration, the container cannot read your data or save results.
Bind Mount Syntax
BIND_PATH="HOST_PATH:CONTAINER_PATH"
- HOST_PATH (left side): Your actual filesystem path where QTFPred_signal is located
- CONTAINER_PATH (right side): Fixed internal path inside the container (/mnt/QTFPred_signal)
- The colon (:) separates the two paths
Why This Matters
When you run scripts, they execute inside the Singularity container. The container has its own isolated filesystem. By setting BIND_PATH, you tell Singularity to “mount” your host directory at a specific location inside the container, making your files accessible.
How to Configure Scripts
Each execution script contains a “User Configuration” section at the top that you must update before running:
Step 1: Locate Your QTFPred_signal Path
First, determine the absolute path to your QTFPred_signal repository:
cd QTFPred_signal
pwd
# Output example: /home/username/QTFPred_signal
Step 2: Edit Each Script
Open the script you want to run and find the “User Configuration” section (first 20-30 lines):
Example from execute_train_QTFPred_signal.sh:
# ============================================================================
# User Configuration (REQUIRED: Update these paths for your environment)
# ============================================================================
#
# Set the absolute path to your QTFPred_signal repository root directory.
# This path should point to where you cloned/extracted QTFPred_signal.
#
# Example configurations:
# PROJECT_ROOT="/home/username/QTFPred_signal"
# PROJECT_ROOT="/data/projects/QTFPred_signal"
# PROJECT_ROOT="/mnt/storage/research/QTFPred_signal"
#
PROJECT_ROOT="/path/to/QTFPred_signal" # <- CHANGE THIS LINE
# Singularity container path (relative to PROJECT_ROOT)
SINGULARITY_CONTAINER_PATH="${PROJECT_ROOT}/singularity/test.v2.sif"
# Bind path configuration (HOST:CONTAINER format)
BIND_PATH="${PROJECT_ROOT}:/mnt/QTFPred_signal" # <- This updates automatically
# Python path for container environment
SINGULARITYENV_PYTHONPATH="/mnt/QTFPred_signal/scripts:${PYTHONPATH}"
Step 3: Update PROJECT_ROOT
Replace /path/to/QTFPred_signal with your actual path:
# Before (default)
PROJECT_ROOT="/path/to/QTFPred_signal"
# After (your environment - example)
PROJECT_ROOT="/home/username/QTFPred_signal"
Step 4: Save and Verify
After editing, verify your configuration:
# Check that the container exists at the specified path
ls ${PROJECT_ROOT}/singularity/test.v2.sif
# Output: Should show the 11GB container file
The following scripts need path configuration before use:
| Script | Purpose | Configuration Required |
|---|---|---|
| execute_train_QTFPred_signal.sh | Train quantum model | ✓ |
| execute_train_FCNsignal_signal.sh | Train FCNsignal | ✓ |
| execute_train_BPNet_signal.sh | Train BPNet | ✓ |
| execute_bed2signal.sh | Data preprocessing | ✓ |
| execute_download.sh | Download ChIP-seq data | ✓ |
| extract_motif_from_QTFPred.sh | Extract motifs | ✓ |
| run_tomtom_against_JASPAR.sh | TomTom analysis | ✓ |
Directory Structure
QTFPred_signal/
├── README.md
├── data/ # Data directory
│ ├── HeLa-S3/ # Cell line directory
│ │ ├── datalist.txt # List of TFs to download
│ │ └── ELK1/ # TF directory (example data included)
│ │ ├── thresholded.bed # IDR thresholded peaks
│ │ ├── p-value.bigWig # ChIP-seq signal
│ │ └── data/ # Preprocessed data
│ │ ├── ELK1_train.npz # Training data (~180MB)
│ │ ├── ELK1_test.npz # Test data (~23MB)
│ │ └── ELK1_neg.npz # Negative data (~22MB)
│ ├── K562/ # Other cell lines
│ │ └── datalist.txt
│ ├── GM12878/
│ │ └── datalist.txt
│ ├── Genome/ # Reference genome (user downloads)
│ │ ├── hg38.fa # obtained via download_genome.sh
│ │ └── chromsize # obtained via download_genome.sh
│ └── JASPAR/ # Motif database
│ └── JASPAR2024_CORE_vertebrates_non-redundant_pfms_meme.txt
├── scripts/ # Execution scripts
│ ├── models/ # Model definitions
│ │ ├── QTFPred_signal.py # Quantum model
│ │ ├── FCNmotif.py # Classical model components
│ │ └── quantum_convolutional_layer.py # Quantum convolutional layer
│ ├── data_processing/ # Data processing
│ │ ├── download_genome.sh # Download genome reference
│ │ ├── download_encode_data.py # Download ChIP-seq data (Python)
│ │ ├── execute_download.sh # Download ChIP-seq data (Shell)
│ │ ├── bed2signal.py # Preprocessing (Python)
│ │ ├── execute_bed2signal.sh # Preprocessing (Shell)
│ │ └── datasets.py # Dataset class
│ ├── training_execution_sh/ # Training execution scripts
│ │ ├── execute_train_QTFPred_signal.sh # Train quantum model
│ │ ├── execute_train_FCNsignal_signal.sh # Train FCNsignal
│ │ └── execute_train_BPNet_signal.sh # Train BPNet
│ ├── run_model/ # Model execution
│ │ ├── run_QTFPred_signal.py # Run quantum model
│ │ ├── run_classical_signal.py # Run classical models
│ │ └── Trainer_signal.py # Trainer class
│ ├── motif/ # Motif analysis
│ │ ├── extract_motif_from_QTFPred.sh # Extract motifs (Shell)
│ │ ├── extract_motif_from_QTFPred.py # Extract motifs (Python)
│ │ └── run_tomtom_against_JASPAR.sh # TomTom analysis
│ └── utils/ # Utilities
│ ├── loss.py # Loss functions
│ └── check_npz_shapes.py # Data validation
├── singularity/ # Singularity container
│ ├── test.v2.sif # Pre-built image (11GB, recommended)
│ ├── project_FCNsignal.def # Container definition file
│ └── requirements.txt # Python dependencies
├── experiments/ # Experiment results (created after execution)
│ └── {model}_{cell}_{TF}_{date}/ # Experiment directory
│ ├── training/ # Training results
│ │ ├── model_best.pth # Best model weights
│ │ ├── record.txt # Evaluation metrics
│ │ ├── info.log # Execution log
│ │ ├── debug.log # Debug log
│ │ └── losscurve/ # Loss curves
│ │ └── LossCurve.png
│ └── motif/ # Motif analysis results
│ ├── motif.meme # Extracted motifs (MEME format)
│ ├── info.log
│ ├── debug.log
│ └── tomtom/ # TomTom analysis results
│ ├── tomtom.tsv # Matching results
│ └── tomtom.xml
├── notebooks/ # Tutorial Jupyter notebooks
│ ├── 01_quantum_computing_introduction.ipynb # Quantum computing basics
│ ├── 02_quantum_convolutional_layer_tutorial.ipynb # Quantum convolutional layer
│ ├── requirements.txt # Python dependencies for notebooks
│ └── dev/ # Development versions (archived)
├── docs/ # Documentation (empty)
└── logs/ # Logs (empty)
Notes:
- Files under data/Genome/ (hg38.fa, chromsize) must be downloaded by users via download_genome.sh
- docs/ and logs/ directories are initially empty
- The experiments/ directory is created automatically during training execution
Quick Start
This quick start demonstrates training QTFPred using the pre-included HeLa-S3/ELK1 dataset, allowing you to immediately evaluate the model without downloading additional data.
# Set working directory
cd /path/to/QTFPred_signal
# Step 1: Verify pre-existing data
ls data/HeLa-S3/ELK1/data/
# Expected output:
# ELK1_train.npz (~180MB)
# ELK1_test.npz (~23MB)
# ELK1_neg.npz (~22MB)
# Step 2: Configure script paths (REQUIRED - First time only)
# Before running training, configure PROJECT_ROOT in the script
# See "Before You Start: Configuring Paths" section above for details
nano scripts/training_execution_sh/execute_train_QTFPred_signal.sh
# Change: PROJECT_ROOT="/path/to/QTFPred_signal"
# To: PROJECT_ROOT="/your/actual/path/to/QTFPred_signal"
# Step 3: Train QTFPred (quantum model)
# Note: This step requires GPU and takes approximately 30-60 minutes
bash scripts/training_execution_sh/execute_train_QTFPred_signal.sh HeLa-S3 ELK1
# Step 4: Check training results
# Results are saved in experiments/QTFPred_signal_HeLa-S3_ELK1_{date}/training/
cat experiments/QTFPred_signal_HeLa-S3_ELK1_*/training/record.txt
Expected Output in record.txt:
Test Results:
Regression - RMSE: 0.XXX, PR: 0.7X-0.8X
Classification - AUROC: 0.8X-0.9X, AUPRC: 0.7X-0.8X
Sample Size:
Train: ~8000, Test: ~1000, Negative: ~1000
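To sanity-check the preprocessed inputs before training (the repository ships scripts/utils/check_npz_shapes.py for this purpose), a minimal equivalent, assuming only that the files are standard NumPy archives:

import numpy as np

archive = np.load("data/HeLa-S3/ELK1/data/ELK1_train.npz")
print(archive.files)  # array names are dataset-specific; inspect before assuming
for name in archive.files:
    print(name, archive[name].shape, archive[name].dtype)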
Complete Workflow
This section describes the complete workflow from raw data download to motif analysis. If you want to process your own ChIP-seq data, follow these steps sequentially.
Important: Before executing any scripts in this workflow, you must configure the PROJECT_ROOT path in each script. See the “Before You Start: Configuring Paths” section for detailed instructions. The scripts that require configuration are listed in the configuration section.
Step 1: Singularity Container Setup
Recommended: Use the pre-built container included in the repository.
Note: The pre-built container is recommended because the MEME Suite website is currently experiencing downtime, which may cause build failures.
cd /path/to/QTFPred_signal
# Verify container exists
ls -lh singularity/test.v2.sif
# Expected: 11409920000 bytes (~11GB)
# Test container functionality
singularity exec singularity/test.v2.sif python3.11 --version
singularity exec singularity/test.v2.sif meme -version
Alternative: Build the container yourself (requires sudo privileges and ~20 minutes).
sudo singularity build singularity/test.v2.sif singularity/project_FCNsignal.def
Step 2: Data Download
2a. Download Genome Reference
Download the hg38 human genome reference and chromosome size information:
cd /path/to/QTFPred_signal
# Download hg38.fa (~938MB) and chromsize
bash scripts/data_processing/download_genome.sh
# Verify downloaded files
ls -lh data/Genome/
# Expected output:
# hg38.fa (~3GB uncompressed)
# chromsize (~3KB)
2b. Download ChIP-seq Data
Download ChIP-seq datasets (peak files and signal tracks) for specific cell lines. The repository includes datalist.txt files for three cell lines:
- HeLa-S3: 12 TFs (CTCF, E2F1, E2F6, ELK1, ELK4, JUND, MAFF, MAX, MAZ, REST, RFX5, TBP)
- K562: Multiple TFs (see data/K562/datalist.txt)
- GM12878: Multiple TFs (see data/GM12878/datalist.txt)
# Example: Download all ChIP-seq data for HeLa-S3
bash scripts/data_processing/execute_download.sh HeLa-S3
# Output structure:
# data/HeLa-S3/{TF}/
# ├── thresholded.bed # IDR thresholded peaks
# └── p-value.bigWig # ChIP-seq signal track
Arguments:
- <cell_line> – Cell line name (HeLa-S3, K562, or GM12878)
- [--force] – Force re-download of existing files (optional)
- [--verbose] – Enable verbose logging (optional)
- [--dry_run] – Test URLs without downloading (optional)
Note: You can download specific cell lines only. For example, to start with HeLa-S3:
bash scripts/data_processing/execute_download.sh HeLa-S3
Step 3: Data Preprocessing (bed2signal)
Convert BED peak files and BigWig signal files into NPZ format suitable for model training.
cd /path/to/QTFPred_signal
# Example: Preprocess E2F6 data for HeLa-S3
bash scripts/data_processing/execute_bed2signal.sh HeLa-S3 E2F6
# Output: data/HeLa-S3/E2F6/data/
# ├── E2F6_train.npz
# ├── E2F6_test.npz
# └── E2F6_neg.npz
Arguments:
- <cell_line> – Cell line name (e.g., HeLa-S3, K562, GM12878)
- <TF_name> – Transcription factor name (e.g., E2F6, ELK1, CTCF)
Processing steps:
- Extract 1,000 bp sequences centered on each peak position
- Apply random position shifts (-100 to 100 bp) for augmentation
- Filter samples with signal values in bottom 5%
- Generate negative samples from 3,000 bp upstream regions
- Normalize signal values: log10(1 + signal) (see the sketch after this list)
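The two numeric steps above, shift augmentation and log-scaling, reduce to a few lines; a minimal sketch with assumed shapes, not the repository's bed2signal.py code:

import numpy as np

rng = np.random.default_rng(0)

peak_center = 50_000
shift = rng.integers(-100, 101)      # random shift in [-100, 100] bp
start = peak_center + shift - 500    # 1,000 bp window around the shifted center
end = start + 1_000

signal = rng.random(1_000) * 50      # stand-in for BigWig signal values
normalized = np.log10(1.0 + signal)  # log10(1 + signal) normalization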
Processing time: 5-15 minutes per TF depending on peak count.
Note: For quick testing with pre-processed data, see the Quick Start section which uses HeLa-S3/ELK1 with included NPZ files.
Step 4: Model Training
Tip for Large-Scale Experiments: For training across multiple TFs and cell lines, we recommend using a job management system such as SLURM. The provided scripts are compatible with SLURM array jobs for efficient parallel processing.
Train models to predict TF binding signals from DNA sequences. QTFPred supports both quantum and classical models.
4a. Quantum Model (QTFPred)
Train the quantum-enhanced model with 4-qubit quantum convolutional layers:
cd /path/to/QTFPred_signal
# Train QTFPred for HeLa-S3/E2F6
bash scripts/training_execution_sh/execute_train_QTFPred_signal.sh HeLa-S3 E2F6
# Output directory structure:
# experiments/QTFPred_signal_HeLa-S3_E2F6_{date}/training/
# ├── model_best.pth # Best model weights (saved at lowest validation loss)
# ├── record.txt # Evaluation metrics (RMSE, PR, AUROC, AUPRC)
# ├── info.log # Training progress log
# ├── debug.log # Detailed debug information
# └── losscurve/
# └── LossCurve.png # Training/validation loss curves
Arguments:
- <cell_line> – Cell line name (e.g., HeLa-S3, K562, GM12878)
- <TF_name> – Transcription factor name (e.g., E2F6, ELK1, CTCF)
Evaluation metrics in record.txt (see the sketch after this list):
- RMSE: Root Mean Square Error (regression task)
- PR: Pearson Correlation (regression task)
- AUROC: Area Under ROC Curve (classification task)
- AUPRC: Area Under Precision-Recall Curve (classification task)
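For reference, all four metrics map onto standard SciPy/scikit-learn calls. A hedged sketch with toy values (the repository's evaluation code may differ in detail):

import numpy as np
from scipy.stats import pearsonr
from sklearn.metrics import roc_auc_score, average_precision_score

y_true = np.array([0.2, 1.3, 0.8, 2.1])          # observed signal values
y_pred = np.array([0.3, 1.1, 0.9, 1.8])          # predicted signal values
rmse = np.sqrt(np.mean((y_true - y_pred) ** 2))  # regression: RMSE
pr, _ = pearsonr(y_true, y_pred)                 # regression: Pearson correlation

labels = np.array([0, 1, 1, 1])                  # binding (1) vs. negative (0)
auroc = roc_auc_score(labels, y_pred)            # classification: AUROC
auprc = average_precision_score(labels, y_pred)  # classification: AUPRC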
4b. Classical Models (for comparison)
Train baseline classical models for performance comparison:
FCNsignal:
bash scripts/training_execution_sh/execute_train_FCNsignal_signal.sh HeLa-S3 E2F6
# Output: experiments/FCNsignal_HeLa-S3_E2F6_{date}/training/
Arguments:
- <cell_line> – Cell line name (e.g., HeLa-S3, K562, GM12878)
- <TF_name> – Transcription factor name (e.g., E2F6, ELK1, CTCF)
BPNet:
bash scripts/training_execution_sh/execute_train_BPNet_signal.sh HeLa-S3 E2F6
# Output: experiments/BPNet_HeLa-S3_E2F6_{date}/training/
Arguments:
- <cell_line> – Cell line name (e.g., HeLa-S3, K562, GM12878)
- <TF_name> – Transcription factor name (e.g., E2F6, ELK1, CTCF)
Step 5: Motif Extraction (Quantum Model Only)
Extract learned TF binding motifs from the quantum convolutional filters. This step applies only to QTFPred, as quantum filters learn interpretable sequence patterns.
cd /path/to/QTFPred_signal
# Extract motifs from trained QTFPred model
# Replace {experiment_name} with your actual experiment directory name
# Example: QTFPred_signal_HeLa-S3_E2F6_1027
bash scripts/motif/extract_motif_from_QTFPred.sh HeLa-S3 E2F6 QTFPred_signal_HeLa-S3_E2F6_1027
# Output: experiments/QTFPred_signal_HeLa-S3_E2F6_1027/motif/
# ├── motif.meme # 64 PFMs in MEME format (16 bp each)
# ├── info.log # Execution log
# └── debug.log # Debug information
Arguments:
- <cell_line> – Cell line name (e.g., HeLa-S3, K562, GM12878)
- <TF_name> – Transcription factor name (e.g., E2F6, ELK1, CTCF)
- <experiment_name> – Experiment directory name from the training step (e.g., QTFPred_signal_HeLa-S3_E2F6_1027)
What this step does (see the sketch after this list):
- Processes test dataset through trained QTFPred model
- Identifies 100 bp sub-regions with highest predicted binding signals
- Calculates activation scores from 64 quantum convolutional filters
- Extracts 16 bp sub-sequences with highest activation for each filter
- Constructs Position Frequency Matrices (PFMs) from high-scoring sequences
- Outputs 64 PFMs representing learned motif patterns
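The last two steps, turning each filter's highest-activation subsequences into a PFM, amount to per-position base counting. A minimal sketch (build_pfm is a hypothetical helper, not the repository code):

import numpy as np

def build_pfm(hits, alphabet="ACGT", width=16):
    # Count base occurrences per position across the high-activation hits
    pfm = np.zeros((len(alphabet), width), dtype=int)
    idx = {base: i for i, base in enumerate(alphabet)}
    for seq in hits:
        for pos, base in enumerate(seq[:width]):
            pfm[idx[base], pos] += 1
    return pfm

pfm = build_pfm(["ACGTACGTACGTACGT", "ACGAACGTACGTACGT"])
print(pfm.shape)  # (4, 16): one count matrix per filter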
Step 6: TomTom Analysis
Compare the extracted motifs against the JASPAR 2024 vertebrate database to identify known TF binding motifs and discover cooperative binding patterns.
cd /path/to/QTFPred_signal
# Run TomTom analysis against JASPAR database
bash scripts/motif/run_tomtom_against_JASPAR.sh HeLa-S3 E2F6 QTFPred_signal_HeLa-S3_E2F6_1027
# Output: experiments/QTFPred_signal_HeLa-S3_E2F6_1027/motif/tomtom/
# ├── tomtom.tsv # Motif matching results (q-value < 0.1)
# └── tomtom.xml # Detailed XML output
Arguments:
- <cell_line> – Cell line name (e.g., HeLa-S3, K562, GM12878)
- <TF_name> – Transcription factor name (e.g., E2F6, ELK1, CTCF)
- <experiment_name> – Experiment directory name from the training step (e.g., QTFPred_signal_HeLa-S3_E2F6_1027)
Interpreting results (tomtom.tsv; see the sketch after this list):
- Query_ID: Filter number (0-63)
- Target_ID: Matched JASPAR motif ID
- p-value: Statistical significance
- q-value: Multiple testing corrected p-value (threshold: < 0.1)
- Overlap: Number of overlapping positions
- Offset: Alignment offset
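To filter the matches programmatically, a small pandas sketch (TomTom's TSV ends with comment lines starting with '#', hence comment='#'; column names follow the fields listed above):

import pandas as pd

tsv = "experiments/QTFPred_signal_HeLa-S3_E2F6_1027/motif/tomtom/tomtom.tsv"
tt = pd.read_csv(tsv, sep="\t", comment="#")
significant = tt[tt["q-value"] < 0.1]  # same threshold as the script
print(significant[["Query_ID", "Target_ID", "q-value"]].sort_values("q-value").head())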
Tutorial Notebooks (Optional)
For users who want to:
- Understand quantum computing fundamentals and QTFPred implementation
- Apply quantum convolutional layers to custom use cases
- Interactively learn quantum circuit learning principles
We provide interactive Jupyter notebooks in the notebooks/ directory.
Running Notebooks in VS Code (Recommended)
Quick Start – 5 Steps:
1. Open VS Code and open the QTFPred_signal folder
2. Install Extensions: Python + Jupyter (by Microsoft)
3. Open Notebook: notebooks/01_quantum_computing_introduction.ipynb
4. Select Kernel: Click top-right → Choose .venv: Python 3.11.x
5. Run Cells: Press Shift + Enter to execute sequentially
The repository includes a pre-configured virtual environment (.venv/) with Python 3.11 and all required dependencies (PennyLane, PyTorch, NumPy, Matplotlib, Jupyter).
Alternative: Jupyter Lab (Command Line)
# Activate the pre-configured environment
source .venv/bin/activate
# Launch Jupyter Lab
jupyter lab
# Opens browser at http://localhost:8888
Building Your Own Virtual Environment (Advanced)
If you prefer to create your own virtual environment instead of using the pre-configured .venv:
# Create virtual environment with Python 3.11
python3.11 -m venv my_qtfpred_env
# Activate environment
source my_qtfpred_env/bin/activate # Linux/macOS
# OR
my_qtfpred_env\Scripts\activate # Windows
# Install dependencies from notebooks/requirements.txt
pip install -r notebooks/requirements.txt
# Register kernel for Jupyter
python -m ipykernel install --user --name=my_qtfpred_env
# Launch Jupyter Lab or VS Code with this environment
jupyter lab
The notebooks/requirements.txt file contains all necessary dependencies including PennyLane, PyTorch, and visualization libraries.
Tutorial Contents
Notebook 01: Quantum Computing Introduction (01_quantum_computing_introduction.ipynb)
- Bra-ket notation and quantum state vectors
- Quantum gates (Hadamard, Pauli, CNOT, rotation gates)
- Multi-qubit systems and entanglement
- Measurement and expectation values
- 4-qubit circuits (QTFPred architecture foundation)
- Parametric quantum circuits for machine learning
- PennyLane and PyTorch integration basics
Notebook 02: Quantum Convolutional Layer Tutorial (02_quantum_convolutional_layer_tutorial.ipynb)
- Part 1: Quantum circuit fundamentals with 4-qubit examples
- Part 2: QTFPred’s quantum circuit architecture (36 parameters, data re-uploading)
- Part 3-4: Single and multi-channel quantum convolution operations
- Part 5: Kernel Division Strategy for receptive field extension (16 bp)
- Part 6: Production QConv1d class usage with realistic examples (L=1001)
- Part 7: PennyLane broadcasting for efficient batch processing (100-1000× speedup; see the sketch after this list)
- Part 8: Complete QTFPred model forward pass with base-resolution output
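As a taste of Part 7, PennyLane parameter broadcasting evaluates a whole batch through one circuit call instead of a Python loop, which is where the quoted speedup comes from. A minimal sketch (illustrative, not the notebook's exact code):

import numpy as np
import pennylane as qml

dev = qml.device("default.qubit", wires=4)

@qml.qnode(dev)
def encode(x):
    # x has shape (batch, 4); each column broadcasts across the batch
    for w in range(4):
        qml.RY(x[:, w], wires=w)
    return [qml.expval(qml.PauliZ(w)) for w in range(4)]

batch = np.random.rand(128, 4)
out = encode(batch)  # one broadcast call, not 128 sequential evaluations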
Prerequisites:
- Notebook 02 assumes completion of Notebook 01
- Basic understanding of machine learning and Python
- Familiarity with PyTorch (optional but helpful)
Total Tutorial Time: ~3-4 hours for complete walkthrough
Hyperparameter Optimization (Optional)
For users who need to optimize hyperparameters for custom datasets, we provide an Optuna-based tuning workflow that automates the search for well-performing model hyperparameters.
When to Use Hyperparameter Optimization
Consider using Optuna tuning when:
- Novel TF Targets: Working with TF targets not covered in the paper’s pre-optimized configurations
- Custom Model Architectures: Developing quantum convolutional layer-based custom models
The hyperparameters included in the paper were optimized using this Optuna implementation.
What Optuna Optimizes
The optimization process searches for the best combination of:
| Hyperparameter | Type | Search Range | Description |
|---|---|---|---|
| Learning rate | Log-scale | 1e-5 to 1e-1 | AdamW optimizer learning rate |
| Weight decay | Log-scale | 1e-5 to 1e-1 | AdamW optimizer weight decay |
| Batch size | Integer | 20 to 120 | Training batch size |
| Dropout | Float | 0.1 to 0.8 | Dropout rate for regularization |
| Init method | Categorical | Xavier, Default | Weight initialization method |
| Pooling type | Categorical | max, avg | Pooling layer type |
| Decoder kernel | Categorical | 3, 5, 7 | Decoder kernel size (odd only) |
| Activation | Categorical | elu, silu, gelu | Activation function |
| Bottleneck size | Integer | 1 to 50 | Bottleneck layer output size |
| GRU dropout | Float | 0.1 to 0.8 | GRU layer dropout rate |
| Quantum kernel | Categorical | 3, 5, 7 | Quantum kernel size (n_qubits) |
Optimization Objective: Maximize Pearson correlation on test set
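In Optuna terms, the table rows map onto trial.suggest_* calls inside an objective that returns the test-set Pearson correlation. A hedged sketch; train_and_eval is a hypothetical stand-in for the repository's training loop, and only a subset of the search space is shown:

import random
import optuna

def train_and_eval(**params):
    # Hypothetical stand-in: train QTFPred with `params` and return the
    # test-set Pearson correlation. Replaced here by a random value.
    return random.random()

def objective(trial):
    params = {
        "lr": trial.suggest_float("lr", 1e-5, 1e-1, log=True),
        "weight_decay": trial.suggest_float("weight_decay", 1e-5, 1e-1, log=True),
        "batch_size": trial.suggest_int("batch_size", 20, 120),
        "dropout": trial.suggest_float("dropout", 0.1, 0.8),
        "activation": trial.suggest_categorical("activation", ["elu", "silu", "gelu"]),
        "decoder_kernel": trial.suggest_categorical("decoder_kernel", [3, 5, 7]),
    }
    return train_and_eval(**params)

study = optuna.create_study(direction="maximize")  # maximize Pearson correlation
study.optimize(objective, n_trials=10)
print(study.best_params)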
Basic Usage
cd /path/to/QTFPred_signal
# Example: Optimize hyperparameters for HeLa-S3/ELK1
bash scripts/optuna/run_optuna_QTFPred_signal.sh HeLa-S3 ELK1
# With custom settings
bash scripts/optuna/run_optuna_QTFPred_signal.sh HeLa-S3 ELK1 \
--n_trials 50 \
--max_epoch 20 \
--study_name custom_study_name
Arguments:
- <cell_line> – Cell line name (e.g., HeLa-S3, K562, GM12878)
- <TF_name> – Transcription factor name (e.g., ELK1, CTCF, E2F6)
- --n_trials – Number of optimization trials (default: 100)
- --max_epoch – Training epochs per trial (default: 30)
- --study_name – Optuna study name (default: QTFPred_{cell}_{TF})
Prerequisites:
- Training and test data must be preprocessed (Step 3: bed2signal)
- GPU recommended for reasonable optimization time
Understanding Output
After optimization completes, results are saved to:
experiments/optuna_QTFPred_signal_{cell}_{TF}_{date}/
├── optuna.log # Optuna framework logs
├── debug.log # Detailed execution logs
└── {study_name}.json # Best hyperparameters (JSON format)
Example contents of {study_name}.json:
{
"batch_size": 64,
"lr": 0.0001234,
"weight_decay": 0.00567,
"dropout": 0.35,
"init_method_name": "Xavier",
"pooling_type": "max",
"decoder_kernel": 5,
"activation": "gelu",
"bottleneck_size": 25,
"gru_dropout": 0.42,
"kernel_size": 5
}
Advanced: Parallel Optimization
One of Optuna’s powerful features is parallel optimization. Multiple processes can contribute to the same optimization study simultaneously, dramatically accelerating the search process.
How it works (see the sketch after this list):
- Multiple processes share the same SQLite database and study name
- Each process runs trials independently
- Results are synchronized through the shared database
- No manual coordination required
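Under the hood this is Optuna's standard RDB storage; conceptually, each process runs something like the following sketch (paths and names mirror the script defaults described below, not its exact code):

import optuna

# Every process that points at the same storage and study_name
# contributes trials to one shared study.
study = optuna.create_study(
    study_name="shared_study",
    storage="sqlite:///experiments/optuna_db/optuna_results.db",
    direction="maximize",
    load_if_exists=True,  # attach to the study if it already exists
)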
Example – Running 2 parallel optimization processes:
# Terminal 1: Start first optimization process
bash scripts/optuna/run_optuna_QTFPred_signal.sh HeLa-S3 ELK1 \
--study_name shared_study \
--n_trials 50
# Terminal 2: Start second process (simultaneously)
bash scripts/optuna/run_optuna_QTFPred_signal.sh HeLa-S3 ELK1 \
--study_name shared_study \
--n_trials 50
# Both processes contribute to the same study
# Total: 100 trials completed faster through parallel execution
Optuna Database Location:
- Shared database: experiments/optuna_db/optuna_results.db
- Studies persist across runs
- Resume interrupted optimizations by using the same study name
Acknowledgment
This hyperparameter optimization functionality is powered by Optuna, an open-source hyperparameter optimization framework designed for machine learning. We gratefully acknowledge the Optuna development team for providing this powerful and user-friendly optimization library.
Citation
If you use QTFPred in your research, please cite:
@article{matsubara2025qtfpred,
title={QTFPred: robust high-performance quantum machine learning modeling that predicts main and cooperative transcription factor bindings with base resolution},
author={Matsubara, Taichi and Machida, Shuto and Owusu, Samuel Papa Kwesi and Asakura, Akihiro and Hashimoto, Hiroki and Matsuoka, Masanori and Nagasaki, Masao},
journal={Briefings in Bioinformatics},
volume={26},
number={6},
pages={bbaf604},
year={2025},
publisher={Oxford University Press}
}
Contact
For questions, issues, or feedback:
First Author: Taichi Matsubara
– Division of Biomedical Information Analysis
– Medical Research Center for High Depth Omics
– Medical Institute of Bioregulation, Kyushu University
Corresponding Author: Masao Nagasaki, Ph.D.
– Division of Biomedical Information Analysis
– Medical Research Center for High Depth Omics
– Medical Institute of Bioregulation, Kyushu University
Acknowledgments
This work was supported by:
- ENCODE Project – ChIP-seq datasets for TF binding analysis
- JASPAR – TF binding motif database (JASPAR 2024)
- PennyLane – Quantum machine learning framework
- PyTorch – Deep learning infrastructure
- Optuna – Hyperparameter optimization framework
- Singularity – Container platform for reproducible environments
- MEME Suite – Motif analysis tools (TomTom, FIMO)
Last Updated: 2025-10-28
Version: 1.0.0
Repository: QTFPred_signal