Install SWARM

Download the code and models

Clone the repository from GitHub (install Git LFS first so that the large .h5 model files are downloaded):

git lfs install
git clone https://github.com/comprna/SWARM/ && cd SWARM 

If Git LFS cannot be installed, clone the repository and download the models from Dropbox instead:

git clone https://github.com/comprna/SWARM/ && cd SWARM 
rm -rf SWARM_models
wget 'https://www.dropbox.com/scl/fi/wghpvv9plhr4mbpwkuqjd/SWARM_models.tar.gz?rlkey=i1z1do97wbgn0stoaakh117qy&st=ih3xs5fa&dl=0' -O SWARM_models.tgz
tar -xzf SWARM_models.tgz && rm -f SWARM_models.tgz
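With either download path, a common failure mode is ending up with Git LFS pointer stubs (small text files) instead of the real .h5 models. The sketch below, written for this guide and not part of SWARM, flags any pointer stubs so you know to re-run git lfs pull:

```shell
# check_h5 is a helper for this guide: an LFS pointer stub is a tiny text
# file beginning with "version https://git-lfs...", while a real model is binary.
check_h5() {
  if head -c 7 "$1" 2>/dev/null | grep -q '^version'; then
    echo "POINTER: $1 (stub only; run 'git lfs pull' to fetch the real file)"
  else
    echo "OK: $1"
  fi
}

# Check every model file in the tree shown below.
for f in SWARM_models/Model*/RNA00*/*/Model_100_epoch_relu.h5; do
  [ -e "$f" ] || continue
  check_h5 "$f"
done
```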

Compile SWARM preprocessing

cd SWARM_scripts/preprocess/
#build and compile htslib, slow5tools, SWARM_preprocess
bash build.sh
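After build.sh finishes, it is worth confirming that the compiled binaries exist before moving on. The binary names below are assumptions inferred from the .cpp sources in this directory (SWARM_preprocess.cpp, SWARM_preprocess_targets.cpp, check_RNA_kit.cpp); adjust them to match what build.sh actually produces:

```shell
# check_built is a helper for this guide: report whether an expected
# executable was produced by the build.
check_built() {
  if [ -x "$1" ]; then
    echo "BUILT: $1"
  else
    echo "MISSING: $1 (re-run build.sh and check its output for errors)"
  fi
}

# Assumed binary names; verify against the actual build.sh output.
for bin in SWARM_preprocess SWARM_preprocess_targets check_RNA_kit; do
  check_built "./$bin"
done
```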

Dependencies

SWARM supports GPU inference with TensorFlow; it has been tested with versions 2.8.0 and 2.15.0.

GPU-enabled TensorFlow should already be available on most HPC systems. Otherwise, install TensorFlow with GPU support by following https://www.tensorflow.org/install/

Python requirements:

python==3.11.7
tensorflow==2.15.0
numpy==1.26.2
pandas==2.2.0
scikit-learn==1.4.0
pysam==0.22.1
scipy==1.14.1
statsmodels==0.14.4

Example for setting up the SWARM environment with conda (note that TensorFlow is not included in the conda command below; install it separately via pip or an HPC module, as described above):

conda create -n SWARM python==3.11.7 numpy==1.26.2 pandas==2.2.0 scikit-learn==1.4.0 pysam==0.22.1 scipy==1.14.1 statsmodels==0.14.4
conda activate SWARM
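Once the environment is activated, a quick way to confirm the pinned versions were picked up is to compare installed package versions against the list above. check_env is a small helper written for this guide, not part of SWARM:

```python
# Sanity-check the activated SWARM environment against the pinned versions
# listed in the requirements above.
from importlib.metadata import PackageNotFoundError, version


def check_env(pinned):
    """Return {package: 'OK' | 'MISSING' | 'MISMATCH (found X)'} for each pin."""
    results = {}
    for pkg, want in pinned.items():
        try:
            got = version(pkg)
            results[pkg] = "OK" if got == want else f"MISMATCH (found {got})"
        except PackageNotFoundError:
            results[pkg] = "MISSING"
    return results


pinned = {
    "tensorflow": "2.15.0",
    "numpy": "1.26.2",
    "pandas": "2.2.0",
    "scikit-learn": "1.4.0",
    "pysam": "0.22.1",
    "scipy": "1.14.1",
    "statsmodels": "0.14.4",
}

for pkg, status in check_env(pinned).items():
    print(f"{pkg:>13}: {status}")
```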

File tree

└── SWARM
    ├── README.md
    ├── SWARM_models
    │   ├── kmer_model
    │   │   ├── model_5-mer.RNA002.csv
    │   │   └── model_5-mer.RNA004.csv
    │   ├── Model1
    │   │   ├── RNA002
    │   │   │   ├── m5C
    │   │   │   │   └── Model_100_epoch_relu.h5
    │   │   │   ├── m6A
    │   │   │   │   └── Model_100_epoch_relu.h5
    │   │   │   └── pU
    │   │   │       └── Model_100_epoch_relu.h5
    │   │   └── RNA004
    │   │       ├── m5C
    │   │       │   └── Model_100_epoch_relu.h5
    │   │       ├── m6A
    │   │       │   └── Model_100_epoch_relu.h5
    │   │       └── pU
    │   │           └── Model_100_epoch_relu.h5
    │   └── Model2
    │       ├── RNA002
    │       │   ├── m5C
    │       │   │   └── Model_100_epoch_relu.h5
    │       │   ├── m6A
    │       │   │   └── Model_100_epoch_relu.h5
    │       │   └── pU
    │       │       └── Model_100_epoch_relu.h5
    │       └── RNA004
    │           ├── m5C
    │           │   └── Model_100_epoch_relu.h5
    │           ├── m6A
    │           │   └── Model_100_epoch_relu.h5
    │           └── pU
    │               └── Model_100_epoch_relu.h5
    └── SWARM_scripts
        ├── predict
        │   ├── DL_models.py
        │   ├── network_21122023.py
        │   ├── network_2132024.py
        │   ├── network_27082022.py
        │   ├── predict_model1_from_pickle.py
        │   ├── predict_model1_parallel_modbam.py
        │   └── predict_model1_parallel.py
        ├── preprocess
        │   ├── argagg.hpp
        │   ├── build.sh
        │   ├── check_RNA_kit.cpp
        │   ├── Makefile
        │   ├── split_bams.py
        │   ├── SWARM_preprocess.cpp
        │   ├── SWARM_preprocess.py
        │   ├── SWARM_preprocess_target_9mers.cpp
        │   └── SWARM_preprocess_targets.cpp
        ├── process_modbam.py
        ├── SWARM_diff.py
        ├── SWARM_read_level.py
        ├── SWARM_site_level.py
        └── train_models
            ├── assemble_data.py
            ├── network_27082022.py
            ├── split_training_by_9mers.py
            ├── train_model1.py
            └── trim_tsv_events.py