AnswerALS Data Portal

This data portal is designed for scientists to explore data from the most comprehensive studies of ALS.
It is organized as follows:

  • The Home page (this page) provides a description of the entities involved in the studies.
    Please take the time to read this page in its entirety to understand the data contained here.
  • The Search page provides a means to browse the available data, filter by various criteria, and download data or order iPSC lines.
  • The Analyze page provides interactive visualizations to interrogate the data at a high level.

This data portal is written with scientists in mind. For a general introduction to our effort, visit the AnswerALS home page.

The development of this portal is generously supported by AnswerALS partners and donors.


Consortia

NeuroLINCS is an NIH-funded collaborative effort between research groups with expertise in iPSC technology, disease modeling, OMICS methods, and computational biology. We seek to understand the causes of neurological diseases and to develop new therapies.
— more at neurolincs.org

The NeuroLINCS consortium ran a pilot study of 20 iPSC lines, focusing on C9orf72 and other known ALS signatures. Its experiments tested motor-neuron differentiation protocols and laid the groundwork for the main AnswerALS study.

Our mission is to build the most comprehensive clinical, genetic, molecular & biochemical assessment of ALS, while openly sharing the results with the global research community.
— more at answerals.org

The AnswerALS consortium is an ongoing effort to collect comprehensive datasets characterizing the biology of 1000 ALS patients. Find the status of our progress here.


Patients

AnswerALS patients were recruited at clinics across the United States by the AnswerALS team of clinicians & scientists.

Clinical data for patients enrolled by AnswerALS were collected via NeuroBANK, which provides uniform and comprehensive "deep phenotyping" for each patient. These records are de-identified and pre-processed by the NeuroBANK team, then further trimmed for inclusion in this portal. The clinical information for NeuroLINCS patients is less comprehensive.

All patients are identified by an AnswerALS ID (prefixed AALS).


Samples

iPSCs

The iPSC team at Cedars-Sinai generates Induced Pluripotent Stem Cells (iPSCs) for each patient recruited by either consortium.
Our iPSC technology is broadly described here and further details can be found here. Each iPSC line is identified by a Cedars ID (prefixed CS) which is associated with a patient NeuroGUID.

Whole Genomes

The New York Genome Center's Center for Genomics of Neurodegenerative Disease (NYGC-CGND) performs whole-genome sequencing for all AnswerALS patients. These datasets are designated by a CGND ID (prefixed CGND) which is associated with a patient NeuroGUID.
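The three identifier prefixes mentioned so far (AALS for patients, CS for iPSC lines, CGND for whole genomes) lend themselves to a simple lookup. The following is a minimal sketch, assuming only that each identifier begins with its prefix; the function name and the example identifiers are hypothetical, and the exact ID formats used by the portal are not specified here.

```python
# Prefixes are taken from the text above; everything after the prefix
# in the example identifiers is made up for illustration.
ID_PREFIXES = {
    "AALS": "patient (AnswerALS ID)",
    "CS": "iPSC line (Cedars ID)",
    "CGND": "whole genome (CGND ID)",
}


def classify_id(identifier: str) -> str:
    """Return the kind of entity an identifier refers to, based on its prefix."""
    # Try longer prefixes first so a short prefix can never shadow a longer one.
    for prefix in sorted(ID_PREFIXES, key=len, reverse=True):
        if identifier.startswith(prefix):
            return ID_PREFIXES[prefix]
    return "unknown"


if __name__ == "__main__":
    for example in ("AALS-EXAMPLE", "CS-EXAMPLE", "CGND-EXAMPLE", "XYZ"):
        print(example, "->", classify_id(example))
```

A lookup like this can help when merging downloaded tables in which patient, cell-line, and genome identifiers appear in the same column.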


Experiments

Patient-derived iPSCs have been used in 6 sequential experiments, designed to simultaneously explore ALS biology and motor-neuron differentiation protocols.

Experiments 1 through 3, described below, were conducted as part of NeuroLINCS to lay the groundwork for the AnswerALS study. The NeuroLINCS experiments profiled 20 subjects: 4 C9orf72-ALS, 4 SOD1-ALS, 4 Sporadic-ALS, 4 SMA and 4 control subjects. Whole Genome Sequencing (WGS) and iPSC generation were conducted for each subject.

This experiment's purpose (Exp 1: "iPSC") was to establish a molecular baseline for the cell lines used to generate motor neurons, as a first step towards using iPSC-derived neurons for mechanistic studies of disease.

This experiment's purpose was to explore differences

In this experiment (Exp 2: "iMN"), a two-step differentiation process of expansion into motor neurons was used. iPSC lines were first cultured to create intermediate progenitor populations termed iMPS (iPSC-derived Motor Neuron Precursor Spheres). The iMPS were then differentiated into motor neurons termed iMNs (iMPS-derived Motor Neurons). Details about the protocol can be found here.

The purpose of this experiment was to comprehensively characterize a small group of SMA, C9orf72, and control iMNs, in search of the molecular signatures of disease, when the causal mutation is known.

In this experiment (Exp 3: "diMN18"), motor neurons were derived from iPSCs using a shorter, “direct” differentiation protocol. This protocol produced a higher proportion of motor neurons than the indirect iMN method. Motor neurons generated using this direct differentiation are designated as diMNs. Details about the protocol can be found here.

The purpose of this experiment was to contrast the earliest molecular signatures of C9-ALS, SOD1-ALS, and Sporadic ALS versus control.


Assays

Several 'omics assays are performed to characterize each patient's biology. Our multi-omic approach is described here.

Whole Genome Sequencing (WGS) is performed by the New York Genome Center's Center for Genomics of Neurodegenerative Disease (NYGC-CGND).

Sequencing is performed at NYGC on Illumina HiSeq X Ten instruments, mostly PCR-free, mostly from whole blood samples. Reads are aligned to hg38 via BWA-MEM, and variants are called via GATK 3.5 best practices. Details about the WGS workflow are available here.

ATAC-Seq is performed to assess the chromatin accessibility of the iPSC-derived motor neurons at the Fraenkel Lab at MIT.

Sequencing is performed at MIT on Illumina NextSeq instruments. Reads are aligned to hg38 via Bowtie2, and peaks are called with MACS2. The ATAC-Seq bioinformatic workflow is available in our docker container. Details about the ATAC-Seq workflow are available here.

Bulk RNA-Seq is performed to assess the transcriptional profiles of the iPSC-derived motor neurons at the Thompson Lab at UCI.

Sequencing is performed at UCI on Illumina HiSeq 2500 / 4000 instruments. Reads are aligned to hg38 via HISAT2, and reads falling within genes are counted by featureCounts. The RNA-Seq bioinformatic workflow is available in our docker container. Details about the RNA-Seq workflow are available here.
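As a rough illustration of the align-then-count steps just described, the sketch below assembles HISAT2 and featureCounts command lines in Python without running them. It does not reproduce the consortium's actual workflow, which is the one distributed in the docker container. The function name, sample name, file paths, index location, annotation file, and the assumption of paired-end reads are all hypothetical.

```python
import shlex


def rnaseq_commands(sample, fastq_r1, fastq_r2, hisat2_index, annotation_gtf):
    """Build (but do not run) an alignment command and a counting command.

    Assumes paired-end reads; all paths are placeholders supplied by the caller.
    """
    sam_path = f"{sample}.sam"
    counts_path = f"{sample}.counts.txt"

    # HISAT2: -x index basename, -1/-2 paired FASTQ files, -S SAM output.
    align = ["hisat2", "-x", hisat2_index,
             "-1", fastq_r1, "-2", fastq_r2,
             "-S", sam_path]

    # featureCounts: -p marks paired-end input, -a annotation, -o output table.
    count = ["featureCounts", "-p",
             "-a", annotation_gtf,
             "-o", counts_path,
             sam_path]

    return [align, count]


if __name__ == "__main__":
    # Hypothetical inputs, for illustration only.
    for cmd in rnaseq_commands("sample1", "sample1_R1.fastq", "sample1_R2.fastq",
                               "hg38_index", "genes.gtf"):
        print(shlex.join(cmd))
```

The sketch omits steps such as quality control and SAM-to-BAM conversion; consult the linked workflow details for the parameters actually used.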

Data-Independent Acquisition (DIA) Mass Spectrometry (in particular, SWATH-MS) is performed to assess the proteomes of the iPSC-derived motor neurons at the Van Eyk Lab at Cedars-Sinai.

SWATH-MS is performed at Cedars-Sinai. Details about the SWATH-MS workflow are available here.


Data Levels

Datasets are provided by this portal at multiple stages in processing, which are designated as "data levels" in keeping with the taxonomy defined by The Cancer Genome Atlas (TCGA).

Level 1 data is raw, immutable data coming off an instrument (e.g. a sequencer).

Assay            Level 1
Genomics         .fastq
Epigenomics      .fastq
Transcriptomics  .fastq
Proteomics       .wiff

Level 2 data is raw data mapped against the appropriate reference.

Assay            Level 1  Level 2
Genomics         .fastq   .cram
Epigenomics      .fastq   .bam
Transcriptomics  .fastq   .bam
Proteomics       .wiff    .mzML

Level 3 data is the most processed form of patient-specific data.

Assay            Level 1  Level 2  Level 3
Genomics         .fastq   .cram    .raw.vcf (i.e. gVCF)
Epigenomics      .fastq   .bam     .narrowPeaks
Transcriptomics  .fastq   .bam     .tsv (genes by samples)
Proteomics       .wiff    .mzML    .tsv (proteins by samples)

Level 4 data is obtained by joining the level 3 data from a cohort of patients for a particular assay.

Assay            Level 1  Level 2  Level 3       Level 4
Genomics         .fastq   .cram    .raw.vcf      .vcf (joint call)
Epigenomics      .fastq   .bam     .narrowPeaks  .bed (diff. regions)
Transcriptomics  .fastq   .bam     .tsv          .tsv (diff. genes)
Proteomics       .wiff    .mzML    .tsv          .tsv (diff. proteins)

Level 5 data is obtained by integrating level 4 datasets across omics assays. Level 5 data represents the knowledge ultimately resulting from the experiment.
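For readers who script against downloaded files, the level tables above can be captured as a small lookup. This is a minimal sketch, assuming the file extensions exactly as listed in the tables; the names DATA_LEVELS, file_format, and is_patient_specific are hypothetical and are not part of the portal. Level 5 is left out because no file format is listed for it.

```python
# File formats per assay and data level, transcribed from the tables above.
DATA_LEVELS = {
    "Genomics":        {1: ".fastq", 2: ".cram", 3: ".raw.vcf",     4: ".vcf"},
    "Epigenomics":     {1: ".fastq", 2: ".bam",  3: ".narrowPeaks", 4: ".bed"},
    "Transcriptomics": {1: ".fastq", 2: ".bam",  3: ".tsv",         4: ".tsv"},
    "Proteomics":      {1: ".wiff",  2: ".mzML", 3: ".tsv",         4: ".tsv"},
}


def file_format(assay: str, level: int) -> str:
    """Return the file extension listed for an assay at a given data level."""
    try:
        return DATA_LEVELS[assay][level]
    except KeyError:
        raise ValueError(f"no format listed for {assay!r} at level {level}") from None


def is_patient_specific(level: int) -> bool:
    """Levels 1 to 3 describe a single patient; level 4 joins a cohort."""
    return level in (1, 2, 3)


if __name__ == "__main__":
    print(file_format("Proteomics", 2))
    print(is_patient_specific(4))
```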