Friday, November 8, 2013

List of Bioinformatics Workshops and Training Resources

http://gettinggeneticsdone.blogspot.com/search?updated-max=2013-05-15T09:39:00-05:00&max-results=8&start=8&by-date=false




I frequently get asked to recommend workshops or online learning resources for bioinformatics, genomics, statistics, and programming. I compiled a list of both online learning resources and in-person workshops (preferentially highlighting those where workshop materials are freely available online):

List of Bioinformatics Workshops and Training Resources

I hope to keep the page above as up-to-date as possible. Below is a snapshot of what I have listed as of today. Please leave a comment if you're aware of any egregious omissions, and I'll update the page above as appropriate.

From http://stephenturner.us/p/edu, April 4, 2013

In-Person Workshops:

Cold Spring Harbor Courses: meetings.cshl.edu/courses.html

Cold Spring Harbor has been offering advanced workshops and short courses in the life sciences for years. Relevant workshops include Advanced Sequencing Technologies & Applications, Computational & Comparative Genomics, Programming for Biology, Statistical Methods for Functional Genomics, the Genome Access Course, and others. Unlike most of the others below, you won't find material from past years' CSHL courses available online.

Canadian Bioinformatics Workshops: bioinformatics.ca/workshops
Bioinformatics.ca, through its Canadian Bioinformatics Workshops (CBW) series, began offering one- and two-week short courses in bioinformatics, genomics and proteomics in 1999. The more recent workshops focus on training researchers who use advanced high-throughput technologies in the latest computational biology approaches for dealing with the new data. Course material from past workshops is freely available online, including both audio/video lectures and slideshows. Topics include microarray analysis, RNA-seq analysis, genome rearrangements, copy number alteration, network/pathway analysis, genome visualization, gene function prediction, functional annotation, data analysis using R, statistics for metabolomics, and much more.

UC Davis Bioinformatics Training Program: training.bioinformatics.ucdavis.edu
The UC Davis Bioinformatics Training program offers several intensive short bootcamp workshops on RNA-seq, data analysis and visualization, and cloud computing with a focus on Amazon's computing resources. They also offer a week-long Bioinformatics Short Course, covering in-depth the practical theory and application of cutting-edge next-generation sequencing techniques. Every course's documentation is freely available online, even if you didn't take the course.

MSU NGS Summer Course: bioinformatics.msu.edu/ngs-summer-course-2013
This intensive two-week summer course will introduce attendees with a strong biology background to the practice of analyzing short-read sequencing data from Illumina and other next-gen platforms. The first week will introduce students to computational thinking and large-scale data analysis on UNIX platforms. The second week will focus on mapping, assembly, and analysis of short-read data for resequencing, ChIP-seq, and RNAseq. Materials from previous courses are freely available online under a CC-by-SA license.

Genetic Analysis of Complex Human Diseases: hihg.med.miami.edu/edu...
The Genetic Analysis of Complex Human Diseases is a comprehensive four-day course directed toward physician-scientists and other medical researchers. The course will introduce state-of-the-art approaches for the mapping and characterization of human inherited disorders with an emphasis on the mapping of genes involved in common and genetically complex disease phenotypes. The primary goal of this course is to provide participants with an overview of approaches to identifying genes involved in complex human diseases. At the end of the course, participants should be able to identify the key components of a study team, and communicate effectively with specialists in various areas to design and execute a study. The course is in Miami Beach, FL. (Full Disclosure: I teach a section in this course.) Most of the course material from previous years is not available online, but my RNA-seq & methylation lectures are on Figshare.

UAB Short Course on Statistical Genetics and Genomics: soph.uab.edu/ssg/...
Focusing on state-of-the-art methodology for analyzing complex traits, this five-day course will offer an interactive program to enhance researchers' ability to understand & use statistical genetic methods, as well as implement & interpret sophisticated genetic analyses. Topics include GWAS Design/Analysis/Imputation/Interpretation; Non-Mendelian Disorders Analysis; Pharmacogenetics/Pharmacogenomics; ELSI; Rare Variants & Exome Sequencing; Whole Genome Prediction; Analysis of DNA Methylation Microarray Data; Variant Calling from NGS Data; RNAseq: Experimental Design and Data Analysis; Analysis of ChIP-seq Data; Statistical Methods for NGS Data; Discovering new drugs & diagnostics from 300 billion points of data. Video recordings from the 2012 course are available online.

MBL Molecular Evolution Workshop: hermes.mbl.edu/education/...
One of the longest-running courses listed here (est. 1988), the Workshop on Molecular Evolution at Woods Hole presents a series of lectures, discussions, and bioinformatic exercises that span contemporary topics in molecular evolution. The course addresses phylogenetic analysis, population genetics, database and sequence matching, molecular evolution and development, and comparative genomics, using software packages including AWTY, BEAST, BEST, Clustal W/X, FASTA, FigTree, GARLI, MIGRATE, LAMARC, MAFFT, MP-EST, MrBayes, PAML, PAUP*, PHYLIP, STEM, STEM-hy, and SeaView. Some of the course materials can be found by digging around the course wiki.


Online Material:


Canadian Bioinformatics Workshops: bioinformatics.ca/workshops
(In person workshop described above). Course material from past workshops is freely available online, including both audio/video lectures and slideshows. Topics include microarray analysis, RNA-seq analysis, genome rearrangements, copy number alteration, network/pathway analysis, genome visualization, gene function prediction, functional annotation, data analysis using R, statistics for metabolomics, and much more.

UC Davis Bioinformatics Training Program: training.bioinformatics.ucdavis.edu
(In person workshop described above). Every course's documentation is freely available online, even if you didn't take the course. Past topics include Galaxy, Bioinformatics for NGS, cloud computing, and RNA-seq.

MSU NGS Summer Course: bioinformatics.msu.edu/ngs-summer-course-2013
(In person workshop described above). Materials from previous courses are freely available online under a CC-by-SA license, which cover mapping, assembly, and analysis of short-read data for resequencing, ChIP-seq, and RNAseq.

EMBL-EBI Train Online: www.ebi.ac.uk/training/online
Train online provides free courses on Europe's most widely used data resources, created by experts at EMBL-EBI and collaborating institutes. Topics include Genes and Genomes, Gene Expression, Interactions, Pathways, and Networks, and others. Of particular interest may be the Practical Course on Analysis of High-Throughput Sequencing Data, which covers Bioconductor packages for short read analysis, ChIP-Seq, RNA-seq, and allele-specific expression & eQTLs.

UC Riverside Bioinformatics Manuals: manuals.bioinformatics.ucr.edu
This is an excellent collection of manuals and code snippets. Topics include Programming in R, R+Bioconductor, Sequence Analysis with R and Bioconductor, NGS analysis with Galaxy and IGV, basic Linux skills, and others.

Software Carpentry: software-carpentry.org
Software Carpentry helps researchers be more productive by teaching them basic computing skills. We recently ran a 2-day Software Carpentry Bootcamp here at UVA. Check out the online lectures for some introductory material on Unix, Python, Version Control, Databases, Automation, and many other topics.

Coursera: coursera.org/courses
Coursera partners with top universities to offer courses online for anyone to take, for free. Courses are usually 4-6 weeks, and consist of video lectures, quizzes, assignments, and exams. Joining a course gives you access to the course's forum where you can interact with the instructor and other participants. Relevant courses include Data Analysis, Computing for Data Analysis using R, and Bioinformatics Algorithms, among others. You can also view all of Jeff Leek's Data Analysis lectures on Youtube.

Rosalind: http://rosalind.info
Quite different from the others listed here, Rosalind is a platform for learning bioinformatics through gaming-like problem solving. Visit the Python Village to learn the basics of Python. Arm yourself at the Bioinformatics Armory, equipping yourself with existing ready-to-use bioinformatics software tools. Or storm the Bioinformatics Stronghold, implementing your own algorithms for computational mass spectrometry, alignment, dynamic programming, genome assembly, genome rearrangements, phylogeny, probability, string algorithms and others.


Other Resources:


  • Titus Brown's list of bioinformatics courses: Includes a few others not listed here (also see the comments).
  • GMOD Training and Outreach: GMOD is the Generic Model Organism Database project, a collection of open source software tools for creating and managing genome-scale biological databases. This page links out to tutorials on GMOD Components such as Apollo, BioMart, Galaxy, GBrowse, MAKER, and others.
  • Seqanswers.com: A discussion forum for anything related to Bioinformatics, including Q&A, paper discussions, new software announcements, protocols, and more.
  • Biostars.org: Similar to SEQanswers, but more strictly a Q&A site.
  • Bioconductor mailing list: A very active mailing list for getting help with Bioconductor packages. Make sure you do some Google searching yourself before posting to this list.
  • Bioconductor Events: List of upcoming and prior Bioconductor training and events worldwide.
  • Learn Galaxy: Screencasts and tutorials for learning to use Galaxy.
  • Galaxy Event Horizon: Worldwide Galaxy-related events (workshops, training, user meetings) are listed here.
  • Galaxy RNA-Seq Exercise: Run through a small RNA-seq study from start to finish using Galaxy.
  • Rafael Irizarry's Youtube Channel: Several statistics and bioinformatics video lectures.
  • PLoS Comp Bio Online Bioinformatics Curriculum: A perspective paper by David B Searls outlining a series of free online learning initiatives for beginning to advanced training in biology, biochemistry, genetics, computational biology, genomics, math, statistics, computer science, programming, web development, databases, parallel computing, image processing, AI, NLP, and more.
  • Getting Genetics Done: Shameless plug – I write a blog highlighting literature of interest, new tools, and occasionally tutorials in genetics, statistics, and bioinformatics. I recently wrote this post about how to stay current in bioinformatics & genomics.

A Mitochondrial Manhattan Plot





Lior Pachter's lab

http://math.berkeley.edu/~lpachter/software.html

Software developed in the Pachter group and still under active development in the group
  • eXpress (2012) Streaming quantification for high-throughput sequencing.
  • SysCall (2011) Distinguishing heterozygous sites from systematic error in high-throughput sequenced reads
  • Cufflinks (2010) Transcript assembly and abundance estimation for RNA-Seq (now a joint effort together with Cole Trapnell and the John Rinn Lab at Harvard University)
  • MetMap (2010) Analysis of Methyl-Seq experiments
Software developed in the Pachter group but now maintained/developed elsewhere
  • ReadSpy (2012) Assessment of uniformity in RNA-Seq reads (now supported by Valerie Hower and her group at the University of Miami)
  • TopHat (2009) Splice junction mapper for short RNA-seq reads (now supported by Steven Salzberg and his group at Johns Hopkins University)
  • FSA (2009) Fast Statistical Alignment (now supported by Robert Bradley and his group at FHCRC)
  • MERCATOR (2004) Homology mapping (now supported by Colin Dewey and his group at the University of Wisconsin)
  • VISTA (2000) Visualization tool for global alignments (now supported by Inna Dubchak and her group at the JGI)
Retired Software
These programs, originally developed in the Pachter group, are no longer under active development and are not being supported.
  • AMAP (2007) Protein multiple alignment (recommended instead: FSA)
  • GENEMAPPER (2006) Reference based gene annotation (recommended instead: an RNA-Seq experiment)
  • MJOIN (2006) Neighbor joining with subtree weights (archived here)
  • PARALIGN (2006) Alignment polytope construction (archived here)
  • SLIM (2003) Minimum network design for optimizing the search space for pair hidden Markov models (archived here)
  • SLAM (2003) Pairwise simultaneous alignment and gene finding (recommended instead: an RNA-Seq experiment)
  • MAVID (2003) Multiple alignment of large genomic sequences (recommended instead: FSA)
################################################################################
Submitted
L. Pachter, Models for transcript quantification from RNA-Seq, submitted.
In press
A. Roberts, L. Schaeffer and L. Pachter, Updating RNA-Seq analyses after re-annotation, in press.
M. Singer and L. Pachter, Bayesian networks in the study of genomewide DNA methylation, in press.
2013
A. Rahman and L. Pachter, CGAL: computing genome assembly likelihoods, Genome Biology, 14 (2013), R8.
2012
C. Trapnell, D.G. Hendrickson, M. Sauvageau, L. Goff, J.L. Rinn and L. Pachter, Differential analysis of gene regulation at transcript resolution with RNA-seq, Nature Biotechnology, advance online publication (2012).
S.A. Mortimer, C. Trapnell, S. Aviran, L. Pachter and J.B. Lucks, SHAPE-Seq: High throughput RNA structure analysis, Current Protocols in Chemical Biology, advance online publication.
A. Kleinman, M. Harel and L. Pachter, Affine and projective tree metric theorems, Annals of Combinatorics, advance online publication (2012).
A. Roberts and L. Pachter, Streaming fragment assignment for real-time analysis of sequencing experiments, Nature Methods, advance online publication (2012).
V. Hower, R. Starfield, A. Roberts, and L. Pachter, Quantifying uniformity in mapped reads, Bioinformatics, 28 (2012), 2680--2682.
L. Pachter, A closer look at RNA editing, Nature Biotechnology, 30 (2012), 246--247.
C. Trapnell, A. Roberts, L. Goff, G. Pertea, D. Kim, D.R. Kelley, H. Pimentel, S.L. Salzberg, J.L. Rinn and L. Pachter, Differential gene and transcript expression analysis of RNA-seq experiments with TopHat and Cufflinks, Nature Protocols, 7 (2012), 562--578.

SysCall - Distinguishing heterozygous sites from systematic errors

http://bio.math.berkeley.edu/SysCall/

SysCall is a logistic regression-based classifier.
Given a list of candidate heterozygous genomic locations and a SAM file of sequenced reads, SysCall classifies each genomic location as either a heterozygous site or a systematic error and outputs the corresponding lists, along with the assigned posterior probabilities.
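
To make the idea of a logistic regression classifier with posterior probabilities concrete, here is a minimal Python sketch. It is not SysCall's actual model: the two features, the weights, the bias, and the function names are all made up for illustration, and a real classifier would learn its weights from training data.

```python
import math


def posterior_heterozygous(features, weights, bias):
    """Return sigmoid(w . x + b), read here as P(heterozygous | features)."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    # Numerically stable logistic function for large |z|.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def classify_sites(sites, weights, bias, threshold=0.5):
    """Split (name, features) pairs into two lists with posterior probabilities."""
    heterozygous, systematic_errors = [], []
    for name, features in sites:
        p = posterior_heterozygous(features, weights, bias)
        if p >= threshold:
            heterozygous.append((name, p))
        else:
            # Report the posterior of the class that was actually assigned.
            systematic_errors.append((name, 1.0 - p))
    return heterozygous, systematic_errors


# Hypothetical features per site: two arbitrary numeric scores.
# Hypothetical weights and bias; none of these values come from SysCall.
example_sites = [("site_A", [0.5, 0.0]), ("site_B", [0.2, 0.9])]
het, err = classify_sites(example_sites, weights=[4.0, -6.0], bias=-1.0)
print("heterozygous:", het)
print("systematic errors:", err)
```

The point of the sketch is only the shape of the output: two lists, one per class, each entry carrying the posterior probability of the class it was assigned to.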

The submitted manuscript describing SysCall can be found here and the lists of systematic errors reported in the paper are here.
The slides from a talk on SysCall given at the 2011 CSHL Meeting on The Biology of Genomes can be found here.


Manual: Click here to download the SysCall manual.

Paper
http://www.biomedcentral.com/1471-2105/12/451/

PubMed Commons: One post-publication peer review forum to rule them all?

http://gettinggeneticsdone.blogspot.com/2013/10/pubmed-commons-post-publication-peer-review.html

Useful Unix/Linux One-Liners for Bioinformatics

http://gettinggeneticsdone.blogspot.com/2013/10/useful-linux-oneliners-for-bioinformatics.html

Much of the work that bioinformaticians do is munging and wrangling massive amounts of text. While there are some "standardized" file formats (FASTQ, SAM, VCF, etc.) and some tools for manipulating them (fastx toolkit, samtools, vcftools, etc.), there are still times when knowing a little bit of Unix/Linux is extremely helpful, namely awk, sed, cut, grep, GNU parallel, and others.

This is by no means an exhaustive catalog, but I've put together a short list of examples using various Unix/Linux utilities for text manipulation, from the very basic (e.g., sum a column) to the very advanced (munge a FASTQ file and print the total number of reads, total number of unique reads, percentage of unique reads, most abundant sequence, and its frequency). Most of these examples (with the exception of the SeqTK examples) use built-in utilities installed on nearly every Linux system. These examples are a combination of tactics I use every day and examples culled from other sources listed at the top of the page.
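
For readers who want to see what the basic and advanced examples compute, here is a rough Python sketch of the same two summaries. It is not taken from the one-liners list itself; the function names are hypothetical, and it assumes a FASTQ file with exactly four lines per record (no wrapped sequences) and treats "unique reads" as the number of distinct sequences.

```python
import io
from collections import Counter


def sum_column(handle, col=1, sep=None):
    """Sum a numeric column (1-based) from whitespace- or sep-delimited text."""
    return sum(float(line.split(sep)[col - 1]) for line in handle if line.strip())


def fastq_stats(handle):
    """Count total reads, distinct sequences, and the most abundant sequence."""
    counts = Counter()
    for i, line in enumerate(handle):
        if i % 4 == 1:  # the sequence is the 2nd line of each 4-line record
            counts[line.strip()] += 1
    total = sum(counts.values())
    unique = len(counts)
    top_seq, top_n = counts.most_common(1)[0] if counts else ("", 0)
    return {
        "total_reads": total,
        "unique_reads": unique,
        "percent_unique": 100.0 * unique / total if total else 0.0,
        "most_abundant": top_seq,
        "most_abundant_count": top_n,
        "most_abundant_percent": 100.0 * top_n / total if total else 0.0,
    }


# Tiny in-memory FASTQ so the sketch runs without any input files.
demo = io.StringIO(
    "@r1\nACGT\n+\nIIII\n"
    "@r2\nACGT\n+\nIIII\n"
    "@r3\nTTTT\n+\nIIII\n"
)
print(fastq_stats(demo))
print(sum_column(io.StringIO("geneA 1\ngeneB 2.5\n"), col=2))
```

To run it on a real file, pass an open file handle instead of the `io.StringIO` demo, for example `with open("reads.fastq") as fh: fastq_stats(fh)`, where `reads.fastq` is a placeholder name.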



The list is available as a README in this GitHub repo. This list is a start - I would love suggestions for other things to include. To make a suggestion, leave a comment here, or better - open an issue, or even better still - send me a pull request.

Useful one-liners for bioinformatics: https://github.com/stephenturner/oneliners

Alternatively, download a PDF here.

De Novo Transcriptome Assembly with Trinity: Protocol and Videos

http://gettinggeneticsdone.blogspot.com/2013/10/de-novo-transcriptome-assembly-trinity.html