Openings

Please contact Paul Boutros (paul [dot] boutros [at] oicr [dot] on [dot] ca) if interested in any of the lab openings below.

Software Engineer, Bioinformatics

We are looking for software engineers who want to solve the computationally complex problems we encounter in the search for the cure for cancer. We are a global leader in transforming big datasets into a meaningful understanding of cancer. Our work helps to improve the treatment and care for thousands of cancer patients across the world.

The data we deal with is truly huge: individual datasets are hundreds of terabytes, and can include numerical, textual and imaging data. The team is incredibly diverse, with biologists, statisticians, software engineers, bioinformaticians, mathematicians, computer scientists and machine learning experts working side-by-side toward the common goal of improving the outcomes of cancer patients.

Our software engineers are passionate about applying their knowledge of software development and design to improve scientific research. They develop scalable and distributed software solutions that maximize utilization of our ~10,000-CPU computing cluster and address the challenges of storing and accessing petabytes of genomic data.

Use your technical skills to tackle one of the hardest problems of our time in a world-leading research institute.

Responsibilities May Include:

  • Low-level optimization and parallelization
  • Development of computational pipelines for analyzing big data
  • Building and maintaining core infrastructure (e.g. build systems, automated code-quality assessment)
  • Database schema design and management
  • Web design and development
  • Systems administration
  • Applied machine learning
  • Cloud-based workflow development and deployment
  • Open-source software development and maintenance
  • Mentorship of junior software engineers and co-op students
  • Training of biologists and statisticians in software engineering fundamentals

Preferred Qualifications:

  • Bachelor's degree or further training in Software Engineering, Computer Science or Computer/Electrical Engineering
  • 3-5 years of experience
  • Excellent implementation skills (high-level and low-level languages) a must
  • Experience with all or most of the following: Perl, R, C, C++, Python
  • Knowledge of UNIX/Linux environments
  • Experience with version control systems (SVN, Git, Mercurial, etc.)
  • Beneficial skills include:
      1. systems software
      2. algorithm design and analysis
      3. performance optimization
      4. SGE or other job scheduling systems
      5. experience with Amazon Web Services, Google Cloud Platform or other clouds
      6. distributed programming
      7. prior exposure to molecular biology, biochemistry or bioinformatics
      8. prior exposure to machine learning techniques
      9. RDBMS skills (e.g. PostgreSQL)

    Postdoctoral Fellow, Computational Cancer Biology

    This position involves the analysis of data from two of the largest cancer genomics projects in the world: the ICGC pan-cancer study (incorporating whole-genome sequences of >2500 tumours) and the Canadian Prostate Cancer Genome Network (CPC-GENE), the largest prostate cancer genomics project in the world. CPC-GENE is characterizing the genomes and transcriptomes of 500 prostate cancer patients. We have generated multiple independent genomic analyses, including RNA-seq and whole-genome sequencing (PacBio and Illumina HiSeq). Rich clinical annotation is also available for these samples, and the successful candidate will work in a collaborative team of biologists, computer scientists and statisticians on the integrative analysis of all these datasets. Projects can include aspects of methods development, pre-processing and univariate statistical analyses, integrative systems biology, machine-learning and clinically-relevant biomarker discovery. Analysis of these data will be performed on cutting-edge IT infrastructure including a multi-petabyte storage system and a >8,500-core compute farm.

    For representative projects, please see the following references:

    NBN gain is predictive for adverse outcome following image-guided radiotherapy for localized prostate cancer
    Berlin A, Lalonde E, Sykes J, Zafarana G, Chu KC, Ramnarine VR, Ishkanian A, Sendorek DH, Pasic I, Lam WL, Jurisica I, van der Kwast T, Milosevic M, Boutros PC, Bristow RG.
    Oncotarget. 2014 Aug 27. [Epub ahead of print] PMID: 25415046

    Hotspot activating PRKD1 somatic mutations in polymorphous low-grade adenocarcinomas of the salivary glands
    Weinreb I, Piscuoglio S, Martelotto LG, Waggott D, Ng CK, Perez-Ordonez B, Harding NJ, Alfaro J, Chu KC, Viale A, Fusco N, da Cruz Paula A, Marchio C, Sakr RA, Lim R, Thompson LD, Chiosea SI, Seethala RR, Skalova A, Stelow EB, Fonseca I, Assaad A, How C, Wang J, de Borja R, Chan-Seng-Yue M, Howlett CJ, Nichols AC, Wen YH, Katabi N, Buchner N, Mullen L, Kislinger T, Wouters BG, Liu FF, Norton L, McPherson JD, Rubin BP, Clarke BA, Weigelt B, Boutros PC, Reis-Filho JS.
    Nat Genet. 2014 Nov;46(11):1166-9. doi: 10.1038/ng.3096. Epub 2014 Sep 21.

    Bioinformatician, Data Scientist

    Are you an expert in machine-learning and analyzing huge datasets? Do you admire the work done at fivethirtyeight.com? Do you enter Kaggle or DREAM contests for fun? Do you want to impact the care of cancer patients? The Boutros Lab is looking for somebody with these skills to help us analyze some of the largest datasets in the world. We are sequencing the complete genomes of thousands of cancer patients, and need detail-minded, innovative Data Scientists to help us identify the key, clinically-relevant trends within them. Experience with cancer or genomics is certainly a benefit, but top-tier machine-learning and data science skills are critical and we will train highly-qualified individuals. This is not primarily an algorithm-development role; we are looking for a strong skillset in applying existing techniques, although new method development is sometimes needed.

    The datasets you would be studying can include (amongst others):

    • The Canadian Prostate Cancer Genome Network (CPC-GENE), the largest prostate cancer genomics study in the world, sequencing the whole genomes, methylomes and transcriptomes of 500 prostate cancers (up to 7 billion data-points per individual)
    • The ICGC pan-cancer initiative, performing a meta-analysis of >2500 tumour WGS
    • The ICGC-TCGA DREAM Somatic Mutation Calling Challenge, a global crowd-sourcing challenge aimed at standardizing NGS analysis methods for cancer data

    Your colleagues will be experts in cancer genome data-analysis and machine-learning, with a track-record of publications in the highest impact journals. It is an inter-disciplinary and integrated team of bioinformaticians, software-engineers and statisticians. There are people here at all levels of training from full-time staff (at the Bachelor’s, Master’s and PhD levels) to trainees (from undergraduate to post-doctoral). Software development is professional, with test-driven development and continuous integration, and we have a >8000-CPU computing cluster to support our research.

    This is a unique opportunity to develop your strong machine-learning skillset while making a major impact on our understanding and treatment of cancer. Join us in studying unique datasets that will help rewrite the textbook of cancer genomics.

    Qualifications (Essential)

    • BSc or equivalent education in computational biology, engineering, mathematics, computer science or molecular biology. Graduate degrees are a strong asset.
    • Strong applied experience in both machine-learning and general statistics is essential (i.e. documented applied analysis in grad-school or on real-world problems). Candidates are encouraged to submit documentation of their results on Kaggle, DREAM or other equivalent challenges, or blog entries outlining their analysis experience.
    • Strong R skills
    • Demonstrated experience visualizing complex data

    Qualifications (Desired)

    • Experience with cancer or genomics is a major asset, but we will train strong candidates
    • Experience working with test-driven programming and version control
    • Programming skills in Perl/Python/C
    • Strong communication skills (both verbal and written)
    • Previous experience with big-data problems (at a minimum thousands of variables, millions of records)

    Bioinformatician, Next-Generation Sequencing

    Are you an expert in analyzing next-generation sequencing data? The Boutros Lab is seeking talented Bioinformaticians to analyze some of the largest clinical NGS datasets in the world. We want somebody with a deep interest in big-data problems and an in-depth knowledge of NGS data. While experience with cancer NGS is preferred, a strong skillset is most important, and model-organism work is perfectly appropriate. This is a role for people with strong applied skills, and the ideal candidate will be able to move from IGV read-level views through to developing machine-learning ensemble models for variant-detection.

    The datasets you would be studying can include (amongst others):

    • The Canadian Prostate Cancer Genome Network (CPC-GENE), the largest prostate cancer genomics study in the world, sequencing the whole genomes, methylomes and transcriptomes of 500 prostate cancers
    • The ICGC pan-cancer initiative, performing a meta-analysis of >2500 tumour WGS
    • The ICGC-TCGA DREAM Somatic Mutation Calling Challenge, a global crowd-sourcing challenge aimed at standardizing NGS analysis methods for cancer data

    Your colleagues will be experts in NGS data-analysis and machine-learning, with a track-record of publications in the highest impact journals. It is an inter-disciplinary and integrated team of bioinformaticians, software-engineers and statisticians. There are people here at all levels of training from full-time staff (at the Bachelor’s, Master’s and PhD levels) to trainees (from undergraduate to post-doctoral). Software development is professional, with test-driven development and continuous integration, and we have a >8000-CPU computing cluster to support our research.

    This is a unique opportunity to develop your strong computational skillset and to make a major impact on our understanding of cancer. Join us in studying unique datasets that will help rewrite the textbook of cancer genomics.

    Qualifications (Essential)

    • BSc or equivalent education in computational biology, engineering, mathematics, computer science or molecular biology. Graduate degrees are a strong asset.
    • Extensive exposure to NGS data is mandatory (i.e. grad-school or real-world experience)
    • Strong background with UNIX/Linux tools and administration
    • Solid programming skills, preferably in R/Perl, required

    Qualifications (Desired)

    • Background in machine-learning is a strong asset, but we will train strong candidates
    • Background in biostatistics, particularly survival or Bayesian techniques
    • Experience working with test-driven programming and version control
    • Strong communication skills
    • Data-visualization capabilities