Bioinformatics Scientist Interview Questions

Bioinformatics Scientist

Interviewed at Human Longevity

Jun 26, 2015

Given a string over the four-letter alphabet (A, C, G, T), how would you count all possible 6-mers that occur within the string? What is a heap? What are your hobbies? What would be the size of a feature vector used as input to a 3D facial recognition algorithm?
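The 6-mer question can be read two ways: how many distinct 6-mers occur in the string, or how often each one occurs. A minimal Python sketch (function names are illustrative, not from the source) slides a window of width 6 across the string and tallies each window in a Counter; there are 4**6 = 4096 possible 6-mers over ACGT. It also touches the heap question: `heapq.nlargest`, from the standard-library heap module, picks the most frequent k-mers without fully sorting every count.

```python
from collections import Counter
from heapq import nlargest
from itertools import product


def count_kmers(seq: str, k: int = 6) -> Counter:
    """Count every k-mer occurrence in seq with a sliding window of width k."""
    seq = seq.upper()
    # A sequence of length L has L - k + 1 windows; one shorter than k has none.
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def all_kmer_counts(seq: str, k: int = 6) -> dict:
    """Report a count for all 4**k possible k-mers, including unobserved ones."""
    observed = count_kmers(seq, k)
    # Counter returns 0 for keys it has never seen.
    return {kmer: observed[kmer]
            for kmer in ("".join(p) for p in product("ACGT", repeat=k))}


def top_kmers(counts: Counter, n: int = 3):
    """Return the n most frequent (k-mer, count) pairs via heapq.nlargest."""
    return nlargest(n, counts.items(), key=lambda kv: kv[1])
```

`len(count_kmers(seq))` gives the number of distinct 6-mers present, while `all_kmer_counts` answers "all possible 6-mers" literally; for much larger k, only the observed counts are practical to store.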

Bioinformatics Scientist

Interviewed at Sequenom

Feb 17, 2017

How would you find the homologous sequences to a given sequence?
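In practice homologs are usually found with BLAST against a sequence database, which reports hits with E-values; BLAST uses a heuristic seed-and-extend strategy to approximate a full local alignment. A minimal sketch of the exact dynamic-programming alternative, Smith-Waterman local alignment, is below. The scoring values (match +2, mismatch -1, linear gap -2) and the `rank_by_similarity` helper with its dict "database" are illustrative assumptions, not BLAST defaults.

```python
def smith_waterman(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -2) -> int:
    """Best local alignment score between a and b with a linear gap penalty."""
    rows, cols = len(a) + 1, len(b) + 1
    # H[i][j] = best score of a local alignment ending at a[i-1] and b[j-1].
    H = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = H[i - 1][j] + gap
            left = H[i][j - 1] + gap
            # Flooring at 0 lets an alignment start anywhere (the "local" part).
            H[i][j] = max(0, diag, up, left)
            best = max(best, H[i][j])
    return best


def rank_by_similarity(query: str, database: dict) -> list:
    """Hypothetical helper: sort (name, sequence) pairs by local alignment score."""
    return sorted(database.items(),
                  key=lambda kv: smith_waterman(query, kv[1]),
                  reverse=True)
```

This runs in O(len(a) * len(b)) per pair, which is why database-scale searches rely on heuristics like BLAST and use exact alignment only for the final comparison.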

Bioinformatics Scientist

Interviewed at Inivata

Jan 11, 2021

Describe your previous experience, etc.

Bioinformatics Scientist

Interviewed at CytoReason

4.2★

Mar 31, 2022

A strange disease has spread across the land, and many people seem to be affected in a way that is not yet understood: when they are in daylight, odd-looking marks resembling burned tissue appear on their skin. A drug company trying to understand the disease's mechanism of action sent us data. They took normal skin biopsies from healthy individuals and lesion skin biopsies from diseased individuals, and performed whole-genome RNA-seq profiling to characterize the disease at the gene-expression level.

Analysis workflow:
1. Load the data into R and make sure the count and annotation data are consistent with each other.
2. Filter the count data for lowly expressed genes, using the strategy of your choice. For example, keep only genes with CPM >= 1 in at least 75% of the samples in at least one of the groups.
3. Store the library-size-normalized log-CPM data in an object of a suitable data structure/class, and save it as a binary file (.rda or .rds).
4. Generate basic plots of your choice to investigate the data's main properties, and comment on them (library sizes, per-sample expression density distributions, PCA colored by group, etc.).
5. Based on these plots, look for outlier or mislabeled samples in the dataset. Try to identify them and remove them from the downstream analysis.
6. Run a differential expression analysis to find genes whose expression differs between lesion and normal samples. According to your preference, this can be done on either the count data or the normalized log-CPM data, using an appropriate statistical method.
7. Generate a volcano plot for this analysis (x-axis: effect size; y-axis: p-value, conventionally shown as -log10 p). Color the 100 most significant genes.
8. Rewrite step 6 as a single function that you implement and document:
   - arguments: the expression data, the sample annotations, and the name of the group variable;
   - return value: a data.frame of differential expression statistics.
(Bonus) Write a function that identifies the outlier(s) based only on the expression data and the group variable.

Pointers:
- Installing Bioconductor
- For a quick introduction to RNA-seq data in limma: limma user guide, Section 15
- Differential expression analysis with limma: limma user guide, Section 16; with DESeq2
- ExpressionSet class: video introduction; class description
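The task asks for R, but the filtering, normalization, and bonus outlier steps are language-independent. A minimal NumPy sketch follows, assuming a genes-by-samples count matrix and one group label per sample; all function names are hypothetical. The log-CPM offsets (0.5 added to counts, 1 added to library sizes) mirror those used by limma's `voom`. The outlier check is one possible approach, not the expected answer: it flags a sample whose log-CPM profile correlates better with another group's leave-one-out median profile than with its own group's, which is typical of a mislabeled sample.

```python
import numpy as np


def cpm(counts: np.ndarray) -> np.ndarray:
    """Counts per million: scale each sample (column) by its library size."""
    return counts / counts.sum(axis=0) * 1e6


def filter_low_expression(counts, groups, min_cpm=1.0, min_frac=0.75):
    """Boolean mask of genes with CPM >= min_cpm in at least min_frac of the
    samples of at least one group (step 2's example strategy)."""
    groups = np.asarray(groups)
    expressed = cpm(counts) >= min_cpm
    keep = np.zeros(counts.shape[0], dtype=bool)
    for g in np.unique(groups):
        keep |= expressed[:, groups == g].mean(axis=1) >= min_frac
    return keep


def log_cpm(counts: np.ndarray) -> np.ndarray:
    """log2 CPM with voom-style offsets so zero counts stay finite."""
    lib_size = counts.sum(axis=0)
    return np.log2((counts + 0.5) / (lib_size + 1.0) * 1e6)


def flag_mislabeled(logcpm: np.ndarray, groups) -> list:
    """Indices of samples that correlate best with a group other than their own."""
    groups = np.asarray(groups)
    flagged = []
    for s in range(logcpm.shape[1]):
        best_group, best_r = None, -np.inf
        for g in np.unique(groups):
            members = groups == g
            members[s] = False  # leave the sample itself out of the centroid
            if not members.any():
                continue
            centroid = np.median(logcpm[:, members], axis=1)
            r = np.corrcoef(logcpm[:, s], centroid)[0, 1]
            if r > best_r:
                best_group, best_r = g, r
        if best_group is not None and best_group != groups[s]:
            flagged.append(s)
    return flagged
```

Typical use: `mask = filter_low_expression(counts, groups)`, then `expr = log_cpm(counts[mask])`, then `flag_mislabeled(expr, groups)` before the differential expression step.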

Senior Bioinformatics Scientist

Interviewed at CytoReason

Aug 9, 2022

How would you analyze human RNA-seq data?
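A typical answer walks through quality control, read alignment or quantification, filtering and normalization, differential expression testing, and multiple-testing correction. As a small worked piece of the last step, here is a NumPy sketch of Benjamini-Hochberg FDR adjustment, assumed here to match the "BH" method of R's `p.adjust`; limma and DESeq2 apply BH adjustment to their reported p-values by default, so this is only to show the mechanics.

```python
import numpy as np


def benjamini_hochberg(pvals) -> np.ndarray:
    """Benjamini-Hochberg adjusted p-values (false discovery rate)."""
    p = np.asarray(pvals, dtype=float)
    n = p.size
    order = np.argsort(p)
    # Scale the i-th smallest p-value by n / i.
    ranked = p[order] * n / np.arange(1, n + 1)
    # Enforce monotonicity: running minimum from the largest p-value downwards.
    ranked = np.minimum.accumulate(ranked[::-1])[::-1]
    adjusted = np.empty(n)
    adjusted[order] = np.minimum(ranked, 1.0)  # cap at 1 and restore input order
    return adjusted
```

For example, p-values `[0.01, 0.04, 0.03, 0.005]` adjust to `[0.02, 0.04, 0.04, 0.02]`.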

Bioinformatics Scientist

Interviewed at Rothamsted Research

Dec 19, 2017

Where do you see yourself in the next five years?

Bioinformatics Scientist

Interviewed at Aganitha Cognitive Solutions

Jul 22, 2022

Questions about SQL and molecular biology.

Data Scientist, Bioinformatics

Interviewed at BenevolentAI

Jan 17, 2022

General biology and drug-discovery questions (depending on experience), plus HackerRank-style coding questions.

Bioinformatics Scientist

Interviewed at Illumina

Sep 28, 2012

I did not expect to be asked about sorting algorithms.
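Sorting questions often come down to explaining one O(n log n) algorithm and its trade-offs. A minimal sketch of merge sort is below; it is stable (equal elements keep their original order) and always O(n log n), at the cost of O(n) extra memory. Python's built-in `sorted` uses Timsort, a merge-sort/insertion-sort hybrid, so this is purely illustrative.

```python
def merge_sort(items: list) -> list:
    """Stable O(n log n) merge sort that returns a new sorted list."""
    if len(items) <= 1:
        return list(items)
    mid = len(items) // 2
    left = merge_sort(items[:mid])
    right = merge_sort(items[mid:])
    merged = []
    i = j = 0
    while i < len(left) and j < len(right):
        # Using <= takes from the left half on ties, which keeps the sort stable.
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    # One half is exhausted; append whatever remains of the other.
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged
```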

Bioinformatics Scientist I

Interviewed at Illumina

Oct 15, 2015

All sorts of questions about binary search trees, dynamic programming, machine learning, and probabilistic modelling approaches (explain MCMC, EM, etc.).
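For the binary search tree part, a minimal unbalanced BST sketch is below (class and function names are illustrative). It shows the invariant usually probed in interviews: every key in a node's left subtree is smaller and every key in its right subtree is larger, so an in-order traversal yields sorted keys. Insert and lookup cost O(height), which degrades to O(n) without rebalancing; this sketch simply ignores duplicate keys.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    key: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def insert(root: Optional[Node], key: int) -> Node:
    """Insert key and return the (possibly new) subtree root; duplicates are ignored."""
    if root is None:
        return Node(key)
    if key < root.key:
        root.left = insert(root.left, key)
    elif key > root.key:
        root.right = insert(root.right, key)
    return root


def contains(root: Optional[Node], key: int) -> bool:
    """Iterative lookup: go left for smaller keys, right for larger ones."""
    while root is not None:
        if key == root.key:
            return True
        root = root.left if key < root.key else root.right
    return False


def inorder(root: Optional[Node]) -> list:
    """In-order traversal, which returns the keys in ascending order."""
    if root is None:
        return []
    return inorder(root.left) + [root.key] + inorder(root.right)
```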

236 bioinformatics scientist interview questions shared by candidates
