STIN300 Statistical Programming in R
Credits (ECTS): 5
Course responsible: Hilde Vinje
Campus / Online: Taught on campus (Ås)
Teaching language: English, Norwegian
Limits of class size: 150
Course frequency: Annually
Nominal workload: Lectures/exercises 60 hours. Individual studies 65 hours.
Teaching and exam period: This course starts in the January block and has teaching and evaluation in the January block.
About this course
In this intensive course you will use the R programming language to apply your statistical skills to scientific data. You should have prior programming experience or else be prepared to put in a lot of effort; see "recommended prerequisites".
Most course participants are MSc or PhD students and thus have chosen a research topic. Use your actual data if you can, or else ask your supervisor for an illustrative sample of similar data. If you haven't chosen a research topic yet, you may borrow someone else's data.
You will write an R markdown report which covers the following:
- Introduction: Describe the real-world phenomenon that you study and concisely state your research question. Briefly describe the origin of your data, making clear how the measurements relate to your real-world topic.
- Data import: Use R to get your data from the file(s) into R data structures. Convert data types if necessary, so that numbers are not misinterpreted as text, categorical variables are coded as R "factors", TRUE/FALSE values are represented as such, etc.
- Outline of data structure: Use R to explore and describe the size and structure of your data (number of variables, number of samples, data types, what possible values the categorical variables can take, etc.).
- Data visualization: Design and implement at least one data graphic that provides a useful overview of your data or answers a research question. Describe in words what the data shows and interpret what it means.
- Statistical analysis: Choose a suitable statistical model or procedure that clarifies some pattern or relationship relevant to your research question. Implement it using R. Translate the results back to real-world terms. Discuss what the results mean.
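The import, structure and analysis steps listed above might look roughly like the following minimal sketch in base R. The inline data, the column names (`plot_id`, `treatment`, `irrigated`, `yield_kg`) and the choice of a linear model are all made up for illustration; your own report would read your actual file(s) and use a model suited to your research question.

```r
# Hypothetical inline data standing in for a CSV file; in a real report
# you would pass a file path to read.csv() instead of `text = ...`.
csv_text <- "plot_id,treatment,irrigated,yield_kg
1,control,yes,4.1
2,control,no,3.8
3,fertilized,yes,5.6
4,fertilized,no,5.0
5,control,yes,4.3
6,fertilized,yes,5.9"

# Data import: read every column as text first, so no conversion happens silently.
crops <- read.csv(text = csv_text, colClasses = "character")

# Convert data types explicitly.
crops$plot_id   <- as.integer(crops$plot_id)
crops$treatment <- factor(crops$treatment, levels = c("control", "fertilized"))
crops$irrigated <- crops$irrigated == "yes"     # logical TRUE/FALSE
crops$yield_kg  <- as.numeric(crops$yield_kg)   # numbers, not text

# Outline of data structure: size, types, and levels of the categorical variable.
str(crops)
n_samples   <- nrow(crops)
n_variables <- ncol(crops)
levels(crops$treatment)

# Statistical analysis: a linear model of yield against treatment (assumed model).
fit <- lm(yield_kg ~ treatment, data = crops)
summary(fit)

# The coefficient is the estimated difference in mean yield (kg)
# between fertilized and control plots.
effect <- coef(fit)["treatmentfertilized"]
```

Reading columns as text and converting them one by one is only one possible design; it makes each type decision visible in the report, at the cost of a few extra lines.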
The resulting report will be fully reproducible with executable code. It should be a helpful starting point for your future work, and will facilitate discussions with your supervisor.
Think of all course activities as leading towards this final assignment. You can learn from free online textbooks, daily tutorial documents with some screencasts, and by asking good questions in Discussions on the Canvas course page.
The tutorial documents comprise an introduction to R scripting, with a focus on the tidyverse packages ggplot2 and dplyr. The emphasis is on visualizing, structuring and manipulating data in table format. We will also cover operators, variables, data types and basic data structures, control structures (loops, conditionals), more general handling of files and text, and user-defined functions.
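As a rough illustration of the table-oriented style described above, the sketch below uses dplyr to filter, extend and summarise a table, and ggplot2 to set up a scatter plot. It assumes both packages are installed and uses R's built-in `iris` data; the threshold of 5 cm and the derived column `petal_ratio` are arbitrary choices made for this example, not part of the course material.

```r
library(dplyr)
library(ggplot2)

# Structure and manipulate a table: keep some rows, add a column,
# then summarise per group.
species_summary <- iris %>%
  filter(Sepal.Length > 5) %>%
  mutate(petal_ratio = Petal.Length / Petal.Width) %>%
  group_by(Species) %>%
  summarise(n = n(), mean_ratio = mean(petal_ratio), .groups = "drop")

# Visualize: map columns to aesthetics, then add a geometry layer and labels.
p <- ggplot(iris, aes(x = Petal.Length, y = Petal.Width, colour = Species)) +
  geom_point() +
  labs(x = "Petal length (cm)", y = "Petal width (cm)")

# In an R Markdown chunk, a line containing just `p` (or print(p)) draws the plot.
```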
Learning outcome
Upon completion of the course, students should be capable of performing statistical analyses using a programming approach in R. They should be able to visualize and manipulate data, and to write their own functions that use or modify existing functions in order to solve specific statistical problems. They should also be able to present the output from statistical analyses in an accessible and scientific form using text and graphics.
KNOWLEDGE: Students will acquire
- an understanding of how programming can automate demanding statistical computations.
- a working knowledge of concepts, syntax and conventions for describing, fitting and interpreting statistical models in R.
SKILLS: Students will be able to
- interpret output from R's functions for statistical modelling, such as lm().
- read in data from various file formats including Excel, comma-separated text, and FASTA.
- develop their own functions which use existing functions, to solve nontrivial challenges more efficiently than by nonstructured programming.
- present results of statistical analysis in a scientific, clear form through reproducible, executable reports which weave together expository text, program code, and output such as tables and graphics.
- troubleshoot problems by locating errors, reproducing them on a small subset of the data, stepping through code line by line, etc.
- orient themselves in documentation for R packages that implement statistical methods the student knows.
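Several of the skills above (building on existing functions, interpreting `lm()` output, reproducing a problem on a small subset) can be sketched together as follows. The helper name `slope_summary` and the use of the built-in `mtcars` data are assumptions made for this example only.

```r
# Hypothetical helper: fit a simple linear regression with lm() and return
# the slope with its 95% confidence interval as a one-row data frame.
slope_summary <- function(data, response, predictor) {
  # Fail early with a clear error if the inputs are not what we expect.
  stopifnot(
    is.data.frame(data),
    response %in% names(data),
    predictor %in% names(data)
  )
  fit <- lm(reformulate(predictor, response = response), data = data)
  ci <- confint(fit)[predictor, ]
  data.frame(
    response  = response,
    predictor = predictor,
    slope     = unname(coef(fit)[predictor]),
    lower     = unname(ci[1]),
    upper     = unname(ci[2])
  )
}

# Reuse the function for several predictors instead of copy-pasting code.
predictors <- c("wt", "hp", "disp")
results <- do.call(
  rbind,
  lapply(predictors, function(p) slope_summary(mtcars, "mpg", p))
)
results

# Troubleshooting: try the same call on a small subset of the data first.
small <- head(mtcars, 5)
slope_summary(small, "mpg", "wt")
```

Returning a one-row data frame per model keeps the results easy to stack into a table for a report; returning the fitted model object instead would be an equally reasonable choice.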
GENERAL COMPETENCES: Students will be well prepared to apply statistical methods in R to datasets they encounter in later studies and working life. This includes loading data into R, transforming it into a structure that the analysis function can use, running analyses with appropriate settings, and interpreting and presenting the results in a form that is useful to the end user.
Learning activities
Teaching support
Prerequisites
Recommended prerequisites
Assessment method
Examiner scheme
Mandatory activity
Notes
Teaching hours
Preferential right
Admission requirements