Bücher Wenner
Population Genomics with R
by Emmanuel Paradis
Publisher: Taylor & Francis
E-book / PDF
Copy protection: none


File size: 45 MB
Note: A download link is provided immediately after checkout. The link can then be opened on a PC, smartphone, or e-book reader.
E-books can be paid for via PayPal. If you would like to pay for e-books by invoice, please contact us.

ISBN: 978-0-429-88243-2
Edition: 1st edition
Published: 5 May 2020
Language: English
Length: 394 pages

Price: €67.99

Description
About the author
Table of contents

Population Genomics With R presents a multidisciplinary approach to the analysis of population genomics. The methods treated cover a large number of topics, from traditional population genetics to large-scale genomics with high-throughput sequencing data. Several dozen R packages are examined and integrated to provide a coherent software environment with a wide range of computational, statistical, and graphical tools. Small examples illustrate the basics, and published data serve as case studies. Readers are expected to have a basic knowledge of biology, genetics, and statistical inference methods. Graduate students and postdoctoral researchers will find resources for analyzing their population genetic and genomic data, as well as help in designing new studies.

The first four chapters review the basics of population genomics, data acquisition, and the use of R to store and manipulate genomic data. Chapter 5 treats the exploration of genomic data, an important issue when analyzing large data sets. The other five chapters cover linkage disequilibrium, population genomic structure, geographical structure, past demographic events, and natural selection. These chapters include supervised and unsupervised methods, admixture analysis, an in-depth treatment of multivariate methods, and advice on how to handle GIS data. The analysis of natural selection, a traditional issue in evolutionary biology, has seen a revival with modern population genomic data. All chapters include exercises. Supplemental materials are available online (http://ape-package.ird.fr/PGR.html).



Emmanuel Paradis is a senior researcher at the French Institute of Research for Development (IRD). His research focuses on evolutionary models and their applications. The development and publication of software associated with his research has been an important aspect of his activities for more than twenty years. He adopted R as his main software for data analysis in 2000 and has since published and maintained several packages, including ape since 2002 and pegas since 2009. He regularly gives workshops and training courses in several countries.




1. Introduction


Heredity, Genetics, and Genomics

Principles of Population Genomics

Units

Genome Structures

Mutations

Drift and Selection

R Packages and Conventions

Required Knowledge and Other Readings


2. Data Acquisition


Samples and Sampling Designs

How Much DNA in a Sample?

Degraded Samples

Sampling Designs

Low-Throughput Technologies

Genotypes From Phenotypes

DNA Cleavage Methods

Repeat Length Polymorphism

Sanger and Shotgun Sequencing

DNA Methylation and Bisulfite Sequencing

High-Throughput Technologies

DNA Microarrays

High-Throughput Sequencing

Restriction Site Associated DNA

RNA Sequencing

Exome Sequencing

Sequencing of Pooled Individuals

Designing a Study With HTS

The Future of DNA Sequencing

File Formats

Data Files

Archiving and Compression

Bioinformatics and Genomics

Processing Sanger Sequencing Data With sangerseqR

Read Mapping With Rsubread

Managing Read Alignments With Rsamtools

Simulation of High-Throughput Sequencing Data

Exercises




3. Genomic Data in R

What is an R Data Object?

Data Classes for Genomic Data

The Class "loci" (pegas)

The Class "genind" (adegenet)

The Classes "SNPbin" and "genlight" (adegenet)

The Class "SnpMatrix" (snpStats)

The Class "DNAbin" (ape)

The Classes "XString" and "XStringSet" (Biostrings)

The Package SNPRelate

Data Input and Output

Reading Text Files

Reading Spreadsheet Files

Reading VCF Files

Reading PED and BED Files

Reading Sequence Files

Reading Annotation Files

Writing Files

Internet Databases

Managing Files and Projects

Exercises



4. Data Manipulation

Basic Data Manipulation in R

Subsetting, Replacement, and Deletion

Commonly Used Functions

Recycling and Coercion

Logical Vectors

Memory Management

Conversions

Case Studies

Mitochondrial Genomes of the Asiatic Golden Cat

Complete Genomes of the Fruit Fly

Human Genomes

Influenza H1N1 Virus Sequences

Jaguar Microsatellites

Bacterial Whole Genome Sequences

Metabarcoding of Fish Communities

Exercises



5. Data Exploration and Summaries

Genotype and Allele Frequencies

Allelic Richness

Missing Data

Haplotype and Nucleotide Diversity

The Class "haplotype"

Haplotype and Nucleotide Diversity From DNA Sequences

Genetic and Genomic Distances

Theoretical Background

Hamming Distance

Distances From DNA Sequences

Distances From Allele Sharing

Distances From Microsatellites

Summary by Groups

Sliding Windows

DNA Sequences

Summaries With Genomic Positions

Package SNPRelate

Multivariate Methods

Matrix Decomposition

Eigendecomposition

Singular Value Decomposition

Power Method and Random Matrices

Principal Component Analysis

adegenet

SNPRelate

flashpcaR

Multidimensional Scaling

Case Studies

Mitochondrial Genomes of the Asiatic Golden Cat

Complete Genomes of the Fruit Fly

Human Genomes

Influenza H1N1 Virus Sequences

Jaguar Microsatellites

Bacterial Whole Genome Sequences

Metabarcoding of Fish Communities

Exercises



6. Linkage Disequilibrium and Haplotype Structure

Why Linkage Disequilibrium is Important?

Linkage Disequilibrium: Two Loci

Phased Genotypes

Theoretical Background

Implementation in pegas

Unphased Genotypes

More Than Two Loci

Haplotypes From Unphased Genotypes

The Expectation-Maximization Algorithm

Implementation in haplo.stats

Locus-Specific Imputation

Maps of Linkage Disequilibrium

Phased Genotypes With pegas

SNPRelate

snpStats

Case Studies

Complete Genomes of the Fruit Fly

Human Genomes

Jaguar Microsatellites

Exercises



7. Population Genetic Structure

Hardy-Weinberg Equilibrium

F-Statistics

Theoretical Background

Implementations in pegas and in mmod

Implementations in snpStats and in SNPRelate

Trees and Networks

Minimum Spanning Trees and Networks

Statistical Parsimony

Median Networks

Phylogenetic Trees

Multivariate Methods

Principles of Discriminant Analysis

Discriminant Analysis of Principal Components

Clustering

Maximum Likelihood Methods

Bayesian Clustering

Admixture

Likelihood Method

Principal Component Analysis of Coancestry

A Second Look at F-Statistics

Case Studies

Mitochondrial Genomes of the Asiatic Golden Cat

Complete Genomes of the Fruit Fly

Influenza H1N1 Virus Sequences

Jaguar Microsatellites

Exercises



8. Geographical Structure

Geographical Data in R

Packages and Classes

Calculating Geographical Distances

A Third Look at F-Statistics

Hierarchical Components of Genetic Diversity

Analysis of Molecular Variance

Moran I and Spatial Autocorrelation

Spatial Principal Component Analysis

Finding Boundaries Between Populations

Spatial Ancestry (tess3r)

Bayesian Methods (Geneland)

Case Studies

Complete Genomes of the Fruit Fly

Human Genomes

Exercises



9. Past Demographic Events

The Coalescent

The Standard Coalescent

The Sequential Markovian Coalescent

Simulation of Coalescent Data

Estimation of θ

Heterozygosity

Number of Alleles

Segregating Sites

Microsatellites

Trees

Coalescent-Based Inference

Maximum Likelihood Methods

Analysis of Markov Chain Monte Carlo Outputs

Skyline Plots

Bayesian Methods

Heterochronous Samples

Site Frequency Spectrum Methods

The Stairway Method

CubSFS

Popsicle

Whole-Genome Methods (psmcr)

Case Studies

Mitochondrial Genomes of the Asiatic Golden Cat

Complete Genomes of the Fruit Fly

Influenza H1N1 Virus Sequences

Bacterial Whole Genome Sequences

Exercises



10. Natural Selection



Testing Neutrality

Simple Tests

Selection in Protein-Coding Sequences

Selection Scans

A Fourth Look at F-Statistics

Association Studies (LEA)

Principal Component Analysis (pcadapt)

Scans for Selection With Extended Haplotypes

FST Outliers

Time-Series of Allele Frequencies

Case Studies

Mitochondrial Genomes of the Asiatic Golden Cat

Complete Genomes of the Fruit Fly

Influenza H1N1 Virus Sequences

Exercises


A Installing R Packages

B Compressing Large Sequence Files

C Sampling of Alleles in a Population


