
Dissertation thesis online

On the 28th of March 2017 my dissertation thesis was published on the document server of the library of the Freie Universität Berlin, and I can finally call myself Dr. rer. nat. Arnd Weber.


Metadata are available here and a link to the final electronic version of my doctoral thesis is available here. Since I conducted most of the statistical analyses in R, they are reproducible. To rerun the analyses and reproduce the results, follow the description in the appendix of the thesis and download the data and scripts from the web server.

Article online!

Yesterday, a new publication that I co-authored, and for which I programmed the GIS analysis, was put online:

Heuner, M., Weber, A., Schröder, U., Kleinschmit, B., Schröder, B. (2016). Facilitating political decisions using species distribution models to assess restoration measures in heavily modified estuaries. Marine Pollution Bulletin 110(1), 250–260. DOI: 10.1016/j.marpolbul.2016.06.056.

The supplementary material contains the data and the R and Python scripts, so you can reproduce the analysis.

DSL speeds

A few weeks ago I signed a new DSL contract, and since then I have had more frequent problems reaching my private web server. I therefore decided to monitor connection speed and availability. Based on this blog post and the underlying Python tool for measuring connection speeds by Janis Jansons, I set up a permanent speed monitor that runs every 10 minutes.
Because the resulting *.csv file is rather hard to read, I wrote a small R script to visualize the data written to it. R has to be installed first. On Ubuntu this is done with the following command:
sudo apt-get install r-base
Then save the plotting script locally as ~/scripts/speedtest_plot.R.
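As a rough idea of what such a plotting script can look like, here is a minimal sketch rather than the original script. It assumes a CSV file with a header row and the columns timestamp, download and upload (speeds in Mbit/s); the column names, the timestamp format and both file paths are assumptions made for this illustration. It only draws the plot for the last week.

```r
# Minimal sketch of a weekly speed plot. Assumed (hypothetical) input layout:
# header row with columns timestamp ("YYYY-MM-DD HH:MM:SS"), download, upload.

# Read the CSV file and keep only the measurements of the last 7 days.
read_last_week <- function(csv_file, now = Sys.time()) {
  d <- read.csv(csv_file, stringsAsFactors = FALSE)
  d$timestamp <- as.POSIXct(d$timestamp, format = "%Y-%m-%d %H:%M:%S")
  d <- d[!is.na(d$timestamp) & d$timestamp >= now - 7 * 24 * 3600, ]
  d[order(d$timestamp), ]
}

# Draw download and upload speed over time on the current graphics device.
plot_speeds <- function(d) {
  plot(d$timestamp, d$download, type = "l", col = "blue",
       ylim = range(0, d$download, d$upload, na.rm = TRUE),
       xlab = "Time", ylab = "Speed (Mbit/s)",
       main = "Connection speed, last 7 days")
  lines(d$timestamp, d$upload, col = "red")
  legend("topright", legend = c("download", "upload"),
         col = c("blue", "red"), lty = 1, bty = "n")
}

csv_file <- path.expand("~/speedtest.csv")       # assumed input location
png_file <- path.expand("~/speedtest_week.png")  # assumed output location

# Only plot when the input file exists and contains recent measurements.
if (file.exists(csv_file)) {
  week <- read_last_week(csv_file)
  if (nrow(week) > 0) {
    png(png_file, width = 900, height = 450)
    plot_speeds(week)
    dev.off()
  }
}
```

Keeping the reading and the plotting in separate functions makes it easy to add further plots, for example for the last day or month, by changing only the time filter.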

The script is then run on a schedule as a cron job, just like the speed test. To do so, open the crontab with crontab -e and add the following line:
1 * * * * Rscript ~/scripts/speedtest_plot.R > /dev/null 2>&1
This command, which runs hourly, produces three different plots. One of them, showing the speed measurements of the last week, is displayed here. The linked image leads to an interactive R Shiny animation:

Speed data of the last week

Parallelize automatic model selection using glmulti and Rmpi

glmulti is an R package, built on Java, for automated model selection and model averaging. It provides a wrapper that selects optimized GLMs and other model functions with specified response and explanatory variables based on an information criterion (AIC, AICc or BIC). As the number of potential explanatory variables grows, the number of candidate models increases exponentially, which makes an exhaustive screening of the candidates computationally intensive and time-consuming. The accompanying article by Calcagno & de Mazancourt (2010) in the Journal of Statistical Software gives a general overview of model selection and of how it can be carried out with glmulti.
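To get a feeling for this growth, the short sketch below counts candidate models under a simplifying assumption: every main effect and, optionally, every pairwise interaction can be switched on or off independently, with no marginality constraint. The helper name n_candidates is made up for this illustration, and the count glmulti reports can differ, for example when marginality is enforced.

```r
# Number of candidate models if each term is either included or left out.
# k: number of explanatory variables; interactions: also allow all pairwise
# interaction terms (simplifying assumption: no marginality constraint).
n_candidates <- function(k, interactions = FALSE) {
  n_terms <- k
  if (interactions) n_terms <- n_terms + choose(k, 2)
  2^n_terms
}

k <- c(4, 8, 12)
data.frame(
  predictors        = k,
  main_effects_only = n_candidates(k),
  with_pairwise     = n_candidates(k, interactions = TRUE)
)
```

With 4 variables and pairwise interactions there are already 2^10 = 1024 candidates, and with 8 variables the exponent grows to 36.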

To speed up this process, the function glmulti already provides two arguments (chunk and chunks) that split the model selection process into subprocesses. The results of the subprocesses can be stored as objects of the S4 class glmulti, joined, and searched for the overall “best” models using the function consensus. The computations can be parallelized in a variety of ways; here I want to present one possibility that uses the R package Rmpi, an interface to MPI (Message-Passing Interface). The Acadia Centre for Mathematical Modelling and Computation provides a very nice tutorial on the use of Rmpi, which describes three different methods and includes example scripts. I applied the task pull method to a working example from the original article by Calcagno & de Mazancourt (2010), based on the birth weight dataset provided by the MASS package. The resulting script combines the task pull approach from the Rmpi tutorial with the real data example given in chapter 3.3 of the article. I hope you are able to re-use it for your own needs and speed up your computations, too.
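The idea of splitting the screening into chunks and joining the partial results can be illustrated without glmulti or Rmpi. The following is a deliberately simplified base R stand-in, not the script described above: it enumerates main-effect models for the birth weight data, fits them chunk by chunk with a plain lapply, and merges the best models of each chunk. The choice of predictors, the number of chunks and the helper fit_chunk are assumptions made for this sketch.

```r
library(MASS)  # provides the birthwt data set

data(birthwt)
bwt <- transform(birthwt, race = factor(race))
predictors <- c("age", "lwt", "race", "smoke", "ht", "ui")  # illustrative choice

# All non-empty subsets of the predictors, one formula string per candidate.
subsets <- unlist(
  lapply(seq_along(predictors),
         function(m) combn(predictors, m, simplify = FALSE)),
  recursive = FALSE
)
candidates <- vapply(
  subsets,
  function(s) paste("low ~", paste(s, collapse = " + ")),
  character(1)
)

# Assign every candidate to one of n_chunks chunks.
n_chunks <- 4
chunk_id <- rep_len(seq_len(n_chunks), length(candidates))

# Fit all models of chunk i and return its `keep` best models by AIC.
fit_chunk <- function(i, keep = 5) {
  forms <- candidates[chunk_id == i]
  aic <- vapply(
    forms,
    function(f) AIC(glm(as.formula(f), family = binomial, data = bwt)),
    numeric(1)
  )
  res <- data.frame(model = forms, aic = aic, row.names = NULL)
  head(res[order(res$aic), ], keep)
}

# The chunks are independent of each other, so this lapply is the part
# that can be handed out to parallel workers.
partial <- lapply(seq_len(n_chunks), fit_chunk)

# Join the partial results and select the overall best models.
merged <- do.call(rbind, partial)
best <- head(merged[order(merged$aic), ], 5)
print(best)
```

Because each chunk returns its own five best models, the overall five best models are guaranteed to be in the merged table. With glmulti, the same split is requested through the chunk and chunks arguments and the partial objects are joined with consensus; one call to fit_chunk corresponds to one task that an Rmpi worker would pull.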

These scripts were developed as part of my employment at the Federal Institute of Hydrology (BfG) in Koblenz, Germany.