Posts filed under 'R'

r on youtube

I often use YouTube to find out all kinds of stuff - when I want to learn new gym exercises for example, it is easy to find all kinds of instruction videos there. I hadn’t really thought of using it to find R tutorials yet, however, which could be quite useful for students. So far I found one series, with some useful parts.

Useful for students:
Help search
Plots and history
External files
Large datasets

Less urgent ones:
CRAN on the web
Vector arithmetic
Matrix operations

I have not actually checked them; I am just going by the titles. If there are mistakes in them, let me know.

November 17th, 2008

beanplot

Beanplot … Just a nice alternative to the boxplot. I should use this diagram from Wikipedia to explain the boxplot in class - it looks very nice.

November 15th, 2008

sweave / latex compile script

Ah, just because I got tired of typing latex; bibtex; latex; latex; dvipdf every single time, I wrote a little script a few months ago to do that. Of course, the script is super simple:
latex $1; bibtex $1; latex $1; latex $1; dvipdf $1; cleanup
whereby cleanup refers to another little script that just deletes all of Latex’s intermediary files. (Perhaps there is a Latex setting to have them saved in an entirely different place anyway?)

Later I added one for the rare occasion that I use Sweave as well to integrate R code into my Latex files. Now I am also starting to become a fan of the pic language to add diagrams to my files and it also integrates very well with Latex. So now I upgraded my script a little and just post it here in the odd case that it turns out to be useful for someone else as well. The Sweave version:

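#!/bin/sh

# Run Sweave on the input to produce $1.tex
R CMD Sweave "$1"

# Temporary base name built from the current minutes and seconds
fn=c`date +"%M%S"`

# Expand the pic diagrams into TeX, then drop the Sweave output
pic -t "$1.tex" > "$fn.tex"
rm "$1.tex"

# latex, bibtex, then latex twice to resolve citations and references
latex "$fn"
bibtex "$fn"
latex "$fn"
latex "$fn"

dvipdf "$fn" "$1.pdf"

# Delete the intermediary files
rm *.aux
rm "$fn"*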

I had tried Sweave earlier but was not too satisfied, since most things I was estimating took too long to run every time I compiled my text. But now I have been working on a document with a lot of straightforward regressions, several plots, and several diagrams, and the final layout only matters as an internal working document, not for publication. So I have been making heavy use of both Sweave and pic, and I am really enthusiastic about the result.

October 5th, 2008

talking R

Based on this post in Andrew Gelman’s blog, which I read religiously, I wrote a little R template for the bootstrap procedure. Well, the template is simple, but the cool thing is that it tells you when it is finished, so you can stop going back and forth to the R window to check whether it’s done. This one is fast, but for a long, slow procedure, that’s pretty cool. Alas, only works on a Mac (and perhaps Linux?).

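x <- rnorm(100, 3, 2)

# Preallocate the vector of bootstrap means
bs <- numeric(1000)
for (i in seq_along(bs))
{
    # Resample x with replacement and store the mean of the resample
    bs.sample <- sample(x, length(x), replace=TRUE)
    bs[i] <- mean(bs.sample)
}
# Speak the result aloud using the Mac's say command
system(sprintf("say The bootstrap has finished. The result is a mean of %4.2f, with standard error %4.2f. You can access the result with the b s variable.", mean(bs), sd(bs)))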
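A sketch of how the notification could be made to work on other platforms, under stated assumptions: speech.command and notify are names made up for illustration, and the Linux branch assumes a speech program called espeak is installed. Where no speech program is found, it falls back to a beep and a printed message.

```r
# Build the shell command for a spoken message, or NULL if no speech
# program is known for this platform.
speech.command <- function(msg, sysname = Sys.info()[["sysname"]]) {
    quoted <- shQuote(msg, type = "sh")
    if (sysname == "Darwin") {
        paste("say", quoted)
    } else if (sysname == "Linux" && nzchar(Sys.which("espeak"))) {
        # Assumption: espeak is the speech program available on Linux
        paste("espeak", quoted)
    } else {
        NULL
    }
}

# Speak the message if possible; otherwise beep and print it.
notify <- function(msg) {
    cmd <- speech.command(msg)
    if (is.null(cmd)) {
        alarm()
        message(msg)
    } else {
        system(cmd)
    }
    invisible(cmd)
}

# Usage, after the bootstrap loop has filled bs:
# notify(sprintf("The bootstrap has finished. The mean is %4.2f.", mean(bs)))
```

Quoting the message with shQuote also keeps punctuation such as apostrophes from confusing the shell.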

October 28th, 2007

r tutorial

This video tutorial is an interesting way to get a basic idea of how the statistical package R, which I use for teaching statistics, works.

October 10th, 2007

a nicer ls for R

R already has a function called ls(), which simply lists the names of all objects in the current environment, and ls.str(), which does so with a lot more information. I wanted one that looks a little more similar to ls -l in a Unix environment, so here’s a start:

my.ls <- function(envir = parent.frame()) {

       # List every object in the environment, including hidden (dot) names
       names <- ls(envir = envir, all.names = TRUE)
       for (item in names) {

               obj <- get(item, envir = envir)

               # Length for vectors; rows and columns for objects with dimensions
               l1 <- length(obj)
               l2 <- ""
               if (!is.null(dim(obj))) {
                       l1 <- dim(obj)[1]
                       l2 <- sprintf("%10d", dim(obj)[2])
               }

               cat(sprintf("%-30s  %-10s  %-10s  %10d %10s\n",
                       item, class(obj)[1], mode(obj), l1, l2))
       }
}
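If you would rather work with the listing than just read it, a variant could return a data frame instead of printing. This is only a minimal sketch: ls.table is a hypothetical name, and it uses NROW and NCOL rather than length and dim, so a plain vector reports one column instead of a blank.

```r
# Return one row per object in envir, with its class, mode and size.
ls.table <- function(envir = parent.frame()) {
    items <- ls(envir = envir, all.names = TRUE)
    objs <- mget(items, envir = envir)
    data.frame(
        name  = items,
        class = vapply(objs, function(o) class(o)[1], character(1),
                       USE.NAMES = FALSE),
        mode  = vapply(objs, mode, character(1), USE.NAMES = FALSE),
        rows  = vapply(objs, NROW, integer(1), USE.NAMES = FALSE),
        cols  = vapply(objs, NCOL, integer(1), USE.NAMES = FALSE),
        stringsAsFactors = FALSE
    )
}

# Example in a scratch environment
e <- new.env()
assign("x", 1:5, envir = e)
assign("m", matrix(0, nrow = 2, ncol = 3), envir = e)
tab <- ls.table(e)
print(tab)
```

A data frame can then be sorted or subset, for instance to show only the largest objects.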

April 29th, 2007

merging in R on name

Often people want to merge datasets on the names of countries or locations. These names are often similar across datasets, but not identical. A function in R that is hugely useful for merging in this case is called agrep. With this function you can do approximate matching of names (or rather, of strings as subsets of other strings). To merge properly, though, you do want to avoid matching the same name twice, and you want to prioritize exact matches over very fuzzy matches. To do so, I wrote a little R function (the idea is not mine, but Eduardo’s), which is here in beta version:

agrep.wrapper <- function(x, y, names.x = "name", names.y = "name", ids.x = "id", ignore.case=TRUE, max.threshold=1) {

    x <- as.data.frame(x, stringsAsFactors=FALSE)
    y <- as.data.frame(y, stringsAsFactors=FALSE)

    unique.x.select <- !duplicated(x[,ids.x])
    unique.x.names <- x[,names.x][unique.x.select]
    unique.x.ids <- x[,ids.x][unique.x.select]
    
    unique.y.select <- !duplicated(y[,names.y])
    unique.y.names <- y[,names.y][unique.y.select]
    unique.y.ids <- rep(NA,length(unique.y.names))
    
    matching.x.names <- unique.x.names
    matching.x.ids <- unique.x.ids
    
    for (threshold in seq(from=0, to=max.threshold, by=.1)) {
        
        i <- 1
        while (i <= length(matching.x.names)) {
            
            select <- (1:length(unique.y.ids) %in% agrep(matching.x.names[i], unique.y.names, ignore.case=ignore.case, max.distance=threshold)) & is.na(unique.y.ids)
                
            if (sum(select) > 0) {
            
                unique.y.ids[select] <- matching.x.ids[i]
                matching.x.ids <- matching.x.ids[-i]
                matching.x.names <- matching.x.names[-i]
            } else
                i <- i + 1
        }
    }
            
    unique.data <- merge(data.frame(unique.x.names, unique.x.ids), data.frame(unique.y.names, unique.y.ids), by.x="unique.x.ids", by.y="unique.y.ids", all=TRUE)
    
    list(matches = unique.data)
}
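To see why the function steps through thresholds from exact to fuzzy, here is a small sketch of how agrep behaves on its own. The country names and the misspelling are made up for illustration.

```r
countries <- c("The Netherlands", "Germany", "United Kingdom")

# max.distance = 0 allows no edits: the pattern must occur as a substring
exact <- agrep("Netherlands", countries, max.distance = 0)

# A misspelled pattern finds nothing at distance 0 ...
none <- agrep("Nethrlands", countries, max.distance = 0)

# ... but matches once up to 2 edits are allowed
fuzzy <- agrep("Nethrlands", countries, max.distance = 2)

# ignore.case = TRUE makes the comparison case-insensitive
nocase <- agrep("germany", countries, max.distance = 0, ignore.case = TRUE)
```

In the wrapper, the first pass at threshold 0 claims the exact matches, so the later, fuzzier passes can only assign names that are still unmatched.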

March 8th, 2007

