Posts filed under 'social science'

electoral behaviour in indonesia

Being a good political scientist traveling to Indonesia, I dutifully bought a book on Indonesian politics on the second day after my arrival. Because I have spent the past few months analysing voting behaviour in the Lisbon referendum in Ireland, my mind is set on electoral studies a bit more again - something I had not seriously worked on since leaving Leiden - so I bought Indonesian Electoral Behaviour: A Statistical Perspective by Aris Ananta, Evi N. Arifin & Leo Suryadinata. Now you should know that in this particular bookstore most books are wrapped, so I had no way of looking inside the book. It looked like a thick book (480 pp), it was slightly pricey (well, just slightly), so, hey, it must be good, right? The disappointment that followed led to this post …

In the first place, the authors appear to have a somewhat limited knowledge of statistics. They make remarks like “It should be noted that the Indonesian population, aged 20 and above, was 119 million in 1999. However, Liddle and Mujani’s study was based on a small sample of 2,500 individuals, and the regression analysis was based on an even smaller number of 1,100 individuals.” (p. 5) A sample of 1,100 respondents is not that small at all - depending on how many different subgroups you are studying - and more importantly, the population size of Indonesia has absolutely nothing to do with this! It is the sample size that determines the precision of statistical inferences, not the population size (as long as the sample is only a small fraction of the population) - a widely known fact.
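To put rough numbers on this, here is a minimal sketch of the textbook margin of error for a proportion, with the finite population correction included. It assumes a simple random sample and the worst case p = 0.5; I do not know the actual sampling design of the Liddle and Mujani study, and the population of one million is just a made-up comparison point.

```python
import math

def margin_of_error(n, population=None, p=0.5, z=1.96):
    """Approximate 95% margin of error for a proportion.

    Assumes a simple random sample of size n (an assumption, not the
    design of any particular survey). If `population` is given, the
    finite population correction is applied.
    """
    se = math.sqrt(p * (1 - p) / n)
    if population is not None:
        # Finite population correction: close to 1 when n << population
        se *= math.sqrt((population - n) / (population - 1))
    return z * se

# Sample sizes from the quote; 119 million is the population figure quoted
# from the book, 1 million is a hypothetical much smaller country.
for n in (1_100, 2_500):
    for population in (1_000_000, 119_000_000):
        moe = margin_of_error(n, population)
        print(f"n={n:>5}  N={population:>11,}  margin of error={moe:.4f}")
```

The printed margins change with n but barely move when the population grows from one million to 119 million, which is the whole point.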

I have now read well over half the book, and all I have seen so far is raw data. And not even about elections - it is the independent variables that are extensively discussed. And nothing about measurement, or reliability, or quality, or relevance for the problem at hand - no, a pure and simple description of the actual data. In region X so many people live in cities, while, remarkably, in region Y only so many people live in cities. *yawn* And then they include the data - almost the entire data set appears to be described in the text itself, but let's put in the tables as well. Pages and pages and pages of data. The book is 480 pages, but I bet only about 80 contain actual substantive text - probably much less than that. The rest is data, pure data.

And as I said, they are entirely without critique or analytical perspective. For example, they happily describe how the Jakarta region has this unusually high level of education (ch. 3) - by Indonesian standards, that is - without mentioning at all that perhaps educated people are more likely to move to the capital than to stay in the village. Perhaps the reason for the distribution is irrelevant when explaining voting behaviour (although I doubt it), but if you go through the hassle of explicitly describing every single data point, why not demonstrate that you actually thought about it a bit more than just reading the table itself? Or statements like: “This finding implies that the distribution of per capita income is heavier toward the districts with low per capita income” (p. 219) - surely it is not a finding to discover that income has a skewed distribution? Everybody knows that!

The political analysis is of a similar level. First, for two chapters, the vote distribution of the various parties is discussed - or rather, presented. There is absolutely no analysis. And then the final chapters contain regression analyses of all these results, explaining the number of votes by the number of Muslims, Javanese, poor, educated, etc. These regressions completely ignore the effect of population size - in other words, if there are two districts, each with the same proportion of Muslims and the same proportion of votes for a party, but with different population sizes, their regression picks this up as a clear correlation. Perhaps this issue is not too serious: all the independent variables share this effect, so the variation in population size should be absorbed by the multicollinearity between them. But it still seems very odd to study raw numbers instead of proportions.
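The population size problem is easy to see in a small simulation. The sketch below is entirely hypothetical (made-up districts, not the book's data): the share of Muslims and the party's vote share are drawn independently of each other, so there is no relationship to find, yet the raw counts correlate strongly simply because both scale with district size.

```python
import math
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)

def simulate(n_districts=500, seed=1):
    """Return (correlation of counts, correlation of shares).

    All distributions are illustrative assumptions.
    """
    rng = random.Random(seed)
    # District populations vary a lot (lognormal is an arbitrary choice)
    pop = [rng.lognormvariate(11, 1) for _ in range(n_districts)]
    # Shares are drawn independently: no true relationship between them
    muslim_share = [rng.uniform(0.2, 0.9) for _ in range(n_districts)]
    party_share = [rng.uniform(0.1, 0.5) for _ in range(n_districts)]
    # Raw counts, as used when regressing numbers of votes on numbers of people
    muslim_count = [p * s for p, s in zip(pop, muslim_share)]
    party_count = [p * s for p, s in zip(pop, party_share)]
    return pearson(muslim_count, party_count), pearson(muslim_share, party_share)

r_counts, r_shares = simulate()
print(f"correlation of raw counts:  {r_counts:.2f}")
print(f"correlation of proportions: {r_shares:.2f}")
```

Working with proportions (or at least controlling for population size) removes the part of the correlation that is only about how big a district is.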

When they split their sample to study Java separately from the other regions, they ask why the effect of the percentage of Javanese or Muslims is so much weaker on Java than in the other regions. But surely, in an area where 95% or more of the population is Muslim, you would not expect the little variation in numbers there is to have much effect on the vote? In the non-Java regions the variation is much higher, so there is also more potential for an effect. They also make very easy comparisons of coefficients across models, despite the fact that the population sizes differ, the set of control variables differs (they dropped insignificant variables from the regression - a bad habit as such), and the variance in the independent and dependent variables differs. I.e. utterly inappropriate comparisons.
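The restricted range argument can also be sketched with a simulation. In the hypothetical setup below, both "regions" have exactly the same true slope and the same noise; the only difference is how much the independent variable varies. All numbers are invented for illustration.

```python
import math
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)

def ols_slope(xs, ys):
    """Slope of a simple least-squares regression of ys on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
            / sum((x - mx) ** 2 for x in xs))

def simulate_region(low, high, n=500, true_slope=0.5, noise_sd=0.05, seed=0):
    """Simulate districts whose Muslim share lies between low and high.

    Vote share = 0.1 + true_slope * share + noise (hypothetical model).
    Returns (estimated slope, correlation).
    """
    rng = random.Random(seed)
    xs = [rng.uniform(low, high) for _ in range(n)]
    ys = [0.1 + true_slope * x + rng.gauss(0, noise_sd) for x in xs]
    return ols_slope(xs, ys), pearson(xs, ys)

# Wide variation (think: outside Java) versus almost none (think: 95%+ Muslim)
slope_wide, r_wide = simulate_region(0.20, 1.00, seed=1)
slope_narrow, r_narrow = simulate_region(0.95, 1.00, seed=2)
print(f"wide range:   slope={slope_wide:.2f}  r={r_wide:.2f}")
print(f"narrow range: slope={slope_narrow:.2f}  r={r_narrow:.2f}")
```

With almost no variation in the independent variable the correlation drops and the slope estimate becomes noisy, even though the underlying relationship is identical, so a weaker estimated effect on Java need not mean anything substantive.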

When you write a book using solely aggregate statistics, such as this one, the very first concern should be the problem of the ecological fallacy - drawing conclusions about individuals on the basis of aggregate data. In this book they are perfectly happy to claim that Muslims clearly voted this, or that for some party the number of Muslims was very important, but the number of non-Muslims is irrelevant (well, that’s not an example of the ecological fallacy, but I have absolutely no idea what that is based on!), or that the Javanese in the Outer Islands clearly voted for …
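For readers who have not met the ecological fallacy before, here is a small constructed example. The districts and voting rates are completely made up: in every district non-Muslims support the party 20 points more than Muslims do, but support among both groups rises with the Muslim share of the district. A regression on the aggregate figures then suggests the exact opposite of what individuals do.

```python
def ols(xs, ys):
    """Return (intercept, slope) of a simple least-squares regression."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return my - slope * mx, slope

# Hypothetical districts, described only by their Muslim share
muslim_share = [0.50, 0.60, 0.70, 0.80, 0.90, 0.95]

# Hypothetical individual-level support for the party within each district:
# both groups are more supportive in more heavily Muslim districts, but
# non-Muslims are always 20 points MORE supportive than Muslims.
rate_muslim = [0.1 + 0.5 * m for m in muslim_share]
rate_other = [0.3 + 0.5 * m for m in muslim_share]

# What the aggregate data shows: the overall vote share per district
party_share = [m * rm + (1 - m) * ro
               for m, rm, ro in zip(muslim_share, rate_muslim, rate_other)]

# Naive ecological regression of vote share on Muslim share
intercept, slope = ols(muslim_share, party_share)
implied_other = intercept           # predicted share where nobody is Muslim
implied_muslim = intercept + slope  # predicted share where everybody is Muslim

print(f"aggregate slope: {slope:.2f}")
print(f"implied support: Muslims {implied_muslim:.2f}, others {implied_other:.2f}")
for m, rm, ro in zip(muslim_share, rate_muslim, rate_other):
    print(f"district with {m:.0%} Muslims: Muslims {rm:.3f}, others {ro:.3f}")
```

The aggregate slope is positive, so the tempting reading is "Muslims voted for this party", while within every single district it is the non-Muslims who do so more often.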

On a more positive note, I should state that the data they have are actually quite interesting. And interesting to analyse statistically, as well. Their critique of using a survey of individuals is not reasonable, and such a survey would provide a lot of leverage that these data cannot provide, but the advantage of aggregate data is that you properly cover the entire country, which is doubtful with survey analysis. But such statistical analysis should be more careful with the conclusions drawn - i.e. avoid the ecological fallacy - and it should also take population size and spatial autocorrelation into account. It would also help if not all independent variables were reduced to dummy variables - why is “ethnicity” only about “Javanese” and “religion” only about “Muslim”?

But I think the worst offense of the book is the title: this is not a statistical analysis of voting behaviour in Indonesia, as it suggests, but rather a data book cataloging a set of variables in Indonesia’s provinces and districts.

December 24th, 2008

bayes

The Future of Bayes is a nice blog entry on, well, the future of Bayesian analysis. Note also the reference in the comments to PyMC, a Markov chain Monte Carlo package for Python. Perhaps there is no reason to use R at all for my classes, nor the frequentist paradigm, and I can just switch to a pure programming language! (If students read this: just kidding …)

November 20th, 2008

understanding society

Daniel Little, the author of a textbook of which I use a couple of chapters in my research design course, has a very interesting blog about the philosophy of social science. As he puts it himself: “The blog is an experiment in writing a book, one bite at a time.” A nice blog to keep an eye on for any social scientist …

November 16th, 2008

