Oracles and Science: The Trouble with Predictions

“Mortagne-reeds and elms refracted in the ball” by Mo is licensed under CC BY 2.0

We all want to know the future, but what is the best way to predict what will happen? Assuming we don’t have a crystal ball or a time machine, we have to find patterns in the available information and use that to make our best, informed guess. This is what scientists do.

There is a spectrum of things we like about science. For one, science discovers fascinating things. On the more practical side though, people want science to provide reliable predictions. These predictions help people solve or avoid problems. This isn’t that different from the oracles of the ancient world. The difference, however, is that scientists need to be very systematic and open about how they came to make their predictions.

However diligently a scientist does these things, there are two nagging problems when it comes to the public making use of the information in scientific predictions. First, we know that there is usually variability in a system, which could lead to the actual result deviating from what was predicted. Through statistics, scientists try to account for this variability and produce quantified descriptions of our confidence in our predictions. Unfortunately, these aren’t always very useful because of our difficulty in processing numerical probabilities. The second problem is that the problem of induction could potentially pop up and surprise us. When this problem does rear its ugly head, it compromises our predictive ability, including the reliability of our descriptions of confidence.

Let’s start by discussing some of the troubles we have with understanding probability. One of the problems is that it is easy to believe something with a small probability is not likely to happen. Yet even with an extremely small probability, if there are sufficient opportunities for the event to occur, then it becomes likely to happen sometime. Another example of how our intuition interferes with understanding probability is our tendency to believe past experience affects the likelihood of future events. In gambling it isn’t uncommon to think that a number that hasn’t appeared for a while becomes more likely to appear in the future. It would be equally fallacious to assume that because a pair of dice summed to seven for the past ten rolls, the next roll will also sum to seven. Compounding issues like these is our tendency to misperceive probabilities by overly focusing on anecdotes. Because we tend to filter out less dramatic events, it is very easy for us to form confirmation biases. Those biases can even lead someone to reject the results of more systematic data analysis because of conflicts with their perceived understanding.
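To put a number on the point about rare events, here is a minimal sketch. It assumes the opportunities are independent and each has the same probability; the function name and the one-in-a-million figure are just illustrative choices of mine.

```python
def prob_at_least_once(p: float, n: int) -> float:
    """Chance that an event with per-trial probability p happens
    at least once in n independent trials: 1 - (1 - p)^n."""
    return 1.0 - (1.0 - p) ** n


# Illustrative numbers only: a one-in-a-million event
# given five million independent opportunities.
rare_but_likely = prob_at_least_once(1e-6, 5_000_000)
print(f"Chance of at least one occurrence: {rare_but_likely:.2f}")
```

With these made-up numbers the event is nearly certain to happen at least once, even though any single opportunity is very unlikely to produce it.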

It should also be noted that the resolution of a prediction and our confidence in it are inversely related. The oracles were infamous for providing vague predictions, but this increased their odds of being right! To use the classic dice as an example, I can more confidently predict that the next roll of two dice will sum to between five and nine, than I could predict the next roll will sum to exactly seven. As scientists serving society, we try to give the highest resolution predictions possible with the information we have. The marker for our limitations in that resolution is generally the acceptable uncertainty or risk.
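The dice comparison can be checked by simply counting outcomes. This is a small sketch assuming two fair six-sided dice; the helper name is my own.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely outcomes for two fair six-sided dice.
sums = [a + b for a, b in product(range(1, 7), repeat=2)]


def prob_sum_between(low: int, high: int) -> Fraction:
    """Probability that the two dice sum to a value in [low, high]."""
    hits = sum(1 for s in sums if low <= s <= high)
    return Fraction(hits, len(sums))


p_vague = prob_sum_between(5, 9)    # low-resolution prediction
p_precise = prob_sum_between(7, 7)  # high-resolution prediction
print(p_vague, p_precise)
```

The vague prediction (five through nine) is right two times out of three, while the precise prediction (exactly seven) is right only one time in six. More resolution costs confidence.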

Even if everyone grasped what the estimated probability of something meant, there is still the issue of not knowing if we know everything. As Donald Rumsfeld described it, “there are known knowns. These are things we know that we know. There are known unknowns. That is to say, there are things that we know we don’t know. But there are also unknown unknowns. There are things we don’t know we don’t know.” This may sound like some kind of philosophical loop, but it’s really a practical recognition of the limits on our ability to predict what we will find in the future. For example, say your friend wants to have lunch with you and have tomato soup. You suggest a certain restaurant because you predict that it will have the best, freshest tomato soup that day. You make this prediction because it is Thursday, and for the past two years, every time that you’ve gone to that restaurant on a Thursday the soup of the day was the best, freshest tomato soup. But when you and your friend arrive, you find the soup of the day is chicken soup; there is no tomato soup. Was your prediction flawed? Not really. Based on the information you had, your prediction fit the pattern. Unfortunately, there were factors that you were not aware of. Maybe there was a shortage of tomatoes that week or the chef’s child was sick last night and they didn’t have time to boil down the tomatoes. This is an illustration of the ‘problem of induction.’ Essentially, induction is a great procedure for formulating ideas about how our world works, but it can’t account for situations that we haven’t encountered yet. So should we just give up on predicting the future? Probably not, but we do need to keep things in perspective.

If the public does not understand the process by which scientists make predictions, we risk scientists being viewed as oracles. For a time this could be a positive relationship, but we know in science that eventually there will be surprises. To someone who understands the scientific process, surprises are exciting opportunities to improve understanding. However, to someone who is relying on scientists’ predictions like one would the predictions from an oracle, a single mistake is enough for the scientist to be discarded as a false prophet. Thus it is important for everyone to understand that as good and useful as science is, there is always a chance that a prediction based on current scientific knowledge will be wrong. This should not reduce the confidence we put into science. Instead it should simply moderate our perception of it. If science didn’t lead to reliable predictions, we wouldn’t have the technology that has produced the quality of life we enjoy today.

Bonus: Here is an example of the vague language I think should be avoided when making predictions. Maybe the audience doesn’t like hearing the possibility that a prediction can be wrong, but avoiding analysis and using overly vague language helps no one. Tips from The Enterprise System Spectator.

Soil mapping, classification, and pedologic modeling: History and future directions

Brevik, E.C., A. Baumgarten, C. Calzolari, B.A. Miller, A. Jordán, P. Pereira, and C. Kabala. 2015. Soil mapping, classification, and modeling: history and future directions. Geoderma 264:256-274. doi: 10.1016/j.geoderma.2015.05.017.

Comparison of spatial association approaches for landscape mapping of soil organic carbon stocks

Miller, B.A., S. Koszinski, M. Wehrhan, and M. Sommer. 2015. Comparison of spatial association approaches for landscape mapping of soil organic carbon stocks. SOIL 1(1):217-233. doi:10.5194/soil-1-217-2015.

CLORPT: Spatial Association in Soil Geography

From as early as 500 BCE, humans have recognized that some things vary together in space. This is essentially correlation, but the spatial aspect sometimes adds a special twist. Also, correlation requires evaluation of quantitative data, while this concept is not limited to quantitative characteristics. For example, Diophanes of Bithynia observed that “you can judge whether land is fit for cultivation or not, either from the soil itself or from the vegetation growing on it.” Although used frequently in the history of science (e.g. Humboldtian science), the first naming of this principle that I have found appears in a book by F.D. Hole and J.B. Campbell, published in 1985. They referred to it as spatial association. Because I am not aware of another term that covers this concept, I will continue with their use of it. Unfortunately, in the 1990s some began to use this term to describe clustering. In order to be clear, I define spatial association as the degree to which phenomena are similarly arranged over space.

Eugene W. Hilgard
(Online Archive of California)

The first scientific application of spatial association to soil mapping that we know about was by E.W. Hilgard. In 1860, he published his report on the ‘geology and agriculture’ of the state of Mississippi, USA. Hilgard observed that knowledge of the geology and type of vegetation were useful indicators for predicting soil type. In 1883, V.V. Dokuchaev added climate, relief, organisms (both plants and animals), and time to that list of useful spatial predictors. Because these spatial covariates are connected to processes, thinking about their geography enabled Dokuchaev to formulate ideas about soil formation. His descriptions of these factors of soil formation were key in the establishment of modern soil science.

Coinciding with the ‘quantitative revolution,’ H. Jenny wrote a landmark book entitled Factors of Soil Formation (1941). In this book, Jenny accomplished two main things. First, he coined an acronym for the soil formation factors: CLORPT (CL=climate, O=organisms, R=relief, P=parent material, and T=time). This easy-to-remember acronym popularized the concept and became the standard framework for teaching about soil formation. Second, Jenny proposed a system to experimentally control geographic variables so that a single variable could be better studied. He advocated for research to be designed so that soils that formed under similar factors, except for one, could be quantitatively compared. This way, differences between the soils compared could be directly attributed to the one factor that had changed. In practice this is a bit harder than it sounds because the different factors influence one another, but this was a greatly improved strategy for advancing soil science.

Before the factors of soil formation were assigned an acronym, soil mappers were regularly using them to design their maps. Notably, Hilgard’s application of geology and vegetation as predictors was primarily focused on producing a better spatial description of where different soils were. Dokuchaev’s work prior to and after writing the list of five factors was driven by the Russian government’s desire for better soil maps. Most of the soil maps made at that time were at the continental or national scale and the limited information available led to a heavy reliance on large scale climate. However, later work – particularly more detailed soil maps – began to utilize the other factors as predictors of soil variation. As T.M. Bushnell synthesized these concepts – along with G. Milne’s catena concept – in the 1940s, he applied them to what he could see in aerial photographs. Those images provided more spatial information about vegetation and relief than had been previously available.

Recognizable by changes in the image’s tone and texture, shifts in vegetation helped the soil mapper decide where to delineate different soils.

Soil mapping in the 20th century continued to build on field experience to better understand the local variations of CLORPT. It was still difficult to quantify many of the indicators for soil formation factors, so soil mappers tended to develop unique mental models of the soil landscape. They built these models from their experience in a region, relying on key indicators that marked shifts from one soil series to another, usually in connection with one of the soil formation factors. However, within those mental models, certain factors tended to become emphasized due to the limited spatial information available, map scale, purpose of the map, and the particular conditions of the area.

Today in digital soil mapping, we still utilize these concepts. Because we use much more quantitative variables – still primarily related to CLORPT – we typically describe our method as spatial regression, or something related to that. However, the geographic principle for why spatial regression works remains rooted in the idea of spatial association.

Error Propagation Toolbox

New estimated errors are calculated for each raster cell based on the combination of the two input rasters.

Quantifying uncertainty can be a very useful and often important aspect of evaluating results of calculations, particularly in modelling. The same applies for spatial layer mashups where the grids provide the input variables for equations that are calculated spatially (i.e. raster calculator). This toolbox for ArcGIS uses standard error propagation equations to simultaneously calculate the result of basic math expressions along with the estimated error of that result. The measured or estimated errors for the input variables are required. Error covariances can also be included in the calculation of error propagation, but are not required.
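For readers who want to see the idea outside of ArcGIS, here is a rough sketch of first-order error propagation for a sum and a product, applied cell by cell to two small grids. This is not the toolbox’s own code; the function names, the nested-list “rasters”, and the numbers are assumptions made only for illustration.

```python
import math


def propagate_sum(a, b, err_a, err_b, cov_ab=0.0):
    """Return (a + b, estimated error) using
    var = err_a^2 + err_b^2 + 2*cov_ab."""
    variance = err_a ** 2 + err_b ** 2 + 2.0 * cov_ab
    return a + b, math.sqrt(variance)


def propagate_product(a, b, err_a, err_b, cov_ab=0.0):
    """Return (a * b, estimated error) using the first-order form
    var = (b*err_a)^2 + (a*err_b)^2 + 2*a*b*cov_ab."""
    variance = (b * err_a) ** 2 + (a * err_b) ** 2 + 2.0 * a * b * cov_ab
    return a * b, math.sqrt(variance)


def apply_cellwise(func, raster_a, raster_b, errors_a, errors_b):
    """Apply a propagation function to every pair of matching cells.
    Rasters are plain nested lists here (a stand-in for real grids)."""
    values, errors = [], []
    for row_a, row_b, row_ea, row_eb in zip(raster_a, raster_b, errors_a, errors_b):
        cells = [func(a, b, ea, eb)
                 for a, b, ea, eb in zip(row_a, row_b, row_ea, row_eb)]
        values.append([value for value, _ in cells])
        errors.append([error for _, error in cells])
    return values, errors


# Hypothetical 2x2 input grids and their per-cell errors.
grid_a = [[10.0, 12.0], [8.0, 9.0]]
grid_b = [[20.0, 18.0], [22.0, 21.0]]
err_grid_a = [[3.0, 3.0], [3.0, 3.0]]
err_grid_b = [[4.0, 4.0], [4.0, 4.0]]

total, total_err = apply_cellwise(propagate_sum, grid_a, grid_b,
                                  err_grid_a, err_grid_b)
print(total)
print(total_err)
```

The covariance argument defaults to zero, which mirrors the idea that error covariances can be included but are not required. The product formula is a first-order approximation, so it is most trustworthy when the errors are small relative to the values.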


Impact of multi-scale predictor selection for modeling soil properties

Miller, B.A., S. Koszinski, M. Wehrhan, and M. Sommer. 2015. Impact of multiscale predictor selection for modeling soil properties. Geoderma 239-240:97-106. doi:10.1016/j.geoderma.2014.09.018.

Fundamentals of Spatial Prediction

In the process of creating a map, geographers often have to engage in the activity of spatial prediction. Although there are many tools we use to accomplish this task, they generally boil down to the use of one or two fundamental concepts.

Waldo Tobler is credited with identifying the ‘first law of geography’, stating “Everything is related to everything else, but near things are more related than distant things.” This concept is essentially synonymous with spatial dependence and spatial autocorrelation. Spatial interpolation methods rely on this principle to make predictions about the attributes of areas between sampled locations. For example, kriging utilizes an observed spatial lag relationship to determine the range of autocorrelation. In other words, it quantifies the degree to which locations are similar with respect to their distance from each other (i.e. the semivariogram). Kriging then uses that information to optimize predictions for the unobserved locations.
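The spatial lag relationship mentioned above can be sketched as an empirical semivariogram: half the average squared difference between pairs of observations, grouped by how far apart they are. The function name, the binning scheme, and the sample points below are assumptions for illustration; real kriging software goes on to fit a model curve to values like these, which this sketch does not do.

```python
import math
from collections import defaultdict


def empirical_semivariogram(points, values, lag_width):
    """Semivariance per distance bin:
    gamma(h) = sum((z_i - z_j)^2) / (2 * N(h)) over pairs in the bin.
    Keys of the result are the lower edge of each distance bin."""
    squared_diffs = defaultdict(float)
    pair_counts = defaultdict(int)
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            distance = math.dist(points[i], points[j])
            bin_index = int(distance // lag_width)
            squared_diffs[bin_index] += (values[i] - values[j]) ** 2
            pair_counts[bin_index] += 1
    return {
        bin_index * lag_width: squared_diffs[bin_index] / (2 * pair_counts[bin_index])
        for bin_index in sorted(pair_counts)
    }


# Hypothetical samples along a transect with a steady trend.
sample_points = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
sample_values = [0.0, 1.0, 2.0, 3.0]

gamma = empirical_semivariogram(sample_points, sample_values, lag_width=1.0)
for lag, semivariance in gamma.items():
    print(lag, semivariance)
```

In this toy transect the semivariance grows with separation distance, which is Tobler’s law in numeric form: nearby observations differ less than distant ones.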

If spatial autocorrelation is the first law of geography, then spatial association should be the second. Actually, spatial association has arguably been in use longer, so maybe it should be the first law. In any case, spatial association describes how phenomena are similarly distributed. To put it in a phrase parallel to Tobler’s law, “Everything is related to everything else, but things sharing similar conditions are more related than things under dissimilar conditions.” A quantitative form of spatial association is spatial correlation or regression. A more focused use of spatial association is often called environmental correlation, which uses environmental covariates as predictors. Soil science has heavily relied on this concept. Although vegetation was recognized as an indicator of soil quality by the ancient Greeks, E.W. Hilgard formally described the relationship of soil properties with the more readily observable characteristics of vegetation in 1860. V.V. Dokuchaev went further in 1883 by recognizing that soil characteristics could be predicted by considering the factors of climate, organisms, relief, parent material, and time. The guiding principle that where these five factors are the same, similar soil will be found remains the primary strategy for mapping soils today.
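As a bare-bones sketch of environmental correlation, the example below fits a straight line between one covariate and a soil property at sampled sites, then predicts the property wherever the covariate is known. Notice that location never enters the calculation; only the shared condition does. The covariate (a wetness index), the soil property, and every number are invented for illustration, and real applications typically use several covariates and more flexible models.

```python
def fit_line(x, y):
    """Ordinary least squares for a single predictor.
    Returns (slope, intercept)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical sampled sites: a wetness index (covariate) and
# measured soil organic carbon in percent (target).
wetness_at_samples = [2.0, 4.0, 5.0, 7.0, 9.0]
carbon_at_samples = [1.1, 1.9, 2.6, 3.4, 4.6]

slope, intercept = fit_line(wetness_at_samples, carbon_at_samples)

# Predict at unsampled cells where only the covariate is known.
wetness_at_unsampled = [3.0, 6.0, 8.0]
predicted_carbon = [slope * w + intercept for w in wetness_at_unsampled]
print(predicted_carbon)
```

Two unsampled cells with the same wetness index receive the same prediction no matter how far apart they are, which is the contrast with interpolation based on spatial autocorrelation.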

This colorized version of a diagram from the USDA-NRCS illustrates spatial association by identifying similar soils on similar landscape positions. It is also true that adjacent soils are more likely to be similar than non-adjacent soils, with the exception of distant soils that have the same formation factors.

The concept of spatial association has also been referred to as regionalization, but that term is easily confused with different forms of spatial analysis. The term ‘regionalization’ has also been used to describe the process of identifying regions based on clusters and it has been used to describe spatial interpolation methods, such as kriging. To add to the confusion, spatial association has also been used to describe statistics that evaluate the existence of clusters. To make things worse, spatial autocorrelation has sometimes been described as a type of spatial association. For these reasons, I think it is important that we establish spatial autocorrelation and spatial association as independent, fundamental concepts in spatial prediction.

Recognizing these two separate concepts in geography makes it easier to explain the variety of spatial prediction methods that attempt to utilize some blend of both. For example, co-kriging adds information from covariates with similar spatial distributions to improve upon interpolations based on spatial autocorrelation. Conversely, geographically weighted regression identifies spatial association relationships within different spatial units, which can be based on cluster analysis or some other form of analysis that recognizes similarity by spatial autocorrelation.

The importance and utility of spatial autocorrelation and spatial association, as defined here, are clear. However, the inconsistent use of these terms, especially for spatial association, has clouded the recognition of their widespread use. Regardless, these are fundamental and unifying concepts in geography.

Multi-scale Parameter Selection for Predicting Soil Organic Carbon (2014 Digital Soil Mapping Workshop)


Potential benefits of wetland filters for tile drainage systems

Crumpton, W.G., G.A. Stenback, B.A. Miller, and M.J. Helmers. 2006. Potential benefits of wetland filters for tile drainage systems: impact of nitrate loads to Mississippi River subbasins. U.S. Department of Agriculture, Project Report IOW06682. 34 pgs.