SFGate recently ran a bit called “Is the bubble back?” highlighting a “creeping” median sales price from August to September in San Francisco as evidence of increasing prices and a real estate “comeback.”
Ignoring the fact that the featured RE Report summary data for August doesn’t tie to their own District level data (288 “Home” and 207 “Condo” sales according to their summary versus 202 and 188 sales respectively when we sum their District data), perhaps a basic understanding of what’s driving the change in median sales price is in order.
Repeating the down-and-dirty analysis we outlined a year ago: rank-order the average District medians in August and September from low-cost to high-cost areas (considering condos and single-family homes as two distinct “Districts”), establish a median “District” or cutoff based on total transactions, and then compare the number of sales in Districts above and below said median. The result is a nominal 1% decrease in “low-cost” District sales versus a 10% increase in “high-cost” District sales.
Isolating single-family home and condo sales, we see a 23% decrease in “low-cost” district sales versus a 2% decrease in “high-cost” districts for single-family homes. And for condos it’s a 9% increase in “low-cost” districts versus a 22% increase in “high-cost” districts.
In other words, absent any change in underlying “prices,” or even despite a decrease, the median sales price in San Francisco was bound to increase as the proportion (mix) of high-cost home sales increased.
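To make the method concrete, here is a minimal sketch in Python. The five districts and all of their numbers are made up for illustration (they are not the RE Report data), and counting the cutoff district itself as “low-cost” is an assumption on our part rather than a rule stated above.

```python
# Hypothetical districts: (name, Aug median $, Sep median $, Aug sales, Sep sales).
# None of these figures come from the RE Report; they only illustrate the steps.
districts = [
    ("A", 500_000, 510_000, 40, 38),
    ("B", 650_000, 640_000, 35, 36),
    ("C", 800_000, 790_000, 30, 29),
    ("D", 1_100_000, 1_080_000, 25, 30),
    ("E", 1_600_000, 1_550_000, 20, 24),
]


def split_by_cutoff(rows):
    """Rank districts by average median, then split them at the district
    that contains the midpoint of total transactions."""
    ranked = sorted(rows, key=lambda r: (r[1] + r[2]) / 2)
    total = sum(r[3] + r[4] for r in ranked)
    running = 0
    for i, r in enumerate(ranked):
        running += r[3] + r[4]
        if running >= total / 2:
            # Assumption: the cutoff district is counted as "low-cost".
            return ranked[: i + 1], ranked[i + 1 :]
    return ranked, []


def pct_change(rows):
    """Percent change in number of sales from August to September."""
    aug = sum(r[3] for r in rows)
    sep = sum(r[4] for r in rows)
    return 100.0 * (sep - aug) / aug


low, high = split_by_cutoff(districts)
low_change = pct_change(low)    # sales change in "low-cost" districts
high_change = pct_change(high)  # sales change in "high-cost" districts
print(f"low-cost: {low_change:+.1f}%  high-cost: {high_change:+.1f}%")
```

With these made-up inputs, low-cost sales slip slightly while high-cost sales rise, which is the kind of shift in mix that pushes a citywide median up without any individual home selling for more.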
And for the last time (we can dream), while median sales price isn’t a bad measure of what people are buying, using changes in median sales price as a proxy for market appreciation (or depreciation) is a lousy if not misleading measure when mix is changing as well.
∙ Is the bubble back? Median prices creeping up in San Francisco [SFGate]
∙ SocketSite’s San Francisco Listed Housing Inventory Update: 8/05/08 [SocketSite]

33 thoughts on “Medians Are Up, But Don’t Confuse That With Increasing “Prices””

I don’t think there is anything I disagree with in the above, and I am a big fan of the apples methodology, but I’d note that the standard disclaimer advanced for upticks in sales and decreases in inventory has been the opposite mix argument. I guess the higher cost home still closes occasionally in SF?

I don’t think I disagree with much of the post either. I didn’t really consider medians something to rely on on the way up during the boom, on the way down after it, and with its slight upticks now, so I don’t think I should start any time soon. Medians are heavily affected by mix.
As steve mentions, apples are much better. Price per square foot can be helpful on occasion too, although it’s a little complicated because it can be affected by mix as well sometimes.
It’s interesting to see the folks (e.g. NAR/CAR) who occasionally claimed, when prices were clearly dropping, that the median was heavily affected by foreclosures in low-cost places relying on medians now to claim the market is going up.
I think part of the problem is that there isn’t enough data publicly available to be able to draw too many conclusions.

Perhaps real estate reporters and real estate bloggers should steer clear of statistical analysis. I’m sorry but this is a really convoluted way of demonstrating what a median is supposed to provide for a given data set… If I read this right, you created individual monthly median sales prices for each district and then rank-ordered the districts and then divided them into two camps and then compared growth rate of sales volume between the two camps? What’s the point? You can compare median and mean across the entire dataset over time–far more reliable than this wacky method, btw–and determine the skew. That is the exact purpose of median… The way this site slices and dices data sometimes gives me the willies. Just like when the Chron does it.

Wow… I read this site a lot, hardly ever comment. While I’m not a career statistician, I feel I do enough statistical analysis in my job to say that this is one hell of a cherry pick.
You typically provide some valuable insight on this market, but every now and then you clearly drift into bias, and this is one of those times.
I could write four pages on why this is flawed, but the above post summarized it quite well.
[Editor’s Note: Sorry, but there’s no cherry picking or bias, it is what it is and basic math to boot. In terms of those four pages, we’d love to see them. Heck, we’d settle for four sentences and will feature them on the front page if well done.]

Oh Socketsite. You are the eternal “bears” of the real estate market. Good times, bad times, and neutral times…you always look for a negative spin. Are we bitter?
Why not short some REITs or buy some bonds?
You make me laugh.
[Editor’s Note: Sorry, but this isn’t “bearish” (or bullish) and there’s nothing to be bitter about. As noted above, it is what it is and basic math.]

Sorry, I am not a fan of apples methodology. It’s a great way to compare on a house to house basis but I can’t see how the market trend could be correctly predicted. The data set will be full of outliers and exceptions that will need to be filtered out leaving very few “true” apples to make a meaningful conclusion.
The problem with statistics is incorrect data analysis which happens all too often. In the example above, they assumed that the population in question is the same as the year before. That assumption is obviously incorrect. Similarly an uptick in YOY sales figures does not mean the market is back when tons of foreclosures and short-sales are sold.

“The problem with statistics is incorrect data analysis which happens all too often.”
Good point, we should just use straight math instead of using statistics.
What’s statistics again?

I do use statistical analysis frequently in my profession, and do not have a problem with Socketsite’s analysis above. He is saying the increase in median price is due to a change in the mix of properties closing much more than a year over year price increase on those properties. I agree.
Now, you can argue if you want that the mix skewing to more upper-end properties closing is a portent of a new bubble. But shooting the messenger (or at least hanging a “bear” label on the messenger) is stupid when the messenger is factually correct.

I got to admit, I’m surprised how sticky prices have been in San Francisco over these past 18 months. We’ve been through the biggest downturn in history, and it’s still hard to afford a SFH 3/2.5 in a nice part of town.
I’m just afraid of what happens once another several thousand millionaires get rich from Facebook. It’s like it’s a never ending parade of wealth. Google at 400+ doesn’t help affordability either.
Oh well, at least it’s good the markets and economy are back for us all.

The editor and others seized on median as evidence — without a disclaimer in sight — many, many times over in the past. When the trend was down, that is. Nice to see this site regaining some balance.
[Editor’s Note: How about a few examples of those “many, many times over” on our part?]

In a previous life, I designed fractional factorial experiments for process development studies for a large Pharma (yes, it kind of sucked after a while).
I would argue there is a better way to look at the market than the one discussed above.
In general, I think the most appropriate method is a case by case one. Many other analyses are confounded, or moot.

Generating a single number to describe a heterogeneous data set is hard.
Or as they say on TV, “Kids! Don’t try this at home.”
“What’s statistics again?”
A very useful and well developed branch of mathematics that can provide useful insights if you use it correctly on applicable problems. When used properly it generates both an estimate of the underlying quantity that you are trying to measure and an estimate of how close that estimate could be to the real value.
For instance, if I want to test if flipping a coin has an equal probability of coming up heads (0.5) or tails (0.5) I can flip a coin multiple times and use that to estimate the coin’s probabilities.
In 2008 I flip a coin 10 times and it comes up heads 6 times and tails 4 times.
In 2009 I flip the same coin 10 times and it comes up heads 4 times and tails 6 times.
Have I proven that the heads probability has changed from 0.6 to 0.4 over that period? Or are both trials consistent with a probability of 0.5?
Until I calculate the error on those estimates I don’t know. The error on the determination of probability is about .155 for each trial of 10 flips. Which means that in 2008 there was roughly a 68% chance that the real value of probability was within the range 0.45 – 0.75 and in 2009 there was roughly a 68% chance that the real value was within the range of 0.25 – 0.55.
So what can I determine about the change in probability over that period? Pretty much nothing. The statistics are too crummy to say much of anything. If instead I had flipped the coin 1000 times in each trial and seen the same 60% and 40% heads, then my error would be about .0155 and I could say that in 2008 there was roughly a 68% chance that the real value of probability was within the range 0.585 – 0.615 and in 2009 there was roughly a 68% chance that the real value was within the range of 0.385 – 0.415, and then I would be able to definitively say that there was a change.
Since no one ever publishes the error estimate on their number it’s very hard to know how seriously to take changes in any of these indices.
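For anyone who wants to check the arithmetic, here is a minimal Python sketch. It assumes the usual standard error for a proportion, sqrt(p(1-p)/n), and a band of one standard error either side of the estimate (roughly 68% coverage under a normal approximation, which is itself rough for only 10 flips). The function names are just for illustration.

```python
import math


def proportion_standard_error(heads, flips):
    """Standard error of the observed heads fraction: sqrt(p * (1 - p) / n)."""
    p = heads / flips
    return math.sqrt(p * (1 - p) / flips)


def one_sigma_range(heads, flips):
    """Observed fraction plus or minus one standard error."""
    p = heads / flips
    se = proportion_standard_error(heads, flips)
    return p - se, p + se


def ranges_overlap(a, b):
    """True if two (low, high) ranges share any values."""
    return a[0] <= b[1] and b[0] <= a[1]


# 10 flips per year: 6 heads in 2008, 4 heads in 2009.
small_2008 = one_sigma_range(6, 10)
small_2009 = one_sigma_range(4, 10)

# 1000 flips per year with the same observed fractions.
large_2008 = one_sigma_range(600, 1000)
large_2009 = one_sigma_range(400, 1000)

print(round(proportion_standard_error(6, 10), 3))  # about 0.155
print(ranges_overlap(small_2008, small_2009))      # ranges overlap: no conclusion
print(ranges_overlap(large_2008, large_2009))      # ranges are well separated
```

With 10 flips the two ranges overlap, so the data cannot distinguish a real change from noise; with 1000 flips they do not.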

Editor,
You must know someone with JMP (does it exist anymore? I don’t know), stick your data in a Box Behnken design (just google it) and see if it comes up 3 sigma.
Paul

Need some variables (8 is good):
1. price
2. income demographic
3. nearest school standardized testing result
4. price per square foot
5. time of year
6. cpi
7. dow?
8. view (1-10 scale)
9. proximity to jobs/income
10. walkability (walk score?)
what else? any suggestions?
any suggestions on other designs and confounding?

11. interest rates
12. credit “tightness” (I don’t know how to quantify)
13. fear (again don’t know)
14. greed (commodity Futures?)
15. population trend (maybe same as jobs variable or at least highly correlated).
16. equivalent rents?
17.
Maybe this is too complicated.

This isn’t an issue of statistical significance or establishing a multivariate predictive model (which would be great). This is basic math with existing data and an understanding of median. And the fact remains that in September there were proportionately more high-cost homes sold in San Francisco than in August.
It’s not bearish, or bullish, or bitter. It is what it is and a point we’ve made before (with respect to how mix alone can move the median). We will admit, however, we’re rather impressed with how well “statistics!” is being used as subterfuge.

Isolating single-family home and condo sales, we see a 23% decrease in “low-cost” district sales versus a 2% decrease in “high-cost” districts for single-family homes. And for condos it’s a 9% increase in “low-cost” districts versus a 22% increase in “high-cost” districts.
But I thought the high end was in total collapse?
OK, am much confused now.
Either way, mix happens. Sometimes, like winter last year, it’s biased towards the low end, sometimes the high end.
What is clear is that it’s only ever mentioned on this site when it’s the latter.

true, but most of those variables are highly correlated to unemployment.
The problem is that you seem to be obfuscating reality, while the double digit unemployment figure is staring you dead in the face and underscoring why demand is collapsing…

REp, you can have a huge decline in prices for higher-end places, but if the mix shifts to include a higher percentage of them in the total, even at the reduced prices, the median will still increase. Hence, the oft-repeated line here that medians are not prices.
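A quick sketch of that point in Python, with made-up numbers (two price tiers and 100 sales a month; none of this is actual SF data): every price drops 10%, yet the median goes up because the mix flips toward the high end.

```python
from statistics import median

# Hypothetical "prices" for a low-cost tier and a high-cost tier.
LOW, HIGH = 600_000, 1_200_000

# Month 1: 60 low-cost sales and 40 high-cost sales at the original prices.
month1 = [LOW] * 60 + [HIGH] * 40

# Month 2: every price is 10% lower, but the mix flips to 40 low / 60 high.
month2 = [LOW * 9 // 10] * 40 + [HIGH * 9 // 10] * 60

median1 = median(month1)
median2 = median(month2)

print(median1, median2)
```

In month 1 the middle sale falls in the low-cost tier; in month 2 it falls in the (now cheaper) high-cost tier, so the median rises even though no home sold for more than it would have the month before.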

Thanks for the analysis. For those who are confused, what SocketSite has done is to look at the city at a greater level of resolution than just a single number. This makes sense because market behavior is *very* different in different parts of the city. People’s access to money, loans, jobs, etc. differs across the economic spectrum, and the neighborhoods folks are trying to buy into at different price points are also shifting.
This is really about making the analysis *local*. Just like it is easy for us to hear the news about how badly housing has tanked in California but proudly point to how SF proper has had stickier prices. This type of analysis is just trying to be even more “local” by giving a district by district view.
So, I would love to see the actual district by district numbers from the above analysis. I would find it very interesting to see how the various districts rank and how the mix is changing in each.
thanks!

Hahahhaa. How about the countless times I pointed to the southeastern portion of the city affecting mix? This was the very reason the term “Real SF” was coined by cynics. Please. Try every single time the median report was issued.

REp, you can have a huge decline in prices for higher-end places, but if the mix shifts to include a higher percentage of them in the total, even at the reduced prices, the median will still increase. Hence, the oft-repeated line here that medians are not prices.
I wasn’t talking about a decline in prices, more a huge decline in sales at the higher end reported here by many – which would lead ultimately to a fall in prices.
But that is certainly inconsistent with the mix argument here.

So, I would love to see the actual district by district numbers from the above analysis.
Try following the link to “data for August” as provided above (and from there you can get to the September data set as well).

Breaking things down to look at distributions is the latest way of doing statistical investigations. Folks who are really interested in this might want to take a look at the recently published Flaw of Averages by Sam Savage.

Unemployment is certainly a valuable variable, let’s add it to the mix. It’s kind of a trailing indicator.
It’s a trailing indicator for new housing starts, but does NOT trail house values.

Trying to make any sense of district medians is just silly: the monthly volume is too low to make any meaningful conclusions, and property sizes vary too much for any reasonable comparisons. If median $/sq ft were reported it would be *much* more useful (although still flawed).

Hmmmm…. This may be interesting. Who’s got access to JMP? I’ve got the data and the design. Call me.

Yeah, after all that, you left out the thing that matters most: unemployment rate.

J,
true, but most of those variables are highly correlated to unemployment.
Paul

J,
Unemployment is certainly a valuable variable, let’s add it to the mix. It’s kind of a trailing indicator.
P

Median prices are meaningless. They serve no purpose.

Wow, this is a whole lot of analysis and comments to make one simple point …

“Medians are not Prices!”

AMEN SS!