Wrzlprmft

Some of what I am writing may seem, or even clearly is, horribly unfeasible, but I will address these issues at the end:


[…] nor can we count just the views on a question page that came in after a given answer was posted.

But you do have the times of votes, which are correlated in time with the visits of users who can vote, which in turn are correlated with overall views. Thus, as a first estimate, you can approximate the temporal distribution of views by taking the temporal distribution of votes and renormalising it, i.e., multiplying it by [number of views]/[number of votes]. Two tweaks to this:

  • Only count the first vote of each user on a post instead of all votes, since this gives a better estimate of the visits of users capable of voting.

  • The above estimate is certainly not perfect; e.g., I would expect the temporal mean of the distribution of actual views to be somewhat later than that of the distribution of votes, because the first votes come from SE power users and not from people having a similar problem. However, the error due to this is mostly systematic, i.e., it behaves similarly for all questions. Thus, to correct for it, you only need to look at the actual temporal distribution of views (and votes) for a few questions, not all of them. Even if you do not have any such data at all, it may suffice to start collecting it now, as the biggest discrepancies between those two distributions will likely occur shortly after the question is asked; in other words, those distributions have similar tails.
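The renormalisation and both tweaks can be sketched roughly as follows. This is only an illustration under assumptions: the `Vote` record, its `day` field (days since the question was posted), and the optional per-day `correction` factors (which would be calibrated on a few questions whose real view timings are known) are hypothetical and not part of any existing SE data schema. When a correction is applied, the sketch rescales so the estimated views still sum to the known total.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Vote:
    user_id: int
    day: int  # hypothetical time unit: days since the question was posted


def estimate_view_distribution(votes, total_views, correction=None):
    """Estimate how many views a post got on each day from its vote times.

    Only each user's first vote counts (a proxy for one visit by a user who
    could vote). The per-day counts are then rescaled so they sum to
    total_views. `correction` is an optional, hypothetical per-day factor
    calibrated on a few questions whose real view timings are known.
    """
    first_vote_day = {}
    for vote in sorted(votes, key=lambda v: v.day):
        # setdefault keeps the earliest day seen for each user
        first_vote_day.setdefault(vote.user_id, vote.day)

    counts = dict(Counter(first_vote_day.values()))
    if correction:
        # systematic correction: shift weight between early and late days
        counts = {day: n * correction.get(day, 1.0) for day, n in counts.items()}

    total = sum(counts.values())
    if total == 0:
        return {}
    scale = total_views / total  # [number of views] / [number of (first) votes]
    return {day: n * scale for day, n in sorted(counts.items())}


# Hypothetical data: user 1 votes twice, so only the day-0 vote counts.
votes = [Vote(1, 0), Vote(2, 0), Vote(3, 2), Vote(1, 5), Vote(4, 10)]
print(estimate_view_distribution(votes, total_views=400))
# {0: 200.0, 2: 100.0, 10: 100.0}
```

A correction factor below 1 for the first days would, for instance, express the expectation that early votes over-represent power users relative to ordinary visitors.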


But, of course, you can do without estimating the actual temporal distribution of views, because the stat need not (or even should not) estimate how many people viewed a question, but rather how many people found it useful to some extent. As already mentioned in some of the other answers, the only means we have to evaluate this are votes. Unfortunately, not every visitor is able to vote, but we can use the voters as proxies for the viewers. Given that you can only vote on what’s actually there, this also automatically addresses the problem of late answers to some extent. The resulting stat would be:

$$\text{score (per question)} = \frac{\text{number of visitors}}{\text{number of visitors that could vote}} \times \text{upvotes}$$
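As a minimal sketch, the formula amounts to treating the upvotes as a sample of the visitors who were able to vote and scaling that sample up to all visitors. The function name and the numbers in the example are made up for illustration.

```python
def question_score(visitors, visitors_who_could_vote, upvotes):
    """score = visitors / visitors_who_could_vote * upvotes.

    The upvotes are treated as a sample of the visitors who were able to
    vote and are scaled up to all visitors.
    """
    if visitors_who_could_vote <= 0:
        raise ValueError("ratio undefined: no visitor could vote")
    return visitors / visitors_who_could_vote * upvotes


# Hypothetical numbers: 1000 visitors, 100 of whom could vote, 5 upvotes.
print(question_score(1000, 100, 5))  # 50.0
```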

Some remarks on this:

  • An automatic feature is that this accounts for questions which are popular but lack good answers.

  • I chose to consider only upvotes here, since I consider them the better measure of helpfulness to some extent (also considering downvotes would be more complicated, since there may be a relevant number of visitors who can upvote but not downvote).

  • An obvious tweak would be to give more weight to upvotes by the asker as well as the accepted answer (with due consideration to the existence of askers who never upvoted or accepted anything).

  • Another take on this would be to ignore views, views by potential voters, and votes happening shortly after the question was posted, with the exception of votes and checkmarks by the asker. This way, you would get a better estimate for people who probably had a problem similar to the asker’s. This would again require a good estimate of the temporal distribution of visits and of visits by users who could vote, as described in the first part of this answer. (If you want to go even further, exclude everything happening shortly after the question has been bumped.)

  • Of course, this requires that you have in some way recorded the number of visitors who could vote or at least have a good estimate for it, such as the number of logged-in visitors. If not, the next-best comparable score arguably is the one suggested by Mysticial.
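Two of the tweaks above (extra weight for the asker’s upvote and the accept mark, and ignoring other votes in an early window) could be combined roughly like this. Everything here is a hypothetical sketch: the `Upvote` record, the weights of 3, and the 48-hour window are arbitrary illustration values, `visitors` and `visitors_who_could_vote` are assumed to already exclude the early window (estimated as in the first part), and the sketch does not handle askers who never vote or accept.

```python
from dataclasses import dataclass


@dataclass
class Upvote:
    user_id: int
    hours_after_question: float


def tweaked_score(upvotes, asker_id, accepted, visitors, visitors_who_could_vote,
                  asker_weight=3.0, accept_weight=3.0, window_hours=48.0):
    """Visitor-scaled score with asker weighting and an early-vote window.

    asker_weight, accept_weight and window_hours are hypothetical tuning
    parameters, not values from any real system.
    """
    weighted = 0.0
    for vote in upvotes:
        if vote.user_id == asker_id:
            weighted += asker_weight  # the asker's vote counts even inside the window
        elif vote.hours_after_question >= window_hours:
            weighted += 1.0  # ordinary vote cast after the early window
        # any other vote inside the early window is ignored
    if accepted:
        weighted += accept_weight  # the checkmark is never ignored
    if visitors_who_could_vote <= 0:
        raise ValueError("ratio undefined: no visitor could vote")
    return visitors / visitors_who_could_vote * weighted


# Hypothetical data: user 7 is the asker; user 2 votes inside the window.
upvotes = [Upvote(7, 1.0), Upvote(2, 2.0), Upvote(3, 100.0), Upvote(4, 200.0)]
print(tweaked_score(upvotes, asker_id=7, accepted=True,
                    visitors=500, visitors_who_could_vote=50))  # 80.0
```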


While I am no database expert, I can guess that some of my (and others’) suggestions may seem unfeasible because they require too many queries. However, it may very well be that a small fraction of an individual user’s posts makes up most of his score. In that case, it suffices to be precise for only those important posts.
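One way to pick those important posts is to rank them by a cheap score (e.g., plain vote counts) and keep only the top posts that together account for most of it; the expensive estimate would then be computed only for those. This is a hypothetical sketch: the 80 % share and the choice of cheap score are assumptions for illustration.

```python
def posts_worth_precise_scoring(cheap_scores, share=0.8):
    """Return the top post IDs that together make up `share` of the total cheap score."""
    total = sum(cheap_scores.values())
    selected = []
    running = 0.0
    for post_id, score in sorted(cheap_scores.items(), key=lambda kv: kv[1], reverse=True):
        if running >= share * total:
            break  # the posts selected so far already cover the requested share
        selected.append(post_id)
        running += score
    return selected


# Hypothetical cheap scores (e.g., vote counts) keyed by post ID.
print(posts_worth_precise_scoring({1: 50, 2: 30, 3: 10, 4: 5, 5: 5}))  # [1, 2]
```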