Everybody Lies: Big Data, New Data, and What the Internet Can Tell Us About Who We Really Are

Everybody Lies: Big Data, New Data, and What the Internet Can Tell Us About Who We Really Are

Blending the informed analysis of The Signal and the Noise with the instructive iconoclasm of Think Like a Freak, a fascinating, illuminating, and witty look at what the vast amounts of information now instantly available to us reveals about ourselves and our world—provided we ask the right questions.By the end of an average day in the early twenty-first century, human bei...

DownloadRead Online
Title:Everybody Lies: Big Data, New Data, and What the Internet Can Tell Us About Who We Really Are
Author:Seth Stephens-Davidowitz
Rating:
Edition Language:English

Everybody Lies: Big Data, New Data, and What the Internet Can Tell Us About Who We Really Are Reviews

  • Lori

    When sociologist ask people if they waste food, people give the only correct answer. It's wrong to waste food.

    When sociologist survey the contents of the same people's garbage, they get a more accurate answer.

    Just imagine how much more information is available trolling through internet searches.

  • Will Byrnes

    There’s lies, damned lies and then there are statistics. One must won

    There’s lies, damned lies and then there are statistics. One must wonder. Do the lies get bigger as the datasets grow? Seth Stephens-Davidowitz posits that the availability of vast sums of new data not only allows researchers to make better predictions, but offers them never-before-available tools that can offer insight that direct questioning never could.

    We have seen steps up of this type before. Malcolm Gladwell has made a career of such, with

    ,

    , and

    .

    is the one I would expect most folks would know. Nate Silver put his data expertise into

    . All these looks at data and how we interpret it rely on the analyst, regardless, pretty much, of the data. While the same might be true of Stephens-Davidowitz’s approach, he focuses on the availability of materials that have not been there in the past. The smarts that must be applied to get the most interesting results can now be applied to new oceans of data. It is more possible than it has ever been to draw inferences and actually test them out.

    In addition to the

    of data that is now available, there is the

    . The author looks at Google and FB data for evidence of underlying realities. Surveys can sometimes offer inaccurate outcomes, when the people being queried do not provide honest answers.

    . But one can look at what people enter into Google to get a sense of possible racism by geographic area.

    Looking for queries on jokes involving the N-Word, for example, turns out to yield a telling portrait of anti-black sentiment, which also correlates with lower black life expectancy. (And pro-Trump vote totals)

    We are treated to looks into a variety of research subjects, from picking the ponies, to seeing what

    interests/concerns people sexually, looking for patterns of child abuse, selecting the best wine, using the texts of a vast number of books and movie scripts to come up with six simple plot structures.

    I thought the most interesting piece was on the use of associations, and provoking curiosity, rather than relying on overt statements to influence how people feel about a different group of people. Another was on using a data comparison of one’s (anonymous) medical information to others who share many characteristics to improve medical diagnoses.

    There are some areas in which it was not entirely persuasive that the methodology in question was tracking what was claimed. SS-D sees in searches of Pornhub, for example,

    Really? I expect that what people check out on-line does not necessarily track with what might be of interest in real life. It would be like someone with an interest in mysteries being thought to have homicidal tendencies after searching for a variety of homicide related titles. Should a writer doing research into a dark subject like child pornography, human trafficking or cannibalism expect the heavy knock of the police on his/her door? Where is the line between an academic or titillation search and one made for planning?

    SS-D makes a point about there being a significant difference between searches that offer projections for groups or areas, and their inapplicability for predicting individual behavior, although that will not necessarily remain the case. In baseball, for example, the explosion of available information may very well be applied to specific players to diagnose and even correct flaws in technique, or recognize patterns that might expose underlying medical issues, or predict their arrival. The Big Data related here is much more macro, looking at group proclivities. Useful for spotting trends, measuring public sentiment, but in more detail than has been heretofore possible.

    And of course there is the impact of dark players. Those with the resources and motivation could manipulate the Big Data produced by Google and Facebook. Such players would not necessarily be limited to Russian cyber-spies and pranksters, but corporate and ideological players as well, like

    . There could have been a bit more in here on those concerns.

    The book offers plenty of anecdotal bits that could have been lifted from any of the other data books noted at the top of this review. What one needs, ultimately is smart, insightful analysis. Having all the data in the world (that means you, NSA) is merely a burden unless there is someone insightful enough to figure out the right questions to ask, and how to ask them.

    SS-D notes several Google (Trends, Ngrams, Correlate) services that might be familiar to folks doing actual research, but which were news to me. It might be useful to check out some of these, maybe even come up with meaningful queries to shed light on pressing, or even completely frivolous questions.

    Not all problems can be solved, or even examined by the addition of ever more data. Sometimes, many times, the information that is available is perfectly sufficient to the task, but other factors prevent the joining together of its various pieces to create a meaningful whole. The now classic example is from 9/11, when an absence of coordination between the CIA and FBI resulted in suicide bombers who could have been foiled succeeding in their mission. Politics and the culture of nations and organizations figure into how data is used

    So if everybody lies, is Seth Stephens-Davidowitz telling us the truth? I am sure there is a query one could construct that would look at diverse data sources, pull them all together and give us a fuller picture, but for now, we will have to make do with reading his book and articles, checking out his videos, applying the analytical tools already incorporated into our brains, and seeing if there is enough information there with which to come to a well-grounded conclusion. And that’s no lie.

    Review posted – May 5, 2017

    Publication date – May 9, 2017

    =============================

    Links to the author’s

    ,

    , and

    pages

    VIDEOS – SS-D speaking

    -----

    -----

    - Arts & Ideas at the JCCSF

    -----

    - The Julis-Rabinowitz Center for Public Policy and Finance

    The June 2017

    cover story has particular relevance to the treatment of actual truth in today's political environment. It is illuminating, if not exactly uplifting. -

    - By Yudhijit Bhattacharjee

    July 12, 2017 - Washington Post - one of the very serious applications of big data -

    - by Philip Bump

    July 15, 2017 - One of the ways big data gets compromised is via automated dishonesty -

    by Tim Wu - Thanks to Henry B for letting us know about the article

  • David

    This is an engaging book about how big data can be used to improve our understanding of human behavior, thinking, emotions, and preference. The basic idea is that if you ask people about their behavior or their preferences in surveys, even anonymous surveys, they will often lie. People do not like to admit to low-brow preferences; racists do not want to admit to their prejudices, most people who watch pornography do not want to admit to it, and even voting is often misrepresented; some people wh

    This is an engaging book about how big data can be used to improve our understanding of human behavior, thinking, emotions, and preference. The basic idea is that if you ask people about their behavior or their preferences in surveys, even anonymous surveys, they will often lie. People do not like to admit to low-brow preferences; racists do not want to admit to their prejudices, most people who watch pornography do not want to admit to it, and even voting is often misrepresented; some people who voted for Trump would not admit to it.

    But, by analyzing immense datasets from Google, public archives, social media, and the like, Seth Stephens-Davidowitz has been able to unearth a lot of fascinating answers to puzzling questions. For example, he is able to predict, through Google searches for various symptoms, who is likely to have early stages of pancreatic cancer. He can predict epidemic breakouts of some contagious diseases well before they are announced by the CDC (Center for Disease Control). He shows that the single factor that correlates with voting for Trump is that of racism.

    Then there are the fun factoids, about the sorts of things that people search for most often on Google. Most commonly, the search "Is my son ..." is followed by "gifted", while the search "Is my daughter ..." is followed by "overweight". That tells us something about stereotypes for the way people think about their children. Interestingly, the release of a new violent movie in a city is correlated with a

    in violent crime in that city. Perhaps the reason is that violent people who are watching the movie are not out on the streets, committing crimes.

    And here we get to the main problem with this sort of analysis. Undoubtedly, the research and analysis of big datasets is done correctly. However, once a surprising result is found, understanding the motivations behind the online activity are often subjective and open to interpretation. While this book is very careful about its underlying assumptions, it is a slippery road to getting the correct interpretations and explanations.

    This is an easy, well-paced book that should appeal to anybody who enjoys books like

    .

  • Trish

    Maybe everyone does lie. But they don’t lie all the time. Stephens-Davidowitz makes the good point that asking people directly doesn’t always, in fact may not often, yield true answers. People have their own reasons for answering pollsters untruthfully, but it is clear that this is a documented fact. People sometimes lie to pollsters.

    Stephens-Davidowitz was told by mentors and advisors not to consider Google searches worthwhile data, but the more he looked at it, the more he was convinced that G

    Maybe everyone does lie. But they don’t lie all the time. Stephens-Davidowitz makes the good point that asking people directly doesn’t always, in fact may not often, yield true answers. People have their own reasons for answering pollsters untruthfully, but it is clear that this is a documented fact. People sometimes lie to pollsters.

    Stephens-Davidowitz was told by mentors and advisors not to consider Google searches worthwhile data, but the more he looked at it, the more he was convinced that Google searches contained the best data for determining what people are concerned about. He has uncovered some interesting trends that are not apparent through direct questioning because people are sometimes ashamed of their fears, feelings, prejudices, and predilections.

    I didn’t really like this book. Partly the reason is because I listened to it, and Stephens-Davidowitz gives charts, graphs, data points that obviously cannot be represented in the audio version. These usually help me to grasp things easily and maybe bypass pages of material that is not as interesting to me. It wasn’t that his material was hard, it was that I oftentimes did not like what he was talking about. He had a tendency to focus on deviant behavior, e.g., sexual predators, abuse, porn, etc. One might make the argument that these behaviors are important to understand and therefore worth looking at. Possibly. However, if ‘everybody lies,’ one might make the argument that we do not have to look at deviance to find untruthfulness.

    What we discover is that to test Stephens-Davidowitz’s thesis that ‘everybody lies,’ we have to spend quite a lot of time with statistics and creating studies, or as he is wont to do, studying big data. Big data probably irons out discrepancies in the reasons for our Google searches, e.g., that it is not me that is interested in the herpes virus, it is my brother, because in the end it doesn’t matter why we did the search; what matters is that we did the search. Besides, maybe I’m lying about my brother having the virus, but my interest in the topic is not a lie.

    Stephens-Davidowitz has made a career so far out of the study of big data, showing us ways to slice and dice it so that it is useful to our view of the world. Only thing is, I am not as interested in what big data tells us as he is. He’d trained as an economist, and towards the end of the book he hit a couple of areas I did find more interesting, like the notion of regression discontinuity, a term used to describe a statistical tool created to measure the outcomes of people very close to some arbitrary cut-off.** S-D talks about using this tool on federal inmates, discovering criminals treated more harshly committed more crimes upon their release. But S-D also studied students on either side of the admissions cut-off for the prestigious Stuyvesant High School: those who attended Stuyvesant did not have a significant performance difference in later life than students who did not.

    Apparently Stephens-Davidowitz went into data science because of

    , the bestselling book by Steven D. Levitt. He believes that many of the next generation of scientists in every field will be data scientists. I did finish the audiobook, another study he took note of in the last pages. Apparently few readers finish ‘treatises’ by economists. He believes this is his big contribution to our knowledge base, and there is no doubt his contrariness did highlight ways big data can be used effectively.

    If I may be so bold, I might be able to suggest a reason why many female readers may not be as interested in the material presented, or in Stephens-Davidowitz himself (he was/is apparently looking for a girlfriend). Stay away from the deviant sex stuff, Seth. It may interest you but I can guarantee that fewer women are going to find that appealing or reassuring conversation or reading material.

    An interesting corollary to this economists’ data view is the question of whether the truth matters, which is how I came to pick up this book. Recently on PBS’ The Third Rail with Ozy, Carlos Watson asked whether the truth matters. At first blush the answer seems obvious, and two sides debated this question. One side said of course truth matters…but most of us know one man’s truth to be another man’s lie. The other side said ‘everybody lies.’ It got me to thinking…I do think the two ways of coming to the notion of lying dovetail at some point, and one has to conclude that truth may not matter as much as we think. What matters is what we believe to be true.

    Finally, it appears Stephens-Davidson agrees to some degree with Cathy O'Neill, author of

    , in that he agrees you best not let algorithms run without human tweaking and interference. The best outcomes are delivered when humans apply their particular observations and knowledge and expertise

    big data.

    ** S-D describes it this way:

  • Jessica

    This book tries too hard to be Freakonomics. The first two parts are full of random examples of interesting but mostly pointless things that can learned via Google search trends. However, a whole lot of assumptions are made off these bits of data that don't seem to have much basis in factual scientific methods of research. Unprofessional jokes are thrown in randomly. If you need a footnote to explain why a joke was not homophobic maybe you should have just skipped the joke. And any book of less

    This book tries too hard to be Freakonomics. The first two parts are full of random examples of interesting but mostly pointless things that can learned via Google search trends. However, a whole lot of assumptions are made off these bits of data that don't seem to have much basis in factual scientific methods of research. Unprofessional jokes are thrown in randomly. If you need a footnote to explain why a joke was not homophobic maybe you should have just skipped the joke. And any book of less than 300 pages of text should not need to use the same example three times, especially when it's about how the author can't believe women are concerned about the smell of their vagina.

    The last section of the book explains the limitations big data holds and is really the most grounded section, the rest being almost hagiography. It would have done a lot to work the third section into the examples of the first two sections. It would have balanced out the praise and also would have done much to explain the flaws present in some of the examples included.

    Some cool facts buried in a lot of murky oddness.

    Disclaimer: I was given this book in a Goodreads giveaway.

WISE BOOK is in no way intended to support illegal activity. Use it at your risk. We uses Search API to find books/manuals but doesn´t host any files. All document files are the property of their respective owners. Please respect the publisher and the author for their copyrighted creations. If you find documents that should not be here please report them


©2018 WISE BOOK - All rights reserved.