
 

 

What is Big Data?

Small data / Big data: material

The material of small data is indistinguishable from the substance of big data. Both are composed of numbers that measure (age, height, weight), words that describe (gender, race), directions that map (my phone constantly transmitting my movements), colors that shape (the color of my car, my house, my shirt), sounds that resonate (Amazon’s Alexa hears when I'm scolding my daughter), desires that impel (Google knows my searches), and all the rest that we can render as descriptive and predictive of humans in the world.

Small data / Big data: processing

Small data can be processed biologically; big data must be subjected to tables that organize, formulas that quantify, and then electric algorithms that render.

Big data: definition

Information increasing past human comprehension in three directions: Volume, Velocity, Variety.

 

 

  • Volume: Amount of information. Not a question of raw number. If you are house hunting and visit six homes in one afternoon, later on you will confuse yourself trying to remember which one had the large, sunny master bedroom. So, in this case, six is big data: it's beyond human comprehension. At the other extreme, 6 billion coin-flips is not big data. If we see a spreadsheet showing 2,999,127,184 heads, we can work out how many landed tails without a problem. So, big data is not about numbers, even though big data numbers are frequently very big. Instead, it is about the line between how much we can and cannot manage.
  • Velocity: Speed at which information is gathered, organized, and applied to lived experience. Big data velocity is not a raw number; instead, it is a human limit. For example, a human can understand a Van Gogh painting far more rapidly than any machine, but long division out past a hundred digits bogs us down, while it poses no difficulty for the big data machine called a calculator.
  • Variety: Information sources multiply: image information gleaned from video cameras synthesizes with naming information associated with facial recognition, and with consumption preferences gleaned from credit card data and so on.
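The coin-flip case under Volume can be worked through in a few lines. This is only an illustrative sketch: the function name `tails_from_heads` is hypothetical, and the two figures are the ones given in the text. It shows why six billion records remain humanly manageable when they reduce to a single subtraction.

```python
# Illustrative sketch: a very large dataset that stays comprehensible.
TOTAL_FLIPS = 6_000_000_000   # "6 billion coin-flips" from the text
HEADS = 2_999_127_184         # heads count from the imagined spreadsheet


def tails_from_heads(total: int, heads: int) -> int:
    """Every flip is heads or tails, so tails is simply the remainder."""
    if not 0 <= heads <= total:
        raise ValueError("heads must be between 0 and total")
    return total - heads


tails = tails_from_heads(TOTAL_FLIPS, HEADS)
print(f"{tails:,}")  # 3,000,872,816
```

The six houses have no such summary: each one carries many unrelated details, so a far smaller count overwhelms memory.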

Data pools: transparent, dark

Transparent pools of data derive from explicit exchanges of personal information for services. Instead of paying with cash, you sell a layer of yourself: privacy becomes a currency.

  • Tinder and Enolytics charge for their services, but not in dollars. Instead, they're paid with streams of knowledge about their users: details about their identity, location, disposition, and more.

 

 

 

The privacy debate surrounding transparent data pools involves informed consent. It's true that we check the small box acknowledging our acceptance of the terms of service before we download the app or before our account is activated, and so we all know, on some level, that we're willingly trading away our personal information. But questions remain:

  • Do users fully understand the extent of their exposure?
    • If they don't, is that on them? If users lack the patience to carefully read and understand the agreements they claim they've read and understood, then can they reasonably complain when they've signed away more privacy than they thought?
    • Or, are users being manipulated by small print and intentionally impenetrable language? Can users reasonably claim to be duped and exploited?

In dark data pools, third parties purchase information from direct collectors, then combine and process the data into richer profiles before selling those profiles back into the consumer-oriented marketplace.

  • Economic example: The data traded to Tinder (romance), Enolytics (wine), and WebMD (health), in exchange for zero-dollar convenience, is sold for money to Acxiom, and combined with other purchased information before being resold (again for money) to strategic merchants.
  • Human example: You delete your Tinder account. That suggests the app may have accomplished its purpose: you’ve entered a serious romance. Tinder doesn’t care too much about that information because their platform is oriented toward the young and single, but a travel company may want to combine the romantic news with your particular interest in wine. They offer you a romantic trip to an obscure wine-growing region that happens to produce the bottles you love. You may take the trip (with the partner Tinder set up), and it may be a great ten days. Later, another company may be interested in a mix of all that romantic data with some more recent information about your queries to WebMD. That may lead to targeted banner ads – biodegradable diapers, the virtues of Montessori preschool for cognitive development – popping up on your screens.

In both economic and human terms, datasets don’t accumulate as addition, but as multiplication. When information about different aspects of your life gets put together, the emerging profile can be very telling.
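One way to picture "multiplication rather than addition" is to count the distinct profiles that become possible once separate sources are joined. The sketch below is purely illustrative: the attribute values are hypothetical placeholders, not categories any of the named companies are known to collect.

```python
from itertools import product

# Hypothetical attribute values, invented for illustration only.
romance = ["single", "dating", "serious relationship"]
wine = ["red", "white", "sparkling", "natural"]
health = ["sleep", "allergies", "pregnancy", "fitness", "nutrition"]

# Kept apart, the three sources offer a simple sum of possible facts.
separate_facts = len(romance) + len(wine) + len(health)

# Joined, every combination across sources is a distinct possible profile.
combined_profiles = list(product(romance, wine, health))

print(separate_facts)          # 12
print(len(combined_profiles))  # 60
```

A combination such as ("serious relationship", "natural", "pregnancy") says something that none of the three sources says alone, which is the sense in which the joined profile is more telling than its parts.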

The privacy debate surrounding dark data pools involves ownership. When you download an app after trading a layer of privacy for the convenience, are you trading:

  1. A kind of license allowing that specific information to be used by that one platform, perhaps for a limited set of purposes?
  2. The information itself, which now becomes wholly the property of the platform, and may be packaged and resold and used in any way whatsoever?

There's a legal answer to this question (check the small print in the service agreement), but the ethical question involves the relation between individuals and the information describing them.

It could be argued that the bond between me and my data resembles the one between me and my biological life: the data can't be entirely separated from what it means to be me. If that's persuasive, then I maintain some claim on my personal information regardless of the boxes I’ve checked. Just as my life will always ultimately be mine, so too will the data that describes my life.

By contrast, if the information is conceived as something I fabricate, like a carpenter makes a chair, then when the data is sold, whoever acquires it bears no responsibility to the originator. (It has never happened that a carpenter barged into a client's house and asked for his chair back because the owner decided to paint it a different color.)

There is a curious middle ground. Sometimes artists will object to uses made of their works, and architects will object to remodeling efforts they view as crudely destructive of their buildings. Probably there's no legal force behind the protests, but they may find traction in human terms.

From small to big data: threshold

The elementary content of small data is indistinguishable from big data (numbers, words, directions, colors, sounds, desires). But, as the volume of information and the velocity of processing surge past human comprehension, the experience of the material changes.

The historical analogy is Zeno's paradox, but the contemporary comparison is the movement from still images to video. When a string of images is racked and flipped at a rate of 30 frames per second, what we see is not a vast number of individual pictures at a very high speed. Instead, we see something radically different. The single video is other than the frames; it's not an evolution, it's a revolution, a different kind of vision and reality.

There’s a threshold: the volume and speed of the individual images increase until the sequence collapses back into a unity. Multiple pictures become a single video.

Similarly, in the movement from small to big data, it's not more of the same, only faster. It’s a threshold. Big data doesn't evolve from small data experience; instead, it's a leap into a different reality.

Same material / Different realities = Threshold

 

Small data, big data, art

Small data is information humans can comprehend.

There is one exception: art. Any single true piece of art is small data bursting with significance that exceeds human comprehension. It always escapes full understanding, which is why we can return to it and see something new each time.

So, along one vector, art can be partly defined as this paradox: small data that surges to big data, but without the addition of superhuman velocity/volume/variety.

Same material / Different realities