CIS 2050 Lecture Notes - Lecture 6: Big Data, Data Science, Data Warehouse
Describe the rationale of privacy in society and the trade-offs
between privacy and information exchange
1.
Explain big data technology and some of the designs for
extracting information from aggregate patterns
2.
Identify scenarios in which privacy can be degraded and
describe the impact of their consequences on society
3.
Learning Outcomes:
Information privacy is a serious concern to society,
especially with the development of many innovative computer
algorithms for extracting personal information from
massive amounts of available data
!
The government, corporations and individuals have
responsibilities to ensure that privacy of information is
protected
!
Essentially, the "big data" method uses aggregate data to
extract underlying patterns
○
Information from these patterns can be interpreted and
used by decision-makers to make more informed
decisions
○
Big data and data mining technology can be used for
identifying product preferences in marketing, identifying voting groups
in elections and identifying consumer habits
○
Recent developments in big data and data mining technology
have opened a wide window for infringements of individual
privacy
!
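To make "aggregate data to extract underlying patterns" concrete, here is a minimal Python sketch. The purchase records, segment names and categories are invented for illustration; the point is only that no single record says much, while the grouped counts reveal a per-segment habit.

```python
from collections import Counter, defaultdict

# Hypothetical purchase records: (customer_segment, product_category).
purchases = [
    ("students", "snacks"), ("students", "snacks"), ("students", "textbooks"),
    ("parents", "diapers"), ("parents", "diapers"), ("parents", "snacks"),
    ("retirees", "gardening"), ("retirees", "books"), ("retirees", "gardening"),
]

def top_category_by_segment(records):
    """Aggregate individual records into per-segment counts and return
    the most frequent category for each segment (the extracted 'pattern')."""
    counts = defaultdict(Counter)
    for segment, category in records:
        counts[segment][category] += 1
    return {segment: c.most_common(1)[0][0] for segment, c in counts.items()}

print(top_category_by_segment(purchases))
```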
3D printing technologies that copy many existing designs
may also affect innovations in manufacturing
○
A serious privacy concern relates to how data are used in
social media such as Facebook and in search logs
!
The applications of big data analysis may not be fully
appreciated by the general public
!
Key Points:
There has recently been a call for a new professional role (the data
scientist) to implement and diffuse analytic
methodologies into and across organizations
○
There is a concern that there will not be enough of these
new professionals to meet the growing demand for this
analytics speciality
○
Academic community is promoting an emerging view of
analytics
!
Ex. Information processing has become increasingly
powerful and flexible, with faster and higher-capacity
storage and networks
○
Globalization and other competitive factors have exerted
strong pressures to improve efficiencies and
effectiveness, and to strengthen business and customer
relationships
○
Each successive stage of this competition requires more
data and more analysis to support strategic, managerial
and operational decision making
○
Competition drives the quest for more and better analytics technology, and this
technology in turn makes competition more
intense
○
More effective analytics enables a higher degree of
competition which creates further imperatives to make
analytics more effective
○
In wave after disruptive wave of technological and
organizational change, business leaders face a host of powerful
forces
!
Ex. Correlations, cluster analysis, filtering, decision trees,
Bayesian analysis, neural network analysis, regression
analysis, textual analysis, etc., are all in the analytics
arsenal, and none of these is particularly new
○
Software and data complexities can impede effective
analysis, and the results of complex analyses can be difficult
to interpret accurately and potentially perilously misleading
○
Change appears to be essentially incremental and does not
embody any fundamental paradigm shifts
○
New analytics employ essentially the same multivariate
inferential and descriptive statistical methods and mathematical
modeling techniques that have long been used by businesses for
analyzing data to support instances of complex decision making
!
Database management
!
Data warehousing
!
Data mining
!
Dashboards
!
Associated technologies
!
Essential tools and techniques for dealing with big data
include:
○
Concepts including data ambiguity, data filtering, data
context, data interpretation, data conversion and data
redundancy are not new
○
Aside from a few progressive data management
refinements, none of this is new
○
Big data is characterized by vast collections of variously
structured (even unstructured) data that, when appropriately
rationalized, can provide understanding of and insight into various
issues embedded within that data
!
Big data consists of expansive collections of data = large
volumes
○
Updated quickly and frequently = high velocity
○
Exhibits a huge range of different formats and content =
wide variety
○
Big data is different due to: volume, velocity and variety (the
three V's)
!
Big data has been accelerated by the growth of the web
!
Technologies for collecting, manipulating, transmitting
and analyzing data have existed for a long time
○
Technologies have reached and are surpassing a capacity
threshold for processing and storing data that is swamping
conventional levels of organizational ability to cope with
volumes of data being generated
○
Large scale analytics are critical for both sustaining
business competitiveness and enhancing day-to-day
decision making
○
Application of business analytics methods leads to
improvement in an organization's overall decision-making
capacity, which enhances its ability to conduct its
business intelligently
○
Businesses are experiencing ever-expanding cycles of change
caused by the interaction of competitive forces and the
harnessing of analytics and big data technologies as essential
competitive weaponry
!
The usage threshold has become one of utter dependence upon timely
information for basic competitive viability
○
Since the advent of computing (and networking) as a
profession, computing professionals have believed that this technology
is destined to become wholly integrated
into every operating and managerial function in every part of
every organization
!
Read: Beyond data and analysis
Warren and Brandeis: there is a "right to be left alone"
based on the principle of "inviolate personality"
○
Privacy debate has co-evolved with the development of
information technology
○
The first refers to the freedom to make one's own
decisions without interference by others in regard to
matters seen as intimate and personal
!
The second is concerned with the interest of individuals
in exercising control over access to information
about themselves
!
A distinction can be made between (1) constitutional (or
decisional) privacy and (2) tort (or informational) privacy
○
Information about oneself
"
Situations in which others could acquire
information about oneself
"
Technology that can be used to generate,
process or disseminate information about
oneself
"
Informational privacy in a normative sense refers
typically to a non-absolute moral right of persons to
have direct or indirect control over access to…
!
Statements about privacy can be either descriptive or
normative
○
The first, held by many in the IT and R&D
industries: we have zero privacy in the digital
age and there is no way we can protect it
"
Second: our privacy is more important than
ever and we must attempt to protect it
"
There are two reactions to the flood of new
technology and its impact on personal information
and privacy
!
Debates about privacy almost always revolve
around new technology, ranging from genetics and the
extensive study of biomarkers, brain imaging, drones,
wearable sensors/ sensor networks, social media, smart
phones, closed circuit television, government
cybersecurity programs, direct marketing, RFID tags, Big
Data, head-mounted displays and search engines
○
Value of privacy is reducible to these other values
or sources of value
!
Proposals mention property rights, security,
autonomy, intimacy or friendship
!
Hold that the importance of privacy should be
explained and its meaning clarified in terms of
those other values and sources of value
!
*The opposing view holds that privacy is valuable in
itself and its value and importance are not derived
from other considerations
!
Reductionist accounts argue that privacy claims are really
about other values and other things that matter from a
moral point of view
○
Notion of privacy is analyzed primarily in terms of
knowledge or other epistemic states
!
Subject
"
Set of propositions
"
Set of individuals
"
Privacy as a relation between three arguments:
!
A new type of privacy account has been proposed in
relation to new information technology, that
acknowledges that there is a cluster of related moral
claims (cluster accounts) underlying appeals to privacy,
but maintains that there is no single essential core of
privacy concerns
○
The first conceptualizes issues of informational privacy
in terms of 'data protection' and the second in terms
of 'privacy'
!
There is a difference between the US and European
approaches
○
Referential use = type of use that is made on the
basis of an acquaintance relationship of the speaker
with the object of his knowledge
!
Vs non-referential use
!
Personal data = data that can be linked with a natural
person
○
Prevention of harm
!
Information inequality
!
Information injustice and discrimination
!
Encroachment on moral autonomy
!
Moral reasons for protecting personal data:
○
The basic moral principle underlying these laws is the
requirement of informed consent by the data subject
for processing
!
Processing of personal information requires that its
purpose be specified, its use be limited, individuals
be notified and allowed to correct inaccuracies, and
the holder of the data be accountable to oversight
authorities
!
The challenge with privacy in the 21st century is to
assure that technology is designed in such a way
that it incorporates privacy requirements in the
software, architecture, infrastructure, and work
processes in a way that makes privacy violations
unlikely to occur
!
Data protection laws are in force in almost all countries
○
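The data protection principles listed above are legal requirements, but a system can enforce some of them in software. The sketch below is a hypothetical design (class and method names are invented) that ties each record to the purposes specified at collection, refuses other uses (which also blocks "function creep"), lets the subject correct data, and keeps an audit log for accountability. Notifying individuals is not modeled.

```python
from dataclasses import dataclass, field

class PurposeViolation(Exception):
    """Raised when data is requested for a purpose the subject did not agree to."""

@dataclass
class PersonalRecord:
    subject: str
    data: dict
    purposes: set                                   # purposes specified at collection
    audit_log: list = field(default_factory=list)   # supports accountability to oversight

    def use(self, purpose: str) -> dict:
        """Use limitation: release a copy of the data only for a specified purpose."""
        if purpose not in self.purposes:
            self.audit_log.append((purpose, "denied"))
            raise PurposeViolation(f"{purpose!r} not consented to by {self.subject}")
        self.audit_log.append((purpose, "granted"))
        return dict(self.data)

    def correct(self, key: str, value) -> None:
        """Let the data subject correct an inaccuracy."""
        self.data[key] = value
        self.audit_log.append(("correction", key))

# Usage
record = PersonalRecord("alice", {"email": "alice@example.com"}, {"billing"})
record.use("billing")                  # allowed: purpose was specified
try:
    record.use("marketing")            # function creep: refused
except PurposeViolation as err:
    print("refused:", err)
record.correct("email", "alice@example.org")
print(record.audit_log)
```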
Conceptions of privacy and the value of privacy:
!
Information technology = automated systems for storing,
processing and distributing information
○
Rapid changes have increased the need for careful
consideration of the desirability of effects
○
As connectivity increases access to information, it also
increases the possibility for agents to act based on new
sources of information
○
Cookies = small pieces of data that web sites store
on the user's computer, in order to enable
personalization of the site
!
Some cookies can be used to track the user across
multiple sites (=tracking cookies)
!
Major theme in the discussion of Internet privacy
revolves around use of cookies
○
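A simplified model of how a tracking cookie links visits across unrelated sites: a third-party server embedded on several sites assigns the browser a random ID on first contact, and the browser sends that same ID back on every later request to that server. The tracker class and the cookie name `tid` are hypothetical, and real browser cookie policies are ignored.

```python
import uuid
from http.cookies import SimpleCookie

class ThirdPartyTracker:
    """Hypothetical tracking server embedded (e.g. as an ad or pixel) on many sites."""

    def __init__(self):
        self.history = {}  # tracking ID -> list of sites where the browser was seen

    def handle_request(self, cookie_header: str, embedding_site: str):
        """Return a Set-Cookie value on first contact, otherwise None."""
        cookie = SimpleCookie(cookie_header)
        set_cookie = None
        if "tid" in cookie:
            tid = cookie["tid"].value          # returning browser: same ID as before
        else:
            tid = uuid.uuid4().hex             # first contact: assign a random ID
            new_cookie = SimpleCookie()
            new_cookie["tid"] = tid
            set_cookie = new_cookie.output(header="").strip()
        self.history.setdefault(tid, []).append(embedding_site)
        return set_cookie

# Usage: one browser visits three unrelated sites that all embed the tracker.
tracker = ThirdPartyTracker()
stored_cookie = ""  # what the browser keeps for the tracker's domain
for site in ["news.example", "shop.example", "health.example"]:
    response_cookie = tracker.handle_request(stored_cookie, site)
    if response_cookie:
        stored_cookie = response_cookie
print(tracker.history)
```

The tracker never learns a name, yet it ends up with one browsing history spanning all three sites.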
In cloud computing, both data and programs are
online and it is not always clear what the user-
generated and system-generated data are used for
!
The recent development of cloud computing increases
many privacy concerns
○
Not only data explicitly entered by the user, but
also numerous statistics on user behaviour
!
Data mining can be employed to extract patterns
from such data, which can then be used to make
decisions about the user
!
Big Data may be used in profiling the user, creating
patterns of typical combinations of user properties,
which can then be used to predict interests and
behaviour
!
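One simple way to "create patterns of typical combinations of user properties" and use them to predict interests is co-occurrence counting. This is a hypothetical sketch with invented profiles, not a description of any particular company's profiling system.

```python
from collections import Counter
from itertools import permutations

# Hypothetical interest profiles observed for many users.
profiles = [
    {"hiking", "camping", "photography"},
    {"hiking", "camping"},
    {"hiking", "photography"},
    {"gaming", "anime"},
    {"gaming", "anime", "pizza"},
]

# Pattern extraction: count how often each ordered pair of interests appears together.
co_occurrence = Counter()
for profile in profiles:
    for a, b in permutations(profile, 2):
        co_occurrence[(a, b)] += 1

def predict_interests(known, top_n=1):
    """Score interests the user has not shown by how often they co-occur with known ones."""
    scores = Counter()
    for (a, b), count in co_occurrence.items():
        if a in known and b not in known:
            scores[b] += count
    return [item for item, _ in scores.most_common(top_n)]

print(predict_interests({"camping"}))
```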
All of this data could be used to profile
citizens
"
How to obtain permission when the
user does not explicitly engage in a
transaction
!
How to prevent "function creep" (data
being used for different purposes after
they are collected)
!
Specific challenges:
"
Concerns could also arise from genetic data
"
Similarly, data may be collected when shopping,
when being recorded by surveillance cameras, or when
using smartcard-based public transport payment
systems
!
Users generate loads of data when online
○
"reconfigurable technology" that handles personal
data raises the question of user knowledge of the
configuration
!
Cell phones typically contain a range of data-generating
sensors, including GPS, movement sensors, and cameras,
and may transmit the resulting data via networks
○
Many devices contain chips or are connected to the
Internet of Things
!
Radio frequency identification (RFID) chips can be read from a limited
distance
!
EU and US passports have RFID chips with
protected biometric data
!
"smart" RFIDs are also embedded in public
transport payment systems
!
"Dumb" RFIDs, basically containing only a number,
appear in many kinds of products as a replacement
for the barcode and for use in logistics
!
Such devices generate statistics and can be
used for mining and profiling
"
Ambient intelligence and ubiquitous
computing, along with the Internet of Things,
also enable automatic adaptation of the
environment to the user, based on explicit
preferences and implicit observations
"
In the home there are smart meters for
automatically reading and sending electricity
consumption, and thermostats that can be remotely
controlled by the owner
!
Devices connected to the internet are not limited to user-
owned computing devices
○
Ex. Biometric passports, online e-government
services, voting systems, a variety of online citizen
participation tools and platforms or online access to
recordings of sessions of parliament and
government committee meetings
!
Government and public administration have undergone
radical transformations as a result of the availability of
advanced IT systems
○
Impact of information technology on privacy:
!
Provides a set of rules and guidelines for designing a
system with a certain value in mind (such as
privacy)
!
Value Sensitive Design provides a "theoretically
grounded approach to the design of technology that
accounts for human values in a principled and
comprehensive manner throughout the design process"
○
Data protection needs to be viewed in
proactive rather than reactive terms
"
Provides high-level guidelines in the form of seven
principles for designing privacy-preserving systems
!
Privacy by Design approach specifically focuses on
privacy
○
Privacy Impact Assessment approach proposes "a
systematic process for evaluating the potential effects on
privacy of a project, initiative or proposed system or
scheme"
○
Payment Card Industry Data Security Standard (PCI DSS): gives clear
guidelines for privacy and security sensitive
systems design in the domain of the credit card
industry and its partners
!
Various International Organization for
Standardization (ISO) standards serve as a source of
best practices and guidelines
!
The EU Data Protection Directive is based on Fair
Information Practices --> transparency, purpose,
proportionality, access, transfer
!
There are several industry guidelines that can be used to
design privacy preserving IT systems
○
Ex. Privacy Coach supports customers in making
privacy decisions when confronted with RFID tags
!
Specific solutions to privacy problems aim at increasing
the level of awareness and consent of the user
○
Allow users to anonymously browse the web
or share content
"
Employ a number of cryptographic
techniques and security protocols in order to
ensure their goal of anonymous
communication
"
Rely on the fact that numerous users use the
system at the same time, which provides
k-anonymity
"
Downside: susceptible to an attack
where the anonymity of the user is no
longer guaranteed
!
Tor: messages are encrypted and routed
among numerous different computers,
thereby obscuring the original sender of the
message
"
Freenet: content is stored in encrypted form
from all users of the system
"
*could be infected by a Trojan horse
that monitors all communication and
knows the identity of the user
!
Provides plausible deniability and privacy
"
Ex. Communication-anonymizing tools (Tor,
Freenet), and identity-management systems
!
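The layering idea behind Tor's onion routing can be sketched in a few lines. This is a toy: the cipher below is a hash-based XOR keystream built from the standard library (not secure, and not what Tor uses), the relay keys are simply generated locally rather than negotiated, and the routing information carried inside each layer is omitted. It only shows that the sender wraps one layer per relay and each relay can remove exactly one.

```python
import hashlib
import secrets

def keystream_xor(key: bytes, data: bytes) -> bytes:
    """Toy stream cipher: XOR data with SHA-256(key || counter) blocks.
    Illustration only; NOT a secure or vetted cipher."""
    out = bytearray()
    for i in range(0, len(data), 32):
        block = hashlib.sha256(key + i.to_bytes(8, "big")).digest()
        out.extend(b ^ k for b, k in zip(data[i:i + 32], block))
    return bytes(out)

# Hypothetical three-relay circuit, one secret key per relay.
relays = ["guard", "middle", "exit"]
keys = {name: secrets.token_bytes(32) for name in relays}

def build_onion(message: bytes) -> bytes:
    """Sender wraps the exit's layer first, so the guard's layer is outermost."""
    onion = message
    for name in reversed(relays):
        onion = keystream_xor(keys[name], onion)
    return onion

def route(onion: bytes) -> bytes:
    """Each relay in turn peels exactly one layer with its own key."""
    for name in relays:
        onion = keystream_xor(keys[name], onion)
        print(f"{name} peeled a layer")
    return onion

packet = build_onion(b"GET /index.html")
print(route(packet))
```

In the real design each relay also learns only its previous and next hop, which is what hides the link between sender and destination.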
Another tool for providing anonymity is the
anonymization of data through special software
!
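Anonymizing data through software often aims at k-anonymity: every combination of quasi-identifiers (attributes such as ZIP code and age that could be linked to other sources) must be shared by at least k records. The records and the generalization rule (3-digit ZIP prefix, age by decade) below are hypothetical. Note that k-anonymity alone does not stop every attack; for example, if all records in a group share the same sensitive value, that value is still revealed.

```python
from collections import Counter

# Hypothetical records: (zip_code, age, diagnosis). ZIP and age are
# quasi-identifiers; diagnosis is the sensitive attribute.
records = [
    ("47677", 29, "flu"), ("47602", 22, "asthma"), ("47678", 27, "flu"),
    ("47905", 43, "diabetes"), ("47909", 48, "flu"), ("47906", 47, "asthma"),
]

def generalize(record):
    """Coarsen quasi-identifiers: keep the 3-digit ZIP prefix, bucket age by decade."""
    zip_code, age, diagnosis = record
    decade = (age // 10) * 10
    return (zip_code[:3] + "**", f"{decade}-{decade + 9}", diagnosis)

def is_k_anonymous(rows, k):
    """True if every combination of quasi-identifiers appears at least k times."""
    groups = Counter((zip_code, age) for zip_code, age, _ in rows)
    return all(count >= k for count in groups.values())

generalized = [generalize(r) for r in records]
print(is_k_anonymous(records, 2))      # raw data
print(is_k_anonymous(generalized, 2))  # after generalization
```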
Growing number of software tools are available that
provide some form of privacy for their users = privacy
enhancing technologies
○
Modern cryptographic techniques are essential in
any IT system
!
Various techniques exist for searching through
encrypted data, which provides a form of privacy
protection and selective access to sensitive data
!
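One simple way to search through encrypted data is a keyword "blind index": the client replaces each keyword with a keyed hash (HMAC) before uploading, so the server can match search tokens without learning the words. This is a minimal sketch with hypothetical function names; it still leaks which documents match the same query, and real searchable-encryption schemes are more involved.

```python
import hashlib
import hmac
import secrets
from collections import defaultdict

client_key = secrets.token_bytes(32)  # stays on the client, never sent to the server

def token(keyword: str) -> str:
    """Keyed hash of a keyword; the server cannot reverse it without the key."""
    return hmac.new(client_key, keyword.lower().encode(), hashlib.sha256).hexdigest()

# Server-side index: opaque token -> set of document IDs.
server_index = defaultdict(set)

def upload(doc_id: str, keywords):
    # In a real system the document body would also be encrypted before upload.
    for keyword in keywords:
        server_index[token(keyword)].add(doc_id)

def search(keyword: str):
    # The server only ever compares tokens; it never sees the keyword itself.
    return sorted(server_index.get(token(keyword), set()))

upload("doc1", ["invoice", "2023"])
upload("doc2", ["medical", "invoice"])
print(search("invoice"))
```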
Allows a data processor to process encrypted
data
"
Original user can then again decrypt the
result and use it without revealing any
personal data to the data processor
"
Could be used to aggregate encrypted data
thereby allowing both privacy protection and
useful aggregate information
"
A new technique used for designing privacy-preserving
systems = 'homomorphic encryption'
!
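The aggregation use case above can be illustrated with the Paillier cryptosystem, which is additively homomorphic: multiplying two ciphertexts yields an encryption of the sum of their plaintexts. Paillier is a partially homomorphic scheme chosen here for brevity, not a fully homomorphic one, and the primes are tiny, so this sketch is insecure. It needs Python 3.9+ for `math.lcm`.

```python
import math
import secrets

# Toy primes (the 10,000th and 100,000th primes): far too small to be secure.
P, Q = 104_729, 1_299_709

def keygen(p=P, q=Q):
    n = p * q
    lam = math.lcm(p - 1, q - 1)
    mu = pow(lam, -1, n)          # valid because we use the generator g = n + 1
    return n, (lam, mu)

def encrypt(n, m):
    """c = g^m * r^n mod n^2, with g = n + 1 so that g^m = 1 + m*n (mod n^2)."""
    n2 = n * n
    while True:
        r = secrets.randbelow(n - 1) + 1
        if math.gcd(r, n) == 1:
            break
    return ((1 + m * n) * pow(r, n, n2)) % n2

def decrypt(n, private_key, c):
    lam, mu = private_key
    u = pow(c, lam, n * n)
    return ((u - 1) // n * mu) % n

def add_encrypted(n, c1, c2):
    """Multiplying ciphertexts adds the hidden plaintexts (mod n)."""
    return (c1 * c2) % (n * n)

# Usage: a data processor totals salaries it never sees individually.
n, private_key = keygen()
salaries = [52_000, 61_000, 58_000]
ciphertexts = [encrypt(n, s) for s in salaries]   # done by each data owner

total = ciphertexts[0]
for c in ciphertexts[1:]:
    total = add_encrypted(n, total, c)            # done by the processor

print(decrypt(n, private_key, total))             # done by the key holder
```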
Cryptography has been used as a means to protect data
○
Companies can gather a large amount of data
and build detailed profiles of users
"
Profiling becomes even easier if the profile
information is combined with other
techniques such as implicit authentication via
cookies and tracking cookies
"
Requiring a direct link between online and 'real
world' identities is problematic from a privacy
perspective, because it allows profiling of users
!
Users could no longer be tracked across different
services because they can use different
attributes to access different services, which
makes it difficult to trace online identities
over multiple transactions
"
From a privacy perspective a better solution would
be the use of attribute-based authentication which
allows access of online services based on the
attributes of users
!
'Single sign on' frameworks provided by independent
third parties make it easy for users to connect to
numerous online services using a single online identity
○
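A rough sketch of attribute-based access: the issuer vouches only for an attribute (e.g. "over 18"), and a service checks that attribute without ever seeing a name. As a simplifying assumption, the issuer and the service share an HMAC key so everything stays in the standard library; real attribute-based credential systems use public-key (anonymous credential) cryptography so that services hold no issuer secret and presentations are harder to link. All names are hypothetical.

```python
import hashlib
import hmac
import json
import secrets

# Simplifying assumption: issuer and services share this key.
ISSUER_KEY = secrets.token_bytes(32)

def issue_credential(attributes: dict) -> dict:
    """Issuer (who has already checked the user's ID) signs attributes only, no name."""
    payload = json.dumps({"attrs": attributes, "nonce": secrets.token_hex(8)}, sort_keys=True)
    tag = hmac.new(ISSUER_KEY, payload.encode(), hashlib.sha256).hexdigest()
    return {"payload": payload, "tag": tag}

def service_allows(credential: dict, required_attribute: str) -> bool:
    """Service verifies the issuer's tag, then checks the one attribute it needs."""
    expected = hmac.new(ISSUER_KEY, credential["payload"].encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, credential["tag"]):
        return False
    return json.loads(credential["payload"])["attrs"].get(required_attribute) is True

# Usage: separate credentials per service, each revealing a single attribute.
wine_shop_credential = issue_credential({"age_over_18": True})
library_credential = issue_credential({"local_resident": True})
print(service_allows(wine_shop_credential, "age_over_18"))
print(service_allows(library_credential, "age_over_18"))
```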
How can information technology itself solve privacy concerns:
!
When computers are connected directly to the brain, not
only behavioural characteristics are subject to
privacy considerations, but even one's thoughts
might run the risk of becoming public
!
It could become possible to change one's behaviour
by means of this technology
!
Brain-computer interfaces:
○
Oversharing may become an accepted practice
within certain groups
!
Technological changes influence the privacy norms
themselves
○
It may be more feasible to protect privacy by
transparency
!
Question: is it feasible to protect privacy by trying to hide
information from parties who may use it in undesirable
ways?
○
Could be used to impose restrictions at a regulatory
level, in combination with or as an alternative to
empowering users, thereby potentially contributing
to the prevention of moral or informational
overload on the user's side
!
Challenges lie in its translation to social effects and
social sustainability
!
The precautionary principle might have a role in dealing with
emerging information technologies
○
Ex. Effects of social network sites on friendship,
and the verifiability of results of electronic
elections
!
Note that not all social effects of information technology
concern privacy
○
Emerging technologies and our understanding of privacy:
!
Read: Privacy and Information Technology
When combined with the Supreme Court of Canada's recent
decisions that emphasized the importance of fair dealing as
users' rights, the law now features considerable flexibility that
allows Canadians to make greater use of works without prior
permission or fear of liability
!
New Bill C-11
!
Law also legalizes format shifting and the creation
of backup copies
!
Will be helpful for those seeking to digitize content,
transfer content to portable devices, or create
backups to guard against accidental deletion or data
loss
!
Ex. Time shifting (or the recording of television shows) is
now legal in Canada
○
Law now features a wide range of user-orientated provisions
that legalize common activities
!
Five previous ones: research, private study, criticism,
review and news reporting
○
Law now features considerable flexibility that allows
Canadians to make greater use of works without prior
permission or fear of liability
○
Scope of fair dealing has been expanded with the addition of
three new purposes: education, satire and parody
!
The provision is often referred to as the "YouTube exception"
○
Law also includes a user-generated content provision that
establishes a legal safe harbour for creators of non-commercial
user-generated content (such as remixed music, mashup videos
or home movies with commercial music in the background)
!
Some exceptions to this prohibition include the ability to
circumvent the digital lock to protect personal
information, unlock a cellphone or access content if the
person has a perceptual disability
○
Most significant new restriction involves the controversial
digital lock rules that prohibit bypassing technological
protections found on DVDs, software and electronic books
!
Ex. Law now includes a cap of $5000 for all non-
commercial infringement
○
Change reduces likelihood of lawsuits against individuals
for non-commercial activities (including unauthorized
downloading or mistaken reliance on fair dealing)
○
Law generally tries to target genuinely "bad actors" while
leaving individuals alone
!
Allows rights holders to send notifications alleging
infringement to Internet providers, who must forward the
notices to their subscribers
○
The approach to unauthorized downloading now centers on a
"notice-and-notice" system
!
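The notice-and-notice flow described above can be modeled as a tiny message-forwarding step. This is a hypothetical sketch (class names, addresses and fields are invented); it captures only that the provider forwards the rights holder's notice to the subscriber, and models no legal details beyond that.

```python
from dataclasses import dataclass

@dataclass
class Notice:
    rights_holder: str
    ip_address: str
    work: str

class InternetProvider:
    def __init__(self, subscribers: dict):
        self.subscribers = subscribers   # hypothetical: IP address -> subscriber contact
        self.inbox = {}                  # subscriber -> notices forwarded to them

    def receive_notice(self, notice: Notice) -> bool:
        """Forward the notice to the matching subscriber. In this sketch the rights
        holder only learns whether it was forwarded, not who the subscriber is."""
        subscriber = self.subscribers.get(notice.ip_address)
        if subscriber is None:
            return False
        self.inbox.setdefault(subscriber, []).append(notice)
        return True

isp = InternetProvider({"203.0.113.7": "subscriber-42@example.net"})
sent = isp.receive_notice(Notice("Studio X", "203.0.113.7", "Movie Y"))
print(sent, isp.inbox)
```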
Canadian digital lock rules are among the most restrictive
in the world, but do not carry significant penalties for
individuals
○
It is not an infringement to possess tools or software that
can be used to circumvent digital locks
○
Liability is limited to actual damages in non-commercial
cases
○
Circumventing a digital lock raises different legal issues
!
Read: What new copyright law means to you
Do fundamentally different things
○
More data allows us to see "new", "better" and "different"
!
Effective use of big data can be extremely useful
!
Copying, searching, processing and sharing information is
currently very easy
!
Stationary/static --> fluid/dynamic
!
Car fatigue or posture could also be datafied
○
Putting things into data (ex. location has been "datafied")
!
--> more information
!
Throw data at problem and make computer figure it out
itself
○
More data --> increasing accuracy of prediction
○
Ex. Biopsy of cancer cells --> determined 3 additional
signs
○
One of the most impressive areas is machine learning
(a branch of artificial intelligence)
!
Predictive policing
○
Algorithms could predict what we are about to do
○
Challenge: safeguarding free will
○
It will improve our lives but also has consequences
!
May completely eliminate jobs
○
Transform how we live, work and think
○
Big data is going to steal our jobs in the same way that factory
automation did
!
Humanity can learn from the information we can collect
!
Watch: Big data is better data
The vast majority of content is put up by average users (reviews,
YouTube)
○
Has become a more interactive place
○
In its first decade the web was a static place; it is now more
dynamic with the rise of social media and social networks
!
Allows one to create online persona with little technical
skill
○
Put a lot of personal information online --> behavioural
and demographic information
○
Can create models to predict attributes of individuals
○
Facebook has 1.2 billion users/month
!
Target has purchase history for thousands of customers
○
Has pregnancy score --> purchases such as vitamins, size
of purses, etc.
○
Target sent advertisements about baby supplies to a 15-year-old girl
before she had even told her parents she was pregnant
!
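A retailer's "pregnancy score" can be thought of as a weighted sum of purchase signals passed through a logistic function. The items, weights and bias below are entirely hypothetical; the notes do not describe Target's actual model.

```python
import math

# Hypothetical weights: how strongly each purchase is associated with the
# predicted attribute, as a retailer's model might learn from past customers.
WEIGHTS = {
    "prenatal vitamins": 2.0,
    "unscented lotion": 1.2,
    "oversized handbag": 0.8,
    "cotton balls": 0.5,
    "beer": -1.0,
}
BIAS = -3.0

def prediction_score(basket):
    """Logistic score in (0, 1) computed from a customer's purchase history."""
    z = BIAS + sum(WEIGHTS.get(item, 0.0) for item in basket)
    return 1 / (1 + math.exp(-z))

shopper_a = ["prenatal vitamins", "unscented lotion", "oversized handbag", "cotton balls"]
shopper_b = ["beer", "cotton balls"]
print(round(prediction_score(shopper_a), 2), round(prediction_score(shopper_b), 2))
```

No single purchase is revealing on its own; it is the combination that pushes the score up.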
Political preference
○
Personality score
○
Gender
○
Sexual orientation
○
Age
○
Intelligence
○
How much you trust the people you know and how strong
the relationships are
○
Patterns of behaviour detected from millions of users allow
one to make predictions about individual circumstances or
behaviour
!
Liking page for curly fries
!
*content is irrelevant to attribute
!
Propagated through network of similar people
!
Indicative of high intelligence:
○
People are friends with people like them (well
established)
○
Know a lot about how information spreads (similar to
disease)
○
Study: looked at Facebook likes to determine these attributes
(and other things)
!
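The homophily idea ("people are friends with people like them") already supports a crude predictor: guess an unknown attribute from the most common value among a user's friends. This majority-vote heuristic is a hypothetical illustration, not the method used in the study mentioned above; the graph and trait labels are invented.

```python
from collections import Counter

# Hypothetical friendship graph and a trait already known for some users.
friends = {
    "ana": ["ben", "cho", "dev"],
    "ben": ["ana", "cho"],
    "cho": ["ana", "ben"],
    "dev": ["ana", "eli"],
    "eli": ["dev"],
    "zoe": ["ana"],
}
known_trait = {"ben": "A", "cho": "A", "dev": "B", "eli": "B"}

def predict_trait(user):
    """Guess a user's unknown trait as the most common trait among their friends."""
    votes = Counter(known_trait[f] for f in friends[user] if f in known_trait)
    return votes.most_common(1)[0][0] if votes else None

print(predict_trait("ana"))  # friends ben (A), cho (A), dev (B)
print(predict_trait("zoe"))  # only friend is ana, whose trait is unknown
```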
Revenue models for most social media companies
rely on exploiting or sharing user data in some way
!
Facebook: users are not the customers, but the
product
!
Policy and law --> users control their own data
○
Allow people to encrypt data they upload
○
Users don’t have much control over how this data is used
!
Users should be informed and consenting
!
Watch: Curly fry conundrum: why social media "likes" say more than
you might think
Internet has become a zone of mass, indiscriminate surveillance
!
Only "bad" people seek out privacy (narrow conception)
○
Only those who challenge power have something to
worry about
!
Implicit bargain: if you're willing to render yourself
harmless to political power, then you can be free of the
dangers of surveillance
○
Debate about being monitored: good vs bad individuals
!
Privacy is no longer a "social norm"
!
Make judgements of what we are willing to let other
people know
○
Make decisions of what is expected by others
!
Behaviour changes when we believe we are being
watched (more conformist and compliant)
○
Everyone has something to hide
!
Could watch the students but the students could not see
into tower (do not know when they are being watched)
○
Therefore, the individuals will always act like they are
being watched
○
Jeremy Bentham: the architectural design of the panopticon
(initially used in prisons) was applied to educational institutions
in the 18th century
!
Mass surveillance creates a "prison in the mind"
○
This is more effective than brute force
○
This mindset was the key means of societal control
!
Never have a private moment --> obedience
○
Orwell's 1984 warns that we could be watched at any given
moment
!
Creativity and dissent reside exclusively in private
places away from being monitored --> essence of human
freedom
○
A society that breeds conformity, obedience and submission
!
"He that does not move does not notice his chains"
○
System of mass surveillance limits us
!
Watch: Why privacy matters
Tattoos tell a lot of stories
!
Programs we use store our data and tell our story, similar to a
tattoo
!
Facial recognition
○
Track our movements, clothing choices…etc
!
Face.com --> 18 billion faces online
○
= electronic tattoos
!
--> immortality through our electronic tattoo
!
Constantly changing reputation
○
Do not look too far into the past of people we love
○
Narcissus: don't fall in love with your own reflection
○
Lessons learned from Greek mythology:
!
Watch: Your online life, permanent as a tattoo
"Big data" refers to a newly emerging scenario in which
enormous volumes of data are available, either publicly or
discreetly
○
The types of data may be in many forms and formats, and
may even be stored in different places
○
"Big data" technology refers to a collection of
computational techniques developed to process and
analyze these diverse data into interpretable or actionable
consequences
○
What is "big data"? What are some of the characteristics of "big
data" technology?
1.
Cryptography is the study of methods that encrypt private
information to conceal it from being viewed by anyone
other than the intended recipient
○
Using cryptography can prevent malicious attackers from
getting sensitive information such as credit card numbers
or passwords
○
What is cryptography, and why should people use it?
2.
Questions:
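For the cryptography question, a minimal illustration of encryption is the one-time pad: XOR the message with a random key of the same length, and only a holder of the key can recover it. It is used here only because Python's standard library has no general-purpose cipher; practical systems use vetted ciphers such as AES from a cryptography library, since one-time pad keys are as long as the message and must never be reused.

```python
import secrets

def otp_encrypt(plaintext: bytes):
    """One-time pad: XOR with a random key as long as the message.
    The key must be truly random, kept secret and never reused."""
    key = secrets.token_bytes(len(plaintext))
    ciphertext = bytes(p ^ k for p, k in zip(plaintext, key))
    return key, ciphertext

def otp_decrypt(key: bytes, ciphertext: bytes) -> bytes:
    """XOR with the same key undoes the encryption."""
    return bytes(c ^ k for c, k in zip(ciphertext, key))

key, ct = otp_encrypt(b"card 4111 1111 1111 1111")
print(ct.hex())              # unreadable without the key
print(otp_decrypt(key, ct))  # only the key holder recovers the card number
```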
Privacy and Information Technology
#$%&'()*+,-./&%)&*, 01+,2304
5654,78
Describe the rationale of privacy in society and the trade-offs
between privacy and information exchange
1.
Explain the big data technology and some of the design in
extracting information from aggregate patterns
2.
Identify the scenarios when privacy can be degraded and
describe their impact of the consequences on society
3.
Learning Outcomes:
Privacy of information is of a serious concern to society,
especially with the development of many innovative designs of
computer algorithms in extracting personal information from
the massively available data
!
The government, corporations and individuals have
responsibilities to ensure that privacy of information is
protected
!
Essentially, the method of "big data uses aggregate data to
extract underlying patterns
○
The information of the patterns can be interpreted and
used by decision-makers to make more informed
decisions
○
The big data and data mining technology can be used in
marketing product preferences, identifying voting groups
in elections and identifying consumer habits
○
Recent developments in big data and data mining technology
have opened a big window that may infringe on the individual
privacy
!
3D printing technologies that copy many existing designs
may also affect innovations in manufacturing
○
A serious privacy concern is related to how data are used in
social media such as Facebook, and search log data
!
The applications of big data analysis may not be fully
appreciated by the general public
!
Key Points:
Recently been a call for new professional role (data
scientist) to implement and diffuse analytic
methodologies into and across organizations
○
There is a concern that there will not be enough of these
new professionals to meet the growing demand for this
analytics speciality
○
Academic community is promoting an emerging view of
analytics
!
Ex. Information processing has become increasingly more
powerful and flexible, with faster and higher-capacity
storage and networks
○
Globalization and other competitive factors have exerted
strong pressures to improve efficiencies and
effectiveness, and to strengthen business and customer
relationships
○
Each successive stage of this competition requires more
data and more analysis to support strategic, managerial
and operational decision making
○
Quest for more and better analytics technology and this
technology in turn helps to make competition more
intense
○
More effective analytics enables a higher degree of
competition which creates further imperatives to make
analytics more effective
○
In wave after disruptive wave of technological and
organizational change, business leaders face a host of powerful
forces
!
Ex. Correlations, cluster analysis, filtering, decision trees,
Bayesian analysis, neural network analysis, regression
analysis, textual analysis…etc are all in the analytics
arsenal and none of these is particularly new
○
Software and data complexities can impede effective
analysis, and interpreting the results of complex analyses
accurately can be potentially perilously misleading
○
Change appears to be essentially incremental and does not
embody any fundamental paradigm shifts
○
New analytics employ essentially the same mutlivariate
inferential and descriptive statistical methods and mathematical
modeling techniques that have long been used by businesses for
analyzing data to support instances of complex decision making
!
Database management
!
Data warehousing
!
Data mining
!
Dashboards
!
Associated technologies
!
Essential tools and techniques for dealing with big data
include:
○
Concepts including data ambiguity, data filtering, data
context, data interruption, data conversion and data
redundancy are not new
○
Aside from a few progressive data management
refinements, none of this is new
○
Big data is characterized by vast collections of variously
structures (even unstructured) data that, when appropriately
rationalized, can provide understanding and insight into various
issues that reside embedded within that data
!
Big data consists of expansive collections of data = large
volumes
○
Updated quickly and frequently = high velocity
○
Exhibit huge range of different formats and content =
wide variety
○
Big data is different due to: volume, velocity and variety (the
three V's)
!
Big data has been accelerated by the growth of the web
!
Technologies for collecting, manipulating, transmitting
and analyzing data for a long time
○
Technologies have reached and are surpassing a capacity
threshold for processing and storing data that is swamping
conventional levels of organizational ability to cope with
volumes of data being generated
○
Large scale analytics are critical for both sustaining
business competitiveness and enhancing day-to-day
decision making
○
Application of business analytics methods leads to
improvement in a organization's overall decision-making
capacity, which enhances its ability to conduct its
business intelligently
○
Businesses are experiencing ever-expanding cycles of change
caused by the interaction of competitive forces and the
harnessing of analytics and big data technologies as essential
competitive weaponry
!
Usage threshold is that of utter dependence upon timely
information for basic competitive viability
○
Since the advent of computing (and networking) as a
profession, computing professionals have believed that this
technology is destined, over time, to become wholly integrated
into every operating and managerial function in every part of
every organization
!
Read: Beyond data and analysis
Warren and Brandeis: there is a "right to be left alone"
based on the principle of "inviolate personality"
○
Privacy debate has co-evolved with the development of
information technology
○
The first refers to the freedom to make one's own
decisions without interference by others in regard to
matters seen as intimate and personal
!
The second is concerned with interest of individuals
in exercising control over access to information
about themselves
!
A distinction can be made between (1) constitutional (or
decisional) privacy and (2) tort (or informational) privacy
○
Information about oneself
"
Situations in which others could acquire
information about oneself
"
Technology that can be used to generate,
process or disseminate information about
oneself
"
Informational privacy in a normative sense refers
typically to a non-absolute moral right of persons to
have direct or indirect control over access to…
!
Statements about privacy can be either descriptive or
normative
○
First is held by many in the IT and R&D
industries: we have zero privacy in the digital
age and there is no way we can protect it
"
Second: our privacy is more important than
ever and we must attempt to protect it
"
There are two reactions to the flood of new
technology and its impact on personal information
and privacy
!
Debates about privacy almost always revolve
around new technology, ranging from genetics and the
extensive study of biomarkers, brain imaging, drones,
wearable sensors/sensor networks, social media, smart
phones, closed circuit television, government
cybersecurity programs, direct marketing, RFID tags, Big
Data, head-mounted displays and search engines
○
Value of privacy is reducible to these other values
or sources of value
!
Proposals mention property rights, security,
autonomy, intimacy or friendship
!
They hold that the importance of privacy should be
explained and its meaning clarified in terms of
those other values and sources of value
!
*The opposing view holds that privacy is valuable in
itself and its value and importance are not derived
from other considerations
!
Reductionist accounts argue that privacy claims are really
about other values and other things that matter from a
moral point of view
○
Notion of privacy is analyzed primarily in terms of
knowledge or other epistemic states
!
Subject
"
Set of propositions
"
Set of individuals
"
Three arguments:
!
A new type of privacy account has been proposed in
relation to new information technology, that
acknowledges that there is a cluster of related moral
claims (cluster accounts) underlying appeals to privacy,
but maintains that there is no single essential core of
privacy concerns
○
The first (European) conceptualizes issues of information privacy
in terms of 'data protection' and the second (US) in terms
of 'privacy'
!
There is a difference between the European and US
approaches
○
Referential use = type of use that is made on the
basis of an acquaintance relationship of the speaker
with the object of his knowledge
!
Vs non-referential use
!
Personal data = data that can be linked with a natural
person
○
Prevention of harm
!
Information inequality
!
Information injustice and discrimination
!
Encroachment on moral autonomy
!
Moral reasons for protecting personal data:
○
Basic moral principle underlying these laws is the
requirement of informed consent for processing by
the data subject
!
Processing of personal information requires that its
purpose be specified, its use be limited, individuals
be notified and allowed to correct inaccuracies, and
the holder of the data be accountable to oversight
authorities
!
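Software can enforce part of these processing requirements: purpose specification, use limitation, correction, and accountability. The sketch below is a hypothetical illustration, not a standard API. `PurposeBoundStore` records the purposes a data subject agreed to when the data was collected. It logs every access attempt for oversight and refuses reads for any other purpose. This is one possible way to block the "function creep" these notes raise.

```python
from dataclasses import dataclass, field


class PurposeViolation(Exception):
    """Raised when data is requested for a purpose not specified at collection."""


@dataclass
class Record:
    subject: str
    data: dict
    allowed_purposes: frozenset  # purposes specified (and consented to) at collection


@dataclass
class PurposeBoundStore:
    """Hypothetical store illustrating purpose specification and use limitation."""
    records: dict = field(default_factory=dict)
    audit_log: list = field(default_factory=list)

    def collect(self, subject, data, purposes):
        self.records[subject] = Record(subject, dict(data), frozenset(purposes))

    def read(self, subject, purpose):
        record = self.records[subject]
        # Every access attempt (allowed or not) is logged for oversight authorities
        self.audit_log.append((subject, purpose))
        if purpose not in record.allowed_purposes:
            raise PurposeViolation(f"{purpose!r} was not specified at collection")
        return dict(record.data)

    def correct(self, subject, field_name, value):
        # Individuals may correct inaccuracies in their own data
        self.records[subject].data[field_name] = value


store = PurposeBoundStore()
store.collect("alice", {"email": "alice@example.com"}, {"billing"})
print(store.read("alice", "billing"))
try:
    store.read("alice", "marketing")  # a new use after collection: function creep
except PurposeViolation as err:
    print("blocked:", err)
```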
The challenge with privacy in the 21st century is to
ensure that technology is designed in such a way
that it incorporates privacy requirements in the
software, architecture, infrastructure, and work
processes in a way that makes privacy violations
unlikely to occur
!
Data protection laws are in force in almost all countries
○
Conceptions of privacy and the value of privacy:
!
Information technology = automated systems for storing,
processing and distributing information
○
Rapid changes have increased the need for careful
consideration of the desirability of effects
○
As connectivity increases access to information, it also
increases the possibility for agents to act based on new
sources of information
○
Cookies = small pieces of data that web sites store
on the user's computer, in order to enable
personalization of the site
!
Some cookies can be used to track the user across
multiple sites (=tracking cookies)
!
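The minimal simulation below shows why a tracking cookie works across sites. It makes no real HTTP requests, and the `Tracker` class and site names are hypothetical. A third-party tracker embedded on several sites receives its own cookie on every request. It can therefore join separate visits into one browsing history.

```python
import uuid
from collections import defaultdict


class Tracker:
    """Hypothetical third-party tracker (e.g. an ad server) embedded on many sites."""

    def __init__(self):
        self.history = defaultdict(list)  # cookie id -> sites where it was seen

    def handle_request(self, cookie_jar, site):
        # The browser sends the tracker's cookie whichever site embeds the tracker
        cookie_id = cookie_jar.get("tracker_id")
        if cookie_id is None:
            cookie_id = uuid.uuid4().hex  # first contact: set a new unique cookie
            cookie_jar["tracker_id"] = cookie_id
        self.history[cookie_id].append(site)
        return cookie_id


browser_cookies = {}  # the user's cookies for the tracker's domain
tracker = Tracker()
for site in ["news.example", "shop.example", "health.example"]:
    uid = tracker.handle_request(browser_cookies, site)

print(tracker.history[uid])  # ['news.example', 'shop.example', 'health.example']
```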
Major theme in the discussion of Internet privacy
revolves around use of cookies
○
In cloud computing, both data and programs are
online and it is not always clear what the user-
generated and system-generated data are used for
!
The recent development of cloud computing heightens
many of these privacy concerns
○
Not only data explicitly entered by the user, but
also numerous statistics on user behaviour
!
Data mining can be employed to extract patterns
from such data, which can then be used to make
decisions about the user
!
Big Data may be used in profiling the user, creating
patterns of typical combinations of user properties,
which can then be used to predict interests and
behaviour
!
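Below is a minimal sketch of this kind of profiling, using a tiny hypothetical set of user profiles. It counts which properties typically occur together across users. It then predicts likely interests for a new user from the one property already known. Real profiling systems use far larger data and more sophisticated data mining models. This co-occurrence count only illustrates the principle.

```python
from collections import Counter
from itertools import combinations

# Hypothetical profiles: sets of observed user properties/interests
users = {
    "u1": {"running", "nutrition", "wearables"},
    "u2": {"running", "nutrition"},
    "u3": {"running", "wearables"},
    "u4": {"finance", "travel"},
    "u5": {"running", "nutrition"},
}

# Learn "typical combinations": how often each pair of properties co-occurs
pair_counts = Counter()
for props in users.values():
    for a, b in combinations(sorted(props), 2):
        pair_counts[(a, b)] += 1


def predict(known_props, top_n=1):
    """Score unseen properties by how often they co-occur with known ones."""
    scores = Counter()
    for (a, b), count in pair_counts.items():
        if a in known_props and b not in known_props:
            scores[b] += count
        elif b in known_props and a not in known_props:
            scores[a] += count
    return [prop for prop, _ in scores.most_common(top_n)]


# A new user who has so far only shown an interest in running
print(predict({"running"}, top_n=2))  # ['nutrition', 'wearables']
```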
All of this data could be used to profile
citizens
"
How to obtain permission when the
user does not explicitly engage in a
transaction
!
How to prevent "function creep" (data
being used for different purposes after
they are collected)
!
Specific challenges:
"
Particular concern could arise from genetic data
"
Similarly, data may be collected when shopping,
when being recorded by surveillance cameras, or when
using smartcard-based public transport payment
systems
!
Users generate loads of data when online
○
"reconfigurable technology" that handles personal
data raises the question of user knowledge of the
configuration
!
Cell phones typically contain a range of data-generating
sensors, including GPS, movement sensors, and cameras,
and may transmit the resulting data via networks
○
Many devices contain chips or are connected to the
Internet of Things
!
Radio frequency chips can be read from a limited
distance
!
EU and US passports have RFID chips with
protected biometric data
!
"smart" RFIDs are also embedded in public
transport payment systems
!
"dumb" RFIDs, which basically contain only a number,
appear in many kinds of products as a replacement
of the barcode and for use in logistics
!
Such devices generate statistics and can be
used for mining and profiling
"
Ambient intelligence and ubiquitous
computing, along with the Internet of Things,
also enable automatic adaptation of the
environment to the user, based on explicit
preferences and implicit observations
"
In the home there are smart meters for
automatically reading and sending electricity
consumption, and thermostats that can be remotely
controlled by the owner
!
Devices connected to the internet are not limited to user-
owned computing devices
○
Ex. Biometric passports, online e-government
services, voting systems, a variety of online citizen
participation tools and platforms or online access to
recordings of sessions of parliament and
government committee meetings
!
Government and public administration have undergone
radical transformations as a result of the availability of
advanced IT systems
○
Impact of information technology on privacy:
!
Provides set of rules and guidelines for designing a
system with a certain value in mind (such as
privacy)
!
Value Sensitive Design provides a "theoretically
grounded approach to the design of technology that
accounts for human values in a principled and
comprehensive manner throughout the design process"
○
Data protection needs to be viewed in
proactive rather than reactive terms
"
Provides high-level guidelines in the form of seven
principles for designing privacy-preserving systems
!
Privacy by Design approach specifically focuses on
privacy
○
Privacy Impact Assessment approach proposes "a
systematic process for evaluating the potential effects on
privacy of a project, initiative or proposed system or
scheme"
○
Payment Card Industry Data Security Standard (PCI DSS) gives clear
guidelines for privacy and security sensitive
systems design in the domain of the credit card
industry and its partners
!
Various International Organization for
Standardization (ISO) standards serve as a source of
best practices and guidelines
!
The EU Data Protection Directive is based on the Fair
Information Practices --> transparency, purpose,
proportionality, access, transfer
!
There are several industry guidelines that can be used to
design privacy preserving IT systems
○
Ex. Privacy Coach supports customers in making
privacy decisions when confronted with RFID tags
!
Specific solutions to privacy problems aim at increasing
the level of awareness and consent of the user
○
Allow users to anonymously browse the web
or share content
"
Employ a number of cryptographic
techniques and security protocols in order to
ensure their goal of anonymous
communication
"
Use the property that numerous users use the
system at the same time which provides k-
anonymity
"
Downside: susceptible to an attack
where the anonymity of the user is no
longer guaranteed
!
Tor: messages are encrypted and routed
among numerous different computers,
thereby obscuring the original sender of the
message
"
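Tor's layered ("onion") encryption can be sketched as follows. The sender wraps the message in one encrypted layer per relay. Each relay can remove only its own layer, which reveals the next hop but neither the original sender nor the rest of the route. Everything here is a simplifying assumption. The relay names and pre-shared keys are hypothetical. `toy_cipher` is an insecure XOR keystream, used only to keep the example dependency-free. Real Tor negotiates keys with each relay and uses vetted ciphers.

```python
import hashlib
import json


def toy_cipher(key: bytes, data: bytes) -> bytes:
    """XOR data with a SHA-256-derived keystream. Toy only: NOT secure.
    XOR is its own inverse, so the same call encrypts and decrypts."""
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(d ^ s for d, s in zip(data, stream))


# Hypothetical three-relay route and the key the sender shares with each relay
route = [("relay_A", b"key-A"), ("relay_B", b"key-B"), ("relay_C", b"key-C")]


def build_onion(message: bytes, destination: str) -> bytes:
    """Wrap the message in one encrypted layer per relay (last relay innermost)."""
    next_hops = [name for name, _ in route[1:]] + [destination]
    payload = message
    for (_, key), next_hop in reversed(list(zip(route, next_hops))):
        layer = json.dumps({"next": next_hop, "data": payload.hex()}).encode()
        payload = toy_cipher(key, layer)
    return payload


def peel(key: bytes, onion: bytes) -> tuple[str, bytes]:
    """A relay removes only its own layer and learns only the next hop."""
    layer = json.loads(toy_cipher(key, onion))
    return layer["next"], bytes.fromhex(layer["data"])


onion = build_onion(b"hello", "server.example")
forwarded_to = []
for name, key in route:
    next_hop, onion = peel(key, onion)
    forwarded_to.append(next_hop)
    print(f"{name} forwards to {next_hop}")
print(onion.decode())  # hello
```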
Freenet: content is stored in encrypted form
from all users of the system
"
*could be infected by a Trojan horse
that monitors all communication and
knows identity of user
!
Provides plausible deniability and privacy
"
Ex. Communication-anonymizing tools (Tor,
Freenet), and identity-management systems
!
Another tool for providing anonymity is the
anonymization of data through special software
!
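One common criterion for anonymized data is k-anonymity. The notes use this term for anonymity among concurrent users of a network. The same idea applies to released records. Every combination of quasi-identifier values (such as age and postal code) must be shared by at least k records, so that no individual can be singled out within a smaller group. The minimal sketch below generalizes two quasi-identifiers and then checks the criterion. The records and generalization rules are hypothetical.

```python
from collections import Counter

# Hypothetical records: (age, postal_code, diagnosis)
records = [
    (34, "N1G 2W1", "flu"),
    (36, "N1G 4T5", "asthma"),
    (38, "N1G 3K9", "flu"),
    (52, "N1H 6J2", "diabetes"),
    (57, "N1H 1A4", "flu"),
]


def generalize(record):
    """Coarsen quasi-identifiers: age to a decade range, postal code to 3 chars."""
    age, postal, diagnosis = record
    decade = (age // 10) * 10
    return (f"{decade}-{decade + 9}", postal[:3], diagnosis)


def is_k_anonymous(rows, k, qi_indexes=(0, 1)):
    """True if every quasi-identifier combination occurs in at least k rows."""
    groups = Counter(tuple(row[i] for i in qi_indexes) for row in rows)
    return min(groups.values()) >= k


released = [generalize(r) for r in records]
print(is_k_anonymous(records, k=2))   # False: raw ages/postal codes are unique
print(is_k_anonymous(released, k=2))  # True: groups of size 3 and 2
```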
Growing number of software tools are available that
provide some form of privacy for their users = privacy
enhancing technologies
○
Modern cryptographic techniques are essential in
any IT system
!
Various techniques exist for searching through
encrypted data, which provides a form of privacy
protection and selective access to sensitive data
!
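A keyed "blind index" is one simple way to search encrypted data. The client stores a keyed hash (HMAC) of each keyword next to the encrypted documents. The server can then match search tokens without learning the keywords. This is a minimal sketch under stated assumptions. The key is hypothetical, and document encryption itself is omitted. Unlike stronger searchable-encryption schemes, deterministic tokens still leak which documents share a keyword.

```python
import hashlib
import hmac

CLIENT_KEY = b"hypothetical-secret-key-held-only-by-client"


def token(word: str) -> str:
    """Keyed hash of a keyword; without CLIENT_KEY the server cannot invert it."""
    return hmac.new(CLIENT_KEY, word.lower().encode(), hashlib.sha256).hexdigest()


# Server-side index: token -> ids of (separately encrypted) documents
server_index: dict[str, set[str]] = {}


def upload(doc_id: str, keywords: list[str]) -> None:
    for word in keywords:
        server_index.setdefault(token(word), set()).add(doc_id)


def search(word: str) -> set[str]:
    # Only the token is sent; the server never sees the plaintext keyword
    return server_index.get(token(word), set())


upload("doc1", ["salary", "review"])
upload("doc2", ["salary", "contract"])
print(sorted(search("salary")))   # ['doc1', 'doc2']
print(sorted(search("medical")))  # []
```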
Allows data processor to process encrypted
data
"
Original user can then again decrypt the
result and use it without revealing any
personal data to the data processor
"
Could be used to aggregate encrypted data
thereby allowing both privacy protection and
useful aggregate information
"
A new technique used for designing privacy-preserving
systems = 'homomorphic encryption'
!
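The Paillier cryptosystem gives a simple illustration of homomorphic encryption. Paillier is only additively homomorphic. Fully homomorphic schemes also support multiplication and arbitrary computation. In Paillier, multiplying two ciphertexts yields an encryption of the sum of the plaintexts. A data processor can therefore aggregate values it cannot read, and the key holder decrypts only the total. This toy version uses tiny primes and is NOT secure. It assumes Python 3.9+ and is a sketch, not a substitute for a vetted library.

```python
import math
import secrets

# Toy Paillier key pair with tiny primes: NOT secure (real keys use ~1024-bit primes)
p, q = 293, 433
n = p * q
n_sq = n * n
g = n + 1                     # common simple choice of generator
lam = math.lcm(p - 1, q - 1)  # Carmichael's lambda(n); math.lcm needs Python 3.9+
mu = pow(lam, -1, n)          # modular inverse; valid because g = n + 1


def encrypt(m: int) -> int:
    """Encrypt 0 <= m < n with fresh randomness r coprime to n."""
    while True:
        r = secrets.randbelow(n - 1) + 1
        if math.gcd(r, n) == 1:
            break
    return (pow(g, m, n_sq) * pow(r, n, n_sq)) % n_sq


def decrypt(c: int) -> int:
    """Decrypt with the private values lam and mu: L(c^lam mod n^2) * mu mod n."""
    x = pow(c, lam, n_sq)
    return ((x - 1) // n) * mu % n


def add_encrypted(c1: int, c2: int) -> int:
    """Multiplying ciphertexts adds the hidden plaintexts (mod n)."""
    return (c1 * c2) % n_sq


# Data subjects encrypt their values; the processor only ever handles ciphertexts
salaries = [52, 61, 47]  # hypothetical values, in thousands
total_ct = encrypt(0)
for ct in (encrypt(s) for s in salaries):
    total_ct = add_encrypted(total_ct, ct)

# Only the key holder can decrypt the aggregate
print(decrypt(total_ct))  # 160
```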
Cryptography has been used as a means to protect data
○
Companies can gather a large amount of data
and build detailed profiles of users
"
Profiling becomes even easier if the profile
information is combined with other
techniques such as implicit authentication via
cookies and tracking cookies
"
Requiring a direct link between online and 'real
world' identities is problematic from a privacy
perspective, because it allows profiling of users
!
Users could no longer be tracked to different
services because they can use different
attributes to access different services, which
makes it difficult to trace online identities
over multiple transactions
"
From a privacy perspective a better solution would
be the use of attribute-based authentication which
allows access to online services based on the
attributes of users
!
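Below is a much-simplified sketch of attribute-based authentication. A trusted issuer vouches for attributes such as "over 18", not for a name. A service checks only the attribute it needs. Several assumptions apply. The issuer and verifier share an HMAC key, standing in for the digital signatures real systems use. The attribute names are hypothetical. Reusing one credential would still let services link visits. Real attribute-based credential schemes use cryptographic proofs to prevent that linkage.

```python
import hashlib
import hmac
import json
import secrets

ISSUER_KEY = secrets.token_bytes(32)  # hypothetical key shared by issuer and verifier


def issue_credential(attributes: dict) -> dict:
    """Issuer (e.g. a government agency) vouches for attributes, not for a name."""
    claims = json.dumps(attributes, sort_keys=True)
    tag = hmac.new(ISSUER_KEY, claims.encode(), hashlib.sha256).hexdigest()
    return {"claims": claims, "tag": tag}


def service_accepts(credential: dict, required: str) -> bool:
    """Service verifies the issuer's tag and checks only the attribute it needs."""
    expected = hmac.new(ISSUER_KEY, credential["claims"].encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, credential["tag"]):
        return False  # tampered or forged credential
    return json.loads(credential["claims"]).get(required) is True


cred = issue_credential({"over_18": True, "student": False})
print(service_accepts(cred, "over_18"))  # True, without revealing who the user is
print(service_accepts(cred, "student"))  # False
```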
'Single sign on' frameworks provided by independent
third parties make it easy for users to connect to
numerous online services using a single online identity
○
How can information technology itself solve privacy concerns:
!
If computers are connected directly to the brain, not
only behavioural characteristics are subject to
privacy considerations, but even one's thoughts
might run the risk of becoming public
!
It could become possible to change one's behaviour
by means of this technology
!
Brain-computer interfaces:
○
Oversharing may become an accepted practice
within certain groups
!
Technological changes influence the privacy norms
themselves
○
It may be more feasible to protect privacy by
transparency
!
Question: is it feasible to protect privacy by trying to hide
information from parties who may use it in undesirable
ways?
○
Could be used to impose restrictions at a regulatory
level, in combination with or as an alternative to
empowering users, thereby potentially contributing
to the prevention of moral or informational
overload on the user's side
!
Challenges lie in its translation to social effects and
social sustainability
!
The precautionary principle might have a role in dealing with
emerging information technologies
○
Ex. Effects of social network sites on friendship,
and the verifiability of results of electronic
elections
!
Note that not all social effects of information technology
concern privacy
○
Emerging technologies and our understanding of privacy:
!
Read: Privacy and Information Technology
When combined with the Supreme Court of Canada's recent
decisions that emphasized the importance of fair dealing as
users' rights, the law now features considerable flexibility that
allows Canadians to make greater use of works without prior
permission or fear of liability
!
New Bill C-11
!
Law also legalizes format shifting and the creation
of backup copies
!
Will be helpful for those seeking to digitize content,
transfer content to portable devices, or create
backups to guard against accidental deletion or data
loss
!
Ex. Time shifting (or the recording of television shows) is
now legal in Canada
○
Law now features a wide range of user-orientated provisions
that legalize common activities
!
Five previous ones: research, private study, news
reporting, criticism and review
○
Law now features considerable flexibility that allows
Canadians to make greater use of works without prior
permission or fear of liability
○
Scope of fair dealing has been expanded with the addition of
three new purposes: education, satire and parody
!
Provision is often referred to as the "YouTube exception"
○
Law also includes a user-generated content provision that
establishes a legal safe harbour for creators of non-commercial
user-generated content (such as remixed music, mashup videos
or home movies with commercial music in the background)
!
Some exceptions to this prohibition include the ability to
circumvent the digital lock to protect personal
information, unlock a cellphone or access content if the
person has a perceptual disability
○
Most significant new restriction involves the controversial
digital lock rules that prohibit bypassing technological
protections found on DVDs, software and electronic books
!
Ex. Law now includes a cap of $5000 for all non-
commercial infringement
○
Change reduces likelihood of lawsuits against individuals
for non-commercial activities (including unauthorized
downloading or mistaken reliance on fair dealing)
○
Law generally tries to target genuinely "bad actors" while
leaving individuals alone
!
Allows rights holders to send notifications alleging
infringement on Internet providers, who must forward the
notices to their subscribers
○
Approach to unauthorized downloading now centers on a
"notice-and-notice" system
!
Canadian digital lock rules are among the most restrictive
in the world, but do not carry significant penalties for
individuals
○
It is not an infringement to possess tools or software that
can be used to circumvent digital locks
○
Liability is limited to actual damages in non-commercial
cases
○
Circumventing a digital lock raises different legal issues
!
Read: What new copyright law means to you
Do fundamentally different things
○
More data allows us to see "new", "better" and "different"
!
Effective use of big data can be extremely useful
!
Copying, searching, processing and sharing information is
currently very easy
!
Stationary/static --> fluid/dynamic
!
Car fatigue or posture could also be datafied
○
Putting things into data (ex. Location has been "datafied")
!
--> more information
!
Throw data at problem and make computer figure it out
itself
○
More data --> increasing accuracy of prediction
○
Ex. Biopsy of cancer cells --> determined 3 additional
signs
○
One of the most impressive areas is machine learning
(a branch of artificial intelligence)
!
Predictive policing
○
Algorithms could predict what we are about to do
○
Challenge: safeguarding free will
○
It will improve our lives but also has consequences
!
May completely eliminate jobs
○
Transform how we live, work and think
○
Big data is going to steal our jobs in the same way that factory
automation did
!
Humanity can learn from the information we can collect
!
Watch: Big data is better data
Vast majority is put up by average users (reviews,
YouTube)
○
Has become a more interactive place
○
First decade of the web was a static place; it is now more
dynamic with rise of social media and social networks
!
Allows one to create online persona with little technical
skill
○
Put a lot of personal information online --> behavioural
and demographic information
○
Can create models to predict attributes of individuals
○
Facebook has 1.2 billion users/month
!
Target has purchase history for thousands of customers
○
Has pregnancy score --> purchases such as vitamins, size
of purses, etc.
○
Target sent advertisements for baby supplies to a 15-year-old
girl before she had even told her parents she was pregnant
!
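A purchase-based "score" like this can be pictured as a weighted sum of signal products in a customer's history, compared against a cut-off. Target's actual model is not public. Every product, weight, and threshold below is hypothetical. The sketch only shows how individually innocuous purchases can combine into a sensitive inference.

```python
# Hypothetical signal weights; these are not Target's real products or weights
WEIGHTS = {
    "unscented lotion": 0.30,
    "vitamin supplements": 0.25,
    "oversized purse": 0.15,
    "cotton balls": 0.15,
}
THRESHOLD = 0.6  # hypothetical cut-off for triggering targeted offers


def purchase_score(basket: list[str]) -> float:
    """Sum the weights of signal products found in a purchase history."""
    return sum(WEIGHTS.get(item, 0.0) for item in set(basket))


history = ["unscented lotion", "vitamin supplements", "oversized purse", "bread"]
score = purchase_score(history)
print(round(score, 2), score >= THRESHOLD)  # 0.7 True
```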
Political preference
○
Personality score
○
Gender
○
Sexual orientation
○
Age
○
Intelligence
○
How much you trust the people you know and how strong
the relationships are
○
Patterns of behaviour detected from millions of users allow
one to make predictions about individual circumstance or
behaviour
!
Liking page for curly fries
!
*content is irrelevant to attribute
!
Propagated through network of similar people
!
Indicative of high intelligence:
○
People are friends with people like them (well
established)
○
Know a lot about how information spreads (similar to
disease)
○
Study: looked at Facebook likes to determine these attributes
(and other things)
!
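A nearest-neighbour vote is one simple way to predict an attribute from likes. Find the users whose likes overlap most with the target user's likes (Jaccard similarity), then predict the label held by most of them. This is a simplified stand-in, not the method used in the study. All users, pages, and attribute labels are hypothetical. As the notes observe, a like's content can be irrelevant to the attribute: "curly fries" predicts label A here only because of who else liked it.

```python
from collections import Counter

# Hypothetical training users: set of liked pages and a known attribute label
labelled_users = [
    ({"curly fries", "science news", "chess"}, "A"),
    ({"curly fries", "chess", "jazz"}, "A"),
    ({"reality tv", "energy drinks"}, "B"),
    ({"reality tv", "jazz", "energy drinks"}, "B"),
]


def jaccard(a: set, b: set) -> float:
    """Overlap between two sets of likes: |intersection| / |union|."""
    return len(a & b) / len(a | b) if a | b else 0.0


def predict_attribute(likes: set, k: int = 2) -> str:
    """Majority label among the k users whose likes are most similar."""
    ranked = sorted(labelled_users, key=lambda u: jaccard(likes, u[0]), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


print(predict_attribute({"curly fries", "chess"}))  # A
```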
Revenue models for most social media companies
rely on exploiting or sharing user data in some way
!
Facebook: users are not the customers, but the
product
!
Policy and law --> users control their own data
○
Allow people to encrypt data they upload
○
Users don’t have much control over how this data is used
!
Users should be informed and consenting
!
Watch: Curly fry conundrum: why social media "likes" say more than
you might think
Internet has become a zone of mass, indiscriminate surveillance
!
Only "bad" people seek out privacy (narrow conception)
○
Only those who challenge power have something to
worry about
!
Implicit bargain: if you're willing to render yourself
harmless to political power, then you can be free of the
dangers of surveillance
○
Debate of being monitored: good vs bad individuals
!
Privacy is no longer a "social norm"
!
Make judgements about what we are willing to let other
people know
○
Make decisions based on what is expected by others
!
Behaviour changes when we believe we are being
watched (more conformist and compliant)
○
Everyone has something to hide
!
Could watch the students but the students could not see
into tower (do not know when they are being watched)
○
Therefore, the individuals will always act like they are
being watched
○
Jeremy Bentham: architectural design of the panopticon
(initially used in prisons) was applied to educational institutions
in the 18th century
!
Mass surveillance creates a "prison in the mind"
○
This is more effective than brute force
○
This mindset was the key means of societal control
!
Never have a private moment --> obedience
○
Orwell's 1984 warns that we could be watched at any given
moment
!
Creativity, dissent, and exclusivity can occur in isolated
places away from being monitored --> essence of human
freedom
○
A society that breeds conformity, obedience and submission
!
"Those who do not move do not notice their chains" (Rosa Luxemburg)
○
System of mass surveillance limits us
!
Watch: Why privacy matters
Tattoos tell a lot of stories
!
Programs we use store our data and tell our story, similar to a
tattoo
!
Facial recognition
○
Track our movements, clothing choices…etc
!
Face.com --> 18 billion faces online
○
= electronic tattoos
!
--> immortality through our electronic tattoo
!
Constantly changing reputation
○
Do not look too far into the past of people we love
○
Narcissus: don't fall in love with your own reflection
○
Lessons learned from Greek mythology:
!
Watch: Your online life, permanent as a tattoo
"Big data" refers to a newly recognized scenario in which
enormous volumes of data are available, either publicly or
privately
○
The types of data may be in many forms and formats, and
may even be stored in different places
○
"Big data" technology refers to a collection of
computational techniques developed to process and
analyze these diverse data into interpretable or actionable
consequences
○
What is "big data"? What are some of the characteristics of "big
data" technology?
1.
Cryptography is the study of methods that encrypt private
information to conceal it from being viewed by anyone
other than the intended recipient
○
Using cryptography can prevent malicious attackers from
getting sensitive information such as credit card numbers
or passwords
○
What is cryptography, and why should people use it?
2.
Questions:
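The answer to question 2 mentions protecting passwords. A standard cryptographic practice is to store only a salted, deliberately slow one-way hash of each password, never the password itself. The sketch uses the standard library's `hashlib.pbkdf2_hmac`. The iteration count is an assumption; real systems should follow current guidance such as the OWASP Password Storage Cheat Sheet. Unlike encryption, this hashing cannot be reversed. The system verifies a login by re-hashing the attempt and comparing the results.

```python
import hashlib
import hmac
import os

ITERATIONS = 600_000  # assumed work factor; check current guidance for real systems


def hash_password(password: str) -> tuple[bytes, bytes]:
    """Return (salt, digest); only these are stored, never the password."""
    salt = os.urandom(16)  # unique random salt defeats precomputed (rainbow) tables
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)
    return salt, digest


def verify_password(password: str, salt: bytes, digest: bytes) -> bool:
    """Re-hash the login attempt with the stored salt and compare."""
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)
    return hmac.compare_digest(candidate, digest)  # constant-time comparison


salt, digest = hash_password("correct horse battery staple")
print(verify_password("correct horse battery staple", salt, digest))  # True
print(verify_password("guess123", salt, digest))                      # False
```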
Privacy and Information Technology
Describe the rationale of privacy in society and the trade-offs
between privacy and information exchange
1.
Explain the big data technology and some of the design in
extracting information from aggregate patterns
2.
Identify the scenarios when privacy can be degraded and
describe their impact of the consequences on society
3.
Learning Outcomes:
Privacy of information is of a serious concern to society,
especially with the development of many innovative designs of
computer algorithms in extracting personal information from
the massively available data
!
The government, corporations and individuals have
responsibilities to ensure that privacy of information is
protected
!
Essentially, the method of "big data uses aggregate data to
extract underlying patterns
○
The information of the patterns can be interpreted and
used by decision-makers to make more informed
decisions
○
The big data and data mining technology can be used in
marketing product preferences, identifying voting groups
in elections and identifying consumer habits
○
Recent developments in big data and data mining technology
have opened a big window that may infringe on the individual
privacy
!
3D printing technologies that copy many existing designs
may also affect innovations in manufacturing
○
A serious privacy concern is related to how data are used in
social media such as Facebook, and search log data
!
The applications of big data analysis may not be fully
appreciated by the general public
!
Key Points:
Recently been a call for new professional role (data
scientist) to implement and diffuse analytic
methodologies into and across organizations
○
There is a concern that there will not be enough of these
new professionals to meet the growing demand for this
analytics speciality
○
Academic community is promoting an emerging view of
analytics
!
Ex. Information processing has become increasingly more
powerful and flexible, with faster and higher-capacity
storage and networks
○
Globalization and other competitive factors have exerted
strong pressures to improve efficiencies and
effectiveness, and to strengthen business and customer
relationships
○
Each successive stage of this competition requires more
data and more analysis to support strategic, managerial
and operational decision making
○
Quest for more and better analytics technology and this
technology in turn helps to make competition more
intense
○
More effective analytics enables a higher degree of
competition which creates further imperatives to make
analytics more effective
○
In wave after disruptive wave of technological and
organizational change, business leaders face a host of powerful
forces
!
Ex. Correlations, cluster analysis, filtering, decision trees,
Bayesian analysis, neural network analysis, regression
analysis, textual analysis…etc are all in the analytics
arsenal and none of these is particularly new
○
Software and data complexities can impede effective
analysis, and interpreting the results of complex analyses
accurately can be potentially perilously misleading
○
Change appears to be essentially incremental and does not
embody any fundamental paradigm shifts
○
New analytics employ essentially the same mutlivariate
inferential and descriptive statistical methods and mathematical
modeling techniques that have long been used by businesses for
analyzing data to support instances of complex decision making
!
Database management
!
Data warehousing
!
Data mining
!
Dashboards
!
Associated technologies
!
Essential tools and techniques for dealing with big data
include:
○
Concepts including data ambiguity, data filtering, data
context, data interruption, data conversion and data
redundancy are not new
○
Aside from a few progressive data management
refinements, none of this is new
○
Big data is characterized by vast collections of variously
structures (even unstructured) data that, when appropriately
rationalized, can provide understanding and insight into various
issues that reside embedded within that data
!
Big data consists of expansive collections of data = large
volumes
○
Updated quickly and frequently = high velocity
○
Exhibit huge range of different formats and content =
wide variety
○
Big data is different due to: volume, velocity and variety (the
three V's)
!
Big data has been accelerated by the growth of the web
!
Technologies for collecting, manipulating, transmitting
and analyzing data for a long time
○
Technologies have reached and are surpassing a capacity
threshold for processing and storing data that is swamping
conventional levels of organizational ability to cope with
volumes of data being generated
○
Large scale analytics are critical for both sustaining
business competitiveness and enhancing day-to-day
decision making
○
Application of business analytics methods leads to
improvement in a organization's overall decision-making
capacity, which enhances its ability to conduct its
business intelligently
○
Businesses are experiencing ever-expanding cycles of change
caused by the interaction of competitive forces and the
harnessing of analytics and big data technologies as essential
competitive weaponry
!
Usage threshold is that of utter dependence upon timely
information for basic competitive viability
○
Since the advent of computing (and networking) as a
profession, computing individuals believe that this technology
over time is destined to eventually become wholly integrated
into every operating and managerial function in every part of
every organization
!
Read: Beyond data and analysis
Warren and Brandeis: there is a "right to be left alone"
based on the principle of "inviolate personality"
○
Privacy debate has co-evolved with the development of
information technology
○
The first refers to the freedom to make one's own
decisions without interference by others in regard to
matters seen and intimate and personal
!
The second is concerned with interest of individuals
in exercising control over access to information
about themselves
!
A distinction can be made between (1) constitutional (or
decisional) privacy and (2) tort (or informational privacy)
○
Information about oneself
"
Situations in which others could acquire
information about oneself
"
Technology that can be used to generate,
process or disseminate information about
oneself
"
Informational privacy in a normative sense refers
typically to a non-absolute moral right of persons to
have direct or indirect control over access to…
!
Statements about privacy can be either descriptive or
normative
○
First is held by many in the IT and R&D
Industries: we have zero privacy in the digital
age and there is no way we can protect it
"
Second: our privacy is more important than
ever and we must attempt to protect it
"
There are two reactions to the flood of new
technology and its impact on personal information
and technology
!
Debates about privacy are almost always revolving
around new technology, ranging from genetics and the
extensive study of biomarkers, brain imaging, drones,
wearable sensors/ sensor networks, social media, smart
phones, closed circuit television, government
cybersecurity programs, direct marketing, RFID tags, Big
Data, head-mounted displays and search engines
○
Value of privacy is reducible to these other values
or sources of value
!
Proposal mention property rights, security,
autonomy, intimacy or friendship
!
Hold that the importance of privacy should be
explained and its meaning clarified in terms of
those other values and sources of value
!
*opposing view holds that privacy is valuable in
itself and its value and importance are not derived
from other considerations
!
Reductionist accounts argue that privacy claims are really
about other values and other things that matter from a
moral point of view
○
Notion of privacy is analyzed primarily in terms of
knowledge or other epistemic states
!
Subject
"
Set of propositions
"
Set of individuals
"
Three arguments:
!
A new type of privacy account has been proposed in
relation to new information technology, that
acknowledges that there is a cluster of related moral
claims (cluster accounts) underlying appeals to privacy,
but maintains that there is no single essential core of
privacy concerns
○
First conceptualizes issues of information privacy
in terms of 'data protection' and the second in terms
of 'privacy
!
There is a difference between the US and European
approach
○
Referential use = type of use that is made on the
basis of a acquaintance relationship of the speaker
with the object of his knowledge
!
Vs non-referential use
!
Personal data = data that can be linked with a natural
person
○
Prevention of harm
!
Information inequality
!
Information injustice and discrimination
!
Encroachment on moral autonomy
!
Moral reasons for protecting personal data:
○
Basic moral principle underlying these laws is the
requirement of informed consent for processing by
the data subject
!
Processing of personal information requires that its
purpose be specified, its use be limited, individuals
be notified and allowed to correct inaccuracies, and
the holder of the data to be accountable to oversight
authorities
!
Challenge with privacy in the 21st century is to
assure that technology is designed in such a way
that it incorporates privacy requirements in the
software, architecture, infrastructure, and work
processes in a way that makes privacy violations
unlikely to occur
!
Data protection laws are in force in almost all countries
○
Conceptions of privacy and the value of privacy:
!
Information technology = automated systems for storing,
processing and distributing information
○
Rapid changes have increased the need for careful
consideration of the desirability of effects
○
As connectivity increases access to information, it also
increases the possibility for agents to act based on new
sources of information
○
Cookies = small pieces of data that web sites store
on the user's computer, in order to enable
personalization of the site
!
Some cookies can be used to track the use across
multiple sites (=tracking cookies)
!
Major theme in the discussion of Internet privacy
revolves around use of cookies
○
In cloud computing, both data and programs are
online and it is not always clear what the user-
generated and system-generated data are used for
!
Recent development of cloud computing increases the
many privacy concerns
○
Not only data explicitly entered by the user, but
also numerous statistics on user behaviour
!
Data mining can be employed to extract patterns
from such data, which can then be used to make
decisions about the user
!
Big Data may be used in profiling the user, creating
patterns of typical combinations of user properties,
which can then be used to predict interests and
behaviour
!
All of this data could be used to profile
citizens
"
How to obtain permission when the
user does not explicitly engage in a
transaction
!
How to prevent "function creep" (data
being used for different purposes after
they are collected)
!
Specific challenges:
"
Concern could arise from genetic data
"
Similarly, data may be collected when shopping,
when being recorded by surveillance cameras, or when
using smartcard-based public transport payment
systems
!
Users generate loads of data when online
○
"reconfigurable technology" that handles personal
data raises the question of user knowledge of the
configuration
!
Cell phones typically contain a range of data-generating
sensors, including GPS, movement sensors, and cameras,
and may transmit the resulting data via networks
○
Many devices contain chips or are connected to the
Internet of Things
!
Radio frequency identification (RFID) chips can be read from a limited
distance
!
EU and US passports have RFID chips with
protected biometric data
!
"smart" RFIDs are also embedded in public
transport payment systems
!
"dumb" RFIDs, basically containing only a number,
appear in many kinds of products as a replacement
for the barcode and for use in logistics
!
Such devices generate statistics and can be
used for mining and profiling
"
Ambient intelligence and ubiquitous
computing, along with the Internet of Things,
also enable automatic adaptation of the
environment to the user, based on explicit
preferences and implicit observations
"
In the home there are smart meters for
automatically reading and sending electricity
consumption, and thermostats that can be remotely
controlled by the owner
!
Devices connected to the internet are not limited to user-
owned computing devices
○
Ex. Biometric passports, online e-government
services, voting systems, a variety of online citizen
participation tools and platforms or online access to
recordings of sessions of parliament and
government committee meetings
!
Government and public administration have undergone
radical transformations as a result of the availability of
advanced IT systems
○
Impact of information technology on privacy:
!
Provides a set of rules and guidelines for designing a
system with a certain value in mind (such as
privacy)
!
Value Sensitive Design provides a "theoretically
grounded approach to the design of technology that
accounts for human values in a principled and
comprehensive manner throughout the design process"
○
Data protection needs to be viewed in
proactive rather than reactive terms
"
Provides high-level guidelines in the form of seven
principles for designing privacy-preserving systems
!
Privacy by Design approach specifically focuses on
privacy
○
Privacy Impact Assessment approach proposes "a
systematic process for evaluating the potential effects on
privacy of a project, initiative or proposed system or
scheme"
○
Payment Card Industry Data Security Standard (PCI DSS): gives clear
guidelines for privacy and security sensitive
systems design in the domain of the credit card
industry and its partners
!
Various International Organization for
Standardization (ISO) standards serve as a source of
best practices and guidelines
!
EU Data Protection Directive is based on Fair
Information Practices --> transparency, purpose,
proportionality, access, transfer
!
There are several industry guidelines that can be used to
design privacy preserving IT systems
○
Ex. Privacy Coach supports customers in making
privacy decisions when confronted with RFID tags
!
Specific solutions to privacy problems aim at increasing
the level of awareness and consent of the user
○
Allow users to anonymously browse the web
or share content
"
Employ a number of cryptographic
techniques and security protocols in order to
ensure their goal of anonymous
communication
"
Rely on the property that numerous users use the
system at the same time, which provides
k-anonymity
"
Downside: susceptible to an attack
where the anonymity of the user is no
longer guaranteed
!
Tor: messages are encrypted and routed
among numerous different computers,
thereby obscuring the original sender of the
message
"
Freenet: content is stored in encrypted form
from all users of the system
"
*could be infected by a Trojan horse
that monitors all communication and
knows the identity of the user
!
Provides plausible deniability and privacy
"
Ex. Communication-anonymizing tools (Tor,
Freenet), and identity-management systems
!
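The Tor idea of encrypting a message in layers and routing it through several relays can be sketched as follows. Assumptions, loudly: the cipher is a toy keystream built from SHA-256 (not secure, and not the cipher Tor actually uses), relay keys are pre-shared rather than negotiated per circuit, and the relay names `guard`, `middle`, `exit` and the helper functions are hypothetical. Each relay removes exactly one layer and learns only where to forward the cell, so no single relay sees both the sender and the final message.

```python
import hashlib
import json
import secrets

NONCE_LEN = 16


def keystream_xor(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """Toy stream cipher (NOT secure, NOT Tor's real cipher): XOR the data with
    SHA-256(key || nonce || block counter). Applying it twice undoes it."""
    out = bytearray()
    for start in range(0, len(data), 32):
        counter = (start // 32).to_bytes(8, "big")
        pad = hashlib.sha256(key + nonce + counter).digest()
        out.extend(b ^ p for b, p in zip(data[start:start + 32], pad))
    return bytes(out)


def build_onion(message: str, route: list, keys: dict) -> bytes:
    """Sender wraps the message in one layer per relay, from the exit relay outward."""
    layer = {"next": None, "payload": message}  # innermost layer, read by the exit
    for i in range(len(route) - 1, -1, -1):
        nonce = secrets.token_bytes(NONCE_LEN)
        plaintext = json.dumps(layer).encode()
        cell = nonce + keystream_xor(keys[route[i]], nonce, plaintext)
        # The relay before route[i] only learns to forward this cell to route[i].
        layer = {"next": route[i], "payload": cell.hex()}
    return cell


def peel(key: bytes, cell: bytes) -> dict:
    """A relay removes exactly one layer using its own key."""
    nonce, body = cell[:NONCE_LEN], cell[NONCE_LEN:]
    return json.loads(keystream_xor(key, nonce, body))


def send(message: str, route: list, keys: dict):
    cell, hop = build_onion(message, route, keys), route[0]
    learned = {}  # the forwarding instruction each relay could read
    while True:
        layer = peel(keys[hop], cell)
        learned[hop] = layer["next"]
        if layer["next"] is None:  # exit relay: the payload is the message itself
            return layer["payload"], learned
        cell, hop = bytes.fromhex(layer["payload"]), layer["next"]


route = ["guard", "middle", "exit"]
relay_keys = {name: secrets.token_bytes(32) for name in route}
message, learned = send("hello", route, relay_keys)
print(message)
print(learned)
```

Only the exit relay ever sees the plaintext, and it only knows the middle relay handed it the cell; the guard relay knows the sender but sees only an encrypted cell.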
Another tool for providing anonymity is the
anonymization of data through special software
!
Growing number of software tools are available that
provide some form of privacy for their users = privacy
enhancing technologies
○
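Anonymization software often aims for k-anonymity: every combination of quasi-identifiers (attributes such as postal code or age that could be linked with other data sets) must be shared by at least k records. The sketch below checks that property on made-up records, before and after a hypothetical `generalize` step that truncates postal codes and replaces exact ages with 10-year bands. The field names and generalization rules are illustrative assumptions.

```python
from collections import Counter


def is_k_anonymous(records, quasi_identifiers, k):
    """True if every quasi-identifier combination occurs in at least k records."""
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return min(groups.values()) >= k


def generalize(record):
    """Hypothetical generalization: keep the first 3 postal-code characters
    and replace the exact age with a 10-year band."""
    decade = (record["age"] // 10) * 10
    return {
        "postal": record["postal"][:3],
        "age": f"{decade}-{decade + 9}",
        "diagnosis": record["diagnosis"],  # sensitive attribute, left unchanged
    }


records = [  # made-up example data
    {"postal": "N1G 2W1", "age": 21, "diagnosis": "flu"},
    {"postal": "N1G 4K3", "age": 24, "diagnosis": "asthma"},
    {"postal": "N1G 1A9", "age": 27, "diagnosis": "flu"},
    {"postal": "M5S 1A1", "age": 43, "diagnosis": "diabetes"},
    {"postal": "M5S 3G2", "age": 45, "diagnosis": "flu"},
    {"postal": "M5S 2E8", "age": 48, "diagnosis": "asthma"},
]

qi = ["postal", "age"]
print(is_k_anonymous(records, qi, k=3))                           # False
print(is_k_anonymous([generalize(r) for r in records], qi, k=3))  # True
```

k-anonymity alone does not prevent every inference: if all records in a group share the same sensitive value, that value is still revealed for anyone known to be in the group.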
Modern cryptographic techniques are essential in
any IT system
!
Various techniques exist for searching through
encrypted data, which provides a form of privacy
protection and selective access to sensitive data
!
Allows data processor to process encrypted
data
"
Original user can then again decrypt the
result and use it without revealing any
personal data to the data processor
"
Could be used to aggregate encrypted data
thereby allowing both privacy protection and
useful aggregate information
"
New technique used for designing privacy-
preserving systems = 'homomorphic encryption'
!
Cryptography has been used as a means to protect data
○
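Homomorphic encryption lets a data processor compute on data it cannot read. The notes do not name a specific scheme, so this minimal sketch uses Paillier, a well-known additively homomorphic scheme: multiplying ciphertexts adds the hidden plaintexts, so a processor can total encrypted values and only the original user can decrypt the aggregate. The primes are tiny and hard-coded for readability, which makes this insecure toy code; the readings are made up, and `pow(x, -1, n)` needs Python 3.8 or later.

```python
import math
import secrets

# Toy parameters; real deployments use primes of 1024 bits or more.
P, Q = 1009, 1013
N = P * Q
N_SQ = N * N
G = N + 1                                            # common simple choice of generator
LAM = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)    # lcm(p - 1, q - 1)
MU = pow(LAM, -1, N)                                 # valid inverse because G = N + 1


def encrypt(m: int) -> int:
    """Paillier encryption: c = g^m * r^n mod n^2 with fresh random r."""
    if not 0 <= m < N:
        raise ValueError("message out of range")
    while True:
        r = secrets.randbelow(N - 1) + 1
        if math.gcd(r, N) == 1:
            break
    return (pow(G, m, N_SQ) * pow(r, N, N_SQ)) % N_SQ


def decrypt(c: int) -> int:
    """m = L(c^lambda mod n^2) * mu mod n, where L(x) = (x - 1) // n."""
    x = pow(c, LAM, N_SQ)
    return ((x - 1) // N * MU) % N


def add_encrypted(ciphertexts):
    """Run by the data processor: the product of ciphertexts encrypts the sum."""
    total = 1
    for c in ciphertexts:
        total = (total * c) % N_SQ
    return total


readings = [12, 7, 30]                       # e.g. private per-household values
encrypted = [encrypt(m) for m in readings]   # all the processor ever sees
encrypted_sum = add_encrypted(encrypted)
print(decrypt(encrypted_sum))                # 49, recovered by the key holder only
```

Because each encryption uses fresh randomness, equal readings produce different ciphertexts, so the processor cannot even tell which households reported the same value.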
Companies can gather a large amount of data
and build detailed profiles of users
"
Profiling becomes even easier if the profile
information is combined with other
techniques such as implicit authentication via
cookies and tracking cookies
"
Requiring a direct link between online and 'real
world' identities is problematic from a privacy
perspective, because it allows profiling of users
!
Users could no longer be tracked across different
services because they can use different
attributes to access different services, which
makes it difficult to trace online identities
over multiple transactions
"
From a privacy perspective a better solution would
be the use of attribute-based authentication which
allows access to online services based on the
attributes of users
!
'Single sign on' frameworks provided by independent
third parties make it easy for users to connect to
numerous online services using a single online identity
○
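The attribute-based alternative can be sketched under heavy simplifying assumptions. A hypothetical `IdentityProvider` knows the user's real identity but issues each service a credential stating only the attribute that service needs (here `over_18`) plus a pseudonym that is stable per service and unrelated across services. An HMAC stands in for the public-key signature a real attribute-based credential scheme would use; because of that substitution, services here must ask the issuer to verify, which a real scheme avoids. All names, IDs and years are made up.

```python
import hashlib
import hmac
import json
import secrets


class IdentityProvider:
    """Hypothetical issuer: knows the real identity, discloses only attributes.
    HMAC stands in for the public-key signature a real credential scheme uses."""

    def __init__(self):
        self._sign_key = secrets.token_bytes(32)
        self._pseudonym_key = secrets.token_bytes(32)

    def _mac(self, claims: dict) -> str:
        body = json.dumps(claims, sort_keys=True).encode()
        return hmac.new(self._sign_key, body, hashlib.sha256).hexdigest()

    def issue(self, user_id: str, birth_year: int, service: str, current_year: int):
        # Same pseudonym every time for this (user, service) pair,
        # but unrelated pseudonyms for different services.
        pseudonym = hmac.new(self._pseudonym_key, f"{user_id}|{service}".encode(),
                             hashlib.sha256).hexdigest()[:16]
        claims = {
            "service": service,
            "pseudonym": pseudonym,
            # Only the attribute is disclosed; year-only check, so approximate.
            "over_18": current_year - birth_year >= 18,
        }
        return claims, self._mac(claims)

    def verify(self, claims: dict, tag: str) -> bool:
        return hmac.compare_digest(self._mac(claims), tag)


def service_accepts(idp, service: str, claims: dict, tag: str) -> bool:
    """A service checks the credential, that it was issued for this service,
    and the single attribute it needs."""
    return idp.verify(claims, tag) and claims["service"] == service and claims["over_18"]


idp = IdentityProvider()
shop = idp.issue("alice-id-001", 1990, "wine-shop", 2024)
forum = idp.issue("alice-id-001", 1990, "forum", 2024)
print(service_accepts(idp, "wine-shop", *shop))          # True
print(shop[0]["pseudonym"] == forum[0]["pseudonym"])      # False: services cannot link her
```

The wine shop learns that the customer is over 18 but not her name or birth date, and it cannot match her pseudonym against the one the forum sees.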
How can information technology itself solve privacy concerns:
!
If computers are connected directly to the brain, not
only behavioural characteristics are subject to
privacy considerations, but even one's thoughts
might run the risk of becoming public
!
It could become possible to change one's behaviour
by means of this technology
!
Brain-computer interfaces:
○
Oversharing may become an accepted practice
within certain groups
!
Technological changes influence the privacy norms
themselves
○
It may be more feasible to protect privacy by
transparency
!
Question: is it feasible to protect privacy by trying to hide
information from parties who may use it in undesirable
ways?
○
Could be used to impose restrictions at a regulatory
level, in combination with or as an alternative to
empowering users, thereby potentially contributing
to the prevention of moral or informational
overload on the user's side
!
Challenges lie in its translation to social effects and
social sustainability
!
The precautionary principle might have a role in dealing with
emerging information technologies
○
Ex. Effects of social network sites on friendship,
and the verifiability of results of electronic
elections
!
Note that not all social effects of information technology
concern privacy
○
Emerging technologies and our understanding of privacy:
!
Read: Privacy and Information Technology
When combined with the Supreme Court of Canada's recent
decisions that emphasized the importance of fair dealing as
users' rights, the law now features considerable flexibility that
allows Canadians to make greater use of works without prior
permission or fear of liability
!
New Bill C-11
!
Law also legalizes format shifting and the creation
of backup copies
!
Will be helpful for those seeking to digitize content,
transfer content to portable devices, or create
backups to guard against accidental deletion or data
loss
!
Ex. Time shifting (or the recording of television shows) is
now legal in Canada
○
Law now features a wide range of user-orientated provisions
that legalize common activities
!
Five previous purposes: research, private study, criticism,
review, news reporting
○
Law now features considerable flexibility that allows
Canadians to make greater use of works without prior
permission or fear of liability
○
Scope of fair dealing has been expanded with the addition of
three new purposes: education, satire and parody
!
Provision is often referred to as the "YouTube exception"
○
Law also includes a user-generated content provision that
establishes a legal safe harbour for creators of non-commercial
user-generated content (such as remixed music, mashup videos
or home movies with commercial music in the background)
!
Some exceptions to this prohibition include the ability to
circumvent the digital lock to protect personal
information, unlock a cellphone or access content if the
person has a perceptual disability
○
Most significant new restriction involves the controversial
digital lock rules that prohibit bypassing technological
protections found on DVDs, software and electronic books
!
Ex. Law now includes a cap of $5000 for all non-
commercial infringement
○
Change reduces likelihood of lawsuits against individuals
for non-commercial activities (including unauthorized
downloading or mistaken reliance on fair dealing)
○
Law generally tries to target genuinely "bad actors" while
leaving individuals alone
!
Allows rights holders to send notifications alleging
infringement to Internet providers, who must forward the
notices to their subscribers
○
Approach to unauthorized downloading now centers on a
"notice-and-notice" system
!
Canadian digital lock rules are among the most restrictive
in the world, but do not carry significant penalties for
individuals
○
It is not an infringement to possess tools or software that
can be used to circumvent digital locks
○
Liability is limited to actual damages in non-commercial
cases
○
Circumventing a digital lock raises different legal issues
!
Read: What new copyright law means to you
Do fundamentally different things
○
More data allows us to see "new", "better" and "different"
!
Effective use of big data can be extremely useful
!
Copying, searching, processing and sharing information is
currently very easy
!
Stationary/static --> fluid/dynamic
!
Driver fatigue or posture could also be datafied
○
Putting things into data (ex. location has been "datafied")
!
--> more information
!
Throw data at problem and make computer figure it out
itself
○
More data --> increasing accuracy of prediction
○
Ex. Analyzing biopsies of cancer cells, the computer identified 3 additional
signs beyond those known in the medical literature
○
One of the most impressive areas is machine learning
(a branch of artificial intelligence)
!
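The point that more data increases prediction accuracy can be illustrated with a simple simulation. It assumes a made-up task: estimating an unknown rate (a hypothetical true value of 0.3) from random samples of increasing size. The average error shrinks roughly in proportion to 1/√n, one statistical reason larger data sets give sharper estimates. This is a sketch of sampling error only, not the machine-learning methods discussed in the talk.

```python
import random
import statistics


def mean_abs_error(true_rate: float, sample_size: int, trials: int, rng: random.Random) -> float:
    """Average absolute error of estimating true_rate from sample_size observations."""
    errors = []
    for _ in range(trials):
        hits = sum(rng.random() < true_rate for _ in range(sample_size))
        errors.append(abs(hits / sample_size - true_rate))
    return statistics.mean(errors)


rng = random.Random(42)   # fixed seed so the run is reproducible
TRUE_RATE = 0.3           # hypothetical quantity being predicted
errors_by_n = {}
for n in [10, 100, 1_000, 10_000]:
    errors_by_n[n] = mean_abs_error(TRUE_RATE, n, trials=100, rng=rng)
    print(f"n={n:>6}: mean absolute error {errors_by_n[n]:.4f}")
```

Each tenfold increase in data cuts the typical error by roughly a factor of three (√10 ≈ 3.2).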
Predictive policing
○
Algorithms could predict what we are about to do
○
Challenge: safeguarding free will
○
It will improve our lives but also has consequences
!
May completely eliminate jobs
○
Transform how we live, work and think
○
Big data is going to steal our jobs in the same way that factory
automation did
!
Humanity can learn from the information we can collect
!
Watch: Big data is better data
Vast majority of content is put up by average users (reviews,
YouTube)
○
Has become a more interactive place
○
First decade of the web was a static place; it is now more
dynamic with the rise of social media and social networks
!
Allows one to create an online persona with little technical
skill
○
Put a lot of personal information online --> behavioural
and demographic information
○
Can create models to predict attributes of individuals
○
Facebook has 1.2 billion users/month
!
Target has purchase history for thousands of customers
○
Has a pregnancy score --> based on purchases such as vitamins, size
of purses, etc.
○
Target sent advertisements about baby supplies to a 15-year-old
girl before she even told her parents she was pregnant
!
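A purchase-based score like the one described can be sketched as a logistic model over purchase indicators. The product names, weights and bias below are invented purely for illustration; Target's actual model and features are not described in these notes.

```python
import math

# Hypothetical weights: positive values push the score toward "likely pregnant".
WEIGHTS = {
    "prenatal_vitamins": 2.0,
    "unscented_lotion": 0.8,
    "large_purse": 0.6,
    "cotton_balls": 0.4,
}
BIAS = -3.0  # keeps the score low for shoppers with none of these purchases


def pregnancy_score(purchases):
    """Logistic score in (0, 1) from the set of products a customer bought."""
    z = BIAS + sum(w for item, w in WEIGHTS.items() if item in purchases)
    return 1 / (1 + math.exp(-z))


shopper_a = {"cotton_balls", "batteries"}
shopper_b = {"prenatal_vitamins", "unscented_lotion", "large_purse"}
print(round(pregnancy_score(shopper_a), 3))
print(round(pregnancy_score(shopper_b), 3))
```

No single purchase here is sensitive on its own; the inference comes from the combination, which is why aggregate purchase histories raise privacy concerns.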
Political preference
○
Personality score
○
Gender
○
Sexual orientation
○
Age
○
Intelligence
○
How much you trust the people you know and how strong
the relationships are
○
Patterns of behaviour detected from millions of users, allows
one to make predictions about individual circumstance or
behaviour
!
Liking the page for curly fries
!
*content is irrelevant to attribute
!
Propagated through a network of similar people
!
Indicative of high intelligence:
○
People are friends with people like them (well
established)
○
Know a lot about how information spreads (similar to
disease)
○
Study: looked at Facebook likes to determine these attributes
(and other things)
!
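The kind of inference described in the study can be sketched in a much-simplified form: for each page, compute the share of known users who have an attribute, then score a new user by averaging over the pages they liked. The users, pages and labels are invented, and the real study used far larger data and a different statistical model. As in the curly-fries example, a page's content need not relate to the attribute; it only has to be liked disproportionately by people who have it.

```python
from collections import defaultdict

# Invented training data: pages each user liked, and a known binary attribute.
training = {
    "u1": ({"curly_fries", "science_news"}, True),
    "u2": ({"curly_fries", "chess_club"}, True),
    "u3": ({"science_news", "chess_club"}, True),
    "u4": ({"reality_tv", "energy_drinks"}, False),
    "u5": ({"reality_tv", "curly_fries"}, False),
    "u6": ({"energy_drinks", "monster_trucks"}, False),
}


def page_rates(data):
    """Fraction of each page's likers who have the attribute."""
    counts = defaultdict(lambda: [0, 0])  # page -> [with attribute, total likers]
    for likes, has_attr in data.values():
        for page in likes:
            counts[page][0] += int(has_attr)
            counts[page][1] += 1
    return {page: pos / total for page, (pos, total) in counts.items()}


def predict(likes, rates, default=0.5):
    """Average the attribute rate of the liked pages; unknown pages are ignored."""
    known = [rates[p] for p in likes if p in rates]
    return sum(known) / len(known) if known else default


rates = page_rates(training)
print(predict({"curly_fries", "science_news"}, rates))
print(predict({"reality_tv", "monster_trucks"}, rates))
```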
Revenue models for most social media companies
rely on exploiting or sharing user data in some way
!
Facebook: users are not the customers, but the
product
!
Policy and law --> users control their own data
○
Allow people to encrypt data they upload
○
Users don’t have much control over how this data is used
!
Users should be informed and consenting
!
Watch: Curly fry conundrum: why social media "likes" say more than
you might think
Internet has become a zone of mass, indiscriminate surveillance
!
Only "bad" people seek out privacy (narrow conception)
○
Only those who challenge power have something to
worry about
!
Implicit bargain: if you're willing to render yourself
harmless to political power, then you can be free of the
dangers of surveillance
○
Debate about being monitored: good vs bad individuals
!
Privacy is no longer a "social norm"
!
Make judgements about what we are willing to let other
people know
○
Make decisions based on what is expected by others
!
Behaviour changes when we believe we are being
watched (more conformist and compliant)
○
Everyone has something to hide
!
Could watch the students but the students could not see
into the tower (did not know when they were being watched)
○
Therefore, the individuals will always act like they are
being watched
○
Jeremy Bentham: architectural design of the panopticon
(initially used in prisons) was applied to educational institutions
in the 18th century
!
Mass surveillance creates a "prison in the mind"
○
This is more effective than brute force
○
This mindset was the key means of societal control
!
Never have a private moment --> obedience
○
Orwell's 1984: warns that we could be watched at any given
moment
!
Creativity, dissent, and exploration occur in isolated
places away from being monitored --> essence of human
freedom
○
A society that breeds conformity, obedience and submission
!
"those who do not move do not notice their chains"
○
System of mass surveillance limits us
!
Watch: Why privacy matters
Tattoos tell a lot of stories
!
Programs we use store our data and tell our story, similar to a
tattoo
!
Facial recognition
○
Track our movements, clothing choices…etc
!
Face.com --> 18 billion faces online
○
= electronic tattoos
!
--> immortality through our electronic tattoo
!
Constantly changing reputation
○
Do not look too far into the past of people we love
○
Narcissus: don't fall in love with your own reflection
○
Lessons learned from Greek mythology:
!
Watch: Your online life, permanent as a tattoo
"Big data" refers to a newly recognized scenario in which
enormous volumes of data are available, either publicly or
privately
○
The types of data may be in many forms and formats, and
may even be stored in different places
○
"Big data" technology refers to a collection of
computational techniques developed to process and
analyze these diverse data to produce interpretable or actionable
results
○
What is "big data"? What are some of the characteristics of "big
data" technology?
1.
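To make the definition concrete, here is a minimal sketch assuming two made-up sources in different formats (CSV sales records and JSON-lines web logs, as if stored in different places). They are parsed, joined on a shared customer ID, and aggregated into a pattern a decision-maker could act on. Real big-data systems distribute this work across many machines; this only shows the parse, join and aggregate shape, and all data and field names are invented.

```python
import csv
import io
import json
from collections import Counter, defaultdict

# Source 1: store sales exported as CSV (invented data).
sales_csv = """customer,product
c1,umbrella
c2,sunscreen
c3,umbrella
c1,boots
"""

# Source 2: website events logged as JSON lines (invented data).
web_log = """{"customer": "c1", "region": "coast"}
{"customer": "c2", "region": "inland"}
{"customer": "c3", "region": "coast"}
"""


def load_regions(lines):
    """Parse JSON lines into a customer -> region lookup."""
    return {rec["customer"]: rec["region"]
            for rec in (json.loads(line) for line in lines.splitlines() if line.strip())}


def products_by_region(csv_text, regions):
    """Join sales with regions on customer ID and count products per region."""
    pattern = defaultdict(Counter)
    for row in csv.DictReader(io.StringIO(csv_text)):
        region = regions.get(row["customer"], "unknown")
        pattern[region][row["product"]] += 1
    return pattern


pattern = products_by_region(sales_csv, load_regions(web_log))
for region, counts in sorted(pattern.items()):
    print(region, counts.most_common())
```

Neither source alone shows that coastal customers buy umbrellas; the pattern only appears once the differently formatted data are combined.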
Cryptography is the study of methods that encrypt private
information to conceal it from being viewed by anyone
other than the intended recipient
○
Using cryptography can prevent malicious attackers from
getting sensitive information such as credit card numbers
or passwords
○
What is cryptography, and why should people use it?
2.
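A minimal illustration of encryption using a one-time pad: the message is XORed with a random key of the same length, and only someone holding that key can reverse it. The one-time pad is chosen here only because it needs nothing beyond the Python standard library; practical systems use standardized ciphers such as AES through a vetted library, since one-time-pad keys must be as long as the message and never reused. The message is a made-up placeholder.

```python
import secrets


def generate_key(length: int) -> bytes:
    """Random key as long as the message; it must never be reused."""
    return secrets.token_bytes(length)


def xor_bytes(data: bytes, key: bytes) -> bytes:
    """XOR each byte with the key; the same operation encrypts and decrypts."""
    if len(key) < len(data):
        raise ValueError("one-time pad key must be at least as long as the message")
    return bytes(d ^ k for d, k in zip(data, key))


message = b"password: s3cret"            # placeholder sensitive data
key = generate_key(len(message))         # shared only with the intended recipient
ciphertext = xor_bytes(message, key)     # what an eavesdropper would see
recovered = xor_bytes(ciphertext, key)   # the recipient reverses the XOR
print(ciphertext.hex())
print(recovered.decode())
```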
Questions:
Privacy and Information Technology