Thanks to the smartphone, photography has become pervasive in contemporary digital culture. Yet the smartphone’s very ‘smartness’ profoundly alters the relations of control between humans and technologies in image-production practices. Unlike dedicated cameras, smartphones use built-in sensors for small-scale positioning to ‘sense’ users’ bodily orientations and states of motion. Combined with photographic applications, this ‘sentience’ enables devices to direct user actions and to require user compliance in order to create an image. In this paper, we analyze image-production in three smartphone applications to chart a continuum between two techno-cultural poles.

At one pole, smartphone photography accommodates a range of human-technological interactions, including the development of new forms of play and experimentation. At the opposite pole, it executes algorithmically-choreographed sentient photography in which ultimate decisions are made by context-aware learning software, radically reconfiguring the distribution of agency between humans and technologies. The development of sentient photography, we conclude, represents the integration of the photographer’s body itself into platform control of image-production.
Definition of Sentient Photography
Our conceptualization of sentient photography draws on discussions around ‘sentient computing’ and ‘sentient cities’. Sentient computing refers, broadly speaking, to a computer system which is ‘aware of the physical environment, shifting the onus of understanding from user to machine…. The applications know where people and devices are and what devices can do’. This definition echoes earlier work on contextually-aware systems, where computation ‘does not occur at a single location in a single situation as in desktop computing, but rather spans a multitude of situations and locations’. Building on these definitions in urban studies, Deal et al. ‘use the term sentience to represent an ability to collect, process, learn, contextualize, and present locally significant information’, particularly in context-aware, adaptable and ubiquitous computational environments that ‘must become more autonomous to reflect the growing ratio of applications to users’. Thanks to pervasive information and communication technologies, sentient systems in urban contexts demonstrate ‘ambient intelligence’, whereby ‘our environment is not a passive backdrop but an active agent in organizing daily lives’.
Several implications follow from these definitions for conceptualizing sentient photography on smart devices. The first is the emphasis on a multiplicity and synchronization of functions (collecting information, processing, presenting etc.) distributed across a number of devices and locations (as well as users). On smartphones this distribution is most clearly manifested through different photographic applications, and the way that they distribute and delegate different kinds of sensing actions across technologies (particular sensors in the device), human sensory systems (human sight, orientation in space, gestural movement), and technologies working outside of the device but networked to it. The second is the dynamism of sentient systems: smartphone sentience can be organized through applications on individual devices working in real-time across dynamic networks enabling coordination and adaptation to a changing environment. While such real-time adaptation is usually associated with traffic and mapping platforms such as Waze and Google Maps, it can also be enabled through platform-based photographic apps to determine image-production, as we will argue below. Overall, it is this integration of distributed sensory awareness with networked adaptation and machine learning that, when applied to smartphone-based image-production, distinguishes sentient photography from previous formations.
The phenomenon of sentient photography can be fruitfully elaborated through Science and Technology Studies (STS) – although, as we shall see, with important caveats. Perhaps the most significant exemplification of STS approaches in recent photography theory, and especially in research on smartphone photography, is Gómez Cruz and Meyer’s article on ‘Creation and Control in the Photographic Process’. This article uses STS (notably Actor-Network Theory), augmented by Social Informatics, to analyze the development of photography as a ‘socio-technical network’ of diverse actors – human and technological – across distinctive domains of image-production, processing, distribution and consumption, organized according to four historical ‘moments’ and a postulated new ‘fifth moment’ connected to the rise of the iPhone. The key take-away from these approaches is that ‘to understand how technologies are used by people, one should privilege neither the technical nor the social a priori, but instead must be open to the possibility that socio-technical assemblages can be driven by technological developments, by social construction, or by an iterative process of mutual shaping between the two’. The metaphor employed to signify this mutual shaping is the ‘pendulum’ swinging between technical objects (and their possibilities), and the social and practical knowledge-systems they interact with at different times.
Our understanding of sentient photography draws on Gómez Cruz and Meyer’s insights into photography as a dynamic assemblage that is always both social and technological. However, it also differs from them in several respects. The first difference is partly a matter of historical recency: Gómez Cruz and Meyer wrote at an earlier stage in the development of smartphone photography. Their principal claim for the novelty of the iPhone ‘moment’ is that:
For the first time in photographic history, a single device has made it possible to control the whole process, not only of image production and distribution of those images (like any mobile phone) but also the possibility of processing those images, in the same device, to obtain different results. As a consequence, the process itself has changed. The approach to the final image can be randomly experimental rather than pre-planned.
While they emphasize the integration of photographic functions of image-production, processing and distribution (which were previously spread across diverse technologies and infrastructures), we argue for an additional integrated function that is also transformative: the incorporation into image-production and processing of sensory capacities that are themselves – in certain cases – enabled by being distributed across devices and locations. Photography becomes sentient through new technologies that are produced and operated by networks of human and technical actors, from the smartphone itself and its various sensors and components, to the cellular and digital platform infrastructures which enable devices to ‘know’ their speed and location. The very multi-functionality and connectivity of the smartphone have enabled its ‘camera’ to achieve dynamic context-awareness, with new consequences for control over how images are made.
This brings us to our second departure from their framework, which is more conceptual and centers on the notion of ‘agency’: Gómez Cruz and Meyer use STS to describe photographic agency as a heterogeneous network of humans and non-humans. We seek to understand the hierarchies within this heterogeneity, in which distinctive components of the assemblage shape ‘photographic agency’ more than others – they have more ‘agency’ over agency, so to speak. This is not to say that STS does not deal with hierarchies at all. Concepts such as the ‘black box’ and ‘punctualization’ indicate switching points at which networked assemblages become fixed and their chains of interaction become obscured or harder to change, while Akrich writes usefully about the ‘inscription’ and ‘de-scription’ of technical objects, the former relating to designers’ ‘hypotheses made about the entities that make up the world into which the object is to be inserted’, with ‘de-scription’ accounting for the ‘mechanisms of adjustment (or failure to adjust) between the user, as imagined by the designer, and the real user’. Nevertheless, these concepts are used within an overall theoretical framework that postulates a radical equivalence between human and technical actors and the organization and distribution of agency between them.
It is from this axiom of radical equivalence that we diverge, siding with Pickering, who, while endorsing ANT’s emphasis on the ‘constitutive intertwining and reciprocal interdefinition of human and material agency’, marks a crucial divergence from it around the question of intentionality. For Pickering, the claim of radical symmetry between human and material technological actors does not sufficiently consider the role of intentionality in the temporal emergence of human agency: ‘Sometimes, at least, we humans live in time in a particular way. Unlike DNA double helixes or stereo systems, we construct goals that refer to presently nonexistent future states and then seek to bring them about. This extended temporal sweep of human agency is, for me, a respect in which the symmetry between human and material agency breaks down’. This human ability to imagine and model the future (including the immediate future) is crucial to the emergent qualities of agency, and – as we discuss below – gives us greater conceptual and analytical purchase on the hierarchical movements between human and technological actors as new delegations are performed by ‘sentient photography’, delegations which affect previously existing structures of intentionality and predictability in creating photographs.
Photography, Consultation and Prediction
Much discussion of photographic agency in the early years of digital photography (especially in the 1990s) was framed around possibilities for ‘simulation’ and post-production ‘manipulation’ granted to photographers by new technologies, particularly software such as Photoshop. Common to many of these discussions was the fear that new digital technologies gave too much control to photographers over image-production, compared to the putative ‘weak intentionality’ of analogue photography and its indexical relation to the real. One early exception to this (negatively framed) concern about photographer agency was William Mitchell, who argued that the ‘algorithmic image’ produces a simple inverse equation: greater participation of the algorithm in creating the image meant increased involvement of the technology and reduced involvement of the human agent. Much more recently, Yanai Toister has amplified this position in a polemic against the historically-constructed ‘myth’ of the agency of the artist-photographer, claiming that the human photographer (along with the human viewer) has become largely irrelevant to contemporary photography, which he defines in purely informational terms as ‘an autonomous programmatic performance of a data-gathering process’.
Between these two extremes, the pre-digital photography theory of Vilém Flusser offers useful insights. Flusser likened the photographic act to the ‘ancient act of stalking which goes back to the Paleolithic hunter’. Significantly, Flusser maintains that this combination of hunter, weapon and prey is not based on a linear chain of human intentionality which provides control over the camera. Rather, it is the camera as a techno-cultural apparatus that governs the conditions in which control over the photographic act – and the resulting image – is performed: ‘In the act of photography the camera does the will of the photographer but the photographer has to will what the camera can do’. Within this dialectic of wills, it is important to emphasize the consultative character of the relationship between photographers and analogue cameras, especially since it can be carried over into digital photography through the simulation of analogue procedures. For instance, with both analogue and digital single-lens reflex (SLR) cameras, the camera’s built-in sensor warnings for low-light conditions or poor focus are similar; an audio-visual signal will advise the photographer to pay attention to these conditions. The photographer will then need to decide whether to respond and change the camera settings, or to go ahead and take the photograph anyway.
The photographer’s ability to make such a decision irrespective of the camera’s recommendation is essential to the consultative character of the interaction: the final decision remains constantly in the hands of the photographer. This idea of the consultative character of analogue features and their digital remediations appears to contradict Flusser’s claim – couched in computational rhetoric – that the camera ‘programs’ the photographer’s actions. Yet Flusser’s argument is not that the camera directly, that is instrumentally, makes decisions on the photographer’s behalf: rather, it is the broader cultural categories and industrial systems or ‘metaprograms’ that construct the parameters and possibilities of photographers’ choices (Pickering similarly notes that intentional projections of human plans ‘are constructed from existing culture in a process of modelling’). Hence while in theory merely ‘consultative’, the light meter’s advice is likely to be followed by virtually all photographers. The advice is suffused with cultural and technical authority: while it can be resisted in the name of the photographer’s control, ‘the freedom of the photographer remains a programmed freedom’. Nevertheless, the photographer retains an ultimate right of ‘override’ in each specific act of photographing: the algorithmic sequence of ‘if x, then y’ – ‘if the light-meter registers too little light, then I will not take the photograph’ – which will seamlessly operate in most cases, can still be defied.
There is one further important aspect to this consultative dimension of analogue and dedicated digital cameras: they enable the photographer to make predictive adjustments regarding the final image (including predictions that, by refusing the ‘advice’ of the light meter, the image will be over- or under-exposed). Predictability is central to Pickering’s conceptualization of the temporal structure of intentionality discussed earlier, bridging between projected goals, the imagination of nonexistent future states, and the possibility of realizing them. Predictability is an important aspect of the ultimate control of the photographer over the image-production process, and is also considered an indicator of the photographer’s skill. Indeed, part of the skill-acquisition process depends on the ability of the photographer to experience and understand the consequences of putatively ‘incorrect’ actions, to learn from one’s mistakes.
How do these ideas of consultation and prediction relate to the smartphone and sentient photography? The smartphone, one could argue, is in general a relatively ‘open’ and consultative device when compared to dedicated digital cameras – to the extent that, in Gómez Cruz and Meyer’s terms, it encourages ‘experimentation’. With dedicated digital cameras the interface is closed to modifications (except for extreme cases of hacking) and is directly and continuously in operation: it cannot be bypassed. With smartphone interfaces, in contrast, the photographer can deliberately switch between photo applications that all use the same optical and other hardware: this modularity gives the photographer an enormous range of possibilities and ‘virtual cameras’. The smartphone is not only distinctive as a ‘multi-camera’ device by virtue of additional lenses at the optical and hardware levels (forward and backward facing, dual and multi-lens technologies): it is also ‘multi-camera’ in that each software application, through its particular interface and functionality, produces a slightly different assemblage of photographic interaction with the photographer. These in turn provide diverse experiences of the image-production process, as well as different image outcomes.
Nevertheless, within this variation there are sentient photo-applications which deliberately and definitively intervene in new ways in image-production. These conspicuously take over the photographic process based on their own context-aware calculations of optimal end results: they not only ‘know’ the device’s location, but can track the trajectory of its movement and how fast it is going, factoring this into their predictions. This enables the photo-applications to simulate spatial and gestural self-awareness, responding to and guiding the physical motions of a human photographer, and themselves implementing, in performative terms, predictive intentionality and future modelling. This means that such applications can also end the production process entirely – irrespective of the human photographer’s will – when they anticipate a non-satisfactory result. In these cases, the application has acquired the ultimate ‘process-override’ capacity.
Smart Interfaces, Direct Manipulation and Software Agents
To further characterize these different assemblages of photographic interaction we also adopt two terms from the field of Human-Computer Interaction (HCI): direct manipulation and software agents. ‘Direct manipulation’ is used to describe a relationship of cause and effect between users’ actions and the device’s computational processes. Direct manipulation utilizes the interface to provide users with a sense of familiarity and immediacy. An important characteristic of the direct manipulation interface is the user’s perception of ‘being in control’, with the ability to see instantly the consequences of one’s actions: ‘much of the appeal and power of this form of interface comes from its ability to directly support the way we normally think about a domain’. The cognitive perception of ‘being in control’ with photo-applications comes into effect through physical interactions with screen-based objects; for example, sliding a finger across the touchscreen to ‘turn’ a virtual dial provides instant visual feedback about the manipulation of an image, such as its brightness value. A downside to direct manipulation is that familiarity can constrain new thought and possibilities for action. Direct manipulation is based on prior knowledge and constantly simulates what the user already knows; hence it can discourage the use of new technologies to generate innovative meanings and interactions.
‘Software agent’ refers to computer programs which have some degree of autonomy with respect to task performance. Early discussions of agents in HCI defined them broadly enough to include – at least theoretically – non-computational entities (such as thermostats or, in the case of photography, an automatic flash) as well as computer programs. While there are distinctive definitions, sub-types and attributes proposed by different scholars, software agents are usually described as having autonomy from human intervention in aspects of decision making, being pro-active with regard to their environment, reacting in relevant and timely ways to environmental changes, and possessing temporal continuity without the need for repeated human activation. More recently, scholars defining what they call ‘intelligent software agents’ (ISA) have focused on the dynamic updating of how a program models its environment, particularly when that environment ‘includes a human user, and the ISA’s goals depend on whether the interacting user performs certain actions (e.g. clicks, purchases, actions in a game or physical exercise). Hence, what rewards the ISA obtains is conditional on its ability to influence the behavior of a human user’.
Two significant characteristics of the work of software agents are relevant to our discussion of agency in sentient photography. The first characteristic is the opacity of highly complex task performance: as software agents, smartphone photography applications like those discussed below achieve their goals by combining dynamic input and calculations from sensory, optical, geolocational and other parameters that are beyond the knowledge of all but the most computationally literate photographers (and also beyond the understanding of most non-professionals). Both the level of complexity and the opacity of its presentation seem new in degree for photographic production. Second, this complexity is often tied to the delegation of final authority over image-production to the smartphone application (as mentioned above), independent of the will of the photographer. Once prompted by the user to execute, the system operates as an autonomous ‘black box’ that carries out its processes with very little or no user intervention and on its own authority.
In the following sections we analyze three case studies of smartphone photography applications in order to show in greater detail how direct manipulation and software agents work within diverse photographic assemblages to structure agency. As noted earlier, these assemblages work across a continuum between two techno-cultural poles: from the use of interface components whose familiarity and responsiveness produce effects of photographer control based on direct manipulation, to context-aware sentient photography applications that openly and overtly challenge the ultimate override decision-making capacity of the photographer. We begin with Hipstamatic.
Hipstamatic Mobile Camera Application
Hipstamatic is among the earliest photo-applications designed for the Apple iPhone, winning Apple’s first ever ‘App of the Year’ award in 2010, but dwindling in popularity since the rise of Instagram, though it is still in operation. It uses the phone’s camera-sensor to create vintage images by means of software filters. What is distinctive about Hipstamatic is its visual interface: it displays an illustration of an old analogue camera on the smartphone’s touchscreen, and the photographer operates the virtual camera by pushing ‘buttons’ and sliding ‘knobs’ just as he or she might with a dedicated camera. The ‘analogue camera’ user-experience is heightened by performing visual manipulations on the captured image, for example when the photographer virtually ‘replaces the film’ or ‘switches the lens’. These tangible actions of switching ‘films’ and ‘lenses’ are of course borrowed from the domain of analogue photography, and they are key to the application’s overtly nostalgic sensibility. The virtual ‘buttons’ and ‘knobs’, with the large range of directly switchable ‘lenses’, ‘films’ and ‘flashes’ are an overt example of ‘remediation’ – the citation of one (usually older) medium in another – reproducing the ‘look and feel’ of optical analogue cameras which require and encourage continual decision-making on the part of the user.
An important outcome of the Hipstamatic interface is that it uses direct manipulation to disguise the work of image-processing software. For example, its image preview screen resembles the viewfinder of an old-style point and shoot camera, and is deliberately pervaded by (simulated) visual ‘defects’ such as ‘light reflections’ on the virtual ‘glass’. Furthermore, when, for instance, the photographer changes the shutter speed by moving a virtual ‘dial’, he or she is not actually affecting the speed of a physical shutter at all, because smartphones do not have mechanical shutters. Yet in contrast to a point and shoot camera with a fixed shutter speed, whose viewfinder image reflects a constant relationship to external light conditions, the Hipstamatic interface simulates a ‘hyper-realistic’ change in ‘shutter speed’ by altering the brightness of the ‘viewfinder’ image seen by the photographer. The reality behind this simulation, of course, is that Hipstamatic is not based on analogue technology, but is a software program executing a series of computational operations.
However, it is important to distinguish between this masking of software processes and the question of user control. While Hipstamatic disguises the form of the control it gives the photographer over the photographic process and the final image, it does not override the substance of user control itself: users can and do employ the interface (often with great dexterity and flair) to produce diverse outcomes – frequently beyond the range of most analogue photographers. So while it is fair to say that Hipstamatic’s computational simulations are pre-defined by software engineers, this may be no more (and no less) determining than such analogue fixities as the chemical balance of FUJI (or other) photographic films, which were pre-defined by chemists and film experts. Indeed, Hipstamatic uses the ‘variability’ and simulative qualities of software-based media objects to present photography as a space of (nostalgically-inflected) aesthetic play. Its remediation of the analogue parameters of lens, film and flash ‘inscribes’ an effective and functional template for combinatorial experimentation, freed from the material constraints – of both expertise and equipment costs – that characterize actual analogue photography. Significantly, this framework for aesthetic play and combinatorial experimentation is produced within a relationship of predictability: the permutations offered by the application, and selected by the user, produce the results that the user expects from the image in the viewfinder. Additionally, familiarity and experimentation over time can contribute to this sense of control through predictability.
Here, then, direct manipulation – the seeming control over image-production procedures – provided by a nostalgically remediated interface, disguises the deep participation of software processes in the production of the image, though it does not deprive the photographer of ultimate control. One further caveat is significant here, however. Hipstamatic, like other smartphone applications and computer software in general, is regularly updated. These updates create changes to the application, adding new features but also changing core processes (they also fix bugs which photographers may have learnt to work with). This somewhat tempers our previous statement about the predictability of the application. Hipstamatic, to a much greater degree than the analogue equipment it simulates, is a dynamic, evolving and therefore relatively unpredictable system, entailing a never-ending learning curve for its users.
Beyond Hipstamatic, however, there are smartphone applications which are not only less conspicuously dependent on remediations of analogue technology, but which control the image more overtly and decisively thanks to more extensive integration of the smart device’s array of sensors. This brings us to the iPhone’s Pano feature.
iPhone Camera Pano Capabilities
The Pano tool is part of the iPhone’s native camera application. It enables the photographer to create horizontal panoramic images by gradually moving the iPhone across the desired scene. This requires following the application’s instructions (‘move iPhone continuously’, ‘keep the arrow on the center line’, ‘slow down’, ‘move up’, ‘move down’) until one reaches the end of the screen-space used by the application. These instructions are displayed via a large arrow, a guidance line and a central ‘viewfinder’ strip showing the panoramic image-in-process, accompanied by textual guidance overlaying the digital viewfinder shown on the smartphone screen.
If the photographer strays too far from the guidance line while taking the image, or moves too quickly, the application advises the user of her non-compliance by presenting textual messages designed to steer the photographer’s movements back on track: the photographer literally is the kinetic arm of an algorithmically guided physical action. However, if the non-compliance is more radical – such as moving the camera in the opposite direction to the indicated horizontal trajectory, or turning the smartphone on its side to move vertically – the application can simply end the process of ‘taking the image’.
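The two tiers of response described above, corrective advice for minor deviations and termination for radical ones, can be sketched schematically. The following Python sketch is purely illustrative: the class and function names, the threshold values and the units are our own assumptions, and it makes no claim about how Apple’s Pano tool is actually implemented.

```python
from dataclasses import dataclass
from enum import Enum


class Response(Enum):
    CONTINUE = "continue"    # movement is compliant; keep recording
    ADVISE = "advise"        # minor deviation; show a corrective message
    TERMINATE = "terminate"  # radical deviation; the application ends capture


@dataclass
class MotionSample:
    """One hypothetical reading derived from the device's motion sensors."""
    vertical_offset: float  # signed distance from the guidance line (positive = too high)
    sweep_speed: float      # signed speed along the sweep (negative = reversed direction)
    rotated: bool           # True if the device has been turned on its side


# Assumed thresholds in arbitrary units, chosen only for illustration.
MAX_OFFSET = 1.0
MAX_SPEED = 1.0


def respond(sample: MotionSample) -> tuple[Response, str]:
    """Map a motion reading to the application's response and on-screen text."""
    # Radical non-compliance: the application, not the photographer, ends the process.
    if sample.rotated or sample.sweep_speed < 0:
        return Response.TERMINATE, ""
    # Minor non-compliance: textual messages steer the movement back on track.
    if sample.sweep_speed > MAX_SPEED:
        return Response.ADVISE, "slow down"
    if sample.vertical_offset > MAX_OFFSET:
        return Response.ADVISE, "move down"
    if sample.vertical_offset < -MAX_OFFSET:
        return Response.ADVISE, "move up"
    return Response.CONTINUE, ""
```

In this sketch the photographer’s input never includes a ‘take the image anyway’ option: the only terminal decision is returned by the function itself, which is the sense in which the override has passed to the software.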
Unlike Hipstamatic, which is indifferent to the device’s physical orientation within its immediate spatial environment (and is thus similar to most dedicated digital cameras), Pano is an example of ‘sentient photography’: it uses the smartphone’s range of location and orientation sensors in addition to its optical hardware. This means that the iPhone Pano interface produces a heightened sense of continual negotiation between the user who is moving the camera and the immediate responses and instructions provided by the algorithm. The negotiation with the algorithm is critical for the creation of a ‘successful’ panorama.
The camera has become overtly part of a spatially sentient entity with whom users can collaborate, or by whom they can be opposed (through bringing the ‘collaboration’ to an end). Significantly, there are recognizable and fuzzily predictable thresholds within which the negotiation process can occur. One can deliberately work against the instructions of the application to a degree that does not cause it to end the process. Moreover, the application does store images produced by deviations from instructions that remain within the threshold, even if this means that the final image is far from being a seamless and continuous photograph.
These thresholds or margins for deviation have allowed photographers to come up with ‘work-arounds’ to the application’s programming process, and to use the Pano in ways which are unintended. For instance, ‘Pano glitches’ are panoramic images created by photographers who deliberately choose to bypass the interface and create their own aesthetics. These images are examples of aesthetic choices which are conducted within the parameters set by the algorithm, and they enable the development by photographers of new kinds of embodied dialogical awareness and creative skill in relation to the device: this dialogue, however, is with a sentient entity who can end the interaction with the user of its own accord, and whose collaboration must be ensured through taking its decision-making capacity into account.
Unlike analogue devices, DSLRs, and smartphone applications like Hipstamatic, the ultimate override, the decision whether to create or not create an image desired by the photographer, has been delegated to the software. Photographers can learn to factor this delegation into their own uses by testing the software’s boundaries in trial-and-error interactions: the Pano tool’s visual feedback, displaying the image-in-process in the central viewfinder strip, provides the user with a sense of predictability and an understanding of how deviations within the thresholds of acceptability will look in the final image. Nevertheless, this delegation of the override to the sentient system largely contradicts the kinds of ‘fluid practice, a playful relationship with the possibilities of the program’ that Gómez Cruz and Meyer suggest are characteristic of smartphone photography. A more draconian insistence on precise control, but within a different framework of bodily relations between photographer and algorithm, and a more distributed platform-based software agent, can be found in our final case study: Google Street View (GSV).
Photographing with Google Street View Software
Google Street View (GSV) is a feature of Google Maps and Google Earth that provides viewers with interactive ‘street level’ 360° panoramic images of streets and other locations. While the GSV platform has been highly controversial in relation to questions of social bias and surveillance, we are concerned here with its implications for image-production. Many of the system’s 360° panoramic images are contributed to the GSV database by registered users via the ‘take photo sphere’ function of Google Street View’s smartphone application. Based on the operation of software agents, the application enables the production of 360° images by integrating the smartphone’s camera with a spatially-aware photo-stitching process that structures the interaction with the photographer. When the user turns on the ‘photo sphere’ function of the GSV application, the smartphone’s built-in camera goes to work and the interface becomes a camera frame.
A hollow circle in the center of the screen and a movable dot are part of the interface, along with textual instructions that appear at the bottom of the screen. In order to produce the stitched 360° image, the user must carry out the application’s instructions with absolute precision. First, the user is told to point the camera at the dot. The stitching process will not proceed until the dot is aligned with the circle at the center of the frame. To align the dot, the user must move her hands laterally, following visual and textual commands. Once the dot is aligned, the application directs the user to the next spatial point, and so on. The process comes to an end when the user has followed the application’s instructions in full. If, for any reason, it is halted prior to this, the collected data is lost. Within the entire interaction, the only decision made by the user is to initiate the process. The fact that the GSV application has the ability to finalize the production process is fundamental to the photographer’s acceptance of its control.
The contrast between GSV and iPhone Pano is instructive. If, in the case of Pano, the photographer becomes the kinetic arm of the algorithm moving continuously within thresholds that are made visible on the screen, with GSV the photographer is the mobile point-finder of a selective binary process of spatial recording in discrete stages. The physical ‘choreography’ and predictive relations between photographer and software are subtly but significantly different. iPhone Pano engages the photographer, to speak metaphorically, in an ‘analogical’ movement through space. The movement of scanning the image is experienced as continuous, as is the registration of the image in the viewfinder and in the final product: even acceptable deviations from the path assigned by the interface need to be part of a continuous bodily movement. This allows photographers to predict the kind of movements required of them – even to ‘mime’ the taking of a Pano image – since Pano utilizes a procedure of spatial scanning that is tied to intuitive and proprioceptive relations between hand, eye, balance and space.
One could argue that Pano invokes and participates in new smartphone-based modes of image-making associated with ‘gestural’ embodiment, notably the selfie: Pano’s on-screen display of the thresholds for ‘permissible’ motion constitutes a visualization of the relations between ‘inscription’ and ‘de-scription’, discipline and expression, in the human-technological ensemble, where the photographer’s arm and body ‘dance’ to moves directed by the camera’s eye. In contrast, the GSV app is based on sequential, fragmented ‘jumps’ between discrete points that are multi-perspectival, discontinuous and unpredictable: one does not know where one will be asked to shoot from next. The overriding logic here is provided by an algorithmic view ‘from the cloud’, so to speak – a view perceptible exclusively to software agents as a sentient assemblage – which combines the images one takes from the points decreed by the interface with additional data and processing in Google’s Street View servers. It is only through this ‘view from the cloud’ that the image is stitched together. Hence the power granted to the algorithm seems different to the ‘collaborative authorship of the image’ that Uricchio postulated as the core of ‘algorithmic photography’, since with GSV the photographer has waived control over core aspects of image-production. There are almost no degrees of freedom either for aesthetic play or for predicting the precise outcome from a particular sequence of actions.
Clearly, crucial to this contrast between Pano and the GSV interface is the redistribution of operations from a single device to software agents working on a continually connected server-based system. The sentience of the GSV application is utterly dependent on the constant connectivity of the smartphone camera and other device sensors to a larger network: unlike iPhone Pano (or Hipstamatic) it will not work without network connectivity. Image-taking, processing and creation are real-time connective processes, further complicating and challenging the traditional relations between photographer and equipment and the distribution of agency.
When we add to the mix the fact that Google’s software agents are continually ‘learning’ from their multiple interactions with humans and other actors, we can characterize the kind of relationship manifested through GSV as a dynamically evolving condition of expertise-lag in photographic production: the algorithm will always be slightly different from the one previously encountered. If, as noted earlier, the regular updates of a much more static and device-based application such as Hipstamatic put into question the expertise of the photographer, then the challenge is magnified exponentially by a cloud-based learning algorithm such as GSV.
Sentient Photography Applications Extend Image Production Capabilities
In this article we have explored how smartphones enable a new kind of sentient photography that is significantly different to conventional ‘non-sentient’ digital photography (on both dedicated cameras and, in apps like Hipstamatic, on smartphones), focusing on the interaction between humans and technology in image-production. Moreover, the shift from conventional to sentient photography may be even more culturally significant than the widely discussed analogue versus digital divide. Here we extend Gómez Cruz and Meyer’s argument that the advent of the smartphone represents a distinctive ‘moment’ in the history of photography that is different to that of primary digitization, because the smartphone contains, in one single device, all the main elements of the medium that were previously distributed across different equipment and infrastructures.
However, in the case of sentient photography, the photographer’s body has become deeply infrastructural too, activated and governed in new micro-gestural ways through the physical directives enabled by the smartphone’s spatial awareness technologies and software agents. The photographer’s body, in some though not all of the applications we have analyzed, is itself integrated into a wider sentient assemblage of photographic production.
This caveat – that not all smartphone photography applications produce sentient software control – returns us to the idea of a continuum between diverse photographic assemblages of human and technological elements. Every point on this continuum, from Hipstamatic’s experimental playfulness through Pano’s negotiated sentience to GSV’s cloud-based instrumentalism, operates within the framework of ‘non-human photography’. This is because photography, as Zylinska observes, has always been ‘non-human’; moreover, being ‘non-human’ does not involve eliminating the human, but positioning ‘the human as part of a complex assemblage of perception in which various organic and machinic agents come together – and apart – for functional, political and aesthetic reasons’.
Additionally, the advent of sentient software control of image-production through applications like GSV is not the result of a teleological process: we are not arguing that sentient photography is the culmination of a historical logic of increasing technological dominance. The example of Hipstamatic is important in this respect (and it is only one among many applications that provide detailed levels of control over smartphone image-production processes). While it is heavily dependent on software that disguises the character of its operations, it nevertheless provides new possibilities for human play, initiative and skill in producing images: emergent forms of technological and human agency are jointly magnified in its ‘mangle of practice’. In fact, the two poles of the continuum offer an interesting commentary on one of the best-known claims regarding the shift between analogue and digital media cultures: Manovich’s argument that this shift corresponds to a general movement from an industrial to a postindustrial logic. Hipstamatic appears to resonate with a postindustrial logic of variability, modularity and niche expertise, as does the smartphone’s entire universe (or better still, marketplace) of diverse applications for photography and many other cultural practices.
The Google Street View app, in contrast, seems to promote an industrial logic of task simplicity, repetitiveness and the opacity of the final product to its human operators (we do not know or need to know how the image will look). The emphasis is on system efficiency: photographers are workers in the factory called Google Street View. One just has to follow instructions. While Hipstamatic and even iPhone Pano are user-centered, GSV is software- and platform-centered.
This contrast leads to a final point. The sentience of the smartphone as an image-producing device reaches its extreme point on our continuum when integrated into a distributed server-based system in ‘the cloud’. This situation corresponds to what Kember calls ‘ambient intelligent photography’, with two differences of emphasis.
The first is that the sentient ambient systems Kember discusses are geared towards ‘naturalistic, invisible and embedded computing’ as part of a ‘quest to make media and technological disappear into ambient and augmented environments’. Yet such user-centered transparency is decidedly at odds with the physically jerky yet strictly controlled choreography imposed on the photographer’s body by the GSV interface: rather than disappearing, the technology is foregrounded, its decisions and procedures made opaque.
The second difference of emphasis is that the distributed systems of sentient photography we highlight are not ad-hoc assemblages but active forces in the ‘platformization’ of photographic production. They coincide with the shift from ‘networked’ models of digital connectivity to the ‘platform’ as the key organizational mode of contemporary mediated social life: entire ecologies of media technologies, administrative structures and social practices dominated by computationally-enhanced commercial organizations. While Instagram is usually perceived as the prime example of the platformization of photography, directly regulating photography through its interlinked application-driven editing (e.g. filters), distribution, viewing and commenting features, we believe that sentient photography through systems such as GSV takes platformization one step further. Rather than moderating images as uploaded content, the system itself directs the physical enactment of photographs so that they are ‘platform-ready’. This occurs, via the application, from the earliest stages of primary production onwards, requiring the application’s instructions and its approval.
Thus the GSV platform, working through our sentient smart devices, utilizes the very gestures of our bodies as physical ‘platforms’ for ‘its’ camera. Sentient photography shows how platformization is not restricted to the ‘extension of social media platforms into the rest of the web’. For it goes beyond the web: it extends into physical space as a powerful new framework for photographic action in which we – embodied vehicles of cloud-based software – produce ‘programmed’ images of the world.
Text by Doron Altaratz and Paul Frosh (The Hebrew University of Jerusalem)