research
2020.08.2 Sunday

Architecture and Bioinformatics: Application of DNA alignment methods to human behaviors analysis

Miguel Andrade/CC BY 3.0

Methodology

DNA sequence

DNA has a double-helix structure and is composed of four nucleotides: adenine (A), cytosine (C), thymine (T), and guanine (G). Genetic information can therefore be described as a sequence, that is, a string built from these four letters. A main objective of genomics is to compare the DNA sequences of different species and measure their similarity by arranging the symbols in each sequence. When two DNA sequences have similar regions, they are considered probably homologous, meaning that they may share a common ancestor in the evolutionary process. Likewise, when substrings of two DNA sequences are similar, those sequences may also derive from a common ancestor.

DNA sequence alignment methods

Sequence Alignment Methods (SAMs) enable us to measure the distance between two or more sequences. “Alignment” is a technique for finding the optimal correspondence among sequences by arranging their symbols. A set of defined operations is applied to transform the symbols of one sequence into the symbols of the other; the more effort is needed to make one string of information elements equal to another, the greater the distance, or dissimilarity, between the two sequences.
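As a minimal sketch of this idea of counting the operations needed to turn one sequence into another, the following computes a Levenshtein edit distance between two strings of symbols. The function name `edit_distance` and the unit cost for each insertion, deletion, and substitution are assumptions made for illustration; the text does not state which operations or weights are actually used.

```python
def edit_distance(a, b):
    """Minimum number of insertions, deletions, and substitutions
    (each with an assumed cost of 1) needed to turn sequence a into b."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]  # turning a[:i] into an empty sequence takes i deletions
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # delete x
                            curr[j - 1] + 1,      # insert y
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]
```

For example, the hypothetical paths "ZEFZ" and "ZEGZ" differ by a single substitution, so their distance under these assumptions is 1, while identical paths have distance 0.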

The distribution of Bluetooth sensors and Wi-Fi access points

Here, we look at data collected through Wi-Fi and Bluetooth on human movement in the built environment. The figure presents conceptual diagrams of the sensor deployment, the captured data, and its structure. The raw data can be transformed into several forms for subsequent analysis. For example, a simple path contains only the spatial notations of the locations that passengers pass by, in order. Alternatively, adding time stamps for the length of stay in each location makes a more spatio-temporal analysis possible. This transformation is useful for applying Sequence Alignment Methods (SAMs), which have been used for DNA analysis.
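The transformation from raw detections to paths can be sketched as follows. The record layout `(device_id, node, timestamp_in_seconds)`, the function names, and the choice to measure the length of stay as the time between the first and last consecutive detection at a node are all assumptions for illustration, not the format of the actual dataset.

```python
from collections import defaultdict
from itertools import groupby

def build_paths(records):
    """Group detections by device and return, per device, a list of
    (node, stay_seconds) visits in chronological order."""
    by_device = defaultdict(list)
    for device, node, ts in records:
        by_device[device].append((ts, node))
    paths = {}
    for device, hits in by_device.items():
        hits.sort()  # chronological order
        visits = []
        # Consecutive detections at the same node form one visit.
        for node, run in groupby(hits, key=lambda h: h[1]):
            times = [ts for ts, _ in run]
            visits.append((node, times[-1] - times[0]))
        paths[device] = visits
    return paths

def simple_path(visits):
    """Keep only the ordered node labels (assumes one-letter labels)."""
    return "".join(node for node, _ in visits)
```

The output of `build_paths` corresponds to the spatio-temporal form, and `simple_path` reduces it to a plain string of node labels that a sequence alignment method can take as input.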

Sensor deployment and Dataset

Figure 1. Location of 15 Wi-Fi access points, indicating their approximate sensing range

Figure 1 shows the locations of the 15 Wi-Fi access points deployed throughout Paris-Gare-de-Lyon, one of the largest railway stations in Paris, which receives almost 90 million passengers per year. The station contains a commercial area with more than 100 retail stores, restaurants, and cafes. The underground space is connected to the metro stations, bus stops, and parking. Node Z is the platform, which has 10 tracks. Nodes E, F, G, and H form the waiting area, where the ticket sales booths are located, while nodes B, I, J, K, L, and M form the commercial space in the station.

Results

Figure 2. (a) The distribution of passengers after arriving from node Z.

(b) The distribution of leaving passengers, who move to node Z.

Our analysis shows that 31.8% of passengers arrive at the station through node Z, while 51.4% leave the station from the same node, indicating that this station is largely oriented toward leaving rather than arriving. In addition, Figures 2(a) and 2(b) show the uneven spatial distribution of the arriving and leaving passengers. To analyze the passengers’ behaviors in more detail, we examine the frequently appearing paths in both groups: the longer-stay type and the shorter-stay type.

Table 1. Top 15 frequently appearing paths from the longer-stay type and shorter-stay type of passengers

Table 1 presents the 15 most frequently appearing paths for the longer-stay and shorter-stay types of passengers, and Figure 4 visualizes those paths. Most paths of the longer-stay passengers are concentrated around nodes C, D, L, and K. Conversely, the paths of shorter-stay passengers tend to be dispersed and extend all over the station, reaching nodes such as E, J, H, I, and M (see top left in Figure 4). This indicates that shorter-stay passengers tend to explore a much wider area than longer-stay passengers.
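Ranking paths by frequency within each group can be sketched as below. The input format, a list of `(group, path)` pairs with hypothetical group labels such as "longer" and "shorter", and the function name are assumptions; the text does not specify how passengers are assigned to a stay type.

```python
from collections import Counter, defaultdict

def top_paths_by_group(labelled_paths, n=15):
    """Return, for each group, its n most frequent paths with their counts."""
    counts = defaultdict(Counter)
    for group, path in labelled_paths:
        counts[group][path] += 1
    # most_common sorts by descending count.
    return {group: c.most_common(n) for group, c in counts.items()}
```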

Figure 3. Top 10 most frequently appearing paths in each group, from the shorter-stay type to the longer-stay type.

The result of the multiple sequence alignment for a protein

First 90 positions of a protein multiple sequence alignment of instances of the acidic ribosomal protein P0 (L10E) from several organisms. Generated with ClustalX. Miguel Andrade/CC BY 3.0

The result of the multiple sequence alignment for human movement in the station

Source: Urban Sciences Lab

We apply multiple sequence alignment to the Wi-Fi dataset on human movement in the station. The colors indicate the functional spaces in the station, such as the commercial space or the underground space. The result enables us to capture how people move between spaces in terms of their function.
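Reading a path in terms of function can be sketched by recoding each node as its functional space. The mapping below follows the node description given for Figure 1 (Z as the platform, E to H as the waiting area, B and I to M as commercial space); labelling every other node as "other" and merging consecutive repeats are assumptions made here for illustration.

```python
# Node-to-function mapping taken from the description of Figure 1.
FUNCTION_OF_NODE = {"Z": "platform"}
FUNCTION_OF_NODE.update({n: "waiting" for n in "EFGH"})
FUNCTION_OF_NODE.update({n: "commercial" for n in "BIJKLM"})

def functional_sequence(path):
    """Translate node labels into functional spaces and merge
    consecutive visits to the same kind of space."""
    out = []
    for node in path:
        label = FUNCTION_OF_NODE.get(node, "other")  # assumed fallback label
        if not out or out[-1] != label:
            out.append(label)
    return out
```

Under these assumptions, a hypothetical path "ZEFKLZ" reads as platform, waiting, commercial, platform.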

Phylogenetic tree for the organism

Haeckel’s Tree of Life in Generelle Morphologie der Organismen (1866), obtained from https://en.wikipedia.org/wiki/File:Haeckel_arbol_bn.png

Organisms have evolved through a process of repeatedly dividing into two or more organisms from common ancestors. It is therefore possible to infer an organism’s evolutionary process by examining how closely species are related. The phylogenetic tree represents this evolutionary process. Although such trees have been drawn since Darwin’s age, recent technological advances enable molecular phylogenetics, which is based on the analysis of organisms’ genes.

Phylogenetic tree for people’s trajectory in the station

Source: Urban Sciences Lab

We analyzed our collected dataset by applying DNA sequence alignment methods. With this algorithm, we can classify the paths into groups, each of which can be considered a similar pattern. The figure visualizes how each path is classified depending on the similarity of its components. This is the phylogenetic tree of people’s sequential movement in the station.
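One way such a tree could be built from pairwise path distances is sketched below. The text does not state which tree-building method is used, so this sketch swaps in plain average-linkage agglomerative clustering over a unit-cost edit distance; both choices, and all function names, are assumptions for illustration.

```python
from itertools import combinations

def edit_distance(a, b):
    """Unit-cost Levenshtein distance between two paths (assumed metric)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1,
                            prev[j - 1] + (x != y)))
        prev = curr
    return prev[-1]

def leaves(tree):
    """Return the paths contained in a (possibly nested) tree."""
    if isinstance(tree, str):
        return [tree]
    return leaves(tree[0]) + leaves(tree[1])

def linkage(c1, c2):
    """Average distance between all pairs of paths in two clusters."""
    pairs = [(p, q) for p in leaves(c1) for q in leaves(c2)]
    return sum(edit_distance(p, q) for p, q in pairs) / len(pairs)

def build_tree(paths):
    """Repeatedly merge the two closest clusters; returns nested tuples.
    Expects at least one path."""
    clusters = list(dict.fromkeys(paths))  # unique paths, original order
    while len(clusters) > 1:
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]))
        merged = (clusters[i], clusters[j])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters[0]
```

Each nested tuple is a branch of the tree: paths merged early are the most similar, and branches joined last are the most dissimilar groups.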

Publications

Yoshimura, Y., de la Torre, I., Park, S., Santi, P., Seer, S., Ratti, C. (2020) Paris-Gare-de-Lyon’s DNA: Analysis of Passengers’ Behaviors through Wi-Fi Access Points, Traffic and Granular Flow 2020, Springer (accepted)
