What technique can be used for extracting evidence from large systems?
Open access peer-reviewed chapter
Submitted: June 13th, 2018 Reviewed: October 15th, 2018 Published: September 30th, 2020
Internet plays a vital role in providing various services to people all over the world. Its usage has been increasing tremendously over the years. In order to provide services efficiently at a low cost, cloud computing has emerged as one of the prominent technologies. It provides on-demand services to the users by allocating virtual instances and software services, thereby reducing customer’s operating cost. The availability of massive computation power and storage facilities at very low cost motivates a malicious individual or an attacker to launch attacks from machines either from inside or outside the cloud. This causes high resource consumption and also results in prolonged unavailability of cloud services. This chapter surveys the systematic analysis of the forensic process, challenges in cloud forensics, and in particular the data collection techniques in the cloud environment. Data collection techniques play a major role to identify the source of attacks by acquiring evidence from various sources such as cloud storage (Google Drive, Dropbox, and Microsoft SkyDrive), cloud log analysis, Web browser, and through physical evidence acquisition process.
*Address all correspondence to:
In today’s world, users are highly dependent on the cyberspace to perform all day-to-day activities. With the widespread use of Internet technology, cloud computing plays a vital role by providing services to the users. Cloud computing services enable vendors (Amazon EC2, Google, etc.) to provide on-demand services (e.g., CPU, memory, network bandwidth, storage, applications, etc.) to the users by renting out physical machines at an hourly basis or by dynamically allocating virtual machine (VM) instances and software services [1, 2, 3]. Cloud computing moves application software and databases to large data centers, where the outsourcing of sensitive data and services is not trustworthy. This poses various security threats and attacks in the cloud. For instance, the attackers use employee login information to access the account remotely with the usage of cloud . Besides attacking cloud infrastructure, adversaries can also use the cloud to launch an attack on other systems. For example, an adversary can rent hundreds of virtual machine (VM) instances to launch a distributed denial-of-service (DDoS) attack. A criminal can also keep secret files such as child pornography, terrorist documents, etc. in cloud storage to remain clean. To investigate such crimes involved in the cloud, investigators have to carry out forensic investigations in the cloud environment. This arises the need for cloud forensics, which is a subset of network forensics. Cloud forensics is an application of scientific principles, practices, and methods to reorganize the events through identification, collection, preservation, examination, and reporting of digital evidence . Evidence can reside anywhere in the cloud and it is more complex to identify the traces located in the cloud server.
The advancement of new technologies, frameworks, and tools enables the investigator to identify the evidence from trusted third parties, that is, cloud service provider (CSP). There are numerous techniques in cloud forensics that arises on the basis of cloud service models and deployment models. In the Software as a Service (SaaS) and Platform as a Service (PaaS) models, the customer does not have any control of the hardware and they need to depend on CSP for collecting the evidence, whereas, in the case of Infrastructure as a Service (IaaS) model, customers can acquire the virtual machine (VM) image and logs.
The forensic examiner isolates the attacked system in the virtualized environment by segregating and protecting the information from a hard disk, RAM images, log files, etc. This evidence is analyzed based on the artifacts of the attack traces left by the attacker [6, 7]. The forensic investigator relies on finding a series of information such as where, why, when, by whom, what, and how attack has happened. This chapter details the challenges in cloud forensics and also details the data collection techniques in the cloud.
2. Types of forensics
The forensic process is initiated after the crime occurs as a post-incident activity. It follows a set of predefined steps to identify the source of evidence. It is categorized into five groups, namely digital forensics, network forensics, Web forensics, cloud forensics, and mobile forensics.
2.1 Cloud forensic process flow
The cloud forensic process flow is shown in Figure 1, which is described as follows:
Cloud forensic process flow.
3. Evidence collection
Evidence collection plays a vital role to identify and access the data from various sources in the cloud environment for forensic investigation. The evidence is no longer stored in a single physical host and their data are distributed across a different geographical area. So, if a crime occurs, it is very difficult to identify the evidence. The evidence is collected from various sources such as router, switches, server, hosts, VMs, browser artifacts, and through internal storage media such as hard disk, RAM images, physical memory, etc., which are under forensic investigation. Evidence is also collected through the analysis of log files, cloud storage data collection, Web browser artifacts, and physical memory analysis.
3.1 Cloud log analysis
Logging is considered as a security control which helps to identify the operational issues, incident violations, and fraudulent activities [9, 10]. Logging is mainly used to monitor the system and to investigate various kinds of malicious attacks. Cloud log analysis helps to identify the source of evidence generated from various devices such as the router, switches, server, and VM instances and from other internal components, namely hard disk, RAM images, physical memory, log files etc., at different time intervals. The information about different types of attacks is stored in various log files such as application logs, system logs, security logs, setup logs, network logs, Web server logs, audit logs, VM logs, etc., which are given as follows:
Due to the increase in usage of network or new release of software in the cloud, there is an increase in the number of vulnerabilities or attacks in the cloud and these attacks are reflected in various log files. Application layer attacks are reflected in various logs, namely access log, network log, authentication log, etc., and also reflected in the various log file traces stored on Apache server. These logs are used for forensic examination to detect the application layer attacks. Table 1 indicates the various attack information and the tools used for log analysis of different types of attacks. Figure 2 shows the sample access log trace (Table 2).
Different types of logs, attacks, and the log analysis tool.
Sample access log trace as evidence.
Description of the access log format.
3.2 Evidence collection from cloud storage
It is the process of collecting evidence from cloud storage such as Dropbox, Microsoft SkyDrive, Google drive, etc., using the Web browser and also by downloading files using existing software tools [11, 12, 13]. This helps to identify the illegal modification or access of cloud storage during the uploading or downloading of file contents in storage media and also checks whether the attacker alters the timestamp information in user’s accounts. The Virtual Forensic Computing (VFC) tool is used by forensic investigators to identify evidence from VM image file. The evidence is accessed for each account using the Web browser running in the cloud environment by recording the encoded value of VM image. The packets are captured using network packet tools, namely Wireshark, snappy, etc., of each VM instance running in hosts. The account information is synchronized and downloaded using client accessing software of each device which is used to identify the source of evidence. The evidence is isolated from the files found in VM using “C:\Users\[username]\ Dropbox\” for Dropbox as shown in Figure 3. The zip file contains the name of the folder that can be accessed via the browser to determine the effect of a timestamp in a drive. If an attacker modifies the contents of a file, the evidence is found by analyzing the VM hard drive, history of files stored in the cloud, and also from a cache. It can also be analyzed by computing the hash value of the VM image. The evidence of Google Drive cloud storage is depicted in Figure 4.
Google Drive evidence.
3.3 Evidence collection via a Web browser
The clients communicate with the server in the cloud environment with the help of a Web browser to do various tasks, namely checking email and news, online shopping, information retrieval, etc. [14, 15, 16, 17, 18]. Web browser history is a critical source of evidence. The evidence is found by analyzing the URLs in Web browser history, timeline analysis, user browsing behavior, and URL encoding, and is recovered from deleted information. Here is an example of Web browser URLs,
Similarly, the evidence stored in Web browser cache at the root directory of a Web application is used to identify the source of an attack. Table 3 indicates the evidence collection process and recovery method for various Web browsers.
Evidence collection process and recovery method for different Web browsers.
Here is an example of a Chrome forensic tool that captures and analyzes data stored in Google Web browser. It analyzes the data from the history, web logins, bookmarks, cookies, and archived history. It identifies the evidence from C:\Users\USERNAME\Appdata\Local\Google chrome\UserData\Default. Figure 5 depicts the Google Chrome analysis forensic tool.
Chrome forensic analysis tool.
3.4 Physical memory analysis
This has the ability to provide caches of cloud computing usage that can be lost without passive monitoring such as network socket information, encryption keys, and in-memory database. They are analyzed from the physical memory dump using the “pslist” function, which recovers the process name, process identifier, parent process identifiers, and process initiation time. The processes can be differentiated using the process names ©exe© on the Windows, and ©sync© on the Ubuntu and Mac OS. Table 4 indicates the evidence collection process for cloud forensics in cloud storage and cloud log analysis.
Evidence collection process for cloud forensics.
4. Cloud forensics challenges
This section elucidates the forensic challenges in private and public cloud. It is observed from the literature that most of the challenges are applicable to the public cloud while fewer challenges are applicable to the private cloud environment.
4.1 Accessibility of logs
Logs are generated in different layers of the cloud infrastructures [2, 3, 4, 5, 6, 7]. System administrators require relevant logs to troubleshoot the system, developers need logs for fixing up the errors, and forensic investigators need relevant logs to investigate the case. With the help of an access control mechanism, the logs can be acquired from all the parties, that is, from a user, CSP, and forensic investigator.
4.2 Physical inaccessibility
The data are located in different geographical areas of the hardware device. It is difficult to access these physical access resources since the data reside in different CSPs and it is impossible to collect the evidence from the configured device. If an incident occurs, all the devices are acquired immediately in case of a private cloud environment since an organization has full control over the resources. The same methods cannot be used to access the data in case of a public cloud environment.
4.3 Volatility of data
Data stored in a VM instance in a cloud will be lost when the VM is turned off. This leads to the loss of important evidence such as syslog, network logs, registry entries, and temporary Internet files. It is important to preserve the snapshot of the VM instance to retrieve the logs from the terminated VMs. The attacker launches an attack and turns off the VM instance, hence these traces are unavailable for forensic investigation.
4.4 Identification of evidence at client side
The evidence is identified not only in the provider’s side but also the client side. The user can communicate with the other client through the Web browser. An attacker sends malicious programs with the help of a Web browser that communicates with the third parties to access the services running in the cloud. This, in turn, leads to destroying all the evidence in the cloud. One way of collecting the evidence is from the cookies, user agent, etc., and it is difficult to obtain all the information since the client side VM instance is geographically located.
4.5 Dependence of CSP trust
The consumers blindly depend on CSPs to acquire the logs for investigation. The problem arises when CSPs are not providing the valid information to the consumer that resides in their premises. CSPs sign an agreement with other CSPs to use their services, which in turn leads to loss of confidential data.
In cloud infrastructures, multiple VMs share the same physical infrastructure, that is, the logs are distributed across various VMs. The investigator needs to show the logs to court by proving the malicious activities occurring from the different service providers. Moreover, it also preserves the privacy of other tenants.
In cloud infrastructures, the log information is located on different servers since it is geographically located. Multiple users’ log information may be collocated or spread across several layers and tiers in the cloud. The application log, network log, operating system log, and database log produce valuable information for a forensic investigation. The decentralized nature of the cloud brings the challenge for cloud synchronization.
4.8 Absence of standard format of logs
Logs are available in heterogeneous formats from different layers of a cloud at CSP. The logs provide information such as by whom, when, where, and why some incidents occurred. This is an important bottleneck to provide a generic solution for all CSPs and all types of logs. Table 5 indicates the survey of literature that deals with the challenges of cloud forensics mainly for evidence collection process.
Challenges of cloud forensics.
5. Forensic tools
There are many tools to identify, collect, and analyze the forensic data for investigation. Juel et al. developed the PORs tool for the identification of online archives for providing integrity and privacy of files . Dykstra et al. proposed a forensic tool for acquiring the cloud-based data in management plane . It ensures trust in cloud infrastructures. Moreover, Encase and Access data FTK toolkit are used for the identification of trusted data to acquire the evidence. Similarly, tools such as evidence finder and F-response are used to find the evidence related to social networks. Dystra et al. proposed FROST, an open source OpenStack cloud tool for the identification of evidence from virtual disks, API logs, firewall logs, etc. .
6. Open research problems in cloud forensics
Many researchers have proposed various solutions to mitigate the challenges of cloud forensics. Some of the researchers have proposed new approaches to test the attacks in real-time environment. CSPs have not adopted the proposed solutions yet. Customers or investigators rely on CSPs to collect the necessary logs since they do not have direct physical access. Customers or investigators depend on CSP to collect the various information from the registry, hard disk, memory, log files etc. Even though various forensic acquisition process is proposed still the dependence of CSP remain unsolved. The critical issue is the usage of bandwidth resources. If the cloud storage is too high, then it results in more utilization of bandwidth. There is insufficient work evolved to preserve the chain of custody to secure provenance. There is no ideal solution for cybercrime scene reconstruction and preservation of evidence. Another critical issue is based on the modification of existing forensic tools that may lose evidence. Some researchers have proposed logging as a service to provide confidentiality, integrity, and authentication . This solution is not suitable for IaaS cloud.
7. Case study
This section introduces a hypothetical forensic case study related to a cloud storage service and also describes a forensic investigation of the case.
7.1 Case study: cloud storage
The organization “X” found that their document named as “X_new.pdf” about the new release of a product has been leaked to their competitor [21, 22, 23, 24]. “Mr. Morgan” was managing the credential files of the document stored in the cloud. At the initial stage of the investigation process, the suspect of the leaked file case was “Mr. Morgan.” The forensic investigator has to identify the suspect by checking the organization network, or by the analysis of log files, or by collecting the trace of relevant file in the network. Mr. Morgan’s network does not have any clue about the secrets since he uses only the personal computer (PC) and Android phone for business. To identify the suspects, the forensic investigator seized the PC and Android phone since these are the target devices used by the adversary. From the suspected devices, the leaked file has not been detected. Later, the investigator started analyzing the unallocated area in the file system, operating system, external devices such as hard drive, tablets, etc., and the Web service, but no evidence was found in the investigation. The investigator found that the Dropbox was installed in the PC and five files of config.db have been accessed recently. The forensic investigator issued the search warrant and identified the evidence in Dropbox by accessing Morgan’s Dropbox storage with the username and password. It was observed that Morgan recently uploaded five files in Dropbox and identified that one of the files named as “XYZ_new.pdf” had the same contents as “X_new.pdf.” Later, he deleted the traces of uploading or downloading the contents in PCs. The investigator found that Mr. Morgan has deleted the traces of the file contents and shared the evidence stored as “XYZ_new.pdf” with the competitor through an external SD card.
7.2 Case study: online railway ticket fraud
An online railway ticket booking service provider claimed that some unknown user had used the internet ticket booking facility to book 44 railway tickets using the stolen credit card details . It has been charged back from the credit card companies for all transactions which led to a huge loss to the service provider. It is inferred from the investigation that the suspects have booked 44 tickets with different names of a person through the website at different locations. Later on, through investigation, the investigator found that the suspects arrived from a particular IP address, thereby seized the contents of the user accounts with the password, and the stolen credit cards were recovered from the suspects.
7.3 Case study: morphed photographs
The user got threatening pornographic emails from the adversary that one photograph was posted on the popular website . The IP address for posting such threatening emails on the website was retrieved and was traced to a company. During an investigation, it was observed that the emails were sent from the company premises from one of the terminals. The log records and cookies were examined from the seized system and the morphed photographs were found in one of the systems used by the suspect. The mirror image of the hard disk was collected and analyzed using disk imaging and forensic analysis tools to recover all the data files required for the case. At the end of the investigation, it was found that the suspect was an ex-colleague of the company.
7.4 Case study: malicious insiders
Mr. X is an intruder who intends to exploit victims by sending malicious Web page in the cloud . He uses a vulnerability to exploit the cloud presence of Buzz Coffee, a legitimate company. He installs a rootkit that injects a malicious payload into Web pages displayed and hides the malicious activity from the operating system. It redirects victims to the website, which infects them with malware. The users complain to the legitimate company that they are being infected, so the company went to the investigator to investigate the case by finding all the traces of the malicious Web page to identify the malicious user.
7.5 Case study: ransomware attack
A securities and brokerage firm became a victim of a ransomware attack . The hacker demanded a ransom of two Bitcoins for each system that was infected. During the investigation process, it was observed that several other critical systems were infected with the same ransomware. Emails with malicious attachments appeared to be originating from a foreign location and were identified as the source of infection. The organization decided to take a proactive approach toward security with the focus on real-time monitoring to thwart such attacks in the future.
Cloud computing offers on-demand services (CPU, memory, network bandwidth, storage, applications, etc.) to users by allocating virtual instances and software services. Security is a major concern in the cloud wherein investigation of security attacks and crimes are very difficult. Due to the distributed nature of attacks and crimes in cloud, there is a need for efficient security mechanism. As cloud logs are spread across different virtual/physical machines (VM instances), switches, routers, etc., and also the customer (end user) is not aware of the activities of VM instances, cybercriminals exploit these sources to exhaust all the resources running in the cloud. Hence, evidence collection plays a crucial role to identify the suspects. However, collecting logs from the cloud infrastructure is extremely difficult because the investigator/security analyst has to depend on CSPs for collecting the logs and they have little control over the infrastructure. So, in order to identify the suspicious activity involved in the cloud, this chapter surveys the various forensic processes, evidence collection techniques for cloud forensics and the various challenges faced in cloud environment for forensic investigation.
Conflict of interest
The author does not have any conflict of interest.
Thankaraja Raja Sree and Somasundaram Mary Saira Bhanu
Submitted: June 13th, 2018 Reviewed: October 15th, 2018 Published: September 30th, 2020
© 2020 The Author(s). Licensee IntechOpen. This chapter is distributed under the terms of the Creative Commons Attribution 3.0 License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Which technique can be used for extracting evidence from large systems?
Forensics - M Choice.
Which technique is used during computer forensics investigations?
Digital Forensic Techniques Cybercriminals use steganography to hide data inside digital files, messages, or data streams. Reverse steganography involves analyzing the data hashing found in a specific file. When inspected in a digital file or image, hidden information may not look suspicious.
What kind of evidence can be collected from devices?
Computer documents, emails, text and instant messages, transactions, images and Internet histories are examples of information that can be gathered from electronic devices and used very effectively as evidence.
What are the 4 processes where special tools are needed in the collection of electronic evidence?
There are four phases involved in the initial handling of digital evidence: identification, collection, acquisition, and preservation ( ISO/IEC 27037 ; see Cybercrime Module 4 on Introduction to Digital Forensics).