How to use OpenCSV to write CSV files to S3

In the previous post about How to read csv files from S3 using OpenCSV, we saw how to open files on S3 and read the comma-separated data into a list of hashmaps. In this article, we will see how to perform the reverse: writing data to CSV files on S3.

Most of the fundamental concepts do not change. You need to create an S3 client to do anything with S3. The AWS profile configured should belong to a user or role that has permission to write to S3, and the bucket policy should allow writing files. I'm not going to cover how to set up AWS credentials and IAM policies in this post. The getS3() method in the complete code snippet below returns an S3 client just like in the previous post.
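For reference, getS3() can be built the same way as in the previous post; here is a sketch, where the profile name "aws-profile" and the region are placeholders for your own configuration:

```java
import com.amazonaws.auth.profile.ProfileCredentialsProvider;
import com.amazonaws.regions.Regions;
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;

public class S3ClientFactory {

    // Builds an S3 client from a named profile; "aws-profile" and
    // US_WEST_2 are placeholders, not values from this post.
    public static AmazonS3 getS3() {
        return AmazonS3ClientBuilder.standard()
                .withCredentials(new ProfileCredentialsProvider("aws-profile"))
                .withRegion(Regions.US_WEST_2)
                .build();
    }
}
```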

To write CSV files using OpenCSV, you need to create a CSVWriter object. It allows you to write text to a stream and redirect it to a CSV file. Here is how you can create a writer.

private CSVWriter buildCSVWriter(OutputStreamWriter streamWriter) {
    return new CSVWriter(streamWriter, ',', Character.MIN_VALUE, '"', System.lineSeparator());
}

In this code, we create a CSVWriter using the constructor that accepts an output stream writer, a comma as the field separator, \u0000 (Character.MIN_VALUE) as the quote character, " as the escape character, and the platform line separator between lines. If you want a different separator, such as TAB instead of comma, you can do so by changing the second parameter. Once the writer is created, we can send the data to the output stream and it will be written in CSV form. The following snippet achieves this.

public void writeRecords(List<String[]> lines) throws IOException {
    ByteArrayOutputStream stream = new ByteArrayOutputStream();
    OutputStreamWriter streamWriter = new OutputStreamWriter(stream, StandardCharsets.UTF_8);
    try (CSVWriter writer = buildCSVWriter(streamWriter)) {
        writer.writeAll(lines);
    }
}
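As mentioned above, switching to a different separator only changes the second constructor argument. A small self-contained sketch of a tab-separated variant (class and method names are mine, not from this post):

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

import com.opencsv.CSVWriter;

public class TsvExample {

    // Same construction as buildCSVWriter, but with '\t' as the separator.
    static CSVWriter buildTSVWriter(OutputStreamWriter streamWriter) {
        return new CSVWriter(streamWriter, '\t', Character.MIN_VALUE, '"', System.lineSeparator());
    }

    // Writes the rows to an in-memory stream and returns the result as text.
    public static String toTsv(List<String[]> lines) throws IOException {
        ByteArrayOutputStream stream = new ByteArrayOutputStream();
        OutputStreamWriter streamWriter = new OutputStreamWriter(stream, StandardCharsets.UTF_8);
        try (CSVWriter writer = buildTSVWriter(streamWriter)) {
            writer.writeAll(lines);
        }
        return stream.toString(StandardCharsets.UTF_8.name());
    }

    public static void main(String[] args) throws IOException {
        List<String[]> rows = Arrays.asList(
                new String[]{"name", "city"},
                new String[]{"Alice", "Hyderabad"});
        // Each field is separated by a tab, each row by the line separator.
        System.out.print(toTsv(rows));
    }
}
```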

In the above code snippet, you can see that we accept a list of string arrays. If you want to write headers as well, the first string array in the list must contain them. Once all the data has been written, you can use the S3 client's putObject method to upload the content as a file to the given bucket and path. However, putObject doesn't accept an output stream; it accepts an input stream, so you can wrap the previously created output stream's bytes in an input stream. You also need to set metadata such as the Content-Length yourself. The following snippet shows how to do that.

byte[] bytes = stream.toByteArray();
ObjectMetadata meta = new ObjectMetadata();
meta.setContentLength(bytes.length);
getS3().putObject(BUCKET, PATH, new ByteArrayInputStream(bytes), meta);

Putting everything together, the following code combines all the snippets we have seen so far.
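Based purely on the snippets above, the combined class looks roughly like this sketch; BUCKET, PATH, the profile name, and the region are placeholders, not values from this post:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.nio.charset.StandardCharsets;
import java.util.List;

import com.amazonaws.auth.profile.ProfileCredentialsProvider;
import com.amazonaws.regions.Regions;
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;
import com.amazonaws.services.s3.model.ObjectMetadata;
import com.opencsv.CSVWriter;

public class S3CSVWriterSketch {

    // Placeholder bucket and key; replace with your own.
    private static final String BUCKET = "my-bucket";
    private static final String PATH = "output-data/file1.csv";

    // Writes the rows as CSV to an in-memory buffer, then uploads it to S3.
    public void writeRecords(List<String[]> lines) throws IOException {
        ByteArrayOutputStream stream = new ByteArrayOutputStream();
        OutputStreamWriter streamWriter = new OutputStreamWriter(stream, StandardCharsets.UTF_8);
        try (CSVWriter writer = buildCSVWriter(streamWriter)) {
            writer.writeAll(lines);
        }
        byte[] bytes = stream.toByteArray();
        ObjectMetadata meta = new ObjectMetadata();
        meta.setContentLength(bytes.length);
        getS3().putObject(BUCKET, PATH, new ByteArrayInputStream(bytes), meta);
    }

    private CSVWriter buildCSVWriter(OutputStreamWriter streamWriter) {
        return new CSVWriter(streamWriter, ',', Character.MIN_VALUE, '"', System.lineSeparator());
    }

    private AmazonS3 getS3() {
        return AmazonS3ClientBuilder.standard()
                .withCredentials(new ProfileCredentialsProvider("aws-profile"))
                .withRegion(Regions.US_WEST_2)
                .build();
    }
}
```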

Read More

How to read S3 CSV files into hashmaps using OpenCSV

In a world where large amounts of data are becoming the norm, data is very frequently stored in S3 in CSV format for consumption through serverless query layers such as Athena. However, you often have to read the CSV files without using Athena. In such cases, you can use an ever-useful library such as OpenCSV to read the files.

This example shows how to use OpenCSV to quickly read S3 files without downloading them first. This helps when you do not have a way to save files locally or if you don't have enough hard disk space. The solution is quite simple: create an InputStream from an S3 object using the getObject method on the S3 client. Once the input stream is created, we can use it to build a CSVReader.

Assuming that the CSV files have a header row, you can use the CSVReaderHeaderAware class to build a list of hashmaps by reading each record iteratively with the readMap method. When readMap returns null, you have reached the end of the file. Here is a complete solution for your reference.

import java.io.*;
import java.util.*;

import com.opencsv.*;
import com.opencsv.exceptions.CsvValidationException;

import com.amazonaws.auth.profile.ProfileCredentialsProvider;
import com.amazonaws.regions.Regions;
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;
import com.amazonaws.services.s3.model.S3Object;

public class S3CSVReader {

    public static void main(String... args) throws IOException, CsvValidationException {
        // Example usage
        S3CSVReader reader = new S3CSVReader();
        List<Map<String, String>> records = reader.getS3Records("my-bucket", "input-data/file1.csv");
        System.out.println(records);
    }

    public List<Map<String, String>> getS3Records(String bucket, String key) throws IOException, CsvValidationException {
        List<Map<String, String>> records = new ArrayList<>();
        try (CSVReaderHeaderAware reader = getReader(bucket, key)) {
            Map<String, String> values;

            // readMap returns null at end of file
            while ((values = reader.readMap()) != null) {
                records.add(values);
            }
            return records;
        }
    }

    private CSVReaderHeaderAware getReader(String bucket, String key) {
        CSVParser parser = new CSVParserBuilder().build();
        S3Object object = getS3().getObject(bucket, key);
        var br = new InputStreamReader(object.getObjectContent());
        return (CSVReaderHeaderAware) new CSVReaderHeaderAwareBuilder(br)
                .withCSVParser(parser)
                .build();
    }

    private AmazonS3 getS3() {
        return AmazonS3ClientBuilder.standard()
                .withCredentials(new ProfileCredentialsProvider("aws-profile"))
                .withRegion(Regions.US_WEST_2)
                .build();
    }
}

Read More

Expiring Local Storage Objects in JavaScript

All modern browsers have multiple types of storage mechanisms for use in your web applications. You may have already heard of cookies, which are small bits of information you can store and which are automatically expired.

However, cookies can only store small amounts of information. Another kind of storage is sessionStorage, where you can store big chunks of information. However, all the data stored there is lost as soon as you close the browser tab.

Browser Storages

Local storage provides an intermediate option: it can store large amounts of information, and it is not lost when the user closes the tab or the browser. The data is persisted across sessions. However, we may want a better solution: to store large chunks of data across sessions while still having the option of invalidating it after a certain period of time.

Read More

A Definitive Guide to AWS Application Integration

Last year was a roller coaster ride for me. Adjusting to a new team, new technologies, a new country, moving across continents, and many more stressful scenarios. However, something good came of 2019 by the end of it: we published our book, The Definitive Guide to AWS Application Integration. You can buy it from Amazon and many other stores.


Creating Alexa Skill using Java and AWS Lambda


The Goal

Gone are the days when we build applications and think only about graphical user interfaces, look and feel, and so on. There is a new interface that is gaining popularity. As Amazon, Google, and Apple bring in voice assistants, it has become extremely important for us to learn how to build voice-activated applications. Voice-based commands are much more complex than a GUI: user actions on a GUI are limited, with button clicks, combo box selections, and typing in text fields making up the majority of interactions. With touchscreens, users gain a few more actions such as swipe, pinch, zoom, and rotate. With voice, however, a single user may ask our application to do a specific task in a wide variety of ways, and to increase the complexity even further, different people each have their own way of speaking. Alexa provides a simple framework to build these skills. In this article, I will show you how you can build your own Alexa skills.

Read More