Extracting images from text mail archives

When you back up or save emails, one common format is plain text. Attachments are then stored in the file as base64-encoded data. I wrote this script to find the known signatures of images in those base64 attachments and write the images out to disk.

Simply pass it the name of the file you want to read, or it will read from stdin. This also goes to show that Ruby can still perform very well: on my system it processed a test file at 50+ MB/sec.

require 'base64'
require 'zlib'

in_base64 = false
attachment = []
ext = ""
crcs = {}
file_base = 0
line_count = 0

signatures = {
  "TU0AKgAKzHiAN1oM" => ".tiff",
  "R0lGODlh" => ".gif",
  "iVBORw0KGgoAAAANSUhEU" => ".png",
  "/9j/4AAQSkZJRgA" => ".jpg"
}


ARGF.each do |line|
  line_count+=1
  #If we aren't in the middle of a file, look
  #for signatures of images
  if not in_base64
    signatures.each do |sig,extension|
      if line.start_with? sig
        in_base64 = true
        attachment = []
        attachment << line
        ext = extension
      end
    end
  else 
    if line.start_with? "--"
      #end of base64, write out the file
      attachment = attachment.join("")
      length = attachment.length
      crc = Zlib::crc32(attachment)
      if crcs[crc] and crcs[crc] == length
        puts "Duplicate file, skipping"
      else
        puts "Writing file #{file_base}#{ext}"
        File.open("#{file_base}#{ext}",'wb') do |f|
          f.write(Base64.decode64(attachment))
          file_base += 1
        end
        crcs[crc] = length
      end
      in_base64 = false
      ext = ""
    else
      #middle of a base64 block, save the line
      attachment << line
    end
  end
end
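The keys in the signatures table are just the base64 form of each image format's leading "magic" bytes. As a rough sketch of where such a prefix comes from, the hypothetical helper below (base64_prefix is my own name, not part of the script above) encodes the magic bytes and keeps only the characters that are fully determined by them. Every 6 bits of input produce one base64 character, so any trailing character that would also depend on the bytes after the magic is dropped.

```ruby
require 'base64'

# Hypothetical helper: returns the base64 characters that every file
# starting with `magic` must begin with once it is base64 encoded.
def base64_prefix(magic)
  # Each base64 character carries 6 bits, so only this many characters
  # depend on the magic bytes alone.
  stable_chars = (magic.bytesize * 8) / 6
  Base64.strict_encode64(magic)[0, stable_chars]
end

gif_magic = "GIF89a".b
png_magic = "\x89PNG\r\n\x1A\n".b

puts base64_prefix(gif_magic)
puts base64_prefix(png_magic)
```

The longer keys in the table above also cover bytes that follow the magic number, which makes a match more specific at the cost of missing files whose later header bytes differ.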

How to recover data from .7z files

Suppose you have a .7z file and the archive is reported as “corrupt”. Even if the data itself is not corrupt and the file is only missing the end of the archive, you will get an error. Here is how you can recover at least some of the data.

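The first step is to take the bad file and see how long the archive is supposed to be. The start header contains a pointer to the end header: at offset 0x0C is the offset of the end header, stored in 8 bytes, and the next 8 bytes are the length of the end header. The offset is counted from the end of the 32-byte start header, and the end header is the last thing in the archive, so 32 plus the offset plus the length gives the size the file should have.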
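As a minimal sketch of that calculation, the Ruby below reads the two 8-byte fields and adds them up. The method name expected_7z_size and the constants are my own. The field layout in the comment is my reading of the 7z format (little-endian integers, offset relative to byte 32), so treat it as an assumption to check against your own file.

```ruby
# Assumed layout of the 32-byte 7z start header (little-endian):
#   0x00  6 bytes  signature "7z" BC AF 27 1C
#   0x06  2 bytes  format version
#   0x08  4 bytes  CRC of the next 20 bytes
#   0x0C  8 bytes  offset of the end header, counted from byte 32
#   0x14  8 bytes  length of the end header
#   0x1C  4 bytes  CRC of the end header
SEVEN_ZIP_MAGIC = "7z\xBC\xAF\x27\x1C".b
START_HEADER_SIZE = 32

# Returns the size in bytes the complete archive should have.
def expected_7z_size(header)
  raise ArgumentError, "need at least 32 bytes" if header.bytesize < START_HEADER_SIZE
  raise ArgumentError, "not a 7z start header" unless header.byteslice(0, 6) == SEVEN_ZIP_MAGIC

  end_offset, end_length = header.byteslice(0x0C, 16).unpack("Q<Q<")
  START_HEADER_SIZE + end_offset + end_length
end

# Usage: ruby expected_size.rb broken.7z
if __FILE__ == $PROGRAM_NAME && ARGV[0]
  path = ARGV[0]
  expected = expected_7z_size(File.binread(path, START_HEADER_SIZE))
  puts "expected #{expected} bytes, have #{File.size(path)} bytes"
end
```

If the expected size is larger than the size on disk, the archive is truncated rather than damaged in the middle, which is the case this recovery method is for.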

Now that you know how big the file was supposed to be, you can recover all of the data that you have, but all of the files will be concatenated together. You don’t know the length of any of them, or even how many separate files there are. The important thing, though, is that you can extract the data.

The next step is to create a new archive that, when compressed, is larger than the original archive. It must have a single file as its source. Random data is an easy choice: it will not compress, so you won’t have to guess at how big to make it. The important part is to use a dictionary size that is larger than the original archive. Be careful about simply using the maximum dictionary size: the larger the dictionary, the longer compression will take. The compression method needs to match as well (LZMA, LZMA2, etc.).
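One way to produce the incompressible source file is sketched below. It assumes a hypothetical helper of my own, write_random_file, and writes the data in chunks so a large file does not have to fit in memory. Creating the archive from it, with the matching compression method and dictionary size, is still done with your 7z tool of choice.

```ruby
require 'securerandom'

# Hypothetical helper: writes total_bytes of random data to path,
# one chunk at a time.
def write_random_file(path, total_bytes, chunk_size = 1024 * 1024)
  File.open(path, 'wb') do |f|
    remaining = total_bytes
    while remaining > 0
      n = [chunk_size, remaining].min
      f.write(SecureRandom.random_bytes(n))
      remaining -= n
    end
  end
  path
end

# Usage: ruby make_filler.rb filler.bin 2000000000
if __FILE__ == $PROGRAM_NAME && ARGV.length == 2
  write_random_file(ARGV[0], Integer(ARGV[1]))
end
```

Pick a size comfortably above the expected size of the original archive so the new archive is sure to come out larger.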

Now you will have two archives, the bad one and a larger good one. To extract the data you need to take the header from the good file, the compressed stream from the bad file, and the remaining data from the good file, and put each into its own file.

To do this, first take the first 32 bytes of the good file and save them as their own file. Then remove the first 32 bytes of the bad file and save the remaining bytes as the compressed data stream. Finally, add the size of the header and the compressed data stream, skip that far into the good file, and save the remainder as a third file.

You now have a valid header, a valid data stream, and the remaining data, which has a broken data stream but a valid dictionary and end header. Give the files the same name, with the extensions .7z.001, .7z.002, and .7z.003. Use p7zip (or your favorite tool) to open the .001 file. It will extract a file with the filename from the good archive, but the data from the bad file. When it reaches the broken compression stream in the .003 file you will get a CRC error, but you will be left with all of your missing data in a single file.
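The three-way split described above can be sketched in a few lines of Ruby. The method name build_recovery_parts and its arguments are hypothetical, and this version reads both archives fully into memory, so a very large archive would need a streaming variant instead.

```ruby
HEADER_BYTES = 32 # size of the 7z start header

# Hypothetical helper: writes out_base.7z.001/.002/.003 and returns
# the three file names.
def build_recovery_parts(good_path, bad_path, out_base)
  good = File.binread(good_path)
  bad  = File.binread(bad_path)

  header = good.byteslice(0, HEADER_BYTES)                    # valid header
  stream = bad.byteslice(HEADER_BYTES, bad.bytesize) || "".b  # recovered data
  skip   = header.bytesize + stream.bytesize
  if good.bytesize <= skip
    raise ArgumentError, "good archive must be larger than the bad one"
  end
  tail = good.byteslice(skip, good.bytesize)                  # rest of good file

  { "001" => header, "002" => stream, "003" => tail }.map do |suffix, data|
    name = "#{out_base}.7z.#{suffix}"
    File.binwrite(name, data)
    name
  end
end

# Usage: ruby split_parts.rb good.7z bad.7z recovered
if __FILE__ == $PROGRAM_NAME && ARGV.length == 3
  puts build_recovery_parts(*ARGV)
end
```

The three parts together are exactly as long as the good archive, which is why the extractor treats them as one complete multi-volume archive.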

Windows 10 IoT

I am not much of a Windows developer myself. I messed around with WinAPI code a while ago and took a VB.NET course in college, but otherwise I have developed in other environments. This may change shortly with the newly released Windows 10 IoT. Especially since it has Python and Node.js support, I am excited to try it out.

https://blogs.windows.com/buildingapps/2015/08/10/hello-windows-10-iot-core/

Debugging

I recently came across an article, Writing a Primitive Debugger. It goes through what a debugger does and the basics of how one functions. I was reminded of a debugger I created for Rhomobile’s Rhodes applications.

There are some key differences between the two. As mentioned in the linked article, a debugger needs a few basic functions to be useful. My goal was to have a useful debugger that could stop at a breakpoint, inspect variables, and execute arbitrary Ruby commands. I had read an article about using GDB to debug Ruby and initially started with that approach. At that point, I really wasn’t writing a debugger but a UI for GDB and Ruby. This worked great on the iPhone simulator, which runs as a process on the local machine, but it did not work on mobile devices or on any platform other than the iPhone simulator.

My approach to this was to work at a little higher level. One interesting feature I had come across in Ruby is set_trace_func. When set, the execution steps of the Ruby VM also trigger your trace function. While this does significantly slow the program down, it allows you to accomplish nearly everything you need to do for a debugger. I adapted this to allow debugging of the application via a TCP connection. I may go into detail later, but you can see the original code that I wrote for this here. It has since been expanded to the full debugger that is being used today, adding complexity and integration with an existing IDE.
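To give a feel for what set_trace_func makes possible, here is a minimal sketch. It is not the Rhodes debugger code: the add_tax method and the hits array are made up for illustration, and a real debugger would pause and talk to a client where this one just records the local variables.

```ruby
# Collected "breakpoint" hits: one entry per traced line.
hits = []

# set_trace_func calls this proc for every VM event. We only care about
# "line" events inside the (hypothetical) add_tax method.
tracer = proc do |event, file, line, id, binding, klass|
  next unless event == "line" && id == :add_tax

  locals = binding.local_variables.map do |name|
    [name, binding.local_variable_get(name)]
  end.to_h
  hits << { file: file, line: line, locals: locals }
end

def add_tax(price)
  rate = 0.5
  price + price * rate
end

set_trace_func(tracer)
result = add_tax(10)
set_trace_func(nil) # always switch tracing off again, it is slow

hits.each { |hit| puts "line #{hit[:line]}: #{hit[:locals].inspect}" }
puts "result: #{result}"
```

The binding passed to the trace function is what makes variable inspection and arbitrary Ruby evaluation possible, since code can be evaluated in the context of the line being traced.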

 

Caramel Apple Cheesecake Bars

Ingredients:
Crust:

  • 2 cups all-purpose flour
  • 1/2 cup firmly packed brown sugar
  • 1 cup (2 sticks) butter, softened

Cheesecake Filling:

  • 3 (8-ounce) packages cream cheese, softened
  • 3/4 cup sugar, plus 2 tablespoons, divided
  • 3 large eggs
  • 1 & 1/2 teaspoons vanilla extract

Apples:

  • 3 Granny Smith apples, peeled, cored and finely chopped
  • 1/2 teaspoon ground cinnamon
  • 1/4 teaspoon ground nutmeg

Streusel Topping:

  • 1 cup firmly packed brown sugar
  • 1 cup all-purpose flour
  • 1/2 cup quick cooking oats
  • 1/2 cup (1 stick) butter, softened

Drizzle:

  • 1/2 cup caramel topping for drizzling after baked


Don’t reinvent the wheel

When working on a development project, there are often tasks or features that are common for that type of project. For example, a website may need user management, or a batch processing system needs a queuing system. One thing you should always keep in mind is to not reinvent the wheel.

There are many libraries available, depending on your choice of language and environment. Ruby has a plethora of gems that tackle almost everything imaginable. Python has pip, which is very similar to RubyGems. Before you decide to roll your own implementation from scratch, you should always take some time to see what is available. You may find a package that fits your needs and can be reused, saving valuable development time.

Avocado baked with egg


Here’s an easy twist on omelets. Preheat the oven to 350° F. Halve an avocado, remove the pit, and scoop out some of the green flesh. Break an egg into a bowl and carefully place the yolk in an avocado half, followed by the white (there might not be enough room for all of it). Repeat with the other half. Make sure to season and add your own toppings before baking for 15 to 20 minutes.

From here.

New Job

After nearly 8 years in mobile development and related work, I am moving on. My passion is for development and while I have many skills in other areas like System Administration, QA, and Support Management, I find that writing code is what really makes me happy.

I have recently joined Monetize Solutions as a Sr. Product and Monetization Engineer working on advanced monetization products. Basically, we are creating products that work with our customers’ raw data and provide meaningful information that directly affects bottom-line revenue. This could mean determining the optimal product inventory configuration or the right cross-sell recommendation given the time of year, location demographics, geography, historical sales profile, buying behavior, and social sentiment.

I am looking forward to getting back into full time development, and I am excited to be creating products that have direct positive impacts for our customers.

Content Finally…

There have been a lot of changes in my life recently, and life is slowly returning to “normal”. With this, I have decided to bring my blog to life. It has languished long enough with nothing here, and this time I plan to keep it up and running.

I will be posting many things, from technical content to photos I have taken, and even things I just want to save for later. This is my personal space and contains my own ideas and thoughts, independent of any business, company, or work I may be doing in my professional career.