PDF Search Site

An Introduction

So, I’m moving into an apartment in the near future. After that, I will have to learn how to manage my documents and make sure I can find them again when I need them. I’ve thought of a few solutions. One of them is what I do now: I simply keep a folder with PDFs of my documents (I digitize everything). This works fine for now, since I have about 20 documents, but it may not work in two years, when I have a hundred documents spread across many different companies and years.

I then thought of making the system searchable. That means I need a way of indexing all my documents, preferably one that lasts. My idea is a website where I attach metadata to each uploaded document, and where the document itself is made searchable with OCR. The metadata would be things like a start and end date, a descriptive title and perhaps one or two categories.
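To make the idea more concrete, here is a minimal sketch of that metadata and a search over it, using an in-memory list and LINQ in place of a real database. The field names, the sample documents and the Search function are hypothetical, and the OCR text is assumed to have been extracted already.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical in-memory index: each scanned PDF has a descriptive title, the period it
// covers, one or two categories, and the text extracted from it by OCR.
var documents = new List<(string Title, DateTime Start, DateTime End, string[] Categories, string OcrText)>
{
    ("Lease agreement", new DateTime(2011, 1, 1), new DateTime(2012, 12, 31), new[] { "Housing" }, "rent deposit landlord"),
    ("Phone contract", new DateTime(2011, 3, 1), new DateTime(2011, 9, 1), new[] { "Telecom", "Bills" }, "monthly subscription"),
};

// Returns the titles of documents valid on `date`, optionally limited to one category
// and to documents whose title or OCR text contains `keyword` (case-insensitive).
List<string> Search(DateTime date, string category = null, string keyword = null) =>
    documents
        .Where(d => d.Start <= date && date <= d.End)
        .Where(d => category == null || d.Categories.Contains(category, StringComparer.OrdinalIgnoreCase))
        .Where(d => keyword == null
                    || d.Title.Contains(keyword, StringComparison.OrdinalIgnoreCase)
                    || d.OcrText.Contains(keyword, StringComparison.OrdinalIgnoreCase))
        .Select(d => d.Title)
        .ToList();

Console.WriteLine(string.Join(", ", Search(new DateTime(2011, 6, 1)))); // Lease agreement, Phone contract
```

Filtering on the date range first reflects the "start and end date" metadata; a real site would run the same kind of query against its database instead of a list.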

LINQ vs. Foreach

I asked around while I was trying to learn LINQ. LINQ (Language Integrated Query) is an SQL-like query syntax built into .Net languages such as C# and VB.NET, so I can run “select” queries directly on objects in memory. A simple select could look like this:

List<int> test = new List<int>();

test.Add(1);
test.Add(3);
test.Add(5);

List<int> testnew = (from a in test where a == 5 select a).ToList();

The argument I was given was that using LINQ for a general select is faster than using a foreach loop and ‘manually’ pulling out the matching items. Let’s test that:

using System;
using System.Collections.Generic;
using System.Linq;

class Program
{
    static List<int> testlist = new List<int>();

    static void Main(string[] args)
    {
        DateTime b, a;
        List<double> regularTimes = new List<double>();
        List<double> LINQTimes = new List<double>();

        // Fill the test list with 1 000 000 values: 0, 5, 10, ...
        for (int x = 0; x < 1000000; x++)
        {
            testlist.Add(x * 5);
        }

        // Select normal (note: DateTime.Now has coarse resolution, so each single
        // measurement is rough; averaging 1000 runs smooths this out somewhat)
        for (int x = 0; x < 1000; x++)
        {
            b = DateTime.Now;     // Before
            TestRegular();
            a = DateTime.Now;     // After
            regularTimes.Add((a - b).TotalMilliseconds);
        }
        Console.WriteLine("Regular Avg.: {0}", regularTimes.Average());

        // Select LINQ (timings go into their own list, so this average is LINQ-only)
        for (int x = 0; x < 1000; x++)
        {
            b = DateTime.Now;     // Before
            TestLINQ();
            a = DateTime.Now;     // After
            LINQTimes.Add((a - b).TotalMilliseconds);
        }
        Console.WriteLine("LINQ    Avg.: {0}", LINQTimes.Average());

        Console.ReadLine();
    }

    // foreach version: copy every multiple of 2000 into a new list
    private static void TestRegular()
    {
        List<int> newlist = new List<int>();

        foreach (int item in testlist)
        {
            if (item % 2000 == 0) newlist.Add(item);
        }
    }

    // LINQ version of the same filter
    private static void TestLINQ()
    {
        var newlist = (from a in testlist where a % 2000 == 0 select a).ToList();
    }
}

Using this, I run 1000 tests of each method, each selecting from 1 000 000 rows. The result for each method is the average time, in milliseconds, it takes to execute the select. These were the results on my machine:

Regular Avg.: 81,6416701

LINQ     Avg.: 84,3083223

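One could argue that the difference is so small that it’s insignificant, and too easily affected by other variables, such as what my PC was doing while the regular test ran compared to while the LINQ test ran.

That is a fair criticism, since I can’t conclude much from such a small result set. The numbers have a further catch: if they came from the code as first written, where the LINQ loop adds its timings to regularTimes instead of LINQTimes, the “LINQ” figure is really the average of all 2000 runs. The LINQ-only average would then be about 2 × 84,3083 − 81,6417 ≈ 86,97 ms, roughly 6,5 % slower than foreach. Either way, based on this result set I’d say the two are pretty much even, and there is no sign that LINQ is faster.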
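One way to reduce the influence of background load is sketched below. The choices are assumptions rather than a standard recipe: each call is timed with System.Diagnostics.Stopwatch, which has much finer resolution than DateTime.Now; both methods run once as a warm-up so JIT compilation isn’t measured; the two methods alternate for 100 rounds so any background activity hits both; and the median is reported instead of the mean.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

// Same data as above: 1 000 000 values 0, 5, 10, ...
List<int> data = Enumerable.Range(0, 1000000).Select(x => x * 5).ToList();

List<int> WithForeach(List<int> source)
{
    var result = new List<int>();
    foreach (int item in source)
    {
        if (item % 2000 == 0) result.Add(item);
    }
    return result;
}

List<int> WithLinq(List<int> source) => source.Where(x => x % 2000 == 0).ToList();

// Times a single call in milliseconds with the high-resolution Stopwatch.
double TimeMs(Func<List<int>, List<int>> method)
{
    var sw = Stopwatch.StartNew();
    method(data);
    sw.Stop();
    return sw.Elapsed.TotalMilliseconds;
}

// Warm-up: run both once so JIT compilation is not part of the measurements.
WithForeach(data);
WithLinq(data);

var foreachTimes = new List<double>();
var linqTimes = new List<double>();
for (int round = 0; round < 100; round++)
{
    // Alternate the methods so background load affects both roughly equally.
    foreachTimes.Add(TimeMs(WithForeach));
    linqTimes.Add(TimeMs(WithLinq));
}

Console.WriteLine("foreach median: {0:F3} ms", Median(foreachTimes));
Console.WriteLine("LINQ    median: {0:F3} ms", Median(linqTimes));

// The median is less sensitive than the mean to an occasional slow outlier run.
static double Median(List<double> values)
{
    var sorted = values.OrderBy(v => v).ToList();
    int n = sorted.Count;
    return n % 2 == 1 ? sorted[n / 2] : (sorted[n / 2 - 1] + sorted[n / 2]) / 2.0;
}
```

For anything more rigorous, a dedicated benchmarking library handles warm-up, iteration counts and statistics automatically.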

Bencoding Library – Part two

Previously, I described a Bencoding library I made to simplify working with the Bencoding structure in .Net, and I also put it up on Codeplex. However, I discovered a flaw: if a string contains a null byte, as most .Torrent files unavoidably do, the library fails. So I made a fix, which has been uploaded to Codeplex as changeset 1102.
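The fix itself isn’t shown here, so the following is only a minimal sketch of one common way to avoid the problem, not the contents of changeset 1102: treat a bencoded string as raw bytes, read its length prefix, and copy exactly that many bytes without converting them to text. In a .Torrent file, the “pieces” value is a concatenation of binary SHA-1 hashes, so zero bytes are routine. ReadByteString is a hypothetical helper name.

```csharp
using System;
using System.Globalization;
using System.Text;

// Reads one bencoded string ("<length>:<bytes>") starting at `position`, returns its raw
// bytes and moves `position` past it. The payload is copied by length and never turned
// into text, so bytes such as 0x00 survive intact.
byte[] ReadByteString(byte[] data, ref int position)
{
    int colon = Array.IndexOf(data, (byte)':', position);
    if (colon < 0) throw new FormatException("Missing ':' after string length");
    int length = int.Parse(Encoding.ASCII.GetString(data, position, colon - position),
                           NumberStyles.None, CultureInfo.InvariantCulture);
    if (colon + 1 + length > data.Length) throw new FormatException("String runs past end of input");
    var result = new byte[length];
    Array.Copy(data, colon + 1, result, 0, length);
    position = colon + 1 + length;
    return result;
}

// "4:" followed by the bytes 'a', 0x00, 'b', 0x00, as could appear inside a binary field.
byte[] input = { (byte)'4', (byte)':', (byte)'a', 0x00, (byte)'b', 0x00 };
int offset = 0;
byte[] value = ReadByteString(input, ref offset);
Console.WriteLine("{0} bytes, next position {1}", value.Length, offset); // 4 bytes, next position 6
```

Keeping the value as byte[] and converting to a .Net string only when the caller knows the field is text (such as “announce”) avoids losing or truncating binary data.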


Bencoding – A C# Library

I’ve long wanted to write a Bencoding library. Bencoding is an encoding format that serializes objects, namely strings, integers, lists and dictionaries, into a single string of bytes. It’s useful for transporting structured data, such as configuration files, back and forth.

Bencoding is best known for its use in the BitTorrent protocol, where it is the format of .Torrent files. Anyway, what I want to do is write a bencoding library in C# that supports all the bencoding types. You can read more about the encoding here.

As an added bonus, I’ll be trying out the Team Foundation Server that Codeplex supplies when I code for the project I’ve created here.
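As a rough illustration of the format, and not the library’s actual API, here is a minimal encoder sketch. It follows the bencoding rules from the BitTorrent specification: integers become i&lt;n&gt;e, strings become &lt;length&gt;:&lt;bytes&gt;, lists are wrapped in l…e, and dictionaries in d…e with their keys sorted. The function names and the mapping of .Net types to bencoding types are assumptions.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.IO;
using System.Linq;
using System.Text;

// Encodes a value into bencoding. Supported: int/long (i<n>e), string (as UTF-8) and
// byte[] (<length>:<bytes>), IDictionary<string, object> (d...e) and IList<object> (l...e).
byte[] Encode(object value)
{
    using var stream = new MemoryStream();
    Write(stream, value);
    return stream.ToArray();
}

void Write(Stream s, object value)
{
    switch (value)
    {
        case int n: WriteAscii(s, "i" + n.ToString(CultureInfo.InvariantCulture) + "e"); break;
        case long n: WriteAscii(s, "i" + n.ToString(CultureInfo.InvariantCulture) + "e"); break;
        case string text: WriteBytes(s, Encoding.UTF8.GetBytes(text)); break;
        case byte[] bytes: WriteBytes(s, bytes); break;
        case IDictionary<string, object> dict:
            s.WriteByte((byte)'d');
            // Keys must be written in sorted order; ordinal comparison matches byte order for ASCII keys.
            foreach (var pair in dict.OrderBy(p => p.Key, StringComparer.Ordinal))
            {
                Write(s, pair.Key);
                Write(s, pair.Value);
            }
            s.WriteByte((byte)'e');
            break;
        case IList<object> list:
            s.WriteByte((byte)'l');
            foreach (var item in list) Write(s, item);
            s.WriteByte((byte)'e');
            break;
        default:
            throw new ArgumentException("Cannot bencode " + (value?.GetType().Name ?? "null"));
    }
}

// Writes "<length>:" followed by the raw bytes.
void WriteBytes(Stream s, byte[] bytes)
{
    WriteAscii(s, bytes.Length.ToString(CultureInfo.InvariantCulture) + ":");
    s.Write(bytes, 0, bytes.Length);
}

void WriteAscii(Stream s, string text)
{
    byte[] ascii = Encoding.ASCII.GetBytes(text);
    s.Write(ascii, 0, ascii.Length);
}

var example = new Dictionary<string, object>
{
    ["spam"] = new List<object> { "a", 42L },
    ["cow"] = "moo",
};
Console.WriteLine(Encoding.ASCII.GetString(Encode(example))); // d3:cow3:moo4:spaml1:ai42eee
```

Writing into a byte stream rather than building a .Net string keeps binary string values intact, which matters for .Torrent files.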


Wardriving – The triangulation algorithm

Building on my previous post, where I measure the signal strength of WiFi spots, I also need an algorithm that actually locates a spot. I will use triangulation as my starting point, since it does exactly that: it locates an unknown point from three or more known points. (Strictly speaking, locating a point from distances, such as those estimated from signal strength, is called trilateration; triangulation proper works from angles.) Triangulation is often used in wireless communications to locate cell phones, walkie-talkies and so forth.
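A minimal sketch of the distance-based variant (trilateration) in two dimensions, assuming the distance to each of three known measuring points has already been estimated from signal strength. Each point gives a circle (x − xᵢ)² + (y − yᵢ)² = rᵢ²; subtracting the first circle’s equation from the other two cancels the squared terms and leaves two linear equations, solved here with Cramer’s rule. The function name and tuple layout are illustrative.

```csharp
using System;

// Estimates (x, y) from three known points and the estimated distance to each.
(double X, double Y) Trilaterate(
    (double X, double Y) p1, double r1,
    (double X, double Y) p2, double r2,
    (double X, double Y) p3, double r3)
{
    // Circle 1 minus circle 2 and circle 1 minus circle 3 give:
    //   a1*x + b1*y = c1   and   a2*x + b2*y = c2
    double a1 = 2 * (p2.X - p1.X), b1 = 2 * (p2.Y - p1.Y);
    double c1 = r1 * r1 - r2 * r2 + p2.X * p2.X - p1.X * p1.X + p2.Y * p2.Y - p1.Y * p1.Y;
    double a2 = 2 * (p3.X - p1.X), b2 = 2 * (p3.Y - p1.Y);
    double c2 = r1 * r1 - r3 * r3 + p3.X * p3.X - p1.X * p1.X + p3.Y * p3.Y - p1.Y * p1.Y;

    double det = a1 * b2 - a2 * b1;
    if (Math.Abs(det) < 1e-12) throw new ArgumentException("The three known points must not lie on one line");

    // Cramer's rule for the 2x2 system.
    return ((c1 * b2 - c2 * b1) / det, (a1 * c2 - a2 * c1) / det);
}

// Known points at (0,0), (10,0) and (0,10); the hidden spot is at (3,4).
var spot = Trilaterate((0, 0), 5, (10, 0), Math.Sqrt(65), (0, 10), Math.Sqrt(45));
Console.WriteLine("Estimated position: ({0:F2}, {1:F2})", spot.X, spot.Y);
```

With real, noisy signal-strength distances the circles rarely meet in one point, so with more than three readings a least-squares fit over all of them is the usual next step.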


C# Wifi Scanner

I’ve long wanted to make a WiFi-based application, one that might later also record the location from a GPS unit, for use with wardriving. So I searched the Internet for quick, manageable solutions, preferably something fully integrated with C# (.Net).

After looking through a lot of WMI tutorials on how to use the Management classes to one’s advantage, I found that my Windows 7 machine lacks the WiFi classes in the WMI root namespace, such as “MSNdis_80211_ServiceSetIdentifier”, or indeed anything remotely named “80211”, “802” or “MSN”.


Danish CPR Numbers

I have been reading several articles about the new NemID system in Denmark, most recently about the fear that organized groups could generate a list of valid CPR numbers and brute-force passwords, getting accounts locked after five wrong password attempts.

Locking many accounts could essentially paralyze large parts of Denmark, as NemID is meant to be used for everything digital, from the Treasury Department to applying for school.

To test this, I decided to write code that generates valid CPR numbers; valid, because there are rules for when a number is valid, which I’ll describe later.
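As a minimal sketch, covering only one rule and not necessarily the full set described in the post: a CPR number has the form DDMMYY-SSSS, and the traditional check multiplies its ten digits by the weights 4, 3, 2, 7, 6, 5, 4, 3, 2, 1 and requires the sum to be divisible by 11. The sketch ignores that the first serial digit together with the year encodes the century, and that since 2007 not every issued CPR number satisfies the modulus-11 check. The function names are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

// Weights of the traditional modulus-11 check, one per digit of DDMMYYSSSS.
int[] weights = { 4, 3, 2, 7, 6, 5, 4, 3, 2, 1 };

// True if the 10-digit number (without the dash) passes the modulus-11 check.
bool IsMod11Valid(string cpr)
{
    if (cpr.Length != 10 || !cpr.All(c => c >= '0' && c <= '9')) return false;
    int sum = 0;
    for (int i = 0; i < 10; i++) sum += (cpr[i] - '0') * weights[i];
    return sum % 11 == 0;
}

// Lists every serial (last four digits) that makes the number mod-11 valid for a birth date.
// Using DateTime guarantees that the date part is a real calendar date.
IEnumerable<string> CandidatesFor(DateTime birthDate)
{
    string datePart = birthDate.ToString("ddMMyy", CultureInfo.InvariantCulture);
    for (int serial = 0; serial < 10000; serial++)
    {
        string cpr = datePart + serial.ToString("D4", CultureInfo.InvariantCulture);
        if (IsMod11Valid(cpr)) yield return cpr;
    }
}

List<string> candidates = CandidatesFor(new DateTime(1961, 7, 7)).ToList();
Console.WriteLine("{0} mod-11 valid numbers for 07-07-61", candidates.Count);
```

Since only about one serial in eleven passes, each birth date yields several hundred candidates, which is why a generated list of “valid” numbers can grow large quickly.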

Mobifinance – Loaning organized

So, I was on a trip to Jutland when I had this splendid idea: a loan system to help me track who I had lent money to, and whether I’d ever get it back. The system was meant to run on mobile phones through the browser, so I made it a small webpage.

I set out to use C# (.Net) and an MSSQL database, which my hosting provider (addnet.dk) included in my web package.


SMS System

As part of the OEP platform at my previous college, we came up with an SMS subsystem to handle SMS messages to Danish mobile phones. The idea was a simple, easy system that could send out messages about account changes (such as being locked out, or a forgotten password). I decided from the start to code it in Python.

Linux incremental hardlink backup system

While reviewing the backup system of the IT class at my old college, because disk space was running low, I found what I consider a terrible flaw: a weekly 1:1 backup of EVERY user’s files, close to 40 GB each week. I thought this could be done better. I know for a fact that many of the files are backed up again and again, and they never change. Ever.

Hence: incremental!
I then set myself the challenge of coding my own incremental backup system, using only Python and the Linux filesystem (extfs).
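A minimal sketch of the general hard-link technique, written in C# for consistency with the other examples rather than as the Python implementation itself. Assumptions: a file counts as unchanged when its size and modification time match its copy in the previous snapshot; hard links are made by calling the POSIX link(2) function from libc through P/Invoke, since System.IO only offers symbolic links (File.CreateSymbolicLink). This runs only on Linux/Unix, and all snapshots must live on the same filesystem. Snapshot is a hypothetical function name.

```csharp
using System;
using System.IO;
using System.Runtime.InteropServices;

// Makes snapshot `dest` of `source`. A file whose size and modification time match its copy
// in the `previous` snapshot is hard-linked to that copy instead of copied, so unchanged
// files use no extra data blocks. Pass previous = null for the first, full snapshot.
void Snapshot(string source, string previous, string dest)
{
    foreach (string file in Directory.EnumerateFiles(source, "*", SearchOption.AllDirectories))
    {
        string relative = Path.GetRelativePath(source, file);
        string target = Path.Combine(dest, relative);
        Directory.CreateDirectory(Path.GetDirectoryName(target));

        var current = new FileInfo(file);
        string old = previous == null ? null : Path.Combine(previous, relative);
        // Allow sub-millisecond differences, since timestamp precision can be lost on copy.
        bool unchanged = old != null && File.Exists(old)
            && new FileInfo(old).Length == current.Length
            && Math.Abs((File.GetLastWriteTimeUtc(old) - current.LastWriteTimeUtc).TotalMilliseconds) < 1;

        // link(2) returns 0 on success; if linking fails, fall back to a full copy.
        if (unchanged && link(old, target) == 0) continue;

        File.Copy(file, target);
        // Keep the source's mtime on the copy so the next snapshot can compare against it.
        File.SetLastWriteTimeUtc(target, current.LastWriteTimeUtc);
    }
}

// POSIX link(2): adds a new directory entry (hard link) for an existing file.
[DllImport("libc", SetLastError = true)]
static extern int link(string oldPath, string newPath);

// Example: two weekly snapshots of a directory where nothing changed in between.
string root = Path.Combine(Path.GetTempPath(), "hardlink-backup-" + Guid.NewGuid().ToString("N"));
string home = Path.Combine(root, "home");
Directory.CreateDirectory(home);
File.WriteAllText(Path.Combine(home, "notes.txt"), "hello");
Snapshot(home, null, Path.Combine(root, "week1"));
Snapshot(home, Path.Combine(root, "week1"), Path.Combine(root, "week2"));
```

Every snapshot still looks like a complete 1:1 copy when browsed, but an unchanged file’s data is stored only once; deleting an old snapshot just removes one of the links.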