A simple program to search large text files

September 27, 2006

A few years ago I took over webmaster duties for a very active community forum. About a month into the job the unthinkable happened: we lost the table that contained all of the posts!

I had backups (a 7-day rolling system), but they were a good 24 hours old, and the server kept dropping lines while it chugged away at restoring the database table.

I needed to go into the SQL file and pull out all of the lines for the post table, but I had one very large problem: the SQL file was 524MB and my computer only had 128MB of RAM!

I tried to cajole my computer into opening the file, but it was simply too big. Rather than curse my poor underpowered computer, I got an idea.

Since I knew the structure of the SQL commands, I could write a program to open the file, read it line by line, and copy the lines I needed into a new file.

I should note that I did not know about grep at the time. If I had, I would have just used that…

So, using some very simple C and C++ library functions, I built a program that searches any text file for a given string and writes every matching line to a new file. Even though the post table itself was still huge, the sheer number of sub-forums allowed me to break it down into one file per sub-forum.

Now I had about 124 SQL files, one for each area of the forums. It was a simple process to upload each one to the server and load the table data back into MySQL. Then I only had to do a quick check, comparing the number of rows in the table with the number of lines in each file, to be sure all the statements were executed.

I give the sqlquery program below to you to use, modify, etc… I ask only that you follow my Creative Commons Attribution-ShareAlike 2.5 License.

It would be a very easy modification to change this program to search an entire directory of text files for a given string. I imagine some of those “spy on your kids” programs do just that…

Please note that the program below was thrown together in about 5 minutes; it is not optimized and has only minimal safeguards.

#include <cstdio>
#include <fstream>
#include <string>
using namespace std;

int main(int argc, char *argv[])
{
    if(argc!=3)
    {
        // give info on how to run program
        // Creative Commons Attribution-ShareAlike 2.5 License
        printf("sqlquery copyright 2005 Stephen De Chellis \n");
        printf("sqlquery takes 2 arguments\n\n");
        printf("sqlquery [filename] [search string]\n\n");
        printf("[filename]      = name of file to open for searching\n");
        printf("[search string] = string to search the file for\n\n");
        printf("sqlquery will search the file line by line and store the results\n");
        printf("in a new file named [search string].sql\n");
        return 1;
    }
    // std::string grows as needed, so long file names, search strings or
    // lines can no longer overflow the old fixed-size char buffers
    string filename = argv[1];
    string search = argv[2];
    string filename2 = search + ".sql";

    ifstream infile(filename.c_str());
    if(!infile)
    {
        printf("unable to find file: %s\n", filename.c_str());
        return 1;
    }
    ofstream outfile(filename2.c_str());
    if(!outfile)
    {
        printf("unable to create file: %s\n", filename2.c_str());
        return 1;
    }
    printf("\n working");
    string line;
    long i = 0;    // number of matching lines
    // Test the result of getline itself: the loop ends cleanly at end of
    // file, and a failed read is never searched or written out
    while(getline(infile, line))
    {
        if(line.find(search) != string::npos)
        {
            outfile << line << '\n';
            i++;
            printf(".");
        }
    }
    printf("\nFound %ld matches\n", i);
    // 0 means success; the match count is printed rather than returned,
    // because exit codes are truncated to 0-255 and non-zero means failure
    return 0;
}

Re-usable Code: Populating Listboxes in Windows

September 11, 2006

One thing most programmers who write for Windows will have to deal with is the Listbox.

The Listbox is a very useful item in the programmer’s arsenal. Its cousin, the ComboBox, is a close second. You can use a Listbox to display a list of items to a user, allow them to select an item (or multiple items), and then do something with those items.

I’m not going to go into detail on the hows and whys of Listboxes; instead I will show you some very simple re-usable code that I release to you under a Creative Commons Attribution-ShareAlike 2.5 License. All I ask is that you do not claim the code as your own and that you give me proper credit for it. If you have some good ideas for improving this small code snippet, leave a comment below and I will update the article to show your changes.

I first wrote this snippet in 2002, while coding for Windows 98 machines.

The first thing this code needs is a header file:

/*----------------------------------------------------
    Basic Listbox Code (re-usable)
    (c) Stephen De Chellis 2002
    Creative Commons Attribution-ShareAlike 2.5 License
    Listbox.h - header file for Listbox.cpp
  ----------------------------------------------------*/

#ifndef LISTBOX_H
#define LISTBOX_H

#include <windows.h>   /* for HWND */

struct BigBuffer{
    char Buffer[256];
};

/* Prototypes for Listbox.cpp */
extern int FillListBox (HWND hwnd, int ID, struct BigBuffer buffer[], int s);
/* end of Listbox.h */

#endif

Now that we have the header file, we can move on to the cpp file. Before I go ahead, please note the preprocessor statements I put in Listbox.h. They are very important, because once your program gets large enough you may inadvertently include the same header file twice, which leads to redefinition errors. That’s a headache you want to avoid! The #ifndef LISTBOX_H check, followed by #define LISTBOX_H, ensures that the header’s contents are compiled only once per source file: the second time around, LISTBOX_H is already defined, so everything up to the matching #endif is skipped.

Now let us look at the contents of Listbox.cpp:

/*----------------------------------------------------
    Basic Listbox Code (re-usable)
    (c) Stephen De Chellis 2002
    Creative Commons Attribution-ShareAlike 2.5 License
    Listbox.cpp - Contains code for filling a Listbox
  ----------------------------------------------------*/

#include <windows.h>
#include "Listbox.h"

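int FillListBox (HWND hwnd, int ID, struct BigBuffer buffer[], int s)
{
   /********************************************************************
      hwnd         - handle to window with the listbox
      ID           - ID number of listbox to be filled with data
      buffer[]     - String Array that will be written to listbox
      s            - how many elements of string array to write
      This function fills listboxes and then returns an int denoting
      the quantity of items actually added (less than s on error).
   *********************************************************************/
   int i = 0;
   LRESULT result;
   /* empty the listbox so we start from a clean slate */
   SendDlgItemMessage (hwnd, ID, LB_RESETCONTENT, 0, 0);
   /* a while loop (not do-while) adds nothing when s is 0 */
   while (i < s)
   {
      /* BigBuffer holds char strings, so use the ANSI version explicitly;
         LB_ADDSTRING ignores wParam, so pass 0 */
      result = SendDlgItemMessageA (hwnd, ID, LB_ADDSTRING, 0, (LPARAM) buffer[i].Buffer);
      if (result == LB_ERR || result == LB_ERRSPACE)
         break;   /* stop at the first failure; the caller sees i < s */
      i++;
   }
   return i;
}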

I do apologize for some of the formatting issues on this blog. I try to get things to look the way I like them, and then the backend throws in a ton of code that I do not want…

The idea behind the above function is simple: you fill an array of BigBuffer structures with a series of strings, then pass that array along with an integer telling the function how many of those items you want sent to the Listbox. The function then returns i, the number of items it added. If that is not the same as the number of items you asked for, you know an error has occurred and you can write some code to handle it.

I have since re-written this Listbox code for wxWidgets and included much better error handling.

The above code is very simple and you will probably not use it as is unless you are new to programming.

The first thing the FillListBox function does is call SendDlgItemMessage (hwnd, ID, LB_RESETCONTENT, 0, 0). This simply tells the program to clear all of the contents out of the Listbox (ID) in question. If you do not clear the contents first, then when the program executes the next block of code you will find yourself adding entries to the ones already there. You will not always want to clear a Listbox before sending items to it, but in this basic example we clear it and fully re-populate it each time.

The loop that follows then adds strings to the Listbox one by one until it has added as many items as it was told to add. When the loop ends, the function returns the number of items added.

This is only the first in a series of programming articles. If there are any requests for further help in dealing with Listboxes please comment below.

I used the services of Code Colorizer to colorize the above code snippets.