Posts

Showing posts from May, 2017

Data Modeling Student Retention Series

I am going to be doing a series of blog posts on data modeling student retention in higher education.  I recently saw a presentation that made me think a lot about data modeling in general, and more specifically about student retention.  This series of blog posts will document the process my team is using to determine whether the data model I observed is appropriate for our institution. The first step in creating a data model is understanding the business process, so I will start by laying out the rules and vocabulary of student retention. Student retention in higher education is based on an academic year.  New students starting in Fall are "retained" if they attend school the following Fall.  If a first time in college (FTIC) student attended school for the first time in Fall 2010 and also attended in Fall 2011, they were retained. The next important business concept and term is cohort.  For the purpose of student retention, a cohort is a group of FTICs
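The Fall-to-Fall retention rule above can be sketched in SQL. This is a minimal illustration, not the model from the presentation: the `Cohort` and `Enrollment` tables, the column names, and the `YYYYTT` term-code convention are all my own assumptions.

```sql
-- Hypothetical tables for illustration only:
--   Cohort(StudentID, CohortFallTerm)  -- each FTIC student and the Fall term they started
--   Enrollment(StudentID, Term)        -- every term a student attended
SELECT c.CohortFallTerm,
       COUNT(*) AS Cohort_Size,
       SUM(CASE WHEN e.StudentID IS NOT NULL THEN 1 ELSE 0 END) AS Retained,
       100.0 * SUM(CASE WHEN e.StudentID IS NOT NULL THEN 1 ELSE 0 END)
             / COUNT(*) AS Retention_Rate_Pct
FROM Cohort c
LEFT JOIN Enrollment e
       ON e.StudentID = c.StudentID
      AND e.Term = c.CohortFallTerm + 100  -- next Fall, assuming a YYYYTT term code (e.g. 201080 -> 201180)
GROUP BY c.CohortFallTerm;
```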

Simple While Loop Example - Without a Cursor

The first thing I will do is create a system-versioned table in SQL Server 2016.  Before creating the table, I will check whether the object already exists and, if it does, drop the table. If you have an interest in system-versioned (temporal) tables, note the ALTER TABLE statement that runs before I drop the object.

IF OBJECT_ID('WHILELOOPTEST') IS NOT NULL
BEGIN
    ALTER TABLE [dbo].[WHILELOOPTEST] SET (SYSTEM_VERSIONING = OFF)
    DROP TABLE [dbo].WHILELOOPTEST
END

Next, I create the system-versioned table so we can see the value changes caused by the following WHILE loop.

CREATE TABLE WHILELOOPTEST
(
    ProductPK int IDENTITY(1,1) PRIMARY KEY,
    Product_Cost money,
    Start_Date datetime2 GENERATED ALWAYS AS ROW START HIDDEN NOT NULL,
    End_Date   datetime2 GENERATED ALWAYS AS ROW END HIDDEN NOT NULL,
    PERIOD FOR SYSTEM_TIME (Start_Date, End_Date)
)
WITH (SYSTEM_VERSIONING = ON (HISTORY_TABLE = dbo.WHILELOOPTESTHISTORY));

/** Insert Test Values **/
INSERT INTO WHIL
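The excerpt cuts off before the loop itself. Below is a minimal sketch of the kind of WHILE loop the post's title describes; the seed value, loop count, and update logic are my own illustration, not from the original post.

```sql
-- Illustrative only: seed one row, then raise its cost five times in a loop.
INSERT INTO WHILELOOPTEST (Product_Cost) VALUES (10.00);

DECLARE @i int = 0;
WHILE @i < 5
BEGIN
    UPDATE WHILELOOPTEST
    SET Product_Cost = Product_Cost + 1.00
    WHERE ProductPK = 1;

    SET @i = @i + 1;
END

-- Each UPDATE moves the prior row version into dbo.WHILELOOPTESTHISTORY,
-- so the history table should hold one row per loop iteration.
SELECT * FROM WHILELOOPTESTHISTORY;
```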

Data Warehouse vs Data Mart

I have been asked many times about the difference between a data mart and a data warehouse. I have often heard the terms used interchangeably. But let's face it, we can't all be right all the time. So, below is my breakdown of data warehouses and data marts. Data Warehouse A data warehouse is simply a database.  The same database technology you use for business applications can be used for a data warehouse; you can use Oracle or SQL Server.  But stop before you go tell your boss all of your databases are data warehouses.  The difference between your application database and your data warehouse is how the schema is designed and the source of the data.  A data warehouse is specifically designed to make reporting easier and to ensure reporting does not place an undue burden on the transactional source system. Data warehouses can include data from many sources, all brought together for convenience.  Note: A data warehouse is not a data graveyard for old data you no longe

Conceptual, Logical, and Physical Data Models: A Simple Example

This post is a very simple, possibly oversimplified, example of conceptual, logical, and physical data models.  The conceptual model includes business objects and the cardinality between the objects.  The logical model includes the cardinality and attributes, and addresses potential primary and foreign keys.  The physical model includes table names, column names, and data types.  The problem with hiding the potential complexity of this process is giving the impression that data modeling (aka modeling business processes) is easy and fast and therefore trivial.  The modeling process is not trivial.  Devoting adequate time to designing, implementing, testing, and redesigning the data models in a project can be the difference between delivering an impactful project and having to start over after the business objectives are not met. Conceptual Logical Physical
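One way to see the three levels side by side is to carry a single example all the way to the physical model. The Student/Course example below is hypothetical, not from the post; the table and column names are my own.

```sql
-- Hypothetical example, not from the post.
-- Conceptual: Student --< Enrollment >-- Course (a many-to-many resolved by Enrollment).
-- Logical:    the entities gain attributes and candidate keys.
-- Physical:   table names, column names, data types, and declared keys.
CREATE TABLE Student (StudentID int PRIMARY KEY, Student_Name varchar(100) NOT NULL);
CREATE TABLE Course  (CourseID  int PRIMARY KEY, Course_Title varchar(100) NOT NULL);

CREATE TABLE Enrollment
(
    StudentID     int  NOT NULL REFERENCES Student (StudentID),
    CourseID      int  NOT NULL REFERENCES Course (CourseID),
    Enrolled_Date date NOT NULL,
    PRIMARY KEY (StudentID, CourseID)
);
```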

Percentage of Data and Index Page Use In Buffer Pool

Below is a query you can use to find out how much of the pages in the memory buffer pool actually contain data.  SQL Server does not read parts of 8 KB pages into memory, so if page fill is poor due to poor table and index design, you may be holding a large amount of memory with nothing in it.

SELECT DB_NAME(database_id) AS Database_Name,
       page_type,
       COUNT(database_id) / 128 AS MB_USED,
       SUM(CAST(free_space_in_bytes AS bigint)) / (1024.0) / (1024.0) AS MB_EMPTY,
       (SUM(CAST(free_space_in_bytes AS bigint)) / (1024.0) / (1024.0)
            / (COUNT(database_id) / 128)) * 100.0 AS Percentage_Empty
FROM sys.dm_os_buffer_descriptors
WHERE database_id = --<YOUR DATABASE ID>
  AND page_type IN ('data_page', 'index_page')
GROUP BY DB_NAME(database_id), page_type;

Example Results:

The source for the buffer pool data is the DMV sys.dm_os_buffer_descriptors. The function DB_NAME() will take the database_id and return the name of the database
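To fill in the database ID placeholder in the WHERE clause above, the built-in DB_ID() function goes the opposite direction of DB_NAME(): it takes a database name and returns its ID. A quick sketch:

```sql
-- DB_ID() with no argument returns the ID of the current database;
-- DB_ID('name') returns the ID of the named database.
SELECT DB_ID()         AS Current_Database_ID,
       DB_ID('master') AS Master_Database_ID;  -- master is typically database_id 1
```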

SQL Server: Get the Current Size and Free Space of Data File and Log File

I recently discovered our application monitoring tool only looks at the total size of the data and log files.  In our case we had excessive database growth, but the monitoring tool only knew how much the data file grew, not how much space the runaway process used up inside the data file before the file was forced to expand. Below is a simple query which will allow you to get the current size of the data and log files along with the space still available.

USE <YOUR DATABASE>

SELECT DB_NAME() AS Database_Name,
       name AS FileName,
       size / 128.0 AS Current_Size_MB,
       size / 128.0 - FILEPROPERTY(name, 'SPACEUSED') / 128.0 AS Free_Space_MB,
       GETDATE() AS DateTime
FROM sys.database_files;

First, use the function DB_NAME() with no parameter to get the name of the current database. Next, you will need the name and size of the files from the system view sys.database_files. Name is the name of the file. Size is the number of 8 KB pages in the file.  Since a MB is 1024 KB, you divide the nu
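The division by 128 in the query above follows directly from the page size. A quick sanity check (the 25,600-page file size is just an example value):

```sql
-- Each page is 8 KB and 1 MB = 1024 KB, so 1024 / 8 = 128 pages per MB.
SELECT 1024 / 8        AS Pages_Per_MB,   -- 128
       25600 / 128.0   AS Size_In_MB;     -- a 25,600-page file is 200 MB
```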

Things I Learned While Rebuilding Seriously Fragmented Indexes on SQL Server

I recently inherited some administration duties for a SQL Server instance.  The database was growing out of control and performing poorly.  The following is what I learned while trying to improve the state of the indexes. The primary application database is ~85 GB in size, and of that total there is a single table which is ~34 GB.  That single table carries an additional ~6 GB of indexes.  The total space on the data drive was ~90 GB. Needless to say, the performance of the application depends entirely on the performance of that single table.  The rules were: I could not add, delete, or disable any of the indexes, and I had zero control over the design of the database.  It was a vendor product, and any manipulation at that level would have violated our licensing and support agreement. There were two goals in defragmenting the indexes: 1) improve performance and 2) reclaim misused disk space. Number 1: If you have the licensing for Enterprise Edition, use it.  SQL Serve
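The excerpt cuts off at the first lesson, which appears to concern Enterprise Edition's online index rebuilds. A minimal sketch of that feature follows; the table and index names are placeholders, not from the post.

```sql
-- ONLINE = ON is an Enterprise Edition feature: it rebuilds the index while
-- queries continue to read and write the underlying table.
-- SORT_IN_TEMPDB = ON moves the rebuild's sort work out of the user database,
-- which matters when free space on the data drive is tight.
ALTER INDEX IX_BigTable_SomeColumn ON dbo.BigTable
REBUILD WITH (ONLINE = ON, SORT_IN_TEMPDB = ON);
```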